maxsmooth Theory and Algorithm

This section has been adapted from section 4 of the maxsmooth paper in order to explain how the algorithm works. What follows is a discussion of the fitting problem and the maxsmooth algorithm. To state concisely the problem being fitted we have

\[\begin{split}&\min_{a,~s}~~\frac{1}{2}~\mathbf{a}^T~\mathbf{Q}~\mathbf{a}~+~\mathbf{q}^T~\mathbf{a}, \\ &\mathrm{s.t.}~~\mathbf{G(s)~a} \leq \mathbf{0}.\end{split}\]

where \({\mathbf{s}}\) are the maxsmooth signs corresponding to the signs on the derivatives. \({\mathbf{G}}\) is a matrix of prefactors on the derivatives, \({\mathbf{a}}\) are the parameters we are optimising for and their product gives the derivatives we are constraining with each fit. \({\mathbf{Q}}\) is the dot product of the matrix of basis functions and its transpose and \(\mathbf{q}\) is the negative of the transposed data, \(\mathbf{y}\) dotted with the basis functions. For more details on this equation see the maxsmooth paper. A `problem’ in this context is the combination of the data, order, basis function and constraints on the DCF.

With maxsmooth we can test all possible sign combinations on the constrained derivatives. This is a reliable method and, provided the problem can be solved with quadratic programming, will always give the correct global minimum. When the problem we are interested in is “well defined”, we can develop a quicker algorithm that searches or navigates through the discrete maxsmooth sign spaces to find the global minimum. Each sign space is a discrete parameter space with its own global minimum. Using quadratic programming on a fit with a specific sign combination will find this global minimum, and we are interested in finding the minimum of these global minima.

A “well defined” problem is one in which the discrete sign spaces have large variance in their minimum \({\chi^2}\) values and the sign space for the global minimum is easily identifiable. In contrast we can have an “ill defined” problem in which the variance in minimum \({\chi^2}\) across all sign combinations is small. This concept of “well defined” and “ill defined” problems is explored further in the following two sections.

Well Defined Problems and Discrete Sign Space Searches

The \({\chi^2}\) Distribution

We investigate the distribution of \({\chi^2}\) values, shown in the figure below, for a 10 \({^{th}}\) order y-log(x) space MSF fit to a \({y = x^{-2.5}}\) power law plus gaussian noise.

In the figure, a combination of all positive derivatives~(negative signs) and all negative derivatives~(positive signs) corresponds to sign combination numbers 255 and 0 respectively. Specifically, the maxsmooth signs, \({\mathbf{s}}\), are related to the sign combination number by its \({C}\) bit binary representation, here \({C = (N -2)}\). In binary the sign combination numbers run from 00000000 to 11111111. Each bit represents the sign on the \({m^{th}}\) order derivative with a 1 representing a negative maxsmooth sign.

https://github.com/htjb/maxsmooth/raw/master/docs/images/chi_dist_theory.png

The distribution appears to be composed of smooth steps or shelves; however, when each shelf is studied closer, we find a series of peaks and troughs. This can be seen in the subplot of the above figure which shows the distribution in the neighbourhood of the global minimum found in the large or global’ well. This type of distribution with a large variance in :math:`{chi^2} is characteristic of a “well defined” problem. We use this example \({\chi^2}\) distribution to motivate the maxsmooth algorithm outlined in the following section.

The maxsmooth Sign Navigating Algorithm

Exploration of the discrete sign spaces for high \({N}\) can be achieved by exploring the spaces around an iteratively updated optimum sign combination. The maxsmooth algorithm begins with a randomly generated set of signs for which the objective function is evaluated and the optimum parameters are found. We flip each individual sign one at a time beginning with the lowest order constrained derivative first. When the objective function is evaluated to be lower than that for the optimum sign combination, we replace it with the new set and repeat the process in a `cascading’ routine until the objective function stops decreasing in value.

The local minima shown in the \({\chi^2}\) distribution above mean that the cascading algorithm is not sufficient to consistently find the global minimum. We can demonstrate this by performing 100 separate runs of the cascading algorithm on \({y = x^{-2.5} + \mathrm{noise}}\), and we use a y-log(x) space \({10^{th}}\) order MSF again. We find the true global minimum 79 times and a second local minimum 21 times.

To prevent the routine terminating in a local minimum we perform a complete search of the sign spaces surrounding the minimum found after the cascading routine. We refer to this search as a directional exploration and impose limits on its extent. In each direction we limit the number of sign combinations to explore and we limit the maximum allowed increase in \({\chi^2}\) value. These limits can be modified by the user. We prevent repeated calculations of the minimum for given signs and treat the minimum of all tested signs as the global minimum.

We run the consistency test again, with the full maxsmooth algorithm, and find that for all 100 trial fits we find the same \({\chi^2}\) found when testing all sign combinations. In the figure below, the red arrows show the approximate path taken through the discrete sign spaces against the complete distribution of \({\chi^2}\). Point (1a) shows the random starting point in the algorithm, and point (1b) shows a rejected sign combination evaluated during the cascade from point (1a) to (2). Point (2), therefore, corresponds to a step through the cascade. Point (3) marks the end of the cascade and the start of the left directional exploration. Finally, point (4) shows the end of the right directional exploration where the calculated \({\chi^2}\) value exceeds the limit on the directional exploration.

https://github.com/htjb/maxsmooth/raw/master/docs/images/routine.png

The global well tends to be associated with signs that are all positive, all negative or alternating. We see this in the figure above where the minimum falls at sign combination number 169 and number 170, characteristic of the derivatives for a \({x^{-2.5}}\) power law, corresponds to alternating positive and negative derivatives from order \({m = 2}\). Standard patterns of derivative signs can be seen for all data following approximate power laws. All positive derivatives, all negative and alternating signs correspond to data following the approximate power laws \({y\approx x^{k}}\), \({y\approx -x^{k}}\), \({y\approx x^{-k}}\) and \({y\approx -x^{-k}}\).

The maxsmooth algorithm assumes that the global well is present in the \({\chi^2}\) distribution and this is often the case. The use of DCFs is primarily driven by a desire to constrain previously proposed polynomial models to foregrounds. As a result we would expect that the data being fitted could be described by one of the four approximate power laws highlighted above and that the global minimum will fall around an associated sign combination. In rare cases the global well is not clearly defined and this is described in the following subsection.

Ill Defined Problems and their Identification

We can illustrate an “ill defined” problem, with a small variation in \({\chi^2}\) across the maxsmooth sign spaces, by adding a non-smooth signal of interest into the foreground model, \({x^{-2.5}}\) and fitting this with a 10 \({^{th}}\) order log(y)-log(x) space MSF. We add an additional noise of \({0.020}\) to the mock data. The resultant \({\chi^2}\) distribution with its global minimum is shown in the top panel of the figure below.

The global minimum, shown as a black data point, cannot be found using the maxsmooth algorithm. The cascading algorithm may terminate in any of the approximately equal minima and the directional exploration will then quickly terminate because of the limits imposed.

https://github.com/htjb/maxsmooth/raw/master/docs/images/combined_chi.png

If we repeat the above fit and perform it with a y-x space MSF we find that the problem is well defined with a larger \({\chi^2}\) variation across sign combinations. This is shown in the bottom panel of the above figure. The results, when using the log(y)-log(x) space MSF, are significantly better than when using y-x space MSF meaning it is important to be able to solve “ill defined” problems. This can be done by testing all maxsmooth signs but knowing when this is necessary is important if you are expecting to run multiple DCF fits to the same data set. We can focus on diagnosing whether a DCF fit to the data is “ill defined” because a joint fit to the same data set of a DCF and signal of interest will also feature an “ill defined” \({\chi^2}\) distribution.

We can identify an “ill defined” problem by producing the equivalent of the above figure using maxsmooth and visually assessing the \({\chi^2}\) distribution for a DCF fit. Alternatively, we can use the parameter space plots, detailed in the maxsmooth paper and later in this documentation, to identify whether the constraints are weak or not, and if a local minima is returned from the sign navigating routine then the minimum in these plots will appear off centre.

Assessment of the first derivative of the data can also help to identify an “ill defined” problem. For the example problem this is shown in the figure below where the derivatives have been approximated using \({\Delta y/ \Delta x}\). Higher order derivatives of the data will have similarly complex or simplistic structures in the respective spaces. There are many combinations of parameters that will provide smooth fits with similar \({\chi^2}\) values in logarithmic space leading to the presence of local minima. This issue will also be present in any data set where the noise or signal of interest are of a similar magnitude to the foreground in y - x space.

https://github.com/htjb/maxsmooth/raw/master/docs/images/Gradients_fits.png

maxsmooth Example Codes

This section is designed to introduce the user to the software and the form in which it is run. It provides basic examples of data fitting with a built in MSF model and a user defined model.

There are also examples of functions that can be used pre-fitting and post-fitting for various purposes including; determination of the best DCF model from the built in library for the problem being fitted, analysis of the \({\chi^2}\) distribution as a function of the discrete sign spaces and analysis of the parameter space surrounding the optimum results.

The data used for all of this examples is available here.

The example codes can be found here and corresponding Jupyter Notebooks are provided here.

Simple Example code

In order to run the maxsmooth software using the built in DCF models for a simple fit the user can follow the simple structure detailed here.

An important point to make is that by default maxsmooth fits a Maximally Smooth Function or MSF to the data. An MSF, as stated in the introduction to the documentation, is a function which has derivatives of order \({m \geq 2}\) constrained so that they do not cross 0. This means that they do not have inflection points or non smooth structure produced by higher order derivatives. More generally a DCF follows the constraint,

for every constrained order \({m}\). The set of \({m}\) can be any set of derivative orders as long as those derivatives exist for the function.

This means we can use maxsmooth to produce different DCF models. MSFs are one of two special cases of DCF and we can also have a Completely Smooth Function (CSF) with orders \({m \geq 1}\) constrained. Alternatively we can have Partially Smooth Functions (PSF) which are much more general and can have arbitrary sets of derivatives constrained. We illustrate how this is implemented towards the end of this example but we begin with the default case fitting a MSF.

The user should begin by importing the smooth class from maxsmooth.DCF.

from maxsmooth.DCF import smooth

The user should then import the data they wish to fit.

import numpy as np

x = np.load('Data/x.npy')
y = np.load('Data/y.npy') + np.random.normal(0, 0.02, len(x))

and define the polynomial orders they wish to fit.

N = [3, 4, 5, 6, 7, 8, 9, 10, 11]
for i in range(len(N)):
    `act on N[i]`

or for example,

N = 15

We can also plot the data to illustrate what is happening. Here the data is a scaled \({x^{-2.5}}\) power law and I have added gaussian noise in with a standard deviation of 0.02.

import matplotlib.pyplot as plt

plt.plot(x, y)
plt.xlabel('x')
plt.ylabel('y')
plt.show()
https://github.com/htjb/maxsmooth/raw/master/docs/images/simple_program_data.png

smooth can be called as is shown below. It takes the x and y data as standard inputs as well as the order of the fit. There are a set of keyword arguments also available that change the type of function being fitted and these are detailed in the documentation.

result = smooth(x, y, N)

and it’s resulting attributes can be accessed by writing result.attribute_name. For example printing the outputs is done like so,

print('Objective Funtion Evaluations:\n', result.optimum_chi)
print('RMS:\n', result.rms)
print('Parameters:\n', result.optimum_params)
print('Fitted y:\n', result.y_fit)
print('Sign Combinations:\n', result.optimum_signs)
print('Derivatives:\n', result.derivatives)

plt.plot(x, y - result.y_fit)
plt.xlabel('x', fontsize=12)
plt.ylabel(r'$\delta y$', fontsize=12)
plt.tight_layout()
plt.show()
https://github.com/htjb/maxsmooth/raw/master/docs/images/simple_program_msf_residuals.png

To fit the data with a CSF we can use the ‘constraints’ keyword argument in smooth(). ‘constraints’ sets the minimum constrained derivative for the function which for a CSF we want to be one.

res = smooth(x, y, N, constraints=1)

Note in the printed results the number of constrained derivatives has increased by 1 and the only derivative that is allowed to cross through 0 (Zero Crossings Used?) is the the \({0^{th}}\) order i.e. the data.

plt.plot(x, y - res.y_fit)
plt.xlabel('x', fontsize=12)
plt.ylabel(r'$\delta y$', fontsize=12)
plt.tight_layout()
plt.show()
https://github.com/htjb/maxsmooth/raw/master/docs/images/simple_program_csf_residuals.png

A Partially Smooth Function can have derivatives constrained via \({m \geq a}\) where \({a}\) is any order above 2 or it can have a set of derivatives that are allowed to cross zero. For the first case we can once again use the ‘constraints’ keyword argument. For example we can constrain derivatives with orders \({\geq 3}\) which will allow the \({1^{st}}\) and \({2^{nd}}\) order derivatives to cross zero. This is useful when our data features an inflection point we want to model with our fit.

res = smooth(x, y, N, constraints=3)

plt.plot(x, y - res.y_fit)
plt.xlabel('x', fontsize=12)
plt.ylabel(r'$\delta y$', fontsize=12)
plt.tight_layout()
plt.show()
https://github.com/htjb/maxsmooth/raw/master/docs/images/simple_program_psf1_residuals.png

To allow a particular set of derivatives to cross zero we use the ‘zero_crossings’ keyword. In the example below we are lifting the constraints on the \({3^{rd}}\), \({4^{th}}\) and \({5^{th}}\) order derivatives but our minimum constrained derivative is still set at the default 2. Therefore this PSF has derivatives of order \({m = [2, 6, 7, 8, 9]}\) constrained via the condition at the begining of this example code.

res = smooth(x, y, N, zero_crossings=[3, 4, 5])

plt.plot(x, y - res.y_fit)
plt.xlabel('x', fontsize=12)
plt.ylabel(r'$\delta y$', fontsize=12)
plt.tight_layout()
plt.show()
https://github.com/htjb/maxsmooth/raw/master/docs/images/simple_program_psf2_residuals.png

While PSFs can seem like an attractive way to improve the quality of fit they are less ‘smooth’ than a MSF or CSF and consequently they can introduce additional turning points in to your residuals obscuring any signals of intrest.

Turning Points and Inflection Points

This example will walk the user through implementing DCF fits to data sets with turning points and inflection points. It builds on the details in the ‘Simple Example Code’ and uses the ‘constraints’ keyword argument introduced there. The ‘constraints’ keyword argument is used to adjust the type of DCF that is being fitted. Recall that by default maxsmooth implements a Maximally Smooth Function or MSF with constraints=2 i.e. derivatives of order \({m \geq 2}\) constrained so that they do not cross zero. This allows for turning points in the DCF as illustrated below.

We start by generating some noisy data that we know will include a turning point and defining the order of the DCF we would like to fit.

import numpy as np

x = np.linspace(-10, 10, 100)
noise = np.random.normal(0, 0.02, 100)
y = x**(2) + noise

N = 10

We can go ahead and plot this data just to double check it features a turning point.

import matplotlib.pyplot as plt

plt.plot(x, y)
plt.xlabel('x', fontsize=12)
plt.ylabel('y', fontsize=12)
plt.show()
https://github.com/htjb/maxsmooth/raw/master/docs/images/turning_point_example.png

As already stated maxsmooth does not constrain the first derivative of the DCF by default so we can go ahead and fit the data.

from maxsmooth.DCF import smooth

res = smooth(x, y, N)

If we than plot the resultant residuals we will see that despite the data having a turning point present we have recovered the Gaussian noise.

plt.plot(x, y- res.y_fit, label='Recovered Noise')
plt.plot(x, noise, label='Actual Noise')
plt.ylabel(r'$\delta y$', fontsize=12)
plt.xlabel('x', fontsize=12)
plt.legend()
plt.show()
https://github.com/htjb/maxsmooth/raw/master/docs/images/turning_point_example_res.png

To illustrate what happens when there is an inflection point in the data we can define some sinusoidal data as so.

x = np.linspace(1, 5, 100)
noise = np.random.normal(0, 0.02, 100)
y = np.sin(x) + noise

N = 10

plt.plot(x, y)
plt.xlabel('x', fontsize=12)
plt.ylabel('y', fontsize=12)
plt.show()
https://github.com/htjb/maxsmooth/raw/master/docs/images/inflection_point_example.png

If we proceed to fit this with smooth() in its default settings we will get a poor fit as by default the second derivative is constrained. We need to lift this constraint to allow for the prominent inflection point to be modelled. We do this by setting the keyword argument constraints=3 creating a Partially Smooth Function or PSF.

res_msf = smooth(x, y, N)
res_psf = smooth(x, y, N, constraints=3)

plt.plot(x, y, label='Data')
plt.plot(x, res_msf.y_fit, label=r'MSF fit, $m \geq 2$')
plt.plot(x, res_psf.y_fit, label=r'PSF fit, $m \geq 3$')
plt.xlabel('x', fontsize=12)
plt.ylabel('y', fontsize=12)
plt.legend()
plt.show()
https://github.com/htjb/maxsmooth/raw/master/docs/images/inflection_point_example_fits.png

Finally, we can plot the residuals to further see that by lifting the constraint on the second derivative we have allowed an inflection point in the data.

plt.plot(x, y- res_psf.y_fit, label='Recovered Noise')
plt.plot(x, noise, label='Actual Noise')
plt.ylabel(r'$\delta y$', fontsize=12)
plt.xlabel('x', fontsize=12)
plt.legend()
plt.show()
https://github.com/htjb/maxsmooth/raw/master/docs/images/inflection_point_example_res.png

New Basis Example

This example code illustrates how to define your own basis function for the DCF model. It implements a modified version of the built in normalized polynomial model but the structure is the same for more elaborate models.

As always we need to import the data, define an order \({N}\) and import the function fitting routine, smooth().

import numpy as np
from maxsmooth.DCF import smooth

x = np.load('Data/x.npy')
y = np.load('Data/y.npy')

N=10

There are several requirements needed to define a new basis function completely for maxsmooth to be able to fit it. They are as summarized below and then examples of each are given in more detail,

  • args: Additional non-standard arguments needed in the definition of the basis. The standard arguments are the data (x and y), the order of the fit N, the pivot point about which a model can be fit, the derivative order \({m}\) and the params. While the pivot point is not strictly needed it is a required argument for the functions defining a new basis to help the user in their definition.

  • basis_functions: This function defines the basis of the DCF model, \({\phi}\) where the model can be generally defined as,

    \[y = \sum_{k = 0}^N a_k \phi_k(x)\]

    where \({a_k}\) are the fit parameters.

  • model: This is the function described by the equation above.

  • derivative: This function defines the \({m^{th}}\) order derivative.

  • derivative_pre: This function defines the prefactors, \({\mathbf{G}}\) on the derivatives where CVXOPT, the quadratic programming routine used, evaluates the constraints as,

    \[\mathbf{Ga} \leq \mathbf{h}\]

    where \({\mathbf{a}}\) is the matrix of parameters and \({\mathbf{h}}\) is the matrix of constraint limits. For more details on this see the maxsmooth paper.

We can begin defining our new basis function by defining the additional arguments needed to fit the model as a list,

arguments = [x[-1]*10, y[-1]*10]

The next step is to define the basis functions \({\phi}\). This needs to be done in a function that has the arguments (x, y, pivot_point, N, *args). ‘args’ is optional but since we need them for this basis we are passing it in.

The basis functions, \({\phi}\), should be an array of dimensions len(x) by N and consequently evaluated at each N and x data point as shown below.

def basis_functions(x, y, pivot_point, N, *args):

    phi = np.empty([len(x), N])
    for h in range(len(x)):
        for i in range(N):
            phi[h, i] = args[1]*(x[h]/args[0])**i

    return phi

We can define the model that we are fitting in a function like that shown below. This is used for evaluating \({\chi^2}\) and returning the optimum fitted model once the code has finished running. It requires the arguments (x, y, pivot_point, N, params, *args) in that order and again where ‘args’ is optional. ‘params’ is the parameters of the fit, \({\mathbf{a}}\) which should have length \({N}\).

The function should return the fitted estimate of y.

def model(x, y, pivot_point, N, params, *args):

    y_sum = args[1]*np.sum([
        params[i]*(x/args[0])**i
        for i in range(N)], axis=0)

    return y_sum

Next we have to define a function for the derivatives of the model which takes arguments (m, x, y, N, pivot_point, params, *args) where \({m}\) is the derivative order. The function should return the \({m^{th}}\) order derivative evaluation and is used for checking that the constraints have been met and returning the derivatives of the optimum fit to the user.

def derivative(m, x, y, N, pivot_point, params, *args):

    mth_order_derivative = []
    for i in range(N):
        if i <= m - 1:
            mth_order_derivative.append([0]*len(x))
    for i in range(N - m):
            mth_order_derivative_term = args[1]*np.math.factorial(m+i) / \
                np.math.factorial(i) * \
                params[int(m)+i]*(x)**i / \
                (args[0])**(i + 1)
            mth_order_derivative.append(
                mth_order_derivative_term)

    return mth_order_derivative

Finally we have to define \({\mathbf{G}}\) which is used by CVXOPT to build the derivatives and constrain the functions. It takes arguments (m, x, y, N, pivot_point, *args) and should return the prefactor on the \({m^{th}}\) order derivative. For a more thorough definition of the prefactor on the derivative and an explanation of how the problem is constrained in quadratic programming see the maxsmooth paper.

def derivative_pre(m, x, y, N, pivot_point, *args):

    mth_order_derivative = []
    for i in range(N):
        if i <= m - 1:
            mth_order_derivative.append([0]*len(x))
    for i in range(N - m):
            mth_order_derivative_term = args[1]*np.math.factorial(m+i) / \
                np.math.factorial(i) * \
                (x)**i / \
                (args[0])**(i + 1)
            mth_order_derivative.append(
                mth_order_derivative_term)

    return mth_order_derivative

With our functions and additional arguments defined we can pass these to the maxsmooth smooth() function as is shown below. This overwrites the built in DCF model but you are still able to modify the fit type i.e. testing all available sign combinations or sampling them.

result = smooth(x, y, N,
    basis_functions=basis_functions, model=model,
    derivatives=derivative, der_pres=derivative_pre, args=arguments)

The output of the fit can be accessed as before,

print('Objective Funtion Evaluations:\n', result.optimum_chi)
print('RMS:\n', result.rms)
print('Parameters:\n', result.optimum_params)
print('Fitted y:\n', result.y_fit)
print('Sign Combinations:\n', result.optimum_signs)
print('Derivatives:\n', result.derivatives)

Best Basis Example

This function can be used to identify which of the built in DCFs fits the data best before running joint fits.

To use it we begin by loading in the data,

import numpy as np

x = np.load('Data/x.npy')
y = np.load('Data/y.npy')

and then importing the basis_test() function.

from maxsmooth.best_basis import basis_test

To call the function we use,

basis_test(x, y, base_dir='examples/', N=np.arange(3, 16, 1))

The function only requires the data but we can provide it with a base directory, fit type and range of DCF orders to test. By defualt it uses the sign navigating algorithm and tests \({N = 3 - 13}\). Here we test the range :math:{N = 3 - 15}. The resultant graph is saved in the base directory and the example generated here is shown below.

https://github.com/htjb/maxsmooth/raw/master/docs/images/Basis_functions.png

The graph shows us which basis is the optimum for solving this problem from the built in library (that which can reach the minimum :math:{\chi^2}). If we were to go to higher N we would also find that the :math:{\chi^2} value would stop decreasing in value. The value of N for which this occurs at is the optimum DCF order. (See the maxsmooth paper for a real world application of this concept.)

We can also provide this function with additional arguments such as the fit type, minimum constrained derivative, directional exploration limits ect. (see the maxsmooth Functions section).

\({\chi^2}\) Distribution Example

This example will show you how to generate a plot of the \({\chi^2}\) distribution as a function of the discrete sign combinations on the constrained derivatives.

First you will need to import your data and fit this using maxsmooth as was done in the simple example code.

import numpy as np

x = np.load('Data/x.npy')
y = np.load('Data/y.npy')

from maxsmooth.DCF import smooth

N = 10
result = smooth(x, y, N, base_dir='examples/',
  data_save=True, fit_type='qp')

Here we have used some additional keyword arguments for the ‘smooth’ fitting function. ‘data_save’ ensures that the files containing the tested sign combinations and the corresponding objective function evaluations exist in the base directory which we have changed to ‘base_dir=’examples/’’. These files are essential for the plotting the \({\chi^2}\) distribution and are not saved by maxsmooth without ‘data_save=True’. We have also set the ‘fit_type’ to ‘qp’ rather than the default ‘qp-sign_flipping’. This ensures that all of the available sign combinations are tested rather than a sampled set giving us a full picture of the distribution when we plot it. We have used the default DCF model to fit this data.

We can import the ‘chi_plotter’ like so,

from maxsmooth.chidist_plotter import chi_plotter

and produce the fit which gets placed in the base directory with the following code,

chi_plotter(N, base_dir='examples/', fit_type='qp')

We pass the same ‘base_dir’ as before so that the plotter can find the correct output files. We also give the function the same ‘fit_type’ used for the fitting which ensures that the files can be read.

The resultant plot is shown below and the yellow star shows the global minimum. This can be used to determine how well the sign sampling approach using a descent and directional exploration can find the global minimum. If the distribution looks like noise then it is unlikely the sign sampling algorithm will consistently find the global minimum. Rather it will likely repeatedly return the local minima found after the descent algorithm and you should use the ‘qp’ method testing all available sign combinations in any future fits to the data with this DCF model.

https://github.com/htjb/maxsmooth/raw/master/docs/images/chi_distribution.png

Parameter Plotter Example

We can assess the parameter space around the optimum solution found using maxsmooth with the param_plotter() function. This can help us identify how well a problem can be solved using the sign navigating approach employed by maxsmooth or simply be used to identify correlations between the foreground parameters. For more details on this see the maxsmooth paper.

We begin by importing and fitting the data as with the chi_plotter() function illustrated above.

import numpy as np

x = np.load('Data/x.npy')
y = np.load('Data/y.npy')

from maxsmooth.DCF import smooth

N = 5
result = smooth(x, y, N, base_dir='examples/', fit_type='qp')

We have changed the order of the fit to 5 to illustrate that for order \({N \leq 5}\) and fits with derivatives \({m \geq 2}\) constrained the function will plot each region of the graph corresponding to different sign combinations in a different colourmap. Recall that by default the function smooth() fits a maximally smooth function (MSF) with derivatives of order \({m \geq 2}\). If the constraints are different or the order is greater than 5 then the viable regions will have a single colourmap. Invalid regions are plotted as black shaded colourmaps and the contour lines are contours of \({\chi^2}\).

Specifically, invalid regions violate the condition

\[\pm_m \frac{\delta^m y}{\delta x^m} \leq 0\]

where \({m}\) represents the derivative order, \({y}\) is the dependent variable and \({x}\) is the independent variable. Violation of the condition means that one or more of the constrained derivatives crosses 0 in the band of interest. For an MSF, as mentioned, \({m \geq 2}\) and the sign \({\pm_m}\) applies to specific derivative orders. For this specific example there are 3 constrained derivatives, \({m = 2, 3, 4}\) and consequently 3 signs to optimise for alongside the parameters \({a_k}\). The coloured valid regions therefore correspond to a specific combination of \({\pm_m}\) for the problem. \({\pm_m}\) is also referred to as \({\mathbf{s}}\) in the theory section and the maxsmooth paper.

We can import the function like so,

from maxsmooth.parameter_plotter import param_plotter

and access it using,

param_plotter(result.optimum_params, result.optimum_signs,
    x, y, N, base_dir='examples/')

The function takes in the optimum parameters and signs found after the fit as well as the data and order of the fit. There are a number of keyword arguments detailed in the following section and the resultant fit is shown below. The function by default samples the parameter ranges 50% either side of the optimum and calculates 50 samples for each parameter. In each panel the two labelled parameters are varied while the others are maintained at their optimum values.

https://github.com/htjb/maxsmooth/raw/master/docs/images/Parameter_plot.png

We are also able to plot the data, fit and residuals alongside the parameter plot and this can be done by setting data_plot=True. We can also highlight the central region in each panel of the parameter space by setting center_plot=True.

param_plotter(result.optimum_params, result.optimum_signs,
    x, y, N, base_dir='examples/', data_plot=True, center_plot=True)

which gives us the graph below.

https://github.com/htjb/maxsmooth/raw/master/docs/images/Parameter_plot_extended.png

maxsmooth Functions

This section details the specifics of the built in functions in maxsmooth including the relevant keyword arguments and default parameters for all. Where keyword arguments are essential for the functions to run this is stated.

smooth()

smooth, as demonstrated in the examples section, is used to call the fitting routine. There are a number of \({^{**}}\) kwargs that can be assigned to the function which change how the fit is performed, the model that is fit and various other attributes. These are detailed below.

class maxsmooth.DCF.smooth(x, y, N, **kwargs)[source]

Parameters:

x: numpy.array
The x data points for the set being fitted.
y: numpy.array
The y data points for fitting.
N: int
The number of terms in the DCF.

Kwargs:

fit_type: Default = ‘qp-sign_flipping’
This kwarg allows the user to switch between sampling the available discrete sign spaces (default) or testing all sign combinations on the derivatives which can be accessed by setting to ‘qp’.
model_type: Default = ‘difference_polynomial’
Allows the user to access default Derivative Constrained Functions built into the software. Available options include the default, ‘polynomial’, ‘normalised_polynomial’, ‘legendre’, ‘log_polynomial’, ‘loglog_polynomial’ and ‘exponential’. For more details on the functional form of the built in basis see the maxsmooth paper.

pivot_point: Default = len(x)//2 otherwise an integer between -len(x) and len(x)

Some of the built in models rely on pivot points in the data sets which by defualt is set as the middle index. This can be altered via this kwarg which can occasionally lead to a better quality fit.
base_dir: Default = ‘Fitted_Output/’
The location of the outputted data from maxsmooth. This must be a string and end in ‘/’. If the file does not exist then maxsmooth will create it. By default the only outputted data is a summary of the best fit but additional data can be recorded by setting the keyword argument ‘data_save = True’.
data_save: Default = False
By setting this to True the algorithm will save every tested set of parameters, signs and objective function evaluations into files in base_dir. Theses files will be over written on repeated runs but they are needed to run the ‘chidist_plotter’.
print_output: Default = 1
The parameter can take a value of 0, 1, 2. If set to 2 this outputs to the terminal every fit performed by the algorithm. By default the parameter has a value of 1 and the only output is the optimal solution once the code is finished. Setting this to 0 will prevent any output to the terminal. This is useful when running the code inside a nested sampling loop for example.
cvxopt_maxiter: Default = 10000 else integer
This shouldn’t need changing for most problems however if CVXOPT fails with a ‘maxiters reached’ error message this can be increased. Doing so arbitrarily will however increase the run time of maxsmooth.
initial_params: Default = None else list of length N
Allows the user to overwrite the default initial parameters used by CVXOPT.

constraints: Default = 2 else an integer less than or equal to N - 1

The minimum constrained derivative order which is set by default to 2 for a Maximally Smooth Function.
zero_crossings: Default = None else list of integers
Allows you to specify if the conditions should be relaxed on any of the derivatives between constraints and the highest order derivative. e.g. a 6th order fit with just a constrained 2nd and 3rd order derivative would have zero_crossings = [4, 5].
cap: Default = (len(available_signs)//N) + N else an integer
Determines the maximum number of signs explored either side of the minimum \({\chi^2}\) value found after the decent algorithm has terminated.
chi_squared_limit: Default = 2 else float or int
The prefactor on the maximum allowed increase in \({\chi^2}\) during the directional exploration which is defaulted at 2. If this value multiplied by the minimum \({\chi^2}\) value found after the descent algorithm is exceeded then the exploration in one direction is stopped and started in the other. For more details on this and ‘cap’ see the maxsmooth paper.

The following Kwargs can be used by the user to define their own basis function and will overwrite the ‘model_type’ kwarg.

basis_function: Default = None else function with parameters (x, y, pivot_point, N)

This is a function of basis functions for the quadratic programming. The variable pivot_point is the index at the middle of the datasets x and y by default but can be adjusted.

model: Default = None else function with parameters (x, y, pivot_point, N, params)

This is a user defined function describing the model to be fitted to the data.

der_pres: Default = None else function with parameters (m, x, y, N, pivot_point)

This function describes the prefactors on the mth order derivative used in defining the constraint.

derivatives: Default = None else function with parameters (m, x, y, N, pivot_point, params)

User defined function describing the mth order derivative used to check that conditions are being met.
args: Default = None else list
Extra arguments for smooth to pass to the functions detailed above.

Output

.y_fit: numpy.array
The fitted array of y data from smooth().
.optimum_chi: float
The optimum \({\chi^2}\) value for the fit calculated by,
\[{X^2=\sum(y-y_{fit})^2}.\]
.optimum_params: numpy.array
The set of parameters corresponding to the optimum fit.
.rms: float
The rms value of the residuals \({y_{res}=y-y_{fit}}\) calculated by,
\[{rms=\sqrt{\frac{\sum(y-y_{fit})^2}{n}}}\]

where \(n\) is the number of data points.

.derivatives: numpy.array
The \(m^{th}\) order derivatives.
.optimum_signs: numpy.array
The sign combinations corresponding to the optimal result. The nature of the constraint means that a negative maxsmooth sign implies a positive \({m^{th}}\) order derivative and visa versa.

best_basis()

As demonstrated, this function allows you to test the built in basis and their ability to fit the data. It produces a plot that shows chi squared as a function of \({N}\) for the 7 built in models and saves the figure to the base directory.

class maxsmooth.best_basis.basis_test(x, y, **kwargs)[source]

Parameters:

x: numpy.array
The x data points for the set being fitted.
y: numpy.array
The y data points for fitting.

Kwargs:

fit_type: Default = ‘qp-sign_flipping’
This kwarg allows the user to switch between sampling the available discrete sign spaces (default) or testing all sign combinations on the derivatives which can be accessed by setting to ‘qp’.
base_dir: Default = ‘Fitted_Output/’
The location of the outputted graph from function. This must be a string and end in ‘/’. If the file does not exist then the function will create it.

N: Default = [3, .., 13] in steps of 1 else list or numpy array of integers

The DCF orders to test each basis function with. In some instances the basis function may fail for a given \({N}\) and higher orders due to overflow/underflow errors or CVXOPT errors.

pivot_point: Default = len(x)//2 otherwise an integer between -len(x) and len(x)

Some of the built in models rely on pivot points in the data sets which by defualt is set as the middle index. This can be altered via this kwarg which can occasionally lead to a better quality fit.

constraints: Default = 2 else an integer less than or equal to N - 1

The minimum constrained derivative order which is set by default to 2 for a Maximally Smooth Function.
zero_crossings: Default = None else list of integers
Allows you to specify if the conditions should be relaxed on any of the derivatives between constraints and the highest order derivative. e.g. a 6th order fit with just a constrained 2nd and 3rd order derivative would have zero_crossings = [4, 5].
cap: Default = (len(available_signs)//N) + N else an integer
Determines the maximum number of signs explored either side of the minimum \({\chi^2}\) value found after the decent algorithm has terminated.
chi_squared_limit: Default = 2 else float or int
The prefactor on the maximum allowed increase in \({\chi^2}\) during the directional exploration which is defaulted at 2. If this value multiplied by the minimum \({\chi^2}\) value found after the descent algorithm is exceeded then the exploration in one direction is stopped and started in the other. For more details on this and ‘cap’ see the maxsmooth paper.
cvxopt_maxiter: Default = 10000 else integer
This shouldn’t need changing for most problems however if CVXOPT fails with a ‘maxiters reached’ error message this can be increased. Doing so arbitrarily will however increase the run time of maxsmooth.

chidist_plotter()

This function allows the user to produce plots of the chi squared distribution as a function of the available discrete sign spaces for the constrained derivatives. This can be used to identify whether or not the problem is ill defined, see the maxsmooth paper for a definition, and if it can be solved using the sign sampling approach.

It can also be used to determine whether or not the ‘cap’ and maximum allowed increase on the value of chi squared during the directional exploration are sufficient to identify the global minimum for the problem.

The function is reliant on the output of the maxsmooth smooth() function. The required outputs can be saved when running smooth() using the ‘data_save = True’ kwarg.

class maxsmooth.chidist_plotter.chi_plotter(N, **kwargs)[source]

Parameters:

N: int
The number of terms in the DCF.

Kwargs:

fit_type: Default = ‘qp-sign_flipping’
This kwarg is the same as for the smooth() function. Here it allows the files to be read from the base directory.
base_dir: Default = ‘Fitted_Output/’
The location of the outputted data from maxsmooth. This must be a string and end in ‘/’ and must contain the files ‘Output_Evaluations/’ and ‘Output_Signs/’ which can be obtained by running smooth() with data_save=True.
chi: Default = None else list or numpy array
A list of chi squared evaluations. If provided then this is used over outputted data in the base directory. It must have the same length as the ouputted signs in the file ‘Output_Signs/’ in the base directory. It must also be ordered correctly otherwise the returned graph will not be correct. A correct ordering is one for which each entry in the array corresponds to the correct sign combination in ‘Output_Signs/’. Typically this will not be needed but if the chi squared evaluation in ‘Output_Evaluations/’ in the base directory is not in the desired parameter space this can be useful. For example the built in logarithmic model calculates chi squared in logarithmic space. To plot the distribution in linear space we can calculate chi squared in linear space using a function for the model and the tested parameters which are found in ‘Output_Parameters/’ in the base directory.

constraints: Default = 2 else an integer less than or equal to N - 1

The minimum constrained derivative order which is set by default to 2 for a Maximally Smooth Function. Used here to determine the number of possible sign combinations available.
zero_crossings: Default = None else list of integers
Allows you to specify if the conditions should be relaxed on any of the derivatives between constraints and the highest order derivative. e.g. a 6th order fit with just a constrained 2nd and 3rd order derivative would have a zero_crossings = [4, 5]. Again this is used in determining the possible sign combinations available.
plot_limits: Default = False
Determines whether the limits on the directional exploration are plotted on top of the chi squared distribution.
cap: Default = (len(available_signs)//N) + N else an integer
Determines the maximum number of signs explored either side of the minimum chi squared value found after the decent algorithm has terminated when running smooth(). Here it is used when plot_limits=True.
chi_squared_limit: Default = 2 else float or int
The prefactor on the maximum allowed increase in chi squared during the directional exploration which is defaulted at 2. If this value multiplied by the minimum chi squared value found after the descent algorithm is exceeded then the exploration in one direction is stopped and started in the other. For more details on this and ‘cap’ see the maxsmooth paper. Again this is used here when plot_limits=True.

parameter_plotter()

This function allows you to plot the parameter space around the optimum solution found when running maxsmooth and visualise the constraints with contour lines given by chi squared.

class maxsmooth.parameter_plotter.param_plotter(best_params, optimum_signs, x, y, N, **kwargs)[source]

Parameters:

best_params: numpy.array
The optimum parameters found when running a DCF fit to the data.
optimum_signs: numpy.array
The optimum signs for the DCF fit which are used when the derivatives are equal to 0 across the band.
x: numpy.array
The x data points.
y: numpy.array
The y data points.
N: int
The number of terms in the DCF.

Kwargs:

model_type: Default = ‘difference_polynomial’
The functional form of the model being plotted. If a the user has defined their own basis they can supply this with the Kwargs below and this will be overwritten.
base_dir: Default = ‘Fitted_Output/’
The location in which the parameter plot is saved.

constraints: Default = 2 else an integer less than or equal to N - 1

The minimum constrained derivative order which is set by default to 2 for a Maximally Smooth Function. Used here to determine the number of possible sign combinations available.
zero_crossings: Default = None else list of integers
Allows you to specify if the conditions should be relaxed on any of the derivatives between constraints and the highest order derivative. e.g. a 6th order fit with just a constrained 2nd and 3rd order derivative would have an zero_crossings = [4, 5]. Again this is used in determining the possible sign combinations available.
samples: Default = 50
The sampling rate across the parameter ranges defined with the optimum solution and width.
width: Default = 0.5
The range of each parameter to explore. The default value of 0.5 means that the \({\chi^2}\) values for parameters ranging 50% either side of the optimum result are tested.
warnings: Default = True
Used to highlight when a derivative is 0 across the band and that in these instances the optimum signs are assumed for the colourmap if \({N \leq 5}\), constraints=2 and the zero_crossings is empty.
girdlines: Default = False
Plots gridlines showing the central value for each parameter in each panel of the plot.
center_plot: Default = False
Setting this equal to True will highlight the central region with a white circle.
data_plot: Default = False
Setting to True will plot the data, fitted model and the residuals, \({y - y_{fit}}\), alongside the parameter graph.

The following Kwargs are used to plot the parameter space for a user defined basis function and will overwrite the ‘model_type’ kwarg.

basis_function: Default = None else function with parameters (x, y, pivot_point, N)

This is a function of basis functions for the quadratic programming. The variable pivot_point is the index at the middle of the datasets x and y by default but can be adjusted.

model: Default = None else function with parameters (x, y, pivot_point, N, params)

This is a user defined function describing the model to be fitted to the data.

der_pres: Default = None else function with parameters (m, x, y, N, pivot_point)

This function describes the prefactors on the mth order derivative used in defining the constraint.

derivatives: Default = None else function with parameters (m, x, y, N, pivot_point, params)

User defined function describing the mth order derivative used to check that conditions are being met.
args: Default = None else list
Extra arguments for smooth to pass to the functions detailed above.

Change Log

Unreleased changes are not yet included in the pip install but are pushed to the github.

Unrealeased

Version 1.1.0

  • Two bug fixes in param_plotter()
  • Extension of param_plotter() function to plot data, fit and residuals alongside the parameter space if required.
  • Extension of param_plotter() to allow for highlighting of central regions in each panel if required.
  • Inclusion of some theory into the documentation

Version 1.2.0

  • Minor bug fix in param_plotter()
  • Extension of the basis_test() function to allow users to compare different types of DCF not just MSFs.