Introduction to uGMAR

Savi Virolainen


The package uGMAR contains tools to estimate and analyze univariate Gaussian mixture autoregressive (GMAR), Student’s t mixture autoregressive (StMAR) and Gaussian and Student’s t mixture autoregressive (G-StMAR) models. We refer to these three models as the GSMAR models. This vignette does not explain details about the models and it’s assumed that the reader is familiar with the cited articles introducing the models. There are currently no published references for the G-StMAR model, so it will be discussed briefly in the section “The G-StMAR model”.

The models in uGMAR are defined as class gsmar S3 objects whose can be created with the estimation function fitGSMAR or with the constructor function GSMAR. The created gsmar objects can then be conveniently used as main arguments in several other functions, allowing one for example to perform quantile residual based model diagnostics, simulate from the processes, and to forecast. It’s thus easy to carry out further analyses after the model has been estimated. Some tasks, however, such as setting up initial population for the genetic algorithm, applying linear constraints or building gsmar models with pre-specified parameter values, require knowledge of the model details such as the form of the parameter vector. These details will be therefore explained in this vignette.

The rest of this vignette is organized as follows. In the first section it’s explained what the G-StMAR model is and when and why one should use it. In the second section notations for the parameter vector are described in detail, and it’s also shown how to apply constraints to autoregressive parameters of the models. Finally, in the third section some useful functions provided by uGMAR are briefly described.

The G-StMAR model

To motivate introduction of the G-StMAR model, observe that the conditional variances of the component processes of the StMAR model depend on the past observations through the same parameters as the conditional means. This specific formulation is required to obtain the stationary Student’s t autoregressions, and it also enables the model to capture stronger forms conditional heteroskedasticity than the Gaussian case. However, this relation between the component specific conditional variance and mean might be restrictive if some regimes exhibit strong conditional mean but weak conditional variance (or vice versa). If such case occurs in a series modeled with the StMAR model, the degrees of freedom parameters of the corresponding regimes will get large estimates, for it makes the component process’s conditional variance of approach a constant variance parameter (which equals conditional variance of the linear Gaussian autoregressions that the GMAR model considers as its component processes). Moreover, since a large degrees of freedom parameter value makes the shape the innovations’ distribution to resemble Gaussian distribution, it seems natural to allow these regimes to be Gaussian, resulting the G-StMAR model.

The G-StMAR model can be described as a combination of the GMAR and the StMAR model. Its first M1 component processes are taken to be linear Gaussian autoregressions as in the GMAR model, and the rest M2 component processes are taken to be conditionally heteroscedastic linear Student’s t autoregressions as in the StMAR model, yielding total of M1+M2=M mixture components. Theoretical and practical properties of the G-StMAR model are similar to the ones of the GMAR model and the StMAR model. The conditional and also stationary distribution of a G-StMAR process is a mixture of Gaussian distributions and Student’s t distributions.

Besides reducing redundant parameters from the model, using the G-StMAR model instead of the StMAR model with large degrees of freedom parameters also comes with further advantages. One problem with very large degrees of freedom parameters is that their profile log-likelihoods are very flat near the estimates implying very low information indicating a near-indentification problem. Moreover, very large degrees of freedom parameter values cause inconsistent numerical inaccuracies in calculation of the log-likelihood function values and thus makes numerical approximations its derivatives biased. Approximate standard errors (obtained from numerically approximated observed information matrix) therefore become spurious and the first and second order conditions inaccessible for checking whether an estimate denotes a maximum point. The problems disappear when one reduces the problematic degrees of freedom parameters by switching to a G-StMAR model. Furthermore, uGMAR’s implementation of the quantile residual tests proposed by Kalliovirta (2012) is not applicable for models with such near-identification problem.

Parameter vector and constraints

Building a GSMAR model requires the user to specify the autoregressive (AR) order of the model p and the number of mixture components M. For the G-StMAR model one has to define the number of GMAR type components M1 and the number of StMAR type components M2. If one wished to build a model with pre-specified parameter values rather than estimating them, knowledge of the exact form of the parameter vector is obviously necessary. In uGMAR, the form of the parameter vector depends on specifics of the model: is GMAR, StMAR or G-StMAR model considered, are all the AR coefficients restricted to be the same for all regimes and/or are linear constraints applied to the AR-parameters? It’s vital to use the correct type of parameter vector accordingly.

Unconstrained GSMAR models

In the following, the intercept parametrization with intercept parameters \(\phi_{m,0}\) is considered. One may alternatively use the mean parametrization; in that case, one simply needs to replace each intercept parameter with the corresponding mean parameter \(\mu_m=\phi_{m,0}/(1-\sum_{i=1}^p\phi_{i,m}),\enspace m=1,...,M.\)

The GMAR model

The parameter vector for unconstrained GMAR model is a size (M(p+3)-1)x1 vector of the form \[\boldsymbol{\theta}=(\boldsymbol{\upsilon_{1}},...,\boldsymbol{\upsilon_{M}}, \alpha_{1},...,\alpha_{M-1}),\quad where\] \[\boldsymbol{\upsilon_{m}}=(\phi_{m,0},\boldsymbol{\phi_{m}}, \sigma_{m}^2) \enspace and \enspace \boldsymbol{\phi_{m}}=(\phi_{m,1},...,\phi_{m,p}) ,\quad m=1,...,M.\] Symbol \(\phi_{m,i}\) denotes an AR coefficient, \(\sigma_m^2\) is a variance parameter and \(\alpha_m\) a mixing weight parameter.

The StMAR model

For the StMAR model, the parameter vector has to be expanded to include the degrees of freedom parameters. The parameter vector for unconstrained StMAR model is thus a size (M(p+4)-1)x1 vector of the form \[(\boldsymbol{\theta}, \boldsymbol{\nu}),\quad where \quad \boldsymbol{\nu}=(\nu_{1},...,\nu_{M})\] contains the degrees of freedom parameters and the parameter \(\boldsymbol{\theta}\) is as in the case of the GMAR model. To ensure existence of finite second moments, the degrees of freedom parameters \(\nu_{m}\) are assumed to be larger than \(2\).

The G-StMAR model

In the G-StMAR model the first M1 components are GMAR type and the rest M2 components are StMAR type. Parameter vector of the G-StMAR model is similar to the one of the StMAR model but with M2 degrees of freedom parameters for the StMAR components. That is, a size (M(p+3)+M2-1)x1 vector of the form \[(\boldsymbol{\theta}, \boldsymbol{\nu}),\quad where \quad \boldsymbol{\nu}=(\nu_{M1+1},...,\nu_{M})\] contains the degrees of freedom parameters and the parameter \(\boldsymbol{\theta}\) is as in the case of the GMAR model. As in the StMAR case, the degrees of freedom parameters are assumed to be larger than two.

Restricted GSMAR models

In addition to unconstrained GSMAR models, uGMAR gives an option to analyze restricted models whose AR coefficients \(\phi_{m,1},...,\phi_{m,p}\) are restricted to be the same for all regimes \(m=1,..,M\). Structure of the parameter vector is different for restricted and non-restricted models.

The GMAR model

Parameter vector of the restricted GMAR model is a size (3M-p+1)x1 vector of the form \[\boldsymbol{\theta}=(\phi_{1,0},...,\phi_{M,0},\boldsymbol{\phi},\sigma_{1}^2,...,\sigma_{M}^2,\alpha_{1},...,\alpha_{M-1}), \quad where \quad \boldsymbol{\phi}=(\phi_{1},...,\phi_{p}).\]

The StMAR model

Parameter vector of the restricted StMAR model is then defined by adding the degrees of freedom parameters, yielding a size (4M-p+1)x1 vector of the form \[(\boldsymbol{\theta}, \boldsymbol{\nu}),\quad where \quad \boldsymbol{\nu}=(\nu_{1},...,\nu_{M})\] again contains the degrees of freedom parameters and parameter \(\boldsymbol{\theta}\) is as in the case of the GMAR model.

G-StMAR model

Parameter vector of the restricted G-StMAR model is similar to the StMAR model’s one but with M2 degrees of freedom parameters for the StMAR type components.

So one will have to work with different kind of parameter vectors depending on whether you work with restricted or non-restricted model. In order to restrict the AR parameters or to implicate that the parameter vector is restricted, one needs to supply the considered function with the argument restricted=TRUE.

Applying linear constraints and how it affects the parameter vector

uGMAR makes it easy to apply linear constraints to the autoregressive parameters of GSMAR models. Considering non-restricted models, each mixture component has its own constraint matrix. uGMAR considers constraints of the form \[\boldsymbol{\phi_{m}}=\boldsymbol{C_{m}\psi_{m}}, \enspace m=1,...,M,\] where \(\boldsymbol{C_{m}}\) is a known size \((pxq_{m})\) constraint matrix of full column rank and \(\boldsymbol{\psi_{m}}\) is a size \((q_{m}x1)\) parameter vector. Observe that this particular specification for linear constraints is not the most general one. However, it keeps the constraint matrices small and simple, and it’s convenient for applying the most typical constraints such as constraining some of the AR coefficients to be zero. For instance, in order to constraint the second AR coefficient of the second regime to zero in a model with p=2 and M=2, the constraint matrix for the first regime is diag(2) implying no constraints, while the constraint matrix for the second regime is simply matrix(1:0).

For further illustration, consider the following special case of linear constraints. We obtain a mixture version of the Heterogenous Autoregressive model (see Corsi 2009 for the original version) by setting \[\boldsymbol{C_{m}}=\left[{\begin{array}{ccc} \boldsymbol{\iota}_{5} & \frac{1}{5}\boldsymbol{1}_{5} & \frac{1}{22}\boldsymbol{1}_{5} \\ 0_{17} & 0_{17} & \frac{1}{22}\boldsymbol{1}_{17} \\ \end{array}}\right],\] where \(\boldsymbol{\iota}_{5}=[1,0,0,0,0]'\) for all regimes \(m=1,...,M\) and applying the constraints to the GMAR(22,M) model.

In order to apply the linear constraints in uGMAR, one simply needs to parametrize the model with vectors \(\boldsymbol{\psi_{m}}\) instead of \(\boldsymbol{\phi_{m}}\) and provide the constraint matrices \(\boldsymbol{C_{m}}\) in the argument constraints (or if one estimates the parameters, only the constraint matrices need to be provided). Note that despite the lengths of \(\boldsymbol{\psi_{m}}\), the nominal order of AR coefficients is always \(p\) for all regimes.

Non-restricted GSMAR models

Similarly to the case of unconstrained GMAR model, parameter vector for the constrained GMAR model is of the form \[\boldsymbol{\theta}=(\boldsymbol{\upsilon_{1}},...,\boldsymbol{\upsilon_{M}}, \alpha_{1},...,\alpha_{M-1}),\] but now the vectors \(\boldsymbol{\upsilon_{m}}\) are defined by using the vectors \(\boldsymbol{\psi_{m}}\), that is, \[\boldsymbol{\upsilon_{m}}=(\phi_{m,0},\boldsymbol{\psi_{m}}, \sigma_{m}^2) \enspace and \enspace \boldsymbol{\psi_{m}}=(\psi_{m,1},...,\psi_{m,q_{m}}), \enspace m=1,...,M.\] The user has to also provide a list of constraint matrices \(\boldsymbol{R_{m}}\) that satisfy \(\boldsymbol{\phi_{m}}=\boldsymbol{R_{m}\psi_{m}}\) for all \(m=1,...,M.\)

Parameter vector for the constrained StMAR model is defined by simply adding the degrees of freedom parameters to the GMAR’s parameter vector, that is, \[(\boldsymbol{\theta}, \boldsymbol{\nu}),\quad where \quad \boldsymbol{\nu}=(\nu_{1},...,\nu_{M}),\] and \(\boldsymbol{\theta}\) is as in the case of constrained GMAR model.

Parameter vector for the constrained G-StMAR model is similar to the one of the constrained StMAR model, but with degrees of freedom parameters for the StMAR components only.

Restricted GSMAR models

Analogously non-restricted models, the parameter vectors for constrained versions of restricted GSMAR models are defined by simply replacing vector \(\boldsymbol{\phi}\) with vector \(\boldsymbol{\psi}\). Hence the parameter vector for restricted and constrained GMAR model has the form \[\boldsymbol{\theta}=(\phi_{1,0},...,\phi_{M,0},\boldsymbol{\psi},\sigma_{1}^2,...,\sigma_{M}^2,\alpha_{1},...,\alpha_{M-1}), \quad where \quad \boldsymbol{\psi}=(\psi_{1},...,\psi_{p}).\] The constraint matrix \(\boldsymbol{C}\) needs to be provided and it’s assumed to satisfy \(\boldsymbol{\phi}=\boldsymbol{R\psi}.\)

Parameter vector for the restricted and constrained StMAR model is then again defined by adding the degrees of freedom parameters, that is \((\boldsymbol{\theta}, \boldsymbol{\nu})\) where \(\boldsymbol{\nu}=(\nu_{1},...,\nu_{M}).\) For the restricted and constrained G-StMAR model, the parameter vector is similar to the one of restricted and constrained StMAR model but with degrees of freedom parameters for the StMAR components only.

Useful functions in uGMAR

Estimating a GSMAR model

The function used to estimate models in uGMAR is fitGSMAR. It estimates the model parameters using the method of maximum likelihood and employs a hybrid estimation scheme that is performed in two phases. In the first phase fitGSMAR uses a genetic algorithm to find starting values for the gradient based variable metric algorithm, which it then uses in the second phase for finalize the estimation. It’s important to note that it’s not guaranteed that the numerical estimation algorithms end up in the global maximum point rather than a local one or a saddle point. Because of multimodality and challenging surface of the log-likelihood function, it’s actually expected that many of the estimation rounds won’t find the global maximum point. For this reason one should always perform multiple estimation rounds since more estimation rounds yield more reliable result. The number of estimation rounds can be chosen with the argument ncalls but multiple estimation rounds is also performed default. To shorten the estimation time, uGMAR uses parallel computing to run multiple estimation rounds in parallel. The number of cores used can be set with the argument ncores.

There is also an option to perform some quantile residual tests for the estimated model to get a quick sense on how the model fits to the data.

If the model estimates poorly, it’s often because the number of mixture components is chosen too large. One may also adjust settings of the genetic algorithm employed, or set up an initial population with guesses for the estimates. This can by done by passing arguments in fitGSMAR to the (non-exported) function GAfit which implements the genetic algorithm. To check the available settings, read the documentation ?GAfit. If the iteration limit is reached when estimating the model, the function iterate_more can be used to finish the estimation.

The parameters of the estimated model are printed in an illustrative and easy to read form. In order to easily compare approximate standard errors to certain estimates, it’s advisable to use the summary method, which prints the errors inside brackets next to the estimates. Numerical approximation of the gradient and Hessian matrix of the log-likelihood function at the estimates can be obtained conveniently with the functions get_gradient and get_hessian. The estimated objects also have their own plot method.

Use the function ‘stmar_to_gstmar’ in order to conveniently switch from a StMAR model with large degrees of freedom estimates to the corresponding G-StMAR model.

Model diagnostics

uGMAR considers model diagnostics based on quantile residuals (see Kalliovirta 2012). Quantile residuals are asymptotically standard normal distributed if the model is correctly specified, and they can be hence used for graphical diagnostics and testing.

The function quantileResidualTests performs the quantile residual tests introduced by Kalliovirta (2012), testing for normality, autocorrelation and conditional heteroscedasticity. For graphical diagnostics, one may use the functions diagnosticPlot and quantileResidualPlot.

Consider installing the suggested package gsl for much faster evaluations of quantile residuals in the cases of StMAR and G-StMAR models. If the model and data are both large, performing quantile residuals tests may take significantly long time for StMAR and G-StMAR models without the package gsl because numerical integration is used. It’s not imported because, in our experience, it might not install to some platforms directly when installing uGMAR.

Constructing class ‘gsmar’ model without estimation

One may wish to construct an arbitrary model without estimating the parameters, for example in order to simulate from the particular process of interest. An arbitrary model can be created with the function GSMAR. If one wants to add or update data to the model afterwards, it’s advisable to use the function add_data.

Simulating a GSMAR process

The function simulateGSMAR is the one for the job. As the main argument it uses a gsmar object created with fitGSMAR or GSMAR.

Forecasting GSMAR process

We advice to directly use the function simulateGSMAR for quantile based forecasting. However, uGMAR contains the predict method predict.gsmar for forecasting GSMAR processes. For one step predictions using the exact formula for conditional mean is supported, but the forecasts further than that are based on independent simulations. The predictions are either sample means or medians and the confidence intervals are based on sample quantiles. The objects generated by predict.gsmar have their own plot method.

Multivariate versions of the models

For analysing multivariate versions of the model, you are welcome to try the package gmvarkit. It currently supports the GMVAR model which is the multivariate extension of the GMAR model.