# Poster: FOM Veldhoven 2016

## Gaussian Process Regression Models for Space Weather Prediction.

### Space Weather

Space weather is a branch of space physics and aeronomy concerned with the time varying conditions within the Solar System, including the solar wind, emphasizing the space surrounding the Earth, including conditions in the magnetosphere, ionosphere and thermosphere.

Space weather is distinct from the terrestrial weather of the Earth’s atmosphere (troposphere and stratosphere).

Source: Wikipedia

### Geomagnetic Activity Indices

Space weather exhibits complex non-linear dynamics, owing to the large number of variables in physics-based models, their interdependencies, and the complexity of the governing equations. For prediction it is therefore useful to condense the geomagnetic response of the Earth into a set of representative indices, some of which are summarized in the table below.

Name | Significance | Frequency | Values |
---|---|---|---|
Kp | Global geomagnetic storm index, based on 3-hour measurements of the K-indices | 3 hours | 0-9 |
Dst | Average ring current around the magnetic equator | hourly | Real Number |
AE | Derived from geomagnetic variations in the horizontal component observed at selected (10-13) observatories along the auroral zone in the northern hemisphere | hourly | Real Number |

#### D_{st}

We focus on modeling the Disturbance Storm Time (D_{st}) index, though our results can be generalised to the AE and K_{p} indices as well. The chart below explains how different values of D_{st} relate to the state of the Earth’s magnetosphere.

Thus the modeling of D_{st} is important in the recognition and prediction of geomagnetic disturbances and storms.

### Gaussian Process Regression

Given below is the formulation of a *Gaussian Process* regression model. For a detailed introduction to *Gaussian Processes*, refer to the book by Rasmussen and Williams.

#### Assumptions

We assume that our target data are noisy observations of an unknown function. This modeling assumption leads to a *Stochastic Process* formulation for the prior distribution on this unknown function.

The existence of such a *stochastic process* is guaranteed by the Kolmogorov Extension Theorem, given a symmetric, positive semi-definite covariance function.

We further assume that the finite dimensional distributions are multivariate Gaussian, leading to the following set of equations for the finite dimensional distributions of the unknown function.

#### Formulation
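
One standard way to write these assumptions follows the conventions of Rasmussen and Williams. The symbols below ($f$, $\mathbf{x}_i$, $y_i$, $\epsilon_i$, $\sigma_n^2$, $m$, $K$) are notation assumed for this sketch and may differ from the poster's own.

```latex
\begin{aligned}
y_i &= f(\mathbf{x}_i) + \epsilon_i,
  \qquad \epsilon_i \sim \mathcal{N}(0, \sigma_n^2), \\
f &\sim \mathcal{GP}\big(m(\mathbf{x}),\, K(\mathbf{x}, \mathbf{x}')\big), \\
\big(f(\mathbf{x}_1), \dots, f(\mathbf{x}_n)\big)^{\top}
  &\sim \mathcal{N}(\mathbf{m}, \mathbf{K}),
  \qquad \mathbf{m}_i = m(\mathbf{x}_i),
  \quad \mathbf{K}_{ij} = K(\mathbf{x}_i, \mathbf{x}_j).
\end{aligned}
```

The first line is the noisy-observation assumption, the second is the process prior, and the third states that any finite set of function values is jointly Gaussian, with mean vector and covariance matrix built from $m$ and $K$.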

#### Posterior Predictive Distribution

Given training data, one may use *Bayes' Theorem* to calculate the posterior predictive distribution, assuming the test inputs are known.
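The posterior predictive computation can be sketched in a few lines of NumPy. This is a minimal illustration for a zero-mean prior with Gaussian noise, using the usual Cholesky-based solve. The function name `gp_posterior`, the `noise_var` parameter, and the demo kernel `sq_exp` are assumptions made for this example, not part of the poster.

```python
import numpy as np

def gp_posterior(kernel, x_train, y_train, x_test, noise_var=1e-2):
    """Posterior predictive mean and covariance of a zero-mean GP.

    `kernel(a, b)` must accept broadcastable arrays and return the
    covariance between scalar inputs a and b element-wise.
    """
    K = kernel(x_train[:, None], x_train[None, :])      # train vs train
    K_s = kernel(x_train[:, None], x_test[None, :])     # train vs test
    K_ss = kernel(x_test[:, None], x_test[None, :])     # test vs test

    # Cholesky factor of the noisy training covariance.
    L = np.linalg.cholesky(K + noise_var * np.eye(len(x_train)))
    # alpha = (K + noise_var * I)^{-1} y_train
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))

    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    return mean, cov

# Illustrative kernel and toy data (assumptions for this sketch only).
def sq_exp(a, b):
    return np.exp(-0.5 * (a - b) ** 2)

x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y_train = np.sin(x_train)
x_test = np.array([-1.5, 0.5])
mean, cov = gp_posterior(sq_exp, x_train, y_train, x_test)
```

The diagonal of `cov` gives the predictive variance of the latent function at each test input; adding `noise_var` to it would give the variance of a new noisy observation.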

### Gaussian Process D_{st} models: RBF vs FBM Kernels.

We model D_{st} as a scalar valued function of the solar wind speed.

The *Fractional Brownian* and *Radial Basis Function* kernels each have hyper-parameters. When training *Gaussian Process* models, it is imperative to choose good values for these hyper-parameters, which can be done in a number of ways (see Rasmussen and Williams for further details). Common approaches are based on maximum likelihood and cross-validation. In this case we have used a maximum-likelihood search over a pre-defined grid of hyper-parameter values.
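A maximum-likelihood grid search of this kind can be sketched as follows. The code scores each candidate hyper-parameter by the log marginal likelihood of the training data and keeps the best one. The helper names, the noise level, the demo kernel, and the grid values are all assumptions for illustration; the poster does not specify them.

```python
import numpy as np

def log_marginal_likelihood(K, y, noise_var):
    """Log marginal likelihood of a zero-mean GP with Gaussian noise."""
    n = len(y)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # -0.5 * y^T (K + s^2 I)^{-1} y - 0.5 * log det(K + s^2 I) - n/2 * log(2 pi)
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2.0 * np.pi))

def grid_search(kernel_fn, grid, x, y, noise_var=1e-2):
    """Return the grid value with the highest log marginal likelihood."""
    scores = [log_marginal_likelihood(kernel_fn(x, x, theta), y, noise_var)
              for theta in grid]
    return grid[int(np.argmax(scores))], scores

# Illustrative kernel and data (assumptions, not the poster's setup).
def demo_kernel(a, b, bandwidth):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * bandwidth ** 2))

x = np.linspace(0.0, 5.0, 30)
y = np.sin(x)
grid = [0.1, 0.5, 1.0, 2.0, 5.0]
best_bandwidth, scores = grid_search(demo_kernel, grid, x, y)
```

A grid search is simple and robust to local optima, at the cost of resolution: the selected value can only be as good as the nearest grid point.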

We compare the performance of two *Gaussian Process* regression models for D_{st}: one with the *Radial Basis Function* kernel and one with the *Fractional Brownian Motion* kernel. Both models are trained and tested on sub-sampled versions of the OMNI data from the years 2007 and 2006 respectively.
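For reference, the two kernels can be sketched in their commonly used forms for scalar inputs such as solar wind speed. The parameter names `bandwidth` and `hurst` and the exact parameterisation are assumptions here; the poster's own expressions may differ.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    """Radial Basis Function kernel: exp(-(x - y)^2 / (2 * bandwidth^2))."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[None, :]
    return np.exp(-(x - y) ** 2 / (2.0 * bandwidth ** 2))

def fbm_kernel(x, y, hurst=0.5):
    """Fractional Brownian Motion kernel with Hurst exponent in (0, 1):
    0.5 * (|x|^(2H) + |y|^(2H) - |x - y|^(2H)).
    """
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[None, :]
    p = 2.0 * hurst
    return 0.5 * (np.abs(x) ** p + np.abs(y) ** p - np.abs(x - y) ** p)

# Small example on three scalar inputs (values are arbitrary).
inputs = np.array([1.0, 2.0, 3.0])
K_rbf = rbf_kernel(inputs, inputs, bandwidth=1.5)
K_fbm = fbm_kernel(inputs, inputs, hurst=0.5)
```

With `hurst=0.5` the FBM kernel reduces to the ordinary Brownian motion covariance `min(x, y)` for non-negative inputs. Its sample paths are continuous but nowhere differentiable, in contrast to the infinitely smooth sample paths of the RBF kernel.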

### Results

The performance metrics of both the constructed models on the test set are summarized below.

Kernel | Data: Train, Test | MAE | RMSE | R^{2} |
---|---|---|---|---|
RBF | 300, 1000 | 1.5044 | 6.9752 | 0.7925 |
FBM | 300, 1000 | 0.0312 | 0.0461 | 0.9999 |
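The three metrics in the table can be computed as in the sketch below. It assumes R^{2} is the coefficient of determination (one minus the ratio of residual to total sum of squares); the poster does not state which definition was used, and the function name and sample values are illustrative only.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 (coefficient of determination)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(np.mean(residuals ** 2))
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "rmse": rmse, "r2": r2}

# Toy values chosen so the metrics are easy to verify by hand.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

Because RMSE squares the residuals, a few large misses inflate it much more than MAE. A large gap between the two, as in the RBF row, is consistent with a heavy-tailed error distribution.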

While both models do a decent job of predicting the D_{st} index given the solar wind velocity, a closer look at the residual histograms and goodness of fit charts shows the differences between them.

#### Radial Basis Function Kernel

The *Radial Basis Function* kernel, which tries to fit smooth splines to the data, exhibits an interesting pathology: it is unable to predict anomalous geomagnetic conditions. In particular, it cannot predict the onset of geomagnetic storms with sufficient reliability. This shows up as long tails in its error distribution, and in the goodness-of-fit plot those points lie far from the “best fit” regression line.

*Figures: goodness-of-fit plot and residual histogram for the RBF kernel model.*

#### Fractional Brownian Motion Kernel

The *Fractional Brownian* kernel gives far more accurate D_{st} predictions for both slow and turbulent solar wind conditions, which suggests that fitting smooth splines is not a reasonable modeling assumption when learning D_{st} as a function of the solar wind speed.

*Figures: goodness-of-fit plot and residual histogram for the FBM kernel model.*