# Poster: FOM Veldhoven 2016

## Gaussian Process Regression Models for Space Weather Prediction.

### Space Weather

Space weather is a branch of space physics and aeronomy concerned with the time varying conditions within the Solar System, including the solar wind, emphasizing the space surrounding the Earth, including conditions in the magnetosphere, ionosphere and thermosphere.

Space weather is distinct from the terrestrial weather of the Earth’s atmosphere (troposphere and stratosphere).

Source: Wikipedia

### Geomagnetic Activity Indices

Space weather exhibits complex non-linear dynamics owing to the large number of variables that appear in physics-based models, their interdependences and the complexity of the governing equations. From the point of view of prediction, it is therefore useful to condense the geomagnetic response of the Earth into a set of representative indices, some of which are summarized in the table below.

| Name | Significance | Frequency | Values |
|---|---|---|---|
| Kp | Global geomagnetic storm index, based on 3-hourly measurements of the K-indices | 3 hours | 0-9 |
| Dst | Average ring current around the magnetic equator | hourly | Real number (nT) |
| AE | Derived from geomagnetic variations in the horizontal component observed at selected (10-13) observatories along the auroral zone in the northern hemisphere | hourly | Real number (nT) |

#### Dst

We focus on modeling the Disturbance Storm Time (Dst) index, though our results can be generalised to the AE and Kp indices as well. The chart below explains how different values of Dst relate to the state of the Earth’s magnetosphere.

Modeling Dst is therefore important for recognizing and predicting geomagnetic disturbances and storms.

### Gaussian Process Regression

Given below is the formulation of a Gaussian Process regression model. For a detailed introduction to Gaussian Processes, refer to the book by Rasmussen and Williams.

#### Assumptions

We assume that our target data are noisy observations of an unknown function $f(x)$. This modeling assumption leads to a Stochastic Process formulation for the prior distribution on this unknown function.

The existence of such a stochastic process is guaranteed by the Kolmogorov Extension Theorem, provided there exists a symmetric, positive semi-definite covariance function $C(x,y): \Omega \times \Omega \rightarrow \mathbb{R}$, $x,y \in \Omega$.

We further assume that the finite-dimensional distributions are multivariate Gaussian, which leads to the following set of equations for the finite-dimensional distributions of the unknown function $f(x)$.
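In the standard formulation (as in Rasmussen and Williams), and assuming here a zero mean function and i.i.d. Gaussian observation noise with variance $\sigma_n^2$, these finite-dimensional distributions take the following form. The zero mean and the noise model are assumptions of this sketch, not details stated on the poster.

```latex
\begin{align}
\mathbf{f} = (f(x_1), \dots, f(x_n))^\top &\sim \mathcal{N}\left(\mathbf{0},\, K\right), \qquad K_{ij} = C(x_i, x_j) \\
\mathbf{y} = \mathbf{f} + \boldsymbol{\epsilon}, \qquad \boldsymbol{\epsilon} &\sim \mathcal{N}\left(\mathbf{0},\, \sigma_n^2 I\right) \\
\Rightarrow \quad \mathbf{y} &\sim \mathcal{N}\left(\mathbf{0},\, K + \sigma_n^2 I\right)
\end{align}
```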

#### Posterior Predictive Distribution

Given training data $X = (x_1, x_2, \dots, x_n)$ and $\mathbf{y} = (y_1, y_2, \dots, y_n)$, one may use Bayes' theorem to calculate the posterior predictive distribution $\mathbf{f_*}|X,\mathbf{y},X_*$, assuming the test inputs $X_*$ are known.
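Assuming a zero prior mean and i.i.d. Gaussian observation noise with variance $\sigma_n^2$, Bayes' theorem gives the standard closed form $\mathbf{f_*}|X,\mathbf{y},X_* \sim \mathcal{N}(\bar{\mathbf{f}}_*, \operatorname{cov}(\mathbf{f}_*))$, with $\bar{\mathbf{f}}_* = K(X_*,X)[K(X,X)+\sigma_n^2 I]^{-1}\mathbf{y}$ and $\operatorname{cov}(\mathbf{f}_*) = K(X_*,X_*) - K(X_*,X)[K(X,X)+\sigma_n^2 I]^{-1}K(X,X_*)$. The NumPy sketch below computes both through a Cholesky factorisation. The function names `gp_posterior` and `rbf`, the noise level and the synthetic data are hypothetical choices for illustration, not the code used for the poster.

```python
import numpy as np

def gp_posterior(kernel, X, y, X_star, noise_var=1e-2):
    """Posterior predictive mean and covariance of a zero-mean GP.

    kernel(A, B) must return the matrix [C(a_i, b_j)] for 1-D input arrays A, B.
    noise_var is the (assumed) Gaussian observation noise variance sigma_n^2.
    """
    K = kernel(X, X) + noise_var * np.eye(len(X))  # K(X, X) + sigma_n^2 I
    K_s = kernel(X_star, X)                        # K(X_*, X)
    K_ss = kernel(X_star, X_star)                  # K(X_*, X_*)

    # Use the Cholesky factor instead of forming the inverse explicitly.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # [K + sigma_n^2 I]^{-1} y
    v = np.linalg.solve(L, K_s.T)                         # L^{-1} K(X, X_*)

    mean = K_s @ alpha
    cov = K_ss - v.T @ v
    return mean, cov


def rbf(A, B, sigma=1.0):
    """RBF covariance exp(-(a - b)^2 / (2 sigma^2)) for scalar inputs."""
    d = np.subtract.outer(A, B)
    return np.exp(-d**2 / (2.0 * sigma**2))


# Usage on synthetic data (not OMNI data): noisy samples of sin(x).
rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 20)
y = np.sin(X) + 0.05 * rng.standard_normal(X.size)
X_star = np.array([1.0, 2.5])
mean, cov = gp_posterior(rbf, X, y, X_star)
```

Solving against the Cholesky factor is the usual choice here because it is cheaper and numerically more stable than inverting $K(X,X)+\sigma_n^2 I$ directly.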

### Gaussian Process Dst models: RBF vs FBM Kernels.

We model Dst as a scalar-valued function of the solar wind speed.
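Assuming the standard forms of the two kernels for a scalar input (here the solar wind speed), they read as follows. Any amplitude or scaling factors used on the poster are not stated, so these forms are an assumption.

```latex
C_{rbf}(x, y) = \exp\left(-\frac{(x - y)^2}{2\sigma^2}\right)

C_{fbm}(x, y) = \frac{1}{2}\left(|x|^{2H} + |y|^{2H} - |x - y|^{2H}\right), \qquad H \in (0, 1]
```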

In the equations above, $H \in (0,1]$ and $\sigma$ are the hyper-parameters of the Fractional Brownian Motion and Radial Basis Function kernels respectively. When training Gaussian Process models, it is imperative to choose good values for these hyper-parameters, which can be achieved by a number of means (see Rasmussen and Williams for further details); common approaches are based on maximum likelihood and cross-validation. Here we used a maximum likelihood driven search over a pre-defined grid of hyper-parameters.
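A maximum likelihood search over a fixed grid can be sketched by evaluating the log marginal likelihood $\log p(\mathbf{y}|X) = -\tfrac{1}{2}\mathbf{y}^\top (K+\sigma_n^2 I)^{-1}\mathbf{y} - \tfrac{1}{2}\log|K+\sigma_n^2 I| - \tfrac{n}{2}\log 2\pi$ at every grid point and keeping the maximiser. This assumes a zero-mean GP with a fixed noise variance; the grid values, the noise level, the synthetic data and the helper names (`fbm_kernel`, `rbf_kernel`, `grid_search`) are hypothetical.

```python
import numpy as np

def fbm_kernel(A, B, H):
    """Fractional Brownian motion covariance 0.5 (|a|^2H + |b|^2H - |a - b|^2H)."""
    a, b = np.abs(A)[:, None], np.abs(B)[None, :]
    d = np.abs(np.subtract.outer(A, B))
    return 0.5 * (a**(2 * H) + b**(2 * H) - d**(2 * H))

def rbf_kernel(A, B, sigma):
    """Radial basis function covariance exp(-(a - b)^2 / (2 sigma^2))."""
    d = np.subtract.outer(A, B)
    return np.exp(-d**2 / (2.0 * sigma**2))

def log_marginal_likelihood(K, y, noise_var):
    """log p(y | X) for a zero-mean GP with Gaussian observation noise."""
    n = len(y)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # log|K + sigma_n^2 I| = 2 * sum(log(diag(L))), so half of it is sum(log(diag(L))).
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2.0 * np.pi))

def grid_search(kernel, X, y, grid, noise_var=1e-2):
    """Return the grid value maximising the log marginal likelihood, plus all scores."""
    scores = [log_marginal_likelihood(kernel(X, X, theta), y, noise_var)
              for theta in grid]
    return grid[int(np.argmax(scores))], scores

# Usage on synthetic data (not OMNI data).
rng = np.random.default_rng(1)
X = np.linspace(0.1, 3.0, 40)
y = np.sin(2 * X) + 0.1 * rng.standard_normal(X.size)
best_sigma, rbf_scores = grid_search(rbf_kernel, X, y, [0.05, 0.5, 5.0])
best_H, fbm_scores = grid_search(fbm_kernel, X, y, [0.1, 0.5, 0.9])
```

A grid search is simple and avoids gradient computations, at the cost of only resolving the optimum to the grid spacing.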

We compare the performance of two Gaussian Process regression models for Dst: one with the Radial Basis Function kernel $C_{rbf}$ and one with the Fractional Brownian Motion kernel $C_{fbm}$. Both models are trained on a sub-sampled version of the OMNI data from 2007 and tested on a sub-sampled version of the 2006 data.

### Results

The performance metrics of both the constructed models on the test set are summarized below.

| Kernel | Data (train, test) | MAE | RMSE | $R^2$ |
|---|---|---|---|---|
| RBF | 300, 1000 | 1.5044 | 6.9752 | 0.7925 |
| FBM | 300, 1000 | 0.0312 | 0.0461 | 0.9999 |
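The three metrics in the table can be computed from the test targets and the predictive means as sketched below. This assumes $R^2$ denotes the coefficient of determination $1 - \mathrm{SS}_{res}/\mathrm{SS}_{tot}$, which is not stated explicitly; the function name and toy values are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Mean absolute error, root mean squared error and coefficient of determination."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))
    rmse = np.sqrt(np.mean(residuals**2))
    ss_res = np.sum(residuals**2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean())**2)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Usage with toy Dst-like values in nT (not the poster's data).
mae, rmse, r2 = regression_metrics([-10.0, -20.0, -30.0], [-12.0, -20.0, -28.0])
```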

While both models do a decent job of predicting the Dst index from the solar wind velocity, a closer look at the residual histograms and goodness-of-fit charts reveals the differences between them.

The Radial Basis Function kernel, which tries to fit smooth splines to the data, exhibits an interesting pathology: it is unable to predict anomalous geomagnetic conditions, i.e. it cannot reliably predict the onset of geomagnetic storms ($D_{st} \leq -100\,\mathrm{nT}$). This shows up as long tails in its error distribution, and in the goodness-of-fit chart these points clearly lie far away from the “best fit” regression line.

Figure (RBF kernel): goodness-of-fit plot and residual histogram.

#### Fractional Brownian Motion Kernel

The Fractional Brownian Motion kernel gives far more accurate Dst predictions under both slow and turbulent solar wind conditions, suggesting that fitting smooth splines is not a reasonable modeling assumption when learning models of the form $D_{st}(v_{\text{solar wind}})$.

Figure (FBM kernel): goodness-of-fit plot and residual histogram.