Analysis of a new spatial interpolation weighting method to estimate missing data applied to rainfall records

Jorge Luis Morales, Francisco Antonio Horta-Rangel, Ignacio Segovia-Domínguez, Agustín Robles Morua, Jesús Horacio Hernández

Abstract

In the present work, two new generalized weighting methods for the imputation of missing data are developed and tested using daily rainfall series. The proposed methodology allows the time series to be fully rebuilt while preserving its statistical properties. Rainfall records from the state of Tabasco, Mexico, for the period 1980-2012 were used to test and evaluate the proposed methodology. Missing data at a given weather station are imputed using daily data from neighboring stations with similar rainfall behavior. The optimal parameters for the proposed formulae are chosen by minimizing the mean absolute error (MAE) with an evolutionary strategy (CMA-ES). The K-means method with the Euclidean distance was used to select suitable neighboring weather stations. Five different methods were applied to estimate the optimal number of clusters: the elbow method, the gap statistic, and the TraceW, Hartigan, and Krzanowski-Lai indices. In addition, the structural stability of the chosen clusters was evaluated to demonstrate that they represent the true structure of the data and are not an artifact of the internal procedure of the clustering algorithm. Results from two statistical tests, the Friedman test and the Nemenyi post hoc test, showed that the two new methods produce statistically significantly better estimates than existing methods in the literature.
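As a rough illustration of the general idea of weighted imputation from neighboring stations with a parameter tuned by MAE, the following minimal sketch may help. It does not reproduce the two proposed methods: it assumes classical inverse distance weighting as the interpolation formula, and a coarse grid search stands in for CMA-ES. The function names (`idw_estimate`, `mean_absolute_error`, `best_power`) and the data layout are hypothetical and chosen only for this example.

```python
from math import hypot


def idw_estimate(target_xy, neighbors, power=2.0):
    """Estimate a missing value at target_xy from neighboring stations.

    neighbors is a list of (x, y, value) tuples. Classical inverse distance
    weighting is assumed here; it is NOT the weighting proposed in the paper.
    """
    num = 0.0
    den = 0.0
    for x, y, value in neighbors:
        d = hypot(x - target_xy[0], y - target_xy[1])
        if d == 0.0:
            return value  # a co-located station supplies the value directly
        w = d ** (-power)  # closer stations receive larger weights
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("at least one neighbor is required")
    return num / den


def mean_absolute_error(observed, estimated):
    """Mean absolute error between paired observed and estimated values."""
    return sum(abs(o - e) for o, e in zip(observed, estimated)) / len(observed)


def best_power(cases, candidates):
    """Pick the exponent with the lowest MAE over known (held-out) records.

    cases is a list of (target_xy, neighbors, observed_value). A plain grid
    search is used as a stand-in for an evolutionary optimizer.
    """
    observed = [obs for _, _, obs in cases]

    def score(p):
        estimates = [idw_estimate(t, n, p) for t, n, _ in cases]
        return mean_absolute_error(observed, estimates)

    return min(candidates, key=score)
```

In this sketch, known records are temporarily treated as missing so that each candidate exponent can be scored against observed values; the neighbor list would come from the cluster to which the target station belongs.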

Keywords

missing data; rainfall data; K-means clustering; optimization; deterministic interpolation methods
