Thesis: Machine Learning Applied To Build ‘Nowcasting’ Models to Predict Irish Rainfall

Rain forecasting plays a significant part in our daily lives. Short-term forecasting (nowcasting) is just as important as long-term forecasting. When planning activities such as outdoor sports, the daily commute or even tours, it is essential to check for rain first. Nowcasting is also of interest at airports, since extreme conditions, particularly strong winds or storms, are undesirable during take-off and landing.

In this study, rainfall for the next 12 hours is forecast using statistical and machine learning techniques, rather than the traditional Numerical Weather Prediction (NWP) techniques that rely on computationally intensive models and machines. Although rainfall is forecast out to 12 hours, the focus is on very short-term forecasting of up to 3-6 hours. The goal is to test whether the statistical and machine learning techniques have any skill (over “persistence”) in very short-range forecasting of the occurrence (or non-occurrence) and intensity of hourly rainfall. The study focuses on rainfall (rather than wind speed, temperature or other atmospheric parameters) because it is a very “noisy” parameter that heavily challenges machine learning techniques.

The data is gathered from the Dublin Airport weather station. The historical hourly rainfall data, together with several parameters such as pressure, temperature, specific humidity and wind speed, is modelled as time series, and the parameters that are most significant in determining future rainfall are identified. The stationary rainfall time series are fitted with Vector AutoRegression (VAR), AutoRegressive Integrated Moving Average (ARIMA) and Neural Network models in order to determine which method yields the most accurate forecasts. The persistence model, in which the last observed rainfall is forecast for each of the next twelve hours, serves as the baseline, and the other models are compared against it to assess their skill. Although the persistence model seems very basic, it has considerable predictive power: if the weather is dry now, it is likely to stay dry for the next few hours.

Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE) are used as the accuracy metrics. In addition, skill scores based on Mean Squared Error (MSE) are used to determine how a given model performs compared to the baseline model. Several case studies are conducted to understand how the models differ in forecasting rainfall. The hourly case studies show different cases of varying rainfall throughout the day and the models' respective forecasts, while the monthly case studies compare the different hourly forecasts of the models. Examination of the RMSE, MAPE and skill score values, together with the hourly and monthly case studies, shows that the VAR model is the most accurate, closely followed by the ARIMA model. Although the Neural Network model came last in terms of overall accuracy, it is better at forecasting imminent peaks in rainfall.
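
As a rough illustration of the baseline comparison described above, the Python sketch below builds a persistence forecast and scores a model against it. The rainfall values are invented, the function names are hypothetical, and the skill score is assumed to take the common form 1 - MSE(model) / MSE(baseline); the thesis itself may define it differently.

```python
from math import sqrt


def persistence_forecast(last_observed, horizon=12):
    """Repeat the last observed hourly rainfall for each of the next `horizon` hours."""
    return [last_observed] * horizon


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(mse(actual, predicted))


def skill_score(actual, model_pred, baseline_pred):
    """Assumed form: 1 - MSE_model / MSE_baseline.

    Positive values mean the model beats the baseline, zero means it
    matches the baseline, and negative values mean it does worse.
    """
    baseline_mse = mse(actual, baseline_pred)
    if baseline_mse == 0:
        raise ValueError("baseline is perfect; skill score is undefined")
    return 1.0 - mse(actual, model_pred) / baseline_mse


# Invented hourly rainfall values in mm, for illustration only.
history = [0.0, 0.2, 0.5]        # observations up to "now"
actual = [0.4, 0.1, 0.0, 0.0]    # what was later observed
baseline = persistence_forecast(history[-1], horizon=len(actual))
model = [0.3, 0.2, 0.1, 0.0]     # hypothetical output of some other model

print("baseline RMSE:", rmse(actual, baseline))
print("model RMSE:", rmse(actual, model))
print("skill score:", skill_score(actual, model, baseline))
```

With this definition, a model that simply reproduces the persistence forecast scores exactly zero, which makes the score easy to read as "improvement over persistence".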

The thesis can be found here: Thesis