An information theoretic approach to select alternate subsets of predictors for data-driven hydrological models

2016-11-01
Taormina, Riccardo
Galelli, Stefano
Karakaya, Gülşah
Ahipasaoglu, S. D.
This work investigates the uncertainty associated with the presence of multiple subsets of predictors yielding data-driven models with the same, or similar, predictive accuracy. To handle this uncertainty effectively, we introduce a novel input variable selection algorithm, called Wrapper for Quasi Equally Informative Subset Selection (W-QEISS), specifically conceived to identify all alternate subsets of predictors in a given dataset. The search process is based on a four-objective optimization problem that minimizes the number of selected predictors, maximizes the predictive accuracy of a data-driven model and optimizes two information theoretic metrics of relevance and redundancy, which guarantee that the selected subsets are highly informative and have little intra-subset similarity. The algorithm is first tested on two synthetic test problems and then demonstrated on a real-world streamflow prediction problem in the Yampa River catchment (US). Results show that complex hydro-meteorological datasets are characterized by a large number of alternate subsets of predictors, which provides useful insights into the underlying physical processes. Furthermore, the presence of multiple subsets of predictors and associated models helps find a better trade-off between different measures of predictive accuracy commonly adopted for hydrological modelling problems.
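
To make the four-objective formulation concrete, the sketch below evaluates one candidate subset of predictors on the four quantities named in the abstract: subset size, predictive accuracy, relevance, and redundancy. It is an illustration under stated assumptions, not the W-QEISS implementation. The abstract does not specify the estimators, so the histogram-based mutual information, the summed pairwise definitions of relevance and redundancy, the linear least-squares model, and the R^2 accuracy score are all stand-ins chosen here. The names `mutual_information` and `objectives` are hypothetical, and the data are synthetic.

```python
import numpy as np
from itertools import combinations


def mutual_information(a, b, bins=8):
    """Histogram estimate of mutual information (in nats) between two 1-D arrays.

    Assumption: a simple binned estimator, used here only for illustration.
    """
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    mask = p_ab > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))


def objectives(X, y, subset, bins=8):
    """Return (cardinality, accuracy, relevance, redundancy) for one subset.

    Assumptions: accuracy is the R^2 of a linear least-squares fit, relevance
    is the summed predictor-output mutual information, and redundancy is the
    summed pairwise mutual information among the selected predictors.
    """
    subset = list(subset)
    cardinality = len(subset)

    # Accuracy: fit a linear model with intercept on the selected columns.
    A = np.column_stack([X[:, subset], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ coef
    accuracy = 1.0 - residuals.var() / y.var()

    relevance = sum(mutual_information(X[:, j], y, bins) for j in subset)
    redundancy = sum(
        mutual_information(X[:, i], X[:, j], bins)
        for i, j in combinations(subset, 2)
    )
    return cardinality, accuracy, relevance, redundancy


# Synthetic data: x1 is a near-copy of x0, x2 is independent of both.
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = x0 + 0.1 * rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
y = x0 + x2 + 0.1 * rng.normal(size=n)

for candidate in ([0, 1], [0, 2], [1, 2]):
    print(candidate, objectives(X, y, candidate))
```

In this toy setting the subsets [0, 2] and [1, 2] should score similarly on accuracy, which is the kind of quasi-equally-informative pair the abstract refers to, while [0, 1] pairs two near-duplicate predictors and so carries high redundancy. A multi-objective search would minimize cardinality and redundancy while maximizing accuracy and relevance. The search procedure itself is not shown here.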

Citation Formats
R. Taormina, S. Galelli, G. Karakaya, and S. D. Ahipasaoglu, “An information theoretic approach to select alternate subsets of predictors for data-driven hydrological models,” Journal of Hydrology, vol. 542, pp. 18–34, 2016, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/35873.