An information theoretic approach to select alternate subsets of predictors for data-driven hydrological models

Karakaya, Gülşah
Ahipasaoglu, S. D.
This work investigates the uncertainty associated with the presence of multiple subsets of predictors yielding data-driven models with the same, or similar, predictive accuracy. To handle this uncertainty effectively, we introduce a novel input variable selection algorithm, called Wrapper for Quasi Equally Informative Subset Selection (W-QEISS), specifically conceived to identify all alternate subsets of predictors in a given dataset. The search process is based on a four-objective optimization problem that minimizes the number of selected predictors, maximizes the predictive accuracy of a data-driven model, and optimizes two information theoretic metrics of relevance and redundancy, which guarantee that the selected subsets are highly informative and have little intra-subset similarity. The algorithm is first tested on two synthetic test problems and then demonstrated on a real-world streamflow prediction problem in the Yampa River catchment (US). Results show that complex hydro-meteorological datasets are characterized by a large number of alternate subsets of predictors, which provides useful insights into the underlying physical processes. Furthermore, the presence of multiple subsets of predictors and associated models helps find a better trade-off between different measures of predictive accuracy commonly adopted for hydrological modelling problems.
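
The four objectives described in the abstract can be illustrated with a minimal Python sketch. It is not the W-QEISS implementation: it assumes discretized predictors, uses plain mutual information as a stand-in for the relevance and redundancy metrics (the abstract does not name the exact metrics), and takes the accuracy of the data-driven model from a caller-supplied function. The names `mutual_information`, `objectives`, `dominates`, and `accuracy_fn` are hypothetical, and the search procedure itself is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )


def objectives(subset, columns, target, accuracy_fn):
    """Score one candidate subset on four objectives, all expressed as minimization.

    subset      -- tuple of predictor keys (assumed layout)
    columns     -- mapping from predictor key to its discretized values
    target      -- discretized values of the predicted variable
    accuracy_fn -- hypothetical callable returning model accuracy for a subset
    """
    cols = [columns[k] for k in subset]
    # Relevance: information shared between each selected predictor and the target.
    relevance = sum(mutual_information(c, target) for c in cols)
    # Redundancy: information shared between pairs of selected predictors.
    redundancy = sum(
        mutual_information(cols[i], cols[j])
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    )
    # Accuracy and relevance are negated so that lower is better everywhere.
    return (len(subset), -accuracy_fn(subset), -relevance, redundancy)


def dominates(f, g):
    """True if objective vector f is at least as good as g everywhere and differs from it."""
    return all(x <= y for x, y in zip(f, g)) and f != g
```

In a sketch like this, candidate subsets that no other candidate dominates form the set of trade-offs from which alternate, similarly accurate subsets could then be picked.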


Impact of Rescaling Approaches in Simple Fusion of Soil Moisture Products
Afshar, Mahdı Hesamı ; Yılmaz, Mustafa Tuğrul (American Geophysical Union (AGU), 2019-09-10)
In this study, the impact of various rescaling approaches in the framework of data fusion is explored. Four different soil moisture products (Advanced Scatterometer; Advanced Microwave Scanning Radiometer for EOS, AMSR-E; Antecedent Precipitation Index; and Global Land Data Assimilation System-NOAH) are fused. The systematic differences between products are removed before the fusion utilizing various rescaling approaches focusing on different methods (regression, variance/cumulative distribution function (C...
ULA, TA (Elsevier BV, 1992-12-01)
Certain aspects of data generation are studied through multivariate autoregressive (AR) models. The main emphasis is on the preservation of certain desired moments and the effect of initial values on these moments. The problem of preservation of moments is approached in a nontraditional way by starting with the initial values. For this purpose, general AR processes with a random start and with time-varying parameters are introduced to lay a foundation for the analysis of all types of AR processes, including...
Performance evaluation of satellite- and model-based precipitation products over varying climate and complex topography
Amjad, Muhammad; Yılmaz, Mustafa Tuğrul; Yücel, İsmail; Yılmaz, Koray Kamil (Elsevier BV, 2020-05-01)
Accuracy assessment of precipitation retrievals is a pre-requisite for many hydrological studies as it helps to understand the source and the magnitude of the uncertainty in hydrological response variables, particularly over regions with complex topography. This study evaluates GPM IMERGv05, TMPA 3B42V7, ERA-Interim, and ERA5 precipitation products using 256 ground-based gauge stations between 2014 and 2018 over Turkey known to have complex topography and varying climate. Error statistics, categorical perfo...
Periodic stationarity conditions for periodic autoregressive moving average processes as eigenvalue problems
Ula, TA; Smadi, AA (American Geophysical Union (AGU), 1997-08-01)
The determination of periodic stationarity conditions for periodic autoregressive moving average (PARMA) processes is a prerequisite to their analysis. Means of obtaining these conditions in analytically simple forms are sought. It is shown that periodic stationarity conditions for univariate and multivariate PARMA processes can always be reduced to eigenvalue problems, which are computationally and analytically easier to deal with. Two different lumpings of the periodic process are considered along this li...
A genetic algorithm for TSP with backhauls based on conventional heuristics
Önder, İlter; Özdemirel, Nur Evin; Department of Information Systems (2007)
A genetic algorithm using conventional heuristics as operators is considered in this study for the traveling salesman problem with backhauls (TSPB). Properties of a crossover operator (Nearest Neighbor Crossover, NNX) based on the nearest neighbor heuristic and the idea of using more than two parents are investigated in a series of experiments. Different parent selection and replacement strategies and generation of multiple children are tried as well. Conventional improvement heuristics are also used as mut...
Citation Formats
R. Taormina, S. Galelli, G. Karakaya, and S. D. Ahipasaoglu, “An information theoretic approach to select alternate subsets of predictors for data-driven hydrological models,” Journal of Hydrology, pp. 18–34, 2016, Accessed: 00, 2020. [Online]. Available: