An entropy based input variable selection approach to identify equally informative subsets for data driven hydrological models

2015-04-15
Karakaya, Gülşah
Galelli, Stefano
Ahipaşoğlu, Selin Damla
Input Variable Selection (IVS) is an essential step in hydrological modelling problems, since it allows determining the optimal subset of input variables from a large set of candidates to characterize a preselected output. Interestingly, most of the existing IVS algorithms select a single subset, or, at most, one subset of input variables for each cardinality level, thus overlooking the fact that, for a given cardinality, there can be several subsets with similar information content. In this study, we develop a novel IVS approach specifically conceived to account for this issue. The approach is based on the formulation of a four-objective optimization problem that aims at minimizing the number of selected variables and maximizing the prediction accuracy of a data-driven model, while optimizing two entropy-based measures of relevance and redundancy. The redundancy measure ensures that the cross-dependence between the variables in a subset is minimized, while the relevance measure guarantees that the information content of each subset is maximized. In addition to the capability of selecting equally informative subsets, the approach is characterized by two other properties, namely 1) the capability of handling nonlinear interactions between the candidate input variables and preselected output, and 2) computational efficiency. These properties are guaranteed by the adoption of Extreme Learning Machine and Borg MOEA as data-driven model and heuristic optimization procedure, respectively. The approach is demonstrated on a long-term streamflow prediction problem, with the input dataset including both hydro-meteorological variables and climate indices representing dominant modes of climate variability. 
Results show that the availability of several equally informative subsets allows 1) determining the relative importance of each candidate input, thus supporting the understanding of the underlying physical processes, and 2) finding a better trade-off between multiple measures of prediction accuracy (e.g., RMSE, NSE, MAE) and the desired number of input variables.
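As a rough illustration of the four objectives described above, the sketch below scores one candidate subset by its cardinality, the prediction error of a small Extreme Learning Machine, and two entropy-based quantities. The abstract does not give the exact relevance and redundancy formulas, so the histogram-based mutual information averages used here are assumptions. The function names (`entropy`, `mutual_information`, `elm_rmse`, `evaluate_subset`), the bin count, the hidden-layer size, and the train/test split are likewise hypothetical. The search itself (Borg MOEA in the study) is not shown.

```python
import numpy as np


def entropy(*columns, bins=8):
    """Plug-in Shannon entropy (bits) of the joint histogram of 1-D arrays."""
    counts, _ = np.histogramdd(np.column_stack(columns), bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mutual_information(a, b, bins=8):
    """I(a; b) = H(a) + H(b) - H(a, b), estimated from histograms."""
    return entropy(a, bins=bins) + entropy(b, bins=bins) - entropy(a, b, bins=bins)


def elm_rmse(X_train, y_train, X_test, y_test, n_hidden=20, seed=0):
    """Fit a minimal ELM: random tanh hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X_train.shape[1], n_hidden))  # fixed random input weights
    b = rng.normal(size=n_hidden)                      # fixed random biases
    H = np.tanh(X_train @ W + b)
    beta, *_ = np.linalg.lstsq(H, y_train, rcond=None)  # only trained parameters
    pred = np.tanh(X_test @ W + b) @ beta
    return float(np.sqrt(np.mean((pred - y_test) ** 2)))


def evaluate_subset(mask, X, y, train_fraction=0.7):
    """Return the four objective values for the subset flagged by `mask`."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        raise ValueError("select at least one input variable")
    # Assumed relevance: mean mutual information between each input and the output.
    relevance = float(np.mean([mutual_information(X[:, j], y) for j in idx]))
    # Assumed redundancy: mean pairwise mutual information among selected inputs.
    pairs = [(i, j) for k, i in enumerate(idx) for j in idx[k + 1:]]
    redundancy = (
        float(np.mean([mutual_information(X[:, i], X[:, j]) for i, j in pairs]))
        if pairs else 0.0
    )
    n_train = int(train_fraction * len(y))
    Xs = X[:, idx]
    rmse = elm_rmse(Xs[:n_train], y[:n_train], Xs[n_train:], y[n_train:])
    return {
        "cardinality": int(idx.size),  # to minimize
        "rmse": rmse,                  # to minimize
        "relevance": relevance,        # to maximize
        "redundancy": redundancy,      # to minimize
    }
```

A multi-objective optimizer would call something like `evaluate_subset` on many binary masks and keep the non-dominated ones. Subsets of equal cardinality with similar objective values would then correspond to the equally informative subsets the abstract refers to.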
European Geosciences Union (EGU) General Assembly 2015 (12 - 15 April 2015)

Citation Formats
G. Karakaya, S. Galelli, and S. D. Ahipaşoğlu, “An entropy based input variable selection approach to identify equally informative subsets for data driven hydrological models,” presented at the European Geosciences Union (EGU) General Assembly 2015 (12 - 15 April 2015), Vienna, Austria, 2015, Accessed: 00, 2021. [Online]. Available: https://hdl.handle.net/11511/84509.