Multi-perspective analysis and systematic benchmarking for binary-classification performance evaluation instruments

2019
Canbek, Gürol
This thesis proposes novel methods to analyze and benchmark binary-classification performance evaluation instruments. It addresses critical problems found in the literature, clarifies the terminology, and, for the first time, distinguishes instruments as measures, metrics, and a new category, indicators. The multi-perspective analysis introduces novel concepts such as canonical form, geometry, duality, complementation, dependency, and leveling, each with a formal definition, as well as two new basic instruments. An indicator named Accuracy Barrier is also proposed and tested by re-evaluating the performance of surveyed machine-learning classifications. An exploratory table is designed to represent all the concepts for over 50 instruments, and real use cases of the table, such as domain-specific metrics reporting, are demonstrated. Furthermore, this thesis proposes a systematic three-stage benchmarking method to assess the robustness of metrics using new concepts such as metametrics (metrics about metrics) and metric-space. Benchmarking 13 metrics reveals significant issues, especially in the conventional metrics accuracy, F1, and normalized mutual information, and identifies the Matthews Correlation Coefficient as the most robust metric. The benchmarking method is evaluated against the literature. Additionally, this thesis formally demonstrates the publication and confirmation biases that arise from reporting non-robust metrics. Finally, this thesis gives recommendations on precise and concise performance evaluation, comparison, and reporting. The developed software library, analysis/benchmarking platform, visualization and calculator/dashboard tools, and datasets have also been released online. This research is expected to re-establish and facilitate the classification performance evaluation domain and to contribute towards responsible open research in performance evaluation through the use of the most robust and objective instruments.
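The contrast the abstract draws between accuracy, F1, and the Matthews Correlation Coefficient can be illustrated with a minimal sketch. This is not the thesis's software library or benchmarking method; the function name `binary_metrics` and the confusion-matrix counts are hypothetical, chosen only to show a class-imbalanced case using the standard textbook formulas.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, F1, and MCC from confusion-matrix counts.

    Hypothetical helper for illustration; uses the standard definitions.
    Undefined ratios (zero denominator) are reported as 0.0 by convention.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total

    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0

    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Assumed example data: 95 positives, 5 negatives, and a trivial
# classifier that labels every sample as positive.
always_positive = binary_metrics(tp=95, fp=5, fn=0, tn=0)

for name, value in always_positive.items():
    print(f"{name}: {value:.3f}")
```

In this assumed scenario the trivial classifier scores high on accuracy and F1 while MCC stays at zero, which is consistent with the kind of issue the abstract attributes to the conventional metrics.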

Citation Formats
G. Canbek, “Multi-perspective analysis and systematic benchmarking for binary-classification performance evaluation instruments,” Thesis (Ph.D.) -- Graduate School of Natural and Applied Sciences. Information Systems., Middle East Technical University, 2019.