PToPI: A Comprehensive Review, Analysis, and Knowledge Representation of Binary Classification Performance Measures/Metrics

Canbek, Gürol
Taşkaya Temizel, Tuğba
Although only a few performance evaluation instruments are used conventionally across machine learning-based classification problem domains, numerous ones are defined in the literature. This study reviews and describes performance instruments via formally defined novel concepts and clarifies the terminology. It first highlights the issues in performance evaluation through a survey of 78 mobile-malware classification studies and reviews the terminology. Based on three research questions, it proposes novel concepts to identify the characteristics, similarities, and differences of instruments, which are categorized into ‘performance measures’ and ‘performance metrics’ in the classification context for the first time. The concepts, which reflect intrinsic properties of instruments such as canonical form, geometry, duality, complementation, dependency, and leveling, aim to reveal similarities and differences among numerous instruments, such as redundancy and ground-truth versus prediction focus. As an application of knowledge representation, the study introduces a new exploratory table called PToPI (Periodic Table of Performance Instruments) for 29 measures and 28 metrics (69 instruments including variant and parametric ones). By visualizing the proposed concepts, PToPI provides a new relational structure for the instruments, including graphical, probabilistic, and entropic ones, so that their properties and dependencies can be seen in one place. Applying the exploratory table to six examples from different domains in the literature shows that PToPI aids overall instrument analysis and the selection of proper performance metrics according to the specific requirements of a classification problem. The proposed concepts and PToPI are expected to help researchers comprehend and use the instruments and follow a systematic approach to classification performance evaluation and publication.
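As a rough illustration of the kind of instruments the abstract refers to, the sketch below derives a few widely used binary-classification metrics from confusion-matrix counts. It is not taken from the paper: the class name `ConfusionMatrix`, the selection of metrics, and the reading of "complementation" as FNR = 1 - TPR and of "ground-truth versus prediction focus" as TPR versus PPV are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical container for the four base counts of a binary classifier."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def accuracy(self) -> float:
        # Share of all instances that were classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def tpr(self) -> float:
        # True positive rate (recall): conditioned on the actual positives,
        # i.e. a ground-truth-focused metric in the sense assumed here.
        return self.tp / (self.tp + self.fn)

    @property
    def fnr(self) -> float:
        # False negative rate: shares its denominator with TPR, so FNR = 1 - TPR.
        return self.fn / (self.tp + self.fn)

    @property
    def ppv(self) -> float:
        # Positive predictive value (precision): conditioned on the predicted
        # positives, i.e. a prediction-focused metric in the sense assumed here.
        return self.tp / (self.tp + self.fp)

    @property
    def f1(self) -> float:
        # Harmonic mean of TPR and PPV, written directly in terms of the counts.
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Made-up counts, chosen only to exercise the formulas.
cm = ConfusionMatrix(tp=40, fp=10, fn=20, tn=30)
print(f"ACC={cm.accuracy:.3f} TPR={cm.tpr:.3f} FNR={cm.fnr:.3f} "
      f"PPV={cm.ppv:.3f} F1={cm.f1:.3f}")
```

The properties raise `ZeroDivisionError` when a denominator is zero (for example, no actual positives). The sketch leaves that case unhandled rather than pick a convention the source does not state.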
SN Computer Science


Citation Formats
G. Canbek, T. Taşkaya Temizel, and Ş. Sağıroğlu, “PToPI: A Comprehensive Review, Analysis, and Knowledge Representation of Binary Classification Performance Measures/Metrics,” SN Computer Science, vol. 4, no. 1, 2023.