Binary Classification Performance Measures/Metrics: A Comprehensive Visualized Roadmap to Gain New Insights

2017-10-08
Canbek, Gurol
SAĞIROĞLU, Şeref
Taşkaya Temizel, Tuğba
Baykal, Nazife
Binary classification is one of the most frequently studied problems in applied machine learning, in domains ranging from medicine and biology to meteorology and malware analysis. Many researchers report the success of their classification studies using performance metrics. However, the literature shows widespread confusion about the terminology and neglect of the fundamental aspects behind these metrics. This paper clarifies the confusing terminology, suggests, for the first time, formal rules to distinguish between measures and metrics, and proposes a new comprehensive visualized roadmap in a leveled structure for 22 measures and 22 metrics for exploring binary classification performance. Additionally, it introduces novel concepts such as canonical notation, duality, and complementation for measures/metrics, and suggests two new canonical base measures that simplify equations. The study is expected to guide other studies toward a standardized approach to performance metrics for machine learning based solutions.
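As a rough illustration of the kind of quantities the abstract refers to, the sketch below derives a few widely used ratios from the four confusion-matrix counts. It uses the standard textbook definitions only; the class name `ConfusionMatrix` and its property names are hypothetical, and the code does not reproduce the paper's canonical notation, its measure/metric rules, or its proposed base measures.

```python
from dataclasses import dataclass


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


@dataclass(frozen=True)
class ConfusionMatrix:
    """Hypothetical container for the four binary confusion-matrix counts."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def tpr(self) -> float:
        """True positive rate (sensitivity, recall): TP / (TP + FN)."""
        return _ratio(self.tp, self.tp + self.fn)

    @property
    def fnr(self) -> float:
        """False negative rate: FN / (TP + FN)."""
        return _ratio(self.fn, self.tp + self.fn)

    @property
    def tnr(self) -> float:
        """True negative rate (specificity): TN / (TN + FP)."""
        return _ratio(self.tn, self.tn + self.fp)

    @property
    def ppv(self) -> float:
        """Positive predictive value (precision): TP / (TP + FP)."""
        return _ratio(self.tp, self.tp + self.fp)

    @property
    def accuracy(self) -> float:
        """Accuracy: (TP + TN) / total sample size."""
        return _ratio(self.tp + self.tn, self.tp + self.fp + self.fn + self.tn)

    @property
    def f1(self) -> float:
        """F1 score: 2TP / (2TP + FP + FN)."""
        return _ratio(2 * self.tp, 2 * self.tp + self.fp + self.fn)
```

With made-up counts such as `ConfusionMatrix(tp=40, fp=10, fn=20, tn=30)`, the true positive rate and false negative rate sum to one, which is a well-known identity for rates that share a denominator.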

Suggestions

Robust multiobjective evolutionary feature subset selection algorithm for binary classification using machine learning techniques
Deniz, Ayca; Kiziloz, Hakan Ezgi; Dokeroglu, Tansel; Coşar, Ahmet (2017-06-07)
This study investigates the success of a multiobjective genetic algorithm (GA) combined with state-of-the-art machine learning (ML) techniques for feature subset selection (FSS) in the binary classification problem (BCP). Recent studies have focused on improving the accuracy of BCP by including all of the features, neglecting to determine the best performing subset of features. However, for some problems, the number of features may reach thousands, which will cause too much computation power to be consumed ...
HYPERSPECTRAL CLASSIFICATION USING STACKED AUTOENCODERS WITH DEEP LEARNING
Özdemir, Ataman; Cetin, C. Yasemin Yardimci (2014-06-27)
In this study, stacked autoencoders, which are widely utilized in deep learning research, are applied to the remote sensing domain for hyperspectral classification. High-dimensional hyperspectral data are an excellent candidate for deep learning methods. However, there are no works in the literature that focus on such deep learning approaches for hyperspectral imagery. This study aims to fill this gap by utilizing stacked autoencoders. Experiments are conducted on the Pavia University scene. Using stacked autoencode...
Mesh Learning for Object Classification using fMRI Measurements
Ekmekci, Ömer; Ozay, Mete; Oztekin, Ilke; GİLLAM, İLKE; Oztekin, Uygar (2013-09-18)
Machine learning algorithms have been widely used as reliable methods for modeling and classifying cognitive processes using functional Magnetic Resonance Imaging (fMRI) data. In this study, we aim to classify fMRI measurements recorded during an object recognition experiment. Previous studies focus on Multi Voxel Pattern Analysis (MVPA) which feeds a set of active voxels in a concatenated vector form to a machine learning algorithm to train and classify the cognitive processes. In most of the MVPA methods,...
GPCRsort-Responding to the Next Generation Sequencing Data Challenge: Prediction of G Protein-Coupled Receptor Classes Using Only Structural Region Lengths
Sahın, Mehmet Emre; Can, Tolga; Son, Çağdaş Devrim (2014-10-01)
Next generation sequencing (NGS) and the attendant data deluge are increasingly impacting molecular life sciences research. Chief among the challenges and opportunities is to enhance our ability to classify molecular target data into meaningful and cohesive systematic nomenclature. In this vein, the G protein-coupled receptors (GPCRs) are the largest and most divergent receptor family that plays a crucial role in a host of pathophysiological pathways. For the pharmaceutical industry, GPCRs are a major drug ...
Computational representation of protein sequences for homology detection and classification
Oğul, Hasan; Mumcuoğlu, Ünal Erkan; Department of Information Systems (2006)
Machine learning techniques have been widely used for classification problems in computational biology. They require the input to be a collection of fixed-length feature vectors. Since proteins are of varying lengths, there is a need for a means of representing protein sequences by a fixed number of features. This thesis introduces three novel methods for this purpose: n-peptide compositions with reduced alphabets, pairwise similarity scores by maximal unique matches, and pairwise similarity scores by...
Citation Formats
G. Canbek, Ş. SAĞIROĞLU, T. Taşkaya Temizel, and N. Baykal, “Binary Classification Performance Measures/Metrics: A Comprehensive Visualized Roadmap to Gain New Insights,” 2017, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/54269.