ESTRA: An easy streaming data analysis tool

Date

2021-2-28

Author

Savaş Başak, Ecehan

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Easy Streaming Data Analysis Tool (ESTRA) is an easy-to-use data stream analysis platform for quickly exploring and prototyping machine learning solutions on various datasets. ESTRA is a web-based, scalable, extensible, open-source data analysis tool with a user-friendly interface. It comes bundled with datasets (Electricity, KDD Cup’99, and Covertype), dataset generators (SEA and Hyperplane), and implementations of various analysis and learning algorithms (D3, Hoeffding Tree, CluStream, DenStream, kNN, k-means, and StreamKM++). ESTRA also provides an easy way to investigate properties of the datasets and to observe the results of the machine learning algorithms that are run. Its straightforward, clean architecture, built on open-source tools, makes it extensible: the tools it relies on, such as React, Python, and scikit-multiflow, are popular and have broad community support and extensions. ESTRA’s capabilities for easy prototyping and exploration of machine learning solutions are demonstrated by repeating the machine learning experiments performed in various studies.
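
The kind of experiment the abstract describes, running a learner over a generated stream and observing its results, can be sketched in plain Python. This is not ESTRA's code or its API. It is a minimal illustration that assumes a simplified SEA-style generator (label decided by whether the first two features sum to at most a threshold), a sliding-window kNN classifier, and test-then-train (prequential) evaluation. The names sea_stream, WindowKNN, and prequential_accuracy, and all parameter values, are hypothetical and chosen only for this example.

```python
import random
from collections import Counter, deque


def sea_stream(n_samples, threshold=8.0, seed=42):
    """Yield (features, label) pairs from a simplified SEA-style concept.

    Assumption for this sketch: three features uniform in [0, 10]; the
    label is 1 when the first two features sum to at most `threshold`.
    """
    rng = random.Random(seed)
    for _ in range(n_samples):
        x = [rng.uniform(0.0, 10.0) for _ in range(3)]
        # Only the first two features decide the label; the third is irrelevant.
        y = 1 if x[0] + x[1] <= threshold else 0
        yield x, y


class WindowKNN:
    """k-nearest-neighbour classifier over a sliding window of recent samples."""

    def __init__(self, k=5, window_size=200):
        self.k = k
        # deque with maxlen drops the oldest sample once the window is full.
        self.window = deque(maxlen=window_size)

    def predict(self, x):
        if not self.window:
            return 0  # arbitrary default before any sample has been seen
        nearest = sorted(
            self.window,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]

    def learn(self, x, y):
        self.window.append((x, y))


def prequential_accuracy(stream, model):
    """Test-then-train: predict each sample before learning from it."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        model.learn(x, y)
        total += 1
    return correct / total if total else 0.0


accuracy = prequential_accuracy(sea_stream(1000), WindowKNN(k=5, window_size=200))
print(f"prequential accuracy: {accuracy:.3f}")
```

Because every sample is used for testing before it is used for training, the reported accuracy reflects performance on unseen data without a separate hold-out set, which is the usual way to score learners on a stream.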

Subject Keywords

Data streaming, Data analysis, Real-Time Data Analysis, Data Stream Analysis, Data Stream Analysis Tool

URI

https://hdl.handle.net/11511/89668

Collections

Graduate School of Natural and Applied Sciences, Thesis

Citation Formats

E. Savaş Başak, “ESTRA: An easy streaming data analysis tool,” M.S. - Master of Science, Middle East Technical University, 2021.