Fostering Undergraduate Data Science

2020-01-01
Data science is one of the newest interdisciplinary fields, and it is transforming our lives unexpectedly fast, including the ways we learn and practice. We advocate an approach to data science training that uses several types of computational tools, including R, bash, awk, regular expressions, SQL, and XPath, often in tandem. We discuss ways for undergraduate mentees to learn about data science topics early in their training, and we offer researchers, professors, and practitioners some intuition for embedding real-life examples effectively into data science learning environments. The result is a unified program built on a foundation of team-oriented, data-driven projects.
AMERICAN STATISTICIAN

Citation Formats
F. Gökalp Yavuz, “Fostering Undergraduate Data Science,” AMERICAN STATISTICIAN, pp. 8–16, 2020. [Online]. Available: https://hdl.handle.net/11511/43893.