APPLICATION OF TEXT MINING TO TECHNOLOGY MANAGEMENT DOMAIN TO EXTRACT TOPICS AND TRENDS

2022-1-17
Tekin, Yaşar
Topic modeling is a widely used technique for extracting latent topics from large document collections. One of its most notable uses is the analysis of scientific fields. Applied to all articles published in a specific scientific field, it provides an overall view of topics and trends for the time period under consideration; applied to a single conference or journal, it reveals how that venue differs from global trends. The most popular topic modeling method is Latent Dirichlet Allocation (LDA). Although LDA is used in many different fields, the problems of how to optimize model parameters and how to eliminate topic instability have not yet been fully solved. This thesis consists of two main parts. 1) An empirical investigation is conducted to: a) measure the level of topic instability in ordered documents, b) search for methods to eliminate or, where that is not possible, alleviate the effects of topic instability, and c) evaluate the use of word vector representations to optimize LDA parameters. The findings are that: a) the level of instability is high even in ordered documents, b) average scores of replicated topic models can be used to alleviate the effects of topic instability, and c) the Skip-gram similarity score is an acceptable measure for optimizing LDA parameters. 2) Using the proposed method, topic modeling is applied to the Technology Management (TM) domain. The top topics, the most studied industries, the most used methods, and surprising topics of the TM literature are identified.
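The two ideas named in the abstract, scoring topics with word vectors and averaging scores over replicated models, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names, the definition of the similarity score (mean pairwise cosine similarity of a topic's top words), and the toy numbers are all assumptions made for illustration. In practice the vectors would come from a trained Skip-gram model and the scores from repeated LDA runs with different random seeds.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_similarity(top_words, vectors):
    """Mean pairwise cosine similarity of a topic's top words.

    Assumption: one plausible embedding-based score, not necessarily the
    definition used in the thesis. Words without a vector are skipped.
    """
    known = [vectors[w] for w in top_words if w in vectors]
    pairs = list(combinations(known, 2))
    return mean(cosine(u, v) for u, v in pairs) if pairs else 0.0


def averaged_scores(scores_by_setting):
    """Average the scores of replicated models for each parameter setting."""
    return {setting: mean(scores) for setting, scores in scores_by_setting.items()}


def best_setting(scores_by_setting):
    """Return the parameter setting with the highest average score."""
    averages = averaged_scores(scores_by_setting)
    return max(averages, key=averages.get)


if __name__ == "__main__":
    # Toy scores (hypothetical): number of topics -> one score per replicated run.
    toy_runs = {5: [0.2, 0.4], 10: [0.5, 0.7], 20: [0.9, 0.1]}
    print(averaged_scores(toy_runs))
    print(best_setting(toy_runs))
```

Averaging over replications means a single lucky or unlucky run, such as the 0.9 and 0.1 pair in the toy data, does not decide the parameter choice on its own.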

Suggestions

Comparison of feature-based and image registration-based retrieval of image data using multidimensional data access methods
Arslan, Serdar; Yazıcı, Adnan; Sacan, Ahmet; Toroslu, İsmail Hakkı; Acar, Esra (Elsevier BV, 2013-07-01)
In information retrieval, efficient similarity search in multimedia collections is a critical task. In this paper, we present a rigorous comparison of three different approaches to the image retrieval problem, including cluster-based indexing, distance-based indexing, and multidimensional scaling methods. The time and accuracy trade-offs for each of these methods are demonstrated on three different image data sets. Similarity of images is obtained either by a feature-based similarity measure using four MPEG-...
Gibbs Sampling in Inference of Copula Gaussian Graphical Model Adapted to Biological Networks
Purutçuoğlu Gazi, Vilda (2017-09-01)
Markov chain Monte Carlo methods (MCMC) are iterative algorithms that are used in many Bayesian simulation studies, where the inference cannot be easily obtained directly through the defined model. Reversible jump MCMC methods belong to a special type of MCMC methods, in which the dimension of parameters can change in each iteration. In this study, we suggest Gibbs sampling in place of RJMCMC, to decrease the computational demand of the calculation of high dimensional systems. We evaluate the performance of...
Abstract or Full-text in Topic Modeling? Konu Modellemede Özet mi Tam Metin mi?
Tekin, Yaşar; Coşar, Ahmet (2022-01-01)
Topic modeling is a text mining technique used for automatic extraction of topics addressed in document collections. Although there are different topic models proposed by researchers, the most preferred one is Latent Dirichlet Allocation (LDA). Despite such widespread use, uncertainties about LDA have not been fully resolved yet. In this study, the effect of using abstracts or full-text articles on LDA model parameters is investigated. For this purpose, LDA parameters are optimized on abstracts and full-tex...
Topic-centric querying of web information resources
Altıngövde, İsmail Sengör; Ulusoy, O; Ozsoyoglu, G; Ozsoyoglu, ZM (2001-01-01)
This paper deals with the problem of modeling web information resources using expert knowledge and personalized user information, and querying them in terms of topics and topic relationships. We propose a model for web information resources, and a query language SQL-TC (Topic-Centric SQL) to query the model. The model is composed of web-based information resources (XML or HTML documents on the web), expert advice repositories (domain-expert-specified metadata for information resources), and personalized inf...
Comparison of multidimensional data access methods for feature-based image retrieval
Arslan, Serdar; Saçan, Ahmet; Açar, Esra; Toroslu, İsmail Hakkı; Yazıcı, Adnan (2010-11-18)
Within the scope of information retrieval, efficient similarity search in large document or multimedia collections is a critical task. In this paper, we present a rigorous comparison of three different approaches to the image retrieval problem, including cluster-based indexing, distance-based indexing, and multidimensional scaling methods. The time and accuracy trade-offs for each of these methods are demonstrated on a large Corel image database. Similarity of images is obtained via a feature-based similarity...
Citation Formats
Y. Tekin, “APPLICATION OF TEXT MINING TO TECHNOLOGY MANAGEMENT DOMAIN TO EXTRACT TOPICS AND TRENDS,” Ph.D. - Doctoral Program, Middle East Technical University, 2022.