Topic and Trend Detection in Text Collections Using Latent Dirichlet Allocation

2009-01-01
Bolelli, Levent
Ertekin Bolelli, Şeyda
Giles, C. Lee
Algorithms that enable the process of automatically mining distinct topics in document collections have become increasingly important due to their applications in many fields and the extensive growth of the number of documents in various domains. In this paper, we propose a generative model based on latent Dirichlet allocation that integrates the temporal ordering of the documents into the generative process in an iterative fashion. The document collection is divided into time segments, where the topics discovered in each segment are propagated to influence the topic discovery in the subsequent time segments. Our experimental results on a collection of academic papers from the CiteSeer repository show that the segmented topic model can effectively detect distinct topics and their evolution over time.
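
The abstract does not say how topics are carried from one time segment to the next, so the following is only a minimal sketch under a stated assumption: the topic-word counts learned in one segment are scaled and added to the Dirichlet prior over words for the next segment. The function names `gibbs_lda` and `segmented_lda` and the parameters `carry`, `alpha`, and `beta` are hypothetical and are not taken from the paper. The sampler is a plain collapsed Gibbs sampler for LDA, written for illustration rather than speed.

```python
import numpy as np

def gibbs_lda(docs, vocab_size, n_topics, word_prior, alpha=0.1, n_iter=50, seed=0):
    """Collapsed Gibbs sampling for LDA with a topic-specific word prior.

    docs: list of documents, each a list of integer word ids.
    word_prior: array (n_topics, vocab_size) of positive pseudo-counts.
    Returns the topic-word count matrix after sampling.
    """
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), n_topics))
    topic_word = np.zeros((n_topics, vocab_size))
    topic_total = np.zeros(n_topics)
    prior_total = word_prior.sum(axis=1)

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z = rng.integers(n_topics, size=len(doc))
        assignments.append(z)
        for w, k in zip(doc, z):
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_total[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            z = assignments[d]
            for i, w in enumerate(doc):
                k = z[i]
                # Remove the token's current assignment from the counts.
                doc_topic[d, k] -= 1
                topic_word[k, w] -= 1
                topic_total[k] -= 1
                # Conditional distribution over topics for this token.
                p = (doc_topic[d] + alpha) * (topic_word[:, w] + word_prior[:, w])
                p = p / (topic_total + prior_total)
                k = rng.choice(n_topics, p=p / p.sum())
                z[i] = k
                doc_topic[d, k] += 1
                topic_word[k, w] += 1
                topic_total[k] += 1
    return topic_word

def segmented_lda(segments, vocab_size, n_topics, beta=0.01, carry=0.5, **kwargs):
    """Run LDA per time segment, passing topics forward through the prior.

    ASSUMPTION (not from the paper): the next segment's word prior is
    beta + carry * (topic-word counts of the previous segment).
    """
    prior = np.full((n_topics, vocab_size), beta)
    results = []
    for docs in segments:  # segments must be in temporal order
        counts = gibbs_lda(docs, vocab_size, n_topics, prior, **kwargs)
        results.append(counts)
        prior = beta + carry * counts
    return results
```

Calling `segmented_lda(segments, vocab_size=V, n_topics=K)` with `segments` as a time-ordered list of document lists returns one topic-word count matrix per segment. Because topic `k` in one segment seeds topic `k` in the next, comparing row `k` across segments gives a rough view of how that topic's vocabulary shifts over time.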

Suggestions

Topic and trend detection in text collections using latent dirichlet allocation
Bolelli, Levent; Ertekin Bolelli, Şeyda; Giles, C. Lee (Springer, 2009-01-01)
Algorithms that enable the process of automatically mining distinct topics in document collections have become increasingly important due to their applications in many fields and the extensive growth of the number of documents in various domains. In this paper, we propose a generative model based on latent Dirichlet allocation that integrates the temporal ordering of the documents into the generative process in an iterative fashion. The document collection is divided into time segments where the discovered ...
APPLICATION OF TEXT MINING TO TECHNOLOGY MANAGEMENT DOMAIN TO EXTRACT TOPICS AND TRENDS
Tekin, Yaşar; Karagöz, Pınar; Department of Science and Technology Policy Studies (2022-1-17)
Topic modeling is a widely used technique to extract latent topics from large document collections. One of the most remarkable uses of it is its application to scientific fields. If topic modeling is applied to all articles published in a specific scientific field, it provides an overall view of topics and trends for the time period under consideration. If it is applied to a single conference or journal, it reveals differences from global trends. The most popular method used for topic modeling is Latent Dir...
CLUSTER STABILITY ESTIMATION BASED ON A MINIMAL SPANNING TREES APPROACH
Volkovich, Zeev (Vladimir); Barzily, Zeev; Weber, Gerhard Wilhelm; Toledano-Kitai, Dvora (2009-06-03)
Among the areas of data and text mining which are employed today in science, economy and technology, clustering theory serves as a preprocessing step in the data analyzing. However, there are many open questions still waiting for a theoretical and practical treatment, e.g., the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters we estimate the stabil...
Concept discovery on relational databases: New techniques for search space pruning and rule quality improvement
Kavurucu, Yusuf; Karagöz, Pınar; Toroslu, İsmail Hakkı (Elsevier BV, 2010-12-01)
Multi-relational data mining has become popular due to the limitations of propositional problem definition in structured domains and the tendency of storing data in relational databases. Several relational knowledge discovery systems have been developed employing various search strategies, heuristics, language pattern limitations and hypothesis evaluation criteria, in order to cope with intractably large search space and to be able to generate high-quality patterns. In this work, we introduce an ILP-based c...
Clustering of manifold-modeled data based on tangent space variations
Gökdoğan, Gökhan; Vural, Elif; Department of Electrical and Electronics Engineering (2017)
An important research topic of the recent years has been to understand and analyze data collections for clustering and classification applications. In many data analysis problems, the data sets at hand have an intrinsically low-dimensional structure and admit a manifold model. Most state-of-the-art clustering methods developed for data of non-linear and low-dimensional structure are based on local linearity assumptions. However, clustering algorithms based on locally linear representations can tolerate diff...
Citation Formats
L. Bolelli, Ş. Ertekin Bolelli, and C. L. Giles, “Topic and Trend Detection in Text Collections Using Latent Dirichlet Allocation,” 2009, vol. 5478, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/48299.