Topic and trend detection in text collections using latent Dirichlet allocation

2009-01-01
Bolelli, Levent
Ertekin Bolelli, Şeyda
Giles, C Lee
Algorithms that automatically mine distinct topics in document collections have become increasingly important due to their applications in many fields and the extensive growth of the number of documents in various domains. In this paper, we propose a generative model based on latent Dirichlet allocation that integrates the temporal ordering of the documents into the generative process in an iterative fashion. The document collection is divided into time segments, where the topics discovered in each segment are propagated to influence topic discovery in the subsequent time segments. Our experimental results on a collection of academic papers from the CiteSeer repository show that the segmented topic model can effectively detect distinct topics and their evolution over time.
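
The segment-by-segment idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how topics are propagated between segments, so the mechanism here is an assumption: the topic-word counts learned in one segment are scaled by a hypothetical `carry` factor and added to the Dirichlet prior of the next segment. The collapsed Gibbs sampler and the names `gibbs_lda`, `segmented_lda`, `alpha`, `beta`, and `carry` are illustrative choices, not the authors' implementation.

```python
import random


def gibbs_lda(docs, vocab_size, num_topics, prior, alpha=0.1, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA on a single time segment.

    docs  : list of documents, each a list of integer word ids
    prior : num_topics x vocab_size matrix of topic-word pseudo-counts
            (assumption: this is how earlier segments influence this one)
    Returns the topic-word count matrix learned from this segment.
    """
    rng = random.Random(seed)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    doc_topic = [[0] * num_topics for _ in docs]
    prior_total = [sum(row) for row in prior]

    # Random initial topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            topic_word[z][w] += 1
            topic_total[z] += 1
            doc_topic[d][z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                doc_topic[d][z] -= 1
                # Conditional distribution over topics for this token.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + prior[k][w])
                    / (topic_total[k] + prior_total[k])
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                topic_word[z][w] += 1
                topic_total[z] += 1
                doc_topic[d][z] += 1
    return topic_word


def segmented_lda(segments, vocab_size, num_topics, beta=0.01, carry=0.5, **kw):
    """Run LDA over time-ordered segments, passing topics forward.

    Assumption: propagation means adding `carry` times the previous
    segment's topic-word counts to a symmetric base prior `beta`.
    Returns one topic-word count matrix per segment.
    """
    prior = [[beta] * vocab_size for _ in range(num_topics)]
    history = []
    for docs in segments:
        counts = gibbs_lda(docs, vocab_size, num_topics, prior, **kw)
        history.append(counts)
        prior = [[beta + carry * c for c in row] for row in counts]
    return history
```

Calling `segmented_lda(segments, vocab_size, num_topics)` with `segments` ordered by time yields one topic-word matrix per segment; comparing row `k` across segments gives a rough view of how topic `k` drifts over time under these assumptions.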

Suggestions

Topic and Trend Detection in Text Collections Using Latent Dirichlet Allocation
Bolelli, Levent; Ertekin Bolelli, Şeyda; Giles, C. Lee (2009-01-01)
Algorithms that enable the process of automatically mining distinct topics in document collections have become increasingly important due to their applications in many fields and the extensive growth of the number of documents in various domains. In this paper, we propose a generative model based on latent Dirichlet allocation that integrates the temporal ordering of the documents into the generative process in an iterative fashion. The document collection is divided into time segments where the discover...
Concept discovery on relational databases: New techniques for search space pruning and rule quality improvement
Kavurucu, Yusuf; Karagöz, Pınar; Toroslu, İsmail Hakkı (Elsevier BV, 2010-12-01)
Multi-relational data mining has become popular due to the limitations of propositional problem definition in structured domains and the tendency of storing data in relational databases. Several relational knowledge discovery systems have been developed employing various search strategies, heuristics, language pattern limitations and hypothesis evaluation criteria, in order to cope with intractably large search space and to be able to generate high-quality patterns. In this work, we introduce an ILP-based c...
APPLICATION OF TEXT MINING TO TECHNOLOGY MANAGEMENT DOMAIN TO EXTRACT TOPICS AND TRENDS
Tekin, Yaşar; Karagöz, Pınar; Department of Science and Technology Policy Studies (2022-01-17)
Topic modeling is a widely used technique to extract latent topics from large document collections. One of the most remarkable uses of it is its application to scientific fields. If topic modeling is applied to all articles published in a specific scientific field, it provides an overall view of topics and trends for the time period under consideration. If it is applied to a single conference or journal, it reveals differences from global trends. The most popular method used for topic modeling is Latent Dir...
CLUSTER STABILITY ESTIMATION BASED ON A MINIMAL SPANNING TREES APPROACH
Volkovich, Zeev (Vladimir); Barzily, Zeev; Weber, Gerhard Wilhelm; Toledano-Kitai, Dvora (2009-06-03)
Among the areas of data and text mining which are employed today in science, economy and technology, clustering theory serves as a preprocessing step in data analysis. However, there are many open questions still waiting for a theoretical and practical treatment, e.g., the problem of determining the true number of clusters has not been satisfactorily solved. In the current paper, this problem is addressed by the cluster stability approach. For several possible numbers of clusters we estimate the stabil...
Confidence-based concept discovery in relational databases
Kavurucu, Yusuf; Karagöz, Pınar; Toroslu, İsmail Hakkı (2009-11-16)
Multi-relational data mining has become popular due to the limitations of propositional problem definition in structured domains and the tendency of storing data in relational databases. Several relational knowledge discovery systems have been developed employing various search strategies, heuristics, language pattern limitations and hypothesis evaluation criteria, in order to cope with intractably large search space and to be able to generate high-quality patterns. In this work, we improve an ILP-based con...
Citation Formats
L. Bolelli, Ş. Ertekin Bolelli, and C. L. Giles, Topic and trend detection in text collections using latent Dirichlet allocation. 2009, p. 780.