A Database query based solution for chemical compound and drug name recognition

2014
Ata, Çağlar
Searching for structured information in unstructured free text is one of the most difficult challenges in computer science. Relevant information must be retrieved from documents both accurately and quickly. Although numerous studies on document searching have been published, only a few of them specifically target chemical compound and drug names. Chemical compound and drug names have specific morphological properties. These properties have to be examined before developing automatic text searching methods, and they should also be integrated into chemical compound and drug name retrieval systems. In this thesis, we focus on the named entity recognition problem and propose a new method for chemical compound and drug name recognition that uses queries on a highly domain-specific database. The PubChem Power User Gateway (PUG) system is used as the main database for this domain to demonstrate the method. The grammar and morphological properties of chemical compound and drug names are used as the basis for constructing the model. These features are examined in depth and used to optimize the queries and to increase recall together with precision in finding relevant chemical compound and drug names in documents. The proposed method also presents a unique chemical compound and drug name tokenizer designed specifically for tokenizing chemical words in an article. The proposed method is applied to a significant number of documents containing chemical compound and drug names. The results of our proposed method are compared against state-of-the-art methods that target the same problem.
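The general idea of the abstract (tokenize the text, filter candidates by morphology, then confirm each candidate with a database query) can be illustrated with a minimal sketch. This is not the thesis's implementation: the token pattern, the suffix hints, and the function names `tokenize`, `looks_chemical`, `pug_rest_url`, and `recognize` are all assumptions made for illustration. The URL shape follows PubChem's PUG REST name-to-CID lookup, which is assumed here as a stand-in for the PUG system named in the abstract. The database lookup is passed in as a callable so the sketch runs without network access.

```python
import re
from typing import Callable, List
from urllib.parse import quote

# Illustrative token pattern (an assumption, not the thesis's tokenizer):
# digits, commas, hyphens and primes are kept inside a token so that a
# name such as "2,4-dinitrophenol" survives as one token.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:[,\-'][A-Za-z0-9]+)*")

# Illustrative suffix hints (an assumption); a real system would rely on
# a much richer set of morphological features.
SUFFIX_HINTS = ("ine", "ol", "ide", "ate", "in")


def tokenize(text: str) -> List[str]:
    """Split free text into tokens without breaking chemical-style names."""
    return TOKEN_RE.findall(text)


def looks_chemical(token: str) -> bool:
    """Cheap morphological pre-filter that limits the number of queries."""
    has_locant = any(ch.isdigit() for ch in token) or "-" in token
    has_suffix = len(token) > 3 and token.lower().endswith(SUFFIX_HINTS)
    return has_locant or has_suffix


def pug_rest_url(name: str) -> str:
    """Build a PUG REST name-to-CID lookup URL (assumed query form)."""
    base = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
    return f"{base}/compound/name/{quote(name, safe='')}/cids/JSON"


def recognize(text: str, is_known: Callable[[str], bool]) -> List[str]:
    """Return candidate tokens that the database lookup confirms."""
    return [t for t in tokenize(text) if looks_chemical(t) and is_known(t)]
```

In use, `is_known` would wrap an HTTP request to the URL returned by `pug_rest_url` and report whether any compound identifier came back; for offline experiments, a lookup against a small local set of names serves the same role.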

Suggestions

A new hybrid multi-relational data mining technique
Toprak, Seda Dağlar; Toroslu, İ. Hakkı; Department of Computer Engineering (2005)
Multi-relational learning has become popular due to the limitations of propositional problem definition in structured domains and the tendency of storing data in relational databases. As patterns involve multiple relations, the search space of possible hypotheses becomes intractably complex. Many relational knowledge discovery systems have been developed employing various search strategies, search heuristics and pattern language limitations in order to cope with the complexity of hypothesis space. In this w...
Improving the efficiency of distributed information retrieval using hybrid index partitioning
Hafızoğlu, Fatih; Altıngövde, İsmail Sengör; Department of Computer Engineering (2018)
Selective search with traditional partitioning has advantages over exhaustive search in terms of total query cost. However, by its nature, it can often suffer from query latency and load imbalance. To overcome these issues, we proposed a new partitioning method in this thesis, namely Hybrid partitioning. Our studies show that it is possible to obtain significant savings in query latency with this new partitioning methodology. In addition to that, query processing with Hybrid partitioning...
A transcoding robust data hiding method for image communication applications
Candan, Çağatay (2005-09-14)
We present a data embedding method for image communication applications. Our goal is to implement novel multimedia applications such as multi-language captions, interactive programming and title specific features over the existing image communication channel. To this aim, we present a data embedding method for JPEG images which has the desired degree of robustness to transcoding or bitrate adjustments that may take place in the communication channel. The described system is designed for JPEG images but can ...
A generalized expert system for database design
Doğaç, Asuman; Yürüten, Betigül; Spaccapietra, Stefano (Institute of Electrical and Electronics Engineers (IEEE), 1989-4)
Generalized Expert System for Database Design (GESDD) is a compound expert system made up of two parts: (1) an expert system for generating methodologies for database design, called ESGM; and (2) an expert system for database design, called ESDD. ESGM provides a tool for the database design expert to specify different design methodologies or to modify existing ones. The database designer uses ESDD in conjunction with one of these methodologies to design a database starting from the requirement specification...
An interdisciplinary heuristic evaluation method for universal building design
AFACAN, YASEMİN; Erbuğ, Çiğdem (Elsevier BV, 2009-07-01)
This study highlights how heuristic evaluation as a usability evaluation method can feed into current building design practice to conform to universal design principles. It provides a definition of universal usability that is applicable to an architectural design context. It takes the seven universal design principles as a set of heuristics and applies an iterative sequence of heuristic evaluation in a shopping mall, aiming to achieve a cost-effective evaluation process. The evaluation was composed of three...
Citation Formats
Ç. Ata, “A Database query based solution for chemical compound and drug name recognition,” M.S. - Master of Science, Middle East Technical University, 2014.