Identifying textual personal information using bidirectional LSTM networks

2018-07-09
Driven by developments in big data research, data-driven approaches built on data collected from individuals are improving everyday life. Before such an approach is developed, it is important to remove personal information from the data, since personal information left in the data jeopardizes people's privacy and may harm the individuals concerned. Especially in the health sciences, identifying personal information in the collected data is a difficult task, as most of the data collected in hospitals are in plain text format. In this work, a method for automatically identifying words that contain personal information is proposed. The proposed method uses natural language processing techniques and bidirectional long short-term memory (LSTM) networks. The method was developed using a de-identification challenge dataset composed of the discharge summaries of 889 patients. It identifies words that contain personal information from their surrounding words, without using dictionaries such as name lists or city lists. Tests at the end of the study show that the proposed method can identify words containing personal information with an accuracy of 99.43%.
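
The framing described in the abstract can be illustrated with a minimal sketch: each word is paired with its surrounding words and given a binary label, and predictions are scored per word. Everything here is an illustrative assumption rather than a detail from the paper: the function names, the `<pad>` token, the window size, the example sentence, its labels, and the reading of the reported accuracy as a per-word score. A fixed window only stands in for the left and right context that a bidirectional LSTM would read in sequence.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding token for positions outside the sentence


def context_windows(tokens: List[str], size: int = 2) -> List[Tuple[List[str], str, List[str]]]:
    """Pair every word with `size` words to its left and to its right."""
    padded = [PAD] * size + tokens + [PAD] * size
    windows = []
    for i, word in enumerate(tokens):
        center = i + size  # position of the word inside the padded list
        left = padded[center - size:center]
        right = padded[center + 1:center + 1 + size]
        windows.append((left, word, right))
    return windows


def token_accuracy(gold: List[int], predicted: List[int]) -> float:
    """Fraction of words whose predicted label matches the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("label sequences must have the same length")
    if not gold:
        raise ValueError("label sequences must not be empty")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Made-up example: 1 marks a word carrying personal information, 0 otherwise.
tokens = ["Patient", "John", "Doe", "was", "admitted", "to", "Mercy", "Hospital"]
gold = [0, 1, 1, 0, 0, 0, 1, 1]
predicted = [0, 1, 1, 0, 0, 0, 0, 1]  # hypothetical model output with one miss

windows = context_windows(tokens, size=2)
accuracy = token_accuracy(gold, predicted)
```

In this sketch the word "John" is represented only by its neighbors (`Patient` on the left, `Doe` and `was` on the right), so no name list is consulted at any point.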

Suggestions

Identifying textual personal information with artificial neural networks
Demir, Memduh Çağrı; Ertekin Bolelli, Şeyda; Department of Computer Engineering (2019)
Learning solutions to many natural language processing problems requires language-specific labeled data. However, both compiling a new dataset in a new language and translating an existing dataset into another language require human expert effort that cannot be automated. To learn a solution in a new target language in an automated manner without any extra data, we focus on the known problem of dialogue act classification and propose two solutions that combine existing dialo...
Modeling Human Activities via Long Short Term Memory Networks
Solmaz, Berkan; Karaman, Kaan (2019-01-01)
The presence of rapidly increasing visual data adds importance to the computer vision studies for automatic analysis and interpretation of content. Although the nervous and sensory systems in humans easily perform the processes such as understanding and recognizing activities that take place on a stage, these processes are among the most challenging research topics of computer vision. The activities vary according to the number of participants. For instance, a single person can perform activities consisting...
Using tag similarity in SVD-based recommendation systems
Osmanli, Osman Nuri; Toroslu, İsmail Hakkı (2011-12-01)
Data analysis has become a very important area for both companies and researchers as a consequence of the technological developments in recent years. Companies are trying to increase their profit by analyzing the existing data about their customers and making decisions for the future according to the results of these analyses. Parallel to the need of companies, researchers are investigating different methodologies to analyze data more accurately with high performance. In this paper, we adopted free-formatte...
Activity Learning from Lifelogging Images
Belli, Kader; Akbaş, Emre; Yazıcı, Adnan (2019-01-01)
The analytics of lifelogging has generated great interest for data scientists because big and multi-dimensional data are generated as a result of lifelogging activities. In this paper, the NTCIR Lifelog dataset is used to learn activities from an image point of view. Minute definitions are classified into activity classes using images and annotations, which serve as a basis for various classification techniques, namely SVMs and convolutional neural network structures (CNN), for learning activities. The perf...
Using data analytics for collaboration patterns in distributed software team simulations
Dafoulas, Georgios A.; Serce, Fatma C.; Swigger, Kathleen; Brazile, Robert; Alpaslan, Ferda Nur; Milewski, Allen (2016-08-05)
This paper discusses how previous work on global software development learning teams is extended with the introduction of data analytics. The work is based on several years of studying student teams working in distributed software team simulations. The scope of this paper is twofold. First it demonstrates how data analytics can be used for the analysis of collaboration between members of distributed software teams. Second it describes the development of a dashboard to be used for the visualization of variou...
Citation Formats
Ş. Ertekin Bolelli, “Identifying textual personal information using bidirectional LSTM networks,” 2018, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/37425.