Abstractive text summarization on WikiHow dataset using sentence embeddings

2019
Tozyılmaz, Bahattin
Summarization is a well-known natural language processing task that we rely on in our day-to-day lives. Recent research in the field has applied neural networks and word embeddings. Using the WikiHow dataset, we show that an abstractive summarization model built on sentence embeddings can match the performance of a similar model. We also show that sentence embeddings let us reduce the input data size without a large loss in performance.

Suggestions

Diacritics correction in Turkish with context-aware sequence to sequence modeling
Köksal, Asiye Tuba; Bozal, Özge; Özge, Umut (2022-1-01)
Digital texts in many languages contain missing or misused diacritics, which makes it hard for natural language processing applications to disambiguate the meaning of words. Therefore, diacritics restoration is a crucial step in natural language processing applications for many languages. In this study we approach this problem as a bidirectional transformation between diacritical letters and their ASCII counterparts, rather than unidirectional diacritic restoration. We propose a context-aware character-lev...
Idioms as multi-word expressions in Turkish
Güven, Arzu Burcu; Bozşahin, Hüseyin Cem; Department of Cognitive Sciences (2020-10)
Idioms pose several challenges for both Natural Language Processing (NLP) and linguistic analysis. A better understanding of idioms will yield valuable insights about natural language as well as the way it is processed. The relevance of idioms, along with the fact that Turkish is a rather unexplored language from this perspective, motivates us to work on Turkish idioms. Here, we aim to present a grammatical study of Turkish idioms that were selected in accordance with distributional models.
Evaluating cross-lingual textual similarity on dictionary alignment problem
Sever, Yiğit (Springer Science and Business Media LLC, 2020-06-01)
Bilingual or even polylingual word embeddings created many possibilities for tasks involving multiple languages. While some tasks like cross-lingual information retrieval aim to satisfy users' multilingual information needs, some enable transferring valuable information from resource-rich languages to resource-poor ones. In any case, it is important to build and evaluate methods that operate in a cross-lingual setting. In this paper, Wordnet definitions in 7 different languages are used to create a semantic...
On-demand conversation customization for services in large smart environments
Elgedawy, I. (IBM, 2011-01-01)
Services in large smart environments, as defined in this paper, are "aware" of their users' contexts and goals and are able to automatically interact with one another in order to achieve these goals. Unfortunately, interactions between services (i.e., service conversations) are not necessarily compatible, as services could have different interfaces (i.e., signature incompatibilities), as well as different logic for message ordering (i.e., protocol incompatibilities). Such conversation incompatibilities crea...
Message Scheduling for the FlexRay Protocol: The Static Segment
Schmidt, Klaus Verner; Schmidt, Şenan Ece (Institute of Electrical and Electronics Engineers (IEEE), 2009-06-01)
In recent years, time-triggered communication protocols have been developed to support time-critical applications for in-vehicle communication. In this respect, the FlexRay protocol is likely to become the de facto standard. In this paper, we investigate the scheduling problem of periodic signals in the static segment of FlexRay. We identify and solve two subproblems and introduce associated performance metrics: 1) The signals have to be packed into equal-size messages to obey the restrictions of the FlexRa...
Citation Formats
B. Tozyılmaz, “Abstractive text summarization on WikiHow dataset using sentence embeddings,” M.S. thesis, Graduate School of Natural and Applied Sciences, Computer Engineering, Middle East Technical University, 2019.