Learning to Generate Unambiguous Spatial Referring Expressions for Real-World Environments

Dagan, Fethiye Irmak
Kalkan, Sinan
Leite, Iolanda
Referring to objects in a natural and unambiguous manner is crucial for effective human-robot interaction. Previous research on learning-based referring expressions has focused primarily on comprehension tasks, while generating referring expressions is still mostly limited to rule-based methods. In this work, we propose a two-stage approach that relies on deep learning for estimating spatial relations to describe an object naturally and unambiguously with a referring expression. We compare our method to the state-of-the-art algorithm in ambiguous environments (e.g., environments that include very similar objects with similar relationships). We show that our method generates referring expressions that people find to be more accurate (approximately 30% better) and would prefer to use (approximately 32% more often).


An Eye Tracking Analysis of Conversational Violations in Dyadic and Collaborative Interaction
Cagiltay, Bengisu; Acartürk, Cengiz (2022-01-01)
Linguistic principles are crucial in maintaining reliable and transparent communication for dyadic interactions. However, violating these principles might result in unwieldy and problematic communications. We use gaze as a medium to explore how visual attention and task performance change when conversational violations occur. We conducted an eye-tracking study (N = 17) measuring changes in visual patterns in response to social communication errors, specifically violations of Grice's maxims. Our study investig...
Unsupervised Learning of Affordance Relations on a Humanoid Robot
Akgun, Baris; Dag, Nilguen; Bilal, Tahir; Atil, Ilkay; Şahin, Erol (2009-09-16)
In this paper, we study how the concepts learned by a robot can be linked to verbal concepts that humans use in language. Specifically, we develop a simple tapping behaviour on the iCub humanoid robot simulator and allow the robot to interact with a set of objects of different types and sizes to learn affordance relations in its environment. The robot records its perception, obtained from a range camera, as a feature vector, before and after applying tapping on an object. We compute effect features by subtr...
Automatic sense prediction of implicit discourse relations in Turkish
Kurfalı, Murathan; Zeyrek Bozşahin, Deniz; Department of Cognitive Sciences (2016)
In discourse parsing, sense prediction of implicit discourse relations poses the most significant challenge. The thesis aims to develop a supervised system to predict the sense of implicit discourse relations in the Turkish Discourse Bank (TDB). To accomplish that goal, the discourse-level annotations obtained from TDB are used. TDB follows the PDTB-2’s sense hierarchy, and for all experiments within the current study, only CLASS senses are considered. As the primary experiment, the classifiers ...
Using constrained intuitionistic linear logic for hybrid robotic planning problems
Saranlı, Uluç (2007-04-14)
Synthesis of robot behaviors towards nontrivial goals often requires reasoning about both discrete and continuous aspects of the underlying domain. Existing approaches in building automated tools for such synthesis problems attempt to augment methods from either discrete planning or continuous control with hybrid elements, but largely fail to ensure a uniform treatment of both aspects of the domain. In this paper, we present a new formalism, Constrained Intuitionistic Linear Logic (CILL), merging continuous...
Novel approach to emotion recognition in voice: a convolutional neural network approach and grad-cam generation
Canpolat, Salih Fıra; Zeyrek Bozşahin, Deniz; Department of Cognitive Sciences (2019)
Emotion is one of the essential components in human and human-machine interaction. One of the most common communication channels is sound. Understanding the underlying mechanisms of emotion recognition in the sound signal is an essential step in improving both types of interaction. For this purpose, we developed an emotion recognition model and a Turkish-specific database, referred to as the Turkish Emotion-Voice (TurEV) database. The database contains one-word-vocalizations of four emotion types; angr...
Citation Formats
F. I. Dagan, S. Kalkan, and I. Leite, “Learning to Generate Unambiguous Spatial Referring Expressions for Real-World Environments,” 2019, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/49001.