Project Ideas

Vague Spatial Descriptions in GISs

A core challenge within artificial intelligence is the so-called symbol grounding problem: how are symbolic statements of a logical or linguistic nature related to actual physical descriptions? This thesis addresses one instance of this problem. Specifically, it explores machine learning approaches to setting numerical parameters in logic programs so as to ground vague descriptions (e.g., "the car is next to the house" or "Subway is across from Åhlens"). I have developed and documented an initial approach that is both tractable and context-dependent. Your task will be to extend a prototype implementation, integrate it with PostGIS, and then evaluate its performance. During this project you will develop your GIS skills as well as your understanding of machine learning. 30 ECTS.
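To make "setting numerical parameters in logic programs" a little more concrete, here is a deliberately simple Python sketch. It is not the documented approach mentioned above: the rule form, the function names (`next_to`, `learn_threshold`, `learn_per_context`), the error-minimizing threshold search and the example judgments are all hypothetical. The vague relation "next to" is modeled as a logic-program-style rule, roughly `next_to(X, Y) :- dist(X, Y, D), D =< Theta`, and `Theta` is learned from labeled judgments, separately for each context (here, the kinds of objects involved). In the project the distances would presumably come from PostGIS, for instance via `ST_Distance`; here they are plain numbers in meters.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def next_to(distance_m: float, theta: float) -> bool:
    """Hypothetical rule: next_to(X, Y) holds when dist(X, Y) <= theta."""
    return distance_m <= theta


def learn_threshold(examples: List[Tuple[float, bool]]) -> float:
    """Return the threshold that misclassifies the fewest labeled examples.

    Only observed distances are tried as candidates; ties go to the smallest.
    """
    best_theta, best_errors = 0.0, len(examples) + 1
    for theta in sorted(d for d, _ in examples):
        errors = sum(next_to(d, theta) != label for d, label in examples)
        if errors < best_errors:
            best_theta, best_errors = theta, errors
    return best_theta


def learn_per_context(examples: List[Tuple[str, float, bool]]) -> Dict[str, float]:
    """Learn one threshold per context, so "next to" can mean different things."""
    by_context: Dict[str, List[Tuple[float, bool]]] = defaultdict(list)
    for context, distance_m, label in examples:
        by_context[context].append((distance_m, label))
    return {ctx: learn_threshold(exs) for ctx, exs in by_context.items()}


# Made-up judgments: (context, distance in meters, "is next to?")
judgments = [
    ("car-house", 2.0, True), ("car-house", 5.0, True),
    ("car-house", 12.0, False), ("car-house", 30.0, False),
    ("shop-shop", 20.0, True), ("shop-shop", 60.0, True),
    ("shop-shop", 150.0, False),
]
thresholds = learn_per_context(judgments)
print(thresholds)  # {'car-house': 5.0, 'shop-shop': 60.0}
```

Keying the threshold on the pair of object types is only a crude way to capture context dependence: with these made-up judgments, two shops 60 m apart count as "next to" each other, while a car 60 m from a house does not.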


Obtaining NL/MRL Corpora for Machine Learning of Natural Language Interfaces

Because natural language interfaces (NLIs) are so difficult to build, there has been great interest in learning such interfaces from corpora of natural language (NL) expressions paired with meaning representation language (MRL) expressions. While such approaches have shown great promise, a persistent difficulty is how to obtain large, high-quality corpora in the first place. This thesis will primarily be a literature study in which the prominent approaches to obtaining NL/MRL corpora are documented and analyzed. Following this, time permitting, you will propose, implement and evaluate your own corpus-construction tool. In this thesis you will extend your understanding of machine learning and computational linguistics. 15 ECTS or 30 ECTS.


Biasing Random MRL Expression Generators

In many applications, ranging from data mining to search, there is a need to generate random meaning representation language (MRL) expressions (e.g., first-order logic or database query languages). While a simple weighted grammar might seem to suffice, two issues complicate the matter. First, valid MRL expressions often carry variable and type restrictions, which place their specifications beyond what a context-free grammar can express. Second, we often wish to introduce systematic bias into the random generation process, based on feedback about the suitability of the generated expressions. This thesis will be of a formal nature and will survey and analyze various approaches to this problem from an algorithmic perspective. In this thesis you will develop a better understanding of formal language theory as well as machine learning. 30 ECTS.
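As a rough illustration of both issues, the following Python sketch generates random first-order formulas from a toy weighted grammar. Everything in it is hypothetical: the predicate names, the three productions (`atom`, `and`, `exists`), the depth limit and the multiplicative update rule, which is just one simple way to introduce bias. The scope list passed down the recursion enforces a restriction that a plain weighted context-free grammar cannot: an atom may only mention a variable bound by an enclosing quantifier, so every output is a closed formula. The `feedback` method scales the weights of the productions used in an expression up or down depending on whether that expression was judged suitable.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy MRL: first-order formulas over unary predicates.
PREDICATES = ["city", "river", "large"]


class BiasedGenerator:
    """Weighted random generator for closed first-order formulas."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        # One weight per production of the single nonterminal FORMULA.
        self.weights: Dict[str, float] = {"atom": 1.0, "and": 1.0, "exists": 1.0}

    def _formula(self, scope: List[str], depth: int, used: List[str]) -> str:
        # The allowed productions depend on the current variable scope,
        # which is what takes us beyond a plain context-free grammar.
        if not scope:
            choices = ["exists"]  # nothing is bound yet: must quantify first
        elif depth <= 0:
            choices = ["atom"]  # depth limit reached: force a leaf
        else:
            choices = ["atom", "and", "exists"]
        rule = self.rng.choices(choices, weights=[self.weights[c] for c in choices])[0]
        used.append(rule)  # record every production applied, for feedback
        if rule == "atom":
            # Only variables bound by an enclosing quantifier may appear.
            return f"{self.rng.choice(PREDICATES)}({self.rng.choice(scope)})"
        if rule == "and":
            left = self._formula(scope, depth - 1, used)
            right = self._formula(scope, depth - 1, used)
            return f"({left} & {right})"
        var = f"x{len(scope)}"  # fresh name: not already in the current scope
        return f"exists {var}. {self._formula(scope + [var], depth - 1, used)}"

    def generate(self, depth: int = 3) -> Tuple[str, List[str]]:
        """Return a random closed formula and the productions used to build it."""
        used: List[str] = []
        return self._formula([], depth, used), used

    def feedback(self, used: List[str], suitable: bool, rate: float = 0.5) -> None:
        """Multiplicative update; 0 < rate < 1 keeps every weight positive."""
        factor = 1.0 + rate if suitable else 1.0 - rate
        for rule in used:
            self.weights[rule] *= factor
```

A caller would loop: generate an expression, let some external judge (for example, a hypothetical check that a query returns a non-empty answer) decide whether it is suitable, and pass the verdict to `feedback`. Productions that tend to appear in useful expressions then gradually become more likely.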


Building a Speech Interface to Ultra's Bus Schedules

It would certainly be nice to get answers to spoken questions like "next bus downtown" when we are standing in the cold at Universum's bus stop. In this thesis you will determine how feasible such a location-based, speech-based interface is on current-generation mobile phones. We have collected a corpus of common transportation questions posed to Ultra and have defined and partly populated a spatio-temporal database of bus positions. You will first use C-Phrase to build a robust interface to this database that covers the corpus and then, as your major effort, integrate speech recognition and text-to-speech to support spoken access to the bus database. Once the system is finished, you will evaluate it under a range of conditions. If the results are good, it may be possible to pursue actual deployment of your system with Ultra. Along the way you will learn about spatio-temporal databases, mobile devices, NLIs and speech technology. Given the significant systems-related aspects of this project, I am open to teams of two students taking this thesis. 2 * 30 ECTS.