Keyword Search on Spatial Network Databases: Road network indexing for efficient query processing
MetadataShow full item record
Given a spatial location and a set of keywords, a spatial keyword query locates spatio-textual objects based on both the location of the objects, and the textual relevance of the query keywords to the description of the objects. Spatial keyword queries can be used to answer challenging questions such as finding the nearest spatio-textual object relevant for the query keywords "restaurant sushi".Our focus in this project is on a new type of spatial keyword query that takes a road network into account during query processing. These queries are based on the fact that the distance between two objects in the real-world are constrained by the pre-defined paths that comprise the road network. Different from traditional spatial keyword queries that employ the Euclidean distance, the spatial keyword query on road networks assumes the shortest path between the query location and the objects. Unfortunately, no approach currently exists that supports processing of spatial keyword queries on road networks.In this thesis we address the challenging problem of locating spatio-textual objects in a road network given a spatial location and a set of keywords. We first propose a baseline framework that combines existing state-of-the-art approaches to support processing of keyword-based spatial queries such as range and k-nearest neighbour on road networks. Then, we present a novel framework termed Road Network Indexing (RNI) that permits efficient processing of such queries by indexing the spatio-textual objects in each road segment using inverted files.Moreover, we present algorithms to evaluate keyword k-nearest neighbour and keyword range queries on both the baseline and the RNI framework.Finally, we show through an experimental evaluation using real-world datasets, that our RNI framework performs nearest neighbour queries on road networks in around one order of magnitude faster than the baseline approach in terms of response time and I/O.