Database Content Exploration and Exploratory Analysis of User Queries
Content providers, such as enterprises and organizations that publish their content on the Internet, aim to make their content visible and easily accessible to users. The vast amount of data contained in databases impedes their efforts, as users often find it challenging to navigate through the available data and find the items that best suit their needs. It is therefore necessary for content providers to motivate users to explore the available data and to assist them in finding items that are interesting to them. State-of-the-art approaches such as top-k queries are not appropriate for data exploration, as they require users to be aware of the database structure and the content they are exploring. In this thesis, we study the problem of enhancing the visibility of database content through exploratory search and analysis. We propose exploratory algorithms that return to the user a small number of results, which at the same time provide a wide overview of the available content. In addition, we present algorithms that identify items that are appealing to users and that can be exploited to offer users insight into the available items and to motivate them to explore the database. In particular, the main contributions of the thesis are: We develop a framework for organizing and summarizing keyword search results based on their textual content and temporal data. We introduce a new type of query, the eXploratory Top-k Join (XTJk) query, which creates object combinations that are better suited to user preferences than single objects, and we present algorithms for the efficient processing of XTJk queries. We introduce the continuous influential query, which returns objects that are continuously attractive to a large number of users for long periods, and we present algorithms for the efficient retrieval of continuous influential objects.
We model the diversity of database objects based on user preferences, and we propose efficient algorithms for selecting products that are attractive to a wide range of users with diverse preferences. We describe the Best-terms problem, which is the problem of increasing the rank of a spatio-textual object by enhancing its textual description. We show that the problem is NP-hard, and we present approximate algorithms that retrieve high-quality results. The proposed approaches have been evaluated through extensive experiments, conducted on both synthetic and real datasets, which demonstrate the efficiency of the proposed methods.