SemSearch 2010 : International Semantic Search workshop at WWW 2010


When Apr 26, 2010 - Apr 26, 2010
Where Raleigh, USA
Submission Deadline Mar 6, 2010
Notification Due Mar 28, 2010
Final Version Due Apr 6, 2010
Call For Papers

In recent years we have witnessed tremendous interest in, and substantial economic exploitation of, search technologies, both at web and enterprise scale. However, the representation of user queries and resource content in existing search appliances still relies almost exclusively on simple syntax-based descriptions of the resource content and the information need, as in the predominant keyword-centric paradigm (i.e. keyword queries matched against bag-of-words document representations).

On the other hand, recent advances in the field of semantic technologies have resulted in tools and standards that allow for the articulation of domain knowledge in a formal manner at a high level of expressivity. At the same time, semantic repositories and reasoning engines have only now advanced to a state where querying and processing of this knowledge can scale to realistic IR scenarios.

In parallel to these developments, the past years have also seen the emergence of important results in adapting ideas from IR to the problem of search in RDF/OWL data, folksonomies, microformat collections or semantically tagged natural text. Common to these scenarios is that the search is focused not on a document collection, but on metadata (which may be linked to or embedded in textual information). Search and ranking in metadata stores is another key topic addressed by the workshop.

As such, semantic technologies are now in a state to provide significant contributions to IR problems.
In this context, several challenges arise for Semantic Search systems. These include, among others:

* How can semantic technologies be exploited to capture the information need of the user?
* How can the information need of the user be translated to expressive formal queries without requiring the user to handle a difficult query syntax?
* How can expressive resource descriptions be extracted (acquired) from documents (users)?
* How can expressive resource descriptions be stored and queried efficiently on a large scale?
* How can vague information needs and incomplete resource descriptions be handled?
* How can semantic search systems be evaluated and compared with standard IR systems?

In this context, challenges for Semantic Search research will include, among others:

* How can semantic technologies be applied to IR problems?
* How to address scalability and effectiveness of data Web search (by applying IR technologies)?
* How to allow web users to exploit the expressiveness of the semantic data on the Web? I.e. how to lower the technical barriers for users to ask complex questions and to interact with web data to obtain concrete answers for complex needs?
* And most importantly, how can this new generation of search systems that successfully exploit semantics for IR or for data Web search be evaluated and compared (with standard IR systems or semantic repositories)?

Topics of Interest

Semantic Search is defined through two main directions. The first is Semantic-driven IR, the application of semantic technologies to the IR problem. The second is Semantic Data Search, which mainly deals with the retrieval of semantic data. Main topics of interest for the envisioned workshop contributions include (but are not limited to) the following:

Semantic-driven IR

* Expressive Document Models
* Knowledge Extraction for Building Expressive Document Representation
* Matching and Ranking based on Expressive Document Representation
* Infrastructure for Semantic-driven IR

Semantic Data Search

* Crawling, Storage and Indexing of Semantic Data
* Semantic Data Search and Ranking
* Data Web Search: Search in Multi-Data-Source, Multi-Repository Scenarios
* Dealing with Vague, Incomplete and Dirty Semantic Data
* Infrastructure for Searching Semantic Data on the Web

Interaction Paradigms for Semantic Search

* Natural Language Interfaces
* Keyword-based Query Interfaces
* Hybrid Query Interfaces (A Combination of NL, Keywords, Forms, Facets, and Formal Queries)
* Visualization of Semantic Data and Expressive Document Representation on the Web

Evaluation of Semantic Search

* Evaluation Methodologies for Semantic Search
* Standard Datasets and Benchmarks for Semantic Search
* Infrastructure for Semantic Search Evaluation

Evaluation for Entity Search Track

Our ultimate goal is to develop a benchmark with which semantic search systems can be compared and analysed in a systematic fashion. Clearly, semantics can be used for different tasks (document vs. data retrieval) and can be exploited throughout the search process (for more usable query construction, for better matching and ranking, for richer results presentation, etc.). Hence, such a benchmark shall enable the study of different aspects of semantic search systems.

For this workshop, we will initially focus on the aspects of matching and ranking in the semantic data search scenario. In particular, we aim to analyze the effectiveness, efficiency and robustness of those features of semantic search systems which are ready to be applied to the Web today. Two observations motivate this focus: a large share of Web search queries issued today are about entities, i.e. they are entity search queries, and there is a large and increasing amount of semantic data about entities on the Web. The research questions we aim to tackle are:

* How well do semantic data search engines perform on the task of Entity Search on the Web?
* What are the underlying concepts and techniques that make up the differences?

For answering these questions, we provide the following guidelines and support for evaluating entity search systems:

Queries: We provide a set of queries that are focused on the task of entity search. These queries represent a sample extracted from the Yahoo Web search query log. Every query is a plain list of keywords.

Data: We provide a corpus of datasets which contain entity descriptions in the form of RDF. They represent a sample of Web data extracted from publicly available sources (selected LOD datasets such as DBPedia). Also, a large amount of entity descriptions comes from data associated with Web pages (Microformats, RDFa).

Relevance Judgement: The search systems produce lists of at most 10 entities ordered by relevance. These results have to be drawn from data in the corpus. Results will be evaluated on a three-point scale: (0) Not Relevant, (1) Relevant and (3) Perfect Match. A perfect match is a description of a resource that matches the entity to be retrieved by the query. A relevant result is a resource description that is related to the entity, i.e. the entity is contained in the description of that resource. Otherwise, a resource description is not relevant.

Evaluation Process: To participate, each system will have to run the provided queries on the corpus. We will provide a benchmarking system for participants to submit their results. The assessment of the results will be performed manually using Amazon Mechanical Turk. Based on the relevance judgements, recall, precision, f-measure and mean average precision will be computed and used as the basis for comparing the performance of search systems. The process of result submission, assessment and evaluation feedback will be supported by the benchmarking system.
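
As an illustration of the measures named above, the following is a minimal Python sketch of how precision, recall, f-measure and mean average precision could be computed from ranked result lists and graded relevance judgements. It is not the benchmarking system itself. The function names, the data layout (dictionaries keyed by query id and entity id) and the choice to count any grade of 1 or higher as relevant are assumptions made for this example; the cutoff of 10 results follows the limit stated in the call.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: entity id -> relevance grade for a single query.
Judgements = Dict[str, int]


def is_relevant(judgements: Judgements, entity: str, threshold: int = 1) -> bool:
    """Assumption: any grade >= threshold counts as relevant; unjudged means 0."""
    return judgements.get(entity, 0) >= threshold


def count_relevant(judgements: Judgements, threshold: int = 1) -> int:
    """Number of judged entities that count as relevant for this query."""
    return sum(1 for grade in judgements.values() if grade >= threshold)


def precision_recall_f1(
    ranking: List[str], judgements: Judgements, k: int = 10
) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 over the top-k results of one query."""
    top = ranking[:k]
    hits = sum(1 for entity in top if is_relevant(judgements, entity))
    total = count_relevant(judgements)
    precision = hits / len(top) if top else 0.0
    recall = hits / total if total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(ranking: List[str], judgements: Judgements, k: int = 10) -> float:
    """Mean of precision values at each rank where a relevant entity appears,
    normalised by the number of relevant entities known for the query.
    Assumes the ranking contains no duplicate entities."""
    hits = 0
    score = 0.0
    for rank, entity in enumerate(ranking[:k], start=1):
        if is_relevant(judgements, entity):
            hits += 1
            score += hits / rank
    total = count_relevant(judgements)
    return score / total if total else 0.0


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Judgements], k: int = 10
) -> float:
    """Average of per-query average precision over all queries in the run."""
    if not runs:
        return 0.0
    return sum(
        average_precision(ranking, qrels.get(query, {}), k)
        for query, ranking in runs.items()
    ) / len(runs)
```

A system's run would be passed as a mapping from query id to its ranked entity list, and the collected judgements as a mapping from query id to per-entity grades. Raising the threshold would restrict the measures to perfect matches only.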

Program Committee

