WePS 3 2010 : Third WePS Evaluation Workshop: Searching Information about Entities in the Web

When	Sep 20, 2010 - Sep 22, 2010
Where	Padua (Italy)
Abstract Registration Due	Jun 15, 2010
Submission Deadline	Aug 15, 2010

Categories: information retrieval, information extraction, disambiguation, web search

Call For Papers

Previous WePS campaigns focused on the people search task: the first
campaign addressed the name ambiguity problem, defining the task as
clustering the web search results for a given person name so that each
cluster corresponds to one of the people sharing that name. The second
campaign used a refined version of the evaluation metrics and added an
attribute extraction task for the web documents returned by the search
engine for a given person name.

In WePS-3 we aim to merge both problems into a single task, in which
the system must return both the documents and the attributes for each
of the different people sharing a given name. From the point of view
of evaluation this is not a trivial step: a system may correctly
extract attribute values from different URLs but then incorrectly
merge them into person profiles.

In addition, WePS-3 adds a task which considers, for the first time,
another relevant type of named entity: organizations. We will focus on
name ambiguity for organizations, which is a highly relevant problem
faced by Online Reputation Management systems. Take, for instance, the
online company Amazon. In order to trace mentions and opinions about
Amazon in web data (including news and blog feeds and input from
social networks), the system must filter out alternative senses of
“Amazon” (the South American river, the mythical nation of female
warriors, etc.). Such filtering cannot simply be done by adding
keywords to the query (e.g. “amazon online store”), because doing so
may harm recall, which is crucial for reputation management.

* Task definitions

WePS-3 will be a competitive evaluation campaign with two tasks
concerning the Web entity search problem:

** Task 1: Clustering and Attribute Extraction for Web People Search

Task 1 is related to Web People Search and focuses on person name
ambiguity and person attribute extraction on Web pages.
Given a set of web search results for a person name, the task is to
cluster the pages according to the different people sharing the name
and extract certain biographical attributes for each person (i.e., for
each cluster of documents).

Guidelines for the WePS-3 Person Name Disambiguation Task
(http://nlp.uned.es/weps/weps-3/guidelines/41-guidelines-for-the-weps-3-person-name-disambiguation-task)

Guidelines for the WePS-3 Attribute Extraction Subtask
(http://nlp.uned.es/weps/weps-3/guidelines/42-guidelines-for-the-weps-3-attribute-extraction-subtask)

** Task 2: Name ambiguity resolution for Online Reputation Management (ORM)

Task 2 is related to Online Reputation Management (ORM) for
organizations and focuses on the problem of ambiguity for organization
names and the relevance of Web data for reputation management
purposes. The motivation is to help experts in reputation management
and alert services. Name ambiguity is currently an important
bottleneck for these experts. Twitter has been chosen as the target
data source because it is a critical source for real-time reputation
management and because ambiguity resolution there is challenging:
tweets are very short and provide little context for resolving name
ambiguity.

The task is defined as follows: given a set of Twitter entries
containing an (ambiguous) company name, and given the home page of the
company, identify the entries that do not refer to the company.
Entries will be given in two languages: English and Spanish.

Guidelines for the WePS-3 On-line Reputation Management Task
(http://nlp.uned.es/weps/weps-3/guidelines/40-guidelines-for-the-weps-3-on-line-reputation-management-task)

* Participation

A team can choose to participate in both Task 1 and Task 2 or in only
one of them. In Task 1, Clustering is mandatory and Attribute
Extraction is optional (i.e., teams that perform the Attribute
Extraction subtask must also complete the Clustering subtask).
The organizers will provide annotated data for developing and training
systems (see the task guidelines at
http://nlp.uned.es/weps/weps-3/guidelines for more details). In a
second stage, an unannotated test corpus will be distributed, system
outputs will be collected, and evaluation results will be returned to
the participants. Each team can submit up to five runs for each task
(Clustering, Attribute Extraction and ORM). Every team is expected to
write a paper describing its system and discussing the evaluation
results.

The results of the evaluation campaign will be discussed in a one-day
workshop held as a CLEF 2010 Lab in Padua (Italy) on 22 or 23 September 2010.

* How do I register?

Please send an email expressing your interest to the task organizers
(weps-organizers@lsi.uned.es). State the name of your research group,
a contact e-mail and the task(s) in which you intend to participate
(Task 1 clustering only, Task 1 clustering + attribute extraction,
Task 2).

* Important Dates

Release of trial data: 15 February 2010
Release of test data: 1 June 2010
Submissions due: 15 June 2010
Release of official results: 15 July 2010
Papers due: 15 August 2010
Workshop: 20, 21 or 22 September 2010 (CLEF 2010, Padua)

* Organizers

The general lab coordinators are:
Julio Gonzalo (UNED, Madrid), julio@lsi.uned.es
Satoshi Sekine (NYU, New York), sekine@cs.nyu.edu

The coordinators for Task 1 (people search) are:
Javier Artiles (UNED, Madrid), javart@bec.uned.es
Andrew Borthwick (Intelius Corp., Palo Alto), aborthwick@intelius.com

The coordinators for Task 2 (organizations search) are:
Bing Liu (University of Illinois at Chicago), liub@cs.uic.edu
Enrique Amigó (UNED, Madrid), enrique@lsi.uned.es
Adolfo Corujo (Llorente & Cuenca, Madrid), acorujo@llorenteycuenca.com

* Program Committee

Eneko Agirre, EHU, Spain
Breck Baldwin, Alias-i, USA
Danushka Bollegala, Tokyo University, Japan
Jeremy Ellman, Northumbria University, UK
Donna Harman, National Institute of Standards and Technology (NIST), USA
Eduard Hovy, ISI, USA
Dmitri Kalashnikov, University of California, USA
Paul Kalmar, USA
Bernardo Magnini, FBK-irst, Italy
Gideon Mann, Google, USA
Yutaka Matsuo, Tokyo University, Japan
Manabu Okumura, Tokyo Inst. of Tech., Japan
Ted Pedersen, University of Minnesota, USA
Massimo Poesio, University of Essex, UK
Maarten de Rijke, University of Amsterdam, Netherlands
Jamie Taylor, Freebase, USA
Mark Sanderson, University of Sheffield, UK
Arjen P. de Vries, Centrum Wiskunde & Informatica, Netherlands

Updated information about the tasks can be found on the
WePS web site (http://nlp.uned.es/weps).