WMT 2014 : NINTH WORKSHOP ON STATISTICAL MACHINE TRANSLATION

Link: http://www.statmt.org/wmt14/index.html
 
When: Jun 26, 2014 - Jun 27, 2014
Where: Baltimore, USA
Submission Deadline: Apr 1, 2014
Notification Due: Apr 21, 2014
Final Version Due: Apr 28, 2014
Categories: NLP, natural language processing, machine translation
 

Call For Papers

This workshop builds on eight previous workshops on statistical machine translation, a series that is one of the most prestigious venues for research in computational linguistics:

the NAACL-2006 Workshop on Statistical Machine Translation,
the ACL-2007 Workshop on Statistical Machine Translation,
the ACL-2008 Workshop on Statistical Machine Translation,
the EACL-2009 Workshop on Statistical Machine Translation,
the ACL-2010 Workshop on Statistical Machine Translation,
the EMNLP-2011 Workshop on Statistical Machine Translation,
the NAACL-2012 Workshop on Statistical Machine Translation, and
the ACL-2013 Workshop on Statistical Machine Translation.

IMPORTANT DATES

Release of training data for translation task: Early December 2013
Release of training data for quality estimation task: January 15, 2014
Test set distributed for translation task: February 24, 2014
Submission deadline for translation task: February 28, 2014
System outputs distributed for metrics task: March 7, 2014
Test sets distributed for quality estimation task: March 7, 2014
Submission deadline for metrics and quality estimation tasks: March 28, 2014
Start of manual evaluation period: March 11, 2014
End of manual evaluation: April 1, 2014
Paper submission deadline: April 1, 2014
Notification of acceptance: April 21, 2014
Camera-ready deadline: April 28, 2014

OVERVIEW

This year's workshop will feature four shared tasks:

a translation task,
a quality estimation task,
a task to test automatic evaluation metrics, and
a medical text translation task.

In addition to the shared tasks, the workshop will also feature scientific papers on topics related to MT. Topics of interest include, but are not limited to:

word-based, phrase-based, syntax-based, semantics-based SMT
using comparable corpora for SMT
incorporating linguistic information into SMT
decoding
system combination
error analysis
manual and automatic methods for evaluating MT
scaling MT to very large data sets

We encourage authors to evaluate their approaches to the above topics using the common data sets created for the shared tasks.

TRANSLATION TASK

The first shared task will examine translation between the following language pairs:

English-German and German-English
English-French and French-English
English-Hindi and Hindi-English NEW
English-Czech and Czech-English
English-Russian and Russian-English

Participants may submit translations for any or all of the language directions. In addition to the common test sets, the workshop organizers will provide optional training resources, including a newly expanded release of the Europarl corpora and out-of-domain corpora.

All participants who submit entries will have their translations evaluated. We will evaluate translation performance by human judgment. To facilitate the human evaluation, we will require participants in the shared tasks to manually judge some of the submitted translations. For each team, this will amount to ranking 300 sets of 5 translations per language pair submitted.
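
The judging workload stated above can be estimated with a few lines of Python. This is only a rough sketch: the two constants come from the call itself, while the function name `ranking_workload`, its parameter, and the idea of counting the pairwise comparisons implied by each ranking are illustrative assumptions, not part of the official evaluation procedure.

```python
from math import comb

# Figures taken from the call: 300 ranking sets of 5 translations
# per language pair submitted.
SETS_PER_LANGUAGE_PAIR = 300
TRANSLATIONS_PER_SET = 5


def ranking_workload(language_pairs_submitted):
    """Estimate the manual judging effort for one team (hypothetical helper)."""
    sets = SETS_PER_LANGUAGE_PAIR * language_pairs_submitted
    return {
        "ranking_sets": sets,
        "translations_read": sets * TRANSLATIONS_PER_SET,
        # Assumption: a full ranking of 5 items implies C(5, 2) = 10
        # pairwise comparisons.
        "implied_pairwise_comparisons": sets * comb(TRANSLATIONS_PER_SET, 2),
    }


workload = ranking_workload(language_pairs_submitted=2)
print(workload)
```

Under these assumptions, a team submitting for two language pairs would rank 600 sets, covering 3,000 translations.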

We also provide baseline machine translation systems, with performance comparable to the best systems from last year's shared task.

QUALITY ESTIMATION TASK

A topic of increasing interest in MT is that of estimating the quality of translated texts. Unlike MT evaluation, quality estimation (QE) systems do not rely on reference translations, but rather predict the quality of an unseen translated text (document, sentence, phrase) at system run-time. This topic is particularly relevant from a user perspective: among other applications, it can (i) help decide whether a given translation is good enough for publishing as is (Soricut and Echihabi, 2010); (ii) filter out sentences that are not good enough for post-editing (Specia, 2011); (iii) select the best translation among options from multiple MT and/or translation memory systems (He et al., 2010); and (iv) inform readers of the target language whether or not they can rely on a translation (Specia et al., 2011).

Although still very recent, research on this topic has been showing promising results in the last couple of years. However, efforts are scattered across several groups and, as a consequence, comparing different systems is difficult, as there are neither well-established baselines nor standard evaluation metrics. In the Quality Estimation track of the WMT workshop and shared task, we will provide training and test sets, along with evaluation metrics and a baseline system. By providing a common ground for development and comparison, we expect to foster research on the topic, as well as to attract new people interested in the subject, who can build and evaluate new solutions using the provided resources.

EVALUATION TASK

The evaluation task will assess the ability of automatic evaluation metrics to:

rank systems on their overall performance on the test set, and
rank systems on a sentence-by-sentence level.

Participants in the shared evaluation task will use their automatic evaluation metrics to score the output from the translation task and the system combination task. They will be provided with the output from the other two shared tasks along with reference translations. We will measure the correlation of automatic evaluation metrics with the human judgments.
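
As an illustration of what measuring correlation with human judgments can look like, the sketch below computes two common rank correlation coefficients in plain Python. The call does not say which coefficients the organizers will use, so the choice of Spearman's rho (for system-level scores) and Kendall's tau (for pairwise agreement) is an assumption, and the scores are made-up example values, not real results.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / pairs


# Hypothetical scores for four systems (illustrative values only).
metric_scores = [0.31, 0.28, 0.35, 0.22]
human_scores = [0.60, 0.55, 0.58, 0.40]

print("Spearman rho:", spearman(metric_scores, human_scores))
print("Kendall tau: ", kendall_tau(metric_scores, human_scores))
```

A coefficient close to 1 means the metric orders the systems (or sentences) much as the human judges did; a value near 0 means there is little agreement.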

MEDICAL TEXT TRANSLATION TASK

Details to be announced.

PAPER SUBMISSION INFORMATION

Submissions will consist of regular full papers of 6-10 pages, plus additional pages for references, formatted following the ACL 2013 guidelines. In addition, shared task participants will be invited to submit short papers (4-6 pages) describing their systems or their evaluation metrics. Both submission and review processes will be handled electronically. Note that regular papers must be anonymized, while system descriptions do not need to be.

We encourage individuals who are submitting research papers to evaluate their approaches using the training resources provided by this workshop and past workshops, so that their experiments can be repeated by others using these publicly available corpora.

ANNOUNCEMENTS

Subscribe to the announcement list for WMT14. This list will be used to announce when the test sets are released, to indicate any corrections to the training sets, and to amend the deadlines as needed.

You can read past announcements on the Google Groups page for WMT. These also include an archive of announcements from earlier workshops.

INVITED TALK

TBC

ORGANIZERS

Ondrej Bojar (Charles University)
Christian Buck (University of Edinburgh)
Christian Federmann (Microsoft)
Barry Haddow (University of Edinburgh)
Philipp Koehn (University of Edinburgh)
Matouš Macháček (Charles University in Prague)
Christof Monz (University of Amsterdam)
Pavel Pecina (Charles University in Prague)
Matt Post (Johns Hopkins University)
Herve Saint-Amand (University of Edinburgh)
Radu Soricut (Google)
Lucia Specia (University of Sheffield)
