BUCC 2025: 18th Workshop on Building and Using Comparable Corpora at COLING'25

Link: https://comparable.lisn.upsaclay.fr/bucc2025/
 
When Jan 19, 2025 - Jan 20, 2025
Where Abu Dhabi, UAE
Submission Deadline Nov 30, 2024
Notification Due Dec 8, 2024
Final Version Due Dec 12, 2024
Categories    NLP   computational linguistics   artificial intelligence
 

Call For Papers



18th WORKSHOP ON BUILDING AND USING COMPARABLE CORPORA
WITH SHARED TASK ON MULTILINGUAL TERMINOLOGY EXTRACTION
FROM COMPARABLE CORPORA

Co-located with COLING 2025 (Abu Dhabi)

Paper submission deadline: 30 November, 2024

Workshop website: https://comparable.lisn.upsaclay.fr/bucc2025/

Keynote speaker: Preslav Nakov, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi

**************************************************************

* Motivation

In the language engineering and linguistics communities, research on
comparable corpora has had two main motivations. In language
engineering, on the one hand, it is chiefly motivated by the need to
use comparable corpora as training data for statistical NLP
applications such as statistical and neural machine translation or
cross-lingual retrieval. In linguistics, on the other hand, comparable
corpora are of interest because they enable cross-language discoveries
and comparisons. It is generally accepted in both communities that
comparable corpora consist of documents that are comparable in content
and form to various degrees and along various dimensions across several
languages. Parallel corpora are at one end of this spectrum, and
unrelated corpora are at the other.

In recent years, the use of comparable corpora for pre-training Large
Language Models (LLMs) has led to their impressive multilingual and
cross-lingual abilities, which are relevant to a range of applications,
including Information Retrieval, Machine Translation, Cross-lingual text
classification, etc. The linguistic definitions and observations related
to comparable corpora can improve methods for mining such corpora and
for improving the cross-lingual transfer of LLMs. Therefore, it is of great
interest to bring together builders and users of such corpora.

* Shared Task
This year we will run a shared task aimed at detecting translations of
terms via comparable corpora. Please see the website for details.

* Topics
We solicit contributions on all topics related to comparable (and parallel) corpora, including but not limited to the following:

Building Comparable Corpora:
- Automatic and semi-automatic methods
- Methods to mine parallel and non-parallel corpora from the web
- Tools and criteria to evaluate the comparability of corpora
- Parallel vs non-parallel corpora, monolingual corpora
- Rare and minority languages, across language families
- Multi-media/multi-modal comparable corpora

Applications of comparable corpora:
- Human translation
- Language learning
- Cross-language information retrieval & document categorization
- Bilingual and multilingual projections
- (Unsupervised) Machine translation
- Writing assistance
- Machine learning techniques using comparable corpora

Mining from Comparable Corpora:
- Cross-language distributional semantics, word embeddings and
pre-trained multilingual transformer models
- Extraction of parallel segments or paraphrases from comparable corpora
- Methods to derive parallel from non-parallel corpora (e.g. to provide
for low-resource languages in neural machine translation)
- Extraction of bilingual and multilingual translations of single words,
multi-word expressions, proper names, named entities, sentences, and
paraphrases from comparable corpora, etc.
- Induction of morphological, grammatical, and translation rules from
comparable corpora
- Induction of multilingual word classes from comparable corpora

Comparable Corpora in the Humanities:
- Comparing linguistic phenomena across languages in contrastive
linguistics
- Analyzing properties of translated language in translation studies
- Studying language change over time in diachronic linguistics
- Assigning texts to authors via authors' corpora in forensic
linguistics
- Comparing rhetorical features in discourse analysis
- Studying cultural differences in sociolinguistics
- Analyzing language universals in typological research

* Workshop Organizers
- Serge Sharoff (University of Leeds)
- Ayla Rigouts Terryn (Université de Montréal (UdeM), Mila)
- Pierre Zweigenbaum (Université Paris-Saclay, CNRS, LISN, Orsay)
- Reinhard Rapp (University of Mainz, Germany)

* Program Committee
- Ebrahim Ansari (Institute for Advanced Studies in Basic Sciences,
Iran)
- Eleftherios Avramidis (DFKI, Germany)
- Gabriel Bernier-Colborne (National Research Council, Canada)
- Thierry Etchegoyhen (Vicomtech, Spain)
- Alex Fraser (University of Munich, Germany)
- Natalia Grabar (University of Lille, France)
- Amal Haddad Haddad (Universidad de Granada, Spain)
- Amir Hazem (University of Tokyo, Japan)
- Kyo Kageura (University of Tokyo, Japan)
- Natalie Kübler (Université Paris Cité, France)
- Philippe Langlais (Université de Montréal, Canada)
- Yves Lepage (Waseda University, Japan)
- Shervin Malmasi (Amazon, USA)
- Michael Mohler (Language Computer Corporation, USA)
- Emmanuel Morin (Nantes Université, France)
- Dragos Stefan Munteanu (RWS, USA)
- Ted Pedersen (University of Minnesota, Duluth, USA)
- Nasredine Semmar (CEA LIST, Paris, France)
- Silvia Severini (Leonardo Labs, Italy)
- Pranaydeep Singh (University of Gent, Belgium)
- Richard Sproat (Google, USA)
- Marko Tadić (University of Zagreb, Croatia)
- François Yvon (Sorbonne Université, France)
