DMLR 2024 : 5th Workshop on Data-Centric Machine Learning Research

When	Jul 26, 2024 - Jul 27, 2024
Where	Vienna, Austria
Submission Deadline	May 24, 2024
Notification Due	Jun 17, 2024

Categories: large-scale datasets, data curation, unlabeled data

Call For Papers

We invite paper submissions to the 5th Workshop on Data-Centric Machine Learning Research (DMLR), co-located with the International Conference on Machine Learning (ICML) 2024 in Vienna, Austria.

This workshop builds on the success of prior data-centric workshops and brings together the DMLR, DataComp, and AI for Good communities. Our goal is to explore the critical role of datasets in shaping the future of foundation models and advance research in this area.

Scope

Large-scale foundation models are revolutionizing machine learning, particularly in vision and language domains. While model architecture received significant attention in the past, recent focus has shifted towards the importance of data quality, size, diversity, and provenance.

This workshop aims to highlight cutting-edge advancements in data-centric approaches for large-scale foundation models in new domains, in addition to language and vision, and engage the vibrant interdisciplinary community of researchers, practitioners, and engineers who tackle practical data challenges related to foundation models. By featuring innovative research and facilitating collaboration, it aims to bridge the gap between dataset-centric methodologies and the development of robust, versatile foundation models that are able to work in and across a variety of domains in service of humanity.

Topics will include, but are not limited to:

Data sources for large-scale datasets
Construction of datasets from large quantities of unlabeled/uncurated data
Model-assisted dataset construction
Quality signals for large-scale datasets
Datasets for evaluation
Datasets for specific applications
Impact of dataset drifts in large-scale models
Ethical considerations for and governance of large-scale datasets
Data curation and HCI
Submissions to benchmarks such as DataPerf, DynaBench, and DataComp

Submission

All authors and submissions should adhere to the ICML policy.

We welcome two types of paper submissions:
Research papers: up to 8 pages (not including references and appendices). Acceptable material includes original, high-quality, unpublished contributions on theory or practical aspects, as well as position papers relevant to the workshop topics.
Extended abstracts: up to 4 pages (not including references and appendices). Acceptable material includes work that has already been submitted or published, preliminary results, and controversial findings.
Posting any version of a paper submitted to the DMLR workshop on preprint servers such as arXiv is permitted. Once the paper is accepted, the preprint version should be marked with the publication information.
All submissions must represent original work that has not been previously published elsewhere.
The use of LLMs is allowed as a general-purpose writing-assistance tool. Authors should understand that they take full responsibility for the contents of their papers, including content generated by LLMs that could be construed as plagiarism or scientific misconduct (e.g., fabrication of facts). LLMs are not eligible for authorship.
Authors who choose to create new datasets must provide access to the datasets (view and download) to help reviewers assess submitted works. We strongly encourage authors to submit supplementary material, including:
Data Card: we recommend that authors consult the data card template.
Data Sheet: see the datasheet example.
Authors are strongly encouraged to submit code to foster reproducibility and/or include a paragraph-long Reproducibility Statement at the end of the main text (before references) describing the efforts made to ensure reproducibility. This optional reproducibility statement will not count toward the page limit, but should not be more than 1 page. We encourage authors to consult the model card template.
Authors are welcome to submit papers anonymously (if desired).
Submissions should adhere to the DMLR or ICML style templates (DMLR LaTeX template).
Submissions are only accepted in written English.
All papers must be proofread (not just spell-checked) by the authors before submission.
Submission portal: https://openreview.net/group?id=ICML.cc/2024/Workshop/DMLR

Accepted research papers and accepted extended abstracts will be presented at the workshop as posters. We do not intend to publish paper proceedings.

Important Dates

(Time zone: Anywhere on Earth)

Paper Submission deadline: May 24, 2024
Notification of Acceptance: June 17, 2024
Camera Ready Copy due: Coming Soon

Awards

A few exceptional research papers from the DMLR workshop 2024 will be invited to submit extended versions to the DMLR journal, the latest member of the JMLR family, which aims to provide a top archival venue for high-quality scholarly articles focused on the data aspect of machine learning research.

Workshop Organizers

Adam Mahdi, Ludwig Schmidt, Alex Dimakis, Rotem Dror, Georgia Gkioxari, Sang T. Truong, Lilith Bat-Leah, Fatimah Alzamzami, Georgios Smyrnis, Thao Nguyen, Nezihe Merve Gürel, Paolo Climaco, Luis Oala, Hailey Schoelkopf, Andrew Michael Bean, Berivan Isik, Vaishaal Shankar, Mayee F Chen, Achal Dave

Contact

If you have any questions about paper submission and the workshop, please join our Discord channel here: https://discord.gg/jYk3FNfYqG
