DQAML 2021 : 2nd International Workshop on Data Quality Assessment for Machine Learning @ KDD 2021

Link: http://data-readiness-kdd-2021.mybluemix.net/
 
When : Aug 14, 2021 - Aug 18, 2021
Where : Virtual
Submission Deadline : May 27, 2021
Notification Due : Jun 10, 2021
Categories : data quality, machine learning
 

Call For Papers

DQAML2021: Call for Papers

Dear Colleagues,

We invite you to submit high-quality research papers to the 2nd International Workshop on Data Quality Assessment for Machine Learning, held in conjunction with the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021), a virtual event, August 14th-18th, 2021.

Website : http://data-readiness-kdd-2021.mybluemix.net
CFP (PDF) : http://data-readiness-kdd-2021.mybluemix.net/flyer/kdd_2021_workshop_cfp.pdf
Contact : data.readiness.kdd2021@gmail.com

Important Deadlines

Submission : May 27th, 2021 (Extended Deadline)
Decisions : June 10th, 2021
Workshop : August 14th-18th, 2021

All deadlines are 11:59 pm UTC-12 ("Anywhere on Earth").

Workshop Description

In the past decade, AI/ML technologies have become pervasive in academia and industry, finding utility in newer and more challenging applications. While there has been a focus on building better, smarter and more automated ML models, little work has been done to systematically understand the challenges in the data and assess its quality issues before it is fed to an ML pipeline. Issues such as incorrect labels, synonymous categories in a categorical variable and heterogeneity in columns, which might go undetected by the standard pre-processing modules in these frameworks, can lead to suboptimal model performance. Although some systems are able to generate comprehensive reports with details of the ML pipeline, a lack of insight and explainability with respect to data quality issues leads to data scientists spending roughly 80% of their time on data preparation before employing these AutoML solutions. This is why data preparation has been called out as one of the most time-consuming steps in an AI lifecycle. Since the quality of data is not known at Step 0, when the data is acquired, data preparation becomes an iterative debugging process and more of an art that relies on the experience of a data scientist. Because the performance of an ML model is only as good as the training data it sees, a systematic analysis of data quality before building AI/ML models is of utmost importance.

The goal of this workshop is to bring together researchers working in data acquisition, data labeling, data quality, data preparation and AutoML to understand how data issues, and their detection and remediation, can help in building better models. With a focus on different modalities such as structured data, time series data, text data and graph data, this workshop invites researchers from academia and industry to submit novel propositions for systematically identifying and mitigating data issues in order to make data AI ready.

Topics

Methods of data assessment can change depending on the modality of the data. This workshop invites submissions on data quality assessment for different modalities: structured (or tabular) data, unstructured data (such as text, logs and images), graph-structured (relational, network) data, time series data, spatio-temporal data, etc. We would like to explore state-of-the-art deep learning and AI concepts such as deep reinforcement learning, graph neural networks, self-supervised learning, capsule networks and adversarial learning to address the problems of data quality assessment for ML. The following is a (non-exhaustive) list of topics that are of interest to this workshop:
- Algorithms for assessment of data quality issues relevant to ML
- Automatic remediation of data quality issues
- Human-assisted data cleaning and remediation
- Automated data cleaning workflows
- Explainability and interpretability of quality assessment
- Interactive debugging of data
- Smarter data visualisations for high dimensional data
- Evaluation techniques for data quality assessment
- Real world use cases and applications of data quality assessment
- Novel interfaces to assist human-in-the-loop intervention for interactive data cleaning
- Quality-aware representations and sampling of high dimensional data
- Representative sampling for high dimensional data
- Detection of bias and privacy breach
- Label noise detection, explanation and incorporating feedback
- Noise and low-quality data robustness studies
- Handling corrupted, missing and uncertain data
- Outlier (or anomaly) detection and mitigation in data
- Addressing class imbalance in data
- Benchmarking of data preparation and cleaning systems and tools: data sets and frameworks


Submission Instructions

We solicit submissions of papers of 4 to 10 pages representing reports of original research, preliminary research results, case studies, proposals for new work and position papers.

All papers will be peer reviewed, single blind (i.e. author names and affiliations should be listed). If accepted, at least one of the authors must attend the workshop to present the work. Submitted papers must be written in English and formatted in the standard double-column layout of the ACM Proceedings Template, Tighter Alternate style (http://www.acm.org/publications/proceedings-template). Papers should be in PDF format and submitted via the EasyChair submission site (https://easychair.org/my/conference?conf=datareadinesskdd2021#). The workshop website will archive the published papers. Submitted papers must not have been previously published anywhere and must not be under consideration by any other conference or journal during the workshop review process.


Workshop Organizers

- Hima Patel, IBM Research AI, India
- Fuyuki Ishikawa, National Institute of Informatics, Japan
- Laure Berti-Equille, IRD, ESPACE-DEV, France
- Nitin Gupta, IBM Research AI, India
- Sameep Mehta, IBM Research AI, India
- Satoshi Masuda, IBM Research AI, Japan
- Shashank Mujumdar, IBM Research AI, India
- Shazia Afzal, IBM Research AI, India
- Srikanta Bedathur, Indian Institute of Technology Delhi, India
- Yasuharu Nishi, The University of Electro-Communications, Japan


Regards,

Workshop Chairs,
DQAML2021
