Data-Centric Machine Learning

Workshop Proposal for ICML 2022

A workshop covering all aspects of shaping the behavior of models by shaping their training data.


Overview

While machine learning begins with data, the process of building and curating training sets is often abstracted away in modern machine learning research. This process is viewed as the thankless and often invisible cost of exploring novel model architectures. Yet as the community moves toward building shared and increasingly commodified models trained on massive datasets, the costs of undervaluing data curation grow more stark. Models fed junk-food diets of biased, problematic data are ill-suited to the aspirations of ethical and fair applications of ML. If machine learning is to make inroads into high-stakes application areas such as healthcare, autonomous vehicles, and virtual assistants, among many other domains, the research community needs to think more systematically and collaboratively about the process and theory of shaping training data. The quantity and quality of training data matter as much to the quality of machine learning models as the choice of architecture, optimizer, and hyperparameters.

This workshop will bring together researchers working on methods, theory, and applications for shaping the behavior of models by shaping their training data. This is a rapidly growing area of research that cuts across virtually all areas of machine learning.

Participants are encouraged to submit new work or work in progress addressing these and related issues. We especially welcome diverse perspectives from across disciplines.

Topics of interest include (but are not limited to):

  • Weak supervision by labeling data with rules or related models
  • Learning from multiple noisy sources of labels
  • Adding explanations to training data
  • Best practices for data curation
  • Encouraging fairness and reducing bias via dataset curation
  • Synthetic data generation for machine learning
  • Data augmentation


Invited Speakers (Confirmed)


Panelists

TBA


Organizers


Contact Us

Email: dcml-workshop@googlegroups.com