Join us live to gain an in-depth understanding of data-centric AI approaches and learn best practices from real-world implementations. Connect with fellow data scientists, machine learning engineers, and AI leaders from academia and industry through more than 30 virtual sessions, a hands-on virtual workshop, and an in-person social.
August 3 - 4, 8:30AM - 2:30PM Pacific Time
Two days of keynotes, fireside chats, and lightning talks on the following topics. Agenda coming soon.
August 2, 9AM - 11AM Pacific Time
A two-hour hands-on workshop where experts from Snorkel introduce you to the Snorkel Flow platform and show you how to build AI applications rapidly using a data-centric workflow. To learn more, go here. Select the workshop option during the registration process to apply for a spot. Space is limited.
August 4, 5PM - 7PM Pacific Time
If you're in the San Francisco Bay Area, join us for an in-person social to connect with your peers in the data science and AI/ML community. Location details coming soon. Select the in-person social option during the registration process to join us. Space is limited.
A holistic representation towards integrated AI
Presenter: Xuedong Huang, AZURE AI
Applying engineering practices to data-centric AI
Presenter: Matei Zaharia, DATABRICKS
Automating data quality remediation with AI
Presenters: Bryan Richardson + Jacomo Corbo, MCKINSEY & COMPANY
Cleanlab: Datasets that fix themselves
Presenter: Curtis Northcutt, MIT
Closer look at the role of data in achieving fairness in the job ecosystem
Presenter: Sakshi Jain, LINKEDIN
Data and manual annotation monitoring for training data management
Presenter: William Huang, CAPITAL ONE
Data selection for data-centric AI: data quality over quantity
Presenter: Cody Coleman, CO-ACTIVE AI
Data-centric AI at Comcast in voice and conversational interfaces
Presenter: Jan Neumann, COMCAST
DataPerf: benchmarks for data-centric AI development
Presenter: Vijay Janapa Reddi, HARVARD UNIVERSITY
HoloClean and Kamino: Structured Learning for Data Cleaning and Private Data Generation
Presenter: Ihab Ilyas, APPLE
Leveraging synthetic data to train perception models using NVIDIA Omniverse Replicator
Presenter: Nyla Worker, NVIDIA
Machine Learning at Bloomberg
Presenter: Gary Kazantsev, BLOOMBERG
MLCommons and Public Datasets
Presenter: Peter Mattson, GOOGLE
NASA SpaceML Worldview Search: a NoCode Earth & natural disaster dataset curator from unlabeled petabyte-scale imagery
Presenter: Anirudh Koul, PINTEREST
Petabyte-level learning
Presenter: Peter Davio, BLACK SWAN DATA
Practical paths to data-centricity in applied AI
Presenter: Robert Crowe, GOOGLE
Recommender systems at KeyBank
Presenter: Ty Tinker, KEYBANK
Repeatability, responsibility, & reliability – three Rs to make ML models more trustworthy
Presenter: Sriram Subramanian, IDC
Reusable Data Cleaning Pipelines in Python
Presenter: Jason Goldfarb, STATE FARM
Scaling NLP to the next 1,000 languages
Presenter: Sebastian Ruder, GOOGLE
Toward Superhuman Communication Assistance
Presenter: Timo Mertens, GRAMMARLY
Transforming drug discovery using digital biology
Presenter: Daphne Koller, INSITRO
Unmasking Human Trafficking Risk in Commercial Sex Supply Chains with Machine Learning
Presenter: Hamsa Sridhar Bastani, WHARTON SCHOOL, UNIVERSITY OF PENNSYLVANIA
...and more!
Follow #datacentricAI on LinkedIn and Twitter!