Practical Crowdsourcing for ML at Scale


Jul 20, 10:00 AM PDT (05:00 PM GMT).
  • Free 149 Attendees
Description
AI stands on three pillars: algorithms, hardware and training data. While the first two have already become commodities on the market, the third, reliable labelled data, is still a bottleneck in the industry.

Need to add twice as much data to the training set to improve your model? Want to validate the accuracy of a new classifier in an hour? Or maybe you are building a human-in-the-loop process in which 90% of cases are processed automatically and the trickiest 10% are fine-tuned by people in real time. You can do it all with crowdsourcing, but only with crowdsourcing done right.

In this talk, we will discuss how a new generation of methods and tools makes it possible to collect high-quality human-labelled data at large scale, and why every ML specialist should know how to use crowdsourcing.

In this talk, you will learn how to:
* Understand the applicability, benefits and limits of the crowdsourcing approach.
* Integrate an on-demand workforce into your processes and build human-in-the-loop processes.
* Control the quality and accuracy of data labelling to develop high-performing ML models.
* Understand the full cycle of a crowdsourcing project.

Daria Baidakova (Toloka)

Director of Educational Programs at Toloka. Daria is responsible for consulting and educating Toloka requesters on integrating crowdsourcing methodology into AI projects. She also manages crowdsourcing courses at top data analysis schools (Yandex School of Data Analysis, Y-Data, etc.) and organizes tutorials and hackathons for crowdsourcing specialists. Daria is a co-author of four hands-on tutorials on efficient crowdsourcing (at WSDM 2020, CVPR 2020, SIGMOD 2020 and WWW 2021) and a co-organizer of the crowdsourcing workshop at NeurIPS 2020.
The event ended.