Insights on The Data Challenge in Deep Learning Projects

Data is the most precious resource in deep learning research. As such, it should be handled carefully at every stage: data gathering, annotation, QA, and versioning. However, even if you perform all of these tasks in the best possible way, data still poses challenges that can dramatically affect your model's performance.

In this talk, we discuss the fact that your data is most likely biased and that this bias affects the performance of your model. We will show how to identify data bias and what can be done to address it. In particular, we focus on class imbalance. We provide illustrative experiments to accompany these ideas. Our experiments focus on an object detection task, which has additional complexities beyond vanilla classification tasks. We explore how different data balancing methods (data resampling and loss reweighting) affect the performance of minority and majority classes in such settings.
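
As a rough illustration of the two balancing methods named above, the sketch below computes inverse-frequency class weights. It is not the method used in the talk's experiments: it assumes a simplified setting with one class label per sample (in object detection, labels are attached to boxes, and an image may contain several classes), and the function names and toy labels are hypothetical. The same per-class weights could be used either to scale a loss term (loss reweighting) or as per-sample sampling probabilities (data resampling).

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, proportional to 1 / class frequency.

    Weights are scaled so that the average weight over all samples is 1,
    which keeps the overall loss magnitude roughly unchanged.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def sample_weights(labels, class_weights):
    """Expand per-class weights into one weight per sample.

    These can be passed to a weighted sampler for resampling, or used
    to scale each sample's loss for reweighting.
    """
    return [class_weights[label] for label in labels]


# Toy, made-up label distribution: 90 majority samples, 10 minority samples.
labels = ["car"] * 90 + ["bicycle"] * 10
weights = inverse_frequency_weights(labels)
per_sample = sample_weights(labels, weights)
```

With this toy distribution, the minority class receives a weight nine times larger than the majority class, so both classes contribute equally in aggregate.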

In addition, we will look at the diminishing returns of annotated data. Deep learning models are notorious for their endless appetite for training data, and acquiring high-quality annotated data consumes a relatively large amount of resources. Monitoring how much each additional batch of annotated data improves the model provides a way to assess how much data is needed at the different stages of the project lifecycle, and even to predict whether the current model architecture will be able to achieve the target metric. This knowledge effectively provides a tool for optimal management of time, manpower, and computing resources.
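
One common way to monitor this effect, shown here only as a sketch and not necessarily the approach taken in the talk, is to fit a power law, error ≈ a · n^(-b), to validation errors measured at several training-set sizes and then extrapolate. The function names and the three data points below are hypothetical and chosen purely for illustration. The pure power-law form is itself an assumption: it has no irreducible error floor, so a real analysis may need an extra constant term to judge whether a target metric is reachable at all.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error = a * n**(-b) by least squares in log-log space.

    Returns the tuple (a, b).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def samples_needed(a, b, target_error):
    """Invert the fitted curve: training-set size predicted to reach target_error."""
    return (a / target_error) ** (1.0 / b)


# Made-up measurements: validation error at three training-set sizes.
sizes = [100, 400, 1600]
errors = [0.2, 0.1, 0.05]

a, b = fit_power_law(sizes, errors)
needed = samples_needed(a, b, target_error=0.025)
```

In this toy series every fourfold increase in data halves the error, so the fit extrapolates that halving the error once more would again require four times as much annotated data.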

Finally, we will discuss the features needed for a dataset management tool that can help identify and tackle the data challenge in your deep learning projects. We will demonstrate the effectiveness of using such a tool on popular computer vision tasks.

Ariel Biller

Researcher first, developer second. Over the last 5 years, Ariel has worked on various projects in the realms of quantum chemistry, massively parallel supercomputing, and deep-learning computer vision. With AllegroAi, he helped build an open-source R&D platform (Allegro Trains), and later went on to lead a data-first transition for a revolutionary nanochemistry startup (StoreDot). Answering his calling to spread the word on state-of-the-art research best practices, he recently took up the mantle of Evangelist at AllegroAi.
Ariel received his PhD in Chemistry in 2014 from the Weizmann Institute of Science. With broad experience in computational research, he made the transition to the bustling startup scene of Tel Aviv and to cutting-edge deep learning research.
  • Date: Sep 23, 10:00 (US Pacific Time)
  • Fee: Free
  • Available Seats: 20 (max 300)