Overcome ML Data Bottleneck

Machine learning is powerful, but it can be hard to reap its benefits without large amounts of labeled training data. Labeling data by hand can be time-consuming, expensive, and impractical, and sometimes you don’t even have enough examples to label, especially of the rare events that matter most.
This class will provide practical methods to overcome this data bottleneck. You will learn how to use heuristics to label data automatically, and you will learn how to generate synthetic training examples of rare events using generative adversarial networks (GANs). You will also learn other data augmentation approaches and methods for training models when the training data is imbalanced.
The class will also cover how to use machine learning when you only have one or a few examples.

  • Session 1: June 11th Tue 9am-12pm PST

  • 3 Hours / 1 Session
  • Lectures / hands-on code labs
  • Live session and real time interaction
  • Watch the session replay anywhere, any time

Check the content tab for full course outlines.

Developers, data scientists, students.

  • Familiarity with Python, or willingness to learn it quickly
  • Basic familiarity with machine learning
    If you miss the live session or want to review the material, you can watch the recorded sessions at any time, along with interactive learning tools, slides, and course notes.

    Module 1: Introduction and Data Programming
    • Automatically generate training labels by encoding domain knowledge
    • Apply Snorkel to standard classification problems

    Module 2: Learning with imbalanced classes using scikit-learn
    • Do machine learning even when the case you care about is drowned out in the rest of the data
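
One common way to keep a rare class from being drowned out is to weight each class inversely to its frequency. The sketch below computes such weights in plain Python; the function name `balanced_class_weights` and the 95/5 toy split are assumptions for illustration. The formula, `n_samples / (n_classes * class_count)`, is the one scikit-learn applies when an estimator is given `class_weight="balanced"`.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Toy imbalanced dataset: 95 common examples, 5 rare ones.
y = [0] * 95 + [1] * 5
weights = balanced_class_weights(y)
print(weights)
```

With these weights, a misclassified rare example contributes far more to the training loss than a misclassified common one.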

    Module 3: Creating synthetic training data using GANs
    • Use deep learning to generate more variations of rare examples

    Module 4: Classifying examples into many classes
    • Learn to compare entities for similarity where the similarity space itself is learned
    • Unsupervised learning, semi-supervised learning, and using pre-trained models
    Jonathan Mugan

    Jonathan is a researcher specializing in artificial intelligence, machine learning, and natural language processing. His current research focuses on deep learning for natural language generation and understanding. Dr. Mugan received his Ph.D. in Computer Science from the University of Texas at Austin and held a post-doctoral position at Carnegie Mellon University, where he worked at the intersection of machine learning and human-computer interaction.
    • Start Date: On-demand
    • Venue: Online
    • Fee:
      $49 $9
    • Students enrolled: 80
    • Status: learn on-demand