Build Machine Learning Pipeline - Online


Are you an engineer working on feature engineering but curious about the whole picture of the machine learning system you are working on? Are you a data scientist working on modelling who wants to expand your career into system design? Are you in academia but considering a position at a big tech company? More importantly, do you want to learn how to build a big data machine learning system without spending years in school or sweating over tedious support work in industry?

This course teaches you how to apply what you have already learned to build a machine learning pipeline from end to end, and how to expand that system further, so that students not only grasp the theoretical concepts but also get hands-on experience solving real-world problems. We will show how to translate real-time raw data into knowledge and integrate it into the system's daily learning.

  • Session 1: Jan.8th Tue 6pm-7:30pm PT
  • Session 2: Jan.10th Thu 6pm-7:30pm PT
  • Session 3: Jan.15th Tue 6pm-7:30pm PT
  • Session 4: Jan.17th Thu 6pm-7:30pm PT

  • 24 topics / 3 code labs / 6 hours of live talks
  • Real-time interaction with instructors
  • Watch recorded videos any time
  • Real-time Q&A in the Slack group

Check the content tab for the full course outline.

Developers interested in building large-scale machine learning systems. Both beginners and experts in the field can learn from different perspectives.

Reasonable background knowledge of machine learning, plus Python and R programming skills, is preferred.

The live sessions have ended, but you can still enroll and take the course using the recorded videos, slides, course notes and discussions.
Not refundable.

Module 1: machine learning system pipeline overview (1.5 hours)
  • Case study: What is machine learning.
  • Advantages of machine learning system
  • End to end machine learning system components
  • Code lab 1 (anomaly detection for data quality control)

Module 2: select and construct features (1.5 hours)
  • Build machine learning system
  • Feature selection
    • Overview
    • Linear regression
    • KS test
  • Feature construction
    • Overview
    • Entropy
    • Doc2Vec
  • Code lab 2

Module 3: build models 1 (1.5 hours)
  • Random forest
  • Gradient boosting decision tree
  • XGBoost
  • Code lab 3

Module 4: build models 2 (1 hour)
  • Neural network
  • Support vector machine
  • Logistic regression
  • Code lab 4

Module 5: summary (0.5 hours)
  • pitfalls and lessons
  • next steps

Zhen Li

Senior data scientist at Microsoft who has been leading the building of machine learning systems to solve critical business problems. These projects cover 24*7 near-real-time system design, data collection, data mining, feature construction, modeling and system operational practice. At Microsoft, her deep learning project won 2nd Best AI School Project at the company level. At Amazon, she reduced bad debt for the 2nd largest overseas market using credit cards by ~50% within 3 months. She has ~1000 paper citations and has been an invited speaker or session chair for various organizations and conferences, including the United States National Academy of Sciences and the U.S. EPA.

  • Start Date: ended
  • Venue: Online
  • Fee:
    $59 $9 USD
  • Students enrolled: 271
  • Status: course ended