Big Data Analytics: An Interactive Introduction to Apache Beam

Introducing BeamLearningMonth in May 2020! In collaboration with the Google Cloud team, we are hosting a series of practical introductory sessions on Apache Beam!

Apache Beam is an open source, unified model for defining both batch and streaming data-parallel processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.

This is session 1 of the series:
In this talk, we will introduce Apache Beam using Jupyter Notebooks, live-coding both a batch and a streaming pipeline using publicly available COVID-19 data.

For more talks on Apache Beam, join Session 2 on May 13 at 10:00 AM PST. Link

Samuel and Ning

Samuel Rohde is a Software Engineer at Google and has worked on the Cloud Dataflow team for the past five years. He graduated from UIUC and has been contributing to the Apache Beam source code for the past couple of years.

Ning Kang is a member of the Google Cloud Dataflow team and has been contributing to the Apache Beam Interactive Notebook OSS project. Before that, he was a software engineer on the Google Store team, where he helped with three large hardware sales events (e.g., Pixel phones). Before joining Google, he worked in the EMR software industry.

  • Date: May 06, 10:00 AM PST
  • Fee: Free
  • Available Seats: 0 (max 400)
  • Help? Send Question
Watch Recording