Google Cloud Platform Big Data and Machine Learning Fundamentals

1 day (7 hours)

Course overview

This one-day course introduces participants to the Big Data and Machine Learning capabilities of Google Cloud Platform (GCP). It provides a quick overview of Google Cloud Platform and a deeper dive into its data processing capabilities.

Learning outcomes

This course teaches participants the following skills:

  • Knowledge of Google Cloud Platform products and services, particularly those related to data processing and machine learning
  • Knowledge of basic products and services related to computing and storage
  • Knowledge of Cloud SQL and Dataproc
  • Knowledge of Datalab and BigQuery
  • Knowledge of TensorFlow and Machine Learning APIs
  • Knowledge of Pub/Sub and Dataflow


To get the most out of this course, participants should have:

  • experience with a common query language such as SQL
  • experience with extract, transform, load (ETL) activities
  • data modeling experience
  • experience in machine learning and/or statistics
  • experience with programming in Python

Target audience

This course is intended for the following participants:

Before enrolling in this course, participants should have roughly one (1) year of experience with one or more of the following:

  • A common query language such as SQL
  • Extract, transform, load activities
  • Data modeling
  • Machine learning and/or statistics
  • Programming in Python

Course Outline

The course includes presentations, demonstrations, and hands-on labs.

Module 1: Introducing Google Cloud Platform

Google Cloud Platform Fundamentals Overview. Google Cloud Platform Big Data Products.

Module 2: Compute and Storage Fundamentals

CPUs on demand (Compute Engine). A global filesystem (Cloud Storage). Cloud Shell. Lab: Set up an Ingest-Transform-Publish data processing pipeline.

Module 3: Data Analytics on the Cloud

Stepping-stones to the cloud. Cloud SQL: your SQL database on the cloud. Lab: Importing data into Cloud SQL and running queries. Spark on Dataproc. Lab: Machine Learning Recommendations with Spark on Dataproc.

Module 4: Scaling Data Analysis

Fast random access. Datalab. BigQuery. Lab: Build machine learning dataset.

Module 5: Machine Learning

Machine Learning with TensorFlow. Lab: Carry out ML with TensorFlow. Pre-built models for common needs. Lab: Employ ML APIs.

Module 6: Data Processing Architectures

Message-oriented architectures with Pub/Sub. Creating pipelines with Dataflow. Reference architecture for real-time and batch data processing.

Module 7: Summary

Why GCP? Where to go from here. Additional resources.

