Dataflow Lab Preparation
Pipeline lab
The focus of this lab is Dataflow. You use a template to create a batch processing
Dataflow pipeline.
The pipeline reads the contents of a text file stored in Google Cloud Storage and counts
the number of times each word is used in the file. It then stores the word counts in a
Google Cloud Storage bucket in your project.
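
To make the counting step concrete, here is a minimal plain-Python sketch of the word-count logic. It is not the template's actual implementation: the function names `count_words` and `format_counts`, the tokenization rule, and the "word: count" output format are all assumptions made for illustration, and the sample text stands in for the file read from Cloud Storage.

```python
import re
from collections import Counter


def count_words(text):
    """Count how often each word appears in text, ignoring case."""
    # Assumption: a "word" is a run of letters or apostrophes.
    # The real template may split the text differently.
    words = re.findall(r"[A-Za-z']+", text.lower())
    return Counter(words)


def format_counts(counts):
    """Return one 'word: count' line per word, sorted alphabetically."""
    # Assumption: this output format is illustrative only.
    return [f"{word}: {n}" for word, n in sorted(counts.items())]


# Stand-in for the contents of the input text file.
sample = "to be or not to be"
for line in format_counts(count_words(sample)):
    print(line)
```

In the lab itself, Dataflow runs the equivalent steps (read, split into words, count, write) in a distributed way, so the input file can be far larger than what fits in one machine's memory.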
At this point in the lab, you have successfully completed a batch processing
text-file-to-text-file pipeline. Now you create a pipeline that reads data from a text
file and publishes it to a Pub/Sub topic.
1. Create a Dataflow pipeline that reads data from a text file in Google Cloud
Storage and publishes it to Pub/Sub.
2. Use gcloud commands to pull the messages from Pub/Sub.
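
As a rough sketch of step 2, the snippet below builds the `gcloud pubsub subscriptions pull` command and, optionally, runs it. The subscription name `my-lab-subscription` and the helper names `build_pull_command` and `pull_messages` are hypothetical; substitute the subscription attached to your topic. Running the command assumes the gcloud CLI is installed and authenticated for your project.

```python
import subprocess


def build_pull_command(subscription, limit=10):
    """Build the gcloud argument list that pulls messages from a subscription."""
    return [
        "gcloud", "pubsub", "subscriptions", "pull", subscription,
        f"--limit={limit}",  # maximum number of messages to return
        "--auto-ack",        # acknowledge messages as they are pulled
        "--format=json",     # machine-readable output
    ]


def pull_messages(subscription, limit=10):
    """Run the pull command and return its raw JSON output (needs the gcloud CLI)."""
    result = subprocess.run(
        build_pull_command(subscription, limit),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical subscription name; print the command without running it.
print(" ".join(build_pull_command("my-lab-subscription", 5)))
```

You can also type the printed command directly into Cloud Shell. Without `--auto-ack`, pulled messages remain unacknowledged and are redelivered later.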