
Prepare for the Creating a Dataflow Pipeline lab
The focus of this lab is Dataflow. You use a template to create a batch processing
Dataflow pipeline.

The pipeline reads the contents of a text file stored in Google Cloud Storage and counts
the number of times each word is used in the file. The pipeline stores the word counts in a
Google Cloud Storage bucket in your project.
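
To make the counting step concrete, here is a minimal plain-Python sketch of the logic. It is not the template's code: the tokenization rule (runs of letters and apostrophes, case preserved) and the "word: count" output format are assumptions made for illustration, and the function names are hypothetical.

```python
import re
from collections import Counter


def count_words(text: str) -> Counter:
    """Count how many times each word appears in text.

    Assumption: a word is a run of ASCII letters or apostrophes,
    and case is preserved ("The" and "the" are counted separately).
    """
    words = re.findall(r"[A-Za-z']+", text)
    return Counter(words)


def format_counts(counts: Counter) -> list[str]:
    """Render counts as 'word: count' lines, sorted by word (assumed format)."""
    return [f"{word}: {n}" for word, n in sorted(counts.items())]


if __name__ == "__main__":
    for line in format_counts(count_words("the cat and the hat")):
        print(line)
```

In the lab, Dataflow performs the equivalent work in parallel and writes the result as one or more output files in your bucket.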

In this lab you:

1. Create a storage bucket in your project.
2. Upload a text file to the bucket.
3. Use the Dataflow template to create and execute the pipeline.
4. Review the word count in the files created in your bucket.
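
The lab may walk you through these four steps in the Cloud console. As a rough command-line equivalent, the sketch below only builds the gcloud commands as argument lists and prints them; it does not run anything. The bucket name, file name, region, job name, and the results/ and temp/ paths are hypothetical placeholders, and the template location and its inputFile and output parameters are assumed to match the Google-provided Word Count template, so check them against the lab instructions.

```python
import os
import shlex


def batch_lab_commands(bucket: str, local_file: str, region: str, job_name: str) -> list[list[str]]:
    """Build (but do not run) gcloud commands for the four batch steps."""
    input_uri = f"gs://{bucket}/{os.path.basename(local_file)}"
    output_prefix = f"gs://{bucket}/results/output"  # hypothetical output location
    return [
        # 1. Create a storage bucket in your project.
        ["gcloud", "storage", "buckets", "create", f"gs://{bucket}", f"--location={region}"],
        # 2. Upload a text file to the bucket.
        ["gcloud", "storage", "cp", local_file, input_uri],
        # 3. Run the pipeline from a template (assumed: the Word Count template).
        [
            "gcloud", "dataflow", "jobs", "run", job_name,
            "--gcs-location=gs://dataflow-templates/latest/Word_Count",
            f"--region={region}",
            f"--staging-location=gs://{bucket}/temp",
            f"--parameters=inputFile={input_uri},output={output_prefix}",
        ],
        # 4. List the output files so you can review the word counts.
        ["gcloud", "storage", "ls", f"gs://{bucket}/results/"],
    ]


if __name__ == "__main__":
    # All four argument values are placeholders; substitute your own.
    for cmd in batch_lab_commands("my-lab-bucket", "sample.txt", "us-central1", "wordcount-job"):
        print(shlex.join(cmd))
```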

At this point in the lab, you have successfully completed a batch processing
text-file-to-text-file pipeline. Next, you create a pipeline that reads data from a text
file and publishes it to a Pub/Sub topic.

1. Create a Dataflow pipeline that reads data from a text file in Google Cloud
Storage and publishes it to Pub/Sub.
2. Use gcloud commands to pull the messages from Pub/Sub.
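
The two steps above can be sketched the same way. The function below builds, without running, a plausible gcloud sequence: create a topic and a subscription, run a text-to-Pub/Sub template job, then pull messages. The project, topic, subscription, and job names are hypothetical, and the template path and its inputFilePattern and outputTopic parameters are assumed to match the Google-provided Cloud Storage Text to Pub/Sub batch template; the lab may use different names.

```python
import shlex


def pubsub_lab_commands(
    bucket: str,
    input_file: str,
    project: str,
    topic: str,
    subscription: str,
    region: str,
    job_name: str,
) -> list[list[str]]:
    """Build (but do not run) gcloud commands for the Pub/Sub part of the lab."""
    topic_path = f"projects/{project}/topics/{topic}"
    return [
        ["gcloud", "pubsub", "topics", "create", topic],
        # Create the subscription before the job runs so published messages are retained.
        ["gcloud", "pubsub", "subscriptions", "create", subscription, f"--topic={topic}"],
        # 1. Read the text file from Cloud Storage and publish each line to Pub/Sub.
        [
            "gcloud", "dataflow", "jobs", "run", job_name,
            "--gcs-location=gs://dataflow-templates/latest/GCS_Text_to_Cloud_PubSub",
            f"--region={region}",
            f"--staging-location=gs://{bucket}/temp",
            f"--parameters=inputFilePattern=gs://{bucket}/{input_file},outputTopic={topic_path}",
        ],
        # 2. Pull up to 10 messages and acknowledge them automatically.
        ["gcloud", "pubsub", "subscriptions", "pull", subscription, "--auto-ack", "--limit=10"],
    ]


if __name__ == "__main__":
    # Every argument value here is a placeholder; substitute your own.
    for cmd in pubsub_lab_commands(
        "my-lab-bucket", "sample.txt", "my-project", "lab-topic", "lab-sub", "us-central1", "text-to-pubsub"
    ):
        print(shlex.join(cmd))
```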
