Machine Learning with Apache Spark


This course is part of multiple programs.

Taught in English. 19 languages available. Some content may not be translated.

Instructors: IBM Skills Network Team +1 more

Starts Dec 22

3,613 already enrolled

Course
Gain insight into a topic and learn the fundamentals

4.7 (29 reviews)

Intermediate level
Recommended experience

14 hours (approximately)

Flexible schedule
Learn at your own pace

What you'll learn

Describe ML, explain its role in data engineering, summarize generative AI, discuss Spark's uses, and analyze ML pipelines and model persistence.

Evaluate ML models, distinguish between regression, classification, and clustering models, and compare data engineering pipelines with ML pipelines.

Construct the data analysis processes using Spark SQL, and perform regression, classification, and clustering using SparkML.

Demonstrate connecting to Spark clusters, build ML pipelines, perform feature extraction and transformation, and model persistence.

Skills you'll gain

Machine Learning, Machine Learning Pipelines, Data Engineer, SparkML, Apache Spark

Details to know

Shareable certificate
Add to your LinkedIn profile

Recently updated!
July 2023

Assessments
7 assignments


Build your subject-matter expertise


This course is available as part of multiple programs
When you enroll in this course, you'll also be asked to select a specific program.

Learn new concepts from industry experts

Gain a foundational understanding of a subject or tool


Develop job-relevant skills with hands-on projects
Earn a shareable career certificate
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV
Share it on social media and in your performance review

There are 4 modules in this course


Explore the exciting world of machine learning with this IBM course.

Start by learning ML fundamentals before unlocking the power of Apache Spark to build and deploy ML models for data engineering applications.
Dive into supervised and unsupervised learning techniques and discover the revolutionary possibilities of Generative AI through instructional
readings and videos.

Gain hands-on experience with Spark Structured Streaming, develop an understanding of data engineering and ML pipelines, and become proficient
in evaluating ML models using SparkML.

In practical labs, you'll use SparkML for regression, classification, and clustering to build prediction and classification models.
Connect to Spark clusters, analyze datasets with Spark SQL, perform ETL activities, and create ML models using SparkML and scikit-learn. Finally,
demonstrate your acquired skills through a final assignment.

This intermediate course is suitable for aspiring and experienced data engineers, as well as working professionals in data analysis and machine
learning. Prior knowledge in Big Data, Hadoop, Spark, Python, and ETL is highly recommended for this course.

Get Started with Machine Learning


Module details
Module 1 • 4 hours to complete

In this module, you will gain knowledge of machine learning techniques that enable computers to perform tasks without explicit programming. You will
explore the lifecycle of machine learning models and understand the crucial role of data engineering in machine learning projects. The module covers
supervised and unsupervised learning techniques, including classification, regression, and clustering. You will also gain insight
into generative AI and its potential to revolutionize multiple industries, enhance people's lives, and generate new, previously unimaginable data
and experiences.

What's included

11 videos 4 readings 2 assignments 5 app items

11 videos • Total 66 minutes

Course Introduction • 6 minutes

Introduction to Machine Learning for Everyone • 7 minutes

Role of Data Engineering in Machine Learning • 8 minutes

Machine Learning Model Lifecycle • 2 minutes

Supervised vs Unsupervised Learning • 6 minutes

Regression • 6 minutes

Classification • 6 minutes

Evaluating Machine Learning Models • 7 minutes

Clustering • 5 minutes

Generative AI Overview and Use Cases • 5 minutes


Generative AI Applications • 5 minutes

4 readings • Total 13 minutes

Course Syllabus • 2 minutes

How to make the most of this course • 3 minutes

Module 1 Glossary • 5 minutes

Summary and Highlights • 3 minutes

2 assignments • Total 42 minutes

Graded Quiz: Get Started with Machine Learning • 30 minutes

Practice Quiz: Get Started with Machine Learning • 12 minutes

5 app items • Total 150 minutes

Hands-on Lab: Building and Training a Prediction Model using Linear Regression • 30 minutes

Hands-on Lab: Build a Classifier Model using Logistic Regression • 30 minutes

Hands-on Lab: Metrics for Regression • 30 minutes

Hands-on Lab: Metrics for Classification • 30 minutes

Hands-on Lab: Clustering • 30 minutes

Machine Learning with Apache Spark


Module details
Module 2 • 2 hours to complete

This module introduces Spark and gives an overview of its key features and applications in data engineering. You will learn how to connect
to a Spark cluster using SN Labs and explore regression (including mileage prediction), classification (including diabetes classification),
and clustering (including clustering of loaded data) with SparkML, and you will see how to construct each of these models. The module also
covers GraphFrames on Apache Spark and guides you through hands-on labs.

What's included

5 videos 2 readings 2 assignments 4 app items

5 videos • Total 31 minutes

Spark for Data Engineers • 4 minutes

Regression using SparkML • 7 minutes

Classification using SparkML • 8 minutes

Clustering using SparkML • 6 minutes

GraphFrames on Apache Spark • 4 minutes

2 readings • Total 7 minutes

Module 2 Glossary • 4 minutes

Summary and Highlights • 3 minutes

2 assignments • Total 40 minutes


Graded Quiz: Machine Learning with Apache Spark • 30 minutes

Practice Quiz: Machine Learning with Apache Spark • 10 minutes

4 app items • Total 100 minutes

Hands-on Lab: Connecting to Spark Cluster using SN Labs • 10 minutes

Hands-on Lab: Regression using SparkML • 30 minutes

Hands-on Lab: Classification using SparkML • 30 minutes

Hands-on Lab: Clustering Customer Data using SparkML • 30 minutes

Data Engineering for Machine Learning using Apache Spark


Module details
Module 3 • 3 hours to complete

This module begins with Apache Spark Structured Streaming and its role in processing streaming data with Spark SQL, including the key terms
associated with Structured Streaming. It then covers the Extract-Transform-Load (ETL) process and provides hands-on experience in moving
data from a source to a destination whose data format or structure differs. You will also gain a practical understanding of feature
extraction and transformation using Spark's feature extractors and transformers. The module then turns to machine learning pipelines in
Spark, demonstrating the process and its benefits. Lastly, you will learn the concept of model persistence and its role in machine
learning.
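
To make the ideas of an ML pipeline and model persistence concrete, here is a minimal sketch in plain Python. It is not the SparkML API and is not taken from the course labs: the class names (`StandardScaler`, `LinearRegression`, `Pipeline`), the JSON file layout, and the data are all assumptions made for illustration. It chains a feature transformation stage with a regression stage, saves the fitted parameters to disk, and reloads them without retraining.

```python
import json
import os
import tempfile
from statistics import mean, pstdev


class StandardScaler:
    """Hypothetical transformer stage: rescale one numeric feature."""

    def fit(self, xs):
        self.mean_ = mean(xs)
        self.std_ = pstdev(xs) or 1.0  # avoid dividing by zero for a constant feature
        return self

    def transform(self, xs):
        return [(x - self.mean_) / self.std_ for x in xs]


class LinearRegression:
    """Hypothetical estimator stage: one-feature least-squares fit."""

    def fit(self, xs, ys):
        x_bar, y_bar = mean(xs), mean(ys)
        sxx = sum((x - x_bar) ** 2 for x in xs)
        sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        self.slope_ = sxy / sxx if sxx else 0.0
        self.intercept_ = y_bar - self.slope_ * x_bar
        return self

    def predict(self, xs):
        return [self.intercept_ + self.slope_ * x for x in xs]


class Pipeline:
    """Runs the scaler and the regression model as one unit."""

    def __init__(self):
        self.scaler = StandardScaler()
        self.model = LinearRegression()

    def fit(self, xs, ys):
        scaled = self.scaler.fit(xs).transform(xs)
        self.model.fit(scaled, ys)
        return self

    def predict(self, xs):
        return self.model.predict(self.scaler.transform(xs))

    def save(self, path):
        # Persistence: write only the fitted parameters, not the training data.
        params = {
            "mean": self.scaler.mean_,
            "std": self.scaler.std_,
            "slope": self.model.slope_,
            "intercept": self.model.intercept_,
        }
        with open(path, "w") as f:
            json.dump(params, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            params = json.load(f)
        pipeline = cls()
        pipeline.scaler.mean_ = params["mean"]
        pipeline.scaler.std_ = params["std"]
        pipeline.model.slope_ = params["slope"]
        pipeline.model.intercept_ = params["intercept"]
        return pipeline


# Made-up training data that follows y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

pipeline = Pipeline().fit(xs, ys)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "pipeline.json")
    pipeline.save(model_path)
    restored = Pipeline.load(model_path)

original_prediction = pipeline.predict([6.0])
restored_prediction = restored.predict([6.0])
print(original_prediction, restored_prediction)
```

The restored pipeline makes the same prediction as the one that was trained, which is the point of persistence: training happens once, and the saved model can be reused later or elsewhere.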

What's included

6 videos 2 readings 2 assignments 5 app items 1 plugin

6 videos • Total 33 minutes

Spark Structured Streaming • 4 minutes

ETL Workloads • 4 minutes

Spark SQL • 4 minutes

Feature Extraction and Transformation • 7 minutes

Machine Learning Pipelines using Spark • 7 minutes

Model Persistence • 5 minutes

2 readings • Total 6 minutes

Module 3 Glossary • 4 minutes

Summary and Highlights • 2 minutes

2 assignments • Total 40 minutes

Graded Quiz: Data Engineering for Machine Learning using Apache Spark • 30 minutes

Practice Quiz: Data Engineering for Machine Learning using Apache Spark • 10 minutes

5 app items • Total 150 minutes

Hands-on Lab: ETL using Spark • 30 minutes

Hands-on Lab: Analyze a dataset using SparkSQL • 30 minutes

Hands-on Lab: Feature Extraction and Transformation Lab • 30 minutes

Hands-on Lab: Pipeline Creation using SparkML • 30 minutes


Hands-on Lab: Model Persistence • 30 minutes

1 plugin • Total 8 minutes

Reading: Data Engineering vs Machine Learning Pipelines • 8 minutes

Final Project
Module details
Module 4 • 3 hours to complete

In this module, you will apply the data engineering skills and techniques you have acquired throughout the course. The course concludes with a final
project and assignments that allow you to demonstrate your proficiency in these areas. You will step into the role of a data engineer working at a
renowned aeronautics consulting company recognized for its adeptness in handling large datasets. Your role as a data engineer is crucial as the data
scientists rely on your expertise to carry out ETL (Extract, Transform, Load) tasks and establish machine learning pipelines. While data scientists possess
expertise in machine learning, they depend on your specialized knowledge to handle various algorithms and data formats. Your contribution plays a
vital role in ensuring the smooth execution of their tasks.

Instructors
Instructor ratings 4.9 (9 ratings)

IBM Skills Network Team


IBM
49 Courses • 655,942 learners

Offered by

IBM


4.7 29 reviews

5 stars 82.75%

4 stars 6.89%

3 stars 6.89%

2 stars 0%

1 star 3.44%

BS
5 · Reviewed on Dec 19, 2023

This course gives good overview about ML with Spark.
