Spark and Scala Course


Scala & Spark

Scala: "Red Hot" Programming Language for Apache Spark

According to a recent survey by Databricks, 71% of Spark users also use the Scala programming language.

SCALA
01 Getting Started with Scala
Scala background, Scala vs. Java, and basics
Interactive Scala: REPL, data types, variables, expressions, simple functions
Running a program with the Scala compiler
Exploring the type lattice and using type inference
Defining methods and pattern matching

02 Scala Environment Setup
Scala setup on Windows and UNIX
Java setup
Scala editor
Interpreter
Compiler

03 Functional Programming
What is functional programming?
Differences between object-oriented programming (OOP) and functional programming (FP)

04 Collections
Iterating, mapping, filtering, and counting
Regular expressions and matching with them
Maps, Sets, groupBy, Options, flatten, flatMap
Word count, IO operations, file access, flatMap

05 Object-Oriented Programming
Classes and properties
Objects, packaging, and imports
Traits
Objects, classes, inheritance, lists with multiple related types, apply

06 Deep Dive into Scala - 1
Benefits of Scala
Language offerings
Type inference
Variables
Functions
Loops
Control structures
Vals
Arrays
Lists

07 Deep Dive into Scala - 2
Tuples
Maps
Sets
Traits and mixins
Classes and objects
First-class functions
Closures
Inheritance
Subclasses
Case classes
Modules
Pattern matching
Exception handling
File operations

08 Integrations
What is SBT?
Integration of Scala in the Eclipse IDE
Integration of SBT with Eclipse

09 Git
Introduction to Git and installation
Comparisons, branching, and merging
Rebasing, stashing, and tagging

10 Spark and Hadoop
What is the Hadoop platform?
Why the Hadoop platform?
What is Spark?
Why Spark?
Evolution of Spark
Hadoop vs. Spark (Spark benefits)
Architecture of Spark
Spark components
Lazy evaluation

SPARK

11 Environment
Configuring Apache Spark
spark-shell
spark-submit
Setting up memory (driver memory, executor memory)
Setting up cores (executor cores)
Running Spark in local mode
Spark UI explanation

12 YARN and Cluster Framework
Overview of YARN and the cluster framework
Setting up YARN and the cluster
Benefits of running Spark jobs in cluster mode instead of local mode
Fine-tuning memory while running a Spark job in cluster mode

13 Programming Magic with RDD
Hadoop MapReduce vs. Spark RDD
Benefits of RDD over Hadoop MapReduce
RDD overview
Transformations and actions in the context of RDDs
Demonstration of each RDD API with real-time examples (e.g., cache, uncache, count, filter, map)
Checkpointing in RDDs

14 Minimize Data Transfers
Concepts of broadcast variables
Concepts of accumulators
Repartition concepts

15 Magic with Data Frames
Overview of data frames
Reading CSV/Excel files and creating a data frame
Cache/uncache operations on data frames
Persist/unpersist operations on data frames
Partition and repartition concepts of data frames
foreachPartition on data frames
Programming using data frames
How to use the data frame APIs effectively
A magic Spark job using the data frame concept (small project)
Defining a schema on a data frame
How to perform SQL operations on a data frame
Checkpointing in data frames
StructType and ArrayType in data frames
Complex data structures in data frames

16 Various Data Sources
CSV files
Excel files
JSON files
Parquet files
Benefits of Parquet files
Text files

17 Various Levels of Persistence
MEMORY_ONLY
MEMORY_ONLY_SER
MEMORY_AND_DISK
MEMORY_AND_DISK_SER
DISK_ONLY
OFF_HEAP

18 User-Defined Functions
Benefits of UDFs over SQL
Writing UDFs and applying them to a data frame
Complex UDFs
Data cleaning using UDFs

19 Connecting Spark with S3
Connecting Spark with S3
Reading a file from S3 and performing transformations
Writing a file to S3
Preparation and close while writing the file to S3

20 Cassandra Database
Overview of the Cassandra database and its benefits
Partition key and collection concepts in Cassandra
Connecting Cassandra with Spark
Reading a table from Cassandra and performing transformations
Writing data to a Cassandra table with millions of rows

21 Redis
Overview of Redis
How to connect Spark with Redis
Collection concepts of Redis
Reading keys, hash keys, and sets from Redis and operating on them in Spark
Writing various keys to Redis using Spark

22 Spark SQL
Overview of Spark SQL
How to write SQL in Spark
Various types of clauses in Spark SQL
Using UDFs inside Spark SQL
SQL fine-tuning using Spark

23 Data Cleaning
What are the data column types?
How many fields match the data type?
How many fields are mismatches?
Which fields match?
Which fields are mismatches?

24 Spark MLlib
Introduction to machine learning and its benefits
Introduction to the Spark MLlib library
Vectors, decision trees, and matrix concepts
Classification and regression
Correlations and stratified sampling concepts
Explanation of various algorithms, such as K-means and Gaussian mixture models (GMMs)

25 Spark Streaming
Overview of Spark Streaming
Concepts of input DStreams and receivers
Transformations on DStreams
Window operations

Case Studies and Live Project on Spark
