Spark Introduction
• RDD
a = sc.parallelize([1, 2, 3, 4])
• DataFrame
df = a.map(lambda x: (x,)).toDF(['a'])
START SPARK ENGINE
• Basic configuration
LOAD DATA
• From csv
• Spark UI for monitoring jobs: http://localhost:4040
SCHEMA AND COLUMNS
• Rename columns
DATAFRAME EXPLORATION
• filter
• aggregation
• join
• pivot
USER DEFINED FUNCTION (UDF)
TEMPORARY TABLE
EXPORT RESULT
• Save to csv
• Save to parquet
• Result folder
STOP SPARK ENGINE
• spark.stop()