
Get Started with PySpark Programming

©2022 Databricks Inc. — All rights reserved 1


Module Agenda
Get Started with PySpark Programming

Spark SQL Overview


DE 0.1 - Spark SQL
DE 0.2L - Spark SQL Lab
DE 0.3 - DataFrame & Column
DE 0.4L - Purchase Revenues Lab
DE 0.5 - Aggregation
DE 0.6L - Revenue by Traffic Lab



Spark SQL Overview



Spark SQL is a module for structured data processing with multiple interfaces:

SQL
DataFrame API (Python, Scala, Java, R)



The same Spark SQL query can be expressed with SQL and the DataFrame API

SQL:
    SELECT id, result
    FROM exams
    WHERE result > 70
    ORDER BY result

DataFrame API (Python):
    (spark.table("exams")
        .select("id", "result")
        .where("result > 70")
        .orderBy("result"))



Spark SQL executes all queries on the same engine

Diagram: SQL queries, the Python DataFrame API, and the Scala DataFrame API all feed into Query Plans → RDDs → Execution



Spark SQL optimizes queries before execution

Diagram: Query Plan → Optimized Query Plan → RDDs → Execution
