
Spark


This Python script uses PySpark to read a CSV file from Cloud Storage, compute the average age per name, and write results back to Cloud Storage. It first creates a SparkSession and reads the input CSV. It then groups the rows by name, calculates the average age, and displays the result. Finally, it selects the name column, drops duplicates, and writes that table to the silver and gold layers in Cloud Storage. The step that writes to a BigQuery table is commented out.

Original Description:

Setting up a Spark environment

Original Title

spark


Uploaded by

Josue Rueda Garcia


from pyspark.sql import SparkSession
from pyspark.sql.functions import avg

# BigQuery project and dataset used by the commented-out BigQuery write at the end
project_name = 'datos-gg-qa'
dataset_name = 'serverless_spark'

# Create the SparkSession and register the BigQuery connector JAR
spark = (
    SparkSession.builder
    .appName("ETL")
    .config('spark.jars', 'gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.22.2.jar')
    .getOrCreate()
)

# Input CSV in the bronze-layer bucket
input_data = "gs://gg-bronce-qa/data.csv"

# Read the input CSV, treating the first row as a header and inferring column types
data_df = (
    spark.read.format("csv")
    .option("header", True)
    .option("inferSchema", True)
    .load(input_data)
)
data_df.printSchema()

# Group the same names together, aggregate their ages, and compute an average
avg_df = data_df.groupBy("name").agg(avg("age"))

# Show the aggregated result (one row per name)

avg_df.show()
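
To make the aggregation concrete, here is a rough plain-Python sketch of what grouping by name and averaging age computes. It is not how Spark executes the query; the sample rows and the helper name `average_age_by_name` are made up for illustration.

```python
from collections import defaultdict


def average_age_by_name(rows):
    """Return {name: mean age} for an iterable of (name, age) pairs.

    Hypothetical helper for illustration only; Spark performs this
    aggregation in a distributed way.
    """
    totals = defaultdict(lambda: [0.0, 0])  # name -> [sum of ages, count of non-missing ages]
    for name, age in rows:
        entry = totals[name]  # register the name even if its age is missing
        if age is None:
            continue  # missing ages do not contribute to the average
        entry[0] += age
        entry[1] += 1
    # A name with no usable ages gets None instead of a number
    return {
        name: (total / count if count else None)
        for name, (total, count) in totals.items()
    }


# Made-up sample rows, not taken from data.csv
sample_rows = [("Ana", 30), ("Luis", 40), ("Ana", 20), ("Luis", None)]
averages = average_age_by_name(sample_rows)
print(averages)
```

Each distinct name ends up with exactly one entry, which mirrors the one-row-per-name shape of `avg_df`.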

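# Keep only the name column; this drops the avg(age) column computed above.
# dropDuplicates() has no effect here because groupBy already yields one row per name.
avg_table = avg_df.selectExpr("name").dropDuplicates()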

# Write to the Silver/Transform layer
# (Spark writes a directory of part files at this path, not a single CSV file)
output_data_plata = "gs://gg-silver-qa/datasets/"
avg_table.write.mode("overwrite").csv(output_data_plata + 'spark_dataframe_plata.csv')

# Write to the Gold/Transform layer
output_data_gold = "gs://gg-gold-qa/datasets/"
avg_table.write.mode("overwrite").csv(output_data_gold + 'spark_dataframe_gold.csv')

# Write to BigQuery (disabled)
# avg_table.write.format('bigquery').mode("overwrite").option('table', project_name + ':' + dataset_name + '._avgage').save()
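
The commented-out line builds the destination table as `project:dataset.table` by string concatenation. Below is a small sketch of the same construction with an f-string, using the values from the script. The helper name `bigquery_table_ref` is hypothetical and is not part of the BigQuery connector.

```python
def bigquery_table_ref(project, dataset, table):
    """Build a 'project:dataset.table' string like the one passed to option('table', ...).

    Hypothetical helper; it only formats the string and does not contact BigQuery.
    """
    for part_name, part in (("project", project), ("dataset", dataset), ("table", table)):
        if not part:
            raise ValueError(f"{part_name} must be a non-empty string")
    return f"{project}:{dataset}.{table}"


# Values copied from the script
table_ref = bigquery_table_ref("datos-gg-qa", "serverless_spark", "_avgage")
print(table_ref)
```

Building the reference in one place makes a typo in the separators easier to spot than in an inline concatenation.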
