Spark SQL Hands-On

This document provides instructions for running SQL queries and joins with Spark SQL on data stored in Hive tables and JSON files. It shows how to: 1) create a Hive table from Spark SQL, load a text file into it, and query the table for female records with a salary below 200,000; 2) load two JSON files into DataFrames, join them on their common ssn column, and run an aggregation query on the joined data. One DataFrame is cached to compare join performance, and the query result is saved as a text file.

Uploaded by pavan kumar

Create sqlContext
---------------------
On Windows you need to create sqlContext yourself; in the Cloudera spark-shell it is already available as sqlContext.

On Windows:
scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)

Type its name to check the value:

scala> sqlContext

For this hands-on, connect to Cloudera 5.4 or later.

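----------------------------------------
1. Hive query execution on a text file
----------------------------------------
No separate HiveContext is needed in Cloudera Spark, because the spark-shell's sqlContext already has Hive support. (A plain SQLContext, such as the one created on Windows above, does not accept Hive DDL like the CREATE TABLE ... ROW FORMAT statement below; that requires a HiveContext, org.apache.spark.sql.hive.HiveContext.)
Queries are expressed in HiveQL.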

Copy hive-site.xml into the Spark conf directory so that Spark uses the Hive metastore:

cloudera> sudo -i
cloudera> cp /etc/hive/conf/hive-site.xml /etc/spark/conf

Dataset: people.txt, a text file in which each line holds information about one person, with fields separated by ':' (first, middle and last name, gender, birth date, salary and SSN).

A. Create a Hive table from Spark
--------------------------------------
scala> sqlContext.sql("""CREATE TABLE people_table (FIRST STRING, MIDDLE STRING, LAST STRING,
         GENDER STRING, BDATE STRING, SALARY DOUBLE, SSN STRING)
         ROW FORMAT DELIMITED FIELDS TERMINATED BY ':' STORED AS TEXTFILE""")
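The ROW FORMAT DELIMITED FIELDS TERMINATED BY ':' clause means each line of the file is split on ':' and the pieces are assigned to the columns in declaration order. As an illustration only, here is a plain-Scala sketch (no Spark or Hive involved) of that mapping; the names PeopleLineParser and Person and the sample line are hypothetical. A value that cannot be read as a DOUBLE becomes NULL in Hive, modelled here as None. The sketch simply rejects lines that do not have exactly seven fields, which is not how Hive treats malformed lines.

```scala
// Plain Scala, no Spark or Hive: a sketch of how one ':'-delimited line of
// people.txt lines up with the columns of people_table.
object PeopleLineParser {
  // Hypothetical record type: one field per column, in declaration order.
  final case class Person(
      first: String,
      middle: String,
      last: String,
      gender: String,
      bdate: String,
      salary: Option[Double], // None plays the role of a Hive NULL
      ssn: String
  )

  // Split on ':' keeping empty fields (limit -1), then map fields by position.
  // Lines without exactly seven fields are rejected in this sketch.
  def parse(line: String): Option[Person] = {
    val f = line.split(":", -1)
    if (f.length != 7) None
    else {
      // A salary that is not a valid number becomes None (NULL in Hive).
      val salary: Option[Double] =
        try Some(f(5).toDouble)
        catch { case _: NumberFormatException => None }
      Some(Person(f(0), f(1), f(2), f(3), f(4), salary, f(6)))
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample line, for illustration only.
    println(parse("Jane:Q:Doe:F:1980-01-01:150000:123-45-6789"))
  }
}
```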

Check that the table was created in Hive. SHOW TABLES returns a DataFrame with two columns, tableName and isTemporary:

scala> sqlContext.sql("show tables").collect().foreach(println)

Check in the Hadoop file system (HDFS):

cloudera> hadoop fs -ls /user/hive/warehouse

B. Load data into the Hive table
------------------------------
scala> sqlContext.sql("LOAD DATA LOCAL INPATH '/home/cloudera/people.txt' INTO TABLE people_table")

The result of sqlContext.sql(...) is a DataFrame (a SchemaRDD before Spark 1.3), not an RDD; call .rdd on it when you need RDD operations.

C. Query the Hive table: fetch female records whose salary is less than 2 lakhs (200,000)
------------------------------------------------------------------------------
scala> val resultdf = sqlContext.sql("""FROM people_table SELECT FIRST, SALARY, SSN
         WHERE GENDER = 'F' AND SALARY < 200000 LIMIT 10""")

show() displays up to 20 rows by default; this result has at most 10 rows because of LIMIT 10:

scala> resultdf.show()

To get the count of records:

scala> resultdf.count()

To print the schema in tree format:

scala> resultdf.printSchema()
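The query in step C filters, projects and then limits. A plain-Scala sketch of the same logic over an in-memory list (no Spark; FemaleSalaryQuery and its Person class are hypothetical stand-ins for people_table) makes the order of operations explicit. One difference: the sketch takes the first 10 matches in list order, whereas in Spark a LIMIT without ORDER BY does not guarantee which 10 rows are returned.

```scala
// Plain Scala, no Spark: the filter / projection / limit of the step C query
// applied to an in-memory list.
object FemaleSalaryQuery {
  // Hypothetical stand-in for the columns of people_table used by the query.
  final case class Person(first: String, gender: String, salary: Double, ssn: String)

  def query(people: Seq[Person]): Seq[(String, Double, String)] =
    people
      .filter(p => p.gender == "F" && p.salary < 200000) // WHERE GENDER = 'F' AND SALARY < 200000
      .map(p => (p.first, p.salary, p.ssn))             // SELECT FIRST, SALARY, SSN
      .take(10)                                         // LIMIT 10 (first 10 in list order here)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data, for illustration only.
    val sample = Seq(
      Person("Asha", "F", 150000, "111"),
      Person("Ravi", "M", 90000, "222"),
      Person("Meena", "F", 250000, "333")
    )
    query(sample).foreach(println)
  }
}
```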

-------------------------------------
2. Load JSON files and perform a join
-------------------------------------
Dataset: department.json, in which each record holds the department of a person, and people.json, in which each record holds information about a person. Spark expects each line of these files to be one complete JSON object.
File names: department.json, people.json

Loading the first JSON file
---------------------------
(Cloudera 5.4/Spark 1.3)
scala> val deptdf = sqlContext.jsonFile("file:/home/cloudera/department.json")

(Spark 1.4 and above, e.g. Spark 1.5 on Cloudera 5.5)

scala> val deptdf = sqlContext.read.json("file:/home/cloudera/department.json")

scala> deptdf.printSchema()

Verify the data:

scala> deptdf.select("ssn","dept").show()

Loading the second JSON file
----------------------------
Spark 1.3:
scala> val ppldf = sqlContext.jsonFile("file:/home/cloudera/people.json")

Spark 1.4 and above:

scala> val ppldf = sqlContext.read.json("file:/home/cloudera/people.json")

scala> ppldf.printSchema()

Join the department DataFrame with the people DataFrame on ssn:

scala> val joinresult = deptdf.join(ppldf, deptdf("ssn") === ppldf("ssn"))

scala> joinresult.select("name", "city","dept").show()
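deptdf.join(ppldf, deptdf("ssn") === ppldf("ssn")) is an inner join: only records whose ssn appears in both files survive. The joined DataFrame keeps both ssn columns, so selecting "ssn" by name from joinresult would be ambiguous (refer to deptdf("ssn") instead); the example above selects only name, city and dept. The plain-Scala sketch below (no Spark; SsnJoin, Dept and Person are hypothetical names) shows inner-join semantics by indexing one side on the key, similar in spirit to a hash join.

```scala
// Plain Scala, no Spark: the semantics of an inner equi-join on ssn.
object SsnJoin {
  // Hypothetical record shapes for department.json and people.json.
  final case class Dept(ssn: String, dept: String)
  final case class Person(ssn: String, name: String, city: String, gender: String)

  // Index one side by the join key, then look up each record of the other side.
  // Records with no match on the other side are dropped (inner join).
  def join(depts: Seq[Dept], people: Seq[Person]): Seq[(Dept, Person)] = {
    val peopleBySsn: Map[String, Seq[Person]] = people.groupBy(_.ssn)
    for {
      d <- depts
      p <- peopleBySsn.getOrElse(d.ssn, Seq.empty[Person])
    } yield (d, p)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data, for illustration only.
    val depts = Seq(Dept("1", "Doctor"), Dept("2", "Nurse"))
    val people = Seq(Person("1", "Ann", "Pune", "F"), Person("3", "Bob", "Delhi", "M"))
    join(depts, people).foreach(println)
  }
}
```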

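See the query plan (explain(true) prints the logical and physical plans):

scala> joinresult.explain(true)

Now cache the department DataFrame, then run the join again and compare the performance. cache() is lazy: the first action that computes deptdf fills the cache, and later queries read from it.
scala> deptdf.cache()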

Run the join again and check the performance:

scala> val joinresult = deptdf.join(ppldf, deptdf("ssn") === ppldf("ssn"))

scala> joinresult.select("name", "city","dept").show()

A. Find the number of female doctors in each city
--------------------------------------------------
Register the joined DataFrame as a temporary table and run a SQL query on it:

scala> joinresult.registerTempTable("people_dept")
scala> val fdocdf = sqlContext.sql("""SELECT city, count(*) AS cnt FROM people_dept
         WHERE gender = 'F' AND dept = 'Doctor' GROUP BY city ORDER BY cnt DESC LIMIT 10""")
scala> fdocdf.show()
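The aggregation query filters, groups by city, counts each group, sorts by the count in descending order and keeps the top 10. A plain-Scala sketch of the same pipeline (no Spark; FemaleDoctorsByCity and PeopleDept are hypothetical names) is shown below. The count is a Long, matching the type of count(*) in Spark SQL. SQL leaves the order of tied counts unspecified; breaking ties by city name is an assumption made here only so the sketch is deterministic.

```scala
// Plain Scala, no Spark: WHERE + GROUP BY + count + ORDER BY ... DESC + LIMIT
// applied to an in-memory list of joined records.
object FemaleDoctorsByCity {
  // Hypothetical stand-in for the people_dept columns the query uses.
  final case class PeopleDept(city: String, gender: String, dept: String)

  def topCities(rows: Seq[PeopleDept], n: Int = 10): Seq[(String, Long)] =
    rows
      .filter(r => r.gender == "F" && r.dept == "Doctor") // WHERE gender = 'F' AND dept = 'Doctor'
      .groupBy(_.city)                                     // GROUP BY city
      .map { case (city, rs) => (city, rs.size.toLong) }   // count(*) AS cnt
      .toSeq
      .sortBy { case (city, cnt) => (-cnt, city) }         // ORDER BY cnt DESC (ties by city: assumption)
      .take(n)                                             // LIMIT n

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data, for illustration only.
    val rows = Seq(
      PeopleDept("A", "F", "Doctor"),
      PeopleDept("B", "F", "Doctor"),
      PeopleDept("B", "F", "Doctor"),
      PeopleDept("B", "M", "Doctor")
    )
    topCities(rows).foreach(println)
  }
}
```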

B. Save the output as a text file
------------------------------------
First convert the DataFrame to an RDD:
scala> val fdocrdd = fdocdf.rdd

saveAsTextFile fails if the output directory already exists, so delete it from the command line first (if it exists):

cloudera> rm -r /home/cloudera/sparkout/jsonout

scala> fdocrdd.saveAsTextFile("file:/home/cloudera/sparkout/jsonout")

Check the saved files in the local filesystem:

cloudera> ls /home/cloudera/sparkout/jsonout
cloudera> cat /home/cloudera/sparkout/jsonout/part-00000
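Each line of the saved output is the toString of a Spark Row, which joins the fields with commas inside square brackets, in the form [city,cnt]. That is convenient for inspection but is not quoted CSV. The plain-Scala sketch below (no Spark; RowLineFormat is a hypothetical helper) mirrors that rendering and parses a line back into (city, count), splitting on the last comma so that a city name containing a comma still parses.

```scala
// Plain Scala, no Spark: the "[field1,field2]" text form used for each saved Row,
// and a small parser for reading a (city, cnt) line back.
object RowLineFormat {
  // Mirrors Row.toString: fields joined by ',' inside square brackets.
  def render(fields: Any*): String = fields.mkString("[", ",", "]")

  // Split on the LAST comma so a city name that itself contains a comma still parses.
  // Returns None when the line has no comma or the count is not a valid Long.
  def parseCityCount(line: String): Option[(String, Long)] = {
    val inner = line.stripPrefix("[").stripSuffix("]")
    val idx = inner.lastIndexOf(',')
    if (idx < 0) None
    else
      try Some((inner.substring(0, idx), inner.substring(idx + 1).toLong))
      catch { case _: NumberFormatException => None }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical values, for illustration only.
    val line = render("Pune", 3L)
    println(line)
    println(parseCityCount(line))
  }
}
```

If you need a cleaner file, one option is to format each Row yourself before saving, for example fdocdf.rdd.map(r => r.getString(0) + "," + r.getLong(1)).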
