Lab 2 - Data Preparation

The document discusses preparing data in Apache Spark. It covers steps to install Spark, load a CSV data file, examine the data schema and types, select columns, filter rows, handle null values, and describe statistics of the data. The goal is to clean and prepare the telecom usage data for further analysis and modeling.

Uploaded by

Muhammad Rafli

1/18/2021 Lab 2 - Data Preparation 1.ipynb - Colaboratory

Lab 2 - Data Preparation 1

#1. Install Apache Spark

!apt-get install openjdk-8-jdk-headless -qq > /dev/null
# Older Spark releases are moved to archive.apache.org/dist/spark/; adjust the URL if this download fails
!wget -q https://downloads.apache.org/spark/spark-2.4.7/spark-2.4.7-bin-hadoop2.6.tgz
!tar xf spark-2.4.7-bin-hadoop2.6.tgz
!pip install -q findspark

#2. Set environment variables

import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.4.7-bin-hadoop2.6"

#3. Initialize Spark

import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()

#4. Upload file

from google.colab import files
# Remove any previous copy; -f avoids an error when the file does not exist yet
!rm -f data_telepon_seluler.csv
files.upload()

#5. Load data

from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
# The source line was cut off after sep=", ; the closing quote and parenthesis are assumed
dataset = spark.read.csv('data_telepon_seluler.csv', inferSchema=True, header=True, sep=",")
dataset.printSchema()
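
`inferSchema=True` asks Spark to scan the file and guess a type for each column. The plain-Python sketch below illustrates the general idea only. It is a simplification, not Spark's actual algorithm, which considers more types (longs, booleans, timestamps and so on). The helper name `infer_column_type` and the sample values are hypothetical.

```python
# Simplified illustration of per-column type inference on CSV text values.
# NOTE: this is NOT Spark's implementation; it only mimics the idea that a
# column gets the narrowest type that fits every non-empty value.

def infer_column_type(values):
    """Return 'int', 'double', or 'string' for a list of raw CSV strings."""
    non_empty = [v for v in values if v != ""]   # empty fields are treated as nulls
    if not non_empty:
        return "string"

    def all_parse(caster):
        # True only if every non-empty value can be converted by `caster`
        try:
            for v in non_empty:
                caster(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "int"
    if all_parse(float):
        return "double"
    return "string"

# Hypothetical sample columns
print(infer_column_type(["10", "25", ""]))   # int
print(infer_column_type(["10.5", "3"]))      # double
print(infer_column_type(["10", "n/a"]))      # string
```

A single non-numeric token such as "n/a" is enough to make the whole column a string, which is one reason a year column can end up needing an explicit cast.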

#6. Display data

dataset.show()       # first 20 rows
#dataset.head()      # first row only (head() without an argument returns a single Row)
#dataset.first()     # first row
#dataset.head(10)    # first 10 rows

#7. Check the data type

type(dataset)

#8. Display data


# collect(): return all rows as a list of Row objects (values plus column names)

dataset.select('*').collect()
dataset.select('provinsi', '2012').collect()
# show(): print the data only
dataset.select('*').show()
dataset.select('provinsi', '2012').show()
# take(n): return only the first n rows as Row objects
dataset.select('*').take(5)
dataset.select('provinsi', '2012').take(5)

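#9. Check a column's data type

# In a notebook cell this expression displays the column name and type, e.g. DataFrame[provinsi: ...]
dataset.select('provinsi')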

#10. Distinct
dataset.select('provinsi', '2012').distinct().show()

#11. Display the list of columns

dataset.columns

#12. Display the first three columns

dataset.select(dataset.columns[0:3]).show()

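#13. Display rows and read single values

dataset.show(2, truncate=True)      # first 2 rows, long values truncated
X = dataset.collect()[0]['2014']    # first row, accessed by column name
X = dataset.collect()[0][3]         # first row, accessed by position (fourth column)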

#14. Display a subset of the data

selected_columns = ["provinsi", "kode_wilayah", "2012"]
subset_df_2 = dataset.select(*selected_columns)   # unpack the list into column arguments
subset_df_2.head()                                # first row of the subset

#15. Filtering
# filter() is lazy; show() triggers the computation and prints the matching rows
dataset.filter("provinsi = 'DI YOGYAKARTA'").show()
dataset.filter("provinsi in ('DI YOGYAKARTA')").show()

#16. Display null data

dataset.where(dataset["2012"].isNull()).show()          # rows where 2012 is null
dataset.where(dataset["2012"].isNotNull()).show(999)    # up to 999 rows where 2012 is not null

#17. Display the data shape (rows, columns)

print((dataset.count(), len(dataset.columns)))

#18. Display summary statistics

dataset.describe().show()
dataset.describe("2012").show()
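
`describe()` reports count, mean, stddev, min and max. As a rough plain-Python sketch of what those figures mean for one numeric column (the values are made up; nulls are skipped and the standard deviation is the sample one):

```python
import statistics

# Hypothetical values for one numeric column; None stands for a null cell.
column_2012 = [120, 95, None, 140, 105]

non_null = [v for v in column_2012 if v is not None]

summary = {
    "count": len(non_null),                 # nulls are not counted
    "mean": statistics.mean(non_null),
    "stddev": statistics.stdev(non_null),   # sample standard deviation (divides by n - 1)
    "min": min(non_null),
    "max": max(non_null),
}

for name, value in summary.items():
    print(name, value)
```

A count lower than the number of rows in the DataFrame is therefore a quick hint that the column contains nulls.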

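#19. Change column types

dataset.createOrReplaceTempView("tmpprov")
# Backticks quote column names; single quotes would make '2012' a string literal,
# so int('2012') would return the constant 2012 instead of the column's values.
df4 = spark.sql("SELECT provinsi, int(`2012`) AS `2012`, int(`2013`) AS `2013`, int(`2014`) AS `2014` FROM tmpprov")
dataset.printSchema()
df4.printSchema()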
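
In SQL, single quotes denote a string literal, so casting `'2012'` converts the text "2012" rather than the column named 2012. The effect can be reproduced without Spark. The sketch below uses Python's built-in `sqlite3` as a stand-in engine; note that this is SQLite, not Spark SQL, and that SQLite quotes identifiers with double quotes where Spark SQL uses backticks. The inserted row is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in table; only the names tmpprov, provinsi and 2012 follow the lab.
conn.execute('CREATE TABLE tmpprov (provinsi TEXT, "2012" TEXT)')
conn.execute("INSERT INTO tmpprov VALUES ('DI YOGYAKARTA', '10')")

literal_cast, column_cast = conn.execute(
    """SELECT CAST('2012' AS INTEGER),   -- string literal: always 2012
              CAST("2012" AS INTEGER)    -- quoted identifier: the column's value
       FROM tmpprov"""
).fetchone()

print(literal_cast)  # 2012
print(column_cast)   # 10
conn.close()
```

If every row of a cast column shows the same value that happens to equal the column name, the query is most likely casting a literal.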

https://colab.research.google.com/drive/1fJ3YNoDYuQvEV8bcdd5xTkacPysJvFXb#scrollTo=TLrIqNPgcz4B&printMode=true
