Big Data Thouraya Hadj Hassen SIC

Hadj hassen thouraya 3SIC

1. Creating a VM:

2. Accessing the VM terminal on port 4200:

3. Accessing the Ambari dashboard after resetting the login and password to admin/admin:

4. Creating the data folder in /tmp/ and adding the datasets “truck.csv” and “geolocation.csv”:

5. Entering the Hive interface and uploading “geolocation.csv”:

We have to tick the “is first row header?” option so that the first row is read as column names rather than as a data row.
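
To illustrate why this option matters, here is a minimal plain-Python sketch (it is not the Hive upload itself). The sample text and its column names are hypothetical and chosen only for illustration; the real file has its own columns. With the flag off, the header line ends up counted as a data row.

```python
import csv
import io

# Hypothetical two-column sample, not the actual contents of geolocation.csv.
SAMPLE = "truckid,city\nA1,Tunis\nA2,Sfax\n"

def read_rows(text, first_row_is_header):
    """Return (column_names, data_rows) parsed from CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if first_row_is_header:
        # The first row supplies the column names; the rest is data.
        return rows[0], rows[1:]
    # Without a header, generate placeholder names and keep every row as data.
    names = [f"column{i + 1}" for i in range(len(rows[0]))]
    return names, rows

with_header = read_rows(SAMPLE, True)
without_header = read_rows(SAMPLE, False)
```

In the second call the line `truckid,city` is kept as an ordinary row, which is the kind of problem the checkbox avoids.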

This is a preview of each column and its associated type:

(We do the same with “truck.csv”.)

6. Sample Data from the trucks table:

7. Beeline command shell: connect to Hive through Beeline and enter the Beeline commands that grant all permissions to the “admin” user:

Enter the Beeline commands to view 10 rows from the customer and account tables of the foodmart database:

8. Create Table truckmileage From Existing Trucking Data:

9. Explore a sampling of the data in the truckmileage table:

Saving the query used as “average-mpg”:



10. Explore Explain Features of the Hive Query Editor:

11. Create Table avgmileage From Existing trucks_mileage Data:
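
The query used for this step is not reproduced in the report. As a rough illustration of the aggregation that a table like avgmileage represents, the plain-Python sketch below groups rows by truck and averages their mileage. The column names `truckid` and `mpg` and the sample values are assumptions, not data from the lab.

```python
from collections import defaultdict

# Hypothetical rows; the column names truckid and mpg are assumptions.
truckmileage = [
    {"truckid": "A1", "mpg": 5.0},
    {"truckid": "A1", "mpg": 7.0},
    {"truckid": "A2", "mpg": 6.0},
]

def average_mpg(rows):
    """Group rows by truckid and average the mpg values of each group."""
    totals = defaultdict(lambda: [0.0, 0])  # truckid -> [sum of mpg, row count]
    for row in rows:
        acc = totals[row["truckid"]]
        acc[0] += row["mpg"]
        acc[1] += 1
    return {truck: total / count for truck, (total, count) in totals.items()}

avgmileage = average_mpg(truckmileage)
```

In Hive the same idea would typically be expressed as a GROUP BY with an average over the mileage column.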

12. Create Table DriverMileage from Existing truckmileage Data and view Sample Data of avgmileage (the screenshot of the creation of “drivemillage” was not taken):

13. Exporting drivemillage to a .csv file:

14. Open the Zeppelin interface using the URL http://40.84.192.134:9995/, then create a Spark2 notebook and initiate the instance:

15. Read CSV Files into Apache Spark:



16. Import CSV data into a data frame with a user defined schema:
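
The notebook code for this step is not shown in the report. The idea of a user defined schema, where column names and types are declared up front instead of inferred, can be sketched in plain Python as follows. The schema entries, the sample text, and the choice of a headerless input are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical schema: column name -> converter. These names are assumptions.
SCHEMA = {"driverid": str, "totmiles": int, "mpg": float}

def read_with_schema(text, schema):
    """Parse headerless CSV text, naming and typing columns from the schema."""
    names = list(schema)
    typed_rows = []
    for raw in csv.reader(io.StringIO(text)):
        if len(raw) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(raw)}")
        # Apply each column's converter to the matching raw string value.
        typed_rows.append({n: schema[n](v) for n, v in zip(names, raw)})
    return typed_rows

rows = read_with_schema("D1,1200,5.5\nD2,800,6.0\n", SCHEMA)
```

Declaring the types explicitly means a malformed value fails at load time rather than silently becoming a string column.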

17. Query Tables To Build Spark RDD:

18. Querying Against Registered Temporary Tables:



19. Perform join Operation:
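
The join performed in the notebook is not reproduced here. As a minimal sketch of what an inner join on a shared key does, the plain-Python example below combines two small tables. The table contents, the key `driverid`, and the other field names are hypothetical.

```python
# Hypothetical tables; the join key "driverid" and other fields are assumptions.
events = [
    {"driverid": "D1", "occurrences": 3},
    {"driverid": "D2", "occurrences": 1},
]
mileage = [
    {"driverid": "D1", "totmiles": 1200},
    {"driverid": "D3", "totmiles": 500},
]

def inner_join(left, right, key):
    """Hash join: index the right table by key, then probe with each left row."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            joined.append({**lrow, **rrow})
    return joined

joined = inner_join(events, mileage, "driverid")
```

Only D1 appears in both tables, so it is the only driver kept in the result; D2 and D3 are dropped, as in any inner join.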



20. Compute Driver Risk Factor:
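
The report does not state the formula used for the risk factor. Purely as an assumption for illustration, the sketch below defines it as total miles driven divided by the number of recorded risky events per driver; the driver ids and values are hypothetical.

```python
def risk_factor(total_miles, occurrences):
    """Miles per risky event (assumed definition); None when no events exist."""
    if occurrences == 0:
        return None  # avoid division by zero; the ratio is undefined here
    return total_miles / occurrences

# Hypothetical drivers: (driver id, total miles, number of risky events).
drivers = [("D1", 1200, 3), ("D2", 800, 0)]
report = {d: risk_factor(miles, n) for d, miles, n in drivers}
```

Under this assumed definition a driver with no recorded events has no defined ratio, which the sketch reports as `None` instead of raising an error.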

21. Data Reporting With Zeppelin: Import the Data and Visualize final results Data in Tabular Format:

22. Build Charts using Zeppelin:
