Welcome to Scribd!

Skip carousel

Project 3 - Social Media

Uploaded by

kumar

0% found this document useful (0 votes)

26 views3 pages

Project 3

Original Title

Project 3_Social Media

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Project 3

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

0% found this document useful (0 votes)

26 views3 pages

Project 3 - Social Media

Uploaded by

kumar

Project 3

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

Jump to Page

You are on page 1of 3

Search inside document

Big Data Hadoop—Real World

Project—Social Media

Page |0
Analyze data set from Stack Exchange

As part of a recruiting exercise, a major social media company asked candidates to analyze data set from
Stack Exchange.

You will be using the data set to arrive at certain key insights.

Download the data set from the following link:

The data set contains the following attributes:

qid: Unique question id

i: User id of questioner

qs: Score of the question

qt: Time of the question (in epoch time)

tags: a comma-separated list of the tags associated with the question. Examples of
tags are ``html'', ``R'', ``mysql'', ``python'', and so on; often between two and six tags are
used on each question.

qvc: Number of views of this question (at the time of the datadump)

qac: Number of answers for this question (at the time of the datadump)

aid: Unique answer id

j: User id of answerer

as: Score of the answer

at: Time of the answer (in epoch time)

You need to arrive at the following results:

- Top 10 most commonly used tags in this data set.
- Average time to answer questions.
- Number of questions which got answered within 1 hour.
- Tags of questions which got answered within 1 hour.

The total time expected to complete this task is 8 hours.

Hint: Use the techniques taught in lesson 03 to load data sets in HDFS. For using Pig/Hive, recall the
techniques mentioned in Lesson 07 and 08.

This project should be performed using CloudLab.

Data Science Life Cycle PDF
Document403 pages
Data Science Life Cycle PDF
ragh K
No ratings yet
Semantic Search On Stack Overflow
Document3 pages
Semantic Search On Stack Overflow
rohit
No ratings yet
Apache Spark-Real Time Project-Marketing Analysis
Document4 pages
Apache Spark-Real Time Project-Marketing Analysis
kumar
No ratings yet
Professional Data Engineer Demo
Document32 pages
Professional Data Engineer Demo
Domenico Conte
No ratings yet
DP-600-questions
Document10 pages
DP-600-questions
ali8z88186
No ratings yet
Assignment 4
Document5 pages
Assignment 4
Ahmed Haa
No ratings yet
Interview QuesTextJAVA
Document2 pages
Interview QuesTextJAVA
Utsav Galphat
No ratings yet
Note: Observe The Following: Task Should Be Reached by Allotted Time (Time Management Is Mandatory)
Document8 pages
Note: Observe The Following: Task Should Be Reached by Allotted Time (Time Management Is Mandatory)
sati
No ratings yet
Data Mining Thesis 2014 PDF
Document6 pages
Data Mining Thesis 2014 PDF
znpdasmef
100% (2)
Ip Xii 2024-25
Document4 pages
Ip Xii 2024-25
alphabeast239
No ratings yet
Lecture No. 04 Data Warehouse
Document9 pages
Lecture No. 04 Data Warehouse
usman
No ratings yet
Solved Big Data and Data Science Projects
Document85 pages
Solved Big Data and Data Science Projects
m shadrack
100% (1)
Big Data Storge Management
Document4 pages
Big Data Storge Management
V Harsha Shastri
No ratings yet
Cognizant Interview Process
Document9 pages
Cognizant Interview Process
sumit mali
No ratings yet
BD Project Document
Document3 pages
BD Project Document
طاهر محمد
No ratings yet
Syllabus
Document3 pages
Syllabus
Akshat 31
No ratings yet
Dart Dissertations
Document6 pages
Dart Dissertations
HelpInWritingPaperSingapore
100% (1)
Revised Informatics Practices Syllabus 2020-21
Document4 pages
Revised Informatics Practices Syllabus 2020-21
Tanishq Rai
No ratings yet
Real Time Data Processing Framework
Document16 pages
Real Time Data Processing Framework
AISHWARYA ANILKUMAR
No ratings yet
Data Science Interview Resources
Document12 pages
Data Science Interview Resources
Krutika Sapkal
No ratings yet
HACKING RST
Document6 pages
HACKING RST
Rich Stanley
No ratings yet
The Complete Collection of Data Science Cheatsheets KDnuggets
Document17 pages
The Complete Collection of Data Science Cheatsheets KDnuggets
gamajima2023
100% (1)
RN 2103213618 1 MT 690966-638031645126201096-Big-Data-Question
Document6 pages
RN 2103213618 1 MT 690966-638031645126201096-Big-Data-Question
Alex Munala
No ratings yet
Building Arduino Projects For The Internet of Things
Document5 pages
Building Arduino Projects For The Internet of Things
santhosh n prabhu
No ratings yet
Keynote Edbt2014 Boncz
Document76 pages
Keynote Edbt2014 Boncz
xu fei
No ratings yet
5 Data Enginnering Projefct
Document9 pages
5 Data Enginnering Projefct
Jose Luis Colmenares
No ratings yet
Servicenow Application Developer Exam New-Practice Test Set 2
Document30 pages
Servicenow Application Developer Exam New-Practice Test Set 2
Apoorv Diwan
No ratings yet
Rubibenuboduvoz
Document3 pages
Rubibenuboduvoz
Aman Singh
No ratings yet
Struct Assignment in C
Document4 pages
Struct Assignment in C
cjceszkv
100% (1)
Question Recommendation On The Stack Overflow Network: Jacob Perricone
Document8 pages
Question Recommendation On The Stack Overflow Network: Jacob Perricone
batman
No ratings yet
Ip 12
Document4 pages
Ip 12
sahusoubhagyaranjan72
No ratings yet
CT305-N-IT Workshop (Web Designing)
Document5 pages
CT305-N-IT Workshop (Web Designing)
Jigar
No ratings yet
Data Lake Analytics Program External 15 Apr
Document13 pages
Data Lake Analytics Program External 15 Apr
Anju Singh
No ratings yet
A To Z Preparation Guide For Code With Cisco by Vikram
Document17 pages
A To Z Preparation Guide For Code With Cisco by Vikram
algorithmsarchitecture
No ratings yet
Thesis Proposal On Data Mining
Document5 pages
Thesis Proposal On Data Mining
katyallenrochester
100% (2)
Becominghuman - Ai-Cheat Sheets For AI Neural Networks Machine Learning Deep Learning Amp BignbspData
Document24 pages
Becominghuman - Ai-Cheat Sheets For AI Neural Networks Machine Learning Deep Learning Amp BignbspData
Taasiel Julimamm
100% (1)
Servicenow Application Developer Exam New-Practice Test Set 4
Document31 pages
Servicenow Application Developer Exam New-Practice Test Set 4
Apoorv Diwan
No ratings yet
Datastage Scenario Based Interview Questions and Answers For Experienced
Document4 pages
Datastage Scenario Based Interview Questions and Answers For Experienced
Saravanakumar V
No ratings yet
Big Data Processing: Jiaul Paik
Document47 pages
Big Data Processing: Jiaul Paik
Momin Subur
No ratings yet
Information Retrieval Master Thesis
Document7 pages
Information Retrieval Master Thesis
afcmunxna
100% (2)
IMDB Scraping & Analysis
Document5 pages
IMDB Scraping & Analysis
varun goel
No ratings yet
Downloadable: Cheat Sheets For AI, Neural Networks, Machine Learning, Deep Learning & Data Science PDF
Document34 pages
Downloadable: Cheat Sheets For AI, Neural Networks, Machine Learning, Deep Learning & Data Science PDF
fekoy61900
No ratings yet
Seminar 1015 Twhuang
Document44 pages
Seminar 1015 Twhuang
f f
No ratings yet
AIML PGCP Project B21
Document6 pages
AIML PGCP Project B21
Murtuza Khan
No ratings yet
Xii Ip Study Material
Document92 pages
Xii Ip Study Material
debasissahu09471
No ratings yet
CS614 Final Term Solved Subjectives With Referencesby Moaaz
Document27 pages
CS614 Final Term Solved Subjectives With Referencesby Moaaz
chi
No ratings yet
Kamakhya Prasad Singh: Pep 8, Pyflakes
Document5 pages
Kamakhya Prasad Singh: Pep 8, Pyflakes
Vipin Handigund
No ratings yet
Topic Analysis Presentation
Document23 pages
Topic Analysis Presentation
Nader AlFakeeh
No ratings yet
KTC Hiring Task (SDE Intern)
Document1 page
KTC Hiring Task (SDE Intern)
Sambuddha Koner
No ratings yet
ITECH2302 MainAssessment Report
Document8 pages
ITECH2302 MainAssessment Report
sedobi1512
No ratings yet
Data Engineering Notes
Document4 pages
Data Engineering Notes
pranay mahindrakar
No ratings yet
Kadi Sarva Vishwavidyalaya: LDRP Institute of Technology and Research Gandhinagar
Document44 pages
Kadi Sarva Vishwavidyalaya: LDRP Institute of Technology and Research Gandhinagar
Himanshu M
No ratings yet
Data Analytics Lifecycle
Document51 pages
Data Analytics Lifecycle
Bhagya Patil
No ratings yet
Primary Skills: Java, Python, Django Framework, Django Rest Framework, Angular2, Mysql, Teradata
Document2 pages
Primary Skills: Java, Python, Django Framework, Django Rest Framework, Angular2, Mysql, Teradata
Jayachandra Jc
No ratings yet
KDnuggets The Complete Collection of Data Science Cheatsheets
Document17 pages
KDnuggets The Complete Collection of Data Science Cheatsheets
Bala Krishnan
No ratings yet
CSE508: Information Retrieval Assignment 2: Question 1 - (40 Points) Scoring and Term-Weighting
Document3 pages
CSE508: Information Retrieval Assignment 2: Question 1 - (40 Points) Scoring and Term-Weighting
Pranshu Patel
No ratings yet
Data Scientist Interview Questions
Document2 pages
Data Scientist Interview Questions
Khadija Lasri
No ratings yet
GreyAtom FSDSE Brochure PDF
Document25 pages
GreyAtom FSDSE Brochure PDF
moronkreacionz
No ratings yet
Question Answering Systems For Customer Relationship Management
Document6 pages
Question Answering Systems For Customer Relationship Management
Ause El
No ratings yet
Selenium Interview Question and Answer Part 2
Document3 pages
Selenium Interview Question and Answer Part 2
vijaygokhul
No ratings yet
Mastering ServiceStack
From Everand
Mastering ServiceStack
Niedermair Andreas
No ratings yet
ProdCat Tables 3-6 PDF
Document57 pages
ProdCat Tables 3-6 PDF
kumar
No ratings yet
TL 9000 Quality Management System Measurements Handbook: Release 3.0
Document168 pages
TL 9000 Quality Management System Measurements Handbook: Release 3.0
kumar
No ratings yet
Business Continuity Planning Final Internal Audit Report - Appendix 2 PDF
Document13 pages
Business Continuity Planning Final Internal Audit Report - Appendix 2 PDF
kumar
No ratings yet
HOLYSPIRITVERSES
Document1 page
HOLYSPIRITVERSES
kumar
No ratings yet
IMBD-Real-Time Project - Movie Review Analysis
Document2 pages
IMBD-Real-Time Project - Movie Review Analysis
kumar
No ratings yet
Business Continuity Planning and Disaster Recovery Planning - Part 5 PDF
Document38 pages
Business Continuity Planning and Disaster Recovery Planning - Part 5 PDF
kumar
100% (1)
Project 3 - Solution - Social Media
Document5 pages
Project 3 - Solution - Social Media
kumar
No ratings yet
Feb-2017 - New 200-125 Exam Dumps With PDF and VCE Download PDF
Document4 pages
Feb-2017 - New 200-125 Exam Dumps With PDF and VCE Download PDF
kumar
No ratings yet
2.4 Example+1+-+part+4
Document3 pages
2.4 Example+1+-+part+4
kumar
No ratings yet
2.3 Example+1+-+part+3
Document4 pages
2.3 Example+1+-+part+3
kumar
No ratings yet
2.2 Example+1+-+part+2
Document3 pages
2.2 Example+1+-+part+2
kumar
No ratings yet
Large Union Ltd. (Lul) - Eu Sme Co. Example
Document3 pages
Large Union Ltd. (Lul) - Eu Sme Co. Example
kumar
No ratings yet
Large Union Ltd. (Lul) - Eu Sme Co. Example
Document3 pages
Large Union Ltd. (Lul) - Eu Sme Co. Example
kumar
No ratings yet
Information Security Management Systems Assessment Test
Document3 pages
Information Security Management Systems Assessment Test
kumar
50% (2)