
University of Management and Technology

Machine Learning
Assignment 01 (CLO-02)
Course Instructor: Aqsa Afzal

Name: Roll Number:

Date: Dated:

Semester: Total Marks: 10

INSTRUCTIONS

* Complete the assignment in groups of two only.
* Late submissions will not be accepted under any circumstances.
* Prepare for a viva session, in which you may be questioned on the content of your assignment.
* Ensure that figures and diagrams are clearly labeled.
* Submit your assignment in PDF format and in hard copy as well.

Instructor Signature

You can get a quick recap of data wrangling from:


https://medium.com/@ms-analytics/data-wrangling-with-python-fff4d66a758e



Steps for the assignment:
* Gathering data from multiple sources
* Assessing the data visually and programmatically to identify quality and tidiness issues
* Cleaning each dataframe of all tidiness and quality issues
* Merging the three dataframes into one clean master dataframe
* Analyzing the clean data, exploring relationships, and visualizing insights
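A minimal sketch of the programmatic side of assessing and cleaning, using pandas. The helper names (`assess`, `clean`), the column names (`tweet_id`, `timestamp`, `name`) and the sample rows are hypothetical, and the two fixes shown (dropping duplicate rows, converting string timestamps to datetimes) are only examples of quality issues; the actual issues to fix are the ones identified in the notebook.

```python
import pandas as pd


def assess(df: pd.DataFrame) -> dict:
    """Programmatic quality checks: row count, duplicates, missing values, dtypes."""
    return {
        "n_rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_per_column": df.isna().sum().to_dict(),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Fix two example quality issues on a copy, leaving the input untouched."""
    cleaned = df.drop_duplicates().copy()
    # Example quality issue: timestamps stored as strings instead of datetimes.
    cleaned["timestamp"] = pd.to_datetime(cleaned["timestamp"])
    return cleaned.reset_index(drop=True)


# Hypothetical sample: one fully duplicated row and one missing name.
sample = pd.DataFrame({
    "tweet_id": [1, 2, 2, 3],
    "timestamp": ["2017-08-01 16:23:56", "2017-07-31 00:18:03",
                  "2017-07-31 00:18:03", "2017-07-30 15:58:51"],
    "name": ["Max", "Bella", "Bella", None],
})

report = assess(sample)
print(report["duplicate_rows"], report["missing_per_column"]["name"])  # 1 1
cleaned = clean(sample)
```

Keeping `assess` separate from `clean` mirrors the assess-then-clean order of the steps: running `assess` again on the cleaned dataframe is a quick way to confirm each issue is gone.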

Gathering Data
Three datasets, each in a different format, were gathered in three different ways:
1. Downloaded the [twitter_archive_enhanced.csv](https://d17h27t6h515a5.cloudfront.net/topher/2017/August/59a4e958_twitter-archive-enhanced/twitter-archive-enhanced.csv) file and read it into a pandas dataframe.
2. Programmatically downloaded the second file, 'image-predictions.tsv', from the provided [URL](https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv) using the Requests library.
3. Queried additional data from the Twitter API using the Tweepy library, saved it to a text file, 'tweet_json.txt', and read that file line by line into a pandas dataframe.
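The gathering steps above can be sketched roughly as follows. This is a minimal sketch, not the notebook's own code: the helper names (`download_file`, `parse_tweet_lines`, `read_tweet_json`) and the placeholder `IMAGE_PREDICTIONS_URL` are hypothetical, and the JSON field names `id`, `retweet_count` and `favorite_count` are assumed to match the tweet objects saved from the Twitter API. The Tweepy query itself is omitted because it needs Twitter API credentials.

```python
import json
from typing import Iterable

import pandas as pd


def download_file(url: str, path: str) -> None:
    """Download `url` with the Requests library and save the raw bytes to `path`."""
    import requests  # imported lazily so the parsing helpers work without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error page
    with open(path, "wb") as f:
        f.write(response.content)


def parse_tweet_lines(lines: Iterable[str]) -> pd.DataFrame:
    """Parse one JSON tweet object per line, keeping a few assumed fields."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        tweet = json.loads(line)
        rows.append({
            "tweet_id": tweet["id"],  # assumed field names
            "retweet_count": tweet["retweet_count"],
            "favorite_count": tweet["favorite_count"],
        })
    return pd.DataFrame(rows)


def read_tweet_json(path: str) -> pd.DataFrame:
    """Read a file such as tweet_json.txt line by line into a dataframe."""
    with open(path, encoding="utf-8") as f:
        return parse_tweet_lines(f)


# Example usage (needs network access and the downloaded files);
# IMAGE_PREDICTIONS_URL is a placeholder for the URL given above.
# download_file(IMAGE_PREDICTIONS_URL, "image-predictions.tsv")
# predictions = pd.read_csv("image-predictions.tsv", sep="\t")
# archive = pd.read_csv("twitter-archive-enhanced.csv")
# api_data = read_tweet_json("tweet_json.txt")
```

The image predictions file is tab-separated, which is why `sep="\t"` is passed to `pd.read_csv`. Renaming the API's `id` field to `tweet_id` while parsing gives all three dataframes a common key for the later merge.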

The three dataframes were then merged into a single master dataframe, stored as twitter_archive_master.csv.
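The merge into a master dataframe might look like the sketch below. It assumes all three dataframes share a `tweet_id` column (with the API data's `id` renamed accordingly) and uses an inner join, so only tweets present in all three sources are kept; the handout does not say which join type the notebook uses. The helper `build_master` and the sample columns are hypothetical.

```python
import pandas as pd


def build_master(archive, predictions, api_data, path=None):
    """Inner-join the three dataframes on tweet_id; optionally save as CSV."""
    master = (
        archive
        .merge(predictions, on="tweet_id", how="inner")
        .merge(api_data, on="tweet_id", how="inner")
    )
    if path is not None:
        master.to_csv(path, index=False)  # e.g. "twitter_archive_master.csv"
    return master


# Hypothetical sample data; tweet 2 has no image prediction.
archive = pd.DataFrame({"tweet_id": [1, 2, 3], "rating_numerator": [13, 12, 11]})
predictions = pd.DataFrame({"tweet_id": [1, 3], "p1": ["golden_retriever", "pug"]})
api_data = pd.DataFrame({"tweet_id": [1, 2, 3], "favorite_count": [20, 7, 3]})

master = build_master(archive, predictions, api_data)
print(master)
```

An inner join is a conservative choice: a tweet missing from the image predictions or the API data cannot contribute a complete row to the analysis. Passing `how="left"` instead would keep every archived tweet, with missing values where the other sources have no match.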

Most of the work has already been done; you only need to analyze the code and follow the text and comments provided.

* Export the Python notebook as a PDF and submit it. If you use VS Code, install Markdown support (for comments, text, and PDF export).
* Do not include long outputs in the hard copy.
* Outputs for the issues mentioned in the notebook must be visible.

Project Credit: Nohmie Aguga

