Assignment 1

Uploaded by

sindhuk55

0% found this document useful (0 votes)

1 views3 pages

Copyright

Available Formats

DOCX, PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as docx, pdf, or txt

0% found this document useful (0 votes)

1 views3 pages

Assignment 1

Uploaded by

sindhuk55

Copyright:

Available Formats

Download as DOCX, PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as docx, pdf, or txt

Jump to Page

You are on page 1of 3

Search inside document

2030ICT/7030ICT

Introduction to Big Data

Analytics

Practice-based Assignment
Assignment 1

Assignment 1

Description
Use the following two data files:

titles.csv - Contains the following information for titles:

• tconst (string, required) - alphanumeric unique identifier of the title

• titleType (string) – the type/format of the title (e.g., movie, short, tvseries, tvepisode,
video, etc)
• primaryTitle (string, required) – the more popular title / the title used by the filmmakers
on promotional materials at the point of release
• originalTitle (string) - original title, in the original language
• isAdult (boolean) - 0: non-adult title; 1: adult title
• startYear (number) – represents the release year of a title. In the case of TV Series, it is
the series start year
• endYear (number) – TV Series end year. ‘\N’ for all other title types
• runtimeMinutes (number) – primary runtime of the title, in minutes
• genres (string) – includes up to three genres associated with the title ratings.csv –

Contains the IMDb rating and votes information for titles

• tconst (string, required) - alphanumeric unique identifier of the title

• averageRating (number, required) – weighted average of all the individual user ratings
• numVotes (number) - number of votes the title has received

Tasks
1. Create a database named “imdb” on the Compass tool.
2. Create two collections named “titles” (refers to title.basics.tsv.gz) and “ratings” (refers to
title.ratings.tsv.gz).
3. Schema validation: Write JSON Schemas for each of the collections based on the descriptions.
4. Import data to the collections using Compass. Insert the data in titles.csv into “titles” and
ratings.csv into “ratings”.
5. Perform schema analysis and describe the data characteristics. Go to the “Schema” tab
and click on the “Schema analysis” button. Compass will generate an analysis for each
column. You should describe the interesting characteristics of the data in the columns.
You can learn more about schema analysis at:
https://docs.mongodb.com/compass/master/schema/
6. Perform some advanced analysis of the data using Aggregation. You must extract the
following information and include the output.
a. Find the total number of movies released each year.
b. Find the top five Fantasy-Adventure movie titles (primaryTitle) released in 2021
according to the rating. [Hint: Genre must include both Fantasy and Adventure.]

c. Find the top five Fantasy-Adventure movie titles (primaryTitle) released in 2021
according to the number of votes. [Hint: Genre must include both Fantasy and
Adventure.]
Marking Criteria
Database Creation and Import Data Database and collection 1.5 Mark
(Tasks 1-4) creation
Schema validation 6 Marks
7 Marks (3 Marks per schema)
Import data 3 Marks
(1.5 Mark per
collection)
Schema Analysis Complete the assigned tasks 1.5 Mark
(Task 5) using Compass
Comment on the statistics of the column
data, e.g., In which year we can find the
greatest number of released movies?
Which genre covers the highest number of Statistical analysis 3 Marks
movies? etc.

3 Marks

NoSQL Queries Query 6.a 3 Marks

(Task 6) Hint: $sortByCount

10 Marks Query 6.b 6 Marks

Hint: $match, $lookup,
$unwind, $project, $sort and
$limit
Query 6.c 6 Marks
Hint: $match, $lookup,
$unwind, $project, $sort and
$limit
Total 30 Marks

Good luck J

Learning JavaScript Data Structures and Algorithms - Second Edition
From Everand
Learning JavaScript Data Structures and Algorithms - Second Edition
Loiane Groner
No ratings yet
SQL Text Book Exercise Qusetions
Document5 pages
SQL Text Book Exercise Qusetions
CSEMohan Raj M
No ratings yet
WQD7005 (Alternative Assessment)
Document4 pages
WQD7005 (Alternative Assessment)
AdamZain788
No ratings yet
Assignment 2 PDF
Document25 pages
Assignment 2 PDF
Boni Halder
No ratings yet
Afis Project RFP (Revised) - Final
Document39 pages
Afis Project RFP (Revised) - Final
ashwin2250
100% (1)
Assignment 1
Document3 pages
Assignment 1
sindhuk55
No ratings yet
AI Lab 06 Lab Tasks
Document6 pages
AI Lab 06 Lab Tasks
Maaz
No ratings yet
CS373 Homework 1: 1 Part I: Basic Probability and Statistics
Document5 pages
CS373 Homework 1: 1 Part I: Basic Probability and Statistics
sanjay
No ratings yet
Numpy Recommendation System
Document26 pages
Numpy Recommendation System
xinzhi
No ratings yet
ARI711s Test 1
Document5 pages
ARI711s Test 1
Jones Ntesa
No ratings yet
Bar Graph Grade 4
Document17 pages
Bar Graph Grade 4
Sanila Subhash
No ratings yet
Eggs, Bungee Jumping, and Algebra: An Application of Linear Modeling
Document11 pages
Eggs, Bungee Jumping, and Algebra: An Application of Linear Modeling
Brandy Grimm
No ratings yet
Day 1-Tasks
Document3 pages
Day 1-Tasks
vinothkumar0743
No ratings yet
Chapter2 Basic and Customized Plotting
Document16 pages
Chapter2 Basic and Customized Plotting
Marwa Miwa
No ratings yet
WQD7005 (Alternative Assessment)
Document4 pages
WQD7005 (Alternative Assessment)
AdamZain788
100% (1)
R Basic
Document16 pages
R Basic
Taslima Chowdhury
No ratings yet
Coding Introduction
Document46 pages
Coding Introduction
lijiayi2023666
No ratings yet
Neural Networks & Machine Learning: Worksheet 3
Document3 pages
Neural Networks & Machine Learning: Worksheet 3
shreya arun
No ratings yet
AI Lab 06 Lab Tasks
Document11 pages
AI Lab 06 Lab Tasks
Maaz
No ratings yet
01 Exercise
Document2 pages
01 Exercise
hamouichhamza
No ratings yet
Quiz 1
Document16 pages
Quiz 1
Khatia Ivanova
No ratings yet
Spatstat Quickref
Document33 pages
Spatstat Quickref
arthur.pgsouza
No ratings yet
Microsoft R PreProcessing
Document1 page
Microsoft R PreProcessing
Manisha Panda
No ratings yet
Data Exploration R
Document10 pages
Data Exploration R
Priyadarshini Chavan
No ratings yet
CS4622 Machine Learning PROJECT
Document3 pages
CS4622 Machine Learning PROJECT
Raveen Shamentha
No ratings yet
DWM May 19
Document3 pages
DWM May 19
Muhammad Shanu
No ratings yet
Programming Language Design Concepts-IT 340 8
Document7 pages
Programming Language Design Concepts-IT 340 8
jathurshanm3
No ratings yet
Cs3361 Data Science Laboratory
Document139 pages
Cs3361 Data Science Laboratory
karthickamsec
No ratings yet
Project - Machine Learning-Business Report: By: K Ravi Kumar PGP-Data Science and Business Analytics (PGPDSBA.O.MAR23.A)
Document38 pages
Project - Machine Learning-Business Report: By: K Ravi Kumar PGP-Data Science and Business Analytics (PGPDSBA.O.MAR23.A)
Ravi Kotharu
No ratings yet
CS464 Ch1 Intro Fall2020
Document83 pages
CS464 Ch1 Intro Fall2020
Mathias Bueno
No ratings yet
Experiment No 10 - Updated
Document28 pages
Experiment No 10 - Updated
Aman Jain
No ratings yet
Important Questions
Document4 pages
Important Questions
Adilrabia rsl
No ratings yet
Unit-2 Feature Selection
Document92 pages
Unit-2 Feature Selection
Rahul Vashistha
No ratings yet
CS211 Fall 2002 P4
Document7 pages
CS211 Fall 2002 P4
Nikhil Varandani
No ratings yet
Lecture 7 - Integrated Analysis With R
Document79 pages
Lecture 7 - Integrated Analysis With R
Anurag Laddha
No ratings yet
Introduction To The Analysis of Spatial Data Using R
Document8 pages
Introduction To The Analysis of Spatial Data Using R
Mohan Pudasaini
No ratings yet
Data Analysis Lab - Final - 23-24
Document11 pages
Data Analysis Lab - Final - 23-24
forallofus435
No ratings yet
Lec 05-DSFa23
Document65 pages
Lec 05-DSFa23
labnexaplan9
No ratings yet
Problem Statement
Document6 pages
Problem Statement
Archana Jagadeesan
No ratings yet
Experiment 3
Document43 pages
Experiment 3
PUSHPITHA
No ratings yet
November: 6.034 Quiz 2 17,2003
Document17 pages
November: 6.034 Quiz 2 17,2003
Maria Silva
No ratings yet
CS5785 Homework 4: .PDF .Py .Ipynb
Document5 pages
CS5785 Homework 4: .PDF .Py .Ipynb
Al Tarino
No ratings yet
FortyMarksTest GroupAB
Document3 pages
FortyMarksTest GroupAB
gargipatil0702
No ratings yet
Package Itemanalysis': R Topics Documented
Document7 pages
Package Itemanalysis': R Topics Documented
Joko Purwanto
No ratings yet
Python 201 - (Slightly) Advanced Python Topics
Document69 pages
Python 201 - (Slightly) Advanced Python Topics
sandro car
No ratings yet
FINC 614 Introduction To Data Science Mid Term Exam Do The Following in R and Turn in A Word or PDF Document Generated With Knitr, Via Blackboard
Document1 page
FINC 614 Introduction To Data Science Mid Term Exam Do The Following in R and Turn in A Word or PDF Document Generated With Knitr, Via Blackboard
Amkouk Fatima Zahra
No ratings yet
CS3361 Set2
Document6 pages
CS3361 Set2
hodit.it
No ratings yet
Exercise and Experiment 3
Document14 pages
Exercise and Experiment 3
h8792670
No ratings yet
Unit#8 - Top - Most Popular DS Algorithms
Document11 pages
Unit#8 - Top - Most Popular DS Algorithms
Tanveer Ahmed Hakro
No ratings yet
Percobaan Visualisasi Data: Menggunakan EXCEL, Octave/Matlab, R, Dan Python
Document26 pages
Percobaan Visualisasi Data: Menggunakan EXCEL, Octave/Matlab, R, Dan Python
Elma Lyrics
No ratings yet
Data Visu Lab4
Document23 pages
Data Visu Lab4
yomna mohamed
No ratings yet
Package Reams': R Topics Documented
Document12 pages
Package Reams': R Topics Documented
DrHHHQ
No ratings yet
Amiete - It December 2016: Code: At78 Subject: Data Mining & Warehousing
Document3 pages
Amiete - It December 2016: Code: At78 Subject: Data Mining & Warehousing
Ala mia
No ratings yet
Attachment 1
Document4 pages
Attachment 1
sammiepetez8
No ratings yet
TDDB84 DesignPatterns Exam 2005 10 15 PDF
Document6 pages
TDDB84 DesignPatterns Exam 2005 10 15 PDF
ponni
No ratings yet
Assignment 4
Document2 pages
Assignment 4
abras
No ratings yet
Lab. Activity 3 - Array Basic Operations
Document2 pages
Lab. Activity 3 - Array Basic Operations
Mizuki Akabayashi
No ratings yet
Lab Mannual
Document49 pages
Lab Mannual
vickyakfan152002
No ratings yet
Dav Exps - Merged - Merged
Document99 pages
Dav Exps - Merged - Merged
Sahil Surve
No ratings yet
Experiment No: 1 Introduction To Data Analytics and Python Fundamentals Page-1/11
Document8 pages
Experiment No: 1 Introduction To Data Analytics and Python Fundamentals Page-1/11
Harshali Mane
No ratings yet
מצגת תרגול 7
Document20 pages
מצגת תרגול 7
Gal Zeev
No ratings yet
Intepro ModBus SCPI User Manual A4 New Edits 6 - 2017 PDF
Document83 pages
Intepro ModBus SCPI User Manual A4 New Edits 6 - 2017 PDF
Anil Tiwari
No ratings yet
Cracking The Coding Interview v6 Errata & Changes
Document37 pages
Cracking The Coding Interview v6 Errata & Changes
yaoayaoa
No ratings yet
HANA-Based SAP BW Transformation
Document28 pages
HANA-Based SAP BW Transformation
Mahesh Babu
No ratings yet
OML07127 - UFP ModBus Table
Document16 pages
OML07127 - UFP ModBus Table
JOSE
No ratings yet
Exp 3 - 6 (A)
Document10 pages
Exp 3 - 6 (A)
Mr Movie
No ratings yet
Parallel Wireless HetNet Gateway Data Sheet PDF
Document4 pages
Parallel Wireless HetNet Gateway Data Sheet PDF
Samudera Buana
No ratings yet
Solutions Series: Using System Data Synchronization
Document47 pages
Solutions Series: Using System Data Synchronization
Cristhian Haro
No ratings yet
Mitsubishi FX5U
Document6 pages
Mitsubishi FX5U
Vu Minh
No ratings yet
Heroku + Salesforce Integration PDF
Document3 pages
Heroku + Salesforce Integration PDF
only4u doll
No ratings yet
128M (8Mx16) GDDR SDRAM: HY5DU281622ET
Document34 pages
128M (8Mx16) GDDR SDRAM: HY5DU281622ET
Boris Lazarchuk
No ratings yet
ABAP HANA Complete - Gensoft Technologies
Document4 pages
ABAP HANA Complete - Gensoft Technologies
Ramesh Nayudu
No ratings yet
A Project Report On "Hotel Reservation and Billing System"
Document12 pages
A Project Report On "Hotel Reservation and Billing System"
nikita gangathade
No ratings yet
11 Ip
Document7 pages
11 Ip
Divyansh Pradhan
No ratings yet
Computer Organization & Architecture: Cache Memory
Document52 pages
Computer Organization & Architecture: Cache Memory
muhammad farooq
No ratings yet
C in Mainframe
Document1,032 pages
C in Mainframe
Vjay Ksk
No ratings yet
Getting Started With Tableau Prep
Document3 pages
Getting Started With Tableau Prep
majujmathew
No ratings yet
Deadlock in DBMS
Document2 pages
Deadlock in DBMS
Editor IJTSRD
No ratings yet
Siemens - Profibus and Modbus Comparison
Document5 pages
Siemens - Profibus and Modbus Comparison
manuela.bibescu
No ratings yet
Computer General Knowledge QuestionS
Document27 pages
Computer General Knowledge QuestionS
Ashok
No ratings yet
Rhayven Jay C. Adigue: A - Windows CMD Commands
Document8 pages
Rhayven Jay C. Adigue: A - Windows CMD Commands
melvin segui
No ratings yet
Digital Video Processing
Document19 pages
Digital Video Processing
Arjun Hande
0% (1)
Design and Development of Centralized Squid Proxy Management System
Document6 pages
Design and Development of Centralized Squid Proxy Management System
Gildas ASSOURI
No ratings yet
Delta Ia-Plc Dvpcopm-Sl Om en 20220705
Document52 pages
Delta Ia-Plc Dvpcopm-Sl Om en 20220705
Serdar Yeni
No ratings yet
Cassandra Design Patterns - Sample Chapter
Document32 pages
Cassandra Design Patterns - Sample Chapter
Packt Publishing
No ratings yet
Elastix Installation v1.3.2
Document16 pages
Elastix Installation v1.3.2
Juan David García Jaime
No ratings yet
Huawei SD-WAN Solution For Technical Training 2017Q1 V1.0
Document109 pages
Huawei SD-WAN Solution For Technical Training 2017Q1 V1.0
dfssgr
0% (1)
02 Stack Queue
Document12 pages
02 Stack Queue
varyk
No ratings yet
FILE HANDLING Practical PDF
Document25 pages
FILE HANDLING Practical PDF
Sunishtha Singh
No ratings yet
Stack and Queue Presentation New
Document48 pages
Stack and Queue Presentation New
zelalem
No ratings yet