BE - Computer Engineering - Semester 7 - 2022 - December - Big Data Analysis Rev 2019 C Scheme
Paper / Subject Code: 42172 / BIG DATA ANALYTICS
Time: 03 Hours Marks: 80
Note: 1. Question 1 is compulsory
2. Answer any three out of the remaining five questions.
3. Assume any suitable data wherever required and justify the same.
Q1 a) What is the function of Map tasks in the MapReduce framework? Explain with the help of an example. [5]
b) Demonstrate how business problems have been successfully solved faster, cheaper and more effectively considering NoSQL Google’s MapReduce case study. Also illustrate the business drivers and the findings in it. [5]
c) Why is HDFS more suited for applications having large datasets and not when there [5]
d) Explain the concept of a Bloom filter with an example. [5]
Q2 a) Name the three ways that resources can be shared between computer systems. Name [10]
b) Write a map reduce pseudo code for word count problem. Apply map reduce [10]
Q3 a) Suppose the stream is 1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1. Let h(x) = 6x + 1 mod 5. [10]
Show how the Flajolet-Martin algorithm will estimate the number of distinct
1   1   56
2   2   75
3   1   48
4   2   69
5   1   84
6   2   53
i. Create a subset of subject less than 4 by using the subset() function and demonstrate the output.
ii. Create a subset where the subject column is less than 3 and the class equals to 2
b) With a neat sketch, explain the architecture of the data-stream management system. [10]
Q5 a) Determine communities for the given social network graph using the Girvan-Newman algorithm. [10]
[Figure: social network graph with nodes A, B, C, D, E, F, G]
b) The data analyst of Argon technology, Mr. John, needs to enter the salaries of 10 employees in R. The salaries of the employees are given in the following table: [10]
Sr. No.   Name of employees   Salaries
1         Vivek               21000
2         Karan               55000
3         James               67000
4         Soham               50000
5         Renu                54000
6         Farah               40000
7         Hetal               30000
8         Mary                70000
9         Ganesh              20000
10        Krish               15000
i. Which R command will Mr. John use to enter these values? Demonstrate the output.
ii. Now Mr. John wants to add the salaries of 5 new employees to the existing table. Which command will he use to join datasets with new values in R? Demonstrate the output.
Q6 a) i. Write the script to sort the values contained in the following vector in ascending order and descending order: (23, 45, 10, 34, 89, 20, 67, 99). Demonstrate the output. [10]
ii. Name and explain the operators used to form data subsets in R.
suitable example.
-----------------
15786 Page 2 of 2