Subject Name: BD

Subject Code: CS-3032


Program Name: B.Tech
Semester: VI (Regular)
Year - 2019

SPRING MID SEMESTER EXAMINATION-2019


SOLUTION AND EVALUATION SCHEME
School of Computer Engineering
Kalinga Institute of Industrial Technology
Deemed to be University, Bhubaneswar-24
Big Data
CS-3032

Time: 1½ Hours Full Mark: 20


Answer any four questions including question No.1 which is compulsory.
The figures in the margin indicate full marks. Candidates are required to give their answers in their
own words as far as practicable and all parts of a question should be answered at one place only.

Q.1. [5X1]
(a) List the different layers of Big Data Stack.
Evaluation scheme: Full mark if at least 5 layers are listed. Step mark can be awarded judiciously
depending on the partial correctness of the solution.
Solution: 1) Data Sources Layer, 2) Ingestion Layer, 3) Storage Layer, 4) Physical Infrastructure Layer, 5)
Platform Management Layer, 6) Security Layer, 7) Monitoring Layer, 8) Analytics Engine, and 9)
Visualization Layer

(b) Calculate the probability that a slot is set to 1 after insertion of 5 elements into a Bloom Filter of size
10.
Evaluation scheme: Full mark for the correct answer and no step mark to be awarded.
Solution: optimum number of hash functions k = (10/5) ln 2 = 2 × 0.693 = 1.386, which rounds to k = 1
The probability is 1 - (1 - 1/10)^(1×5) = 1 - 0.59049 = 0.40951
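The arithmetic above can be checked with a short R sketch. The variable names (m, n, k_opt, k, p_set) are illustrative and not part of the question, and the code assumes independent, uniformly distributed hash functions, as the formula itself does.

```r
m <- 10                            # number of slots in the Bloom filter
n <- 5                             # number of inserted elements
k_opt <- (m / n) * log(2)          # optimum number of hash functions, about 1.386
k <- round(k_opt)                  # nearest whole number of hash functions (assumed rounding)
p_set <- 1 - (1 - 1 / m)^(k * n)   # probability that a given slot is set to 1
print(k)
print(p_set)
```

With k = 1 the sketch reproduces the value 0.40951 given in the solution.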

(c) Explain Value Coercion in R with suitable example.


Evaluation scheme: Full mark for the correct answer. Step-wise mark can be awarded judiciously
depending on the partial correctness of the solution.
Solution:
n = c(2, 3, 5)
s = c("aa", "bb", "cc", "dd", "ee")
c(n, s)
[1] "2" "3" "5" "aa" "bb" "cc" "dd" "ee"
In the above code snippet, the numeric values are coerced into character strings when the two
vectors are combined. This is necessary because all members of a vector must have the same
primitive data type, and it is called value coercion.
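A minimal sketch of how the coercion can be inspected with class(). The vectors mixed and flags are illustrative names, not part of the model answer; the second case shows that a logical value combined with a number is coerced to numeric in the same way.

```r
n <- c(2, 3, 5)
s <- c("aa", "bb")
mixed <- c(n, s)        # numbers are coerced to character strings
print(class(n))         # "numeric"
print(class(mixed))     # "character"
flags <- c(TRUE, 2)     # TRUE is coerced to the number 1
print(flags)            # 1 2
```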
(d) Calculate the probability of a slot being hashed in a Bloom Filter of size 10 and with 3 hash functions.
Evaluation scheme: Full mark for the correct answer and no step mark to be awarded.
Solution: The probability is (1/10)^3 = 0.001
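One way to read this answer, assuming independent and uniformly distributed hash functions, is as the probability that all three hash functions select one particular slot. The sketch below computes that value and, for comparison only, the probability that at least one of the three selects the slot. The names p_all and p_any are illustrative.

```r
m <- 10                       # number of slots
k <- 3                        # number of hash functions
p_all <- (1 / m)^k            # all k hash functions pick one particular slot
p_any <- 1 - (1 - 1 / m)^k    # at least one of the k hash functions picks that slot
print(p_all)
print(p_any)
```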
(e) What is the output of the following program?
print(5*( c(1:3)+ seq.int(1, 4, 2)+ c(1:5)))
Evaluation scheme: Full mark for the correct answer and no step mark to be awarded.
Solution: [1] 15 35 35 30 50
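The result follows from R's recycling rule, where the shorter vector is repeated to match the longer one. The sketch below breaks the expression into steps. The intermediate names (a, b, d, step1, step2) are illustrative, and suppressWarnings() is used only to hide the "longer object length is not a multiple of shorter object length" warning that each addition raises.

```r
a <- c(1:3)              # 1 2 3
b <- seq.int(1, 4, 2)    # 1 3
d <- c(1:5)              # 1 2 3 4 5
step1 <- suppressWarnings(a + b)       # b recycled to 1 3 1, giving 2 5 4
step2 <- suppressWarnings(step1 + d)   # step1 recycled to 2 5 4 2 5, giving 3 7 7 6 10
result <- 5 * step2
print(result)            # 15 35 35 30 50
```

The original one-line program prints the same vector and also emits the two recycling warnings.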
Q.2. [2.5]
(a) Discuss the phases of analytic life cycle that should be followed to get the expected output.
Evaluation scheme: Full mark for the correct answer and step mark to be awarded judiciously
depending on the partial correctness of the solution.
Solution: Refer to lecture note, unit – 1.
(b) Discuss MPP database analytics architecture. [2.5]
Evaluation scheme: Full mark for the correct answer and step mark to be awarded judiciously
depending on the partial correctness of the solution.
Solution: Refer to lecture note, unit – 1.
Q.3. [2.5]
(a) Explain CAP Theorem (also called Brewer’s Theorem) and prove it.
Evaluation scheme: Full mark for the correct answer and step mark to be awarded judiciously
depending on the partial correctness of the solution.
Solution: Refer to lecture note, unit – 1.
(b) Explain the four types of analytics with suitable examples. [2.5]
Evaluation scheme: Full mark for the correct answer and step mark to be awarded judiciously
depending on the partial correctness of the solution.
Solution: Refer to lecture note, unit – 1.

Q.4. [5]
An election is contested by 3 candidates and the candidates are numbered from 1 to 3. The voting is
done by marking the candidate number on the ballot paper. Write a program in R to i) cast the vote for
user supplied n voters, and to ii) read the ballots and count the votes for each candidate. In case a
number read is outside the range of 1 to 3, the ballot should be considered as a ‘spoilt ballot’ and the
program should also count the number of spoilt ballots.
Evaluation scheme: Full mark for the correct answer and step mark to be awarded as per the scheme:
casting vote = 1 mark, reading ballot = 1 mark, counting votes = 1 mark, logic to consider ‘spoilt ballot’
= 1 mark and to count the number of spoilt ballots = 1 mark.
Solution:
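# Read the number of voters from the user
voterNum = as.integer(readline(prompt = "Enter number of voters: "))
num = seq_len(voterNum)
voteVec = integer(voterNum)
# i) Cast the votes: each voter enters a candidate number
for (n in num){
  vote = as.integer(readline(prompt = "Cast your vote: "))
  voteVec[n] = vote
}
print(voteVec)
# ii) Read the ballots: a number outside 1 to 3 (or a non-numeric entry)
# is a spoilt ballot and is recorded under code 4
for (n in num){
  if (is.na(voteVec[n]) || voteVec[n] < 1 || voteVec[n] > 3){
    voteVec[n] = 4
  }
}
# Count the votes per candidate; the count under 4 is the number of spoilt ballots
table(voteVec)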
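The counting step can also be written without loops. The following is a minimal sketch, not part of the model answer: count_ballots is a hypothetical helper, and the ballots vector is made-up sample data standing in for the values read from the voters.

```r
# Hypothetical helper: tally ballots that have already been read into a vector
count_ballots <- function(ballots, candidates = 1:3) {
  valid <- ballots %in% candidates   # FALSE for out-of-range or NA ballots
  votes <- table(factor(ballots[valid], levels = candidates))
  list(votes = votes, spoilt = sum(!valid))
}

ballots <- c(1, 3, 2, 5, 1, 0, 3, 1)   # illustrative ballots; 5 and 0 are spoilt
res <- count_ballots(ballots)
print(res$votes)     # votes for candidates 1, 2 and 3
print(res$spoilt)    # number of spoilt ballots
```

Using factor() with fixed levels keeps a candidate in the tally even when that candidate receives no votes.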
Q.5. [5]
Develop an algorithm to i) insert an item, and to ii) test the membership (or lookup) in Bloom Filter.
Draw a step-by-step process in the insertion of element 25, and then 40 into the Bloom Filter of size 10.
Then, draw a step-by-step process for lookup/membership test with the elements 10 and 48. The hash
functions are: h1(x) = (3x+4) mod 6, and h2(x) = (7x+5) mod 3. Identify whether any lookup element (i.e.
either 10 or 48) results in a FALSE POSITIVE.
Evaluation scheme: Full mark for the correct answer and step mark to be awarded as per the scheme:
insertion of item algorithm = 0.5 mark, lookup of item algorithm = 0.5 mark, step-by-step
demonstration for insertion of items = 1.5 marks, step-by-step demonstration for lookup of items = 1.5
marks, and identification of lookup element = 1 mark.
Solution:
Insertion of item into Bloom filter algorithm - Refer to lecture note, unit – 2.
Lookup of item in Bloom filter algorithm - Refer to lecture note, unit – 2.
Step-by-step demonstration for insertion of items -
Initially all slots of the Bloom filter are set to 0.
Bit:   0 0 0 0 0 0 0 0 0 0
Slot:  0 1 2 3 4 5 6 7 8 9
25 and 40 are inserted; the slots selected by each hash function are as follows:
Item  h1  h2
25    1   0
40    4   0
The new state of the Bloom filter is as follows:
Bit:   1 1 0 0 1 0 0 0 0 0
Slot:  0 1 2 3 4 5 6 7 8 9
Membership test with the elements 10 and 48:
Item  h1  h2
10    4   0
48    4   2
Element 10 was never inserted, but it hashes to slots 4 and 0, which are both set to 1, so the lookup
reports it as present. This is a FALSE POSITIVE. Element 48 hashes to slots 4 and 2; slot 2 is still 0,
so 48 is correctly reported as absent.
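The insertion and lookup steps traced above can be sketched in R as follows. This is a minimal illustration, not the algorithm from the lecture note: the helper names (slots, bf_insert, bf_lookup) are assumptions, and 1 is added to each hash value because R vectors are indexed from 1 while the slots are numbered from 0.

```r
m <- 10                                    # filter size from the question
h1 <- function(x) (3 * x + 4) %% 6
h2 <- function(x) (7 * x + 5) %% 3
slots <- function(x) c(h1(x), h2(x)) + 1   # shift slot numbers to R's 1-based indexing

bf_insert <- function(bits, x) {
  bits[slots(x)] <- 1L                     # set every hashed slot to 1
  bits
}
bf_lookup <- function(bits, x) all(bits[slots(x)] == 1L)   # present only if all slots are 1

bits <- integer(m)                         # all slots start at 0
for (x in c(25, 40)) bits <- bf_insert(bits, x)
print(bits)                                # 1 1 0 0 1 0 0 0 0 0
print(bf_lookup(bits, 10))                 # TRUE: false positive, 10 was never inserted
print(bf_lookup(bits, 48))                 # FALSE: slot 2 is still 0
```

A lookup that returns FALSE is always correct, whereas TRUE only means the element may be present.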
Faculty Consent
