
Information Management

Prof. Sara Foresti


February 26, 2021
Time available: 2:00 hours (1:30 hours for students who do not answer Question 5)

Question 1)

1. Clearly illustrate the concepts of data-centric and client-centric consistency, describing the differences between the two
concepts, and clarifying in which scenario each consistency concept can (or should) be adopted.
2. Consider the following schedule of operations performed by four processes over one variable (initially set to zero):
P1: R(X = 1) R(X = 4) R(X = 3)
P2: R(X = 1) W(X = 2) R(X = 3) R(X = 4)
P3: W(X = 1) W(X = 3) R(X = 2)
P4: R(X = 3) R(X = 2) W(X = 4)
(a) Is the schedule sequentially consistent?
If sequential consistency is satisfied, add the minimum number of operations that makes the schedule not
sequentially consistent.
Otherwise, indicate the minimum number of operations (and which ones) that should be removed to guarantee
sequential consistency.
(b) Is the schedule causal consistent?
List all the causal dependencies in the schedule.
If the schedule is causally consistent, add the minimum number of operations that makes the schedule not causally
consistent.
If the schedule is not causal consistent, indicate the operation(s) that should be removed to guarantee causal
consistency.

Question 2)

1. What is a bitmap index? Clearly explain its advantages and disadvantages and how to insert/remove values from it.
2. Build the bitmap index for attribute PRODUCT and the bitmap index for attribute CITY.
Write the condition operating on bitmap indexes to filter sales in Milan for products P1 and P3.

id  product  city    quantity
1   P1       Milan   200
2   P1       Rome    150
3   P1       Venice  100
4   P2       Milan   170
5   P2       Venice  120
6   P3       Milan   250
7   P3       Venice  100
8   P3       Turin   80
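
As an illustration of how the bitmaps over this table can be built and combined, here is a minimal Python sketch. It assumes the table is held in memory as a list of tuples and that the third column holds the city values; the helper names (`build_bitmap_index`, `bit_and`, `bit_or`) are hypothetical and chosen only for this example.

```python
# Rows of the sales table: (id, product, city, quantity).
rows = [
    (1, "P1", "Milan", 200),
    (2, "P1", "Rome", 150),
    (3, "P1", "Venice", 100),
    (4, "P2", "Milan", 170),
    (5, "P2", "Venice", 120),
    (6, "P3", "Milan", 250),
    (7, "P3", "Venice", 100),
    (8, "P3", "Turin", 80),
]


def build_bitmap_index(rows, column):
    """Return {value: bit list} with one bit per row, set where the row has that value."""
    index = {}
    for pos, row in enumerate(rows):
        index.setdefault(row[column], [0] * len(rows))[pos] = 1
    return index


def bit_and(a, b):
    """Bitwise AND of two equally long bit lists."""
    return [x & y for x, y in zip(a, b)]


def bit_or(a, b):
    """Bitwise OR of two equally long bit lists."""
    return [x | y for x, y in zip(a, b)]


product_idx = build_bitmap_index(rows, 1)
city_idx = build_bitmap_index(rows, 2)

# Condition on the bitmaps: Milan AND (P1 OR P3).
mask = bit_and(city_idx["Milan"], bit_or(product_idx["P1"], product_idx["P3"]))
matching_ids = [row[0] for row, bit in zip(rows, mask) if bit]
print(mask, matching_ids)
```

Each attribute value gets one bit vector as long as the table, so the filter reduces to bitwise operations on whole vectors before any row is fetched.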
Question 3)

1. Describe and discuss the association rule mining and the frequent itemset mining problems and the relationship between
these two problems.
2. Considering the table below and assuming min_sup = 0.75, identify all the frequent itemsets using the Apriori algorithm.

TID Items
1 A, B, C
2 A, C, D, E
3 A, B, C, D, E
4 A, B, C, D
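
The level-wise search that Apriori performs over this table can be sketched as follows. This is a minimal Python illustration, assuming the transactions fit in memory and that support is the fraction of transactions containing the itemset; the function names are hypothetical, and the candidate generation is a simple join-and-prune rather than an optimized implementation.

```python
from itertools import combinations

# Transactions from the table (TID 1 to 4).
transactions = [
    {"A", "B", "C"},
    {"A", "C", "D", "E"},
    {"A", "B", "C", "D", "E"},
    {"A", "B", "C", "D"},
]
MIN_SUP = 0.75


def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori():
    """Return all frequent itemsets, found level by level."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUP]
    frequent = []
    k = 1
    while level:
        frequent.extend(level)
        k += 1
        previous = set(level)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in previous for s in combinations(c, k - 1))
        }
        # Count step: keep the candidates that reach the minimum support.
        level = [c for c in candidates if support(c) >= MIN_SUP]
    return frequent


for itemset in sorted(apriori(), key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), support(itemset))
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is counted against the data only if all of its subsets survived the previous level.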

Question 4)
Assume a distributed database for a company, where each warehouse keeps track, for each product, of the flows of items during
the year in a relation having schema FLOW(Id, ProductId, Date, Quantity).
Note that attribute Quantity has a positive value in case of input flow, and a negative value in case of output flow.
How would you use a MapReduce framework to compute, for each product, the overall flow (positive or negative
quantity) across all warehouses?
How would you define the map and the reduce functions?
Illustrate your solution through an example with small tables.
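
One possible shape for such a solution is sketched below in Python, simulating the map, shuffle, and reduce phases in a single process. The two warehouse fragments and all their values are hypothetical data invented for the illustration, and the function names (`map_flow`, `reduce_flow`, `run`) are assumptions, not part of any specific MapReduce framework.

```python
from collections import defaultdict

# Hypothetical FLOW fragments stored at two warehouses:
# (Id, ProductId, Date, Quantity), with negative quantities for output flows.
warehouse_1 = [
    (1, "P1", "2020-01-10", 50),
    (2, "P2", "2020-02-03", -20),
    (3, "P1", "2020-03-15", -30),
]
warehouse_2 = [
    (1, "P1", "2020-01-20", -10),
    (2, "P2", "2020-04-01", 40),
]


def map_flow(record):
    """Map: emit one (ProductId, Quantity) pair per FLOW tuple."""
    _id, product_id, _date, quantity = record
    yield product_id, quantity


def reduce_flow(product_id, quantities):
    """Reduce: sum all quantities received for one product."""
    return product_id, sum(quantities)


def run(fragments):
    """Simulate the shuffle by grouping mapped values by key, then reduce each group."""
    groups = defaultdict(list)
    for fragment in fragments:
        for record in fragment:
            for key, value in map_flow(record):
                groups[key].append(value)
    return dict(reduce_flow(key, values) for key, values in groups.items())


totals = run([warehouse_1, warehouse_2])
print(totals)
```

Because summation is associative and commutative, the same reduce function could also serve as a combiner at each warehouse, so that only one partial sum per product leaves each site.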

Question 5) only for students who did not attend the database course with Prof. Samarati

1. Illustrate the idempotency property for UNDO and REDO operations in log management.
2. Given the following log:

DUMP, B(T1), B(T2), B(T3), I(T1,O1,A1), I(T2,O2,A2), C(T1), B(T4), D(T4,O3,B3), CK(...), U(T2,O4,B4,A4),
D(T3,O5,B5), B(T5), A(T4), U(T5,O6,B6,A6), CK(...), D(T5,O7,B7), C(T2), B(T6), I(T6,O8,A8), A(T5), C(T3), FAILURE

(a) list, for each checkpoint record, the active transactions;


(b) illustrate in detail the steps of a warm restart to recover from the failure.
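
The bookkeeping behind point (a) can be sketched as a single forward scan of the log. The Python snippet below is a minimal illustration, assuming that a transaction is active from its B record until its C or A record; the tuple encoding of the log and the function name `active_at_checkpoints` are hypothetical.

```python
# Begin, commit, abort, and checkpoint records of the log, in their original order.
# I/D/U records are omitted because they do not change the set of active transactions.
log = [
    ("B", "T1"), ("B", "T2"), ("B", "T3"), ("C", "T1"), ("B", "T4"),
    ("CK", None),
    ("B", "T5"), ("A", "T4"),
    ("CK", None),
    ("C", "T2"), ("B", "T6"), ("A", "T5"), ("C", "T3"),
]


def active_at_checkpoints(log):
    """Return, for each CK record, the sorted list of transactions active at that point."""
    active = set()
    snapshots = []
    for kind, tx in log:
        if kind == "B":
            active.add(tx)            # transaction starts
        elif kind in ("C", "A"):
            active.discard(tx)        # assumption: commit or abort ends the transaction
        elif kind == "CK":
            snapshots.append(sorted(active))
    return snapshots


for number, txs in enumerate(active_at_checkpoints(log), start=1):
    print(f"CK {number}: {txs}")
```

The same scan, started from the last checkpoint and continued to the failure, is the usual way to derive the UNDO and REDO sets for a warm restart.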
