
Course Code : CDT 305-2 GHXW/MW – 22 / 1637

Fifth Semester B. Tech. (Computer Science and Engineering / Data Science) Examination


Time : 3 Hours    Max. Marks : 60

Instructions to Candidates :—
(1) All questions carry marks as indicated against them.
(2) Assume suitable data wherever necessary and clearly state your assumptions.

1. (a) Write a short note on the Business Layer of the BI framework.

(b) What information is kept in the Metadata Repository ? 2(CO2)
(c) Explain the following terms with an example :
(i) Return On Investment (ROI).
(ii) Return On Asset (ROA).
(iii) Total Cost of Ownership (TCO).
(iv) Total Value of Ownership (TVO). 4(CO2)
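For illustration only, the commonly used textbook forms of the first three measures are sketched below. They are not taken from this paper: n is an assumed ownership period in years, and the cost components in TCO are an assumed breakdown. TVO has no single standard formula, so it is not shown.

```latex
\begin{align*}
\mathrm{ROI} &= \frac{\text{Return from investment} - \text{Cost of investment}}{\text{Cost of investment}} \times 100\% \\
\mathrm{ROA} &= \frac{\text{Net income}}{\text{Total assets}} \times 100\% \\
\mathrm{TCO} &= C_{\text{acquisition}} + \sum_{t=1}^{n} \bigl( C_{\text{operation},t} + C_{\text{maintenance},t} \bigr)
\end{align*}
```

For example, with made-up figures, a BI project that costs 10 lakh rupees and returns 13 lakh rupees has ROI = (13 - 10) / 10 × 100% = 30%.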

2. (a) An insurance company, with branches all over the country, wants to develop
a data warehouse for effective decision-making about their insurance policies.
There are a number of different types of insurance like Auto insurance,
Home insurance, Industrial insurance, etc. The entire country is categorized
into four regions, namely, North, South, East and West. Each region consists
of a set of states. There may be different types of customers like individuals,
institutions, industries, etc. The data warehouse should record an entry for
each policy issued to each customer along with the premium paid. With
respect to the above use case, answer the following questions. Necessary
assumptions can be made to support your answer :
(i) Design a star schema for the data warehouse clearly identifying
the fact table(s), dimensional table(s), their attributes and measures
along with the primary key and foreign key relationships.


(ii) Write an SQL query by which you can display region-wise,
insurance-type-wise, year-wise total premium collected from your

(iii) Draw possible schema hierarchies for each dimension.
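A minimal sketch of one possible star schema and the query described in (ii), run against an in-memory SQLite database through Python's standard sqlite3 module. The table names, column names and sample rows are all hypothetical, chosen only for illustration; the question leaves the design open.

```python
import sqlite3

# Hypothetical star schema: one fact table (fact_policy) that references
# four dimension tables through surrogate keys. All names are illustrative.
SCHEMA = """
CREATE TABLE dim_location (
    location_id INTEGER PRIMARY KEY,
    state       TEXT NOT NULL,
    region      TEXT NOT NULL CHECK (region IN ('North', 'South', 'East', 'West'))
);
CREATE TABLE dim_insurance_type (
    insurance_type_id INTEGER PRIMARY KEY,
    type_name         TEXT NOT NULL          -- e.g. Auto, Home, Industrial
);
CREATE TABLE dim_customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    customer_type TEXT NOT NULL              -- e.g. Individual, Institution, Industry
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    day INTEGER, month INTEGER, quarter INTEGER, year INTEGER
);
CREATE TABLE fact_policy (
    policy_id         INTEGER PRIMARY KEY,   -- policy number
    location_id       INTEGER REFERENCES dim_location(location_id),
    insurance_type_id INTEGER REFERENCES dim_insurance_type(insurance_type_id),
    customer_id       INTEGER REFERENCES dim_customer(customer_id),
    date_id           INTEGER REFERENCES dim_date(date_id),
    premium           REAL NOT NULL          -- the measure
);
"""

# Region-wise, insurance-type-wise, year-wise total premium.
QUERY = """
SELECT l.region, t.type_name, d.year, SUM(f.premium) AS total_premium
FROM fact_policy f
JOIN dim_location       l ON f.location_id = l.location_id
JOIN dim_insurance_type t ON f.insurance_type_id = t.insurance_type_id
JOIN dim_date           d ON f.date_id = d.date_id
GROUP BY l.region, t.type_name, d.year
ORDER BY l.region, t.type_name, d.year;
"""


def build_demo():
    """Create the schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_location VALUES (?, ?, ?)",
                     [(1, "Punjab", "North"), (2, "Kerala", "South")])
    conn.executemany("INSERT INTO dim_insurance_type VALUES (?, ?)",
                     [(1, "Auto"), (2, "Home")])
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                     [(1, "A. Sharma", "Individual"), (2, "XYZ Ltd", "Industry")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?, ?)",
                     [(1, 15, 1, 1, 2021), (2, 10, 6, 2, 2022)])
    conn.executemany("INSERT INTO fact_policy VALUES (?, ?, ?, ?, ?, ?)",
                     [(101, 1, 1, 1, 1, 5000.0),
                      (102, 1, 1, 2, 1, 7000.0),
                      (103, 2, 2, 1, 2, 3000.0)])
    return conn


if __name__ == "__main__":
    for row in build_demo().execute(QUERY):
        print(row)
```

With the sample rows above, the two North/Auto/2021 policies are summed into a single group, which is the roll-up the GROUP BY on the three dimension attributes is meant to produce.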


(b) Explain each of the following with the help of an example :

(i) Role playing dimensions.

(ii) Degenerate dimensions.

(iii) Junk dimensions. 3(CO1)

3. (a) What is your understanding of the staging area ? Explain. 3(CO1)

(b) According to Kimball, data profiling should be done at every stage of
the ETL process. State what types of problems are encountered if data
profiling is not done at every stage. 4(CO2)

(c) What are the dimensions of data quality ? Explain in detail any three.

4. (a) What is a histogram ? In which scenario is a histogram used instead
of a bar chart ? Determine if a bar graph or a histogram should be
used to display the given data. Give reasons for your answer :

(1) A group of kids were asked for their favorite colors : 10 said
red, 5 said orange, 5 said yellow, 1 said green, 15 said
blue and 5 said purple. Let's assume none of them were lying.

(2) Some fish were weighed and their weights were found to be
6 oz, 6.5 oz, 7 oz, 7.2 oz, 7.3 oz and 8 oz.

(3) Happy Meals, Inc. charges $3 for a happy meal while Fast
Food Fishes charges $2 for a happy meal. 5(CO3)


(b) The following table gives the frequency distribution of the weekly wages
(in thousand rupees) of 100 workers in a factory. Draw the histogram
and frequency polygon of the distribution.
Weekly wages       20-24  25-29  30-34  35-39  40-44  45-49  50-54  55-59  60-64  Total
Number of workers      4      5     12     23     31     10      8      5      2    100
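Because the classes are written as inclusive limits with a gap between them (24 to 25), a histogram first needs continuous class boundaries. A minimal sketch in plain Python, assuming the usual correction of half the gap (0.5) on each side, computes those boundaries and the vertices of the frequency polygon. It prints a crude text histogram rather than plotting, so no plotting library is assumed; the function names are hypothetical.

```python
# Class limits and frequencies copied from the table above.
classes = [(20, 24), (25, 29), (30, 34), (35, 39), (40, 44),
           (45, 49), (50, 54), (55, 59), (60, 64)]
freqs = [4, 5, 12, 23, 31, 10, 8, 5, 2]


def class_boundaries(limits):
    """Convert inclusive class limits to continuous class boundaries.

    Consecutive classes are one unit apart (24 -> 25), so half that gap
    is subtracted from each lower limit and added to each upper limit.
    """
    gap = limits[1][0] - limits[0][1]
    half = gap / 2
    return [(lo - half, hi + half) for lo, hi in limits]


def polygon_points(bounds, freqs):
    """Frequency-polygon vertices as (class midpoint, frequency) pairs.

    A zero-frequency point one class width before the first class and one
    after the last class closes the polygon on the x-axis.
    """
    width = bounds[0][1] - bounds[0][0]
    mids = [(lo + hi) / 2 for lo, hi in bounds]
    return ([(mids[0] - width, 0)] + list(zip(mids, freqs))
            + [(mids[-1] + width, 0)])


bounds = class_boundaries(classes)
assert sum(freqs) == 100  # matches the Total column
for (lo, hi), f in zip(bounds, freqs):
    # One '#' per worker; every bar spans an equal class width of 5.
    print(f"{lo:5.1f}-{hi:5.1f} | {'#' * f} {f}")
print(polygon_points(bounds, freqs))
```

Joining the printed (midpoint, frequency) pairs with straight lines, from (17.0, 0) to (67.0, 0), gives the frequency polygon drawn over the histogram bars.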


5. (a) You are the owner of a retail chain. You wish to enhance the productivity
of your store's employees. Give 5 metrics that you will define to achieve
this objective. 5(CO2)
(b) Write a short note on Balanced Scorecard. 5(CO2)

6. (a) In what scenarios can Hadoop and RDBMS coexist ? Explain.

(b) Define data stream mining. What additional challenges are posed by data
stream mining ? 5(CO4)

