Answers to Professor Alaa's Questions
1. Data staging: Data staging provides a place and an area with a set of functions to
2. DW Refresh:
3. Data granularity:
• For efficient query processing, only some of the possible summary views may be
materialized.
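5. Non-Volatile Data: A DW is always a physically separate store of data transformed from the
application data found in the operational environment. Because of this separation, a DW does
not require transaction processing, recovery, and concurrency control mechanisms (only two
operations are required: initial loading of data and access of data).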
4. Measurements:
• A multidimensional point in the data cube space can be defined by a set
of dimension-value pairs, for example time = “Q1”, location = “Vancouver”, item = “computer”.
• A data cube measure is a numeric function that can be evaluated at each point in the data
cube space.
• A measure value is computed for a given point by aggregating the data
corresponding to the respective dimension value pairs defining the
given point.
Measures can be organized into three categories based on the kind of
aggregate functions used:
• Distributive
• Algebraic
• Holistic
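The three categories can be illustrated with a minimal Python sketch. It assumes a hypothetical set of sales values split into two partitions; the data and variable names are illustrative only and do not come from the notes. The sum is distributive (partial sums can be summed again), the average is algebraic (it is derived from sum and count), and the median is holistic (partition medians cannot simply be combined).

```python
from statistics import median

# Hypothetical sales values, split into two partitions (illustrative data only).
part_a = [10, 20, 30]
part_b = [40, 100]
all_values = part_a + part_b

# Distributive: applying sum() to the partial sums gives the same result
# as applying it to the whole data set.
total = sum([sum(part_a), sum(part_b)])

# Algebraic: avg() is computed from a bounded number of distributive
# results, here sum() and count().
average = (sum(part_a) + sum(part_b)) / (len(part_a) + len(part_b))

# Holistic: median() has no constant-size summary per partition, so
# combining the partition medians generally gives a different answer.
true_median = median(all_values)
median_of_medians = median([median(part_a), median(part_b)])

print(total, average, true_median, median_of_medians)
```

In this example the median of the partition medians differs from the true median, which is why holistic measures are costly to precompute in a data cube.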
5. Semantic standardization: another major task, in which you resolve synonyms and homonyms.
• When two or more terms from different source systems mean
the same thing, you resolve the synonyms.
• When a single term means many different things in different
source systems, you resolve the homonym.
• Data transformation involves combining processes, followed by
sorting and merging.
• Combining brings together data elements from a single source record or related data
elements from many source records.
• Primary keys in the DW cannot have built-in meanings.
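The synonym and homonym resolution described above can be sketched as two lookup tables. This is a minimal illustration, not a prescribed method: the source-system names, the terms, and the `standardize` helper are all hypothetical.

```python
# Hypothetical synonym table: different source terms, one standard DW term.
SYNONYMS = {"client": "customer", "cust": "customer", "customer": "customer"}

# Hypothetical homonym table: the same term means different things in
# different source systems, so the lookup key includes the source system.
HOMONYMS = {
    ("billing", "status"): "payment_status",
    ("shipping", "status"): "delivery_status",
}


def standardize(source_system: str, term: str) -> str:
    """Return the standard DW term for a term coming from a source system."""
    normalized = term.lower()
    key = (source_system, normalized)
    if key in HOMONYMS:
        # Homonym: the meaning depends on where the term came from.
        return HOMONYMS[key]
    # Synonym: map to the shared term, or keep the term if it is unknown.
    return SYNONYMS.get(normalized, normalized)


print(standardize("crm", "Client"))
print(standardize("billing", "status"))
print(standardize("shipping", "status"))
```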
6. OLAP: stands for On-line Analytical Processing.
• It uses database tables (fact and dimension tables) to enable multidimensional viewing,
analysis, and querying of large amounts of data.
• OLAP applications and tools are designed to ask complex queries of large
multidimensional collections of data and provide fast answers for analyzing historical data.
• For this reason, OLAP usually accompanies data warehousing.
[2x5]
1. Data Characterization : describes data in ways that are useful to the miner and begins the
process of understanding what is in the data.
– The number of classes, the number of observations, the number of attributes, the number
of features with a numeric data type, and the number of features with a symbolic data type.
• For summary-type information, HOLAP leverages cube technology for faster performance.
• It stores only the indexes and aggregations in the multidimensional form, while the rest of
the data is stored in the relational database.
4. Data Science: is the study of the generalizable extraction of knowledge from data.
5. Non-volatile
6. Neural Network: a computational model that works similarly to the neurons of the human brain.
– A layer of "input" units is connected to a layer of "hidden" units (deep), which is connected
to a layer of "output" units.
– Each neuron takes input(s), performs operation(s), and passes the
output to the following neuron.
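The layered structure above can be sketched as a tiny forward pass. This is a minimal illustration under stated assumptions: the weights, biases, layer sizes, and the choice of a sigmoid activation are hypothetical and are not given in the notes.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Each neuron takes a weighted sum of its inputs plus a bias,
    applies the activation, and passes the result to the next layer."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Hypothetical weights: 2 input units -> 2 hidden units -> 1 output unit.
hidden_w = [[0.5, -0.5], [0.25, 0.75]]
hidden_b = [0.0, 0.0]
output_w = [[1.0, -1.0]]
output_b = [0.0]


def forward(inputs):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


result = forward([1.0, 1.0])
print(result)
```

Training, which adjusts the weights, is outside the scope of this sketch.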
1. Ranking
2. Homonym Standardization: When a single term means many different things in different
• For efficient query processing, only some of the possible summary views
may be materialized.
4. Non-volatile
• In your DW, you want to keep a single record for one customer and link
all the duplicates in the source systems to this single record (this
customer file).
6. Destructive Merge: In this mode, you apply the incoming data to the target
data.
7. Job sequencing: determine whether the beginning of one job in an extraction job stream
has to wait until the previous job has finished successfully.
8. Business Metadata: data that gives users business-related information about what is stored
in the DW, for example privacy level, security level, and business rules.
Q2: Define the following: (choose SIX OUT OF SEVEN): 6 MARKS
[1x6]
• In your DW, you want to keep a single record for one customer and link
all the duplicates in the source systems to this single record (this
customer file).
independent or dependent.
marts, and so on (solution where data may be physically or logically integrated through
shared key fields, overall global metadata, distributed queries, and such other methods,
4. Periodic Status: in this category, the value of the attribute is preserved as
the status every time a change occurs. At each of these points in time, the status
value is stored with reference to the time when the new value became
effective. This category also includes events stored with reference to the time when each
event occurred.
5. Non-volatile
6. Fact Constellation : Sophisticated applications may require multiple fact tables to share
dimension tables.
7. Data Granularities
[2x3]
An induced tree may overfit the training data.
– Too many branches, some of which may reflect anomalies due to noise or outliers.
– Poor accuracy for unseen samples.
Two approaches to avoid overfitting:
– Pre-pruning:
• Halt tree construction early: do not split a node if this would result in the goodness
measure falling below a threshold.
• It is difficult to choose an appropriate threshold.
– Post-pruning:
• Remove branches from a “fully grown” tree to get a sequence of progressively pruned
trees.
• Use a set of data different from the training data to decide which is the “best
pruned tree”.
Destructive Merge
• In this mode, you apply the incoming data to the target data.
• If the primary key of an incoming record matches with the key of an
existing record, update the matching target record.
• If the incoming record is a new record without a match with any
existing record, add the incoming record to the target table.
Constructive Merge
• This mode is slightly different from the destructive merge.
• If the primary key of an incoming record matches with the key of an
existing record, leave the existing record, add the incoming record,
and mark the added record as superseding the old record.
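The two merge modes can be contrasted in a short Python sketch. It assumes records are plain dictionaries with a hypothetical primary-key field `id`; the `supersedes` flag and the sample data are likewise illustrative and not taken from the notes.

```python
def destructive_merge(target: dict, incoming: list) -> None:
    """Update the matching target record, or add the record if there is no match."""
    for record in incoming:
        # The same assignment overwrites on a key match and inserts otherwise.
        target[record["id"]] = record


def constructive_merge(target: list, incoming: list) -> None:
    """Leave existing records untouched; add the incoming record and mark it
    as superseding the old one when the primary key matches."""
    for record in incoming:
        has_match = any(old["id"] == record["id"] for old in target)
        target.append({**record, "supersedes": has_match})


incoming = [{"id": 1, "city": "Giza"}, {"id": 2, "city": "Luxor"}]

dest = {1: {"id": 1, "city": "Cairo"}}
destructive_merge(dest, incoming)

cons = [{"id": 1, "city": "Cairo", "supersedes": False}]
constructive_merge(cons, incoming)

print(dest)
print(cons)
```

After the destructive merge the old value for key 1 is gone, while the constructive merge keeps both versions of that record.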
3. What are the key steps of the data mining process?
4. What are the differences between operational systems and a data warehouse according to
view and access patterns?
View
• An OLTP system focuses mainly on the current data within an enterprise or
department, without referring to historic data or data in different organizations.
• An OLAP system often spans multiple versions of a database schema, due to the
evolutionary process of an organization. Because of their huge volume, OLAP data
are stored on multiple storage media.
Access patterns
• The access patterns of an OLTP system consist mainly of short, atomic transactions.
Such a system requires concurrency control and recovery mechanisms.
• Accesses to OLAP systems are mostly read-only operations (because most DWs store
historic rather than up-to-date information), although many could be complex
queries.
Destructive Merge
• In this mode, you apply the incoming data to the target data.
• If the primary key of an incoming record matches with the key of an
existing record, update the matching target record.
• If the incoming record is a new record without a match with any
existing record, add the incoming record to the target table.
C- Briefly explain the difference between the base cuboid and the apex cuboid.
• The cuboid that holds the lowest level of summarization is called the base
cuboid.
• The 0-D cuboid, which holds the highest level of summarization, is called
the apex cuboid.
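The difference can be shown with a minimal Python sketch. It assumes a hypothetical set of fact rows with three dimensions (time, location, item) and a sales measure; the `cuboid` helper and the data are illustrative only.

```python
from collections import defaultdict

# Hypothetical fact rows: (time, location, item, sales).
rows = [
    ("Q1", "Vancouver", "computer", 100),
    ("Q1", "Vancouver", "phone", 50),
    ("Q2", "Toronto", "computer", 70),
    ("Q1", "Vancouver", "computer", 30),
]


def cuboid(rows, dims):
    """Sum sales grouped by the dimension positions listed in dims."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[3]
    return dict(totals)


base = cuboid(rows, dims=(0, 1, 2))  # 3-D base cuboid: every dimension kept
apex = cuboid(rows, dims=())         # 0-D apex cuboid: a single grand total

print(base)
print(apex)
```

Grouping by a subset of the dimensions, for example `dims=(0,)`, gives one of the intermediate cuboids between the two extremes.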
The key of the fact table is the concatenation of the keys of the dimension tables.
• For this reason, dimension records are loaded first.
• You have to create the concatenated key for the fact table record from the keys of the
corresponding dimension records.
• Perform a fact table surrogate key look-up.
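The load order above can be sketched in Python. This is a minimal illustration, assuming the dimension tables are already loaded as simple lookups from source (natural) key to surrogate key; all table names, keys, and the `build_fact_row` helper are hypothetical.

```python
# Dimension tables are loaded first: source (natural) key -> surrogate key.
# All keys and names here are hypothetical.
customer_dim = {"C-100": 1, "C-200": 2}
product_dim = {"P-9": 1, "P-7": 2}


def build_fact_row(source_row: dict) -> dict:
    """Look up the surrogate keys and build the concatenated fact key."""
    # A KeyError here means the dimension record has not been loaded yet.
    customer_key = customer_dim[source_row["customer_id"]]
    product_key = product_dim[source_row["product_id"]]
    return {
        "key": (customer_key, product_key),  # concatenation of dimension keys
        "amount": source_row["amount"],
    }


fact = build_fact_row(
    {"customer_id": "C-200", "product_id": "P-9", "amount": 15.0}
)
print(fact)
```

The lookup fails when a dimension record is missing, which is the practical reason the dimension load has to finish before the fact load starts.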