SCHEME OF VALUATION FOR M.Tech
There are two major approaches to data integration: the “tight
coupling approach” and the “loose coupling approach”.
Tight Coupling: [2M]
Here, a data warehouse is treated as an information retrieval
component.
In this coupling, data is combined from different sources into a
single physical location through the process of ETL – Extraction,
Transformation and Loading.
Loose Coupling:
Here, an interface is provided that takes the query from the user,
transforms it into a form the source databases can understand and
then sends the query directly to the source databases to obtain
the result.
The data remains only in the actual source databases.
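The two approaches can be contrasted with a minimal Python sketch. The source names (sales_db, crm_db), the record fields and both helper functions are hypothetical and serve only as an illustration.

```python
# Two hypothetical source "databases", each a list of records.
sales_db = [{"cust_id": 1, "amount": 250}, {"cust_id": 2, "amount": 90}]
crm_db = [{"customer_number": 1, "name": "Asha"},
          {"customer_number": 2, "name": "Ravi"}]


def etl_to_warehouse(sales, crm):
    """Tight coupling: extract, transform and load into one physical store."""
    names = {row["customer_number"]: row["name"] for row in crm}  # extract
    warehouse = []
    for row in sales:
        warehouse.append({                                        # transform
            "customer_id": row["cust_id"],
            "name": names.get(row["cust_id"]),
            "amount": row["amount"],
        })
    return warehouse                                              # load


def mediated_query(customer_id, sales, crm):
    """Loose coupling: rewrite the query for each source and ask it directly."""
    amounts = [r["amount"] for r in sales if r["cust_id"] == customer_id]
    names = [r["name"] for r in crm if r["customer_number"] == customer_id]
    return {"customer_id": customer_id,
            "name": names[0] if names else None,
            "total": sum(amounts)}


warehouse = etl_to_warehouse(sales_db, crm_db)  # data is copied once
answer = mediated_query(2, sales_db, crm_db)    # data stays in the sources
```

In the first call the combined records live in a new store; in the second nothing is copied and every query goes back to the sources.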
Issues in Data Integration: [2M]
There are a number of issues to consider during data integration: schema
integration, redundancy, and detection and resolution of data value
conflicts. These are explained briefly below.
1. Schema Integration:
Integrate metadata from different sources.
Real-world entities from multiple sources must be matched; this is
referred to as the entity identification problem.
For example, how can the data analyst or the computer be sure that
customer id in one database and customer number in another refer
to the same attribute?
2. Redundancy:
An attribute may be redundant if it can be derived or obtained
from another attribute or set of attributes.
Inconsistencies in attribute naming can also cause redundancies in the
resulting data set.
Some redundancies can be detected by correlation analysis.
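As an illustration of correlation analysis, the sketch below computes the Pearson correlation coefficient between numeric attributes. The attribute names and values are made up; a coefficient close to +1 or -1 suggests that one attribute may be derivable from the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical attributes: annual salary is derived from monthly salary.
monthly_salary = [2000, 2500, 3100, 4000]
annual_salary = [12 * m for m in monthly_salary]
age = [41, 23, 35, 29]

r_redundant = pearson(monthly_salary, annual_salary)  # very close to 1
r_other = pearson(monthly_salary, age)                # much weaker
```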
3. Detection and resolution of data value conflicts:
This is the third important issue in data integration.
Attribute values from different sources may differ for
the same real-world entity.
An attribute in one system may be recorded at a lower level of
abstraction than the “same” attribute in another.
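Both kinds of conflict can be sketched in Python. The records, the field names and the tolerance are hypothetical: the first part reconciles a unit difference, the second rolls branch-level figures up to region level so that they can be compared with a system that records only regions.

```python
LB_PER_KG = 2.20462  # approximate conversion factor

# Hypothetical records for the same product from two systems.
system_a = {"product": "P1", "weight_kg": 2.0}
system_b = {"product": "P1", "weight_lb": 4.41}


def to_kg(record):
    """Return the weight in kilograms, whichever unit the source used."""
    if "weight_kg" in record:
        return record["weight_kg"]
    return record["weight_lb"] / LB_PER_KG


def same_value(a, b, tolerance=0.01):
    """Treat two weights as equal if they differ by less than tolerance kg."""
    return abs(to_kg(a) - to_kg(b)) < tolerance


# Abstraction-level conflict: one system stores sales per branch, the
# other per region, so branch figures are rolled up before comparing.
branch_sales = {("North", "B1"): 120, ("North", "B2"): 80, ("South", "B3"): 50}
region_sales = {}
for (region, _branch), amount in branch_sales.items():
    region_sales[region] = region_sales.get(region, 0) + amount
```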
2. a) SNOWFLAKE SCHEMA is a logical arrangement of tables in a [4M]
multidimensional database such that the ER diagram resembles a
snowflake shape. A Snowflake Schema is an extension of a Star Schema,
and it adds additional dimensions. The dimension tables
are normalized which splits data into additional tables.
Characteristics of Snowflake Schema:
The main benefit of the snowflake schema is that it uses less disk
space.
It is easier to implement when a dimension is added to the schema.
Query performance is reduced because of the multiple tables.
The primary challenge with the snowflake schema is that it requires
more maintenance effort because of the additional lookup tables.
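A small sketch using Python's sqlite3 module shows what the normalization looks like in practice. The table and column names are hypothetical; the point is that the product dimension is split into a product table and a category table, so a query by category needs one more join than it would in a star schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The product dimension is normalized: category lives in its own table.
    CREATE TABLE dim_category (
        category_id INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        category_id INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES dim_product(product_id),
        amount REAL
    );
    INSERT INTO dim_category VALUES (1, 'Laptops'), (2, 'Phones');
    INSERT INTO dim_product VALUES (10, 'Model A', 1), (11, 'Model B', 2);
    INSERT INTO fact_sales VALUES
        (100, 10, 900.0), (101, 11, 400.0), (102, 10, 950.0);
""")

# Reaching the category name takes two joins from the fact table.
rows = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_category c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```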
b) In data warehouses, data cleaning is a major part of the so-called ETL [4M]
process. Data cleaning, also called data cleansing or scrubbing, deals
with detecting and removing errors and inconsistencies from data in
order to improve the quality of data.
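A minimal cleaning pass might look like the Python sketch below. The record fields are hypothetical, and dropping rows with missing values is only one possible policy (filling in a default is another).

```python
def clean(records):
    """Trim whitespace, normalize case, drop incomplete rows and duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        name = (rec.get("name") or "").strip().title()
        city = (rec.get("city") or "").strip().title()
        if not name or not city:      # missing value: drop the row
            continue
        key = (name, city)
        if key in seen:               # duplicate after normalization
            continue
        seen.add(key)
        cleaned.append({"name": name, "city": city})
    return cleaned


raw = [
    {"name": " asha rao ", "city": "pune"},
    {"name": "Asha Rao", "city": "Pune "},
    {"name": "Ravi Kumar", "city": None},
    {"name": "meena iyer", "city": "CHENNAI"},
]
result = clean(raw)
```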
7. a) A decision tree is a structure that includes a root node, branches, and [2M]
leaf nodes. Each internal node denotes a test on an attribute, each
branch denotes the outcome of a test, and each leaf node holds a class
label. The topmost node in the tree is the root node.
The following decision tree is for the concept buy_computer that
indicates whether a customer at a company is likely to buy a computer
or not. Each internal node represents a test on an attribute. Each leaf
node represents a class.
Input:
Data partition, D, which is a set of training tuples
and their associated class labels.
attribute_list, the set of candidate attributes. [3M]
Attribute_selection_method, a procedure to determine the
splitting criterion that best partitions the data
tuples into individual classes. This criterion includes a
splitting_attribute and either a splitting point or
splitting subset.
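Output:
A decision tree.
Method: Generate_decision_tree(D, attribute_list)
create a node N;
if the tuples in D are all of the same class, C, then
    return N as a leaf node labeled with the class C;
if attribute_list is empty then
    return N as a leaf node labeled with the majority class in D;
apply Attribute_selection_method(D, attribute_list) to find the best splitting_criterion;
label node N with splitting_criterion;
if splitting_attribute is discrete-valued and multiway splits are allowed then
    remove splitting_attribute from attribute_list;
for each outcome j of splitting_criterion
    let Dj be the set of data tuples in D satisfying outcome j;
    if Dj is empty then
        attach a leaf labeled with the majority class in D to node N;
    else
        attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
end for
return N;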
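The recursive induction procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: information gain is used as the attribute selection method, all attributes are treated as discrete, the tree is a nested dict, and the small training set is made up.

```python
from collections import Counter
from math import log2


def majority_class(rows):
    return Counter(label for _, label in rows).most_common(1)[0][0]


def entropy(rows):
    counts = Counter(label for _, label in rows)
    total = len(rows)
    return -sum(c / total * log2(c / total) for c in counts.values())


def select_attribute(rows, attributes):
    """Attribute selection by information gain (an assumed choice)."""
    def gain(attr):
        parts = {}
        for features, label in rows:
            parts.setdefault(features[attr], []).append((features, label))
        rest = sum(len(p) / len(rows) * entropy(p) for p in parts.values())
        return entropy(rows) - rest
    return max(attributes, key=gain)


def generate_decision_tree(rows, attributes, domains):
    labels = {label for _, label in rows}
    if len(labels) == 1:                    # all tuples in the same class
        return labels.pop()
    if not attributes:                      # attribute list is empty
        return majority_class(rows)
    best = select_attribute(rows, attributes)
    remaining = [a for a in attributes if a != best]
    node = {"attribute": best, "branches": {}}
    for value in domains[best]:             # each outcome j
        subset = [(f, l) for f, l in rows if f[best] == value]  # Dj
        if not subset:                      # Dj is empty
            node["branches"][value] = majority_class(rows)
        else:
            node["branches"][value] = generate_decision_tree(
                subset, remaining, domains)
    return node


def classify(tree, features):
    while isinstance(tree, dict):
        tree = tree["branches"][features[tree["attribute"]]]
    return tree


# Hypothetical training tuples: (attribute values, class label).
data = [
    ({"age": "youth", "student": "no"}, "no"),
    ({"age": "youth", "student": "yes"}, "yes"),
    ({"age": "middle_aged", "student": "no"}, "yes"),
    ({"age": "senior", "student": "yes"}, "yes"),
    ({"age": "senior", "student": "no"}, "no"),
]
domains = {"age": ["youth", "middle_aged", "senior"], "student": ["no", "yes"]}
tree = generate_decision_tree(data, ["age", "student"], domains)
prediction = classify(tree, {"age": "youth", "student": "yes"})
```

Leaves are plain class labels and internal nodes are dicts, which keeps the recursion close to the step-by-step description.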
Tree Pruning [1M]
Tree pruning is performed to remove branches that reflect anomalies in
the training data due to noise or outliers. The pruned trees are smaller
and less complex.
Tree Pruning Approaches
There are two approaches to prune a tree −
Pre-pruning − The tree is pruned by halting its construction
early.
Post-pruning − This approach removes a sub-tree from a fully
grown tree.
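Both approaches can be sketched in Python. The thresholds in should_stop, the dict-based tree layout and the validation tuples are assumptions for illustration. The post-pruning shown here is reduced-error pruning, which is only one of several possible post-pruning strategies.

```python
from collections import Counter


def should_stop(labels, depth, max_depth=3, min_samples=5, min_purity=0.9):
    """Pre-pruning: decide whether to halt growth and emit a leaf here."""
    if depth >= max_depth or len(labels) < min_samples:
        return True
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels) >= min_purity


def classify(tree, features):
    while isinstance(tree, dict):
        value = features[tree["attribute"]]
        tree = tree["branches"].get(value, tree["majority"])
    return tree


def errors(tree, rows):
    return sum(classify(tree, f) != label for f, label in rows)


def post_prune(tree, validation):
    """Post-pruning: bottom-up, replace a sub-tree with its majority leaf
    when that does not increase the error on the validation tuples."""
    if not isinstance(tree, dict):
        return tree
    attr = tree["attribute"]
    for value, child in tree["branches"].items():
        subset = [(f, l) for f, l in validation if f[attr] == value]
        tree["branches"][value] = post_prune(child, subset)
    leaf = tree["majority"]
    if errors(leaf, validation) <= errors(tree, validation):
        return leaf
    return tree


# Hypothetical fully grown tree; the "senior" branch fits a noisy tuple.
tree = {
    "attribute": "student", "majority": "yes",
    "branches": {
        "yes": "yes",
        "no": {"attribute": "age", "majority": "no",
               "branches": {"youth": "no", "senior": "yes"}},
    },
}
validation = [
    ({"student": "yes", "age": "youth"}, "yes"),
    ({"student": "no", "age": "youth"}, "no"),
    ({"student": "no", "age": "senior"}, "no"),
]
pruned = post_prune(tree, validation)
```

In this example the age sub-tree is collapsed into a single leaf, while the root test on student is kept because removing it would increase the validation error.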
Classification −
A bank loan officer wants to analyze the data in order to know
which customers (loan applicants) are risky and which are safe.
A marketing manager at a company needs to analyze whether a customer [2M]
with a given profile will buy a new computer.
In both of the above examples, a model or classifier is constructed to
predict the categorical labels. These labels are risky or safe for loan
application data and yes or no for marketing data.
Prediction −
Suppose the marketing manager needs to predict how much a given
customer will spend during a sale at his company. In this example we
are required to predict a numeric value. Therefore the data analysis
task is an example of numeric prediction. In this case, a model or a
predictor will be constructed that predicts a continuous-valued
function or ordered value.
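The difference in output type can be shown with a short Python sketch. The income and spending figures, the least-squares line used as the predictor and the threshold rule used as the classifier are all illustrative assumptions.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


income = [20, 30, 40, 50]      # hypothetical, in thousands
spend = [110, 160, 210, 260]   # hypothetical amount spent during a sale

a, b = fit_line(income, spend)
predicted_spend = a + b * 35   # numeric prediction: a continuous value


def classify_applicant(income_k, threshold=30):
    """Classification: the output is a categorical label."""
    return "safe" if income_k >= threshold else "risky"


label = classify_applicant(35)
```

The predictor returns a number on a continuous scale, whereas the classifier can only return one of the two labels.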
******
EDI Documents
The following are a few important documents used in EDI −
Invoices
Purchase orders
Shipping Requests
Acknowledgement
Business Correspondence letters
Financial information letters
Steps in an EDI System
Following are the steps in an EDI System.
A program generates a file that contains the processed
document.
The document is converted into an agreed standard format.
The file containing the document is sent electronically on the
network.
The trading partner receives the file.
An acknowledgement document is generated and sent to the
originating organization.
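The five steps can be traced with a small Python sketch. Everything here is a stand-in: JSON plays the role of the agreed standard format (real EDI systems typically use standards such as ANSI X12 or EDIFACT), a list plays the role of the network, and the function names and identifiers are hypothetical.

```python
import json


def generate_document(order):
    """Step 1: the application produces the processed document."""
    return {"type": "purchase_order", **order}


def to_standard_format(document):
    """Step 2: convert to the format agreed by both partners.
    JSON is an assumption made for this sketch."""
    return json.dumps(document, sort_keys=True)


def send(message, network):
    """Step 3: transmit the file (a list stands in for the network)."""
    network.append(message)


def receive_and_acknowledge(network, partner_id):
    """Steps 4 and 5: the partner reads the file and acknowledges it."""
    document = json.loads(network.pop(0))
    return {"type": "acknowledgement",
            "order_id": document["order_id"],
            "from": partner_id}


network = []
doc = generate_document({"order_id": "PO-1", "item": "laptop", "quantity": 2})
send(to_standard_format(doc), network)
ack = receive_and_acknowledge(network, partner_id="SUPPLIER-7")
```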
Advantages of an EDI System
Following are the advantages of having an EDI system.
Reduction in data entry errors − The chance of error is much
lower when a computer is used for data entry.
Shorter processing life cycle − Orders can be processed as soon
as they are entered into the system. It reduces the processing
time of the transfer documents.
Electronic form of data − It is quite easy to transfer or share the
data, as it is present in electronic format.
Reduction in paperwork − As a lot of paper documents are
replaced with electronic documents, there is a huge reduction in
paperwork.
Cost Effective − As time is saved and orders are processed very
effectively, EDI proves to be highly cost effective.
Standard Means of communication − EDI enforces standards on
the content of data and its format which leads to clearer
communication.
b) The difference between horizontal and vertical organizations is [7M]
that vertical organizations have a top-down management structure,
while horizontal organizations have a flat structure that provides
greater employee autonomy.
White Pages
The white pages in a phone book are for personal land line phone
numbers and street addresses in a specific region. The white pages are
organized alphabetically by name, with the surname (or last name) first,
then first name followed by middle name or initial, if applicable.
Everyone with a land line telephone service is registered with the
phone book printer under the name of the phone service account unless
they opt out of the phone book by calling the phone book company and
asking to be on the red list. This red list will stop a person's name from
appearing in the phone book and online on the phone book website.
Yellow Pages
The yellow pages generally follow the white pages in the phone book,
in the back half. The yellow pages are all business listings, with the
name, number and address of local businesses. They differ from the
white pages in that yellow pages are paid listings, meaning that
businesses must pay for the listing in the book and can also pay extra
money for larger, more attention-grabbing ads. The second major
difference is that the businesses are first listed by category and then in
alphabetical order by name. For example, Tony's Pizza would be listed
under the "Pizza" category and then between the two other pizza
restaurants that come immediately before and after it alphabetically.
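The two ordering rules can be expressed as sort keys in Python. The names, the other pizza businesses and the red_list flag are hypothetical sample data.

```python
white_pages = [
    {"surname": "Rao", "first": "Asha", "red_list": False},
    {"surname": "Kumar", "first": "Ravi", "red_list": True},  # opted out
    {"surname": "Iyer", "first": "Meena", "red_list": False},
]
yellow_pages = [
    {"category": "Pizza", "name": "Tony's Pizza"},
    {"category": "Plumbing", "name": "Ace Plumbers"},
    {"category": "Pizza", "name": "Zesty Slice"},
    {"category": "Pizza", "name": "Pizza Corner"},
]

# White pages: skip red-listed subscribers, sort by surname then first name.
white_sorted = sorted(
    (p for p in white_pages if not p["red_list"]),
    key=lambda p: (p["surname"], p["first"]),
)

# Yellow pages: sort by category first, then by business name.
yellow_sorted = sorted(yellow_pages, key=lambda b: (b["category"], b["name"]))
```

With this data, Tony's Pizza is listed under the Pizza category between the two other pizza businesses, as described above.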
*****