Data Warehousing: Version 6.0 - 04/18/2000
• Internal Data
– Data about people, products, services, processes.
– Often stored in corporate databases (e.g., Sales or HR).
– Some data may be dispersed across regions but
accessible over networks.
• Personal Data
– Individuals document their expertise by creating personal
data, such as subjective estimates. Some of it is kept only
in people's heads, as mental models.
– Can be stored on PCs or made available on the Web.
• External Data
– Many sources.
• Data collection is not easy. Data may have to be:
– collect in the field
– elicit from people
– collect manually, electronically, or by sensors.
• Data collection technology has not kept pace
with advances of data storage technology.
• Data collection from external sources is not easy
either.
• Bottom Line: Garbage In, Garbage Out (GIGO).
• Data Quality (DQ) is an Important Issue.
• Intrinsic DQ:
– accuracy, objectivity, believability, reputation.
• Accessibility DQ:
– Accessibility and access security
• Contextual DQ:
– relevancy, value-added, timeliness, completeness, amount of
data.
• Representation DQ:
– interpretability, ease of understanding, concise
representation, consistent representation
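Two of the contextual DQ dimensions above, completeness and timeliness, lend themselves to simple measurement. The following is a minimal sketch, not a standard definition: the function names, the sample records, and the 90-day freshness threshold are all assumptions made for illustration.

```python
from datetime import date


def completeness(records, required_fields):
    """Fraction of required field values that are present (not None or empty)."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 1.0
    filled = sum(
        1
        for rec in records
        for field in required_fields
        if rec.get(field) not in (None, "")
    )
    return filled / total


def timeliness(records, date_field, as_of, max_age_days):
    """Fraction of records updated within max_age_days of the as_of date."""
    if not records:
        return 1.0
    fresh = sum(
        1
        for rec in records
        if rec.get(date_field) is not None
        and (as_of - rec[date_field]).days <= max_age_days
    )
    return fresh / len(records)


# Hypothetical sample rows; the second one has a blank region and is stale.
customers = [
    {"name": "Acme", "region": "East", "updated": date(2000, 4, 1)},
    {"name": "Globex", "region": "", "updated": date(1999, 1, 15)},
]

print(completeness(customers, ["name", "region"]))              # 0.75
print(timeliness(customers, "updated", date(2000, 4, 18), 90))  # 0.5
```

Scores like these only flag where data quality may be weak; the intrinsic dimensions (accuracy, believability) usually need comparison against a trusted source.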
• Internal Data:
– Financial Systems
– Logistics Systems
– Sales Systems
– Production Systems
– Personnel Systems
– Billing Systems
– Information Systems
• External Data Needs:
– to recognize opportunities
– to detect threats
– to identify synergies
[Figure: entity-relationship diagram. Entity labels recoverable from the figure include Market, Product, Product Type, Product Group, Shipper, Ship To, District, Order, Sales Order, Item, Credit, Contact, Location, Customer, Contract, Contract Type.]
Store_key
store_name
address
floor_plan_type
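The four attributes above read like the columns of a store dimension table. As a rough sketch under that assumption, they could be declared as follows using Python's built-in sqlite3 module. The table name store_dim, the column types, and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

# In-memory database so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Assumed layout: store_key is a surrogate key, the rest are descriptive attributes.
conn.execute(
    """
    CREATE TABLE store_dim (
        store_key       INTEGER PRIMARY KEY,
        store_name      TEXT NOT NULL,
        address         TEXT,
        floor_plan_type TEXT
    )
    """
)

# Hypothetical sample stores.
conn.executemany(
    "INSERT INTO store_dim VALUES (?, ?, ?, ?)",
    [
        (1, "Downtown", "1 Main St", "compact"),
        (2, "Mall", "200 Retail Rd", "standard"),
    ],
)

store_rows = conn.execute(
    "SELECT store_key, store_name FROM store_dim ORDER BY store_key"
).fetchall()
print(store_rows)  # [(1, 'Downtown'), (2, 'Mall')]
```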
• Examples of Dimensions:
– products, salespeople, market segments, business units,
geographical locations, distribution channels, country,
industry
• Examples of Facts or Measures:
– money, sales volume, head count, inventory, profit, actual
vs. forecast.
• Examples of Time:
– daily, weekly, monthly, quarterly, yearly
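To make the relationship between dimensions, facts, and time concrete, the sketch below sums one measure (sales amount) by one dimension (product) at two time grains (monthly and yearly). It is an illustration only: the table names, column names, and sample figures are assumptions, and SQLite stands in for whatever warehouse database would actually be used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_dim (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT
    );
    CREATE TABLE sales_fact (
        sale_date    TEXT,     -- ISO date string, e.g. '2000-01-05'
        product_key  INTEGER REFERENCES product_dim(product_key),
        sales_amount REAL      -- the measure being summed
    );
    """
)

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO product_dim VALUES (?, ?)",
    [(1, "Widget"), (2, "Gadget")],
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [
        ("2000-01-05", 1, 100.0),
        ("2000-01-20", 1, 50.0),
        ("2000-02-03", 1, 75.0),
        ("2000-02-10", 2, 200.0),
    ],
)


def sales_by_product(grain_format):
    """Sum sales per product at the time grain given by a strftime format."""
    query = """
        SELECT strftime(?, f.sale_date) AS period,
               p.product_name,
               SUM(f.sales_amount)
        FROM sales_fact AS f
        JOIN product_dim AS p ON p.product_key = f.product_key
        GROUP BY period, p.product_name
        ORDER BY period, p.product_name
    """
    return conn.execute(query, (grain_format,)).fetchall()


monthly = sales_by_product("%Y-%m")
yearly = sales_by_product("%Y")
print(monthly)
print(yearly)
```

Changing only the format string moves the same facts between grains, which is why the time dimension is usually modeled separately from the measures.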
• subject-oriented
• integrated
• non-volatile (i.e. not updated)
• time variant (kept for long periods, for forecasting
and trend analysis)
• summarized
• large volume
• not normalized
• metadata
• data sources