Sensor Data Warehouse Design March 14 2005 MySQL Conf
Jacob Nikom
MIT Lincoln Laboratory
The MySQL Users Conference 2005
19 April 2005
This work was sponsored by the U.S. Army Space and Missile Defense Command under Air Force Contract# F19628-00-C-0002. Opinions, interpretations, recommendations and conclusions are those of the author and are not necessarily endorsed by the United States Government.
Outline
- Introduction
- Corporate Information Factory (CIF) and its Data Management Architecture (DMA)
Outline
- Introduction
  - Reagan Test Site (RTS) and its instrumentation
  - What is RTS Operations Coordination Center (ROCC)?
  - ROCC primary operations
  - ROCC logical component block diagram
  - ROCC modernization
  - New ROCC Data Management Architecture
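[Figure: block diagram; recoverable labels: Sensors, Flat Files, Displays]
[Figure: feedback control loop. Forward path: controller and plant, with actuating signal m(t) and controlled variable c(t). Feedback path: feedback processor returning feedback signal b(t) to the summing point (+/-).]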
Control is the process of making a system variable adhere to a particular value, called the reference value. A system designed to follow a changing reference is called a tracking control system.
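The loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: a proportional controller, a plant modeled as a simple integrator, and an ideal feedback processor (b(t) = c(t)). The function name `track` and the gain value are hypothetical and do not come from the slides.

```python
def track(reference, gain=0.5, c0=0.0):
    """Simulate a discrete-time proportional feedback loop.

    reference: sequence of reference values (may change over time).
    gain: hypothetical controller gain; 0 < gain < 1 keeps this loop stable.
    Returns the list of controlled-variable values c after each step.
    """
    c = c0
    history = []
    for r in reference:
        b = c              # feedback processor: ideal sensor, b(t) = c(t)
        error = r - b      # summing point: + reference, - feedback
        m = gain * error   # controller output: actuating signal m(t)
        c = c + m          # plant: simple integrator accumulating m(t)
        history.append(c)
    return history


# A constant reference followed by a ramp, to show tracking of a changing reference.
reference = [1.0] * 10 + [1.0 + 0.1 * k for k in range(1, 11)]
output = track(reference)

for r, c in zip(reference, output):
    print(f"reference={r:.2f}  controlled={c:.3f}")
```

With a constant reference the controlled variable converges to it. With the ramp, this simple controller follows the reference with a small steady lag, which is the usual trade-off for a purely proportional design.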
[Figure: block diagram; recoverable labels: Reference Data, Planning, Data Plant, Sensors, Simulation, Output Data]
ROCC Modernization
Obsolete system hardware
- Old central processors and boards are no longer supported
- Not enough computational power to perform new tasks
- Old components and interfaces are incompatible with modern technology

- Centralized monolithic architecture
- Flat files for storing data
- Use of old procedural languages
- Alphanumeric displays

Modernized system
- Industry-standard 32/64-bit Xeon or Opteron servers
- Software vendor independence: Linux and Java
- Database-based storage
- Distributed architecture using the publish/subscribe paradigm
- Graphical user interface for visualization tools
- Targeted dataflow rates: 5 MB/s (sustained), 10 MB/s (peak)
- Data accumulation rate: 1 TB/year
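The targeted dataflow rates and the yearly accumulation figure can be related with simple arithmetic. The sketch below assumes decimal units (1 TB = 10^6 MB) and that everything acquired is stored. Both are assumptions, not statements from the slides.

```python
SUSTAINED_RATE_MB_S = 5   # from the slide: sustained dataflow rate
PEAK_RATE_MB_S = 10       # from the slide: peak dataflow rate
YEARLY_VOLUME_TB = 1      # from the slide: data accumulation rate

MB_PER_TB = 1_000_000     # assumption: decimal units
SECONDS_PER_YEAR = 365 * 24 * 3600


def acquisition_hours_per_year(rate_mb_s):
    """Hours of acquisition at the given rate that add up to the yearly volume."""
    seconds = YEARLY_VOLUME_TB * MB_PER_TB / rate_mb_s
    return seconds / 3600


hours_sustained = acquisition_hours_per_year(SUSTAINED_RATE_MB_S)
hours_peak = acquisition_hours_per_year(PEAK_RATE_MB_S)

# Volume that a full year of uninterrupted sustained-rate acquisition would produce.
continuous_tb = SUSTAINED_RATE_MB_S * SECONDS_PER_YEAR / MB_PER_TB

print(f"{hours_sustained:.1f} h/year at the sustained rate")
print(f"{hours_peak:.1f} h/year at the peak rate")
print(f"{continuous_tb:.1f} TB/year if acquisition never stopped")
```

Under these assumptions, 1 TB/year corresponds to roughly 56 hours of acquisition at the sustained rate. This suggests the rates describe individual acquisition sessions rather than continuous operation.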
Outline
- Introduction
- Corporate Information Factory (CIF) for Data Management Architecture
  - What is Corporate Information Factory (CIF)?
  - CIF data flow diagram
  - CIF data
  - CIF layers
  - CIF logical component block diagram
- Summary
[Figure: CIF diagram; recoverable labels: Reference data, Statistical analysis, Application layer (eComm, ERP, CRM, BI transaction systems), Enterprise transactions, Warehouse layer (ODS, DW, Data marts), Alternative storage, DSS applications, Operational reports, Metadata management]
CIF Data
External data
- Data is defined outside of the corporation. It could have erroneous, redundant or unnecessary items
- Data format is defined outside of the corporation. Reformatting could be required

Reference data
- Allows standardizing on commonly used names for important and frequently used information
- Allows consistent interpretation of corporate data across different departments
- Could be aliases for common and often-referenced names

Historical data
- Volume of data: the longer the history, the more data
- Usefulness of data: recent data is more useful than older data
- Granularity of data: older data is likely to be used at the summary level
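The role of reference data as a set of standard names and aliases can be illustrated with a small lookup. The table contents and the function name `canonical_name` are hypothetical and serve only to show the idea.

```python
# Hypothetical reference table: canonical name -> known aliases.
REFERENCE_NAMES = {
    "Customer Relationship Management": ["CRM", "crm system"],
    "Enterprise Resource Planning": ["ERP", "erp system"],
}

# Case-insensitive lookup from every alias (and the canonical name itself).
ALIAS_LOOKUP = {}
for canonical, aliases in REFERENCE_NAMES.items():
    for name in [canonical, *aliases]:
        ALIAS_LOOKUP[name.strip().lower()] = canonical


def canonical_name(name):
    """Return the canonical name for an alias, or None if it is unknown."""
    return ALIAS_LOOKUP.get(name.strip().lower())


print(canonical_name(" crm "))
print(canonical_name("Enterprise Resource Planning"))
print(canonical_name("payroll"))
```

Resolving names through one shared table is what lets different departments interpret the same data consistently.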
Corporate timeline
[Figure: timeline of corporate data; recoverable labels: Ancient history, Recent history, Most current activity, Immediate future, DW, ODS]
MySQL Users Conf. 04-19-2005
CIF Layers
Application layer
- Interacting directly with the end user
- Gathering detailed transaction data

Warehouse layer
Data Warehouse
- Subject-oriented
- Integrated
- Nonvolatile
- Time-variant
- Comprised of both summary and detailed data
- Summary data optimized for report and analysis queries
- Normalized and de-normalized data

Statistical analysis
- Exploration reporting
- Data mining reporting
- DSS analysis and reporting
- Finance, Sales, Marketing, Accounting

[Figure labels: eComm (tx), ERP (tx), CRM (tx), BI (tx); Statistics; eComm (rpt), CRM (rpt), ERP (rpt), BI (rpt)]
[Figure: CIF block diagram; recoverable labels: Reference Data, Corporate Goals, Data Plant, Output Data, Applications, Real-time DSS, Operational Data Store, Corporate Report, Long-term DSS, Data Warehouse]
Outline
- Introduction
- Corporate Information Factory (CIF) for Data Management Architecture (DMA)
- Summary
[Figure: ROCC data flow diagram; recoverable labels: Operational data, Archived data, External world, Operational layer, Planning, Warehouse layer, Secondary storage, Multicast middleware, RIB (several instances), Best Choice, Smoother, BET, Post overview, Impact, Data Fusion, Space, ODS, DW, Quick Look reports, Data marts]
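The multicast middleware in this diagram distributes published sensor objects to several consumers at once. A minimal in-process sketch of the publish/subscribe idea follows. The `Broker` class, the topic name, and the message fields are hypothetical. A real deployment would use network multicast rather than direct function calls.

```python
from collections import defaultdict


class Broker:
    """Toy stand-in for publish/subscribe middleware (in-process, synchronous)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callable to receive every message published on topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to all subscribers of topic; return the delivery count."""
        callbacks = self._subscribers.get(topic, ())
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
ods_rows = []          # stands in for an ODS writer
smoother_input = []    # stands in for a processing component

broker.subscribe("sensor.track", ods_rows.append)
broker.subscribe("sensor.track", smoother_input.append)

delivered = broker.publish("sensor.track", {"sensor": "radar_a", "seq": 1})
print(delivered, ods_rows, smoother_input)
```

The publisher does not know who consumes its data, so a new consumer such as a database writer can be added without changing the producers.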
ROCC Data
External data
- Data is defined outside of ROCC. It could have erroneous, redundant, or unnecessary items
- Data format is defined outside of ROCC. Reformatting or object conversion could be required

Reference data
- Comprise geophysics models and constants necessary for external data interpretation
- Comprise common locations, sensor names, and names of computers and programs
- Comprise the user names, passwords, access rights and privileges

Historical data
- Operational data being migrated to the warehouse become historical data
- Detailed historical data are used to produce summarized historical data
- Historical data are only inserted, never updated

Planning data
- Comprise configuration data for the sensor acquisition procedures
- Comprise ROCC software components configuration data (XML format)
- Comprise data to plan specific activities to acquire space object coordinates
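The step from detailed to summarized historical data can be sketched with an aggregate query. The example uses Python's built-in SQLite module as a stand-in for the database server. The table and column names are hypothetical, not the ROCC schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE detail (sensor TEXT, session_id INTEGER, t REAL, range_km REAL);
    CREATE TABLE summary (sensor TEXT, session_id INTEGER,
                          n INTEGER, mean_range_km REAL);
""")

# Detailed historical rows: only ever inserted.
rows = [
    ("radar_a", 1, 0.0, 100.0),
    ("radar_a", 1, 1.0, 102.0),
    ("radar_b", 1, 0.0, 250.0),
]
conn.executemany("INSERT INTO detail VALUES (?, ?, ?, ?)", rows)

# Summarized rows are derived from the detail and inserted as new rows.
conn.execute("""
    INSERT INTO summary
    SELECT sensor, session_id, COUNT(*), AVG(range_km)
    FROM detail
    GROUP BY sensor, session_id
""")

summary = conn.execute(
    "SELECT sensor, n, mean_range_km FROM summary ORDER BY sensor"
).fetchall()
print(summary)
```

The detail table is left untouched, so the summary can always be recomputed or extended with further aggregates later.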
ROCC Layers
External world
- Simultaneous output from multiple sensors, up to 10 MB/s
- Capable of producing data autonomously
- Capable of working under the guidance of DSS applications
- Produces data as streams with considerable output rates
[Figure label: Feedback from DSS applications]

RIB
- Plays a vitally important role in reconciling the incoming external data content and format with the internal data requirements
- Converts incoming data into appropriate Java objects
- Creates necessary metadata
- Mathematical transformation
- Reformatting and resequencing
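The conversion, metadata creation, transformation, and resequencing steps listed above can be sketched as follows. The system described in the slides produces Java objects. This Python version only illustrates the idea, and the wire format (`sensor,seq,azimuth_deg`), the `SensorSample` class, and its fields are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorSample:
    sensor: str
    seq: int               # sequence number assigned by the sensor
    azimuth_rad: float     # converted to a common unit (radians)
    received_order: int    # metadata: position in the incoming stream


def parse_record(line, received_order):
    """Convert one raw text record into a typed object (hypothetical format)."""
    sensor, seq, azimuth_deg = line.strip().split(",")
    return SensorSample(
        sensor=sensor,
        seq=int(seq),
        azimuth_rad=math.radians(float(azimuth_deg)),  # mathematical transformation
        received_order=received_order,
    )


def ingest(lines):
    """Parse all records, then resequence them per sensor by sequence number."""
    samples = [parse_record(line, i) for i, line in enumerate(lines)]
    return sorted(samples, key=lambda s: (s.sensor, s.seq))


raw = ["radar_a,2,180.0", "radar_a,1,90.0", "radar_b,1,0.0"]
samples = ingest(raw)
for sample in samples:
    print(sample)
```

Keeping the arrival position as metadata makes it possible to audit the resequencing afterwards.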
Integrated
- Physical unification and cohesiveness
- Uniform key structures
- Table naming conventions
- Common physical units and coordinate systems
- Data layouts and metadata

Volatile
- ODS data could be updated (replaced) as a normal part of processing. After the acquisition session is done, the data are moved to the DW

Current-valued
- ODS data values are related to the current event (the current acquisition session). For the next mission the ODS will be updated and its content will be moved to the DW (data migration)

Detailed
- ODS contains the inserted values of the published sensor objects and is not expected to hold summary data

Normalized
- ODS contains normalized data

[Figure labels: Best Choice, Smoother, Data Fusion]
[Figure: data migration between systems; recoverable labels: ODS (Primary System), ODS (Secondary System), DW (Archive System)]
- Necessary operations could be performed during the copying
- Two operational databases could be used in parallel right after the acquisition
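A migration step that performs an operation while copying can be sketched like this. Two in-memory SQLite databases stand in for the ODS and DW servers. The schema, the `migrate_session` function, and the choice of a unit conversion as the "necessary operation" are assumptions made for illustration.

```python
import sqlite3

ods = sqlite3.connect(":memory:")   # stand-in for the operational data store
dw = sqlite3.connect(":memory:")    # stand-in for the data warehouse

ods.execute("CREATE TABLE samples (session_id INTEGER, sensor TEXT, t REAL, range_m REAL)")
dw.execute("CREATE TABLE samples (session_id INTEGER, sensor TEXT, t REAL, range_km REAL)")

ods.executemany(
    "INSERT INTO samples VALUES (?, ?, ?, ?)",
    [(7, "radar_a", 0.0, 100000.0), (7, "radar_a", 1.0, 102000.0)],
)


def migrate_session(ods, dw, session_id):
    """Copy one finished session from ODS to DW, converting metres to kilometres."""
    rows = ods.execute(
        "SELECT session_id, sensor, t, range_m FROM samples WHERE session_id = ?",
        (session_id,),
    ).fetchall()
    converted = [(s, sensor, t, r / 1000.0) for s, sensor, t, r in rows]
    with dw:    # commit the warehouse insert first
        dw.executemany("INSERT INTO samples VALUES (?, ?, ?, ?)", converted)
    with ods:   # only then clear the session from the ODS
        ods.execute("DELETE FROM samples WHERE session_id = ?", (session_id,))
    return len(converted)


moved = migrate_session(ods, dw, 7)
print(moved, dw.execute("SELECT * FROM samples").fetchall())
```

Inserting into the warehouse before deleting from the ODS means an interruption between the two steps leaves a duplicate rather than a loss.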
Data Warehouse
Subject-oriented
- Organized like the ODS around major ROCC entities, but focused on the modeling and analysis of data

Integrated
- Data migrated into the DW from the ODS are integrated with the rest of the DW data

Time-variant
- Every datum in the data warehouse is identified with a particular time period. All summarized data are correct only for the period with which the corresponding detailed data are identified

Non-volatile
- There are no updates in the warehouse, only inserts. The past cannot be changed, only expanded

ROCC DW specifics
- ROCC DW does not use a multidimensional data model yet, only summarized tables
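The non-volatile and time-variant properties can be enforced in the schema itself. The sketch below uses SQLite triggers as a stand-in. The table, its columns, and the trigger names are hypothetical, and the slides do not say how the ROCC warehouse enforces these rules.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
    -- Every summarized row carries the period it is valid for (time-variant).
    CREATE TABLE track_summary (
        period_start  TEXT NOT NULL,
        period_end    TEXT NOT NULL,
        sensor        TEXT NOT NULL,
        mean_range_km REAL
    );

    -- Reject any attempt to change or remove existing rows (non-volatile).
    CREATE TRIGGER track_summary_no_update BEFORE UPDATE ON track_summary
    BEGIN
        SELECT RAISE(ABORT, 'warehouse rows are insert-only');
    END;

    CREATE TRIGGER track_summary_no_delete BEFORE DELETE ON track_summary
    BEGIN
        SELECT RAISE(ABORT, 'warehouse rows are insert-only');
    END;
""")

dw.execute(
    "INSERT INTO track_summary VALUES (?, ?, ?, ?)",
    ("2005-01-01T00:00", "2005-01-01T01:00", "radar_a", 101.0),
)


def try_update(conn):
    """Attempt an update and return the error message, or None if it succeeded."""
    try:
        conn.execute("UPDATE track_summary SET mean_range_km = 0")
    except sqlite3.DatabaseError as exc:
        return str(exc)
    return None


error = try_update(dw)
print(error)
```

Inserts still succeed, so the warehouse can grow with each new period while earlier rows stay as recorded.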
Creation of a mathematical model to describe differences between reported and actual antenna pointing positions
[Figure: bias model workflow; recoverable labels: Sensor data collection, RIB, Data migration, Data Warehouse, Truth Data, Observed Data, Atmospheric Data, Generate Residuals, Residual Data, Multivariate Regression, Bias model coefficients, Analytical queries, Corrected pointing information, Report]
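The residual and regression steps named in this workflow can be sketched on synthetic data. The regressors chosen here (a constant offset, sine and cosine of azimuth, tangent of elevation) are hypothetical and are not the author's bias model. The fit uses ordinary least squares through the normal equations, written in plain Python to stay self-contained.

```python
import math
import random


def design_row(az, el):
    """Hypothetical regressors for the azimuth bias at one pointing direction."""
    return [1.0, math.sin(az), math.cos(az), math.tan(el)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (A^T A) x = A^T y."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(ata, aty)


def model(coefficients, row):
    """Evaluate the bias model for one row of regressors."""
    return sum(c * x for c, x in zip(coefficients, row))


# Synthetic truth data and reported (observed) azimuths with a known bias.
random.seed(0)
true_coeffs = [0.002, 0.001, -0.0005, 0.0003]
truth = [(random.uniform(0, 2 * math.pi), random.uniform(0.1, 1.4)) for _ in range(200)]
rows = [design_row(az, el) for az, el in truth]
observed_az = [az + model(true_coeffs, row) + random.gauss(0, 1e-5)
               for (az, _), row in zip(truth, rows)]

# Generate residuals, fit the bias coefficients, then correct the observations.
residuals = [obs - az for obs, (az, _) in zip(observed_az, truth)]
coeffs = fit_least_squares(rows, residuals)
corrected_az = [obs - model(coeffs, row) for obs, row in zip(observed_az, rows)]

print([round(c, 5) for c in coeffs])
```

In the workflow shown on the slide, the truth, observed, and atmospheric data would come from warehouse queries rather than from a random generator.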
[Figure: block diagram; recoverable labels: Data Warehouse, Reference Data, Planning, Output Data, Voice, Classification, Identification, Operators, Trajectory Estimation]
Database Selection
- The same server should work adequately for both ODS and DW
- Deficiency in sophistication could be mitigated by custom programming

[Table: comparison criteria (qualitative values) for MySQL, Oracle, DB2 (IBM), and PostgreSQL; the cell values did not survive extraction]

Dialectism
- Usage of specific database dialects
- Deviation from existing SQL standards
- Locks the user in to a specific vendor
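One familiar example of dialect differences among the four compared servers is how each limits a result set to the first n rows. The helper below is a hypothetical illustration. It builds query text from trusted constants only and is not meant for untrusted input.

```python
def top_n_query(dialect, table, order_by, n):
    """Return a 'first n rows' query in the syntax of the given dialect."""
    base = f"SELECT * FROM {table} ORDER BY {order_by}"
    if dialect in ("mysql", "postgresql"):
        return f"{base} LIMIT {n}"
    if dialect == "db2":
        return f"{base} FETCH FIRST {n} ROWS ONLY"
    if dialect == "oracle":
        # Classic Oracle form: filter on ROWNUM outside the ordered subquery.
        return f"SELECT * FROM ({base}) WHERE ROWNUM <= {n}"
    raise ValueError(f"unsupported dialect: {dialect}")


for dialect in ("mysql", "postgresql", "db2", "oracle"):
    print(f"{dialect:10s} {top_n_query(dialect, 'samples', 't', 10)}")
```

Code that embeds any one of these forms has to be rewritten to move to another server, which is the lock-in effect the slide refers to.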
Outline
Summary
Summary
Modernization of the ROCC calls for a new type of data management architecture
- New high-performance hardware
- Significant increase of generated and managed volumes of data
- Introduction of new services

- Designed to support large-scale information systems
- Effectively manages different types of information queries
- Provides flexibility in distributing data between multiple producers and consumers

- ODS supports near real-time storage requirements and targeted, low-granularity queries
- DW is used for complex queries against summary-level data
- ODS provides information for tactical decisions about near real-time data acquisition
- DW delivers feedback for strategic decisions leading to system improvements

- Good performance for fast queries in ODS
- Capable of storing large amounts of data in DW
- Simple installation and licensing allow many independent servers to run inside one system, used as ODS, DW, data marts, etc.
- Excellent Java support allows seamless integration with the rest of the software