Data Management Flows



Corsello Research Foundation

Information Workflow
Information Lifecycle Example


Stages of Information
All data must pass through several stages in its lifecycle
Creation or Collection
Processing or Review (QA/QC)
Use and Re-use
Disposal

The main stage is use and re-use, which may result in data creation
Analysis results are new data creations
Intermediate data may be directly disposed
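The stages above can be read as a small state machine. The following is a minimal sketch, not part of the original deck: the transition table, the `advance` function, and the `derive_new_dataset` function are assumptions made for illustration, since the slide names the stages but does not define the allowed moves between them.

```python
from enum import Enum


class Stage(Enum):
    CREATION = "creation or collection"
    PROCESSING = "processing or review (QA/QC)"
    USE = "use and re-use"
    DISPOSAL = "disposal"


# Assumed transition table; the slide only names the stages and notes that
# intermediate data may be disposed of directly.
ALLOWED = {
    Stage.CREATION: {Stage.PROCESSING, Stage.DISPOSAL},
    Stage.PROCESSING: {Stage.USE, Stage.DISPOSAL},
    Stage.USE: {Stage.USE, Stage.DISPOSAL},
    Stage.DISPOSAL: set(),
}


def advance(current, target):
    """Return the target stage if the move is allowed, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target


def derive_new_dataset(source_stage):
    """Analysis of data in use creates a new dataset that starts at CREATION."""
    if source_stage is not Stage.USE:
        raise ValueError("only data in use can produce derived data")
    return Stage.CREATION
```

Modeling derived data as a new dataset, rather than moving the source dataset back to creation, keeps the source and its analysis results as separate items with their own lifecycles.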


Information Stores
Data is always stored in some place and format
A data store indicates the place in which data is stored
A relational database (e.g. Oracle, SQL Server) is a type of store
A network share is another type of store

A data format indicates the internal structure or encoding of data within the store
A pdf file is a type of format
A table in a database defines its own format


Information Formats
Once data is within a specific format, that format will govern how it may be used
Images (e.g. jpg) can be displayed, but data within them (e.g. text) is lost
Documents (e.g. pdf) can be read and indexed for searching, but numeric data within them (e.g. tables) is lost for further use
Databases allow data to be transformed into other formats as needed
Unless the database contains pre-formatted content (e.g. a pdf file stored in Oracle)

Data format is critical for data exchange and understanding within computer programs
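As an illustration of a database being transformed into other formats as needed, the sketch below reads rows from a table and emits them as CSV and as JSON. It uses an in-memory SQLite database only to stay self-contained; the `reading` table, its columns, and the sample values are hypothetical and are not taken from the deck.

```python
import csv
import io
import json
import sqlite3

# Hypothetical table and sample rows, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (site TEXT, parameter TEXT, value REAL)")
conn.executemany(
    "INSERT INTO reading VALUES (?, ?, ?)",
    [("SITE-A", "pH", 7.2), ("SITE-A", "temperature_c", 11.5)],
)


def fetch_rows(connection):
    """Return column names and all rows of the reading table."""
    cur = connection.execute(
        "SELECT site, parameter, value FROM reading ORDER BY rowid"
    )
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


def to_csv(columns, rows):
    """Render the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buf.getvalue()


def to_json(columns, rows):
    """Render the rows as a JSON array of objects."""
    return json.dumps([dict(zip(columns, row)) for row in rows])


columns, rows = fetch_rows(conn)
csv_text = to_csv(columns, rows)
json_text = to_json(columns, rows)
```

The same rows feed both outputs, which is the property a pdf stored as a blob inside the database would not have.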



Information Flows
Each of the stages of information will tie to a different human workflow

Actual human workflow should be based upon human need

The technical needs of supporting next-stage data use are secondary
There are no easy buttons; every workflow will take some support



Work Flows
Each work area or topic (e.g. water quality, fish counts) will require its own work flow governing data management
Several work areas may end up using the same work flow, but that should be circumstantial rather than planned

Each information phase will have a sub-workflow for a given topic

Water quality will have a master workflow with a separate sub-flow for:
Collection
QA/QC (includes loading to databases)

Use and analysis (getting data out of databases)

Disposal (mostly rules on how long to keep data in the database)


Work Flows




For any new project, planning must occur to determine what is to be collected
For each dataset to be collected, there must be a data standard produced for handling that type of data
Data standards should be common across all projects for a given data type
Data stores may need to be created to support each data type



Planning Phase



If data standardization is needed, the process involves several aspects:
Identify existing standards
US Federal / US DoD
Industry / International

Identify existing formats

COTS tool formats (e.g. Microsoft Word)
Non-COTS tool formats (e.g. DSS)

Model data
Evaluate existing related data

The resulting standards and model become the norm for the organization
Should be considered a mostly one-time cost


Standardization Phase



Creation / Collection
Data gets created in several ways:
Field collection
Real-time telemetry (e.g. SCADA)
Analysis results
Report generation

Each form of data creation may need a workflow
Field collection is of primary concern due to two factors:
Human involvement and potential for mistake / blunder
Time component (data re-collected is time shifted)



Creation Phase



Processing / QA/QC
Once created, most data must be evaluated for quality and correctness
If data is not acceptable, there must be a rejection capability

Accepted data is processed, transformed and loaded into the final information store(s)
This may be a manual or automated process
COTS tools may be ideal for this (e.g. Aquarius for water quality)

Each domain of data will be treated differently
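The accept / reject step described above can be sketched as a review function that splits incoming readings into accepted and rejected sets before anything is loaded. This is an illustrative sketch only: the `Reading` class, the `LIMITS` ranges, and the rejection reasons are assumptions, and real QA/QC rules would be defined per data domain (or handled by a COTS tool).

```python
from dataclasses import dataclass


@dataclass
class Reading:
    parameter: str
    value: float


# Hypothetical acceptance ranges; real limits would come from the data standard.
LIMITS = {
    "pH": (0.0, 14.0),
    "temperature_c": (-5.0, 40.0),
}


def review(readings):
    """Split readings into accepted ones and (reading, reason) rejections."""
    accepted, rejected = [], []
    for reading in readings:
        limits = LIMITS.get(reading.parameter)
        if limits is None:
            rejected.append((reading, "no rule for parameter"))
        elif not (limits[0] <= reading.value <= limits[1]):
            rejected.append((reading, "value out of range"))
        else:
            accepted.append(reading)
    return accepted, rejected


def load(accepted, store):
    """Append accepted readings to the final store (a plain list here)."""
    store.extend(accepted)
    return len(accepted)
```

Keeping the reason alongside each rejected reading gives the collector something actionable, which matters because re-collected field data is time shifted.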



Processing Phase



Use and Analysis

Final data is used in various ways for simple display and for generating additional value-added data
Each form of use that results in the creation of a data product is a data use
Analysis (model runs)
Reports (synthesized from human review of data)

Results are then treated as newly created data back in the creation phase



Use Phase



Use and Reuse Cycle

Output of analysis is input to the creation phase
Forms a closed-loop cycle

Relations exist
Source - Output
Source - Source



Implementing a data strategy is an ongoing process
These cycles will be developed in concert with the data producers and users
Tools will be bought / built as needed to facilitate effective information management
There will be several implementation efforts that will span projects


All field data is collected at a geographic location
If a given location is well-known and used repeatedly, the management of that location provides value
A site is a name that represents a location where sampling may take place
All data collected at a specific site can be related back to the site at which it was collected
Querying the site will yield the data collected



While sites are intuitively a spatial location, locations do not necessarily need to be stored for the site to be useful
If, however, the site location is stored (e.g. GIS point):
Querying by location will yield all sites in that location
Querying by basin (with the basin stored spatially) will return all sites within that basin

In addition to the spatial nature of the site itself, a site boundary can be stored indicating the uncertainty of collections
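A query by basin amounts to a point-in-polygon test over the stored site locations. The sketch below uses a plain ray-casting test on planar coordinates; it is an illustration under simplifying assumptions (point sites only, a basin given as a simple polygon, no map projection handling), and the function names and sample data are hypothetical. A GIS or spatial database would normally provide this operation.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices; points exactly on an edge are
    not handled specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sites_in_basin(sites, basin):
    """Return names of sites whose stored location falls inside the basin.

    sites maps a site name to an (x, y) tuple, or to None when no
    location is stored for that site.
    """
    return [
        name
        for name, location in sites.items()
        if location is not None and point_in_polygon(location[0], location[1], basin)
    ]
```

Sites without a stored location are simply skipped by the spatial query, which matches the point that a site remains useful by name even when no location is recorded.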



A site will be defined as a named place where some form of collection or sampling may be performed
A site may have a spatial location (GIS shape) associated with it
Support for points, lines (transect) and areas (netting area)
A second spatial location is allowed (area only) for sampling approximation

Sampling events are associated with sites

One site will support any number of events
Multiple types of events (e.g. water quality) may occur at a single site


Sampling Events
Any activity of collecting data is a sampling event
A sampling event that occurs at a defined site may be entered and associated with that site

The organization that performs the sampling is associated with the event (e.g. contractor company)
The project for which the sampling is being conducted (the paying project) is associated with the event
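The relationship between sites and sampling events described above can be sketched with two small record types. This is a hypothetical model for illustration: the class names, field names, and the `collected_data` method are assumptions and do not represent the data model being developed.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class SamplingEvent:
    event_type: str                        # e.g. "water quality"
    performed_by: str                      # organization doing the sampling
    funded_by: str                         # project paying for the sampling
    scheduled_for: Optional[date] = None   # an event may exist before its data
    data: list = field(default_factory=list)


@dataclass
class Site:
    name: str
    location: Optional[tuple] = None       # optional; a site is useful without it
    events: list = field(default_factory=list)

    def add_event(self, event):
        """Associate a sampling event with this site and return it."""
        self.events.append(event)
        return event

    def collected_data(self, event_type=None):
        """Return all data collected at this site, optionally by event type."""
        rows = []
        for event in self.events:
            if event_type is None or event.event_type == event_type:
                rows.extend(event.data)
        return rows
```

Because an event starts with an empty data list, it can be created at scheduling time and filled in after the field visit.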



Any organized work effort may be a project
All formal work projects are projects
Projects can be nested (sub-projects)

There are two classes of project

Project, an official work project
Work Effort, finer-grained effort within a project (task, SOW, etc.)

Work efforts can be nested, as can projects

Work efforts can be under a project or stand-alone

Projects cannot be under work efforts
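The nesting rules above can be enforced at the point where an item is attached to its parent. The following is a minimal sketch, assuming a single hypothetical `WorkItem` class with a `kind` field; the names are illustrative and not taken from the deck.

```python
class WorkItem:
    """A project or a work effort, optionally nested under a parent."""

    KINDS = ("project", "work_effort")

    def __init__(self, name, kind, parent=None):
        if kind not in self.KINDS:
            raise ValueError(f"unknown kind: {kind!r}")
        # Rule from the slide: projects cannot be under work efforts.
        if kind == "project" and parent is not None and parent.kind == "work_effort":
            raise ValueError("a project cannot be placed under a work effort")
        self.name = name
        self.kind = kind
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Return the names from the top-level item down to this one."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " / ".join(reversed(parts))
```

A work effort created with no parent is stand-alone, and a work effort under another work effort is allowed; only the project-under-work-effort case is rejected.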



An organization is a group of people working toward a common goal
Any named group is an organization

Just a formalization for tracking and grouping

Organizations will be managed to track project teams (external agencies) and personnel alignments



Contactable Party
Organizations and people can be contacted, and therefore have contact information (email, phone, address)
A contactable party will be defined as any of the below:
A person
An organization
A point of contact
A job role within an organization which may be filled by a person

Some other external thing that has contact information



Point of Contact
A point of contact is a simple abstraction of a job or position
Allows for a front-desk type of entity that is intermittently filled by various people
Each project has a default point of contact
This allows the actual person filling the role to change more easily
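One way to picture the contactable party and point of contact concepts is as a small class hierarchy in which the point of contact is a role that refers to whoever currently fills it. This is a sketch under assumptions: the class names, fields, and the `current_email` fallback rule are hypothetical and not part of the deck.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContactableParty:
    name: str
    email: Optional[str] = None
    phone: Optional[str] = None
    address: Optional[str] = None


@dataclass
class Person(ContactableParty):
    pass


@dataclass
class Organization(ContactableParty):
    pass


@dataclass
class PointOfContact(ContactableParty):
    """A job role within an organization that a person may fill."""

    organization: Optional[Organization] = None
    filled_by: Optional[Person] = None

    def current_email(self):
        """Prefer the role's own address, else that of the person filling it."""
        if self.email:
            return self.email
        return self.filled_by.email if self.filled_by else None
```

A project that refers to the `PointOfContact` rather than to a `Person` does not need to change when the role is reassigned; only `filled_by` is updated.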



Data Catalog
There is a current effort to build a card catalog for data within the district
The previous slides provide data elements that will be used in the data catalog and as a mechanism for mining all data across the organization
The data catalog will become the inventory of data with links to the actual data cataloged



Current Model



The data catalog concept is still notional at this time

A data model for each of the items in the previous slide is being developed
Once modeled, these data elements may be collected, even without a tool in place for the data
Implementation of the tools will be based upon a prioritization
Need for capability
Cost to develop

Time to develop
Dependency on other capability



Water Quality


Water quality data is commonly collected across many projects
Collections are commonly performed by contractors
Collections use several types of collection methods
Fixed telemetry
Fixed time series (e.g. continual hydrolab)
Grab series (e.g. one-time hydrolab)
Grab instantaneous (e.g. handheld probe)

Data elements collected vary by collection

Temperature, TDG, DO, pH, Color, Depth, etc.



All collections are performed at some form of site
Instantaneous grab samples may not have well-known sites, but are still sampling events

Sampling events may be continuous such as telemetry and fixed time series
Multi-level samplings occur at a single site (sites have no Z axis)

Sampling events may be scheduled

Create the event, then later add the data


The water quality workflow will incorporate several aspects
Many forms of field collection activities
Many forms of data submission (telemetry)
QA/QC processes for evaluation
Aquarius tool integrated into data process
Multiple database insertions
Aquarius database
CWMS database


A partial flow for field collection activities (non-telemetry) has been developed



Currently Notional




What other data areas should be considered?

What other projects should be addressed?

How should external collection activities be managed?

