
Architectural Design and Best Practices Project

Final Report and Design Recommendations (A006.1)

Prepared for the Virginia Department of Education

February 28, 2011

Technical Point of Contact:

Louis McDonald | CIO/CTO

louis.mcdonald@cit.org | 703.689.3037

Administrative Point of Contact:

Pat Inman | Contract Manager

pat.inman@cit.org | 703.689.3037

A006.1 Deliverable – Final Report and Design Recommendations

Contents
Executive Summary
1 Introduction
1.1 Study Goal
1.2 Project Deliverables
2 Research Process
3 Key Messages
3.1 Stakeholder Management
3.2 Federated Systems Perform Poorly
3.3 Data Governance
3.4 Leveraging Existing Systems
3.5 Use of Commercial Systems
3.6 Multiple Hash Keys
3.7 Requirements and System Architecture
3.8 Clearly Defined Security Policies
4 Architecture Best Practice Case Studies
4.1 Indiana Department of Education
4.2 Iowa Department of Education
4.3 Army Suicide Mitigation Project
4.4 Texas Education Agency
4.5 DLA Data Convergence and Quality Project
4.6 NORC at the University of Chicago Data Enclave
5 Subject Matter Expert Interviews
5.1 Dr. Bhavani Thuraisingham
5.2 Paul Carney
5.3 James Campbell
5.4 Susan Carter
5.5 Raj Ramesh
5.6 Ron Kleinman
5.7 Peter Dobler
5.8 Dr. Laura Haas
5.9 Dr. Thilini Ariyachandra
5.10 Dr. Cynthia Dwork
6 SLDS Architecture Overview
6.1 SLDS Seven Functional Components
6.1.1 Portal
6.1.2 Security
6.1.3 Workflow
6.1.4 Reporting
6.1.5 Lexicon
6.1.6 Shaker
6.1.7 Data
7 Physical Infrastructure
7.1 Development Environment
7.2 Test Environment
7.3 Production Environment
Appendix A: Secondary Architecture Best Practice Case Studies
A.1 Illinois State Board of Education
A.2 North Dakota Department of Public Instruction
A.3 Washington Education Research and Data Center
Appendix B: Best Practice Case Studies Interviewee List
B.1 Indiana Department of Education
B.2 Iowa Department of Education
B.3 Data Strategies – Army Suicide Mitigation Project
B.4 Texas Education Agency
B.5 Data Strategies – DLA Data Convergence and Quality Project
B.6 NORC Data Enclave
B.7 Illinois State Board of Education
B.8 North Dakota Department of Public Instruction
B.9 State of Washington Education Research & Data Center
Appendix C: Materials Sent to Best Practices Interviewees
C.1 Best Practices Interview Template
C.2 Architectural Best Practice, Design & Planning Support Project Overview
Appendix D: Materials Sent to Subject Matter Experts
D.1 Subject Matter Expert – Interview Template
D.2 Virginia Statewide Longitudinal Data System - Executive Summary
D.3 Virginia Statewide Longitudinal Data System - Usage


Executive Summary
The educational landscape has changed dramatically since the establishment of Statewide Longitudinal Data Systems (SLDS) throughout the United States. A grant program¹ funded by the U.S. Department of Education, as authorized by the Educational Technical Assistance Act of 2002, has helped to change many states’ K-12 data systems significantly and may, in fact, revolutionize the future management and utility of educational data. States that have implemented these data systems now have more accurate and robust data and an enhanced ability to access, analyze and utilize the data in a manner previously unavailable. These successes, however, have been achieved only through rigorous planning, meticulous design, and assiduous implementations.

In 2010, the Virginia Department of Education (VDOE) was awarded a federal multi-year grant to enhance its statewide data system, and launched the Virginia Longitudinal Data System (VLDS) project. The project team was charged with creating a system that will address the needs of all stakeholders; additionally, the team faced challenging Federal and Virginia security and privacy requirements. To meet these challenges, VDOE commissioned the Center for Innovative Technology (CIT) to identify key success factors that could provide guidance in the development of a fully secure and private, as well as more efficient, SLDS in Virginia.

In order to achieve this complex objective, the CIT project team investigated other large data integration projects in state education agencies, other governmental agencies and in other industries. The team examined published information from a variety of sources, and a gap analysis was performed on these reports and articles to identify missing information and critical areas where further research must be conducted. In addition, the team interviewed nine SLDS leaders and a number of industry leaders who managed large integration projects. These interviews provided the foundation for the compilation of best practices and key takeaways that were essential to the success of each of these projects.

Information collected during the research and analysis phase was analyzed to identify common themes, and these themes were organized into a set of best practices that were included in a final report. This process revealed several key elements² that were crucial to ensuring successful implementations. The primary themes that emerged included, among others, the necessity for detailed project planning and management (the importance of data governance and stakeholder management) as well as the need to conduct comprehensive research and planning before implementing the technology and creating the system architecture (the use of commercial solutions and leveraging existing systems).

¹ The Statewide Longitudinal Data Systems (SLDS) Grant Program
² The final report included challenges and obstacles to be avoided and provided recommendations for a number of preliminary action items. In addition, the report contained a supplement that included the complete reports with detailed findings for each interview and data project researched.



The CIT project team, along with VLDS subject matter experts, took the original conceptual SLDS architecture and incorporated results from the best practices research and subject matter expert interviews to develop an implementation architecture. The refined architecture (Section 6) expanded the details for functional components, security, reporting, and how data requests would be managed given known constraints. The team developed an understanding of the flow of information and of the workflows needed to support the different scenarios in the SLDS deployment.

All this information allowed the team to develop a physical infrastructure architecture (Section 7). This included the physical hardware, the location, and the functionality for the hardware. Following standard lifecycle development practices, three versions of the infrastructure are described: Development, Test, and Production.


1 Introduction
1.1 Study Goal
The goal of the Architectural Design and Best Practices Project was to provide the Virginia Department of Education (VDOE) with an up-to-date and relevant assessment of the best practices related to the design, development, deployment, and operation of a Statewide Longitudinal Data System (SLDS).

1.2 Project Deliverables

Deliverable ID   Description                                                            Delivery Date
A001             Monthly Status Reports                                                 Monthly
A002.1           List of all SMEs interested in participating in interview program      October 2010
A002.2           Interview Schedule and Interview Template                              October 2010
A002.3           Preparation and pre-reading material as required for Interview SMEs    October 2010
A002.4           Consolidated output from interviews                                    December 2010
A002.5           Template for Final Deliverable                                         November 2010
A003.1           Monthly Status Reports                                                 Monthly
A003.2           PMO Support, to include program documentation, i.e., Work plan,        Ongoing
                 Scope, Requirements, Schedule, Risk, and Change Management plans
                 as requested by the Program Office
A004             Architectural Best Practices Report                                    December 2010
A005.1           Workshop Agenda and Presentation material                              December 2010
A005.2           Summary Workshop Findings Report                                       January 2011
A006.1           Final Report & Design Recommendations                                  February 2011
F

2 Research Process

The CIT project team applied its established CIT Connect research and analysis process to execute this effort. This rigorous “best practice” methodology, which includes the identification and analysis of information provided by Subject Matter Experts and by comparable SLDS projects or large data integrations from the public and private sectors, provides high-confidence results. The method also includes analysis and consolidation of feedback received from the SLDS stakeholders.
Figure 1: CIT Connect Process
[Figure: five process steps: (1) Project and Requirements Definition; (2) Technology Identification and Research; (3) Technology Assessment and Interview Process; (4) Recommendation Development and Consolidation; (5) Feedback Integration and Report Finalization]

CIT Connect projects are performed under the control of a well-defined project management approach. This approach provides visibility into project status at all times via regular reviews, status reports, and interim deliverables. The CIT Connect Process, shown in Figure 1, provides a diverse, five-step approach to sourcing innovation, rigorous analysis of alternatives, and a structured engineering methodology for creating final recommendations. The use of this structured multi-step approach maximizes the likelihood of success, while reducing risk. Project execution is guided by a project plan developed and maintained by the CIT project management team.

For the VDOE Architectural Design and Best Practices Project, the process began with an analysis of the information and requirements provided by VDOE and the identification of the information and technology areas to be targeted for study. The second process step focused on initial resource sourcing by developing a list of candidate SLDS and large data integration projects and Subject Matter Experts in the information and technology areas defined for the project. This second step produced several deliverables focused on Subject Matter Expert interviews and best practices case studies that were presented to the Virginia Department of Education between October 2010 and December 2010.

The Architectural Best Practices Report (A004) focused on initial data sourcing, both to develop a list of candidate SLDS and large data integration projects to be analyzed and to collect published analyses, reports, and case studies on existing projects. The set of candidate projects, which was documented in Deliverable A002.2, was compiled by accessing CIT’s business network and by searching our data resources to identify candidate companies and organizations. This list was delivered to the Virginia Department of Education on October 31, 2010.

The Consolidated Output from Subject Matter Expert Interviews (A002.4) focused on initial resource sourcing by developing a list of candidate Subject Matter Experts in the information and technology areas defined for the project. CIT researchers reached out to CIT’s business network in addition to contacting these identified experts, who were discovered through literature searches. The set of candidate Subject Matter Experts, which was documented in Deliverable A002.1, was submitted to the Department of Education on October 31, 2010.

For both reports, CIT reviewed the sourced materials and developed a gap analysis of information requirements. The gap analysis was the basis for the creation of survey tools and an analysis framework, which then were used to guide the interview process and the assessment steps that followed. The gap analysis also guided the selection of leaders of projects targeted for direct interviews.³ The information priorities that emerged from the gap analysis were, therefore, used to classify and to prioritize interview candidates via a process based upon criteria that included domain relevance to VDOE, cost, complexity, technical and business maturity, and stakeholder considerations. The survey questions and targeted list of candidates for best practice interviews were provided to the Department of Education as Deliverable A002.2 on October 31, 2010. The targeted list of Subject Matter Experts (Deliverable A002.1), interview schedule, question template (Deliverable A002.2), and pre-reading material (Deliverable A002.3) were provided to the Department of Education on October 31, 2010.

The subsequent phases of the process focused on consulting leaders of similar state and commercial efforts; synthesizing, analyzing, and organizing their feedback; and presenting these best practices to stakeholders. Step three, the longest phase of the project, involved performing interviews and synthesizing and analyzing the information from both the Subject Matter Experts and Best Practices candidates. During this phase, CIT researchers conducted nine interviews with leaders of large-scale projects representing both the public and private sectors,⁴ and ten interviews with Subject Matter Experts.

³ All of the projects had potential relevance to the effort; however, it would have been both infeasible and duplicative to interview the leaders of all of the candidate projects, given this project’s short timeline.

In step four, the CIT Connect team organized and categorized key lessons learned from each of the interviews and highlighted common themes and unique insights. The CIT Connect team presented the best practice and Subject Matter Expert interview findings to VDOE stakeholders on December 13, 2010 and integrated feedback and guidance from the stakeholders to generate the reports A004 and A002.4.

Step five involved the presentation of the CIT project team’s implementation architecture and associated physical infrastructure to the SLDS stakeholders on January 27, 2011. Presentations were made by the CIT project team and VLDS subject matter experts. The purpose of this workshop was to formalize an agreed-upon architecture for the SLDS.

3 Key Messages
The three components of the VDOE Architectural Design and Best Practices project covered three different themes and areas of focus. The Best Practices interviews and analysis centered on the implementation and logistical processes involved in large-scale data integration projects. The Subject Matter Expert interviews focused on technical best practices for an LDS architectural design.

The following topics are the key messages and best practices that emerged from the Best Practices and Subject Matter Expert interviews.

Best Practices Interviews                  Subject Matter Experts Interviews
Stakeholder Management                     Federated Systems Perform Poorly
Data Governance                            Data Governance
Use of Commercial Solutions                Use of Commercial Solutions
Leveraging Existing Systems                Use of Multiple Hash Keys
Requirements Drive System Architecture     Clearly Defined Security Policies

3.1 Stakeholder Management
When embarking upon a systems integration project, numerous stakeholders play a part in the planning, development, implementation, and maintenance of the system. Knowing stakeholders’ requirements, expectations, and resources is essential to a project’s success.

During the Army Suicide Mitigation Project, Data Strategies discovered that managing the stakeholders became an overwhelming task when it came to obtaining memorandums of understanding (MOU) and data sharing agreements needed prior to the integration of a data source into the system. Data Strategies also found that clear communication between the project implementation team and the stakeholders, as well as communication among the stakeholders, was best facilitated by the project managers. The DLA Data Convergence and Quality Project managers needed to ensure accurate and timely communication of project status, feedback, and next steps; this created a foundation for a positive collaborative environment. This positive collaborative environment amongst the teams contributed to the overall success of the project.

⁴ A list of the organizations and companies with whom we spoke may be found in Appendix B.

The Indiana Department of Education attempted to gather stakeholder requirements through large, monthly meetings before discovering that meetings with individual stakeholder groups proved to be more effective and led to increased buy-in.

Lastly, as Illinois is currently in the design stage of its SLDS, the Illinois State Board of Education has hired a consulting firm to perform some stakeholder management. The consulting group is in the midst of gathering the technical and program information for each of the 13 data systems that will be integrated into the Illinois SLDS, and this information will have a direct impact on the SLDS’ final architecture.
(Army Suicide Mitigation, DLA, Illinois State Board of Education, Indiana Department of Education)

3.2 Federated Systems Perform Poorly
Federated systems suffer in performance more than a centralized database. Requirements and queries should be planned prior to building the system to maximize performance. Usually with a distributed database model, data converges into a warehouse for simpler analysis. A global schema is defined that allows for an easier convergence of the data. The federated model for the SLDS cannot allow for permanent convergence, nor is it likely that a global schema will be developed that encompasses all data sources. The disparate data and the network effects of distributing the data can reduce the overall performance of the federated architecture.
(Ariyachandra, Dobler, Haas)
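
The cost described above can be illustrated with a minimal sketch. It is not the SLDS design: the two in-memory SQLite databases, the table and column names, and the function `federated_enrolled_graduates` are all hypothetical stand-ins for two agencies that keep their own schemas. Each request queries every source and joins the results in memory, with nothing persisted, which is the per-query work a converged warehouse would avoid.

```python
import sqlite3

def make_source(ddl, insert_sql, rows):
    """Create an in-memory database standing in for one agency's data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn

# Hypothetical sources with different local schemas (no global schema exists).
k12 = make_source(
    "CREATE TABLE students (link_id TEXT, grad_year INTEGER)",
    "INSERT INTO students VALUES (?, ?)",
    [("a1", 2008), ("b2", 2009), ("c3", 2009)],
)
higher_ed = make_source(
    "CREATE TABLE enrollment (person_key TEXT, institution TEXT)",
    "INSERT INTO enrollment VALUES (?, ?)",
    [("a1", "State U"), ("c3", "Tech College")],
)

def federated_enrolled_graduates(grad_year):
    """Answer one request by querying each source at request time.

    Every call pays one query per source plus an in-memory join;
    the combined rows are never stored (no permanent convergence).
    """
    grads = [row[0] for row in k12.execute(
        "SELECT link_id FROM students WHERE grad_year = ?", (grad_year,))]
    enrolled = dict(higher_ed.execute(
        "SELECT person_key, institution FROM enrollment"))
    # Join on the linking key, mapping each local schema to a common shape.
    return sorted((g, enrolled[g]) for g in grads if g in enrolled)
```

In a real deployment the two queries would cross a network, so planning the expected requests in advance, as recommended above, determines how much data each source must return per call.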

3.3 Data Governance
The importance of data governance was a common message throughout the course of this effort. Data governance often is viewed as a large initial effort for many data integration projects. For systems that continue to expand and to add data sources, however, it will be an ongoing effort, one that, our interviewees noted, is often underestimated. In most cases, an SLDS effort must accommodate a number of disparate stakeholders and sources and, thus, requires a higher than normal level of effort to identify data ownership and oversight to ensure its accuracy and security. The fact that each data source will have its own data governance creates an additional layer of complexity when attempting to create and manage data governance. Prior to implementing a statewide system, it is critical for the stakeholders to agree on who owns the data in the system, who will oversee and maintain the system and who will approve output and requests for access.

The North Dakota Department of Public Instruction and the Washington Research and Data Center are two projects that are in the early stages of building their data integration systems. Both of these projects have had difficulties moving forward with the implementation of the system because of stakeholders’ inability to agree upon the rules of the LDS’ data governance.

The Army Suicide Mitigation and the DLA Data Convergence and Quality Projects stressed the importance of data governance during the early planning stages prior to implementation and continue to emphasize these elements as data sources are added. With the addition of a new data source, the project managers must understand the new source’s governance and how its integration will affect the LDS’ overarching governance.

Appropriate data governance, particularly in a federated model, is key to ensuring that the data, the linkages of data, and performance of the system are optimized. Once the SLDS architecture is implemented and operational, it is important to monitor the types of queries executing in the system. This monitoring allows for tuning of the system to improve its performance, and to understand how the security model is enforcing the rules. It is not uncommon that rules will need to be tweaked as data governance continues to evolve.
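
One way to picture this kind of monitoring is a small query-log summary. The sketch below is an assumption for illustration only: the `QueryEvent` record, its fields, and the `summarize` function are hypothetical and are not part of the SLDS design. It groups logged queries by type and reports how many ran, how many the security rules denied, and the mean execution time of those that ran.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class QueryEvent:
    query_type: str   # hypothetical label, e.g. "cohort_report"
    seconds: float    # execution time of the query
    allowed: bool     # whether the security rules permitted the query

def summarize(events):
    """Group logged queries by type; report count, denials, and mean run time."""
    buckets = defaultdict(list)
    for event in events:
        buckets[event.query_type].append(event)
    summary = {}
    for query_type, group in buckets.items():
        ran = [e.seconds for e in group if e.allowed]
        summary[query_type] = {
            "count": len(group),
            "denied": sum(not e.allowed for e in group),
            # Mean is taken over permitted queries only; None if none ran.
            "mean_seconds": sum(ran) / len(ran) if ran else None,
        }
    return summary

log = [
    QueryEvent("cohort_report", 2.0, True),
    QueryEvent("cohort_report", 4.0, True),
    QueryEvent("record_lookup", 0.5, False),
]
report = summarize(log)
```

A query type with a high mean time is a candidate for tuning, while an unexpected number of denials may indicate a governance rule that needs review.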

Although the Virginia SLDS’ stakeholders have a strong understanding of the system’s baseline data governance, a number of factors will influence the need for upfront and ongoing efforts. The Virginia SLDS will add future sources; this will require a rigorous upfront effort to minimize rework and redesign and will necessitate ongoing efforts when changes are made to the existing data sources or when new data sources are added.
(Army Suicide Mitigation, DLA, North Dakota SLDS, Texas Education Agency, Washington Research and Data Center)

3.4 Leveraging Existing Systems
Some SLDS projects were able to leverage existing systems to become the foundation of the SLDS system; this saved time and resources during the design and implementation stages of the project.

Indiana was able to leverage an existing system, Learning Connection, and expand on its capabilities. Initially, Indiana had not planned to expand its Learning Connection portal, as it was built primarily for teacher networking and was not intended to be a working data system for other stakeholders. However, due to political conflicts, Learning Connection evolved into such a system. Indiana’s LDS project was begun by the previous administration, and the state’s new leadership originally planned to eliminate Learning Connection. The Indiana Department of Education, however, argued that starting over with a new data system for K-12 would not be cost-effective. In the end, Learning Connection was modified to be used as a collaboration site and as a K-12 data management system.

North Dakota also was able to leverage an existing data warehouse to avoid “reinventing the wheel.”⁵ After surveying what systems existed in the state, the North Dakota SLDS team discovered that their legacy K-12 system had the technical capabilities to form the LDS foundation. This K-12 warehouse will be expanded into an LDS and will collect information from other agencies. By building out the K-12 data warehouse into an LDS, North Dakota’s team saved time and money in the project, which will enable them to focus on other technical and non-technical issues (such as linkages between other data systems).
(Indiana Department of Education, North Dakota Department of Public Instruction)

3.5 Use of Commercial Systems
Our interviews revealed both positive and negative consequences of using commercial
off-the-shelf solutions; these commercial solutions can be a benefit, saving agencies the time of
building their own solutions, but they can also limit the versatility and expandability of the
system.

[5] Korsmo, T. (2010, October 26). Telephone interview with Rona Jobe.


The Indiana Department of Education began their project with an Oracle platform for their data
warehouse, but eventually switched to a SQL platform. After a year’s effort, the project staff
realized that Oracle was not meeting their needs, was overly complicated, was not user-friendly,
and was extremely expensive. The team restarted with a new solution and had to perform rework
because of the commercial solution they initially chose; however, since their move to the SQL
platform, the SLDS has progressed rapidly and has performed well.

The Iowa Department of Education purchased an off-the-shelf data model for their SLDS. This
model was adopted prior to Iowa receiving the SLDS award, when the system was focused on the
K-12 space. After deciding to expand their efforts to P-16, the SLDS team found that the data
model they had purchased did not work as an effective model for the higher education data
within the state. In order to integrate the higher education data, the Iowa team is investigating
whether to purchase additional commercial data models or to build their own custom data
models (both of which will require additional financial and man-hour resources).

The Texas Education Agency purchased commercial off-the-shelf solutions in order to develop a
public-facing Web site where users can access the data from the SLDS. The Texas SLDS team
found that commercial software provided adequate tools that allowed them to maintain the Web
site, while minimizing maintenance resources.

There are few off-the-shelf solutions that are able to perform data integration on-the-fly in
federated databases, but the space continues to grow. Major database vendors, e.g., IBM and
Oracle, have federated database management systems that are able to assist with the integration
requirement.
(Indiana Department of Education, Iowa Department of Education, Texas Education Agency,
Ariyachandra, Haas, Ramesh)

3.6 Multiple Hash Keys
One-way hashing of personally identifiable information (PII) was discussed as a method for
protecting an individual’s identity. Subject Matter Experts mentioned that using various data to
create hash keys can provide a number of options for greater record matching. Techniques can
include combining multiple values into a single hash, or creating multiple hashes that can be
used for comparison.
(Carney, Carter, Dobler, Kleinman)
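
The two techniques named above can be sketched as follows. This is only a minimal illustration, not a method described by the interviewees: the field names, the two key definitions, the normalization step, and the use of a keyed hash (HMAC-SHA-256 with a placeholder secret) are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical secret shared by the agencies that need to produce matching keys.
SECRET_KEY = b"replace-with-a-managed-secret"


def normalize(value: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not change the hash."""
    return " ".join(value.strip().lower().split())


def hash_fields(*values: str) -> str:
    """Combine several values into a single one-way (keyed) hash."""
    message = "|".join(normalize(v) for v in values).encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def match_keys(record: dict) -> dict:
    """Create multiple hashes per record; each one is a separate chance to match."""
    return {
        "name_dob": hash_fields(record["first"], record["last"], record["dob"]),
        "last_dob_zip": hash_fields(record["last"], record["dob"], record["zip"]),
    }


def shared_keys(a: dict, b: dict) -> list:
    """Return the names of the hash keys on which two records agree."""
    keys_a, keys_b = match_keys(a), match_keys(b)
    return [name for name in keys_a if keys_a[name] == keys_b[name]]
```

With several keys per record, two records whose first names are spelled differently can still be compared on the key that omits the first name; how many agreeing keys count as a match is a policy decision for the data governance group.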

3.7 Requirements and System Architecture
Reporting and usage requirements should determine the type of architecture to be built, and
identifying these elements of the system early in the design and development phases will save
time and money. Data warehousing specialists at Claraview[6] emphasized during an interview
that knowing how the system should perform and what functions will be required will drive the
architecture of the system. In essence, the architecture of an LDS should be determined largely by
what an agency wants it to do. The Claraview team cautioned that, as they have witnessed with
other state departments of education, failure to identify and address system and stakeholder
needs adequately will result in a failed or less than optimal LDS.

[6] Claraview is a business intelligence and data warehousing consulting organization. See
http://www.claraview.com/dnn/


The Indiana Department of Education team concurred that design should be dependent upon
functions, or how the agency plans to use the system. They reiterated that since all states have
differing reporting requirements and needs, no one design solution will be appropriate for all.
This requirement should be closely aligned with identifying stakeholder needs.
(Indiana Department of Education)

3.8 Clearly Defined Security Policies
Instituting security policies for the protection of data to prevent the possible identification of a
person is critical to the success of the system. SMEs stated that security policies and measures
need to be defined clearly. Security policies, in combination with the database security, can
maximize the protection of sensitive data. To rely only on database security tools would be
short-sighted. It is important to review all aspects of security, including operating system
hardening practices and network device configurations.
(Dwork, Kleinman)


4 Architecture Best Practice Case Studies
The goal of the Architectural Design and Best Practices Project was to provide the VLDS team
with an up-to-date and relevant assessment of the best practices related to the design,
development, deployment, and operation of a Statewide Longitudinal Data System.

The Center for Innovative Technology (CIT) was commissioned by VDOE to conduct research on
similar longitudinal database development efforts or large data integrations across disparate
organizations. Additionally, CIT was tasked to produce best practice recommendations that
would include the identification of risks and impediments in building an LDS. Based on the
information collected from the nine individual case studies, the project team consolidated
themes, lessons learned and best practices.

4.1 Indiana Department of Education
State/Agency: Indiana Department of Education
Web Site:     http://www.doe.in.gov/data/
Address:      151 West Ohio Street
              Indianapolis, Indiana 46204
POC:          Molly Chamberlin
              Director of Data Analysis Collection and Reporting
POC Phone:    317-234-6849
POC Email:    mchamber@doe.in.gov

Case Profile

Student Enrollment: 1,046,147 [7]
Teachers: 62,668 [8]
LDS Grant: $5,188,260 [9]

Background
In 2007, the Indiana Department of Education (IDOE) was awarded approximately $5.2 million
to create a comprehensive P-20 data system. For its LDS, IDOE envisioned a system that would
“allow data integration at all levels and would enable stakeholders to track and to analyze
student achievement and attainment from early childhood through higher education and
beyond.”[10] The main objectives of Indiana’s LDS were to improve data quality; provide
longitudinally linked data to be used to drive policy decisions; and make the data user-friendly
for teachers, principals, superintendents, and other stakeholders.

[7] State educational data profiles. (n.d.). Retrieved from
http://nces.ed.gov/programs/stateprofiles/sresult.asp?mode=short&s1=18
[8] Ibid.
[9] Statewide longitudinal data system grant program grantee state Indiana. (n.d.). Retrieved from
http://nces.ed.gov/programs/slds/state.asp?stateabbr=IN
[10] Indiana P-20 Comprehensive Data System. (n.d.). Retrieved from
http://nces.ed.gov/programs/slds/pdf/Indianaabstract.pdf

The project involved many different stakeholders, primarily: the Local Education Agencies, IDOE
data warehouse, internal IDOE staff, Department of Workforce Development, and higher
education (e.g., Ivy Tech, Indiana’s statewide community college network). Indirectly, the
project’s stakeholders and consumers were policy makers, legislators, parents and students.

Initially, IDOE held monthly meetings for the LDS stakeholders. These stakeholders included
representatives from every major division in the IDOE, special education, language minority, Title
1, Curriculum and Instruction, Data Reporting, Student Services, Technology, 40 fellows from
different school systems across the state, state administrators, etc. Because of the large size of the
group, the meetings became “too involved and unproductive.”[11] An outside evaluator suggested
performing a series of interviews with the individual groups instead of holding large stakeholder
meetings.

IDOE conducted interviews with each of the stakeholder groups and asked about their vision for
an LDS in terms of functionality and design. During this process, IDOE acted as intermediary and
a champion for each group. As a result, not only was IDOE able to gather pertinent input from
each stakeholder group, but it also obtained buy-in from the stakeholders. IDOE synthesized the
feedback data and created a small functional committee that assisted in the day-to-day decisions
of building and designing the LDS.

Key Takeaway

Input from stakeholders is essential to designing a longitudinal data system. The requirements
set forth by the stakeholders help determine the architecture and functionality of the final
system.

The IDOE team faced a number of obstacles while building Learning Connection and the data
warehouse.[12] First, the team found that their original platform, Oracle, was expensive and
time-consuming to learn. Once the platform was switched to SQL, however, the project progressed
smoothly. Another difficulty was the change in Indiana’s administration. Once the new
administration took office, the LDS team was forced to defend the need for an LDS and present an
overview of what a data warehouse is, how it should function, and what had been done until that
point. In spite of IDOE’s presentations, the new administration was still uncertain about what to
do with an LDS system, particularly Learning Connection.[13] After several discussions, the IDOE
LDS team persuaded the administration to expand Learning Connection to become an LDS tool.
The new administration responded and requested additional changes to the evaluation
process and a switch from Oracle to SQL. After negotiations, the LDS team was able to retain their
evaluation system, but replaced their Oracle platform with SQL. After securing the new
administration’s consent and approval, along with the agreed-upon changes, the linkage of the
two systems, Learning Connection and the warehouse, was relatively short and seamless.
IDOE also created a help desk for Learning Connection and for its public reporting
system, IDOE Compass; both help desks have email addresses to which users can submit
questions.

[11] Chamberlin, M. (2010, October 25). Telephone interview with Rona Jobe, CIT.
[12] Learning Connection was built in approximately 18 months, while the warehouse was built in
15 months.
[13] The original Learning Connection was an interactive networking site, and the new
administration wanted to close it because they did not see its value.

System Design and Architecture
Indiana’s LDS is an amalgamation of multiple systems. It is comprised of three main data
warehouses and portals: Learning Connection, IWIS (Indiana Workforce Intelligence System),
and the IDOE Data Warehouse. Additionally, a public site, IDOE Compass, shares aggregate
reports and data. The IDOE Compass system accesses copied tables and rolled-up data from the
IDOE Warehouse, which are shared with the public. Available data sets include the number of
students and teachers in Indiana as a whole and in certain districts. Access to certain data sets is
restricted to authorized users.

IDOE Data Warehouse
The IDOE Data Warehouse is an internal enterprise data warehouse that is built on an SQL
platform. The warehouse project commenced two years ago and initially employed an Oracle
platform and Oracle Business Intelligence tools. This architecture was chosen because of the
recommendations of the Indiana Office of Technology, prior to the new administration. After a
year, IDOE reevaluated the system’s performance and concluded that the Oracle tools did not
meet their needs. According to the LDS staff, the Oracle tools were “not user friendly,” were
“overly complicated [and] extremely expensive.” Therefore, the team investigated other solutions.
One product they considered as a reporting platform was SharePoint, but they found the
software too expensive.

When the new administration took office, the Data Analysis, Collection, and Reporting Office
briefed the new CIO on the problems they had encountered in building the data warehouse and,
under his guidance, the data warehouse was moved to the SQL platform. After the conversion to
the SQL platform, IDOE was able to create data marts in-house. They currently are using
Microsoft SSRS and SSAS and, thus far, have not encountered the problems they experienced
with Oracle and noted that these tools are “very, very easy to use.”[14]

Learning Connection
Originally, Learning Connection was built as an interactive site for Indiana teachers to exchange
information on lesson plans, techniques, and resources, similar to a social-networking site.
Throughout the course of Indiana’s LDS project, Learning Connection evolved into a learning
management tool that provides data to stakeholders at the local level. In essence, Learning
Connection is a portal where teachers and administrators can access standards-based activities,
share lesson plans, and communicate with other teachers. Learning Connection also
allows teachers to access their current students’ longitudinal data. Currently, the system can run
simple reports, but the IDOE team is working on expanding it to have more complex reporting
capabilities. Learning Connection also interacts with the Data Warehouse by pulling “copied”
data from the warehouse. Once the data has been cleaned, checked, verified, and loaded into the
warehouse, Learning Connection retrieves reports directly from the warehouse.

[14] Ibid.

IWIS
The Indiana Workforce Intelligence System (IWIS) is a separate data warehouse that is linked to
K-12 and post-secondary data. According to the Indiana Workforce Development website, IWIS
“began by integrating disparate data sets from within the Department of Workforce
Development to then integrating [the] resulting new data with information from the Commission
for Higher Education.”[15] This data warehouse is outside of the IDOE and is run by Indiana’s
Department of Workforce Development. IWIS was added to the LDS project after Learning
Connection and the IDOE Data Warehouse. Originally, the Indiana SLDS team planned to
populate the IDOE Data Warehouse with linked data. However, after discovering that the
Department of Workforce Development and members of the higher education community had
systems of their own, the SLDS team decided to link them with Indiana’s LDS.

Theoretically, IWIS also will pull its data from the data warehouse; however, IDOE currently is
struggling with acquiring access to data from the Department of Workforce Development,
although Ms. Chamberlin did not elaborate on this point.

Security
Information within the Data Warehouse is identifiable; however, when other systems pull
information from the warehouse, the warehouse creates a set of tables from the identified data
that is de-identified and aggregated. In essence, systems do not actually access the source data
directly. For example, Learning Connection only accesses tables that have been created and
copied from the warehouse. Additionally, the warehouse and Learning Connection utilize
role-based permissions; educators in Learning Connection have access only to their current
students, and administrators have access only to the students currently attending their schools.
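
The pattern described in this paragraph can be illustrated with a short sketch. It is not IDOE’s implementation; the record layout, the role names, and the function names are hypothetical, and the data is made up. The sketch shows (a) rolling identified rows up into a copied aggregate table that carries no names or student IDs, and (b) a role-based filter that limits which student rows a user may see.

```python
from collections import defaultdict

# Made-up identified source rows, standing in for warehouse source data.
SOURCE_RECORDS = [
    {"student_id": "S1", "name": "A", "school": "North", "teacher": "t1", "passed": True},
    {"student_id": "S2", "name": "B", "school": "North", "teacher": "t1", "passed": False},
    {"student_id": "S3", "name": "C", "school": "South", "teacher": "t2", "passed": True},
]


def build_aggregate_table(records):
    """Copy the source into per-school counts; names and student IDs are dropped."""
    totals = defaultdict(lambda: {"students": 0, "passed": 0})
    for row in records:
        cell = totals[row["school"]]
        cell["students"] += 1
        cell["passed"] += int(row["passed"])
    return {school: dict(cell) for school, cell in totals.items()}


def visible_students(records, user):
    """Role-based filter: teachers see their own students, administrators their school."""
    if user["role"] == "teacher":
        return [r["student_id"] for r in records if r["teacher"] == user["id"]]
    if user["role"] == "administrator":
        return [r["student_id"] for r in records if r["school"] == user["school"]]
    return []  # any other role sees no student-level rows
```

Downstream systems would read only the output of `build_aggregate_table`, never `SOURCE_RECORDS` itself, which mirrors the statement that systems do not access the source data directly.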

Moreover, the source data in the warehouse can be accessed only by certain IDOE personnel who
have appropriate permissions (approximately four people). Ms. Chamberlin declined to disclose
what other security measures have been implemented.

Data Usage and Reporting
The warehouse houses five years of data, which represents approximately one million public
school students’ records and 65,000 non-public school students’ records. Assessment data are
generated once a year; this includes enrollment and other data required to generate the state
report card and reports to the federal government.

The public may view prepared aggregated data sets by school and by district, as well as public
reports through the Compass data site. The system accesses copied tables (rolled up data) to
generate these reports and aggregated data.[16] Certain data from the IDOE Compass site are
accessible only to registered users. However, other unidentified, aggregate data sets that are not
readily available through the site can be requested. Depending upon the size of the request, this
data usually can be provided within 2 to 14 business days.
(http://compass.doe.in.gov/Dashboard.aspx?view=STATE&val=0&desc=STATE)

[15] Chamberlin, M. (2010, October 25). Telephone interview with Rona Jobe, CIT.
[16] The system is not accessing de-identified data, but rather aggregated data.

Researchers also may submit requests for large data sets. IDOE has a number of “canned reports”
for researchers (e.g., enrollment by school in the last five years). However, if researchers request
student-level data that is de-identified, the request is processed by the legal department and the
researcher must sign a data sharing agreement. Once the legal department approves the
researcher’s request, s/he may use the online data request system for IDOE Compass. The request
is entered into a queue and IDOE personnel retrieve and review the de-identified data before it is
released.

Currently, the IDOE team is expanding the system’s reporting capabilities. They note that had
they taken into account the types of reports and departmental requirements from the LDS in the
beginning, building and expanding the system would have been easier.

Lessons Learned
Throughout Indiana’s LDS project, the LDS team found practices that have helped along the way.
First, IDOE discovered the efficacy of building upon legacy systems like Learning Connection
because it saved time and money. IDOE also found that prohibiting other linked systems from
accessing source data enhances security and preserves a consistent “true” record. Data cleaning is
imperative. Lastly, achieving stakeholder buy-in and gaining feedback is important in building a
system. Interviewing different stakeholders and representatives individually provided IDOE
substantive information on stakeholder requirements (e.g., types of reports and how the system
should perform). Furthermore, IDOE found that gaining stakeholder buy-in is also important for
a smooth LDS implementation.

Key Takeaway

Building on legacy systems saves time and money, e.g., turning Learning Connection, initially a
teacher networking system, into a data management tool rather than eliminating the system and
starting over.


4.2 Iowa Department of Education
State/Agency: Iowa Department of Education
Web Site:
http://www.iowa.gov/educate/index.php?option=com_content&view=article&id=1691:edinsight&catid=445:data-collections&Itemid=2490
Address:      400 E 14th St
              Des Moines, Iowa 50319
POC:          Jay Pennington
              Bureau Chief
POC Phone:    515-281-4837
POC Email:    jay.pennington@iowa.gov

Case Profile

Student Enrollment: 487,559 [17]
Teachers: 35,961 [18]
LDS Grant: $8,777,459 [19]

Background
In 2008, Iowa initiated a project to create EdInsight, the Iowa Department of Education’s (IDE)
K-12 centralized data warehouse. EdInsight integrated seven years of historical data from Project
EASIER (student level enrollment and curriculum data), IMS (special education data), and the
Iowa Testing Program (student assessment data). The initial budget for the project was $1.2
million, and the project had a total implementation cost of $2.9 million through FY2009. In May
2009, the project was funded by an $8.78 million SLDS grant which would be used to increase the
scope and functionality of EdInsight to be interoperable with postsecondary data systems or to
create a consolidated P-16 data system. The LDS team plans to add additional sources of
information such as teacher, financial, transcript, workforce, disaster mitigation, and additional
assessment data. EdInsight is still in its statewide rollout phase.

System Design and Architecture
IDE decided to use a commercial off-the-shelf (COTS) data model for the EdInsight project
because this particular data model was designed specifically for use in the K-12 space. During
the design process, the IDE team discovered that some of the data within the system did not fit
the COTS model, particularly the post-secondary data. Eventually, however, this post-secondary
data was integrated into EdInsight.

[17] State educational data profiles. (n.d.). Retrieved from
http://nces.ed.gov/programs/stateprofiles/sresult.asp?mode=short&s1=19
[18] Ibid.
[19] Statewide longitudinal data system grant program grantee state Iowa. (n.d.). Retrieved from
http://nces.ed.gov/programs/slds/state.asp?stateabbr=IA


Security
Security is managed through the use of role-based access and training.

Data Usage and Reporting
EdInsight’s data is used to conduct analyses and produce reports for education stakeholders,
such as the IDE staff who are granted access to data in preformatted reports and advanced data
analyses, depending on their role and permissions. Currently, over 150 users have been trained
and given access to the system and more than a dozen pre-formatted reports have been
developed. At this time, there are no plans to allow users to perform ad-hoc queries.

A portion of the SLDS grant will fund the creation of a public portal. This portal will provide
aggregate-level data that will be accessible through the Web; however, the portal has not yet
been developed.

Lessons Learned
Mr. Pennington stated that gaining buy-in at the regional level was critical to the current success
of the project and will continue to be a key factor during its statewide rollout. He observed that
the COTS product did not meet IDE’s needs and took longer to load and to format the data. To
combat this problem, the team is investigating whether to purchase or develop additional
modules that will fit the post-secondary and workforce data that IDE intends to integrate into
EdInsight.

Key Takeaway

Commercial off-the-shelf solutions must be evaluated carefully and set against the system’s
current and future requirements in order to attain their cost- and time-saving benefits.


4.3 Army Suicide Mitigation Project
State/Agency: Data Strategies
Web Site:     http://www.datastrategiesinc.com
Address:      P.O. Box 772
              Midlothian, Virginia 23113
POC:          Susan Carter
              Managing Partner
              Kevin Corbett
              Managing Partner
POC Phone:    804-965-0003
POC Email:    SCarter@DataStrategiesInc.com
              KCorbett@DataStrategiesInc.com

Case Profile

# of Records: Unavailable
Project Budget: Unavailable

Background
Due to an increase in suicides, the United States Army hired Data Strategies to design and pilot a
prototype data system that would gather information from disparate sources in order to identify
predictors of potential suicides. The goal of the pilot project was to utilize this information to
establish a means to stem the number of suicides and suicide attempts. In order to achieve this
goal, Army leaders realized that they would need an integrated data environment that would
provide accurate and reliable data for analysis.

The project team’s challenge was to develop a system that would rely on numerous databases,[20]
both government and private, that had not been linked previously. The system first must analyze
historical data of suicide cases from 2001 to 2008 in order to determine if there are any
commonalities. These commonalities then will be matched against the records of current soldiers
in the hope of identifying those who may be at a high risk for suicide.

In terms of database management systems, although the sample group was relatively small, the
size of the records was large. Because of the sensitivity of the topic and the need to ensure the
soldiers’ privacy and the security of their information, personal, identifiable information was
removed or de-identified. Further, since the Army had no stringent performance requirements,
such as ad-hoc queries into the system, the majority of the analyses were performed on historical
data that was static. As a result, the time from query to data delivery could be allowed to take
days.

[20] Some of the data sources included are Army, financial and medical. The project team must
negotiate HIPAA requirements, which will impact the Army’s ability to aggregate the data.

System Design and Architecture
The final design of the system had not yet been determined at the time of the interview, partly
due to the fact that many of the leaders of the planned data sources had not yet signed
memoranda of understanding (MOU) or data sharing agreements. Because these data sources
were from different industries, they did not follow a shared schema, governance or, in many cases,
data types. At that time, Data Strategies planned to investigate a number of design types that
would allow the Army a choice of data types in the future. The project team considered various
architectures and schemas, while remaining open to various data types (e.g., Excel, Oracle, and
flat file types) to ensure that the system could be flexible and expandable.

A sandbox environment was created as a centralized data warehouse that copied data from its
data sources. This warehouse was used by Data Strategies because they were not allowed direct
access to the data sources for the prototype development; however, the sandbox allowed Data
Strategies to mimic systems that could be centralized, distributed or federated database
management systems. For the purpose of the prototype, queries were not submitted live across
the internet but, instead, used the sandbox environment. This meant that real performance of
the system was not measured, which was acceptable since speed was not a requirement of the
system at this stage of development.

Security
In order to meet the security requirements set forth by the program, the Data Strategies team
de-identified the records from the various databases, but still had to be able to link the data to
perform the analyses. To accomplish this, the team created unique identification (ID) numbers.
They were able to link this unique ID to the records of each of the databases in the following two
ways:

1. They searched for an individual’s records that contained an existing unique ID and then
   pushed that unique ID to all the remaining data sources.

2. They found a unique ID contained within each of the data sources and created a table of
   those IDs at the central data warehouse.

Due to the relatively small subject group, both of these solutions worked.
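
The second approach, a table of IDs held at the central warehouse, can be sketched as a crosswalk. This is an illustration under stated assumptions rather than the Data Strategies design: the source names, field names, and study-ID format are hypothetical, and the sketch assumes that the step of deciding which local IDs belong to the same person has already been done.

```python
from itertools import count


def build_crosswalk(matched_people):
    """Map every (source, local ID) pair to one study ID per person.

    matched_people: list of dicts, one per person, mapping a source name to
    that source's own ID for the person (assumed to be matched beforehand).
    """
    study_ids = count(1)
    crosswalk = {}
    for local_ids in matched_people:
        study_id = f"P{next(study_ids):04d}"
        for source, local_id in local_ids.items():
            crosswalk[(source, local_id)] = study_id
    return crosswalk


def deidentify(source, rows, crosswalk, id_field, drop_fields):
    """Replace a source's own ID with the study ID and drop identifying fields."""
    output = []
    for row in rows:
        study_id = crosswalk.get((source, row[id_field]))
        if study_id is None:
            continue  # person is not part of the study group
        kept = {k: v for k, v in row.items() if k != id_field and k not in drop_fields}
        kept["study_id"] = study_id
        output.append(kept)
    return output
```

Because each de-identified row carries only the study ID, rows from different sources can be joined for analysis while the crosswalk itself stays at the central warehouse under tighter access control.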

Data Usage and Reporting
The goal for the system is to have de-identified aggregate data that only authorized users within
the Army can analyze. The data and reports were not made available to the public or to any
participating data sources.

Lessons Learned
Ms. Carter explained that although there were many architectural and technological barriers to
the project, the single most complex obstacle to overcome within this project was the
management of the various data sources. Although the various Army agencies were under a
mandate by the Secretary of the Army to participate in this pilot program, the external agencies
providing information were not. The MOU and data sharing agreements[21] had yet to be
negotiated and signed, and it was necessary that these documents be executed prior to Data
Strategies accessing the data source information and integrating it into the Army suicide system.

Due to the number of data sources and the underestimation of resources needed to manage
stakeholders and execute these tasks, many of the MOUs and data sharing agreements were not
signed during the pilot project. A final implementation of this system would require MOUs and
data sharing agreements that could take years to be signed. Ms. Carter recommended that
organizations planning to construct a longitudinal database make sure that they plan to commit
resources to the development of the MOUs and data sharing agreements as well as the
management of the various stakeholders well in advance of the project launch.

Key Takeaway

Data governance and stakeholder management are upfront efforts, but they also require ongoing
efforts that should not be overlooked. These two efforts are essential to ensure the reliability and
expandability of the system.

[21] These agreements determined who would participate, what data would be shared, how it was
to be utilized, and where it could be stored.


4.4 Texas Education Agency
State/Agency: Texas Education Agency
Web Site: http://www.texaseducationinfo.org/tpeir/
Address: Information Analysis, TPEIR Group, Texas Education Agency, 1701 North Congress Avenue, Austin, Texas 78701
POC: Brian Rawson, Director, Statewide Data Initiatives; Nina Taylor, Director of Information Analysis
POC Phone: 512-463-9437; 512-475-2085
POC Email: Brian.Rawson@tea.state.tx.us; Nina.Taylor@tea.state.tx.us

Case Profile
Student Enrollment: 4,752,148[22]
Teachers: 327,905[23]
LDS Grant: $18,195,078[24]

Background
In 2001, the Texas Legislature funded a project that would build an integrated data repository for the Texas Education Agency (TEA), the Texas Higher Education Coordinating Board (THECB), and the State Board for Educator Certification (SBEC). The project became known as the Texas PK-16 Public Education Information Resource (TPEIR) Project. Half of TPEIR's original $7 million appropriation for the public access initiative was set aside for FY2002 and FY2003.[25] The system that resulted from the project team's work integrates the data from disparate data sources at each of the participating agencies. These data include student, educator, and organizational data from as far back as 1989.

TPEIR was designed to ensure that stakeholders within Texas would have access to high quality data and an efficient and effective method to obtain it. It was also designed to link student data from early childhood through postgraduate study, allowing for longitudinal analysis that would identify patterns and trends within the Texas public education system. Data from TPEIR was planned to be available to internal staff as well as to the public.

[22] State education data profiles. (n.d.). Retrieved from http://nces.ed.gov/programs/stateprofiles/sresult.asp?mode=short&s1=48
[23] Ibid.
[24] Statewide longitudinal data system grant program grantee state Texas. (n.d.). Retrieved from http://nces.ed.gov/programs/slds/state.asp?stateabbr=TX
[25] The final total cost of the project was $6.1 million, with $1.75 million spent in FY2002, and $4.35 million spent in FY2003. In May 2010, Texas won an $18.2 million SLDS grant, the second highest grant amount awarded.


Project management of TPEIR was a complex and ongoing effort. The management of the system was handled by two advisory groups: the Interagency Steering Committee (ISC), comprised of the Information Resources Managers of each agency, and the Technical Advisory Group (TAG), comprised of the project managers of each agency. The ISC met twice a month to determine policy, review risks, and resolve issues, and the TAG met weekly to determine the technical infrastructure, plan the practical implementation, and resolve technical issues.

System Design and Architecture
TPEIR was designed with two distinct data repositories. One repository housed aggregated data[26] that was de-identified and approved for public release. In order to comply with federal and state standards, the other repository contains confidential, student-level education data that is available only to authorized users.

The actual development of the system was outsourced to an outside vendor. The resulting custom system design adopted a combination of the Ralph Kimball (i.e., a conglomerate of data marts) and Bill Inmon (i.e., a single data warehouse) methodologies as the foundation for the data warehouse, which was similar to that of the TEA K-12 data warehouse.

The data warehouse stores facts/metrics within fact tables and codes within dimension tables. An AIX server was used during the development and testing processes, but the final data collection was moved to a production server.

TPEIR currently integrates data from two data sources into a centralized database, but its architecture framework allows for new data sources to be added in order to enhance the power of the system. Figure 2 illustrates the TPEIR architecture framework.

Figure 2: TPEIR Architecture Chart[27]

[26] This data can be accessed at http://www.texaseducationinfo.org.
[27] Texas Education Agency, Information Analysis Division. (2010). Texas PK-16 public education information resource. Retrieved from http://www.texaseducationinfo.org/tpeir/TPEIR_Documentation.pdf


Security
The team ensured the system's security by creating dimension tables that used surrogate keys (arbitrary, system-generated values) as unique identifiers. These keys were used to perform the linkages in the system.
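
The surrogate-key approach described above can be illustrated with a minimal sketch. It is not TPEIR's actual schema: the table names (student_dim, enrollment_fact), the columns, the use of SQLite, and the random hexadecimal key format are all assumptions chosen for illustration. The point it shows is that the fact table links to the dimension table only through an arbitrary, system-generated key, so the source system's identifier never has to be stored in the warehouse.

```python
import sqlite3
import uuid


def build_demo_warehouse():
    """Create a tiny in-memory star schema (hypothetical tables and columns)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE student_dim (
            student_key TEXT PRIMARY KEY,   -- surrogate key: arbitrary value
            grade_level TEXT NOT NULL
        );
        CREATE TABLE enrollment_fact (
            student_key   TEXT NOT NULL REFERENCES student_dim(student_key),
            school_year   INTEGER NOT NULL,
            days_enrolled INTEGER NOT NULL
        );
    """)
    return conn


def add_student(conn, key_map, source_id, grade_level):
    """Assign a random surrogate key; the source ID is kept only in key_map."""
    key = uuid.uuid4().hex
    key_map[source_id] = key  # crosswalk held outside the warehouse tables
    conn.execute("INSERT INTO student_dim VALUES (?, ?)", (key, grade_level))
    return key


def add_enrollment(conn, key_map, source_id, school_year, days_enrolled):
    """Load a fact row, translating the source ID to its surrogate key."""
    conn.execute(
        "INSERT INTO enrollment_fact VALUES (?, ?, ?)",
        (key_map[source_id], school_year, days_enrolled),
    )


def days_by_grade(conn):
    """Join fact to dimension on the surrogate key and total days per grade."""
    rows = conn.execute("""
        SELECT d.grade_level, SUM(f.days_enrolled)
        FROM enrollment_fact AS f
        JOIN student_dim AS d ON d.student_key = f.student_key
        GROUP BY d.grade_level
        ORDER BY d.grade_level
    """).fetchall()
    return dict(rows)
```

In this sketch the crosswalk between source identifiers and surrogate keys lives in a separate mapping, which in practice would itself need to be protected as confidential data.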

Data Usage and Reporting
The system's report component uses the Crystal Reports solution. The TPEIR data is available to the public and to authorized stakeholders. The publicly available data are used to report on Texas public high school graduation; Texas college and university admissions, enrollment, and graduation; teacher certification, employment, and retention; and school district employment. A complete list of publicly available reports can be found at: http://www.texaseducationinfo.org/

Authorized TEA staff members use Rapid SQL or SAS to run queries against the data and generate reports or extract data to be stored in files. The results of these queries are returned in as little as a few seconds, while larger queries may take several minutes.

Lessons Learned
The integration of the data from three different agencies and the conduct of multiple data collections (while preserving the original data) was a problem that the TPEIR team faced early in the planning of the SLDS. It was important to preserve the original data so that each agency could recreate historical results, if necessary. The team conformed data across the agencies and defined standards that would be applied to current and future data collections. They maintained regular meetings of the Interagency Steering Committee (ISC) and Technical Advisory Group (TAG) to exchange information, review changes, resolve issues, and establish consensus.

Key Takeaway
For systems that require the source data to maintain its data integrity, the data governance is critical. It is through the stringent standard and rules definitions that the data sources are able to share their data for use in the SLDS while preserving the original database's system.

Another problem the project team faced was the implementation of the public-facing Web site that would allow the public to access data. TEA wanted to minimize the maintenance requirements for this Web site. To accomplish this, the TPEIR team utilized commercial off-the-shelf software and minimized the customization of software tools to maintain the Web site. These tools allowed the developers to utilize metadata, common educational terminology, online help pages, standard reporting formats, simple navigation, and alternatives to view the data in text and/or graphic formats without large expenditures.

As a result of following these best practices, the TPEIR's final expenditures were nearly twelve percent under budget.



4.5 DLA Data Convergence and Quality Project
State/Agency: Data Strategies
Web Site: http://www.datastrategiesinc.com
Address: P.O. Box 772, Midlothian, Virginia 23113
POC: Susan Carter
POC Phone: 804-965-0003
POC Email: SCarter@DataStrategiesInc.com

Case Profile
# of Records: 7 million base records (each record had approximately 15-20 associated records)
Project Cost: $2.5 million over 5 years

Background
In 2002, the Defense Logistics Agency (DLA) hired Data Strategies to vet its process and systems issues in implementing a Business Systems Modernization (BSM) program. This five-year project was part of a larger system overhaul that DLA implemented during a period of over 10 years. The BSM program was implemented to upgrade the procurement and financial systems that managed DLA's supply chain management processes. The new process required DLA to deliver accurate information and data for business, profiling standards, business rules, and processes. However, because DLA's three supply centers had evolved separately over time, it was difficult to merge them.

Originally, DLA began with three main supply "centers" that performed the same functions on different items. When the project first began 50 years ago, the three centers had identical architecture and business processes. Over time, however, the three centers evolved and began using different methods and business rules. In 2002, DLA initiated a massive merger of the three centers. The goal was to make the centers interoperable and compliant with the new business rules DLA was developing in order to centralize its data. In other words, although the data centers were physically separated, the data was, from a virtual perspective, to be integrated and located in one central place, since users needed the ability to retrieve procurements that were located in more than one data center.

This integration meant that the data had to appear to the user to be in one place so that they could query across the centers. The project had three main stakeholder groups, the "owners" of each data center, and a fourth entity, an umbrella record center called the Logistics Information Group (LIG). The LIG approved everything that was done to the system and records, as well as any cleansing. Moreover, this entity maintained records of what items the Defense Department could purchase and their negotiated prices. In essence, the LIG was the gatekeeper of what items could be procured. The LIG information became known as the "golden record." Although this group owned the "golden record," they were not users of the system. While the three supply centers had their own functions and governing teams, all data had to be compliant and match with the "golden record."

System Design and Architecture
It became clear that DLA needed to develop a set of uniform business rules that would govern all centers. There was a tremendous amount of resistance from each of the centers as each group was unwilling to sacrifice control or autonomy. To eliminate this resistance, Data Strategies facilitated discussions among the centers in order to reach a consensus in creating the business rules. Data Strategies worked with each of the disparate data source owners to understand not only how each of the centers and systems captured, processed, and stored data, but also how they would each need to interact once all systems were integrated.

The process involved negotiations between the DLA headquarters and the centers on how the data is used, how it should appear, and what specific exceptions to the rules would be required because of each center's unique item/record types and security requirements. In the end, based on the definitions provided by DLA and the feedback from the centers, Data Strategies recommended a single set of business rules, with defined exceptions in each center.

Once the business rules were created, Data Strategies surveyed each data center and assessed its level of data cleanliness. The assessment was based on the new set of business rules. During the data cleansing process, the team:

"…ensured that the source data and the converged data maintained their independence without losing either context. Data Strategies staff analyzed the business rules associated with the data; identified the rules and metrics required to validate that data would meet the new business rules; and created automated routines to run complex queries that analyzed and identified records that had anomalies. The results were displayed in both summary and detailed reports that showed the anomalies and the recommended solutions. The solution also included an approach using extensive heuristics and pattern matching code to overcome embedded data issues."[28]
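
The kind of automated routine described in the quotation, one that checks records against business rules and reports anomalies in both summary and detail, can be sketched as follows. This is an illustrative sketch only and not Data Strategies' actual code: the rule names, the item identifier pattern, the field names, and the center codes are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical business rules: each name maps to a predicate that returns
# True when a record complies with the rule.
RULES = {
    "item_id_format": lambda r: re.fullmatch(
        r"\d{4}-\d{6}", str(r.get("item_id") or "")
    ) is not None,
    "price_positive": lambda r: isinstance(r.get("unit_price"), (int, float))
    and r["unit_price"] > 0,
    "center_known": lambda r: r.get("center") in {"A", "B", "C"},
}


def validate(records, rules=RULES):
    """Check every record against every rule.

    Returns (summary, detail): summary counts anomalies per rule, and
    detail lists (record_index, rule_name) for each failure.
    """
    detail = []
    for index, record in enumerate(records):
        for name, passes in rules.items():
            if not passes(record):
                detail.append((index, name))
    summary = Counter(name for _, name in detail)
    return summary, detail
```

The summary view corresponds to a headquarters-level picture of overall data quality, while the detail view points to the individual records that need correction. Because the function only reads the input records, the source data is left unchanged.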

The base population of data was 7 million records, and each one had 15 to 20 associated records. Data Strategies divided them into lots and evaluated each lot every 2 weeks over 18 months. This did not reduce the quantity of data, but ensured that it was compliant and collaborative.

The three data centers were integrated in virtual space so that users might access and query data and reports, regardless of the source. For the ultimate end-user, Data Strategies created narrow, role-based views of the personal systems so that users could view the data that pertained to them and for which they were cleared. The reports comprised HQ-level statistics and trends that outlined the current status of the data quality as a whole (outlining the risk to the migration success) and provided detailed information at the Source Owner, Table, and Attribute levels to identify where the largest issues were.
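
A narrow, role-based view of the kind described above can be sketched in a few lines. The sketch assumes a hypothetical model in which each record carries a center code and a numeric sensitivity level, and each user has a set of relevant centers and a clearance level; the source does not describe how DLA's views were actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    centers: frozenset  # centers whose data pertains to this user
    clearance: int      # higher number means access to more sensitive data


def role_based_view(records, user):
    """Return only records that pertain to the user and that the user is cleared for."""
    return [
        r for r in records
        if r["center"] in user.centers and r["sensitivity"] <= user.clearance
    ]
```

For example, a user assigned to a single center with clearance level 2 would see that center's records at sensitivity 2 or below and nothing from the other centers.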

[28] Data Strategies. (n.d.). "DLA Data Convergence and Quality Project."


Ms. Carter credits much of the project's success to the solution's capability to create detailed and dashboard views of the results as well as its recommendations for problem resolution.

Lessons Learned
For large data integration projects, Ms. Carter recommends several best practices. Ms. Carter's first recommendation is to mentor the stakeholders involved in the data integration, so that they understand the history of what is being done and why it is being done. This gives these groups the information they need to make decisions regarding the project. Conversely, project implementers and leaders must solicit feedback from stakeholders in order to understand stakeholder requirements as owners and users of the data. This two-way communication will help an organization to establish effective data governance rules.

For Data Strategies, informing the three centers' stakeholders of the project's purposes and receiving feedback gave them an advantage in bringing the three centers together. As a neutral intermediary, Data Strategies was well received because stakeholders felt they had a voice in the development of the new business rules. Ms. Carter explained that, "A neutral approach is important in overcoming political silos. In the end, if the different agencies' needs are not met, the new system will be useless."[29]

Throughout a project, there must be good communication between parties that is conducive to a collaborative environment. A constant review of the project goals and status is essential to ensuring the team is on track. The way the architecture is established in the beginning is important because it lays out the foundation for the rest of the project. It is important for the system designers to understand the purposes of the warehouse before they map and design the architecture. Moreover, a complex project necessitates the engagement of an open vendor that will not constrain the design of the data architecture or the project. Finally, VDOE must ensure the technology to be used can be used by everyone and must keep the technology simple for longevity.

Key Takeaway
Managing stakeholders, by gathering input from them and being their advocate, is imperative in building business rules. Having input from stakeholders and acting as their "champion" results in less friction among stakeholders.

[29] Carter, S. (2010, November 5). Telephone interview with Rona Jobe, CIT.



4.6 NORC at the University of Chicago Data Enclave
State/Agency: NORC at the University of Chicago Data Enclave
Web Site: http://www.norc.org/DataEnclave/
Address: 1155 East 60th Street, Chicago, Illinois 60637
POC: Timothy Mulcahy
POC Phone: 301-634-9330
POC Email: mulcahy-tim@norc.org

Case Profile
# of Records: No actual number given, but at any given time, the system processes 40 million records
Project Cost: $750,000 (initially)

Background
The National Opinion Research Center (NORC) Data Enclave is a "secure virtual environment for storing and analyzing sensitive microdata." The Enclave provides a confidential, protected environment within which authorized researchers can access sensitive microdata remotely.

A brief summary from the NORC website:

"While public use data can be disseminated in a variety of ways, there is a more limited range of options for disseminating sensitive micro-data that have not been fully de-identified for public use. Some data producers have sufficient economies of scale to develop advanced in-house solutions that serve the needs of external researchers, but most lack the resources to archive, curate, and disseminate the datasets they have collected. The NORC Data Enclave provides our partner organizations a secure platform where they can both host and build a research community around their data."[30]

The NORC Data Enclave[31] was established in the early 2000s; however, the build-up to the project can be traced to decades of history. There had been some movement within government agencies and other organizations to provide access to microdata with sensitive content to researchers and to research organizations. In 2002, the Confidential Information Protection and Statistical Efficiency Act was passed; this was a mandate to all the federal statistical agencies to develop a plan "to provide some level of access to some parts of their agencies' microdata." In 2006, the National Institute of Standards and Technology (NIST) released a Request for Proposal (RFP) that described the need to conceptualize and build a secure remote access modality that could provide both on-site and remote access to microdata as well as direct access to the raw microdata.

[30] Data Enclave - NORC at the University of Chicago. (n.d.). Retrieved from http://www.norc.uchicago.edu/DataEnclave/
[31] The Enclave's design and development costs were approximately $750,000. The third iteration is planned for February 2011 and may receive an additional $500,000 to $750,000 in funding.

Originally, statisticians, lawyers, and agency leaders were very concerned about allowing researchers access to raw data, and these groups developed plans to perturb the data prior to allowing researchers access. However, in 2006, there was a significant change in thinking that set aside decades' worth of thought, engineering, and science on how to perturb data to be ready for researchers. The new model allowed researchers, other governmental agencies, and private sector organizations to have access to the actual raw microdata as opposed to allowing them access only to perturbed data.

This shift in policy occurred when NORC proved[32] that there were remarkable differences in research results if a researcher used perturbed data rather than raw data. In some cases, research based on perturbed data yielded results opposite from what they would have been had researchers been allowed to access the raw data.[33] This revelation caused leaders in government agencies to question whether previous policies and programs created from research performed with perturbed data were based upon false assumptions. As a result, the statisticians, lawyers, and agencies who originally opposed the idea of providing raw data to researchers changed their stand on the matter.

The challenge became to find the true results and gather the data to be available in the public domain, while at the same time protecting the confidentiality of the provider of the data or survey. Shortly thereafter, the project was launched. It is sponsored by the National Institute of Standards and Technology, the Kauffman Foundation, the Department of Agriculture, the National Science Foundation, and the Annie E. Casey Foundation.

The NORC Data Enclave team's goal was to "provide a secure remote access modality that was both sophisticated, technologically and operationally, and reasonably cost and met the replication standards and ability to push the risk of breach as far down to zero as possible."[34] Additionally, they aimed to provide remote access. Until the NORC project, access to sensitive data for researchers was a cumbersome and time-consuming process. Researchers had to access and perform analyses on the data on site, and were not allowed to leave the building with any data. The process required that researchers be mailed their data and analysis after an internal statistician carefully reviewed their analyses. The NORC Data Enclave aimed to relieve that burden on researchers.

In summation, the aim of the new system was to share social science data in a secure manner. NORC plans to promote access to sensitive business microdata; protect confidentiality; archive, index, and curate microdata; and encourage researcher collaboration.

System Design and Architecture
The Enclave is constantly being enhanced. The Enclave had a soft launch in 2006, with a 6-month incubation period (July through December 2006). Researchers and focus groups were consulted in December 2006. The NORC staff collected and responded to feedback and, afterward, opened the Enclave in March 2007. In designing and building the Enclave, NORC employed a 14 to 16-person team comprised of engineers, researchers, information technologists, and metadata people, among others, and began with white-boarding and mock-ups. The initial process of white-boarding lasted many months and was the result of numerous meetings, feedback, and resolutions. The design process was iterative, although no scenarios of how the data would be used were created because of the infinite possibilities of research questions. Thus, the focus during the design process was the possibility of a convenient, secure remote access system that was virtually impregnable.

[32] Mulcahy, T. (2010, October 28). Telephone interview with Rona Jobe, CIT.
[33] Ibid.
[34] Mulcahy, T. (2010, October 28). Telephone interview with Rona Jobe, CIT.

The core infrastructure is, essentially, a standard implementation with CITRIX security requirements for remote access ability. CITRIX provides layers of security for remote access in general; however, the Enclave team customized the environment by adding specialized tools that researchers would need to perform certain types of analyses (e.g., statistical packages). For data management, NORC has built into the system a way of packaging the data for the researchers, tracking what researchers are doing with the data, etc.

Security
Before data is loaded onto the Enclave, it is cleaned so that it is harmonized with the data sets already in the system. Every data set that enters the Enclave must go through a DDI (Data Documentation Initiative) checklist to be DDI-compliant and SDMX (Statistical Data and Metadata Exchange)-compliant for time-series data. NORC also employs a metadata services team and an IHSN microdata management toolkit.

NORC utilizes a portfolio approach to security measures by bundling multiple protections. The system uses the Citrix client's built-in security measures for the front-end security.

Data Usage and Reporting
To access data in the Enclave, every researcher first must go through a vetting process by each of the sponsors. Each sponsor decides the rules on who is eligible and who will be authorized to access the data. Once the researchers get past this vetting process, they must submit proposals and substantiate why they need raw data from the Enclave; in essence, why the public-use data is not adequate for their research. For federal statistics data sets, researchers must substantiate that their research is within the mission of the federal agency and that the data required is for pure research purposes, and not for marketing, law enforcement, etc. The proposals must include their planned statistical and dissemination methods and potential outlets. In general, there are several contractual steps before any researcher can be granted access to raw data. Currently, some of the data available within the Enclave include:
• NIST-TIP
  o ATP Survey of Joint Ventures (JV)
  o ATP Survey of Applicants
  o Business Reporting Survey Series (BRS)
• USDA/ERS/NASS
  o Agricultural Resource Management Survey (ARMS)
• National Science Foundation
  o Survey of Earned Doctorates (SED)
  o Survey of Doctoral Recipients (SDR)
• Kauffman Foundation
  o Kauffman Firm Survey (KFS)
• Annie E. Casey
  o Making Connections Survey (MC)

Access to the Enclave is gained through a Web site portal at https://enclave.norc.org. Users are required to download the CITRIX Client to their desktops and use the usernames and passwords that were provided to them. Inside the Enclave are collaboration tools, statistical software packages, discussion forums, etc. (See Figure 3: NORC Data Enclave Screenshots for sample screen shots from the NORC Enclave presentation.) However, although researchers are able to collaborate with other researchers within the Enclave, they are not allowed to talk to one another or share data.

Figure 3: NORC Data Enclave Screenshots
[Screenshot: "Data Documentation & Shared Code Libraries." Callouts: "Click on documents and folders to open or navigate in the structure"; "Use the menu to create folders or upload documents."]

When data is provided to researchers, certain information is stripped from it, e.g., social security numbers, addresses, birth dates, and other obvious identifiers. The level of data stripping is dependent upon which agency has supplied it. In some cases, agencies will allow access to the raw microdata, though with some data noise. The Enclave aims to provide as much data granularity to its users as possible, so that their research results are as true as possible.

The system currently has more than 200 researchers across various sponsor areas. The number of users is expected to increase to more than 300 within the next six months and, potentially, to more than 600 in the next two or three years.


Lessons Learned
For Virginia, one solution will not work; its LDS will require a mixture of different solutions. There are several options that the state could pursue, especially if data will be shared with the public, but security and confidentiality must be preserved. One method of sharing data with the public is through a batch execution job. In essence, a researcher will submit a research question or query, and an internal staff member will run the analysis and proof it for disclosure, thereafter returning the output to the researcher.
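
The batch execution workflow described above can be sketched in two steps: run the requested aggregation, then proof the output before it is returned. The source does not say which disclosure checks the internal staff member would apply; small-cell suppression is used here as one common example, and the threshold of 10 and the function names are assumptions.

```python
from collections import Counter

MIN_CELL_SIZE = 10  # hypothetical threshold; set by the data owner's policy


def run_batch_query(records, group_by):
    """Run the researcher's requested count of records per group."""
    return Counter(r[group_by] for r in records)


def proof_for_disclosure(counts, min_cell=MIN_CELL_SIZE):
    """Suppress any cell below the minimum size before output is released.

    Suppressed cells are returned as None so that the researcher can see
    that a value was withheld.
    """
    return {
        group: (n if n >= min_cell else None)
        for group, n in counts.items()
    }
```

This sketch covers only primary suppression. If totals are published alongside the cells, a suppressed value could be recovered by subtraction, so a real review would also need to consider complementary suppression.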

Mr. Mulcahy advises that "it is more efficient now for people to build their own" systems. Moreover, the different components in a system, e.g., business intelligence tools, should complement one another and not compete with each other. When planning to build, "keep your options open."

In the area of project management and building the architecture, ensure that staff and personnel have defined roles and responsibilities. Ensure that the project has a well-defined plan and goals. NORC notes that in such technical projects, the technical issues are the easiest to overcome; project management and stakeholder management are the more difficult tasks.

Key Takeaway
In building an LDS that shares information with the public, keep your options open and do not be constricted to one product solution.


5 Subject Matter Expert Interviews
The research team interviewed ten Subject Matter Experts (SMEs) in order to provide the VLDS team with specific feedback on barriers, risks, design issues, and opportunities associated with implementing a large data system. The following individuals were interviewed and, based on the information collected during these interviews, the project team consolidated themes into the key points presented in this section. Specific key points from each interview are provided in Section 3.

SUBJECT MATTER EXPERT: AREA OF EXPERTISE
Dr. Bhavani Thuraisingham: Information Security; Information Management; Data Management, Mining, and Security; Data Mining for Counter-Terrorism
Mr. Paul Carney: Higher Education; Building Internet-Based Services
Mr. James Campbell: Large Data Integration; Technology and Data Flow
Ms. Susan Carter: Data Management; Information Technology; New Technology Research
Mr. Raj Ramesh: Information Technology; Software Product Development; Enterprise Database Systems/eLearning Systems; Data Warehousing; Very Large Database Systems (VLDB); Web-based Collaborative eLearning Systems; Enterprise Architectures, Portals and CRM
Dr. Ron Kleinman: XML; Java
Mr. Peter Dobler: Software Development; Sybase; SQL
Dr. Laura Haas: Computer Engineering; Systems Design; VLDB (Very Large Database Systems)
Thilini Ariyachandra: Information Systems; Business Intelligence; Data Management and Modeling; Impacts of Social Networking
Dr. Cynthia Dwork: Privacy Preserving Data Analysis; Differential Privacy; Cryptography; Distributed Computing


5.1 Dr. Bhavani Thuraisingham
Title: Director, Cyber Security Center; Professor of Computer Science
Organization: University of Texas at Dallas
Phone: 972-883-4738
Email: bhavani.thuraisingham@utdallas.edu

Background
Dr. Thuraisingham received the IEEE Computer Society's prestigious 1997 Technical Achievement Award for "outstanding and innovative contributions to secure data management". Her research in information security and information management has resulted in over 60 journal articles, over 200 refereed conference papers, and three US patents. She is the author of seven books in data management, data mining, and data security, including one on data mining for counter-terrorism.

KeyyTakeaway
� Federated mod del works beest when th he domain off questions
is noot well knowwn.
� It is important that the datta governan nce be well thought
t outt
so thhat the acceess controls across the data sourcees is
conssistent.

Summary
During the course of the interview, Dr. Thuraisingham made two strong points: one related to the performance of a federated model, and the other related to data governance. She stated that if the domain of questions to be answered by the system is not well known, then the best distributed database model would be federated.

Her major concern was in the area of data governance. Dr. Thuraisingham stated that it is important that the data governance be well thought out so that access controls across the data sources are consistent. She further stated that “You need to ask the question: As data moves up the hierarchy (via joins), does the governance model still work?”


5.2 Paul Carney
Title: Vice President, Technical Services
Organization: Natural Insight
Email: pcarney@naturalinsight.com

Background
Combining best technical practices with management know-how, Mr. Carney has excelled in the fields of higher education, consulting and Internet-based services. He launched his first Internet business in 1997, and has since helped build additional Internet-based service organizations in both consumer and business environments. More recently, Mr. Carney oversaw the development of the Natural Insight solution, a robust resource for managing and optimizing distributed workforces.

Key Takeaway
• Consider a virtualization compute model to manage processing requirements.
• Consider creating multiple hash values based on various personal identifiable information elements.
• Beware of log files created on a system containing transactional data.

Summary
Mr. Carney’s experience with large scale distributed systems exposed him to the use of virtualization as a means to manage computing resources. Mr. Carney felt that when the resource utilization is unknown, virtualization should be considered to manage processing requirements for the system.

Some of his customers include financial institutions that require protection of personal identifiable information (PII). He stated that creating multiple hash values based on various personal identifiable information elements should be considered in the implementation architecture. Mr. Carney’s experience demonstrated that this technique allowed for multiple opportunities to find a match across the various data sources. When the discussion moved to privacy and the need to protect the individual, Mr. Carney felt that applying a hash algorithm to the PII was a big step in achieving that requirement.

He expressed concern for log files on computing systems. All log files created on a system need to be evaluated for the type of information they contain. It is possible that some log files (e.g., operating system, application) could contain transactional data that would violate privacy policies.


5.3 James Campbell
Title: Implementation Strategist
Organization: SIF Association
Phone: 202-607-5491
Email: jcampbell@sifassociation.org

Background
At SIF, Mr. Campbell is responsible for leading and taking ownership over providing value-add for members and potential members of the Association and their planned or ongoing SIF Implementation and Development. Prior to joining SIF, he was the Integration Team Manager for the Oklahoma State Department of Education. In his role, he managed many state and local projects aimed at improving the technology and data flow across 540 school districts and charter schools.

Key Takeaway
• Virginia’s SLDS model is definitely different from what other states have developed.
• Most current SIF development efforts are focusing on internal user portals.
• Federated model requires strong data governance.

Summary
Mr. Campbell has exposure to a number of LDS implementations around the nation. Compared with all the implementations known to him, he stated that Virginia’s SLDS model is definitely different from what other states have developed. He further stated that SIF would be interested in the Virginia solution as SIF is concerned about integration across states.

The SLDS portal was described to Mr. Campbell as a public facing and internal facing implementation. He responded that most SIF development efforts are focusing on the internal user portal; however, the intent is to have one portal for both internal and external users. The requirements on data access are easier to manage for the internal users initially. When adding a public facing connection, the requirements become more difficult.

Towards the end of the interview, the team discussed data governance with Mr. Campbell. He explained that the federated architecture requires strong data governance as the model promotes a hierarchy of governance. He advised the team that the State of Washington may be a good model to investigate for data governance ideas.


5.4 Susan Carter
Title: Managing Partner
Organization: Data Strategies, Inc.
Phone: 804-965-0003
Email: SCarter@DataStrategiesInc.com

Background
Susan Carter has over twenty years of experience in the Data Management and Information Technology field. Ms. Carter built a successful Women-Owned Small Business (WOSB) in Data Management and IT consulting. Responsible for the research and integration of new technologies into several large corporations, she has helped organizations such as the Defense Logistics Agency (DLA), MCI (WorldCom), E.I. DuPont, and SmithKline Beecham gain increased efficiencies and market share through innovative uses of new and proven technologies.

Key Takeaway
• Multiple personal data elements can be used in combination to create a hash key that will result in more unique IDs.
• Use of multiple hash keys based on various sets of identifiable information allows different databases with different identifiable information to have a higher probability of forming linkages.
• The inability to store linkages will greatly limit performance.
• Any changes to the data sources will require tight data governance.

Summary
Ms. Carter found Virginia’s unique privacy requirements for the SLDS similar to data integration work she performed for the Army. This work combined information from various databases including military, financial, medical, and psychological. Similarly, the subject of interest for forming these data linkages was one that would require the utmost security to ensure the privacy of the individuals.

To address this issue, Ms. Carter suggested the use of not a single data element to create a hash key, but of multiple data elements that could be combined in a known way prior to being hashed. Another technique she suggested was the use of multiple hash keys based on a variety of separate or combined data elements. This technique would increase the possibility of creating matches across databases that may not have the same data elements across all databases, but have at least one. The other advantage to utilizing multiple hash keys as identifiers is the ability to match records when there are errors or inconsistencies in certain data elements such as a misspelled name in one particular database. The use of multiple hash key identifiers and an algorithm to determine confidence in a positive match would result in matches that a single hash key identifier would overlook. The key to successful matches would be through the identification of which data elements, or combinations, would result in the highest number of positive matches while reducing false matches. Ms. Carter suggested that the intelligence agencies (e.g., Department of Homeland Security, CIA, and FBI) are working intensively in the area of matching multiple identifiable elements and the development of algorithms that result in high confidence levels of matching.
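
One simple way to picture "an algorithm to determine confidence in a positive match" is a weighted agreement score over the hash keys that both records carry. The sketch below is an assumption-laden illustration, not the method Ms. Carter described: the key names, the weights, and the scoring rule are hypothetical, and the hash values are shortened placeholders.

```python
def match_confidence(keys_a: dict, keys_b: dict, weights: dict) -> float:
    """Score in [0, 1]: weighted share of comparable hash keys that agree.

    keys_a and keys_b map a key name to a hash value; only key names present
    in both records (and in weights) are compared.
    """
    shared = [k for k in keys_a if k in keys_b and k in weights]
    total = sum(weights[k] for k in shared)
    if total == 0:
        return 0.0  # nothing comparable, so no evidence of a match
    agreed = sum(weights[k] for k in shared if keys_a[k] == keys_b[k])
    return agreed / total

# Hypothetical weights: a key built from a stronger identifier counts for more.
WEIGHTS = {"name_dob": 1.0, "ssn_dob": 3.0}

record_a = {"name_dob": "h1", "ssn_dob": "h2"}
record_b = {"name_dob": "hX", "ssn_dob": "h2"}  # name misspelled in one source

score = match_confidence(record_a, record_b, WEIGHTS)
```

With these inputs the misspelled name lowers the score but does not eliminate the match, which is the behavior a single hash key cannot provide. Choosing the weights and an acceptance threshold is exactly the tuning problem the paragraph above describes.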

Ms. Carter also spoke briefly on Virginia’s inability to store linkages persistently. In her experience this would greatly limit the performance of a system. Ms. Carter has worked with some systems that have been designed to not store linkages due to privacy or other policy requirements; those systems were able to adopt this design because there was a high level of importance on privacy and a low level of importance on performance requirements.

Lastly, the topic of data governance was touched upon during the interview. Ms. Carter stressed the importance of tight data governance in any system whose data sources would regularly have changes or when new data sources are added.


5.5 Raj Ramesh
Title: CEO
Organization: CTEC
Phone: 703-766-5774
Email: rraj@ctec-corp.com

Background
Mr. Ramesh has over 17 years of management and technical experience in information technology and software product development. Mr. Ramesh's areas of technical expertise include: Enterprise Database Systems/eLearning Systems, Data Warehousing and Very Large Database Systems (VLDB), Web-based Collaborative E-Learning Systems, Enterprise Architectures, Portals and CRM. Mr. Ramesh is a co-inventor of a Patent-Pending Task Share platform for the desktop environment and hand-held wireless devices.

Key Takeaway
• Used a three-tier design process for projects: presentation, business logic, database.
• In data analysis projects with large datasets, users accept delayed results.
• Development of global schema helps identify data type discrepancies.
• Data integration on-the-fly is a new technology area that has few products.

Summary
Mr. Ramesh, like Mr. Carney, has extensive background with large scale system integration. His current projects involve multi-source data integration across disparate organizations (Federal Government). He uses a three-tier design process for projects: presentation, business logic, database. Mr. Ramesh’s current projects consist of large data sets (tens of millions of records). In these data analysis projects, users accept long delays in receiving results. As the projects have moved to production and have matured, users now are requesting real-time capability in reporting.

When working with large disparate data sets, Mr. Ramesh stated that the development of a global schema helps identify data type discrepancies. There are visual mapping tools that provide good assistance in working out the global schema. The final point he made was that, to his knowledge, data integration on-the-fly is a new technology area that has few products.
 


5.6 Ron Kleinman
Title: CTO
Organization: SIF Association
Phone: 202-607-8526
Email: rkleinman@sifassociation.org

Background
Prior to joining SIF, Mr. Kleinman was the Chief Technical Evangelist for Sun Developer Relations, and served as Sun's representative on multiple industry-wide Java and XML standards committees. He has extensive experience consulting with developers who are trying to "java-tize" their existing applications. He has prepared and delivered numerous presentations on Java technologies both in the U.S. and overseas. His particular areas of expertise include Java on the Server (EJBs and server-side APIs), Jini, Java-based device access control and management, and more recently, XML.

Key Takeaway
• Granularity of searching can impact security policy implementation.
• Real-time hashing could be a performance anchor.
• Use of federated identity concepts may provide a solution for record mapping across sources.
• Use of a central security authority built into the process could enforce strong data protection in the system.
• Separation of cross-walk table with firewall.

Summary
Dr. Kleinman spent a majority of the interview focused on the security implementation for the LDS. He pointed out that the granularity of searching can impact security policy implementation. The finer the search resolution the implementation allows, the more strongly the security policy needs to be defined. Dr. Kleinman noted that the use of a central security authority could enforce strong data protection in the system.

Dr. Kleinman recommended that the team consider federated identity concepts to provide a solution for record mapping across sources. This federated identity can be used to enforce consistent security policies across the data sources. He felt that security measures provided by operating systems, database management systems, firewalls, and routers can add to the security implementation for the LDS. For example, the linking/crosswalk table in the design could be separated from the rest of the system with a firewall.

A final point made by Dr. Kleinman was the concern that real-time hashing could be a performance anchor for the LDS. Some users may find the delay in results to be unacceptable.


5.7 Peter Dobler
Title: President
Organization: Dobler Consulting
Phone: 813-322-3240
Email: pdobler@doblerconsulting.com

Background
Mr. Dobler started his professional career more than twenty two years ago in software development. After working many years as a consultant for the three largest Swiss banks he founded his own consulting business in 1997. Mr. Dobler is a recognized expert in Sybase ASE, Sybase Replication Server and Sybase IQ. He also has many years of Oracle experience, including the latest 11g release. Mr. Dobler also has in-depth knowledge of SQL Server 2000 and 2005.

Key Takeaway
• The problem with a federated database is the performance.
• Create a hash using not only identifiable information but external data as well, such as the source.

Summary
The majority of the discussion with Mr. Dobler focused on the performance of the federated database and how to improve its efficiency. If Virginia were able to filter the data before it enters the SLDS’s data engine, the performance could be improved significantly. He suggested off-the-shelf solutions including those from Sybase IQ and SAP.

Mr. Dobler also weighed in on the use of hash key identifiers and suggested a hash key based on a combination of data element(s) and external data. In a particular example, he combined PII data within a record with the source database identifier. The use of external data would help to add a layer of protection and information to the hash key.


5.8 Dr. Laura Haas
Title: IBM Fellow; Director, Computer Science
Organization: IBM Almaden Research Center
Phone: 408-927-1700
Email: laura@almaden.ibm.com

Background
Dr. Haas is an IBM Distinguished Engineer and Director of Computer Science at Almaden Research Center. Previously, Dr. Haas was a research staff member and manager at Almaden. She is best known for her work on the Starburst query processor (from which DB2 UDB was developed), on Garlic, a system which allowed federation of heterogeneous data sources, and on Clio, the first semi-automatic tool for heterogeneous schema mapping. Dr. Haas is Vice President of the VLDB Board of Trustees, a member of the IBM Academy of Technology, and an ACM Fellow.

Key Takeaway
• When performing joins on-the-fly, it is important to minimize the volume of the data and minimize the trips back and forth.
• A semi-join can be an efficient way to link data on-the-fly and may be better than a linking table/directory.
• A nested-loop join can be used on the most constrained source from the query and then join data from the other sources as needed.
• Commercial databases tend not to use join indexes due to the amount of look-ups which degrade performance.

Summary
The interview with Dr. Haas focused on optimizing the desired architecture of Virginia’s SLDS. Knowing that Virginia would not be able to store linkages persistently and the system would need to be a federated database, Dr. Haas was able to share various techniques that would improve the performance of the system. First and foremost, Virginia would need to minimize both the volume of data being transmitted and the back and forth trips between the data sources and the SLDS.

Instead of a linking table, Dr. Haas stated that in-memory joins would likely result in greater performance, depending on the types of queries that are conducted. Linking tables or join indexes are not typical of commercial databases as they could require multiple look-ups, thereby wasting cycles and decreasing efficiency. Dr. Haas did not suggest a linking table for multiple database systems. The SLDS’ architecture could be designed to maximize performance by identifying the types of queries and the frequency pattern of each type of query. Semi joins, merge joins, and nested-loop joins were different types of joins that Dr. Haas recommended as viable options, each with performance benefits depending on the database size, query type, and query frequency patterns.
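
As a rough illustration of why a semi-join reduces data volume and round trips, the sketch below filters the most constrained source first, ships only its join keys to the second source in a single request, and joins the returned rows in memory. The function names, the `fetch_by_keys` callback, and the sample rows are hypothetical stand-ins for real agency data sources; the sketch also assumes the key is unique in the second source.

```python
def semi_join(constrained_rows, fetch_by_keys, key):
    """Join two sources while shipping only the join keys to the second one.

    constrained_rows: rows already filtered at the most constrained source.
    fetch_by_keys: callable that takes a set of keys and returns matching rows
                   from the other source (one round trip).
    """
    keys = {row[key] for row in constrained_rows}
    remote = {row[key]: row for row in fetch_by_keys(keys)}
    return [
        {**row, **remote[row[key]]}
        for row in constrained_rows
        if row[key] in remote
    ]

# In-memory stand-ins for two federated sources.
k12_rows = [{"id": 1, "school": "A"}, {"id": 2, "school": "B"}]
college_rows = [{"id": 2, "degree": "BS"}, {"id": 3, "degree": "BA"}]

def fetch_college(keys):
    """Pretend remote call: returns only the rows whose id was requested."""
    return [r for r in college_rows if r["id"] in keys]

linked = semi_join(k12_rows, fetch_college, "id")
```

Only the two candidate ids cross the boundary to the second source, and only the one matching row comes back, rather than either table being transferred in full.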


5.9 Dr. Thilini Ariyachandra
Title: Assistant Professor, Management Information Systems
Organization: Xavier University
Phone: 513-745-3379
Email: ariyachandrat@xavier.edu

Background
At Xavier, Dr. Ariyachandra teaches principles of information systems, business intelligence and data management and modeling, exposing students to BI and database offerings by Teradata, Oracle, Microsoft and Microstrategy. She has received several awards for scholarly excellence. Her research is focused on the selection, design, implementation of business intelligence solutions in organizations, information system success as well as impacts of social networking. She has published in high impact practitioner and academic journals.

Key Takeaway
• In a federated model, small query sets work best, and ad hoc can lead to poor performance.
• Trends show distributed database implementations moving towards the federated model.
• In-memory databases are not as scalable as standard DBMS.

Summary
Dr. Ariyachandra’s paper that compared different data warehouse architectures provided a starting foundation for the interview. She actually was hoping to talk the team out of the federated model, but soon realized that was not an option. In a federated model, Dr. Ariyachandra noted that small query sets work best (controlled environment), and ad hoc queries can lead to poor performance. Based on her interpretation of the architecture, the proposed federated architecture does not easily support ad hoc analysis of data and, therefore, may not be affected by this performance problem.

Dr. Ariyachandra explained that her current consulting engagements show that trends in distributed database implementations are moving towards the federated model. She also noted that in-memory databases are not as scalable as standard DBMS; however, vendors are starting to offer product options.


5.10 Dr. Cynthia Dwork
Title: Distinguished Scientist
Organization: Microsoft Research
Phone: 650-693-3701
Email: dwork@microsoft.com

Background
Dr. Dwork is the world's foremost expert on placing privacy-preserving data analysis on a mathematically rigorous foundation. A cornerstone of this work is differential privacy, a strong privacy guarantee permitting highly accurate data analysis. Dr. Dwork has also made seminal contributions in cryptography and distributed computing, and is a recipient of the Edsger W. Dijkstra Prize, recognizing some of her earliest work establishing the pillars on which every fault-tolerant system has been built for decades.

Key Takeaway
• There’s no principled way to sanitize data.
• Re-identification techniques are getting faster and cheaper.
• Differencing attacks are able to re-identify information even with large aggregate data.
• The addition of “noise” is used to prevent various attacks including differencing and averaging.
• VDOE should consider whether historical records should be archived after a period of time as a security measure.

Summary
As a privacy expert, Dr. Dwork’s interest in Virginia’s SLDS was the security rules being placed on the architecture and on the roles of users given access to the database. She was quick to state that re-identification of data was becoming faster and cheaper and that, as a result, there was no principled way to sanitize data. Dr. Dwork stressed the importance of data security and anonymization. She offered her insights on various techniques that are used to re-identify data and countermeasures to those techniques.

Upon learning that Virginia intended to present aggregate data to the public, Dr. Dwork provided examples of “differencing attacks” that use two large data sets with similar information and could be used to re-identify individuals. To combat these differencing attacks, data “noise” can be added to the actual data which, if done properly, can effectively stop the ability to average the results of aggregate data. The addition of noise should not significantly skew the aggregate data, but may be against certain regulations or policies. Another means of protecting against differencing attacks is to limit a user’s ability to submit queries or limit their ability to run queries that are too similar.

As a general question, Dr. Dwork asked if the SLDS would have a limit on its ability to query historical data. By archiving or limiting the SLDS’ access to data older than a defined period, the SLDS can again limit potential security attacks. This limiting of data also would have the added benefit of reducing the total records within the database which would increase performance of the system.

When discussing the internal use of the SLDS and the ability of researchers to gain access to record level data, Dr. Dwork talked about the high risk of privacy breach and the need for limitations and rules that should be placed on these users. Security of a system must be implemented as a combination of both technological security measures and security policy that governs the individuals with access to the system; only by having both can a system maintain a high level of security of its information.


6 SLDS Architecture Overview
The SLDS architecture consists of seven functional components. Commercial-off-the-shelf (COTS) products will be used where applicable, and shared computing resources will be used in the physical implementation where applicable.

The SLDS architecture can be represented by a bull’s-eye signifying the data-centric nature of the architecture. The importance of security is reflected in the representation through the dual rings that surround critical components located inside the portal. For example, a security ring surrounding data indicates tight security of that data component, and the ring surrounding the four tools and task oriented components illustrates security controls built into the other functional components. The SLDS Portal provides the key interface into the architecture.

6.1 SLDS Seven Functional Components
6.1.1 Portal
The front door into the Statewide Longitudinal Data System (SLDS) is through the SLDS Portal. The SLDS Portal provides both public (anonymous) and private (named) users with a variety of functions and services. Development of the Portal will be performed using a modern application framework, e.g., .Net or Java, and a content management system, e.g., DotNetNuke or Umbraco.

Named users gain access after they have requested an account, and their request has been approved by the appropriate agencies. Once approved, the named user account has access to help, training, the Lexicon, requests for data, status of requests, and account maintenance including password reset.

SLDS Portal Components
The Portal provides access to virtually all of the SLDS components to include the Shaker, Reports, Lexicon, Data, and a limited amount of Workflow. In addition to the SLDS Components, the Portal provides services such as help files, frequently asked questions (FAQs), hyperlinks to Agency reports, and the ability to request a private (named) account. Figure 4 provides a conceptual representation of the functions and services which will be accessible through the SLDS Portal.


Figure 4: Conceptual SLDS Web Portal

Public (Anonymous) User Functionality
Public users will have access to the following features and functionality:
• Help files for functions which are available to public users.
• Frequently asked questions (FAQs).
• Prebuilt aggregated data reports which have been approved by the governance structure.
• Lexicon elements which have been approved by the governance structure.
• Hyperlinks to Agency reports on other websites.
• Electronic request and workflow for Named User Account requests.

Private (Named) User Functionality
Private users will have access to additional functionality which is not available to public users. Requests for private accounts will be submitted electronically using elements of the SLDS Portal and the Workflow component. Procedures for submitting, approving and denying account requests will be delineated by the governance structure.

Users who have been approved for a private account will be notified by email. Access to private account features will be granted after users have supplied a valid username and password and have been authenticated against either the COV or COV AUTH directory. Private users may, depending on their permissions, have access to the following features and functionality:
• Help/Training files to include “How To” and instructional videos.
• Reports
  – Ability to view non-suppressed aggregated data.
  – Ability to access the Query Building Tool (QBT) for constructing data requests.
• Lexicon
  – Functionality determined by the governance structure.
• Workflow
  – Ability to electronically submit and track data requests.
  – Ability to retrieve data which has been requested and approved.
  – Ability to attach files.
  – Ability to check status, modify or cancel account and/or data request.
• Password reset
  – Ability to reset the user’s password. This capability may be provided through the COV AUTH directory process.

Application Framework and Content Management System Features
The SLDS Portal will be developed using a modern application development framework and content management system (CMS). Use of an application framework, such as Microsoft .Net, and a content management system, such as Umbraco, allows for the development of rich functionality and services with minimal development. Most content management systems include features and services such as web URL control, custom content types and views, revision control, taxonomy, user management, documentation, and established community support.

6.1.2 Security
Security is the foundation component for the SLDS. The sensitivity of the information and the policies regarding who handles data and how it is handled will be managed through a cohesive security model. The model used for the SLDS incorporates authentication and authorization pieces.

Authentication is required for all private (named) users, to include researchers as well as agency employees. Researchers and agency employees will be authenticated as a precondition to gaining access to the named user portions of the SLDS portal. Agency employees will be authenticated before gaining access to the Workflow component of the SLDS application. Figure 5 depicts the interaction of the Workflow component with other SLDS components, as well as the authentication interface for agency employees and researchers.

When requests for accounts and data access are submitted through the SLDS Portal, the Workflow component triggers messages to designated Commonwealth of Virginia (COV) employees for review and action. The action takes the form of approval or denial. In order for a COV employee to interact with the Workflow component, s/he would need to log in (authenticate by the COV Active Directory) to the COV infrastructure. Thereafter, s/he would be able to access the Workflow component in order to act on the Workflow trigger.


Figure 5: Security Flow

For researchers, authentication would occur through the SLDS Portal using the COV AUTH
directory. After an account request is approved, a researcher would be required to log in to their
account to make data requests from resources for which they have received approval.
Authorization defines user roles and the permissions associated with those roles.

For example, a researcher (role) would have access to view (permission) the Lexicon, while a
data administrator (role) would have access to view and modify (permission) the Lexicon. The
Workflow component is the hub for managing a user’s roles and associated permissions. The
SLDS components coordinate with the Workflow to manage requests for services correctly.
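
The role and permission example above can be sketched in a few lines of Python. This is an illustration only, not the SLDS design: the role names, the resource name "lexicon", and the function is_authorized are hypothetical, and the real mapping would be held by the Workflow component.

```python
from enum import Enum


class Permission(Enum):
    VIEW = "view"
    MODIFY = "modify"


# Hypothetical role-to-permission map for a single resource, the Lexicon.
ROLE_PERMISSIONS = {
    "researcher": {"lexicon": {Permission.VIEW}},
    "data_administrator": {"lexicon": {Permission.VIEW, Permission.MODIFY}},
}


def is_authorized(role, resource, permission):
    """Return True if the role holds the permission on the resource."""
    return permission in ROLE_PERMISSIONS.get(role, {}).get(resource, set())
```

Unknown roles and unknown resources fall through to an empty permission set, so the check denies by default.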

6.1.3 Workflow
The back office of the SLDS is the Workflow component. The SLDS
Workflow will be developed using the Microsoft Dynamics Customer
Relationship Management (CRM) package. CRM is a solution for
automating internal business processes by creating workflow rules that
describe routine tasks involving daily business operations. These processes
can be designed to make sure that appropriate and timely information is sent
to the correct people. To initiate workflows and in order to act upon the successful completion of
workflows the SLDS Workflow component will need to have interfaces to the other components
of the SLDS Portal.

The main function of the SLDS Workflow component is to manage and define a series of tasks
within an organization to produce a final outcome or outcomes. These workflows will allow the
partner agencies to work together to control access to their shared data and system. The
workflows will handle email alerts and notifications to both the partner agencies and to the
portal named users. Figure 6 provides a conceptual representation of the architecture
and interfaces of the SLDS Workflow Component.

Figure 6: Workflow Component

Interfaces
To initiate workflows and in order to act upon the successful completion of workflows the SLDS
Workflow component will need to have interfaces to the other components of the SLDS Portal.

Portal Interfaces
Named users will interact with the SLDS Portal in order to submit data to the Workflow
component to initiate the following workflow processes:
• User Access Request
• User Query Request
Once the workflows have been completed, the result of the process will be communicated back
to the named user through the Portal. The Workflow component will need to be able to push
back the following information to the SLDS Portal:
• Approval/Disapproval messages,
• Request for additional data,
• Portal user role permissions,
• And query result file location.

Shaker Interfaces
When a query request is approved, the Workflow component will interact with the Shaker in
order to submit the query for execution. The Shaker will notify the Workflow component of the
success or failure of the query execution and, in the event of success, the resultant file location.


Multiple Component Interfaces
The Workflow component will be used to coordinate the access authorization in multiple
components of the SLDS system. This will allow user access to be centrally administered but
distributed to the individual components based on purpose and need.

Workflows
The SLDS Workflow component manages and defines a series of tasks within an organization to
produce a final outcome or outcomes. It will allow the SLDS team to define different workflows
for different types of jobs or processes. At each stage in the workflow, one individual or group is
responsible for a specific task. Once the task is complete, the workflow software ensures that the
individuals responsible for the next task are notified and receive the data they need to execute
their stage of the process. It will also automate redundant tasks and ensure incomplete tasks are
followed up.

User Account Request
A workflow must exist to review a request for named researcher access to the Portal. The
workflow will result in one of the following outcomes:
• Approve,
• Disapprove,
• Or request more data (update account request).

A workflow must exist to review a request for named data owner access to the Portal. The
workflow will result in one of the following outcomes:
• Approve,
• Disapprove,
• Or request more data (update account request).

User Ad Hoc Query Request
A workflow must exist to review a query request for a named researcher. The workflow will
result in one of the following outcomes:
• Approve,
• Disapprove,
• Or request more data (update query request).

User Ad Hoc Query Result
A function must exist to handle the result of a query request for a named researcher. The function
must perform the following tasks:
• Receive back status,
• Receive file location,
• Communicate resultant file location to the named researcher through the portal,
• Communicate status to the named researcher through the portal,
• Communicate status to the named researcher through an alert or email,
• And communicate failed status to proper administrator.


All account and data requests are processed through and managed by the Workflow component.
Workflow monitors and triggers actions such as query submission and maintains status of
requests. Workflow is the source of information about roles and permissions for SLDS users.

When an account request is submitted, it is the Workflow component that manages the
message(s) and notifies designated COV employees about the request. Through the Workflow
component, employees can approve or deny the request. Workflow then notifies the submitter of
the account request of the final decision.

On a data request, Workflow monitors the request, confirms approval, and submits the query to
the Shaker for action. Designated COV employees are notified of the request to approve or deny
the query. If the request is denied, Workflow notifies the researcher of the denial status. If
approved, Workflow submits the request to the Shaker and continues to monitor status. Upon
completion of the transaction, Workflow notifies the researcher that the data set is available for
download.
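
The review outcomes listed for account and query requests (approve, disapprove, or request more data) behave like a small state machine. The following Python sketch is one way to model that behavior. It is not the Microsoft Dynamics CRM implementation: the state names, the Request class, and the move_to method are hypothetical and serve only to illustrate which transitions the text allows.

```python
from dataclasses import dataclass, field

# Allowed transitions for a reviewed request (hypothetical state names).
TRANSITIONS = {
    "submitted": {"approved", "disapproved", "more_data_requested"},
    "more_data_requested": {"submitted"},  # requester updates and resubmits
    "approved": set(),                     # terminal state
    "disapproved": set(),                  # terminal state
}


@dataclass
class Request:
    kind: str        # for example "account" or "query"
    requester: str
    state: str = "submitted"
    history: list = field(default_factory=list)

    def move_to(self, new_state):
        """Apply a reviewer or requester action, rejecting invalid transitions."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.history.append((self.state, new_state))
        self.state = new_state
```

Recording each transition in history gives the notification step (email, alert, or portal message) a single place to read the outcome from.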

6.1.4 Reporting
The SLDS Business Intelligence (BI) architecture will support two scenarios:

• Ad Hoc reports of record-level user data.

• Pre-defined (canned) reports of aggregated linked data.

Ad Hoc Record-Level Data BI Architecture
A visual representation of the Ad Hoc record-level BI architecture is presented
in Figure 7. This architecture consists of the following major components:
• A Lexicon and Shell Database that are built based on the source data and will support the
Logi Ad Hoc tool.
• Report Creation using the Logi Ad Hoc Business Intelligence platform. This consists of
the Logi Ad Hoc Query Building Tool and Ad Hoc Metadata.
• A Workflow engine that routes report submissions through an approval process.
• The Shaker that will service the query against record level data in the disparate source
systems.
• The record level Query Results that will be presented to the user.


Figure 7: Ad Hoc Record-Level BI Architecture

Lexicon and Shell Database
The Lexicon will contain information about the data objects that each of the source data systems
has made available to the SLDS. This information will be utilized to create a Shell Database. The
Shell Database serves two purposes: (1) it is an instantiation of the Lexicon and is needed by Logi
Ad Hoc in order to function; (2) it contains sample data that will be used by Logi Ad Hoc to
allow researchers to preview their query request. A process will be built that creates and
populates the Shell Database based on the information available in the Lexicon.
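
A minimal sketch of such a process is shown below, using Python and an in-memory SQLite database. Everything in it is an assumption made for illustration: the LEXICON structure, the table and field names, and the function build_shell_database are hypothetical, and the actual Shell Database would follow whatever schema Logi Ad Hoc requires.

```python
import random
import sqlite3

# Hypothetical Lexicon entries: table -> list of (field, SQL type, allowed values).
LEXICON = {
    "student": [
        ("grade_level", "INTEGER", range(1, 13)),
        ("gender", "TEXT", ["F", "M"]),
    ],
}


def build_shell_database(lexicon, rows_per_table=5, seed=0):
    """Create a database whose schema mirrors the lexicon and whose rows are
    fictitious samples drawn from the allowed values of each field."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    for table, fields in lexicon.items():
        columns = ", ".join(f"{name} {sql_type}" for name, sql_type, _ in fields)
        conn.execute(f"CREATE TABLE {table} ({columns})")
        placeholders = ", ".join("?" for _ in fields)
        for _ in range(rows_per_table):
            row = [rng.choice(list(values)) for _, _, values in fields]
            conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)
    conn.commit()
    return conn
```

Because the rows are generated from the allowed values rather than copied from a source system, the preview data never contains real records.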

Report Creation
The Report Creation process will be provided through Logi Ad Hoc. Logi Ad Hoc has a self-serve
user interface to allow a user to specify a report query. It has facilities for the user to save the
query and preview a sample of the report. When the user is finished with the report, they can run
the report. During this step, a custom process will intercept the query that has been submitted.
The query produced along with the parameters including columns selected and filters specified
will be sent to the Workflow component.

Workflow
The Workflow component routes the query through the appropriate steps to get acceptance by
specified agency reviewers.

Shaker
The Shaker interacts with the Lexicon and the source data systems to query and join together the
record level data from each of the source data systems. It deposits the joined data set in either a
file or a database table.


Query Results
As mentioned previously, the Shaker has the option to place the resulting query data in a
database table or file. If the results are sent to a file, the submitting researcher is notified that the
file is available for download from the Portal. If the results are placed in a
database table, reports that reference the data in these tables can be created dynamically in Logi Info.
These reports will provide the user with some limited capabilities for analysis like
filtering, sorting, and grouping of the data.

Aggregated Linked Data BI Architecture
A visual representation of the BI architecture for aggregated linked data is presented in Figure 8.
This architecture consists of the following major components:
• The Shaker to join record level data from the source data systems.
• A repository for the record level linked data.
• An ETL (extract, transform, load) tool and process to extract data from the record level
linked data store and load it into the aggregated linked data store.
• A repository for the aggregated linked data.
• The Logi Info Business Intelligence platform that will be used for serving up prebuilt
reports.
• The SLDS Portal where the Logi Info reports will be embedded.

Figure 8: Aggregate Linked Data BI Architecture

Shaker
The Shaker interacts with the Lexicon and the source data systems to query and join together the
record level data. It deposits the joined data set into a database table in the Record Level Linked
Data Store.


Record Level Linked Data Store/ETL Process/Aggregate Linked Data Store
An ETL process takes data from the Record Level Linked Data Store, aggregates it to the
appropriate level, and loads the aggregated data into the Aggregate Linked Data Store. Once the
data has been loaded, the table in the Record Level Linked Data Store will be purged.
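
The aggregate, load, and purge sequence can be sketched as follows, assuming for illustration that both stores are tables in a single SQLite database. The table names, columns, and the functions make_stores and aggregate_and_purge are hypothetical; the report does not specify the ETL tool or the schema.

```python
import sqlite3


def make_stores():
    """Create both stores as tables in one in-memory database (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE record_level_linked (link_key TEXT, school_year INTEGER, program TEXT)"
    )
    conn.execute(
        "CREATE TABLE aggregate_linked (school_year INTEGER, program TEXT, student_count INTEGER)"
    )
    return conn


def aggregate_and_purge(conn):
    """Aggregate record-level rows into counts per group, then purge the source table."""
    with conn:  # one transaction: the load and the purge succeed or fail together
        conn.execute(
            "INSERT INTO aggregate_linked (school_year, program, student_count) "
            "SELECT school_year, program, COUNT(*) FROM record_level_linked "
            "GROUP BY school_year, program"
        )
        conn.execute("DELETE FROM record_level_linked")
```

Running the load and the purge in one transaction means a failed load leaves the record-level rows in place for a retry.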

Prebuilt Reports
Prebuilt Reports will be created using Logi Info and will use data in the Aggregate Linked Data
Store. The Logi Info product allows for custom design and format reports that pull data from a
pre-specified data source. The reports produced can contain tables, charts, and maps or a
combination. Logi Info has some prebuilt analytical type reports that allow an end user to
perform some limited analysis of the data including sorting, filtering, and grouping the data.

SLDS Portal
The Prebuilt Reports will be made available in the SLDS Portal.

6.1.5 Lexicon
“The Lexicon is an inventory of every available data field in every available data source, the
structure of their storage, the possible values and meanings of the information stored, all possible
transformations of each set of field values to another set of field values, methods of data source
access, and matching algorithms and how they are to be used in conjunction with possible field
value transformations.”

The Lexicon (Figure 9) contains no data from any data source. It will be used to manage the Shell
Database for users to build queries against, and to provide the Shaker with appropriate
information to prepare an optimized query sequence for data requests. The Shell Database will
contain fictitious data.

A researcher, when building a query, interacts with a set of field names and relationships to
formulate a query. The user interface for the query building provides a simple view of the Lexicon
for easy query construction.

To maintain the accuracy and to manage extensibility of the Lexicon, the component processes
all data sources periodically at a predetermined time/interval searching for:
• Changes in data ranges,
• new data fields,
• and anything else that would disrupt the probabilistic matching or provide more ways to
“slice and dice” the data.

Anomalies found by the linking module will prompt an alert for an administrator to modify the
matching algorithm or add new query choices.
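
The periodic scan described above can be illustrated with a short Python sketch that compares the ranges recorded in the Lexicon with freshly observed ones. The data layout (field name mapped to a minimum and maximum value) and the function scan_for_changes are assumptions made for this example; the real scan would also cover structural changes and matching rules.

```python
def scan_for_changes(lexicon, observed):
    """Compare recorded field ranges with freshly observed ones.

    Both arguments map a field name to a (min_value, max_value) pair.
    Returns a list of alert messages for an administrator.
    """
    alerts = []
    for name, (low, high) in sorted(observed.items()):
        if name not in lexicon:
            alerts.append(f"new field: {name}")
            continue
        known_low, known_high = lexicon[name]
        if low < known_low or high > known_high:
            alerts.append(
                f"range changed: {name} {known_low}-{known_high} -> {low}-{high}"
            )
    return alerts
```

An empty result means the scan found nothing that would require a change to the matching algorithm or the query choices.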


Figure 9: A Logical Representation of the Lexicon and its Interactions

6.1.6 Shaker
The Shaker’s general function (Figure 10) is to accept an approved query and
return a dataset. The query will be broken down into a series of optimized
steps, or sub-queries, to retrieve de-identified data from the appropriate data
sources in the most efficient manner. In keeping with the intent of the
original request, query forms (e.g. inner join, left join, equijoin) and specified
final output parameters (e.g. counts of non-matching records by demographic
categories) will be taken into consideration. Information from the Lexicon
concerning data structure and relationships will be used to produce a dynamic sub-query plan for
data retrieval that minimizes processing time and workload on the target data sources.

For each query submitted to the Shaker, a random key is generated. Each sub-query in the data
retrieval plan will send this random key to the data source to be used in creating a secure
one-way hashed key for any applicable records. This list of hashed keys is then used by the Shaker to
combine records across multiple data sources, never transferring any identifiable information out
of the data source. Any hashed keys used to link records will be removed from the final data set
and replaced with yet another random key which cannot be traced back to any original data
sources. The resulting combined records are then uploaded to a file or database table for later
access by the user.
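
The linking steps above can be sketched in Python as follows. The report does not name a hash algorithm, so the use of HMAC-SHA256 here is an assumption, as are the function names hashed_key, source_extract, and link and the dictionary layout of the records. The sketch shows an inner join only.

```python
import hashlib
import hmac
import secrets


def hashed_key(query_key: bytes, identifier: str) -> str:
    """Run inside each data source: keyed one-way hash of a local identifier."""
    return hmac.new(query_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def source_extract(query_key, records):
    """Return de-identified rows: each identifier is replaced by its hashed key."""
    return {hashed_key(query_key, ident): attrs for ident, attrs in records.items()}


def link(extract_a, extract_b):
    """Run by the Shaker: inner-join two extracts on the hashed key, then
    swap each hashed key for a fresh random key before release."""
    linked = []
    for key in extract_a.keys() & extract_b.keys():
        row = {"record_key": secrets.token_hex(8)}  # not derived from the hash
        row.update(extract_a[key])
        row.update(extract_b[key])
        linked.append(row)
    return linked
```

Because the per-query key is random, hashed keys from one query cannot be matched against those from another, and the released record_key carries no information about the source identifier.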


Figure 10: A Logical Representation of the Shaker and its Interactions

6.1.7 Data
This SLDS Data Architecture (Figure 11) consists of the Source Data Systems.
The Shaker will submit queries to the target data systems and join the resulting
data sets. The data will optionally be written to a database that resides in the
SLDS environment. The SLDS environment will also contain other databases
needed by the SLDS Portal including a Metadata/Security database, Workflow
database, Lexicon, Shell database, and an Aggregate Linked Data database.

SLDS Databases
Several databases will reside in the SLDS environment. These databases will act as a repository for
data and metadata needed by various components of the SLDS Portal. Each database and its
Portal usage are described below.

Metadata and Security Database
The Metadata and Security database will consist of the Logi Ad Hoc metadata, logging data
repository, auditing data repository and any data that needs to be maintained to control security
for the portal.

Workflow Database
The Workflow database will contain data needed by the Workflow engine. This will include
data needed to track the steps and processes in the workflow. It will also include data required
for security to be maintained in the workflow.


Figure 11: Record Level Query BI Architecture

Lexicon
The Lexicon database will provide information about the data objects that have been exposed to
the SLDS Portal by each of the source data systems. Interaction with the data store will be
through the Lexicon user interface and administration portion of the SLDS Portal.

Shell Database
A Shell Database is needed in order for the Query Building Tool to function. The Shell Database will
be built from information contained in the Lexicon.

Shaker/De-identified Record Level Linked Database
The Shaker will optionally write the results of the joined data from the Source Data Systems to
the De-identified Record Level Linked Database.

Aggregate Linked Data
The Aggregate Linked Database will be utilized by the prebuilt reports. This database will be
populated through ETL processes that will aggregate data from the De-identified Record Level
Linked Database. Stored procedures will be used by these reports for data querying and
suppression.
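
The report does not define the suppression rule. One common form is small-cell suppression, in which counts below a threshold are masked before display. The Python sketch below illustrates that idea only. The threshold of 10, the None marker, and the function name suppress_small_cells are assumptions, and in the SLDS the equivalent logic would live in the stored procedures.

```python
def suppress_small_cells(rows, threshold=10, marker=None):
    """Replace counts below the threshold with a marker so that small
    groups are not disclosed in a report."""
    return [
        (group, count if count >= threshold else marker)
        for group, count in rows
    ]
```

A report built on this output would render the marker as a suppressed cell rather than as zero.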


7 Physical Infrastructure
The SLDS application will be developed, tested and deployed in three environments. The
Development environment will be hosted at a Florida datacenter operated by HPCHost.com. The
Test and Production environments will be hosted at the Commonwealth Enterprise Solutions
Center (CESC) physically located in Chesterfield, Virginia.

7.1 Development Environment
The Development environment (Figure 12) will be purchased as a monthly service. The
environment will consist of Virtual Machines (VM) operating on HPCHost managed VMWare
ESX infrastructure. This monthly service can be increased, decreased or eliminated as necessary.

Figure 12. Development Environment

A request has been submitted to purchase a “slice” of computing power which consists of one
4-Core processor, 16 GB RAM and 300 GB of RAID 6 SAN storage. VITA EAD will manage the
Development VMs using the following rules/guidelines:
• VITA EAD will build/configure each VM as requested by the developers.
• It is estimated that 6 – 12 VMs will be built for this development effort.
    o Portal Server
    o Report Server
    o Workflow Server
    o Shaker Server
    o Database Servers (2)
• All Microsoft OS based VMs will be joined to the VITA EAD operated EADDEV Domain
unless otherwise requested.

• VITA EAD will operate a PFSense Firewall that will protect the Development
environment.
• Developers will access the environment using a VPN or an SSH tunnel.
• Non-Developers on the COV network may be allowed web access to the development
portal and reports as required.
• Developers will be given full administrative control over their respective VMs.
• All Code will be checked into a Code Repository operated by VITA EAD.
• Development VMs will not be backed up unless specifically requested.

7.2 Test Environment
There will be no additional servers specifically ordered for the SLDS Test environment (Figure
13). VITA EAD has ordered new physical servers that will have the capacity to support SLDS
testing along with other applications. This methodology was adopted to reduce infrastructure
costs by sharing physical infrastructure. VITA EAD will be responsible for all applications
residing on these shared servers.

Figure 13. Test Environment

The test web and application servers will have two 12-Core processors, 128 GB RAM and 144 GB
of RAID 5 local disk space while the test database server will have two 4-Core processors, 32 GB
RAM and 273 GB of RAID 5 local disk space.

The SLDS application will leverage DOE and SCHEV test databases in their test environments.
The SLDS application will primarily use web services to connect and will use that same method
to connect to any future internal or external data source.


The Test environment web servers will be accessible from the Internet. VITA EAD system
administrators will access the backend SLDS servers via 2-factor VPN RDP. DOE and SCHEV
will control administrative access to/from their test databases. Developers will normally not have
access to the SLDS test servers. Code/changes from the SLDS Development environment will be
promoted to the SLDS Test environment through a structured promotion process.

7.3 Production Environment
An additional web and application server will be required for the SLDS Production environment
(Figure 14). SLDS will also share new VITA EAD production servers. VITA EAD has ordered new
physical servers that will have the capacity to support various SLDS components along with
other applications. This methodology was adopted to reduce infrastructure costs by sharing
physical infrastructure. VITA EAD will be responsible for all applications residing on the SLDS
and shared servers.

Figure 14. Production Environment

All SLDS and shared servers will have two 12-Core processors, 128 GB RAM and 144 GB of RAID
5 local disk space. The SLDS application will leverage DOE and SCHEV production databases in
their production environments. The SLDS application will primarily use web services to connect
and will use that same method to connect to any future internal or external data source.

The Production environment web servers will be accessible from the Internet. VITA EAD system
administrators will access the backend SLDS servers via 2-factor VPN RDP. DOE and SCHEV
will control administrative access to/from their production databases. Developers will not have
access to the SLDS production servers. All test and production servers will be joined to the COV
domain. Code/changes from the SLDS Test environment will be promoted to the SLDS
Production environment through a structured promotion process.


Appendix A: Secondary Architecture Best Practice Case Studies
A.1 Illinois State Board of Education
State/Agency: Illinois State Board of Education
Web Site: http://www.isbe.state.il.us/ILDS/htmls/project.htm
Address: 100 North First Street
Springfield, Illinois 36104
POC: Michael McKindles
POC Phone: 217-782-0329
POC Email: mmckindl@isbe.net

Case Profile

Student Enrollment: 2,119,707³⁵
Teachers: 135,704³⁶
LDS Grant: $11,869,819³⁷

Background
In July 2009, Illinois Governor Pat Quinn signed into law the P-20 Longitudinal Education Data
System Act³⁸. The act was a response to the 2009 LDS grant Illinois received from the U.S.
Department of Education. The grant funded the Illinois Longitudinal Data System (ILDS), which
is being built to establish the technical and management systems necessary for the Illinois State Board
of Education (ISBE) and its education partners to manage, link and analyze P-20 education data.

In December 2009, the ISBE created the Illinois Data System Advisory Committee. At its
inception, members included the Assistant Superintendent, the Division Administrator, and the
ILDS Project Manager.

System Design and Architecture
Prior to its receipt of the federal LDS grant, ISBE implemented a state Student Identification
System (SIS) and expanded its use. As of 2009, the ISBE SIS included five years of student
enrollment data and program information; updated student demographic information; and four
years of assessment results. The various data sources provide data on teacher demographic,
teacher certification, LEA and school program participation, LEA financial information, LEA
facilities, specialized student programs, LEA compliance and monitoring, and LEA child
nutrition services. As a result, ISBE had more than 100 disparate collection systems on a range of
technologies.

³⁵ State educational data profiles. (n.d.). Retrieved from
http://nces.ed.gov/programs/stateprofiles/sresult.asp?mode=short&s1=17
³⁶ Ibid.
³⁷ Statewide longitudinal data system grant program - grantee state - Illinois. (n.d.). Retrieved from
http://nces.ed.gov/programs/slds/state.asp?stateabbr=IL
³⁸ Illinois Public Act 96-0107. (n.d.). Retrieved from http://ilga.gov/legislation/publicacts/96/096-0107.htm

The ISBE data system currently cannot provide data that can be used effectively in education
decision making. Data currently collected by the agency is highly fragmented across various
systems and collection vehicles. This fragmentation covers multiple data systems that include
student level data, as well as a variety of systems that maintain data from other parts of the ISBE
education enterprise (e.g. staff data, LEA and school program participation and LEA financial
information).

In the future, the ISBE team expects its new system to capture and track longitudinal data on
students in Illinois schools, from pre-kindergarten to their employment outcomes. With a
longitudinal data system in place, ISBE also plans to improve its ability to support the Federal
Electronic Data Exchange Network (EDEN)/EDFacts. ISBE’s current system supports a series of
automated programs that pull data from various source systems to produce the aggregations and
calculations for EDEN/EDFacts. According to ISBE, this is a high maintenance process that can be
streamlined with the right data architecture and solution set in place.³⁹

Lessons Learned
In March 2010, the ISBE released a Request for Sealed Proposals (RFSP) to contract with a
vendor to develop an enterprise-wide data architecture. The ISBE plan will include data from 13
different systems that currently use a mixture of Access and SQL servers. These 13 systems range
from 150 to 3,000 data elements and use Web, LAN, and standalone applications.

Currently, the LDS system is in the design phase. The state recently hired Public Consulting Group
to design the data architecture. Their methodology to date has involved interviews with program
and technical resources and alignment with data model work they are performing for CCSSO.
The ISBE team anticipates the design process will take 6 months and framing the data
architecture will take another 6 months.

³⁹ Illinois State Board of Education. (March 2010). Request for Sealed Proposals (RFSP): Data Architecture Vendor
for the Illinois Longitudinal Data System (ILDS) Project.


A.2 North Dakota Department of Public Instruction
State/Agency: North Dakota Department of Public Instruction
Web Site: http://www.dpi.state.nd.us/
Address: 600 E. Boulevard Avenue, Dept 201
Bismarck, North Dakota 58505
POC: Tracy Korsmo
POC Phone: 701-328-4134
POC Email: tkorsmo@nd.gov

Case Profile

Student Enrollment: 94,728⁴⁰
Teachers: 8,181⁴¹
LDS Grant: $6,723,090⁴²

Background
Prior to 2007, North Dakota did not have a Longitudinal Data System; however, state leaders
realized the importance and benefits of linking data among the North Dakota Department of Public
Instruction (K-12 schools), the North Dakota Department of Commerce, Workforce Division and
the North Dakota State Board of Higher Education and pushed for an LDS project. To achieve
this goal, these leaders realized that foundational components were necessary and, thus, hired
Claraview to develop a state-wide LDS strategic roadmap. Shortly after the roadmap project
began, North Dakota applied for and received a Statewide Longitudinal Data Systems grant from
the U.S. Department of Education. Future funding for the NDLDS will be a combination of these
federal funds and state appropriations.

System Design and Architecture
Currently, North Dakota’s LDS is still in the design stage. The North Dakota LDS team plans to
build the K-12 data and Workforce Department warehouse, while the higher education data
warehouse will be expanded by the higher education community’s IT staff. The three separate
systems currently are not integrated, but will be in the future, possibly in 2011.

⁴⁰ National Center for Education Statistics. "State Profiles Home Page." National Center for Education Statistics. U.S. Department of
Education, Fall 2009. Web. 15 Dec. 2010. http://nces.ed.gov/programs/stateprofiles/sresult.asp?mode=short&s1=17&s2=38
⁴¹ Ibid.
⁴² National Center for Education Statistics. "Statewide Longitudinal Data Systems Grant Program - Grantee State – North Dakota."
National Center for Education Statistics (NCES). U.S. Department of Education, May 2010. Web. 15 Dec. 2011.
http://nces.ed.gov/programs/slds/state.asp?stateabbr=ND



During the planning stages, the NDLDS team inventoried their existing statewide data sources
and discovered that the North Dakota Department of Public Instruction and had their own data
warehouses. Therefore, rather than creating an entirely new system, it was a natural match to
build out the K-12 warehouse to an LDS system. This approach saved the team time and
resources.

North Dakota’s K-12, workforce (from the Human Resources Department), and higher education
data are each expected to have their own separate warehouse. Since K-12 and workforce data are already
being stored in a current warehouse, North Dakota is working on expanding and matching these
warehouses to a longitudinal data system that is based on the K-12 data warehouse. The SLDS,
which is being built as an extension of the K-12 warehouse, is to consume data from all the
different warehouses. The architecture has not been finalized; issues such as security and whether it
will be separated from the other warehouses are still being researched and deliberated.
Additionally, how and to what extent the data will be shared beyond internal researchers has
not yet been decided. North Dakota may possibly use a separate warehouse and portal for
aggregated data to be available for non-state-agency users.

According to North Dakota, its Higher Education agency is the most disjointed part of the LDS
because the agency is building its own warehouse and has its own staff. The linkages between the
agencies have not yet been started; however, the state is now developing a process to align
student data. The attributes being considered for the linkages are: name, date of birth, gender,
graduating high school, and social security number.
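The attribute-based linkage under consideration can be illustrated with a minimal sketch. It assumes a simple deterministic rule that North Dakota did not describe: two records match on social security number when the K-12 record carries one, and otherwise on the normalized combination of name, date of birth, gender, and graduating high school. The record layout, the field names, the `match_key` and `link_records` helpers, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonRecord:
    """Hypothetical record layout using the attributes listed above."""
    source_id: str
    name: str
    date_of_birth: str  # ISO format, e.g. "1992-05-17"
    gender: str
    high_school: str
    ssn: Optional[str] = None


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not block a match."""
    return " ".join(text.lower().split())


def match_key(rec: PersonRecord) -> tuple:
    """Fallback key built from the non-SSN attributes."""
    return (normalize(rec.name), rec.date_of_birth,
            normalize(rec.gender), normalize(rec.high_school))


def link_records(k12, higher_ed):
    """Return (K-12 source_id, higher-ed source_id) pairs judged to be the same person."""
    by_ssn = {r.ssn: r for r in higher_ed if r.ssn}
    by_key = {match_key(r): r for r in higher_ed}  # assumes fallback keys are unique
    links = []
    for rec in k12:
        # Try the SSN first; fall back to the attribute key if it is absent or unmatched.
        match = by_ssn.get(rec.ssn) if rec.ssn else None
        if match is None:
            match = by_key.get(match_key(rec))
        if match is not None:
            links.append((rec.source_id, match.source_id))
    return links


# Fictitious sample data for illustration only.
k12 = [
    PersonRecord("K1", "Ann  Lee", "1992-05-17", "F", "Bismarck High", ssn="111-11-1111"),
    PersonRecord("K2", "Bo Ray", "1991-01-02", "M", "Fargo North"),
    PersonRecord("K3", "Cy Dunn", "1990-09-09", "M", "Minot High"),
]
higher_ed = [
    PersonRecord("H7", "Ann Lee-Smith", "1992-05-17", "F", "Bismarck High", ssn="111-11-1111"),
    PersonRecord("H8", "bo ray", "1991-01-02", "m", "fargo north"),
]
links = link_records(k12, higher_ed)
```

In this sketch the first pair links on SSN despite a changed surname, the second links on the fallback attributes, and the third K-12 record stays unlinked. A production process would likely need probabilistic matching and manual review for ambiguous cases, which this sketch does not attempt.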

North Dakota has data sharing contracts between: K-12 and Higher Ed; K-12 and Unemployment
Insurance; and Higher Ed and Unemployment Insurance. Another issue the state is working to
overcome is governance. Currently, part of the statewide LDS is governed by the state's own
privacy and sharing rules. Subcommittees are reviewing the state's privacy and sharing laws and
trying to decide whether current legislation is conducive to a properly functioning and
streamlined process for an LDS.

Currently, North Dakota does not have an LDS design yet. The state is in the process of finalizing
the K-12 project plan. By the end of December, North Dakota hopes to have a project plan in place
for its LDS. North Dakota plans to utilize canned reports to share with the public; it does not
plan on having ad-hoc reporting capabilities in the near future. In terms of funding, the state is
currently looking for ways to augment its current funding source. North Dakota is hoping to be
able to secure funding to keep its LDS running once it is built. North Dakota's immediate next
steps are to work on a proof of concept, implement its K-12 warehouse, and work on an "entity
resolution accountability plan".

Lessons Learned
Because the North Dakota LDS is in the early stages of planning and designing, Ms. Korsmo was
unable to provide many recommendations for best practices. However, data governance has
emerged as an important issue. Although Ms. Korsmo did not share specifics related to data
governance issues, she indicated that the team is facilitating negotiations among the different
agencies and that current legislation is being reviewed. Ms. Korsmo concluded with a
recommendation: before planning the design of a system, determine whether there are
existing centralized statewide agency systems upon which to build an LDS. The North Dakota
team is optimistic about its plans to expand the K-12 data warehouse and expects this leveraging
of the legacy system to reduce project time and costs.


A.3 Washington Education Research and Data Center
State/Agency: State of Washington Education Research and Data Center
Web Site: http://www.erdc.wa.gov/
Address: 210 11th Avenue SW, Room 318
P.O. Box 43113
Olympia, Washington 98504
POC: Dr. Michael Gass
POC Phone: 360-902-0599
POC Email: Michael.Gass@OFM.WA.GOV

Case Profile

Student Enrollment: 1,037,018
Teachers: 54,428
LDS Grant: $17,341,871

Background
The Washington Education Research and Data Center, an agency under Washington's Office of
Financial Management, manages research and education data for the state and is leading the
state's LDS project. The Center also manages four-year higher education enrollment data systems.
The SLDS grant Washington received was based on an inter-agency proposal to support federal
projects that included a P-20 data warehouse. Since the beginning of the project in July 2010,
Washington has established an executive sponsorship steering committee that currently is
working through basic governance issues, defining scope and the project management framework,
etc.

System Design and Architecture
The SLDS project is focused mostly on an internal ability to mine the data. The proposal
submitted to NCES specified the LDS type and what different agencies would be linked. The
SLDS team envisions a P-20 data warehouse that would receive data from the K-12 warehouse
that currently is being built. Additionally, the LDS will "inherit the model" K-12 builds, though
designs on the integration model are not yet in place. Washington's LDS also will integrate data
from its stakeholders:
• Office of the Superintendent of Public Instruction (OSPI) – K-12 data
• Department of Early Learning
• Higher Education Coordinating Board
• State Board of Education
• State Board for Technical and Community Colleges
• Council of Presidents – a voluntary association of the presidents of Washington State's
  six public baccalaureate degree granting institutions
• Employment Security Department – provides employment and employer data
• Professional Educator Standards Board
• Independent Colleges of Washington
• Workforce Training and Education Coordinating Board

OSPI has contracted out its K-12 data system, which may influence the design of the larger LDS.
OSPI contracted with a vendor to customize its existing longitudinal system. The system will be
built over the next year.

The Washington State team also is investigating the initiative with the CCSSO (Council of Chief
State School Officers) model, which is a P-20 core modeling project to be designed by Public
Consulting Group (PCG) and funded by the Gates Foundation. The project involves building a
data dictionary and longitudinal data model. Several states currently are participating in this
initiative and may use the resulting system. The Washington SLDS team is considering using this
model if the state builds internally and does not purchase an off-the-shelf product.

The Washington LDS team has yet to determine its LDS' technical architecture. However, the
state will be using an SQL server, as it is already using one in current systems. The greatest
issue is whether or not its LDS will be outsourced, because the state will have less control of its
system if it is outsourced. Additionally, Washington is considering a complete Microsoft BI
package such as SharePoint.

Data Usage and Reporting
The vision for the system was a normalized data set subscription service, at least through an
authenticated site, which would provide a number of reports. Currently, the team anticipates
that matches among agencies will be primarily through social security numbers (SSN), although
the state will use a centralized matching system that relies on more than social security numbers.
High school and post-secondary institutions will be matched through social security numbers,
but K-12 data will be matched differently. Reports will be provided for both named and
anonymous users.

Lessons Learned
Washington is currently at a political impasse. The various agencies have not been able to agree
on an IT governance model, although Dr. Gass declined to provide specifics. It is an issue the
team hopes to overcome within the next few months, especially since there is still much to be
done and funding from the federal government will end in July 2013. The LDS team plans to
complete the system design in one year, followed by two years of active development.


Appendix B: Best Practice Case Studies Interviewee List
B.1 Indiana Department of Education
Contact:
Molly Chamberlin
Director, Data Analysis Collection and Reporting
Indiana Department of Education

Email/Telephone/Address:
mchamber@doe.in.gov
317-234-6849
151 West Ohio Street
Indianapolis, Indiana 46204

Links:
• http://www.doe.in.gov/data/ - Indiana Department of Education Data Web site

B.2 Iowa Department of Education
Contact:
Jay Pennington
Bureau Chief
Iowa Department of Education

Email/Telephone/Address:
Jay.pennington@iowa.gov
515-281-4837
400 E. 14th Street
Des Moines, Iowa 50319

Links:
• http://www.iowa.gov/educate/index.php?option=com_content&task=view&id=1691&Itemid=2490 – EdInsight Web site

B.3 Data Strategies – Army Suicide Mitigation Project
Contact:
Susan Carter
Managing Partner
Data Strategies
Kevin Corbett
Managing Partner
Data Strategies

Email/Telephone/Address:
SCarter@DataStrategiesInc.com
814-965-0003
KCorbett@DataStrategiesInc.com
P.O. Box 772
Midlothian, Virginia 23113

B.4 Texas Education Agency
Contact:
Brian Rawson
Director of Statewide Data Initiatives
Texas Education Agency

Nina Taylor
Director of Information Analysis
Texas Education Agency

Email/Telephone/Address:
brian.rawson@tea.state.tx.us
513-936-2383
Nina.taylor@tea.state.tx.us
512-475-2085
1701 North Congress Avenue
Austin, Texas 78701

Links:
• http://www.tea.state.tx.us/ - Texas Education Agency Web site
• http://www.texaseducationinfo.org/tpeir/TPEIR_Documentation.pdf - Texas Public
  Education Information Resource

B.5 Data Strategies – DLA Data Convergence and Quality Project
Contact:
Susan Carter
Managing Partner
Data Strategies

Email/Telephone/Address:
SCarter@DataStrategiesInc.com
814-965-0003
P.O. Box 772
Midlothian, Virginia 23113

B.6 NORC Data Enclave
Contact:
Timothy Mulcahy
Senior Research Scientist
NORC at the University of Chicago

Email/Telephone/Address:
Mulcahy-Tim@norc.org
301-634-9330
University of Chicago
4350 East West Highway
Bethesda, Maryland 20814

Links:
• http://www.norc.org/DataEnclave - NORC Data Enclave Web site

B.7 Illinois State Board of Education
Contact:
Michael McKindles
ILDS Project Manager
Illinois State Board of Education

Email/Telephone/Address:
mmckindl@isbe.net
217-782-0329
100 N. 1st Street
Springfield, Illinois 62777

Links:
• http://www.isbe.state.il.us/ILDS/htmls/project.htm - Illinois Longitudinal Data System
  Project Web site

B.8 North Dakota Department of Public Instruction
Contact:
Tracy Korsmo
Business Intelligence Program Manager
North Dakota Department of Public Instruction

Email/Telephone/Address:
tkorsmo@nd.gov
701-328-4134
600 E. Boulevard Ave., Dept. 112
Bismarck, North Dakota 58505

Links:
• http://www.dpi.state.nd.us/ - North Dakota Department of Public Instruction
• http://www.nd.gov/itd/planning/initiatives/roadmap.pdf - State of North Dakota
  Longitudinal Data System Strategic Roadmap


B.9 State of Washington Education Research & Data Center
Contact:
Michael Gass
State of Washington Education Research & Data Center

Email/Telephone/Address:
Michael.Gass@ofm.wa.gov
360-902-0599
210 11th Avenue SW, Room 318, P.O. Box 43113
Olympia, Washington 98504

Links:
• http://www.erdc.wa.gov/default.asp - Education Research & Data Center website
• http://nces.ed.gov/programs/slds/pdf/washingtonabstract2009ARRA.pdf - Project
  Abstract from Department of Education website.


Appendix C: Materials Sent to Best Practices Interviewees
C.1 Best Practices Interview Template
General Project Overview
1. What were the objectives of the project?
2. Do you have a project abstract/overview that you are able to share?
3. Who were the stakeholders?
4. How long did the project take?
5. How much did the project cost?

Database Design/Architecture
6. What steps were involved in designing the database/warehouse?
7. What architecture existed within your organization prior to the design and implementation of
   the system?
8. Describe the general architecture of your system and its different components. Is there a model
   that you use (federated, non-federated)?
9. Do you have a visual representation of the system that you are able to share?
10. What products are used for the underlying data management (e.g. DBMS)?

Data
11. How much data flows through the system (e.g. number of records)?
12. Were there disparate data sources?
13. Was sensitive (PII) data contained within the source database?

Security
14. What were the security requirements of this system?
15. Does your system de-identify personal data? If so, please describe your data de-identification
    process.
16. Does the system contain an authentication process? Is it single or dual factor?

Users
17. Can you describe the different users of the system?
18. Were separate processes (databases) used for anonymous users and named users?
19. What level of help desk support was provided?

Implementation
20. Can you give me a picture of what was involved in the implementation process?
21. What was the level of effort? How many man-years did the implementation take from design to
    implementation?
22. What were the barriers/challenges of implementing this program?
23. What ongoing efforts and resources are needed after a system is up and running?
24. Knowing what you know now, how would you approach the problem and implementation
    differently?

Performance & Feedback
25. How is performance of the system measured?
26. How is performance affected through variations within the system?
27. Were any compromises made to the system design in order to achieve acceptable levels of
    performance?
28. Describe the results of the system and its ongoing use.


Other
29. What additional best practices or lessons learned can you share?
30. Are there additional resources (e.g., people, documents, links, etc.) that would be helpful?
31. What are the next steps?


C.2 Architectural Best Practice, Design & Planning Support Project Overview
The LDS will be a logical data warehouse that is fed by numerous agencies from around the
Commonwealth. The data provided by each agency is a subset of the agency's primary data repository
(database). It is possible that more than one data source from an agency may be required. The linkage
between the agencies is focused on a student as s/he progresses through the school system into the
workforce. LDS-level information will be de-identified so that no individual can be uniquely identified
from the data. Even though the linkage between the agencies is student-centric, the data managed in
the LDS describes an unidentifiable person. It is imperative that the design of the LDS protect the
individual and not fall subject to re-identification.43

External query interfaces into the LDS will be made available to the public, partner agencies, and
researchers. Level of access to the LDS data is managed through an authentication and authorization
scheme. Access levels may range from a selection of pre-canned queries to ad hoc requests.
uests.

The objective will be to develop a high-level architecture and development strategy which can then be
used to determine sourcing requirements and staffing levels, and which will become the input to a
detailed design and staffing phase that will follow rapidly from this project. General design questions
regarding the project include:
• What components of the solution need to be developed (and what needs to be procured)?
• What are the key development tasks?
• What actual processing and response speeds are required and what is acceptable?
• What will be the best way to de-identify the data? What is the best way to link de-identified data?

CIT will perform research and analysis of other data warehouse designs, integration efforts, and their
implementations from similar government and corporate organizations. Through a series of research
activities, interviews, and surveys, CIT hopes to develop best practices and lessons learned from such
programs. Participation in CIT's research will help the Virginia Department of Education in deciding
what plan of action to execute.

43 Ochoa et al., Reidentification of Individuals in Chicago's Homicide Database: A Technical and Legal Study, tech.
report, Massachusetts Inst. of Technology, 2001. Retrieved from:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.15.7467&rep=rep1&type=pdf


Appendix D: Materials Sent to Subject Matter Experts
Prior to an interview, each Subject Matter Expert was provided a package of pre-reading material
to help familiarize him/her with the SLDS program. The package consisted of a sample set of
questions that may be asked during the interview and a white paper on the grant award, the
conceptual architecture, constraints placed on the implementation, and sample use cases for the
system.

D.1 Subject Matter Expert – Interview Template
Data Modeling
1. What do you consider the key steps to a successful integrated data model when
   incorporating numerous data sources?
2. Can you describe the level of complexity for a data warehouse data model versus a
   federated data model?
3. What is a common problem overlooked in data modeling when dealing with multiple
   data sources?

Security (General)
4. In the area of data governance with numerous data owners, are there techniques that
   make data access controls easy to manage and implement across data sources?
5. Have you been involved with a data breach and associated legal issues that occurred? Do
   you have any recommendations on how to best avoid a data breach?
6. Are there enhanced security models that are used with database management systems
   that add an additional layer of protection (e.g. additional firewall in front of DBMS
   server)?
7. When implementing an in-memory database (IMDB), are there security/protection
   techniques that differ from a standard disk-based data warehouse/federated?

Security (De-Identification)
8. Can you describe different techniques/algorithms used for data de-identification?
9. Are you familiar with any known issues in de-identification algorithms that could cause
   data elements to be re-identified?
10. What issues exist in attempting to update de-identified data from its original source?
    Are there good techniques for achieving this process?
11. Can the de-identification process add substantial processing time to a database
    transaction/query? If so, are there techniques to help minimize the processing overhead?

Architecture Trade-offs
12. Given the described conceptual design, what different implementation models might best
    work and why?
13. For any of the given models, are there known constraints/limitations on the
    implementation (e.g. one model works best for small data sets)?
14. For a federated model, what can impact the performance for data access?
15. If the database is designed for online analytical processing, does that have an influence on
    the type of implementation model?
16. Are there best practices for linking data records across a federated data model (when a
    common unique ID is not present)?
17. Can you describe different techniques/algorithms used for linking data across a federated
    model?
18. Are in-memory databases trending up or down and why?
19. What lessons learned are there with using an in-memory database?
20. Are there advantages or disadvantages to using an in-memory database to replace a
    standard data warehouse?

Query Processing
21. Are there preferred query strategies for optimization when the system is implemented as
    a singular data warehouse versus federated?
22. In a federated model, is the query optimization better handled in a centralized, distributed,
    or hybrid implementation?
23. What impacts from the network topology can affect the query processing
    implementation?


D.2 Virginia Statewide Longitudinal Data System - Executive Summary
Collaborative Partnership
The Statewide Longitudinal Data System (SLDS) is a collaborative effort by the Virginia
Department of Education (VDOE), Virginia Employment Commission (VEC), State Council of
Higher Education for Virginia (SCHEV), Virginia Community College System (VCCS) and
Virginia Information Technologies Agency (VITA).

The Commonwealth of Virginia's Department of Education (VDOE) successfully secured a
multi-year federal grant for the design, development and operation of a Statewide Longitudinal
Data System (SLDS) to integrate student and workforce data in the Commonwealth. Specifically,
the SLDS will integrate K-12, higher education and workforce data into a single logical database
which can be used for research and analysis. Elements of the SLDS project will focus on
transactional data (e.g., transcripts and student records), which will reduce the cost burden for a
number of education stakeholders, including students, parents, administrators, school
counselors, registrars, and college admissions officers. Other elements will focus on the
integration and delivery of de-identified data via a web portal, and the management and
governance of the Commonwealth's education and workforce data.

In order to establish this comprehensive, longitudinal data system, the SLDS will be developed in
phases, with the initial phase creating a federated longitudinal data linking and reporting system
linking data among state agency data sources, including K-12, higher education, and workforce
systems. This phase will create a rubric to document data element definitions, data requirements,
and technical requirements for de-identified data sets that can be linked among agencies; build a
central linking directory based on data sharing agreements in place or established as part of the
grant project; and establish a query process for authorized user access that uses the linking
directory to anonymously join individual-level records from multiple data sources.

Stakeholders
The SLDS will serve a variety of stakeholders, to include legislators, policy makers, teachers,
school administrators, education program directors, researchers (both inside and outside the
state), parents, localities, citizens and the media. Appendix A provides a simplistic view of
potential uses by stakeholders.

Future Benefits
When complete, the SLDS will provide researchers, analysts, educators, parents, students, policy
makers, and program administrators with the following business benefits:

• Establishing kindergarten to college and career data systems that track progress and
  foster continuous improvement;
• Enhancing the Commonwealth's ability to examine student progress and outcomes over
  time by linking individual-level student data from K-12 education, postsecondary
  education and the workforce system;
• Enabling the exchange of data among agencies and institutions within the State and
  between States to inform policy and practice;
• Linking student data with teachers primarily responsible for providing instruction;
• Enabling the matching of teachers with information about their certification and teacher
  preparation programs;
• Enabling data to be easily generated for continuous improvement and decision-making;
• Ensuring the quality and integrity of data contained in the system;
• Enhancing the Commonwealth's ability to meet reporting requirements of the U.S.
  Department of Education.

Technology Challenge
Virginia stands out as a special case study in the difficulties of combining data from multiple
agencies. In addition to the standard layers of complexity, Virginia-specific privacy laws and a
historical system of "locally administered, state supervised" public services create additional
challenges. This complex network of technological, regulatory, and structural impediments to
the integration of individual-level data makes a traditional approach (consolidation of data in
a physical central 'warehouse') untenable. To successfully combine Virginia's set of
heterogeneous data sources, Virginia proposed a federated data systems approach.

Federated Data Systems
Virginia's federated data system will interact with multiple data sources on the back-end and
present itself as a single data source on the front-end. The key to successfully linking the different
data sources is a central linking apparatus. Generally, this is a database management system that
has been set up with access privileges to each data source and that houses a 'linking-table'
populated with the unique identifiers that will be used to 'join' the tables together into one large
data set.
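A minimal sketch of the linking-table idea follows. It is an illustration only: for brevity it simulates two agency sources as tables inside one in-memory SQLite database, whereas the federated design described here would reach separate agency systems. The table names, column names, and sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-ins for two agency data sources (hypothetical schemas).
    CREATE TABLE k12 (k12_id TEXT PRIMARY KEY, assessment_score INTEGER);
    CREATE TABLE higher_ed (he_id TEXT PRIMARY KEY, credential TEXT);
    -- The linking table holds only identifiers, one row per matched person.
    CREATE TABLE linking (k12_id TEXT, he_id TEXT);

    INSERT INTO k12 VALUES ('K1', 410), ('K2', 385), ('K3', 450);
    INSERT INTO higher_ed VALUES ('H7', 'AAS'), ('H8', 'Certificate');
    INSERT INTO linking VALUES ('K1', 'H7'), ('K2', 'H8');
""")

# Join both sources through the linking table; the output carries no identifiers.
rows = conn.execute("""
    SELECT k.assessment_score, h.credential
    FROM linking AS l
    JOIN k12 AS k ON k.k12_id = l.k12_id
    JOIN higher_ed AS h ON h.he_id = l.he_id
    ORDER BY k.assessment_score
""").fetchall()
conn.close()
```

Only people present in the linking table appear in the joined result, so the third K-12 record, which has no higher education match, is left out.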

Using a federated system to merge data across agencies
The special requirements imposed on the establishment of a federated data system between
public agencies dictate that individual privacy be maintained. For example,
    While it may be acceptable for any system user to retrieve information showing that a certain
    individual participated in a certain program (e.g., William Smith went to Virginia Tech), it would
    not be acceptable if that same user could link that person to a particular detail specific to that
    program (e.g. William Smith received a grade of "F" in Calculus). However, for the purposes of
    longitudinal research, it is that latter information that is needed (e.g., we don't care about
    William Smith, but we do want to know how many males failed Calculus in a particular year).
Therefore, what is needed is a system that will permit the linking of data relevant to longitudinal
research that does not allow personal identification of any of the individuals in the data set. The
proposed solution has two distinct processes, one for establishing and maintaining an
anonymous 'linking directory,' and one that uses the linking directory to join data sources and
return a 'de-identified' data set to the user through a data query process.
Virginia has proposed a cross-agency data linking and reporting system that can be used in a
manner that maintains the confidentiality of individual student/teacher/employee data, can be
used for accountability and analytic purposes, and meets the requirements of State and federal
privacy laws. In Virginia, state law (§ 2.2-3800 - § 2.2-3816) currently prohibits state agencies from
sharing personal information across state agencies except under specific circumstances.
In order to meet SLDS program requirements and the needs of the collaborating state agencies,
Virginia proposed a methodology that would permit multiple state agencies to merge
de-identified individual-level data using a federated data system model. The methodology,
developed in conjunction with Virginia's Office of the Attorney General to link data between
K-12 and higher education, will permit the two education systems and the many agencies that
house workforce education and training programs to link unit-level records through the use of
de-identified data sets.

An Example
Suppose a request is received for a report on college students who entered Virginia's community
college system for the first time in 2006, and the variables of interest are participation and
outcomes on statewide assessments in high school; credential information as of August 31, 2009;
and employment outcomes since the student left high school.
Using data element definitions in the system, the user will define the cohort and variables to
include in the dataset. Then, the system will identify students for study in the central directory,
and, using the hashed identifier in that directory, join data from participating agency systems.
Finally, joining tools will replace the identifier in the central directory with another unique hash
value for each individual in the dataset, and deliver the final de-identified data in the format
specified by the user.
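The join-then-rehash flow in this example can be sketched as follows. This is a minimal illustration under stated assumptions, not the methodology Virginia developed: the white paper does not name a hash algorithm, so keyed HMAC-SHA256 is assumed here, and the `directory_key`, `agency_extract`, and `build_dataset` names, the identifiers, and the sample values are all hypothetical.

```python
import hashlib
import hmac
import secrets


def keyed_hash(key: bytes, value: str) -> str:
    """Return the HMAC-SHA256 of value under key as a hex string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret used to derive the hashed identifiers in the central directory.
directory_key = secrets.token_bytes(32)


def agency_extract(rows):
    """Key an agency's rows by the directory hash instead of the raw identifier."""
    return {keyed_hash(directory_key, raw_id): data for raw_id, data in rows}


k12 = agency_extract([("S-001", {"assessment": 410}), ("S-002", {"assessment": 385})])
vccs = agency_extract([("S-001", {"credential": "AAS"}), ("S-003", {"credential": "Cert"})])


def build_dataset(cohort_hashes, sources):
    """Join sources on the directory hash, then swap in a per-request hash."""
    request_key = secrets.token_bytes(32)  # fresh for every delivered dataset
    dataset = []
    for directory_hash in cohort_hashes:
        # The delivered id is a new hash, so it cannot be matched to the directory.
        row = {"id": keyed_hash(request_key, directory_hash)}
        for source in sources:
            row.update(source.get(directory_hash, {}))
        dataset.append(row)
    return dataset


cohort = [keyed_hash(directory_key, "S-001")]
dataset = build_dataset(cohort, [k12, vccs])
```

Because a new request key is drawn for each delivery, two datasets built for the same cohort carry different identifiers, which limits the ability to combine separately delivered datasets on the identifier column.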

Summary
The objective of the SLDS is to propel Virginia's data collection, reporting, and analytic
capabilities far beyond current capacities by merging K-12, higher education and workforce data.
By merging de-identified data in a federated system, Virginia will maintain compliance with state
and federal privacy laws, while meeting critical data reporting requirements and
policy-development needs.


D.3 Virginia Statewide Longitudinal Data System - Usage
Actors:
Virginia's SLDS will be utilized by a variety of stakeholders, to include legislators, policy makers,
teachers, school administrators, education program directors, researchers (both inside and
outside the state), parents, students, localities, citizens and the media.

Potential Use:
The following statements represent potential use of the system by actors:
• Actors will use a web-based portal to access publicly available data from the K-12, postsecondary education, and workforce agencies.
• Actors will access/create reports that will be available in a variety of formats depending on the user's preferences. Tables, charts, and graphs will be presented to provide different views of the data. Maps will also be used to provide a geographic perspective. GIS data layers, including county and city boundaries, roads, schools, school districts, and related information such as census counts and income levels will be integrated with the geographic reports to provide contextual information for the data and for further analysis.
• Actors will develop custom reports by combining data from multiple publicly available datasets, and be able to request the data in multiple formats.
• Actors will develop custom reports by identifying the cohort and the independent and dependent variables, and by selecting an output method (table, chart, graph, Excel, CSV, etc.).
• Actors will view [create] a report identifying the number and percentage of teachers and principals rated at each performance rating or level.
• Actors will create [view] a report showing growth data for current and previous year students and estimates of teacher impact on student achievement.
• Actors will view a report of high school graduates who enroll in state institutions of higher education and complete at least one year's worth of college credit within two years.
• The proposed information system will also include reporting capabilities available to teachers and other authorized school division personnel to provide estimates of student growth and teacher impact on student performance on state assessments in reading and mathematics.
• Actors will create reports that link students to course enrollment, course grades, and the teachers providing instruction in each course.
• Actors will be able to view pre-developed and publicly available reports created by VDOE, SCHEV and VEC.
• Actors will use the longitudinal data system to complete portions of reports required by lawmakers, such as a 2007 study of high school dropout and graduation rates.
• Actors will receive information about students who have failed state-wide assessments for two or more years in a row.
• Actors will use the SCHEV Student Data Warehouse to create standard and ad hoc reports on postsecondary education.
• Actors will view standard reports that are publicly available on VDOE's website, including summary data for required and commonly requested information such as numbers of students enrolled, graduated, dropped out, and participating in special educational instructional services.
• Actors will develop additional reports using data to be collected for the student-teacher information system, combined with information already collected at the student level, to provide comparison of end-of-course grades with performance on state assessments, and additional information on students not tested by grade and subject.
• Actors will conduct analyses of specific content standards that, when met, describe the type of work that students must achieve to be ready for postsecondary education.
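
One of the uses listed above, identifying students who have failed state-wide assessments for two or more years in a row, can be sketched as follows. This is an illustration only: the record layout `(student_id, year, passed)` and the function name are hypothetical, and the sketch assumes that a year counts as failed if any assessment record for that year is a failure and that a year with no failing record breaks the streak.

```python
from collections import defaultdict


def failed_consecutive_years(records, min_years=2):
    """Return the ids of students with at least min_years failed years in a row.

    records is an iterable of (student_id, year, passed) tuples, where
    student_id is assumed to be a de-identified value.
    """
    # Collect the set of years in which each student has a failing record.
    failed_years = defaultdict(set)
    for student_id, year, passed in records:
        if not passed:
            failed_years[student_id].add(year)

    flagged = set()
    for student_id, years in failed_years.items():
        streak = 0
        previous = None
        for year in sorted(years):
            # Extend the streak only when this failed year directly follows
            # the previous failed year; otherwise start a new streak.
            if previous is not None and year == previous + 1:
                streak += 1
            else:
                streak = 1
            previous = year
            if streak >= min_years:
                flagged.add(student_id)
                break
    return flagged


# Hypothetical sample data: "a" fails 2008 and 2009; "b" fails 2008 and 2010
# with a pass in between; "c" fails once.
records = [
    ("a", 2008, False), ("a", 2009, False),
    ("b", 2008, False), ("b", 2009, True), ("b", 2010, False),
    ("c", 2009, False),
]
flagged = failed_consecutive_years(records)
```

With the sample data, only student "a" is flagged, since "b" has a passing year between two failures and "c" has a single failing year.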
