Can Social Network Analysis Be Effective at Improving The Intelligence Community While Ensuring Civil Rights?

Information Security Journal: A Global Perspective, 21:115–126, 2012

Copyright © Taylor & Francis Group, LLC


ISSN: 1939-3555 print / 1939-3547 online
DOI: 10.1080/19393555.2011.647251

Kenneth Earl Gumm Jr.
K4 CSR Transportation Security Administration Contact Center, Somerset, Kentucky, USA

ABSTRACT International terrorism represents a complexity of threats that requires all the resources of the U.S. government to detect, deter, and prevent terrorist acts. Social network analysis (SNA) will be defined and specific concepts available for analytical application will be introduced and evaluated for efficacy. SNA will be examined as a tool to improve the efficiency and capabilities of the intelligence analyst tasked with the prevention and early detection of terrorist actions while preserving constitutionally guaranteed civil rights of those it is meant to protect. Limitations of SNA will be introduced and measures to mitigate those limitations will be discussed.

The immediate research has revealed that the Defense Advanced Research Projects Agency’s (DARPA) initial research and development efforts in SNA did not include the necessary requirements for privacy protection as called for in the Intelligence Reform and Terrorism Prevention Act of 2004 (IRTPA). The immediate effects of 9/11 led many within the intelligence community to adopt any and all available technology that was seen as worthwhile and offered the ability to address as many of the deficiencies revealed by the 9/11 Commission as possible. The history of many of the successful programs within DARPA has been attributed to a culture of noninterference by political as well as defense officials with an attitude of building effectiveness without regard to oversight or diligence to existing laws or mandates. The fact that DARPA’s charter has not been changed since its inception has contributed to the failure to adopt already existing legislation and mandated privacy protection. This paper will develop the practicality for SNA with privacy rights built into the architecture so that the analysts’ output is enhanced while civil rights are preserved.

KEYWORDS social network analysis, DARPA, terrorism, intelligence community

Address correspondence to Kenneth Earl Gumm Jr., PO Box 3836, W. Somerset, KY 42564. E-mail: kenny.gumm@gmail.com

1. INTRODUCTION
Since the attacks of 9/11, the public, press, Congress, and the world have questioned how the terrorists’ attacks of that day could happen to the world’s
most powerful nation. The investigations into the events of 9/11 revealed that substate actors with motivations based on a radical interpretation of the Koran were the perpetrators. Further examinations into this new paradigm of international terrorism demonstrated the limitations of an intelligence community (IC) designed to counter the Cold War symmetric threat as compared to the current asymmetric threat posed by international terrorism and al Qaeda specifically. Zegart (2005) indicates that between the fall of the Soviet Union and the attacks of 9/11, 12 different significant and empirical studies were performed by public and private sector entities revealing the limitations of an intelligence paradigm designed for a post-WWII era. The resulting recommendations from these studies called for more than 300 changes, yet only a handful were ever successfully implemented. One of the common themes throughout these investigations was the need for a paradigm shift in the ways and means by which the U.S. Government (USG) wields its power, because the existing system was designed for a bipolar world consisting of largely symmetric threats, whereas the emergence of international terrorism represented largely asymmetric threats. Despite several attacks prior to 9/11 from international terrorists and calls for specific change from political, military, and intelligence community leaders, there was not a concerted effort to overhaul the intelligence community until the attacks of 9/11 motivated the executive and legislative branches to take action in the form of the 9/11 Commission on Terrorism. As a result of the 9/11 Commission’s report to Congress, the Intelligence Reform and Terrorism Prevention Act of 2004 (IRTPA) was passed and, for the first time since the Goldwater-Nichols Department of Defense Reorganization Act of 1986, a major revision of the IC has resulted.

The immediate research will examine the application of social network analysis (SNA) and its ability to increase the effectiveness of analysts’ resources while protecting the civil rights of American citizens. The paper will present the fundamentals of SNA, an examination of the state-of-the-art in SNA, an examination of the limitations and privacy concerns that have been raised to date, and the algorithms and frameworks available to address these limitations and concerns. The next section will discuss the ultimate goal of development of architecture for SNA that simultaneously preserves the individual’s civil rights while enhancing the capabilities of the analyst.

This paper will then turn to the exposed gap in current research on the topic; DARPA has not had a review or change in its charter since its inception and this may very well have exacerbated the public perception of privacy concerns with the use of SNA and its application to international terrorism. The research that has been performed concerning DARPA examines the internal operations that have led to a multitude of successes in mostly “wicked problems” of primary application to the military. This gap indicates the need for future research that can identify for the Department of Defense (DOD) and Congress methods of adjusting the policy and guidelines that DARPA operates within that enhance capabilities and operations within DARPA. This research should be of interest to policymakers that are concerned with the current perception of SNA and to students in the field of terrorism research.

The ultimate goal of any model intent on predicting terrorism is the determination of when and where an extremist group will attack by incorporating as many of the factors identified by the research as possible to facilitate an accurate assessment. The identified causes of terrorism from throughout the sciences demonstrate the need to develop a predictive system that can incorporate a multitude of the factors identified from subject matter experts (SME). SNA offers a flexible capability to the analyst that allows multiple models to run concurrently or simultaneously and to be custom designed to the specific agency’s area of responsibility. The analyst can incorporate individual indicators for a more focused model to concentrate on tactical subjects, or broader categories of indicators that represent multiple factors can be incorporated to function in a strategic capacity. This capability demonstrates the need to have any forecasting model or architecture flexible enough to adapt to changing evidence from empirical research and the need to incorporate statistical analysis to expose any sensitivities to data bias. These capabilities will enhance the analysts’ creative analysis capabilities and provide for a more efficient use of limited resources by allowing the analysts to focus their time and efforts as directed by their agencies’ mandates.

2. FUNDAMENTALS AND CAPABILITIES
SNA was originally proposed by Jacob Moreno in 1934 as a method that used links (connections or ties) and nodes (individuals or groups) to discover information about the individuals in a network and the actual network itself. By definition a “social network is a compilation of entities that are connected through social ties in which a variety of resources are exchanged” (Hulst, 2009, p. 106). A synthesis of the definitions provided by DeRosa, Seifert, and Koschade results in a definition for social network analysis as the use of algorithms based on empirical research from across the sciences to discover previously unknown patterns and relationships that improve the allocation of limited resources (DeRosa 2004, v., Seifert 2008 I, Koschade 2006, 2). The methods of SNA include statistical models, mathematical algorithms, neural networks, and decision trees used to evaluate quantitative data, text, or graphical data from multimedia forms. The power from SNA comes from the flexibility of parameters used to evaluate large quantities of data including patterns in association (one event is connected to another), sequence or path analysis (one event leads to another), classifications (identification of new patterns), clustering (finding and visually documenting groups of previously unknown facts), and forecasting (discovering patterns that can lead to reasonable predictions of future activities) (Seifert, 2008, p. 1). If the above parameters represent the power of SNA then the effectiveness of SNA is represented by its creativity. The algorithms used to drive SNA reveal the patterns in data not perceivable by human analysis because of the vast amount of data surrounding the evidence of interest.

Among the many capabilities within SNA, of interest to predicting terrorism, two fundamental categories emerge: predictive (classification, regression, time series, prediction) and descriptive (clustering, summarization, association, sequence discovery) models that require defining before progressing into the more detailed analysis and implementation algorithms in the Algorithms and Frameworks section. Classification maps the target data to predefined groups or classes provided by SMEs. Regression uses the numerical dataset from a known event to develop a mathematical formula that fits the data; this formula is then fed the data from an unknown event, and the result is a prediction. Time-series analysis uses distance measures to determine similarity between different time series to generate predictions for future events based on known past events. Clustering is similar to classification except that the groups are not predefined; the data defines the group or subgroup (clique) of interest. Summarization presents the data in condensed form. Association finds all frequent data points and generates rules from these patterns. Sequence discovery finds sequences of patterns in the data that help to define the trend (Deshpande, 2010, p. 35).

SNA has developed several specific measures that can be leveraged by an analyst to reveal information about the operation of the network that is not readily apparent to the outside observer. One of the key concepts in SNA is centrality, which indicates how many direct connections the actors (nodes) have with other nodes (Kutcher, 2008, p. 1). Nodes with a high centrality exert more influence over less central nodes and should be considered as target-worthy assets since they control information flow. A centralized network may have one node or a few nodes with high centrality ratings as compared to subordinate nodes that simply direct information or resources to the more central node. Centralized networks are susceptible to disruption because of their lack of redundancy; elimination of the one central node could disrupt the entire centralized network. Decentralized networks, on the other hand, do not have one central hub, but have several hubs with each node indirectly tied to all other nodes. This redundancy makes the network more robust, but because of the duplication of effort it increases the overhead of the organization and, more importantly, increases the likelihood of discovery by government entities. Centrality is measured by closeness, betweenness, degree, and eigenvectors that provide different perspectives of the social relationships that exist within the network.

Closeness defines how many links it would take for a particular node to connect to all other nodes in the network and it provides a measure of the distance between actors. There are several measures of closeness that provide additional insight about the actors or nodes: closeness-in (how close an actor is based on the number of paths that are inbound), closeness-out (how close an actor is based on the number of paths that are outbound), direct closeness (two actors with a direct
connection), and indirect closeness (information can only pass from one node to another node by passing through at least one additional node) (Hanneman, 2005, p. 11).

Betweenness measures the number of paths that pass through each node. Nodes with a high degree of betweenness may indicate the presence of a gatekeeper (controls the flow of information between different parts of the network or between different clusters in the network) (Kutcher, 2008, p. 2). Although nodes with low degrees of betweenness may indicate that they have little control over the flow of information, if they happen to be the only link between two different clusters, their elimination could be devastating to the cluster that would become isolated.

SNA measures the connections a node has by the label of degrees. The node with the greatest number of connections (degrees) is labeled the hub. Further measures of degree are in-degree (quantifies the number of incoming links) and out-degree (quantifies the number of outgoing links) (Hanneman, 2005, p. 4). Nodes with high in-degree represent actors with final approval status or consultants, and nodes with high out-degree may represent actors that are newcomers or tasked with delivering information to other nodes.

Eigenvectors measure how well connected an actor is and how much direct influence that actor has over the activities of that network. This measure is determined by evaluating the centrality scores of the actors that this actor is directly connected to (Hanneman, 2005, p. 14). There are two further measures of eigenvectors: hubs (scores for outbound links) and authorities (scores for inbound links). A high scoring hub has several outbound links to high scoring authorities, and a high scoring authority has several inbound links from high scoring hubs.

Two final concepts in SNA that should be discussed before continuing are link direction (used on charts and graphs to demonstrate which direction information or logistics are flowing) and link weighting (indicates the relative strength of relationships). Both of these concepts lend themselves to visualization techniques that use two- or three-dimensional graphs to allow the analyst to identify relationships previously unknown. The newly identified actors can become targets for future investigations that could lead to still more previously unknown targets. These discoveries allow the detection of an organization’s activities that should enhance the predictive capabilities for that network’s future operations.

3. LIMITATIONS
While these concepts point to the capabilities of SNA, there are also limitations to this system as with any system. SNA is data dependent, and because of this the analysis is dependent upon accurate and timely data that have not been corrupted deliberately or inadvertently. The initial logistical problems of procuring, cleaning, and preparing the data for analysis and the time period for analysts to learn the system are considerable, but after the initial enterprise of establishing the system SNA actually helps to alleviate one of the common problems facing today’s analysts: data deluge. The diversity of data sources both in format and location presents several challenges to effective SNA that can be addressed by specific algorithms and procedural architecture to be presented in the Algorithms and Frameworks section.

The ultimate limitation comes in the form of identifying a causal relationship between patterns revealed in the data and any of a number of variables of interest that could be responsible. Therein lies one of the more significant problems, that is, the rarity of terrorist attacks tends to limit the availability of indicators to build predictive models. The mundane activities of preattack logistics that include communication among actors, fundraising, technology acquisition, trial runs, resource acquisitions, and target reconnoitering represent some of the activity that can be modeled. As attacks are thwarted, attempted, and successfully carried out and past terrorist events are cataloged, future scenario and model building will be enhanced.

Another frequently discussed limitation of data mining is the problem of false positives. Markle (2003) identifies accurate identity as one of the fundamental reasons that false positives represent a serious concern for civil liberties. A system of redress should be included in any efforts that involve personal identities, and the architecture of any SNA system should include an independent mechanism (private or public) to certify accuracy. Classifiers as discussed below, even with a high accuracy rate, are going to produce some level of false positives, but with the use of multiple classifiers to search through huge databases and incorporate multistage searches, each subsequent search is initiated by the previous success, thereby reducing errors with each pass.
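
The claim that each pass of a multistage search reduces false positives can be made concrete with some base-rate arithmetic. The following is a minimal sketch, not a description of any fielded system: the population size, the number of true targets, the stage names, and the per-stage rates are hypothetical values chosen for illustration, and the stages are assumed to err independently of one another, which classifiers drawing on the same data often do not.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One classifier pass; both rates are hypothetical illustration values."""
    name: str
    sensitivity: float          # P(flagged | true target)
    false_positive_rate: float  # P(flagged | non-target)


def run_stages(true_targets, non_targets, stages):
    """Pass the surviving records through each stage in turn.

    Assumes each stage errs independently of the others. Returns one
    (name, true_positives, false_positives, precision) tuple per stage,
    where the counts are expected values rather than whole records.
    """
    rows = []
    for stage in stages:
        # Only records flagged by the previous stage reach this one.
        true_targets = true_targets * stage.sensitivity
        non_targets = non_targets * stage.false_positive_rate
        flagged = true_targets + non_targets
        precision = true_targets / flagged if flagged else 0.0
        rows.append((stage.name, true_targets, non_targets, precision))
    return rows


if __name__ == "__main__":
    # Hypothetical scenario: 100 true targets hidden among 1,000,000 records.
    stages = [
        Stage("pattern match", 0.90, 0.01),
        Stage("link analysis", 0.90, 0.01),
        Stage("analyst review", 0.90, 0.01),
    ]
    for name, tp, fp, precision in run_stages(100, 999_900, stages):
        print(f"{name:15s} true={tp:8.1f} false={fp:10.1f} precision={precision:.3f}")
```

Under these assumed rates the expected number of false positives falls from roughly 10,000 after the first pass to about 100 after the second, and the share of flagged records that are true targets rises accordingly. The same arithmetic shows the cost: true targets are also lost at every stage, so false negatives accumulate as passes are added.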

Both the accuracy and the privacy are enhanced in such architecture. Senator (2005) demonstrates that methods exist to limit false positives and false negatives by the use of risk adjusted populations and link analyses, and to address privacy concerns the data could be randomized or anonymized to protect the identity of anyone falsely identified in the early stages.

Several examples of SNA exist that apply multiple classifiers in architecture that makes multiple passes through the data, with each subsequent pass feeding the next, thereby reducing the potential of false positives and reducing the size requirements of the database. One example of this technique is DARPA’s Evidence Extraction and Link Discovery (EELD) program that has developed algorithms that extract probabilistic models from relational data. The multipass process offers two improvements over a single pass: each pass is used to feed subsequent passes, and the ability to change the types and quantities of data with each subsequent pass also works to focus the analyst’s efforts on the data that are more related to revealed areas of concern.

4. PRACTICAL APPLICATIONS
Psychological and social constructs of trust, social norms, cohesion, and cooperation are important to the operation of the group and can act as indicators of reciprocity, influence, and leadership that are susceptible to analysis using SNA (Hulst, 2009, p. 109). DeRosa (2004) would enhance this process by calling for Subject-based Link Analysis (starts with a known data point) to find links with other entities of unknown interest. Pattern-based Analysis takes a known pattern of behavior and searches for similarities in the dataset (DeRosa, p. 8). While the behavior may be known, the pattern-based query does not require a known subject but rather retains its usefulness by identifying extremely rare activities associated with terrorist planning and attacks (DeRosa, p. 8). Skillicorn (2004) suggests that for terrorist groups such as al Qaeda, which operate much like “venture capitalist for terror” that receive proposals for attacks and then decide which proposals to support, even the most trivial contact between actors could be viewed as “significant” and worthy of SNA efforts (Skillicorn, p. 2). The ability to quantify the relationships inherent in the social networks tends to reduce the subjectivity in analysis and limits the risk of missed signals inherent in covert operations and data acquired during typical SNA procedures. When the analysis provided from SNA is coordinated with the characteristics of the actors, their relationships, and the types of resources or activities the group under investigation has demonstrated in the past, a powerful tool emerges for forecasting future behaviors (Hulst, p. 109). The ability of SNA to identify patterns, roles and functions that are not otherwise readily apparent indicates the reasoning for policymakers’ increased interest since 9/11 and provides justification for inclusion of SNA in typical data mining operations.

The experts on SNA have identified several fundamental issues that are prerequisites for successful data mining: a clear formulation of the problem, access to relevant data that have been properly coded, and mechanisms to compensate for type I and type II errors. The immediate research would seem to indicate that “relevant data” could be less than obvious since the relevancy is often not immediately apparent before the analysis has begun. Factors and indicators of terrorist activities are obviously part of the equation, yet many of these factors and indicators are mundane activities that individually could be interpreted initially as legal activities that could be relevant but might not present themselves at the initial stages of the investigation. It is only after SNA reveals the totality of these preevent activities occurring among disparate but connected entities that a clearer picture develops of the terrorist’s intent. Therefore, it would seem appropriate to begin the analysis with a broad perspective and narrow the focus as the algorithms begin to develop the scenario. DeRosa (2004) would add that typical data mining procedures actually “prioritize attention and provide clues about where to focus” and this concept speaks to one of the fundamental problems exposed by Heuer (1999) that what the analyst needs is more “truly useful information” (Heuer, p. 6). SNA helps to limit many of the perception problems that exist within the subconscious mind by providing the additional information that Heuer describes as being required to “recognize an unexpected phenomenon” (Heuer, p. 8). The patterns and relationships revealed by SNA help to reduce ethnocentric bias and are not subject to lack of cultural, religious or political knowledge of the scenario under investigation.

Skillicorn (2004) improves traditional SNA by including additional information (demographics) about the individual actors in the social network and
“higher-order” information about the relationships in a technique of “Matrix Decomposition” (Skillicorn, p. 1). Three decompositions are used: singular value decomposition (spectral graph partitioning), which detects the most anomalous nodes by correlation of the data; semi-discrete decomposition, which partitions the data into subsets with similar attributes; and independent component analysis, which allows graphs of cliques (Skillicorn, p. 5). The power here is in the ability to reveal that one actor may be a member of more than one clique simultaneously, a capability not available in traditional SNA methods. These actors represent high value targets worthy of additional resource allocation; their isolation could limit potential actions of disparate and previously unknown cells.

The practical application of SNA to past and current events has been successfully demonstrated by Krebs, when he uses centrality, degrees, closeness, and betweenness to identify the cellular nature of the 9/11 attackers and their hierarchy (Krebs, 2002, p. 7). Duval, Christenson, and Spahiu (2010) examine the practical application of SNA to current data to apprehend Saddam Hussein (links to family members, Sunni tribal loyalists, and former protégés) and the discovery of the “Virginia Jihad” (link analysis on the network of relationships) and propose the use of “Bootstrapping” or resampling to add the dimension of statistical analysis to enhance the ability to quantify relationships (Duval et al., p. 2). The ability to provide a measure of statistical validity in assessing the roles of actors in a terrorist organization could be instrumental in assuring that the most lethal of cell members are receiving the attention needed in a world of limited resources.

Koschade (2006) provides a framework for “real time analysis” of terrorists’ networks by employing SNA measures of density, degree of connection, degree centrality, closeness centrality, betweenness centrality, and the use of clusters on the Jemaah Islamiyah cell responsible for the 2002 Bali nightclub bombing.

Smith, Damphouse, and Paxton (2006) evaluated 67 U.S. cases of terrorism, including domestic and international terrorist groups. Information from 200 incidents was used to create a relational database of 265 variables, including geospatial data (terrorists’ residences, planning locations, preparatory activities and target locations), general temporal data (average terrorist group existed for 1,205 days, planning averaged 2–3 months and there was a lull of 3–4 weeks prior to the incident), and spatial patterns of activity (terrorists typically live less than 30 miles from target, preparations and planning also within this distance) (Smith et al., p. 2). Four major categories of activities were revealed: recruitment, preliminary organization and planning, preparatory conduct, and terrorist acts (Smith et al., p. 3). All sources were open sources provided by subject matter experts and included many incidents of illegal activity, acquisition of bomb-making materials, and several other activities that would be considered anomalous to normal civil activities of law-abiding individuals. The immediate research indicates that this list should be combined with a list from classified sources to create a relational database available to analysts when operating any of a number of different SNA platforms currently available.

5. ALGORITHMS AND FRAMEWORKS
Memon and Larsen (2006) propose a framework of algorithms, web spiders, and a knowledgebase to reveal hidden hierarchies in nonhierarchical networks to provide a predictive capability to SNA. The framework introduces three novel concepts within iMiner (prototype software application) to achieve these predictive capabilities. Position Role Index identifies key actors and followers in the network as defined by their efficiency; Dependence Centrality defines how much a node is dependent on another node; Degree Centrality and Eigenvector Centrality are combined with Measure Dependence Centrality to estimate the hierarchical structure (Memon & Larsen, p. 2). Algorithms are developed for each of these concepts and are applied in several case studies: Krebs’s dataset of 9/11 hijackers, Sageman’s dataset of the global Salafi Jihad, the Bali nightclub bombing, the dirty bomb plot of 2002, the World Trade Center bombing of 1993, and the 9/11 plot.

Koltko-Rivera (2004) proposes a computer based system, Fuzzy Signal Expert system for the Detection of Terrorism preparations (FUSEDOT), that analyzes anomalous data such as interpersonal relationships, financial relationships, travel patterns, purchasing patterns, patterns of Internet usage, and personal background. FUSEDOT aims to identify those individuals who are terrorists by four simple steps: sort through massive amounts of information about massive numbers of individuals, connect disparate pieces of information about these massive numbers of people, connect individuals to other individuals who are in their
social circles, and process this data to identify terrorist activity (Koltko-Rivera, p. 4). The FUSEDOT system is centered on the proposition of detecting “fuzzy signals” (sometimes the signal is representative of terrorist activity, but often it is not) (Koltko-Rivera). Risk scores are assigned to individuals based on the presence of these “fuzzy signals”; the higher score would reflect more “fuzzy signals.” Bayesian algorithms are used for data processing, and knowledge discovery algorithms are used to create a feedback loop from SMEs. The use of Bayesian algorithms is widespread and the concept of developing hierarchies of the terrorist networks provides another level of comprehension to define the operations of the cells and their networks, but the immediate research could not find any examples of the actual application of this concept since its introduction in 2004.

6. CIVIL LIBERTIES AND PRIVACY CONCERNS
The Homeland Security Act of 2002 requires the Department of Homeland Security to “establish and utilize data mining and other advanced analytical tools to access, receive and analyze data to detect and identify threats of terrorism against the United States” (Cate, 2008, p. 439). The legal requirements are obviously present to provide for the lawful use of data mining and SNA, but the introduction of additional guidelines and policy enhancements to improve the operating parameters of such a vague task would seem prudent given the current status of programs across the IC.

Cate (2008) finds the lack of governmental regulation on the use of data mining to be a contributing factor to the current debate about privacy concerns. He contends that the lack of regulation also serves to exacerbate the problems inherent in data mining by allowing the use of “outdated, inaccurate, and inappropriate data” and further indicates that the lack of clear policy on the issue denies the “guidance as to what is and is not acceptable conduct” (Cate, p. 437). These concerns assume that a government agency would deliberately use “outdated, inaccurate, and inappropriate data” just because they had no legal reason not to. This assertion presumes that no government agency would have the concept of the parameters necessary to operate a data mining system simply because it had not been legislated. The evidence from DARPA’s programs to date would seem to indicate otherwise, but the need to establish legal guidelines in promoting civil rights should be based on the need to gain public acceptance and follow the intent of the Constitution.

The subjective-legal argument identified by Taipale is based on concerns that any use of data mining or SNA violates the rights granted by the Fourth Amendment. The Supreme Court has ruled in United States v. Cortez and United States v. Sokolow that “subjective and objective expectations of privacy should reasonably apply to the data being analyzed or observed in relation to the government’s need for that data in a particular context” not based on any technique (Hearing, 2007, p. 12). Taipale would address the privacy concerns and limitations with three concepts applied to the architecture design and implementation stages:

1. Rule-based processing and distributed database architecture that limit the scope of inquiry and processing of data within policy guidelines.
2. Multi-stage classification architectures and iterative analytic processes with selective revelation and access control at each stage before additional data collection, access or disclosure.
3. Strong credential and audit features and diversified authorization and oversight.

These three steps immediately demonstrate some complications for successful operation of a computer-based information discovery system. Policy guidelines are always subject to interpretation by the individual tasked with the responsibility to analyze the data, and any need to request clarification or guidance is going to slow the process. Subjecting each stage to a process of reauthorization will further slow and encumber the process. The final step could and should be incorporated into the system such that operators are required to sign in with password authorization, and the system should have credentialing procedures that limit the ability to venture beyond areas of known exploration. An audit trail feature should be included for each stage of the architecture, with analysts fully aware that their actions will be revealed automatically to supervisors.

One of DARPA’s programs involving SNA is Terrorism Information Awareness, which uses categories of transactional data to detect terrorist activities including communications, finances, education, travel, medical, country entry, place or event entry,
transportation, housing, critical resources, and government records. “Red Teams” would
create scenarios of terrorist attacks and determine the kinds of planning and
preparations that would be necessary and then target those activities with the
pattern-based data mining programs. When William Safire’s New York Times story “You are
a Suspect” broke in November 2002, it was the beginning of an avalanche of criticisms
resulting in the Senate amendment to the Omnibus Appropriations Act that prohibited the
deployment of Terrorism Information Awareness in connection with data about U.S.
persons without specific congressional authorization (Safire, 2002). Eight months later
Congress terminated the funding for Terrorism Information Awareness with the exception
of “processing, analysis, and collaboration tools for counterterrorism foreign
intelligence,” which was specified in a classified annex (Safire, 2002). In essence, the
programs were defunded for public consumption, but in practice they simply moved into
the realm of classified programs where development and oversight concerns are no longer
open to public debate. This is not oversight by Congress; it is an attempt to placate
media criticism by removing the SNA programs from public scrutiny.

The Supreme Court has not interpreted the Fourth Amendment’s guarantees as extending to
data held under a third party’s control. In United States v. Miller, the Court ruled
that there was no “reasonable expectation of privacy” in information held by a third
party; in this case Miller contended that checks held by the bank were subject to
Fourth Amendment protection, but the Court ruled against him. Cate indicates that the
Court has “repeatedly” ruled that the Fourth Amendment does not prohibit the Government
from obtaining information from a third party even when the information was stipulated
for a limited purpose (Cate, 2008, p. 452). This third-party holding of personal
information corresponds directly to the current scenario of data warehouses that the
government is accessing with data mining and SNA programs. There simply is no
constitutional protection for personally identifiable information housed by a third
party. The major difference today is that with the power of computers and search
algorithms, data that were once lost in a massive data warehouse are now accessible
for viewing.

It remains to be seen if the Supreme Court’s comprehension of the issue has evolved to
the point to affect future rulings. Rather than wait for new rulings from the Supreme
Court, Congress should seize the moment and demonstrate its constitutional
responsibility by writing effective legislation that guides the development and
implementation of SNA programs.

The immediate research has identified three forms of legislation that pertain to data
mining and the issue of privacy: the Homeland Security Act of 2002, Section 222; the
Privacy Act of 1974; and the E-Government Act of 2002. The Homeland Security Act
requires the chief privacy officer to assure that “the use of technologies sustains,
and do not erode, privacy protections” as it pertains to personal information (U.S.
Department of Homeland Security, 2007, p. 10). The Privacy Act of 1974 establishes the
requirement to publish a notice, the System of Records Notice, when personally
identifiable information is maintained in a system. The System of Records Notice is
published in the Federal Register, and it identifies the purpose of the system, the
categories of individuals, and the routine uses of the data. The System of Records
Notice acts as a public notice and provides a mechanism of redress for individuals. The
E-Government Act requires all agencies to conduct Privacy Impact Assessments for all
systems that collect, maintain, or disseminate personally identifiable information.
Privacy Impact Assessments are not required for national security systems or classified
systems.

7. DARPA

DARPA was initially established as a response to the Russians’ reaching space first
with Sputnik. DARPA’s mission was to ensure that the United States maintained a lead in
military capabilities and to prevent “technological surprise from her adversaries” (Van
Atta, 2007, p. 21). The DARPA model is primarily tasked with the responsibility to
develop complex advanced technologies and systems that transcend traditional
incremental improvements by promoting “entrepreneurial performers” from university and
industry (DARPA, 2009, p. 1). DARPA was created from within the DOD, and it operates as
an independent entity with autonomy in its operational parameters. Program managers are
co-located with similar experts and are encouraged to “challenge” the status quo and
establish metrics based on actual results. The DARPA paradigm of “high risk-high
payoff” is fueled by the characteristics of small size (no laboratories or
facilities), lean nonbureaucratic structure (limited management hierarchy),
a focus on change-state technologies, and a highly flexible and adaptive research
program (Van Atta, 2007, p. 20).

    It was explicitly chartered to be different, so it could do fundamentally different
    things than had been done by the military service R&D organizations. (Van Atta,
    2007, p. 20)

DARPA’s key characteristics can be identified from an examination of the past 50 years:
the ability to reinvent itself rather than become tied to previous projects,
independence from service research and development organizations, a lean and agile
organization with a risk-taking culture, concentration on program managers (the program
manager conceives and owns the program), and a business model that is idea driven and
results oriented (Van Atta, 2007, p. 23). The program managers seek out concepts that
are not only unproven and cutting edge but also offer potential solutions to military
problems. After gaining the approval of the office director and DARPA director, the
program manager will locate defense contractors, private companies, and universities
with the desire and capabilities to deliver success on a project.

DARPA’s 2009 Strategic Plan identifies its only charter as “radical innovation,”
something which serves as a “specialized technological engine” that is “forward
looking, relevant and responsive to new opportunities” and will serve to transform the
DOD (DARPA, 2009, p. 3). DARPA’s management has devised nine “strategic thrusts” to
address the existing and emerging threats confronting the nation: robust, secure,
self-forming networks; detection, precision identification, tracking, and destruction
of elusive targets; urban area operations; advanced manned and unmanned systems;
detection, characterization and assessment of underground structures; space; increasing
the “tooth to tail” ratio; bio-revolution; and core technologies. The business model to
address these “thrusts” is a simple process of bringing in expert entrepreneurial
program managers, empowering them, protecting them from red tape, and quickly making
decisions about starting, continuing, or stopping research projects (DARPA, 2009,
p. 14). This business model indicates where part of DARPA’s success comes from and
where difficulties with data mining and SNA may have originated. While eliminating “red
tape” allows program managers to concentrate on projects identified by military and
policy leaders as key to national security, it also eliminates any need to conform to
existing policies. Many of the programs initiated by DARPA in the post-9/11 environment
were developed with goals of enhancing IC analysts’ capabilities with little or no
regard for existing legal limitations. Even if existing laws were not specifically
designed to address data mining as a tool in the fight against international terrorism,
the intent to provide some level of control on privacy of personally identifiable
information on U.S. citizens was apparent but not applied. The zeal to protect and the
initial naive assumption that the enemy consisted of only foreign individuals, not U.S.
citizens, speak to the need to have protections built into the architecture of any
future systems. The decision to build systems and then later retrofit those systems
with legal constraints could result in reducing the effectiveness of the original
design; at the very least, it would require redesigning the systems. The additional
costs resulting from such a redesign could be measured in real dollars for duplicative
work, and the costs of potential attacks missed could include loss of life and
substantial monetary costs.

What Fuchs (2010), Van Atta (2007), DARPA (2009), Block (2007), and others have
explored are the inner operations of DARPA. More specifically, the characteristics that
have made DARPA the successful paradigm that it is (flexibility, innovation, synergy,
agility, and focus) all describe the mechanisms at work. What is missing from previous
evaluations is the examination of oversight or guidance provided by the DOD or
Congress.

The development of predictive models using data mining concepts at DARPA actually
predates 9/11, with the Evidence Extraction and Link Detection (EELD) program. After
9/11 EELD would become part of DARPA’s Information Awareness Office, and the lessons
learned would fuel future research into the Total Information Awareness (TIA) program.
Among the early lessons learned were:

• It is more efficient and less error prone to start with known subjects of interest.
• Combining common low-level activity patterns (e.g., illegal immigration, operating
  front businesses, money transfers, use of drop boxes and hotel addresses for
  commercial enterprises and individuals with multiple identities) improves detection
  of high-level patterns (e.g., training, logistics and reconnaissance activities).
• Terrorists’ preparatory and planning activities can be identified to help predict
  terrorist plots (DeRosa, 2008, p. 12).

The use of data mining software demonstrates the usefulness of these methods (TIA,
CAPPSII, Secure Flight, MATRIX, and HTF among others) and exposes the potential for
invasion of privacy. In direct response to 9/11 and the belief that there may be
“sleeper cells” in the United States, the Information Awareness Office (IAO) was
created at DARPA. The concept was to coordinate the various information technology (IT)
programs then operating within DARPA under one Technical Office Director. IAO’s mission
statement was to “counter asymmetric threats by achieving total information awareness
useful for preemption, national security warning and national security decision making”
(Seifert, 2008, p. 6). Out of this mission statement came the Total Information
Awareness Program (TIA), tasked with three specific areas of research: language
translation (automated rapid language translation), data search (with pattern
recognition and privacy protection), and advanced collaborative and decision support
tools (to facilitate search and coordination activities across disparate agencies)
(Seifert, 2008).

Some of the most successful programs were defunded because of pressures brought to bear
on Congress after media exposed the possibility of civil rights violations. Funding for
TIA was prohibited by the 2004 DOD Appropriations Act (P.L. 108-87), but there was a
caveat: Section 8131 allowed “unspecified sub-components of the TIA to be funded as
part of DOD’s classified budget” (Seifert, 2008, p. 7). The ethical and legal issues
are central to the reasons that many of the DARPA programs that involve SNA and data
mining have gone underground since being defunded by Congress and must now obtain their
funding by classified means. The problem this presents to Congress is that public
scrutiny has been eliminated for now, but the issue of privacy concerns involving the
use of personally identifiable information remains.

8. SNA ARCHITECTURE

SNA provides a mechanism to help focus resources on the most important actors. With the
support of Congress, an SNA policy that clearly states the operational parameters to be
used in applying these technologies will ensure the protection of civil liberties by
providing oversight and clear limitations on deployment. Government personnel should
have a clear understanding of what is permissible, defined not simply by the agency
administration they operate under but also by specific legislative action from
Congress.

One method to protect personal information is to anonymize the data by one-way hashing
(an algorithm that transforms data into a digest that cannot feasibly be reversed),
masking (obscuring data), and blind matching (matching records without revealing the
identity) until the data present evidence of illegal activity, at which point a FISA
warrant could be obtained to reveal the identity of the actor. Another method involves
a hierarchy of information rights. That is, lower level analysts at the initial stages
of an investigation would not have access to personally identifiable information, but
as the SNA develops a list of actors of interest the analyst would bump the reports to
a supervisor with authority to access the identity of the actors. The assurance of
these measures would require the use of standard security protocols such as passwords,
login procedures, audit trails, data encryption, and key management. If the analyst was
not prepared to file a report, a mechanism requiring written authorization could be
established before any personally identifiable information was revealed. The creation
of a hierarchy of security controls should start with a requirement of no personally
identifiable information during the initial use of broad search parameters. As the
search narrows its focus and the need for identification increases, written
authorization should be required for actors with presumed noncitizen status, and a FISA
warrant should be required for actors of known or suspected citizenship. It could be
argued that Constitutional rights should be extended to all individuals regardless of
citizenship. To provide such protections would demonstrate to the world that U.S.
concepts of civil rights are not just for the privileged citizens of the United States
but also for the entire world.

The SNA architecture should include the capabilities to handle erroneous data, missing
data, and data of different forms and from different sources. The examination of
algorithms in this research demonstrates that there are several different algorithms
available that address these concerns. The development of architecture and legislation
should not be centered on determining which algorithm best defeats any limitation. The
fact that each agency within the IC
has been tasked with specific responsibilities will require different software
applications that best fit the responsibilities that Congress has placed on those
agencies. The process that is used to select those algorithms and application systems
should be the major concern for Congress. Development of policies that provide
guidelines for agencies to operate will direct agency managers to the selection of
systems that perform the task while protecting civil rights. The development of a Super
SNA architecture that embraces flexibility by adopting open source coding would allow
the introduction of new algorithms as they are developed and would permit one super
architecture to be developed that feeds subarchitectures that operate within each
agency and communicate across all agencies. This simple concept would eliminate the
need for duplicative legislation and customized software applications for each agency.

The current structure of evolution in SNA and data mining is chaotic and driven by
individual researcher interests and funding agencies’ needs. The future of
computer-assisted analysis would greatly benefit from DARPA involvement to synergize
the efforts of industry and academia. However, with the current Congressional attitude
of defunding and segregation to classified programs, many researchers and corporations
are left to travel the road of discovery without a map. The development of a Super
Architecture for SNA would allow for all the algorithms discussed above to be
incorporated, and each individual agency could simply choose the subelements most
appropriate for the task as defined by Congress. The architecture should be user
friendly and menu driven to flatten the learning curve, and the different measures
available in SNA should be able to refute or substantiate the other measures without
requiring a computer science degree to manipulate. As each algorithm develops, its
output anomalies should be automatically reported to the analyst without the need for
the analyst to request the information. Automated reporting would allow the analyst to
concentrate on the human side of analysis with the computer calculating and processing
the data with little need for operator input. The true power of SNA is the ability to
find patterns in the data not otherwise detectable by humans and then display that data
in graphical or visual means that quickly tell the story to analysts without their even
understanding or comprehending the various algorithms that are responsible for the
information displayed.

9. CONCLUSION

With the proper guidance from Congress, the future implementation of SNA and data
mining processes could achieve the goal of enhanced performance for the analysts of the
IC while protecting the civil rights of U.S. citizens. The algorithms and frameworks
exist to exploit data that exist in various formats and disparate databases. The
ability to search data anonymously and detect terrorist activities does not require the
disclosure of PII until the presence of illegal or anomalous behavior is revealed. The
involvement of DARPA to help build an improved architecture by leveraging its unique
ability to bring together industry, academia, and government entities should be
encouraged. The development of legislation that provides limitations and guidance on
when PII is to be disclosed, by whom, and for what reasons should be a priority for
Congress. The comprehension of advanced algorithms and frameworks that extract data
from the Internet or privately held databases is not required for Congress. What is
required are legal limits on who sees what, when, and for what reasons.

Future research should look to determine how these legal limitations could be written
such that DARPA and other research entities are not restricted in their quest but
rather are empowered with specific knowledge of what society and Congress will
tolerate. Research that examines precisely how DARPA’s oversight is regulated would be
a first step in defining any needed regulations.

REFERENCES

Backus, G., & Glass, R. (2005). An agent-based model component to a framework for the analysis of terrorist-group dynamics. Albuquerque, NM: Sandia National Laboratories. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.60.9584&rep
Best, R. (2009). Intelligence issues for Congress. Washington, DC: Congressional Research Service. Retrieved from http://www.fas.org/sgp/crs/intel/RL33539.pdf
Block, F. (2007). Swimming against the current: The hidden developmental state in the U.S. Politics and Society, 36, 169–206. Retrieved from http://sociology.ucdavis.edu/people/fzblock/pdf/swimming.pdf
Carley, K. (2009). Dynamic network analysis for counter-terrorism. Pittsburgh, PA: Carnegie Mellon University. Retrieved from http://www.nap.edu/openbook.php?record_id=12083&page=169
Carley, K., Diesner, J., Reminga, J., & Tsvetovat, M. (2006). Toward an interoperable dynamic network analysis toolkit. Pittsburgh, PA: Carnegie Mellon University. Retrieved from https://community.bus.emory.edu/dept/ISOM/Shared%20Documents/Hightower%20Speaker%20Papers/KathleenCarley.pdf
Cate, F. (2008). Government data mining: The need for a legal framework. Harvard Civil Rights-Civil Liberties Law Review, 43(2), 435–489. Retrieved from http://www.law.harvard.edu/students/orgs/crcl/vol43_2/435-490_Cate.pdf
Crenshaw, M. (2005). Terrorism in context. University Park, PA: The Pennsylvania State University Press.
DeRosa, M. (2004). Data mining and data analysis for counterterrorism. Washington, DC: The CSIS Press. Retrieved from http://csis.org/files/media/csis/pubs/040301_data_mining_report.pdf
Deshpande, S. P., & Thakare, V. M. (2010, September). Data mining system and applications: A review. International Journal of Distributed and Parallel Systems, 1(1), 32–44. Retrieved from http://airccse.org/journal/ijdps/papers/0910ijdps03.pdf
Defense Advanced Research Projects Agency. (2009). Strategic plan May 2009. Arlington, VA: Author. Retrieved from http://www.scribd.com/doc/15833928/Defense-Advanced-Research-Projects-Agencys-DARPA-Strategic-Plan-May-2009
Duval, R., Christenson, K., & Spahiu, A. (2010). Bootstrapping a terrorist network. Carbondale, IL: Southern Illinois University. Retrieved from http://opensiuc.lib.siu.edu/pnconfs_2010/20
Fuchs, E. (2010). Rethinking the role of the state in technology development: DARPA and the case for embedded network governance. Pittsburgh, PA: Carnegie Mellon University. Retrieved from http://repository.cmu.edu/epp/3
Hanneman, R., & Riddle, M. (2005). Introduction to social network methods. Riverside, CA: University of California. Retrieved from http://faculty.ucr.edu/~hanneman/nettext/C1_Social_Network_Data.html
The privacy implications of government data mining programs: Hearing before the United States Senate Committee on the Judiciary (2007) (testimony of Kim Taipale). Retrieved from http://data-mining-testimony.info/
Heuer, R. (1999). Psychology of intelligence analysis. Washington, DC: Central Intelligence Agency, Center for the Study of Intelligence. Retrieved from https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/psychology-of-intelligence-analysis/index.html
Hulst, R. (2009). Introduction to social network analysis as an investigative tool. The Hague, the Netherlands: Research and Documentation Centre, Ministry of Justice. Retrieved from http://web.ebscohost.com.ezproxy2.apus.edu/ehost/detail?vid=1&hid=10&sid=d013775abb12405cb8dd70f6c135091f%40sessionmgr13&bdata=JnNpdGU9ZWhvc3QtbGl2ZQ%3d%3d#db=tsh&AN=40529769
Huisman, M., & van Duijn, M. (2003). Software for social network analysis. Groningen, the Netherlands: University of Groningen. Retrieved from http://stat.gamma.rug.nl/Software%20for%20Social%20Network%20Analysis%20CUP_ch13_Oct2003.pdf
Jensen, D., Rattigan, M., & Blau, H. (2003). Information awareness: A prospective technical assessment. Retrieved from http://kdl.cs.umass.edu/papers/jensen-et-al-kdd2003.pdf
Katz v. United States, 389 U.S. 347, 361 (1967).
Koltko-Rivera, M. (2004). Detection of terrorist preparations by an artificial intelligence expert system employing fuzzy signal detection theory. London, England: RTO SCI Symposium on Systems, Concepts and Integration Methods and Technologies for Defence Against Terrorism. Retrieved from http://psg-fl.com/downloads/NATO%20-%20Detection%20of%20Terrorist%20Preparations.pdf
Koschade, S. (2006). A social network analysis of Jemaah Islamiyah: The applications to counter-terrorism and intelligence. Queensland, Australia: Queensland University of Technology. Retrieved from http://eprints.qut.edu.au/6074/
Krebs, V. (2002). Uncloaking terrorist networks. First Monday, 7(4). Retrieved from http://131.193.153.231/www/issues/issue7_4/krebs/
Krueger, A., & Maleckova, J. (2002). Education, poverty, violence, and terrorism: Is there a causal connection? (Working Paper 9074). Washington, DC: National Bureau of Economic Research. Retrieved from www.nber.org/papers/W9074
Kutcher, C. (2008). Social network analysis: Linking foreign terrorist organizations. International Analyst Network. Retrieved from http://www.analyst-network.com/article.php?art_id=1590
Lia, B., & Skjolberg, K. (2004). Causes of terrorism: An expanded and updated review of the literature (FFI/RAPPORT-2004/04307). Kjeller, Norway: Norwegian Defense Research Establishment. Retrieved from http://rapporter.ffi.no/rapporter/2004/04307.pdf
Markle Foundation. (2003). Creating a trusted network for homeland security. New York, NY: Author. Retrieved from http://www.markle.org/publications/666-creating-trusted-network-homeland-security
Memon, N., & Larsen, H. (2006). Investigative data mining toolkit: A software prototype for visualizing, analyzing and destabilizing terrorist networks. Esbjerg, Denmark: Aalborg University. Retrieved from http://www.dtic.mil/cgibin/GetTRDoc?Location=U2&doc=GetTRDoc.pdf&AD=ADA477075
Perliger, A., & Pedahzur, A. (2010). Social network analysis in the study of terrorism and political violence. Carbondale, IL: Southern Illinois University. Retrieved from http://utexas.academia.edu/AmiPedahzur/Papers/290973/Social_Network_Analysis_in_the_Study_of_Terrorism_and_Political_Violence
Reid, E., & Chen, H. (2006). Mapping the contemporary terrorism research domain. Tucson, AZ: The University of Arizona. Retrieved from http://ai.arizona.edu/intranet/papers/paper-Reid-terrorism-researcher.pdf
Safire, W. (2002, November 14). You are a suspect. The New York Times. Retrieved from http://www.nytimes.com/2002/11/14/opinion/you-are-a-suspect.html
Seifert, J. (2008). Data mining and homeland security: An overview. Washington, DC: Congressional Research Service. Retrieved from http://www.fas.org/sgp/crs/intel/RL31798.pdf
Skillicorn, D. B. (2004). Social network analysis via matrix decomposition: al Qaeda. Ontario, Canada: Queen’s University. Retrieved from http://research.cs.queensu.ca/~skill/alqaeda.pdf
Smith, B., Damphouse, K., & Paxton, R. (2006). Pre-incident indicators of terrorist incidents: The identification of behavioral, geographic, and temporal patterns of preparatory conduct (Document number 214217). Washington, DC: Department of Justice. Retrieved from http://www.ncjrs.gov/pdffiles1/nij/grants/222909.pdf
U.S. Department of Homeland Security. (2007). 2007 report to Congress on the impact of data mining technologies on privacy and civil liberties. Washington, DC: Author. Retrieved from http://www.dhs.gov/xlibrary/assets/privacy/privacy_rpt_datamining_2008.pdf
U.S. Department of Homeland Security. (2010). 2010 data mining report to Congress. Washington, DC: Author. Retrieved from http://www.dhs.gov/xlibrary/assets/privacy/2010-dhs-data-mining-report.pdf
Van Atta, R. (2008). Fifty years of innovation and discovery. Washington, DC: DARPA. Retrieved from www.darpa.mil/WorkArea/DownloadAsset.aspx?id=2553
Watts, D. J. (1999). Networks, dynamics, and the small world phenomenon. American Journal of Sociology, 105(2), 493–527. Retrieved from http://www.cc.gatech.edu/~mihail/D.8802readings/watts-swp.pdf
Xu, K., Tang, C., Ali, G., Li, C., Tang, R., & Zhu, J. (2010). A comparative study of six software packages for complex network research. Cheng Du, China: Sichuan University. Retrieved from http://cs.scu.edu.cn/~tangchangjie/paper_doc/2010/XKKcomparision.pdf
Zegart, A. B. (2005). September 11 and the adaptation failure of U.S. intelligence agencies. International Security, 29(4), 78–111. Retrieved from http://faculty.spa.ucla.edu/zegart/pdf/29.4zegart.pdf

BIOGRAPHY

Kenneth Earl Gumm Jr. holds a Master’s in Intelligence Studies with a concentration in
Terrorism from American Military University, Charles Town, West Virginia. In 2004 he
earned his BS in Management, graduating Magna Cum Laude from Troy State University.

Copyright of Information Security Journal: A Global Perspective is the property of Taylor & Francis Ltd and its
content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's
express written permission. However, users may print, download, or email articles for individual use.
