Geolinguistics: The Incorporation of Geographic Information Systems and Science

Article · January 2010

Shawn Hoch
Children’s Health Services Research
Indiana University School of Medicine
Indianapolis, IN 46202
E-mail: shoch@indiana.edu

James J. Hayes
Department of Geography
California State University Northridge
Northridge, CA 91330
E-mail: james.hayes@csun.edu

Abstract

Modern geographic information systems (GIS) and its incorporated spatial analysis tools allow sophisticated and efficient analysis of spatial data by researchers in many fields. Although the field of linguistics has long been of interest to geographers and spatial variation of language to linguists, researchers have made little use of the power of GIS and GIScience theory to address hypotheses regarding spatial variation of language and correlated physical and social variables. Discussion of modern GIS tools for spatial analysis, quantitative analysis, and cartography in geolinguistics has been largely absent from the literature. Linguists have applied GIS technology in language atlases, including recent on-line atlases; however, analytic and data processing capabilities are seldom discussed. Following a review of geolinguistics work incorporating GIS, this article discusses potentially useful GIS tools and techniques for geolinguistics. The article concludes with reflection on the future role of GIS in geolinguistic thought and practice.

Key Words: dialectometry, geolinguistics, GIScience, GISystems, linguistic geography

The Geographical Bulletin, 51: 23-36
©2010 by Gamma Theta Upsilon

INTRODUCTION

Geolinguistics is an interdisciplinary field that often incorporates language maps depicting spatial patterns of language location or the results of processes that lead to language change. Accordingly, GIS is well-suited for geolinguistic studies, although researchers have yet to fully explore the potential for data management and analysis tools incorporated in GIS software. In a review of the literature on geolinguistics, we found few studies either employing GIS or discussing methodology for doing so (Lee and Kretzschmar 1993; Williams and Van der Merwe 1996; Goebl 2006). Also, we found few studies that acknowledge early advances in the use of GIS to examine language variation (Pederson 1993; Kretzschmar and Schneider 1996; Kretzschmar 2003). Although researchers have developed GIS methods for spatial language data analysis, they do not often cite the history and progress of this development in the geolinguistics literature.

Linguists have produced extensive cartographic work, most notably in the form of linguistic atlases (Kurath et al. 1939-1943; Pederson et al. 1986; Labov, Ash, and Boberg 2006). GIS has undoubtedly played an increasing role in spatial data analysis and cartographic methods for linguistic data in recent geolinguistics research; however, we suggest that a more open discussion of, and focus on, the role of GIS in geolinguistics would further benefit spatial linguistics and GIScience. Geolinguistics is poised to adapt GIS and the fundamentals of geography and cartography to address both well-developed and new questions within the field.

Despite early definitions of geolinguistics as inherently interdisciplinary (Van der Merwe 1992) or even as a subdiscipline of geography (Williams 1988), there remains great potential for mutually enriching collaboration between geolinguists and GIScience practitioners. Lee and Kretzschmar (1993) described infrequent contribution of geographic expertise to linguistics research beyond the purposes of cartographic support, noting the absence of quantitative spatial analysis methods in previous work of linguistic geographers. Their call for collaboration was elaborated with examples and discussion of the use of GIS to analyze data from the Linguistic Atlas of the Middle and South Atlantic States (LAMSAS) database (Lee and Kretzschmar 1993). Williams (1996) also described the relationship between linguistics and geography as slow to develop, pointing to their differing academic cultures. It appears that these calls have been largely unanswered as evidenced by the paucity of subsequent research. Since these publications, GIS has developed substantially in quantitative and visual spatial analysis, as well as in its further democratization. Given these advances, we see an opportunity for geography to re-engage this historically important, but currently quiet, area of geographic inquiry.

The aim of this article is to highlight discussion in the literature that does specifically address GIS methodology used in geolinguistic research and map making, and to reflect on the relationship between theory and method in geolinguistics and GIScience. We do not seek to present a comprehensive overview of the use or function of GIS in geolinguistics research, but rather to highlight cartographic products, research articles, and books which have explicitly discussed the role of GIS in their production. We begin by reviewing some early applications of spatial data analysis in the field, many of which took place in the formative stages of both GIScience and contemporary geolinguistics. In this section, we also address the linguistic atlas as a traditional product of cartographic methods in geolinguistics and we note its advances towards incorporation of GIS. After reviewing recent applications and ongoing projects, we aim to invigorate the discussion initiated by Lee and Kretzschmar (1993) by suggesting GIS tools potentially useful to the geolinguist.1

Geolinguistics: Foundations of Spatial Analysis of Language

Early theoretical studies indicated the field of geolinguistics is rich with questions and challenges that can be approached with GIS. Breton described the process through which geographic thought becomes a tool for linguists: “In analyzing the distribution in space and in society of the facts of language, the linguist employs the methods of geography: cartography and the establishment of correlations and causalities between spatial phenomena” (1991, 19). Breton’s model indicated that linguists have engaged geographic thought throughout the development of geolinguistics, especially those interested in dialectology, phonology, word choice, and the more overarching areas of language change, contact, function, history, and policy.

Given the long-established ties between linguistics and geography, what potential questions in geolinguistics can geographic information systems and science address? Mackey (1988) began to pose questions of geolinguistics which find potential solutions in GIS, asking the reader to consider the meaning of language boundaries in cartographic representation. Do borders represent transitions between languages or dialects? Do they represent zones of conflict or thriving multilingualism? Ormeling (1992) suggested that boundaries should represent the course along which the largest number of sociodemographic and physical characteristics diverge. Kretzschmar (1992) and Davis (2000) framed much use of isoglosses, boundaries delineating diverging linguistic features, as conceptual models rather than statistically reliable figures. Mackey (1988) also pointed out that language mapping should take into account the various functions and sociological aspects of language such as education and commerce. Through such questions, Macauley (1985), Mackey (1988) and others began early conversations on geolinguistic analyses such as language border measurement before the tools to conduct them were readily available outside of GIS specialist circles.

Early GIS Applications: Realized Benefits of Computerized Linguistic Data

How have GIS applications traditionally assisted in geolinguistic research when used? What were the immediate appeals of computerized linguistic data? Though the examples are few, evidence suggests that computer technology was introduced for storage of survey data and production of linguistic atlases beginning in the mid-1970s. Researchers during this period commonly cited benefits of data storage and transport (Pederson 1986; Alvar 1991; Nerbonne and Kretzschmar 2003) and mapping on the fly (Pederson 1988; Kretzschmar 1996). Alvar (1991) composed a collection of writings on linguistic atlas projects and on the field in general. He highlighted the Atlas Lingüístico y Etnográfico de la Provincia de Santander (Linguistic and Ethnographic Atlas of the Province of Santander) as an example of an “automated” linguistic atlas and extolled the advantages of a computerized versus manually drawn and reproduced atlas. Alvar (1991) described the database developed for this atlas as a highly useful product of the project facilitating mapping on-demand and the preparation of indices used in interpretation of linguistic atlases. The end result was a leap forward in time- and cost-effectiveness of atlas design and reproduction.

Thomas (1980) presented an early example of GIS used to measure spatial autocorrelation in computerized data from a linguistic survey, describing how he placed numerical values representing Welsh word usage in appropriate regions on a base map of Wales. He then used a specialized grid overlain on mapped survey sites to reveal “site clusters” based on the rate at which survey results coincided with those of neighboring units. In his explanation of the process, Thomas expressed the need for a more advanced spatial analysis than his “relative geographical disposition of sites”: “Ideally, enquiry sites would have been located in the cells of a regular geometrical grid superimposed on a geographical map, with the closeness of its mesh adjusted according to population density and the frequency of settlements” (13). Here Thomas alluded to the advantages of GIS raster analysis and vector grid capabilities that would be readily available to a language mapping project today.

Throughout the 1980s, linguists heralded the increasing availability of desktop computing as a benefit to geolinguistic work in attribute storage and recall (Pederson 1986, 1988) and in providing easily generated maps as research tools (Pederson 1988; Alvar 1991). These were early indicators of the vital roles of some basic GIS functions in addressing significant limitations in managing and displaying large linguistic survey datasets. However, linguistic techniques benefiting from computation, in dialectology in particular, were still hampered in their development and acceptance due to limitations of the technology available (Kirk and Kretzschmar 1992; Nerbonne and Kretzschmar 2006). Moreover, early examples of work using GIS had to endure the transition from hard copy cartography to digitized base maps (Kirk and Kretzschmar 1992). In spite of these limitations, Pederson’s resourceful efforts represent early advances in geolinguistic visualization of survey data with multiple variables and quantitative measurement of word frequency. Citing inspiration by Thomas (1980), Pederson’s work towards computerized storage and display of data from the Linguistic Atlas of the Gulf States (Pederson et al. 1986) calls to mind some essential tools of GIS that would not be widely commercially available in a graphical user interface until nearly a decade later. In establishing the visual arrangement of ASCII characters representing informant positions and responses (e.g., uses “soda” or does not; represented as “+” or “-,” respectively), Pederson (1986, 1988) placed the characters as close as possible to the known locations on a base map, essentially manually geocoding the informant locations. He also employed a sequence of ASCII characters at the geocoded locations displaying several sociolinguistic attributes of informants or multiple phonemic or lexical variants (e.g., race/education/income represented as the string R-E-I) at one time. This innovation allowed storage and display of multiple linguistic attributes, albeit limited in the latter by the readability of strings of multiple characters.

The linguistic atlas has proved a vital tool and product of geolinguistics since the earliest stages of the field and has provided a stage for the incorporation of GIS. French linguist Jules Gilliéron is considered the pioneer of the linguistic atlas, having coauthored the Atlas Linguistique de la France (1902-10). Since then, linguists have produced thematic language maps and atlases of various regions. The atlas has traditionally been the starting point for research and progress in the formation of geolinguistics as a field.

An ongoing linguistic atlas project in which GIS has played a prominent role is the Linguistic Atlas of the Middle and South Atlantic States (LAMSAS) (McDavid and O’Cain 1980). Origins of the current LAMSAS project can be found in some of the earliest large-scale linguistic mapping efforts in the United States (Kurath et al. 1939-1943). Schneider and Kretzschmar (1989) began to report data organization of LAMSAS enabling computerized statistical testing and the creation of a grid optimized to contain equal numbers of respondents in a cell for the purposes of analyzing linguistic variation and regional characteristics. In the following years, their work with LAMSAS continued towards geographical analysis, commenting on the use of MapInfo in which they mapped coordinates of atlas informants (Kretzschmar and Schneider 1996). They observed the modifiable areal unit problem (MAUP) arising from their grid of irregularly shaped polygons. As Gotway and Young (2002) suggested, further exploration of GIS tools and geovisualization could help analyses in projects such as LAMSAS address the modifiable areal unit / change of support problems.

One of the most frequently referenced collections of language data, and a widely consulted source in the formation of other atlases, is the Ethnologue (Gordon 2005). The project was initiated and is maintained by SIL International, an organization originally concerned with biblical translations in minority languages. For over fifty years, the Ethnologue has appeared in numerous editions primarily as an authoritative directory of living languages, the locations of their speakers, and basic speaker population statistics. For nearly as long as it has been published, it has also included maps of countries and linguistic regions. With recent editions available online (http://www.ethnologue.com), it has added basic data exploration capabilities insofar as the user can call up maps of countries and regions by clicking on their respective links. SIL has also collaborated with a vector data resource called the World Language Mapping System (WLMS), making WLMS boundaries and attribute
data comprising the Ethnologue available for purchase in GIS-ready formats.

Recent Applications and Projects Incorporating GIS

Some recent applications of GIS in linguistics have begun to work towards greater ease in data exploration. Whereas early linguistic atlases offered little in the way of data exploration, an example of a more fully and intentionally interactive linguistic atlas is the Modern Language Association (MLA) Language Map. Designed using ESRI’s ArcIMS, the MLA Language Map compiles vast amounts of U.S. Census data (MLA 2009). The user is able to produce and manipulate thematic maps by choosing various language distributions. One can also vary the region being mapped (U.S. or individual states), and the enumeration units (counties or zip codes). The user also has access to the data tables providing languages spoken and numbers of speakers by state and county. Clearly, the MLA Language Map offers a degree of flexibility in language data representation that would be beneficial for users of all large-scale language data projects such as the Ethnologue.

One of the most recent disseminations of data from the aforementioned LAMSAS is maintained online as part of the “Linguistic Atlas Projects” (www.lap.uga.edu). The site hosts survey data from this and several other atlas projects developed in the U.S. in the early- to mid-20th century. It currently allows the user to browse the survey areas by state, with survey locations geocoded and linked to informant descriptors and responses. The site has been accessible in other versions since the mid-1990s and represents a long-standing resource for visualization of some of the most influential work in the field.

The work of Van der Merwe arguably set the stage for a subsequent generation of interactive linguistic atlases such as those discussed above while also providing a role for GIS beyond visualization. In his analyses of Cape Town (Van der Merwe 1993), he began by establishing enumeration units based on neighborhood subdivisions throughout the area and compiling multiple years of South African census data for these units. He frequently used spatial measures of central tendency to display center of gravity shifts in English, Afrikaans, and Xhosa throughout the area.

Williams and Van der Merwe (1996) went on to combine their experience in spatial language data analysis with theories concerning informed language policy in linguistically complex urban environments. The authors described the overall goal of their work as the compilation of comprehensive, dynamic, and up-to-date geolinguistic data to assist in sound decisions in education, urban planning, and language policy. They argued that overly simplistic data and a general lack of interdisciplinary geolinguistic work had left national-level planning at a loss with little accurate data on changing language use, resulting in planning and policy that was out of touch with urban realities. They pointed out that notions of national-level language patterns prior to the early 1990s simply omitted the linguistic complexities of urban South Africa, which comprised over half of the population. They offered GIS-based analysis at the localized urban level as an answer to the problems associated with a coarser regional perspective.

GIS also played a key role in a study of language use and the state of bilingualism. McGuirk (2004) explored the roles of several demographic data, their associations with language use, and implications for the future of bilingualism in Miami-Dade County, Florida. Of chief concern was the issue of language maintenance in a country that has historically assimilated immigrant cultures such that multilingualism represents only a provisional phase in the process, eventually resulting in increasingly monolingual (English-speaking) generations. Williams (1988) addressed the role of place in settings where speakers must navigate socially constructed rules of using more than one language. This importance is reflected in one of McGuirk’s central research
questions: “What sociolinguistic characteristics…make Spanish-English bilingualism and Spanish language vitality unique within the Miami-Dade County social and geographic context?” (McGuirk 2004, 8).

McGuirk began his geolinguistic analysis by aggregating census tracts based on established neighborhoods such as Little Havana, mapping these units and linking census data using manifold.net’s Manifold System 5.50. After performing multiple regressions, he used University of Illinois Spatial Analysis Laboratory’s GeoDa to produce choropleth maps displaying the same units with a color scheme based on the Moran Local Indicator of Spatial Association statistic (Anselin 1995). He then compared the results to those from San Diego County, California, a community with a comparably large immigrant Hispanic population. In his conclusion, McGuirk (2004) noted how these geographic analyses helped to confirm the sociolinguistic uniqueness of the target area in that Spanish speakers in Miami-Dade County were not clustered in socioeconomically deprived areas as found elsewhere.

Some recently presented work offers a rare example of the results of a vibrant relationship between geographers and linguists, and in particular demonstrates how GIScience can advance language mapping techniques. Using LAMSAS data, Thill et al. (2008) applied a self-organizing map (SOM) algorithm to assign informants to geospatial clusters, exploring the emerging patterns and comparing them with U.S. dialect regions defined by Kurath (1949) decades earlier which had been untested by empirical studies until recently. SOM algorithms offer a form of exploratory data analysis which reduces the dimensionality of a spatially referenced dataset and re-displays the data in a desired number of classes. Although the authors noted that SOM techniques are far from straightforward and must be painstakingly tailored to unique datasets, their work with this tool, bridging advanced geographical analyses and linguistics, was in itself a commendable step forward for geolinguistics. In the following section, we continue to discuss methods and possibilities that spatial science can offer geolinguistics given similar collaboration and forward-looking techniques.

Spatial Theory and Methodology Applied in Geolinguistics

While the latest technological innovations of geolinguistic study are making use of the data handling and display capability of GIS, there is infrequent evidence of adoption of the analysis and cartographic functionality offered through modern GIS tools. Early examples of spatial language analysis exist, but there are limited signs that researchers have carried them forward with the advance of tools and methods. Relative to the body of geolinguistics literature, there are few published examples of how quantitative spatial methods can be applied to geolinguistic research questions. Understanding the role of space and distance and their relationships to other variables is a key component to understanding any phenomenon that plays out over a geographic area. Explicitly considering their effects can reveal important relationships that affect linguistic processes (Nerbonne and Heeringa 2007).

There are four “broad areas” of geographic information analysis that are relevant for geolinguistic research: spatial data manipulation, spatial data analysis, spatial statistical analysis, and spatial modeling (O’Sullivan and Unwin 2003). Familiarity with the theory and methodological limitations of each is critical to its use and here we see great potential for interface between geography and linguistic science. GIS and the geographical approach offer geolinguistics researchers many possibilities for advancing and reexamining theory, hypotheses, and data visualization. GIS and GIScience can offer an articulation of spatial theory as a framework for approaching hypotheses in linguistics research. In addition, GIS can simply make much research in geolinguistics faster and easier.

Much language mapping still uses choropleth maps; however, traditional choropleth mapping has two distinct disadvantages for dialect mapping. First, discrete polygon boundaries (usually political boundaries) are incompatible with modern geolinguistic theory (Mackey 1988; Dahl and Veselinova 2005). The boundaries (isoglosses) between areas of language usage (mapping units) are not discrete, but rather are features defined by gradual changes in a number of variables including dialect, ethnicity, and location (Girard and Larmouth 1993). Linguistic boundaries are therefore more like the boundaries of climatic regions or forest types (Mark and Csillag 1989). Second, each area on the map is required to belong to one and only one class, but many points on a linguistic choropleth map will share some affinity with nearby classes and areas. Several geospatial techniques have been developed to address these cartographic boundary problems.

Points on any classification map of language variation will have some probability of belonging to multiple classes. Assigning points and areas outright to discrete classes therefore increases both spatial and attribute error in the map. Traditional choropleth and isoline mapping techniques ignore both the nature of the dialect boundary and the complex multi-attribute nature of dialect space, yet some generalization and error are necessary to make the map useful.

Techniques drawing on the cartographic communication model and emphasizing the need for balance between precision mapping and usability of choropleth maps can help address cartographic issues in language mapping. A suggested solution for maintaining spatial and attribute accuracy while accommodating spatial gradation is the graded area-class map (Kronenfeld 2005). Area-class maps do not have predefined boundaries, but boundaries based on the spatial variation in the attribute of interest itself and the probability distribution that points within the mapped area belong to a designated class (Mark and Csillag 1989). Probability surfaces of class membership (Mark and Csillag 1989) and fuzzy set membership functions (Girard and Larmouth 1993) have been used to better describe and locate class (attribute) and map (spatial) boundaries by noting variations in the rate of dialect change across space. Graded area-class maps share this basic approach to identifying class membership, but rather than drawing discrete boundaries of a rigid classification, use gradation of lightness or hue to indicate changes across space based on a multidimensional attribute space (Kronenfeld 2005). Kronenfeld (2007) introduced the idea of the categorical gradient field, implemented with categorical data in vector or TIN data models to represent transition between areas of more certain class membership (Fig. 1). Using polygon and TIN data can be an advantage with linguistic data which are often aggregated into areal units rather than points or grids. This approach could be very useful for geolinguists when classes are determined from multiple measures of class membership and exhibit spatial gradation.

Figure 1. Illustration representing a transitional zone between four categorical classes. The gradation zone indicates an area where probability of membership is > 0 for more than one class (left). Probability of class membership may be mapped as a categorical gradient field to indicate how membership affinity varies across the transition (right).

Discussions of quantitative analysis of linguistic data have been careful to include sampling bias and statistical independence (Guy 1993; Kretzschmar and Schneider 1996); however, the literature does not consistently consider spatial dependence, a potential constraint on achieving unbiased and independent samples. Thomas (1980) and McGuirk (2002) applied an understanding of spatial autocorrelation to linguistic data, but infrequent discussion of this phenomenon in geolinguistics suggests that researchers may not widely recognize its effects or are just barely exploring them. GIS makes the spatial analysis techniques related to spatial dependence more accessible than ever. Mapped linguistic similarity indices and “dialect kernels” are examples of methods for detecting “spatiolinguistic correlation” using geovisualization software (“Visual DialectoMetry”) developed expressly for analyzing linguistic data (Goebl 2006). Yet, these visualization techniques can be taken further to quantitative exploration of spatial relationships.

Geostatistical methods such as semivariance (or semivariogram) analysis can be useful for better understanding linguistic variation related to spatial dependence, revealing important information about rates of change across space, language variability as a function of distance between samples, random variability in data, inter-sample distances necessary to achieve independent samples, and uncertainty in interpolated values. Semivariance analysis models variability as a function of the distance between sampling points (Burrough and McDonnell 1998). Such a model provides information on the relationship between distance and the intensity of spatial dependence between sampling locations, and the distance at which samples are independent (Rossi et al. 1992).

Kriging, a method of spatial interpolation based on semivariance analysis, may also be of use to geolinguistics. Kriging recognizes that spatial variables are too stochastic to be mapped using deterministic interpolation methods. Such variables are better represented as regionalized and having some systematic component such as a mean, but also a stochastic, spatially autocorrelated component and a random “noise” component (Burrough and McDonnell 1998). Also, kriging has the advantage of providing error estimates for the interpolated values at any point on the map. The “quantitative maps” of linguistic data described by Guy (1993) would lend themselves well to this type of analysis, potentially revealing underlying relationships and the role of space in shaping the observed patterns.

A hypothetical example of kriging is illustrated in Figure 2. The points in Figure 2A represent the centroids of neighborhoods within a city. Frequencies of informant attributes (e.g., race, educational attainment) and linguistic features are associated with each point in the spatial database. Figure 2B is an example of a prediction map created by kriging for one attribute from a survey. One can follow the same procedure for additional variables and compare the characteristics of the variograms and prediction maps to assess postulated relationships (e.g., are ethnolinguistic features coincident with patterns of segregation in the city?).

Point pattern analysis (PPA) is another quantitative approach for point data that can allow inference of patterns in linguistic phenomena (Lee and Kretzschmar 1993). Second-order nearest-neighbor PPA statistics such as Ripley’s K (Ripley 1976, 1988; Dale 1999) can help identify spatial patterns that are more “clumped” or “dispersed” than a random spatial process. This approach also provides information on the scale of clumping or dispersion. The idea behind this technique is to examine a neighborhood of a given size (radius) around every point and determine if the points in that neighborhood are more or less dense than expected.

Figure 2. Example of kriging to create a continuous map from survey point data. (A) Neigh-
borhood centroids where informants live. (B) Spatial variation of one variable from informant
data estimated using kriging.
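
Kriging of the kind shown in Figure 2 rests on a semivariance model, so a first practical step is usually an empirical semivariogram of the survey variable. The following is a minimal sketch of that step only, not the procedure used to produce Figure 2. It assumes the common estimator in which the semivariance for a distance bin is half the mean squared difference between all pairs of values separated by that distance. The function name, the bin settings, and the simulated neighborhood data are all hypothetical.

```python
import math
import random


def empirical_semivariogram(points, values, lag_width, n_lags):
    """Return (mean pair distance, semivariance, pair count) for each lag bin.

    The semivariance of a bin is half the mean squared difference between
    the values of all point pairs whose separation falls in that bin.
    Empty bins are omitted from the result.
    """
    sq_diff_sums = [0.0] * n_lags   # sum of squared value differences per bin
    dist_sums = [0.0] * n_lags      # sum of pair distances per bin
    counts = [0] * n_lags           # number of pairs per bin
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            k = int(d // lag_width)
            if k < n_lags:
                sq_diff_sums[k] += (values[i] - values[j]) ** 2
                dist_sums[k] += d
                counts[k] += 1
    result = []
    for k in range(n_lags):
        if counts[k] > 0:
            result.append((dist_sums[k] / counts[k],
                           sq_diff_sums[k] / (2 * counts[k]),
                           counts[k]))
    return result


# Hypothetical data: 200 neighborhood centroids (coordinates in km) with a
# west-to-east trend in the frequency of one linguistic feature, plus noise.
rng = random.Random(42)
centroids = [(rng.uniform(0, 30), rng.uniform(0, 30)) for _ in range(200)]
frequency = [0.02 * x + rng.gauss(0, 0.05) for x, _ in centroids]

variogram = empirical_semivariogram(centroids, frequency, lag_width=3.0, n_lags=8)
for mean_dist, gamma, n_pairs in variogram:
    print(f"{mean_dist:5.1f} km  semivariance={gamma:.4f}  pairs={n_pairs}")
```

In this simulated case semivariance rises with distance because nearby centroids share similar values. With real survey data, the distance at which the curve levels off would suggest how far apart samples must be before they can be treated as independent.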

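The neighborhood-counting idea behind point pattern statistics such as Ripley’s K can also be sketched briefly. The code below is an illustration under stated assumptions rather than the analysis behind the figures in this article: it uses a naive estimator with no edge correction, taking K(t) as the study area divided by the squared number of points, multiplied by the number of ordered pairs of distinct points no farther apart than t. It also uses the common transformation L(t) = sqrt(K(t)/pi) - t, which is close to zero for a random pattern. The function names and the simulated point sets are hypothetical.

```python
import math
import random


def ripley_k(points, t, area):
    """Naive Ripley's K at radius t (assumption: no edge correction).

    Counts ordered pairs of distinct points no farther apart than t and
    scales the count by area / n^2. For a spatially random pattern the
    result is roughly pi * t^2.
    """
    n = len(points)
    close_pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= t:
                close_pairs += 1
    return area * close_pairs / (n * n)


def l_function(points, t, area):
    """L(t) = sqrt(K(t) / pi) - t: about 0 if random, positive if clumped."""
    return math.sqrt(ripley_k(points, t, area) / math.pi) - t


# Hypothetical informant locations in a 40 km x 40 km study area.
rng = random.Random(7)
side = 40.0
area = side * side
centers = [(8, 8), (30, 10), (20, 20), (10, 32), (32, 30)]
clustered = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
             for cx, cy in centers for _ in range(40)]
scattered = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(200)]

for t in (1.0, 3.0, 6.0):
    print(f"t={t:4.1f} km  L clustered={l_function(clustered, t, area):6.2f}"
          f"  L scattered={l_function(scattered, t, area):6.2f}")
```

Because the sketch omits edge correction, L(t) for the scattered points drifts slightly below zero as t grows. A fuller analysis would apply an edge correction and compare the observed curve against an envelope built from many simulated random patterns.
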
Ripley’s K statistic can be used to examine spatial distributions of points for departure
from complete spatial randomness (CSR) (Haase 1995). An edge-corrected transformation of
Ripley’s K is the L(t) transformation (Haase 1995). The L(t) statistic is calculated for all
data using a given distance t, then repeated with sequentially larger values of t. Positive
deviations from 0 indicate aggregation of points (clumping), while negative deviations
indicate uniform dispersion. Monte Carlo simulations can be used to generate confidence
envelopes of “significant” CSR deviation. Figure 3A illustrates a case of univariate
application of the L(t) function. In the example showing hypothetical data points across the
Indianapolis metropolitan area, data values within the spatially random envelope limits
(dashed lines) indicate that points at those distances are distributed randomly. Data values
outside of the envelope are clumped if above the envelope, dispersed if below. This example
indicates that the points are clumped more than expected and that the clumping is especially
pronounced at a neighborhood size of about seven kilometers in diameter.

Figure 3. Three examples of PPA applied to hypothetical data. (A) Univariate example
exhibiting spatial clustering. (B) Bivariate example exhibiting segregation of two responses,
switching to random association before forming a cluster of aggregation. (C) Bivariate
example exhibiting segregation at two different spatial scales.

Bivariate PPA can be of use in geolinguistic data analysis for comparing spatial
distributions of, for example, two alternate pronunciations. In areas where dialects or
languages intermix, bivariate PPA could be useful in determining whether the two occur
together randomly, if they tend to cluster, or if they are spatially segregated. Figures 3B
and 3C are examples of bivariate Ripley’s K analysis. In Figure 3B the two types of points
are segregated from one another at neighborhood sizes of about 12 kilometers (roughly the
size of most individual clumps), but at
a neighborhood size of about 30 kilometers the two are more clustered than a random
distribution. The two types of points are randomly arranged with respect to one another at
other distances. Figure 3C illustrates a case where the two types of points are segregated
at short distances, randomly intermixed at intermediate distances, and segregated again at
larger distances. Segregation at the shorter distances reflects the small individual clumps
of points, while at the longer distances it reflects the segregation of the points in the
southeast half of the map from the points in the northeast.

Inferential spatial statistics can be useful for objective assessment of a spatial
hypothesis, but it will not always be necessary or appropriate. Visual analysis of map data
by an experienced geolinguist may be all that is necessary in some cases to identify and
interpret observed spatial patterns. This can at least lead to development of new
hypotheses.

GIS offers geolinguistics a range of possibilities for visualization of geographic
relationships, allowing creation and comparison of multiple alternative maps with ease once
the data are collected and organized. Map overlay and tools to examine spatial relationships
among variables are easily accessible in most current GIS software. For geolinguistics, map
overlay is likely to be concerned with the spatial coincidence among language and other
variables; maps of these variables can be quickly created and compared to base language
maps. Buffering is a common type of overlay technique that could be useful in examining the
occurrence of language within a specified distance around particular cultural, political, or
physical features. With GIS one can quickly and easily create multiple buffer maps for
numerous variables to explore potential relationships. Other common overlay operations
include containment, proximity, adjacency, and Boolean AND/OR and TRUE/FALSE operations. All
of these visual analysis alternatives are available with little cost and effort once data
entry is complete, and provide a means for efficient quantitative summaries of spatial
characteristics.

Conclusion

This review has explored a broad scope of early and recent GIS applications in linguistics
including linguistic atlases, lexical and phonological surveys, and a sociolinguistic
analysis. As GIS continues to find a place in geolinguistics, some questions remain that
concern GIS applications from perspectives of both parent disciplines. On a practical level,
we note that geolinguists must also develop distinct approaches to storage, analysis, and
display of linguistic data from the feature level, as in phonetic variation, to the largest
scales found in the Ethnologue (2005) and other catalogues of modern spoken languages. This
will require the expertise of linguists and the data processing and visualization skills of
both disciplines.

Pederson (1995) and Kretzschmar (2006) both addressed the distinction between deductive and
inductive approaches in geolinguistics. Pederson (1995) described deductive research as
well-established within linguistics since Gilliéron’s work, and his conceptual models
depicted the process of deductive language structure investigation as beginning with known
classes, then examining components of these classes, whereas inductive processes imply the
inverse (Pederson 1995). Kretzschmar (2006) also narrated the entrenchment of deductive
approaches within American dialectology and linguistic thought more generally. Given that, as
Kretzschmar explained, both deductive and inductive approaches have long-standing roles in
dialectology, how might the option affect GIS applications and outcomes?

Kretzschmar (2006) also posed a related and more poignant question for geolinguistics
researchers using GIS in whether future work will focus more intently on the “science” of
problems and hypotheses systematically approached through technology rather than the “art”
inherent in the experimentation and development of computational methods thus far. Over a
decade ago, geographers posed similar questions regarding the science of the use of
GISystems in general. Wright,
Goodchild, and Proctor (1997) suggested “If [GIS is a tool]…significance derives strictly
from the progresses made on the substantive research problem.” They then offered that GIS as
a science “is concerned with the analysis of the fundamental issues raised by the use of GIS
in geography or in other disciplines” (Wright, Goodchild, and Proctor 1997). Moving forward,
GIS can become an integral part of the science of geolinguistics while also answering Lee
and Kretzschmar’s (1993) call for interdisciplinary collaboration and advanced techniques.

Possibly the most salient issue for future consideration is how to facilitate the overall
progress of GIS in geolinguistics. Literature on recent projects often includes comments
indicating a lack of awareness of previous GIS applications in the work of other active
geolinguists (see Rivero, Llull, and Merlo 2002). This indicates not only a need for reviews
of the literature such as included here, but also for a vigorous discussion of methodology
that seems to be lacking. This is critical in order for ongoing and future projects to
benefit from and build on earlier work in both geography and linguistics. As GIS and
geolinguistics become more conversant, the overall role of GIS in major publications and
products of the field might become more tangible, allowing for further exploration,
criticism, and progress.

Notes

1. Though not the focus in this paper, we note that GIScience has begun discussion of
incorporating spatial information ontologies, informed by sociolinguistics, into GIS
interfaces to account for differing conceptions of geographic concepts across languages
(Mori 2002). We focus here on the placement of GIS in geolinguistics, not the reverse;
although the two are interrelated, the direction of the relationship remains important.

Acknowledgements

Thanks to Barry Kronenfeld for the use of the diagrams in Figure 1. We also thank Steven
Schnell and two anonymous reviewers for comments and suggestions that greatly improved the
manuscript.

References

Alvar, M. 1991. Estudios de Geografía Lingüística (Studies of Linguistic Geography). Madrid: Colección Filologica Paraninfo.
Anselin, L. 1995. Local Indicators of Spatial Association – LISA. Geographical Analysis 27: 93-115.
Breton, R. J.-L. 1991. Geolinguistics: Language Dynamics and Ethnolinguistic Geography. Translated by H.F. Schiffman. Ottawa: University of Ottawa Press.
Burrough, P.A. and R.A. McDonnell. 1998. Principles of Geographic Information Systems. New York: Oxford University Press.
Dahl, Ö. and L. Veselinova. 2005. Language Map Server. ArcUser [http://proceedings.esri.com/library/userconf/proc05/papers/pap2425.pdf].
Dale, M.R.T. 1999. Spatial Pattern Analysis in Plant Ecology. Cambridge: Cambridge University Press.
Davis, L.M. 2000. The Reliability of Dialect Boundaries. American Speech, 75(3): 257-9.
Gilliéron, D. and E. Edmont. 1902-10. Atlas Linguistique de la France. Paris: Champion.
Girard, D. and D. Larmouth. 1993. Some Applications of Mathematical and Statistical Models in Dialect Geography. In American Dialect Research, edited by D. Preston. Amsterdam: John Benjamins, pp. 107-132.
Goebl, H. 2006. Recent Advances in Salzburg Dialectometry. Literary and Linguistic Computing, 21(4): 411-435.
Gordon, R.G. 2005. Ethnologue: Languages of the World. Dallas: SIL International.
Gotway, C. and L. Young. 2002. Combining Incompatible Spatial Data. Journal of the
American Statistical Association, 97: 632-648.
Guy, G. 1993. The Quantitative Analysis of Linguistic Data. In American Dialect Research, edited by D. Preston. Amsterdam: John Benjamins, pp. 223-250.
Haase, P. 1995. Spatial Pattern Analysis in Ecology Based on Ripley’s K-function: Introduction and Methods of Edge Correction. Journal of Vegetation Science 6(4): 575-582.
Kirk, J.M. and W.A. Kretzschmar. 1992. Interactive Linguistic Mapping of Dialect Features. Literary and Linguistic Computing, 7(3): 168-75.
Kretzschmar, W.A. 1992. Isoglosses and Predictive Modeling. American Speech, 67(3): 227-249.
_____. 1996. Quantitative Areal Analysis of Dialect Features. Language Variation and Change, 8: 13-39.
_____. 2003. Mapping Southern English. American Speech, 78(2): 130-149.
_____. 2006. Art and Science in Computational Dialectology. Literary and Linguistic Computing, 21: 399-410.
Kretzschmar, W.A. and E. Schneider. 1996. Introduction to Quantitative Analysis of Linguistic Survey Data. Thousand Oaks: SAGE Publications.
Kronenfeld, B. 2005. Gradation as a Communication Device in Area-Class Maps. Cartography and Geographic Information Science, 32: 231-241.
Kronenfeld, B. 2007. Triangulation of Gradient Polygons: A Spatial Data Model for Categorical Fields. In Spatial Information Theory, edited by S. Winter, M. Duckham, L. Kulik, and B. Kuipers. New York: Springer, pp. 421-37.
Kurath, H. 1949. A Word Geography of the Eastern United States. Ann Arbor: University of Michigan Press.
Kurath, H., M. Hansen, B. Bloch, and J. Bloch. 1939-43. Linguistic Atlas of New England. 3 vols. Providence: Brown University Press for American Council of Learned Societies.
Labov, W., S. Ash, and C. Boberg. 2006. The Atlas of North American English. Berlin: Mouton de Gruyter.
Lee, J. and W.A. Kretzschmar. 1993. Spatial Analysis of Linguistic Data with GIS Functions. International Journal of Geographical Information Science, 7(6): 541-560.
Macauley, R.K.S. 1985. Linguistic Maps: Visual Aid or Abstract Art? In Studies in Linguistic Geography, edited by J.M. Kirk, S. Sanderson, and J.D.A. Widdowson. London: Croom Helm, pp. 172-86.
Mackey, W.F. 1988. Geolinguistics: Its Scope and Principles. In Language in Geographic Context, edited by C.H. Williams. Clevedon: Multilingual Matters Ltd., pp. 20-46.
Mark, D.M. and F. Csillag. 1989. The Nature of Boundaries on ‘Area-Class’ Maps. Cartographica, 26: 65-78.
McDavid, R. and R. O’Cain. 1980. Linguistic Atlas of the Middle and South Atlantic States. Chicago: University of Chicago Press.
McGuirk, D.G. 2004. An Ethnolinguistic Analysis of Hispanics in Miami-Dade County. Ph.D. diss., Florida International University.
Modern Language Association. 2009. The Modern Language Association Language Map: A Map of Languages in the United States. [http://www.mla.org/map_main].
Mori, M. 2002. Semantic Analysis of Spatial Expressions in Japanese. Ph.D. diss., State University of New York at Buffalo.
Nerbonne, J. and W. Heeringa. 2007. Geographic Distributions of Linguistic Variation Reflect Dynamics of Differentiation. In Roots: Linguistics in Search of its Evidential Base, edited by S. Featherston and W. Sternefeld. New York: Walter de Gruyter, pp. 267-318.
Nerbonne, J. and W.A. Kretzschmar. 2003. Introducing Computational Techniques in Dialectometry. Computers and the Humanities, 37: 245-255.
_____. 2006. Progress in Dialectometry: Toward Explanation. Literary and Linguistic Computing, 21(4): 387-397.
O’Sullivan, D. and D.J. Unwin. 2002. Geographic Information Analysis. Hoboken: Wiley.
Ormeling, F. 1992. Methods and Possibilities for Mapping by Onomasticians. Discussion Papers in Geolinguistics, 19-21: 50-67.
Pederson, L. 1986. A Graphic Plotter Grid. Journal of English Linguistics, 19: 25-41.
_____. 1988. Electronic Matrix Maps. Journal of English Linguistics, 21: 149-174.
_____. 1993. An Approach to Linguistic Geography. In American Dialect Research, edited by D. Preston. Amsterdam: John Benjamins, pp. 31-92.
_____. 1995. Elements of Word Geography. Journal of English Linguistics, 25(1): 33-46.
Pederson, L., S. McDaniel, G. Bailey, and M. Basset. 1986. Linguistic Atlas of the Gulf States. Athens: University of Georgia Press.
Ripley, B.D. 1976. The Second Order Analysis of Stationary Point Processes. Journal of Applied Probability 13: 255-266.
_____. 1988. Statistical Inference for Spatial Processes. Cambridge: Cambridge University Press.
Rivero, A., G. Llull, and G.D. Merlo. 2002. Mapping the Spatial Distribution of Language. ArcUser [http://www.esri.com/news/arcuser/1002/linguistics.html].
Rossi, R.E., D.J. Mulla, A.G. Journel, and E.H. Franz. 1992. Geostatistical Tools for Modeling and Interpreting Ecological Spatial Dependence. Ecological Monographs, 62: 277-314.
Schneider, E. and W.A. Kretzschmar. 1989. LAMSAS Goes SASsy: Statistical Methods and Linguistic Atlas Data. Journal of English Linguistics 22: 129-136.
Thill, J.-C., W.A. Kretzschmar, I. Casas, and X. Yao. 2008. Detecting Geographic Associations in English Dialect Features in North America within a Visual Data Mining Environment Integrating Self-Organizing Maps. In Self-Organising Maps: Applications in Geographic Information Science, edited by P. Agarwal and A. Skupin. London: Wiley, pp. 87-105.
Thomas, A.R. 1980. Areal Analysis of Dialect Data by Computer: A Welsh Example. Cardiff: University of Wales Press.
Van der Merwe, I.J. 1992. A Conceptual Home for Geolinguistics: Implications for Language Mapping in South Africa. Discussion Papers in Geolinguistics 19-21: 33-49.
_____. 1993. The Urban Geolinguistics of Cape Town. GeoJournal, 31: 409-417.
Williams, C.H. 1988. An Introduction to Geolinguistics. In Language in Geographic Context, edited by C.H. Williams. Clevedon: Multilingual Matters Ltd., pp. 1-19.
_____. 1996. Geography and Contact Linguistics. In Contact Linguistics: An International Handbook of Contemporary Research, edited by H. Goebl, P.H. Nelde, Z. Stary, and W. Wolck. New York: Walter de Gruyter, pp. 63-75.
Williams, C.H. and Van der Merwe, I.J. 1996. Mapping the Multilingual City: A Research Agenda for Urban Geolinguistics. Journal of Multilingual and Multicultural Development, 17: 49-66.
Wright, D.J., M.F. Goodchild, and J.D. Proctor. 1997. GIS: Tool or Science? Demystifying the Persistent Ambiguity of GIS as “Tool” versus “Science”. Annals of the Association of American Geographers, 87(2): 346-362.
