Masters Thesis Peer Effects and Politics

Master Project
Peer Effects and Politics
Sarat Chandra Akella

Manuel Yañez Dominguez
Puskal Pal
Eric William Ragan
Lukas Schaefer
June 8, 2019
Barcelona Graduate School of Economics

Master’s Degree in International Trade, Finance, and Development,
2018/2019
Abstract
We use data on US Facebook friendship networks to study changes in political
opinion during the two most recent presidential elections at the county level.
Our results confirm a strong correlation between social connectivity and political
outcomes. We propose a causal identification strategy to identify the influence
of peer effects on political orientation and obtain initial results that indicate
this possibility. We then explore how distance between counties affects the
magnitude of these peer effects. Our results show that while distance is an
important determinant of political influence, it leaves a lot unexplained. Finally,
we explore how international Facebook connectivity with selected countries and
regions is linked to domestic political outcomes. We find significant correlations
in the case of Mexico, Europe, South America, Africa and Asia.
Contents
1. Introduction
2. Literature Review
3. Assumptions
4. Data
5. Indices
6. Identification Strategy and Results
6.1. Ordinary Least Squares (OLS) Specifications
6.2. Two Stage Least Squares (2SLS) Specifications
7. Distance Effects
8. International Connectivity and Voting Patterns
9. Conclusions and Future Work
A. Appendices
1. Introduction
Social networks are inextricably linked to political opinion and outcomes. They also
play an important role in educational achievement, welfare usage, housing decisions and
many other important social outcomes (Bailey et al., 2016; Bank et al., 1990; Bertrand
et al., 2000). But the relationship between politics and social networks is particularly
perplexing, and any systematic study of it must contend with a number of issues, in
particular issues of an econometric nature. However, the introduction of a new data set by Bailey
et al. (2018) encapsulating the entire Facebook network of the United States facilitates a
more systematic study of this relationship.
Anecdotal evidence that social networks are important is mounting, in large part thanks
to the rise of social media platforms like Facebook and Twitter. One-in-five U.S. adults
regularly gets news from social media, and 39% say that they have engaged in some type of
civic or political activity using social media (Shearer, 2018). Statistics like these suggest
that social media is important, but what do they say about the significance of social
networks beyond the internet?
Facebook friendships are a good representation of actual social networks; individuals
primarily form Facebook links with people they know (Bailey et al., 2018). Thus, the 39% of
American adults who have engaged in political activity on social media have largely done so
with people they personally know or have met. This raises the question: do social networks
influence political views? Estimating this influence using Facebook friendship networks is
particularly meaningful, in light of the growing importance of social media.
Social media is not a one-to-one substitute for conventional news. Unlike traditional
media where information emanates from a small number of outlets, every individual on
social media can produce and consume information. Rather than the Wall Street Journal
dictating the information the country reads, individuals on social media choose the news
they are exposed to. Furthermore, the speed at which information travels over social media is
also determined by how quickly individuals choose to share it.
For the blue-collar worker assembling Jeeps in the Midwestern United States who consumes
her news primarily through Facebook, the people she includes in her network will determine
the slant, quantity, quality, and availability of the information she uses to update her
political preferences. Since her network primarily consists of people in her offline (actual)
network, the individuals around her play a vital role in political preference formation. Pew
Research says that this woman is increasingly representative of the American citizen. For
a researcher interested in peer effects and politics, this not only makes social networks
more important and more relevant, but it suggests an excellent source of data to draw
upon: social media data.
We use Facebook data containing the universe of U.S. friendship links to study the change
in political orientation of 3,000+ counties in the most recent (2016) presidential election.
We ask three primary questions. (1) Do socially connected regions display co-movement in
political orientation? (2) Does social connectivity influence change in political orientation?
(3) Which network connections influence politics?
2. Literature Review
While not a complete taxonomy, research investigating politics and social networks can
broadly be divided into two groups. The first group focuses on how networks form and
how information moves through the network. The second group studies the real effects of
these networks. This group can be further classified into studies using directly observable
networks (incoming college freshmen, for example) and, more recently, those that rely on
large data sets generated on social media platforms like Facebook and Twitter.
Network formation is commonly modeled by considering individuals who make connections
in two primary ways: either by randomly encountering another individual, or through a
search for connections, often facilitated by those to whom the individual is already
connected (meeting friends of friends), as in Jackson and Rogers (2007). Bias in individuals’
location within the network and in their search for new connections can be introduced to
provide a robust platform from which to study homophily, the tendency of individuals to
associate with others similar to themselves (Bramoullé et al., 2012).
Individuals with common characteristics are more likely to associate with one another; this
holds along the lines of racial identity, gender, age, religion, and education (McPherson
et al., 2001). However, as the internet has enhanced the degree of homophily possible
in networks, concerns have arisen around the growing polarization of politics and the
creation of so-called echo chambers, groups that share similar views and experiences which
continually reinforce a group perspective (Sunstein, 2001). In particular, the discussion
has focused on the quality of information now used by voters to form preferences and
make decisions.
The importance of providing voters with high quality information to select and monitor
political candidates has long been established (Becker, 1958). But the effect of the internet,
especially social media, on the information used to make political decisions is unclear. On
one hand, the quantity of information available and the ease with which it may be accessed
has been dramatically expanded. On the other hand, individuals may be self-selecting
into groups that isolate them from important information that conflicts with their current
views.
Halberstam and Knight (2016) present a model with homophily in social networks in
which individuals tend to produce like-minded information. They predict that members
of larger groups will have more connections and be exposed to more information than
smaller groups. Furthermore, the information to which the two groups are exposed will
be disproportionately like-minded. They confirm the predictions of the model using a
sample of some 2.2 million Twitter users from the 2012 presidential election. They suggest
that Twitter networks form in a homophilic manner and that this impacts the availability
and slant of information transmitted over these networks, but this raises the important
question: does this affect real political outcomes?
Any study attempting to make inferential statements about social networks and politics
must confront a fundamental problem, which Manski (1993) calls the reflection problem.
Consider the following example to conceptualize it. Lucas county in Ohio is strongly linked
to ten other counties. In the 2016 election, all eleven counties (including Lucas county)
shifted significantly toward the Republican party. It is tempting to naively conclude that
Lucas county must have changed because the other counties have changed.
But now consider an alternative explanation. Suppose all eleven counties are in the “Rust Belt”
and have high shares of their populations employed in manufacturing. Also suppose that
they formed strong networks as a result of their shared experience in the manufacturing
industry. A protectionist Republican platform appealed to these voters, and rather than
voting Republican because their peers did, they did so independently because they hoped
to benefit from protectionism. Both the strong social networks and the shift in political
preferences are explained by the same variable, and social networks themselves had no
effect on voting.
In actuality, these two different scenarios cannot be distinguished from each other (Angrist,
2014). This is an example of the reflection problem.
    When a researcher observing the distribution of behavior in a population
    tries to infer whether the average behavior in some group influences the behavior
    of the individuals that comprise the group. It is found that inference is not
    possible unless the researcher has prior information specifying the composition
    of reference groups... Inference is difficult to impossible if these variables are
    functionally dependent or are statistically independent (Manski, 1993, p. 531).
Some researchers like Lazer et al. (2009) have developed strategies to circumvent the
reflection problem. They argue that the networks formed by college freshmen across
Midwestern universities in their first months of college are sufficiently random so as to
constitute a natural experiment (sufficiently random insofar as they show no signs of
political homophily). Using initial survey responses to gauge the political orientation of
the students, they then track changes in the students’ politics over the first half of 2008.
They find that the politics of the students’ friends significantly explain changes in the
student’s own politics. The argument hinges on the random formation of the networks
with respect to politics. If this is the case, Manski’s concerns may be obviated.
Other studies confronting the reflection problem similarly require unique and special
circumstances to make inferential statements (Bertrand et al., 2000; Sacerdote, 2001). They
find settings in which they can argue the random formation of networks or the presence
of an instrument that can be exploited to overcome the problem. However, this relegates
these studies to research with relatively small networks, often requires them to have
individualized data, and limits the scope of inferential investigation to those topics possessing
the unique and compelling set of circumstances necessary.
The final class of research considered, which deals with the real effects of social networks
using social media data sets, struggles to implement the same identification strategies
of previous studies. With the emergence of large social media data sets, the reflection
problem (and others) must now be confronted on a massive scale. While the ways in which
factors that drive network formation may also drive political preferences are
innumerable, the project is nonetheless worth undertaking. The novelty of this data
makes our research among the first to investigate aggregate political outcomes with an
accurate and comprehensive measure of the social network structure in the United States.
We attempt to overcome the reflection problem using an instrumental variable approach.
We do not make a definitive causal claim about the effects of social networks on political outcomes,
but rather, we progressively eliminate econometric problems to make the most compelling
causal claim possible given the data set we use.
3. Assumptions
To analyze peer effects and political outcomes at the county level, certain assumptions
were necessary. Here, we list all theoretical assumptions that are implicit in our analysis,
along with their justifications.
1. RD Universe: The universe our study considers is binary; one where all political
preferences, opinions and influences are polarized into Republican (R) or Democrat
(D). Further, we only consider votes in favour of Republican (R) or Democrat (D)
candidates. This is reasonable considering that in the 2016 US presidential elections,
an average of 95% of votes polled in a county were in favour of either R or D. The
fraction was even larger for 2012.
2. Representative Data: County-level data on the number of Facebook users and friendship
links is confidential. Hence, we assume that Facebook user counts and networks are
a sufficiently accurate representation of actual population and friendship networks.
We were informed by Facebook that actual usage rates at the US county level have
a mean of 80% and a standard deviation of 5%. Moreover, Bailey et al. (2018) found no substantial
variation in their results by using county population instead of Facebook user count,
making our assumption reasonable.
3. Counties as agents: The availability of social connectivity data from Facebook at
county level limits the granularity of our analysis. In the construction of indices
(section 5), we treat counties as if they are "agents with homogeneous political
orientation subject to influence of other agents (counties)." This political orientation is
modelled as a continuous quantity given by ∆(R − D)% of the agent (i.e. county) at
any point of time. A positive (negative) value indicates inclination towards Republican
(Democrat). We do not analyze heterogeneity inside a county or the influence
of a county’s population on itself. This assumption is simply a consequence of
Facebook data limitations.
4. Relative stability of social networks: The availability of social connectivity data from
Facebook at a particular point in time (i.e. cross-sectional network data) also limits
our analysis. In the construction of indices (section 5), we implicitly assume that
social network structures remain stable relative to political orientation. This is not
ideal because network data corresponding to April 2016 is used to explain changes
in political orientation from 2012 to 2016. This assumption is another consequence
of limitations imposed by Facebook data and is discussed further in section 9.
5. Focus on change in political orientation: The regressions of section 6 are based on
the effect of changes in political orientation from 2012 to 2016. In other words, signals
that reflect changes in orientation (type 1) are assumed to be infinitely stronger
than signals that represent pre-existing orientation (type 2) or signals that represent
ambiguity (type 3). Signals of type 3 are unlikely to influence political orientation.
However, signals of type 2 cannot be disregarded because members of social networks
continue to receive and process signals from other members of their network despite
the latter’s political orientation remaining unchanged.
At an aggregate level, the signal we considered (∆(R − D)%, which represents the
change in political orientation from 2012 to 2016) has a Pearson correlation of nearly
zero (0.05) with the pre-existing signal that we ignored ((R − D)%, which captures
the level of political orientation in 2012). Thus, signals of types 1 and 2 don’t seem
to interfere with each other. This allows us to address the question: “do socially
connected regions display co-movement in political orientation?” (the first primary
research question stated in section 1), rather than “do socially connected regions have
the same political orientation?” Thus, this assumption is consistent with our research
question (and vice versa). Moreover, as discussed in section 6, the focus on change
removes time-invariant fixed effects, alleviating a few econometric concerns imposed
by data limitations.
4. Data
The most crucial source of data we exploit in our analysis is the Social Connectedness
Index (SCI) data set created by Bailey et al. (2018) for their recently published paper in
the Journal of Economic Perspectives. Their cross-sectional data set provides information
about the relative number of Facebook connections between U.S. county pairs as of April
2016. As stated in assumption 2, the actual number of friendship links is confidential.
To facilitate interpretation in terms of social connectedness, they define the relative
probability of friendship, $RF_{ij}$, as:

$$RF_{ij} = 10^{12} \cdot \frac{SCI_{ij}}{FB\_users_i \cdot FB\_users_j}$$
Due to normalization and scaling (appendix A.1), $RF_{ij}$ has a straightforward interpretation:
if $RF_{AB}$ is twice as large as $RF_{CD}$, a pair of individuals drawn from counties A and B is twice
as likely to be friends as a pair drawn from counties C and D.
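As a minimal sketch of the definition above, the following Python function computes the relative probability of friendship from an SCI value and two user counts. The function name `relative_friendship` and every number in the example are hypothetical (the actual user counts are confidential); only the scaling factor of $10^{12}$ and the ratio itself come from the formula above.

```python
def relative_friendship(sci_ij: float, fb_users_i: float, fb_users_j: float) -> float:
    """Relative probability of friendship: 1e12 * SCI_ij / (users_i * users_j)."""
    if fb_users_i <= 0 or fb_users_j <= 0:
        raise ValueError("user counts must be positive")
    return 1e12 * sci_ij / (fb_users_i * fb_users_j)


# Hypothetical county pairs (illustrative numbers only, not real data).
rf_ab = relative_friendship(sci_ij=4_000, fb_users_i=50_000, fb_users_j=20_000)
rf_cd = relative_friendship(sci_ij=1_000, fb_users_i=25_000, fb_users_j=20_000)

# Only the ratio is interpretable: here a pair drawn from (A, B) is twice
# as likely to be friends as a pair drawn from (C, D).
ratio = rf_ab / rf_cd
```

Because the measure is relative, the absolute values of `rf_ab` and `rf_cd` carry no meaning on their own; only comparisons across county pairs do.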
Initially, there were a total of 9,834,496 ($3{,}136^2$) observations. Throughout our analysis,
we consider only 9,084,196 ($3{,}014^2$) observations, as we eliminated 122 counties out of
3,136 (justification in appendix A.1). To illustrate the relationship between distance and
connectivity, we supplement the SCI data set with 2010 county distance data from the
NBER (appendix A.1).
The SCI data set also includes county-country data from April 2016 for 157 countries.
Along the lines of county-county data, county-country SCI data represents a relative
measure of the total number of Facebook friendship links between individuals of a county
in the US with individuals of a given country. However, this data is normalized and scaled
differently, rendering comparison across countries impossible (appendix A.1 and appendix
A.1).
Our second major data set sources 2012 and 2016 presidential election data at county level
from the MIT Election Data and Science Lab. To carry out subsequent analysis, it was critical
to make our two major data sets dimensionally compatible (appendix A.1). Thus, our final
election data set contains only 3,014 observations, corresponding to the 3,014 counties that
we consider.
To control for other factors that could tilt a county’s political orientation, we make use
of several control variables. Most of the controls added are significant in explaining the
county level political outcomes. By including a rich set of controls, we are able to account
for significant variations, reducing the unexplained component in the dependent variable
of interest, ∆(R − D)% (the change in (R − D)% from 2012 to 2016). Data pertaining to most
of the controls are sourced from the 2013 American Community Survey 5-year estimates,
provided by the Census Bureau. A few other controls (such as unemployment, median
household income) use data released by the Bureau of Labor Statistics (BLS, February
2016). For a more detailed explanation of data cleaning and control variables chosen, refer
to appendix A.1.
5. Indices
Before introducing and explaining the core indices of our paper, consider the setting of
an ideal experiment: a data set containing every connection in the social network along
with the unique personal characteristics for each individual, including their initial political
preferences. Next, a researcher would introduce an exogenous shock in political opinion
to a subset of the network. It would then be straightforward to study the precise trans-
mission of this shock through the rest of the network by measuring the change in political
preferences of every individual over time.
However, several factors limit our ability to implement the ideal experiment just described:
(1) Our data is aggregated at the county level and precludes any experimental design
relying on individual characteristics and variation: results must be studied on the county
level. (2) Our network data is a snapshot in time. (3) Our measure of political preference
is aggregated on the county level and hence is imperfect. Assumptions 3 and 4 address
these issues in the best possible manner given the constraints imposed by Facebook data
limitations.
Our primary index for county i is a weighted average of the change in political opinion
of every other county between 2012 and 2016. For the two periods, we measure the
political preferences of a county by the difference in the percentage shares of Republican
and Democratic votes. The change in political preferences over the period is obtained by
taking the difference of political preferences in 2016 and 2012. Each county j is weighted
by the relative probability of friendship between the two counties and the total number
of voters in county j. The influence of county j on county i is the product of the weight
(relative probability of friendship between i and j × the number of voters in county j)
and the change in political preferences. Summing across all these weighted effects for all
counties (excluding county i) and dividing by the sum of the weights yields our primary
index, $index_{sc,i}$, a measure of political influence exerted on county i by its entire network;
the division preserves the % units, making the index (as well as its β coefficient in the
regressions of section 6) easier to interpret.

$$index_{sc,i} = \frac{\sum_{j=1,\, j \neq i}^{n} RF_{ij} \cdot [\Delta(R-D)\%]_j \cdot v_j}{\sum_{j=1,\, j \neq i}^{n} RF_{ij} \cdot v_j} \tag{5.1}$$
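Equation 5.1 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used for the results: the function name `network_index`, the list-of-lists layout, and the three toy counties are hypothetical, while the weighting (relative friendship probability times number of voters, excluding county i itself) follows the equation.

```python
from typing import Sequence


def network_index(
    i: int,
    weights: Sequence[Sequence[float]],
    outcome: Sequence[float],
    voters: Sequence[float],
) -> float:
    """Weighted average of `outcome` over all counties j != i.

    weights[i][j] plays the role of RF_ij, outcome[j] the change in
    (R - D)% of county j, and voters[j] the number of voters v_j.
    """
    numerator = 0.0
    denominator = 0.0
    for j in range(len(outcome)):
        if j == i:
            continue  # county i is excluded from its own index
        w = weights[i][j] * voters[j]
        numerator += w * outcome[j]
        denominator += w
    if denominator == 0:
        raise ValueError("county has no weighted connections")
    return numerator / denominator


# Toy example with three hypothetical counties.
rf = [
    [0.0, 2.0, 1.0],
    [2.0, 0.0, 4.0],
    [1.0, 4.0, 0.0],
]
delta_rd = [5.0, 10.0, -2.0]   # change in (R - D)%, in percentage points
votes = [100.0, 200.0, 400.0]  # number of voters per county

index_0 = network_index(0, rf, delta_rd, votes)  # (400*10 + 400*(-2)) / 800
```

Dividing by the sum of the weights keeps the result in the same units as `outcome`, so the index can be read as the average change in (R − D)% among the connections of county i.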
This index combines data from both the major data sets (social connectivity data from
Facebook and county-level election data). How important is social connectivity data in
explaining political preferences? The significance of our research hinges on this question.
One way to check is to compare the performance of the above index against an index
that does not use social connectivity data. We now construct an index, $index_{distance}$, that
weights the influence of counties based on distance, a natural driver of social connectivity.
In particular, we use the inverse of the distance instead of relative probability of friendship
to construct this alternative index. We use a simple inverse because it closely matches the
elasticity of friendship probability with respect to distance of -1.07 calculated in Bailey
et al. (2018). This is the simplest index we construct, and it establishes a baseline against
which the usefulness of $index_{sc}$ and social connectivity data can be assessed.

$$index_{distance,i} = \frac{\sum_{j=1,\, j \neq i}^{n} (Distance_{ij})^{-1} \cdot [\Delta(R-D)\%]_j \cdot v_j}{\sum_{j=1,\, j \neq i}^{n} (Distance_{ij})^{-1} \cdot v_j} \tag{5.2}$$
There are a number of logical variations of $index_{distance}$, which we also explore. They
are constructed to test the effects of distance (section 7). In particular, we introduce
a distance threshold that excludes all counties within a specified distance; we consider
thresholds of 100, 300, 600, 1,250, and 2,000 miles. Further, we construct an index that
excludes all counties in the same state. The exact specifications of these indices can be
found in appendix A.2.
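The variations described above amount to setting some weights to zero before the weighted average is taken. The sketch below shows one way to build such weights. The exact specifications are in appendix A.2 and are not reproduced here, so the masking rules, the function name `distance_weights`, and the toy data are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def distance_weights(
    dist: Sequence[Sequence[float]],
    min_miles: float = 0.0,
    state: Optional[Sequence[str]] = None,
) -> List[List[float]]:
    """Inverse-distance weights with optional exclusions.

    Pairs no farther apart than `min_miles` get weight 0 (distance threshold).
    If `state` is given, pairs in the same state also get weight 0.
    The diagonal is always 0, so a county never weights itself.
    """
    n = len(dist)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or dist[i][j] <= min_miles:
                continue  # excluded by the threshold (also guards against zero distance)
            if state is not None and state[i] == state[j]:
                continue  # excluded because both counties are in the same state
            w[i][j] = 1.0 / dist[i][j]
    return w


# Three hypothetical counties: 0 and 1 are close and share a state.
dist = [
    [0, 50, 400],
    [50, 0, 380],
    [400, 380, 0],
]
state = ["OH", "OH", "MI"]

w_all = distance_weights(dist)                    # plain inverse distance
w_100 = distance_weights(dist, min_miles=100)     # drop counties within 100 miles
w_nostate = distance_weights(dist, state=state)   # drop same-state counties
```

The resulting matrix can replace the friendship weights in the weighted average of equation 5.2, which is how the threshold and same-state variants differ from the baseline distance index in this sketch.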
We also construct indices of international connectivity that can be used to study the
relationship between foreign countries and regions and the change in political preferences
of U.S. counties (section 8). See appendix A.2 for the exact specification and discussion of
these indices.
Finally, we construct an index, $index_{male}$, that substitutes the percentage share of the
county population made up by males for the change in political preferences between 2012
and 2016. $index_{male,i}$ represents the average male percentage share of the connections of
county i. It will later be used as an instrument for $index_{sc}$ in section 6.

$$index_{male,i} = \frac{\sum_{j=1,\, j \neq i}^{n} RF_{ij} \cdot [Pop\_Male\%]_j \cdot v_j}{\sum_{j=1,\, j \neq i}^{n} RF_{ij} \cdot v_j} \tag{5.3}$$
6. Identification Strategy and Results
Having discussed the data and primary indices, we now try to establish co-movement in
political orientation among connected regions and propose a strategy to identify a causal
relationship between social connectivity and changes in political orientation. The variable
of interest is our constructed index, $index_{sc,i}$, a measure of the aggregate influence
exerted on county i. The dependent variable is $\Delta(R-D)\%_i$, county i’s change in political
preferences between 2012 and 2016. Using the change in political outcome between 2012
and 2016 rather than the outcome of a single election removes time-invariant county fixed
effects and is also consistent with assumption 5 in section 3. β represents the average
change in voting behaviour between 2012 and 2016 associated with a 1 percent change in
socio-political influence, as measured by $index_{sc}$. Throughout this section, the emphasis
is on the significance of β. Refer to the appendices for the choice of control variables
(appendix A.1) and for complete regression results that show the direction and
significance of controls and state dummies (appendix A.3). All results were obtained in
Stata 15.1 using the ‘regress’ and ‘ivregress’ commands.
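To see why working with changes removes time-invariant county effects, consider a stylized level equation. This is an illustrative assumption, not a specification estimated in this paper: $\alpha_i$ is a fixed county effect, $\gamma_t$ a common election-year effect, and $u_{i,t}$ the remaining variation.

$$(R-D)\%_{i,t} = \alpha_i + \gamma_t + u_{i,t}, \qquad t \in \{2012, 2016\}$$

Subtracting the 2012 equation from the 2016 equation gives

$$\Delta(R-D)\%_i = (R-D)\%_{i,2016} - (R-D)\%_{i,2012} = (\gamma_{2016} - \gamma_{2012}) + (u_{i,2016} - u_{i,2012})$$

The county effect $\alpha_i$ appears in both years with the same value and therefore cancels, so any county characteristic that is constant between the two elections cannot bias a regression in differences.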
6.1. Ordinary Least Squares (OLS) Specifications
We begin with an OLS regression of the dependent variable on our variable of interest:

$$\Delta(R-D)\%_i = \beta \cdot index_{sc,i} + \varepsilon_i \tag{6.1.1}$$
This regression yields a highly significant β of 1.737 (table 1) which points to a possibly
strong correlation. We add relevant controls to arrive at:
$$\Delta(R-D)\%_i = \beta \cdot index_{sc,i} + Controls_i + \varepsilon_i \tag{6.1.2}$$
There are likely state specific factors which may confound the relationship of interest. In
the United States, individual states differ in history, legal system, geography, and many
other factors. Bailey et al. (2018) present strong evidence that networks form along state
lines, and it seems likely that political preferences are also partly determined by factors
unique to individual states. To control for state fixed effects, we include dummy variables
for each state in the next regression:

$$\Delta(R-D)\%_i = \beta \cdot index_{sc,i} + StateDummies + \varepsilon_i \tag{6.1.3}$$
If social networks are correlated with political preferences, the most stringent test for β
is to remain positive and significant even after the inclusion of both controls and state
dummies:
$$\Delta(R-D)\%_i = \beta \cdot index_{sc,i} + Controls_i + StateDummies + \varepsilon_i \tag{6.1.4}$$
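The role of the state dummies can be illustrated with a small Python sketch. For a single regressor, as in specification 6.1.3, including one dummy per state gives the same slope as demeaning both variables within each state (the Frisch-Waugh-Lovell result). The function names and the six toy observations are hypothetical, and the sketch is not a substitute for Stata's `regress`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Slope from OLS of y on x with an intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_within(values: Sequence[float], groups: Sequence[str]) -> List[float]:
    """Subtract each group's mean from its members (within transformation)."""
    total: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for v, g in zip(values, groups):
        total[g] += v
        count[g] += 1
    return [v - total[g] / count[g] for v, g in zip(values, groups)]


def fe_slope(x: Sequence[float], y: Sequence[float], groups: Sequence[str]) -> float:
    """Slope of y on x with one dummy per group (single-regressor case)."""
    return ols_slope(demean_within(x, groups), demean_within(y, groups))


# Toy data: y = 1.5 * x plus a state effect of 10 in state "B".
# State "B" also has larger x, so the pooled slope is confounded.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
groups = ["A", "A", "A", "B", "B", "B"]
y = [1.5, 3.0, 4.5, 16.0, 17.5, 19.0]

pooled = ols_slope(x, y)         # pulled upward by the state effect
within = fe_slope(x, y, groups)  # recovers the slope of 1.5
```

In the toy data the pooled slope absorbs the difference between the two states, while the within-state slope does not, which is the kind of confounding the state dummies are meant to remove.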
Table 1 summarizes the results of all four OLS specifications above. We note that including
$index_{sc}$ clearly improves the explanatory power. While the inclusion of control variables
and state dummies reduces the significance of β, it still remains statistically significant.
The last result, in particular, provides strong evidence that socially connected counties
display co-movement, answering our first primary question stated in the introduction
(section 1).
Table 1: OLS Regression

| Specification | Description | β (SE) of $index_{sc}$ | t-statistic | p-value | Adj. R² (%) |
|---|---|---|---|---|---|
| 6.1.1 | No Controls, No SFE | 1.737 (0.021) | 83.51 | <0.001 | 69.83 |
| 6.1.2 | Controls, No SFE | 1.381 (0.020) | 69.01 | <0.001 | 81.87 |
| 6.1.3 | No Controls, SFE | 2.040 (0.035) | 58.40 | <0.001 | 72.50 |
| 6.1.4 | Controls, SFE | 1.293 (0.038) | 34.26 | <0.001 | 83.61 |

All coefficients show the change in the difference of Republican and Democratic vote shares between 2012
and 2016 associated with a one percent increase in $index_{sc}$. SFE denotes state fixed effects (state
dummies). A more complete presentation of the regression tables is included in appendix A.3.
We now turn towards the second primary question: does social connectivity influence
change in political orientation? While the last result above demonstrates co-movement
among counties, it is not sufficient to conclude that political change was driven by the net-
work itself. Any causal statement about the role social networks play in this co-movement
is impeded by endogeneity problems, as discussed in section 2. In order to strengthen the
identification approach presented so far, we now adopt an instrumental variable strategy to
improve the consistency of our estimator and attempt to move towards a causal statement
about the influence of social networks on political orientation.
6.2. Two Stage Least Squares (2SLS) Specifications
If there are endogeneity problems with $index_{sc}$, an instrumental variable can help address
underlying causes of inconsistency in estimation. In this section, we instrument $index_{sc}$
with $index_{male}$, constructed as described in equation 5.3. Before using the instrument,
we consider two issues: (1) relevance and (2) exogeneity.
The instrument derives its relevance from the tendency of counties with a higher share
of men to become more Republican from 2012 to 2016. Specification 6.1.2 indicates
this through the significance of male_population_pct, which can be seen in figure 5.
The variable represents the percentage male population share (as described in appendix A.1). By
substituting male population share for the change in political preferences, we produce an
instrument correlated with $index_{sc}$. As the results below indicate, the instrument passes
relevance tests.
Exogeneity requirements of the instrument dictate that it not be directly associated with
the dependent variable. It should only affect the latter through the regressor it
instruments for (Kolesár et al., 2017). This indirect effect follows from the tendency of men
to shift toward the Republican end of the political spectrum. It seems implausible that the population
percentage of men in county A could affect political preferences in another county B in
any way other than through a social network. One could object that adjacent counties
present potential issues. Workers from a highly male populated county C that work in
county D could directly influence opinions in county D through personal relationships.
However, even if this were true, these networks are likely to be reflected in the Facebook
network. Furthermore, the probability of finding adjacent counties with a significantly
large variation in male share is minimal. The 2SLS regressions follow the same sequence
of regressions as in the OLS case, in terms of inclusion/exclusion of controls and state
dummies. The specifications are listed below for the convenience of the reader:
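$$\Delta(R-D)\%_i = \beta \cdot \underbrace{index_{sc,i}}_{index_{male}\ \text{as IV}} + \varepsilon_i \tag{6.2.1}$$

$$\Delta(R-D)\%_i = \beta \cdot \underbrace{index_{sc,i}}_{index_{male}\ \text{as IV}} + Controls_i + \varepsilon_i \tag{6.2.2}$$

$$\Delta(R-D)\%_i = \beta \cdot \underbrace{index_{sc,i}}_{index_{male}\ \text{as IV}} + StateDummies + \varepsilon_i \tag{6.2.3}$$

$$\Delta(R-D)\%_i = \beta \cdot \underbrace{index_{sc,i}}_{index_{male}\ \text{as IV}} + Controls_i + StateDummies + \varepsilon_i \tag{6.2.4}$$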
Table 2: 2SLS Regression

| Specification | β (SE) of $index_{sc}$ | t-statistic | p-value | Adj. R² (%) | Durbin score | Durbin p-value |
|---|---|---|---|---|---|---|
| 6.2.1 | 2.628 (0.160) | 16.46 | <0.001 | 51.49 | 51.46 | <0.001 |
| 6.2.2 | 2.305 (0.175) | 13.17 | <0.001 | 69.09 | 48.76 | <0.001 |
| 6.2.3 | 2.158 (0.094) | 22.96 | <0.001 | 72.86 | 1.84 | 0.1747 |
| 6.2.4 | 1.425 (0.224) | 6.37 | <0.001 | 83.86 | 0.36 | 0.5472 |

The results of the 2SLS regressions are summarized in table 2. For brevity, we report
only the Durbin score and its significance (the Wu-Hausman statistic yielded similar results in all
cases). We drop the “Description” column as it remains unchanged from table 1. The
F-test statistic for weak instrument assessment was 85.21 (for specification 6.2.4), well
above the Stock-Yogo critical value for relevance, which is 16.38. This confirms relevance
and means that our instrument is valid as long as exogeneity holds. If exogeneity holds,
the most conservative estimate of the causal impact of peer effects on political outcomes
is 1.425%, obtained in the last specification above.
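The logic of the instrument and of the weak-instrument check can be sketched on simulated data. The Python code below is an illustration only: it covers the case without controls or state dummies, uses a homoskedastic first-stage F statistic, and all names, the seed, and the simulated coefficients (a true slope of 1.4) are assumptions, not the thesis data or Stata's `ivregress` output.

```python
import random
from typing import List, Sequence


def centered(values: Sequence[float]) -> List[float]:
    """Subtract the sample mean, which absorbs the intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def ols_slope(y: Sequence[float], x: Sequence[float]) -> float:
    xc, yc = centered(x), centered(y)
    return dot(xc, yc) / dot(xc, xc)


def iv_slope(y: Sequence[float], x: Sequence[float], z: Sequence[float]) -> float:
    """Just-identified IV (2SLS) slope with an intercept: cov(z, y) / cov(z, x)."""
    yc, xc, zc = centered(y), centered(x), centered(z)
    return dot(zc, yc) / dot(zc, xc)


def first_stage_f(x: Sequence[float], z: Sequence[float]) -> float:
    """F statistic on the instrument in the first stage x = a + b*z + e.

    With a single instrument this equals the squared t statistic on b.
    """
    n = len(x)
    xc, zc = centered(x), centered(z)
    szz = dot(zc, zc)
    b = dot(zc, xc) / szz
    rss = sum((xv - b * zv) ** 2 for xv, zv in zip(xc, zc))
    sigma2 = rss / (n - 2)
    return b * b * szz / sigma2


# Simulated data: u is an unobserved confounder that drives both x and y.
rng = random.Random(0)
n = 2000
u = [rng.gauss(0, 1) for _ in range(n)]
z = [rng.gauss(0, 1) for _ in range(n)]      # instrument, independent of u
x = [zi + ui for zi, ui in zip(z, u)]        # endogenous regressor
y = [1.4 * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

beta_ols = ols_slope(y, x)     # biased upward by the confounder
beta_iv = iv_slope(y, x, z)    # close to the true slope of 1.4
f_stat = first_stage_f(x, z)   # compare with the 16.38 threshold cited above
```

In this simulation the instrument is strong by construction, so the first-stage F statistic lies far above the threshold; with a weak instrument the IV estimate would be much noisier and the comparison with OLS less informative.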
We also note (based on the Durbin scores and their corresponding p-values, and assuming the
instrument’s exogeneity, which is necessary to carry out the Durbin-Wu-Hausman test) that the test
no longer rejects the exogeneity of $index_{sc}$ once the state-level dummies are included. This is not the case
when only controls are included. Thus, the unobserved error terms seem to contain a lot
of state-level variation, possibly due to the federal structure of the U.S. and the consequent
variation in crucial policies and laws at the state level. One likely source of endogeneity
in the error term is therefore state-specific effects. The significant improvement in explanatory
power from the inclusion of state dummies is cause to suspect this.
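The idea behind the exogeneity test can be sketched with its regression-based (control function) form: the first-stage residuals are added to the outcome regression, and a significant coefficient on them signals endogeneity. This is closely related to, but not the same statistic as, the Durbin score reported in table 2. The function names, the seed, and the simulated data are assumptions made for illustration.

```python
import random
from typing import List, Sequence


def residuals(y: Sequence[float], x: Sequence[float]) -> List[float]:
    """Residuals from OLS of y on an intercept and x."""
    n = len(y)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    return [(c - my) - b * (a - mx) for a, c in zip(x, y)]


def wu_hausman_t(y: Sequence[float], x: Sequence[float], z: Sequence[float]) -> float:
    """t statistic on the first-stage residual in y = a + b*x + c*v_hat + e."""
    n = len(y)
    v_hat = residuals(x, z)  # first stage: part of x not explained by z
    # Frisch-Waugh-Lovell: partial the intercept and x out of y and v_hat.
    y_t = residuals(y, x)
    v_t = residuals(v_hat, x)
    svv = sum(a * a for a in v_t)
    c = sum(a * b for a, b in zip(v_t, y_t)) / svv
    rss = sum((b - c * a) ** 2 for a, b in zip(v_t, y_t))
    sigma2 = rss / (n - 3)  # three estimated parameters: a, b, c
    return c / (sigma2 / svv) ** 0.5


rng = random.Random(1)
n = 2000
u = [rng.gauss(0, 1) for _ in range(n)]
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + ui for zi, ui in zip(z, u)]

# Endogenous case: the confounder u enters the outcome directly.
y_endog = [1.4 * xi + 2.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
# Exogenous case: the outcome depends on x and independent noise only.
y_exog = [1.4 * xi + rng.gauss(0, 1) for xi in x]

t_endog = wu_hausman_t(y_endog, x, z)  # large in absolute value
t_exog = wu_hausman_t(y_exog, x, z)    # small in absolute value
```

As in the text, the test is only meaningful if the instrument itself is exogenous; a small statistic then indicates that OLS and IV estimates do not differ systematically.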
To control for this potential issue, we modify our instrument so that it excludes the
male share of those counties j that belong to the same state as county i, the county for
which indexsc is constructed. The modified instrument is very similar to indexmale
(equation 5.3); we name it indexmale−nostate to emphasize that its computation excludes
counties from within the same state.
\[ \text{index}_{male-nostate,i} = \frac{\sum_{j=1,\, j \notin state(county_i)}^{n} RF_{ij} \cdot [Pop\_Male\%]_j \cdot v_j}{\sum_{j=1,\, j \notin state(county_i)}^{n} RF_{ij} \cdot v_j} \tag{5.3} \]
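The construction of indexmale−nostate can be sketched as a weighted average over out-of-state counties. The snippet below is only an illustration on a four-county toy example: the function name `index_male_nostate`, the array layout, and all numbers are assumptions, and `v` stands for the weights vj as defined in section 5.

```python
import numpy as np


def index_male_nostate(rf, male_share, v, state):
    """Weighted average of male share over counties outside county i's state.

    rf         : (n, n) array, relative probability of friendship RF_ij
    male_share : (n,) array, [Pop_Male%]_j
    v          : (n,) array, weights v_j
    state      : length-n sequence of state labels
    """
    state = np.asarray(state)
    # True where county j lies in a different state than county i
    # (this also excludes j == i, since a county shares a state with itself).
    other_state = state[:, None] != state[None, :]
    weights = rf * v[None, :] * other_state
    return (weights @ male_share) / weights.sum(axis=1)


# Toy data (hypothetical): two states with two counties each.
rf = np.ones((4, 4))
male_share = np.array([0.48, 0.50, 0.52, 0.50])
v = np.array([1.0, 1.0, 1.0, 3.0])
state = ["A", "A", "B", "B"]

index = index_male_nostate(rf, male_share, v, state)
print(index)
```

For the counties in state A only the two counties in state B enter the average, and vice versa, so no within-state information reaches the instrument.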
This exclusion restriction improves the validity of the instrument, in terms of its likely
exogeneity, compared to indexmale. However, whether it remains relevant needs to be
tested. Again, all the specifications are listed below for the convenience of the
reader:
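\[ \Delta(R-D)\%_i = \beta \ast \underbrace{\text{index}_{sc,i}}_{\text{index}_{male-nostate}\text{ as IV}} + \epsilon_i \tag{6.2.5} \]
\[ \Delta(R-D)\%_i = \beta \ast \underbrace{\text{index}_{sc,i}}_{\text{index}_{male-nostate}\text{ as IV}} + \text{Controls}_i + \epsilon_i \tag{6.2.6} \]
\[ \Delta(R-D)\%_i = \beta \ast \underbrace{\text{index}_{sc,i}}_{\text{index}_{male-nostate}\text{ as IV}} + \text{StateDummies} + \epsilon_i \tag{6.2.7} \]
\[ \Delta(R-D)\%_i = \beta \ast \underbrace{\text{index}_{sc,i}}_{\text{index}_{male-nostate}\text{ as IV}} + \text{Controls}_i + \text{StateDummies} + \epsilon_i \tag{6.2.8} \]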
The results of the 2SLS regressions are summarized in table 3. The F-test statistic for
weak instrument assessment is 39.81 for specification 6.2.8, down from 85.21 with
indexmale but still above the Stock-Yogo critical value of 16.38, so the modified
instrument remains relevant. Using the weaker instrument indexmale−nostate has clear
effects: the Durbin scores now indicate exogeneity of indexsc only after both controls and
state-level dummies are included, whereas with the earlier instrument, indexmale,
state-level dummies alone were sufficient. If exogeneity holds, the most conservative
estimate of the causal impact of peer effects on political outcomes is 1.361%, obtained in
specification 6.2.8.

Table 3: 2SLS Regression: Modified Instrument

Specification   β (SE) of indexsc   t-statistic   p-value   Adj R2 (%)   Durbin score   Durbin p-value
6.2.5           2.512 (0.129)       19.45         <0.001    55.95        54.60          <0.001
6.2.6           2.376 (0.213)       11.14         <0.001    67.03        40.36          <0.001
6.2.7           2.317 (0.098)       23.62         <0.001    72.39         9.32          0.002
6.2.8           1.361 (0.394)        3.45         <0.001    83.91         0.03          0.861

The results above point toward a causal effect of peer effects and social networks on
political outcomes. However, a few caveats are in order. While we expect the male
population percentage of counties outside one’s state to influence political outcomes only
through peer effects, it may still be correlated with certain components of the unobserved
county-specific error (Bramoullé et al., 2009). Moreover, we use the same weights to
construct indexsc and either instrument, indexmale or indexmale−nostate. To mitigate the
possibility of spurious correlation or relevance, we normalize all indices to remove the
effect of the weights. We also exclude counties within the same state when constructing
indexmale−nostate. Section 9 discusses further ways to improve the instrument and
identification strategy. We conclude this section with a pair of reduced-form regressions
demonstrating a significant correlation between each of the instruments and the dependent
variable. The specifications and corresponding results are stated below:
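\[ \Delta(R-D)\%_i = \beta_{IV1} \ast \text{index}_{male,i} + \text{Controls}_i + \text{StateDummies} + \epsilon_i \tag{6.2.9} \]
\[ \Delta(R-D)\%_i = \beta_{IV2} \ast \text{index}_{male-nostate,i} + \text{Controls}_i + \text{StateDummies} + \epsilon_i \tag{6.2.10} \]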
Table 4: OLS Regression: Reduced Form

Instrument          β (SE) of instrument   t-statistic   p-value   Adj R2 (%)
indexmale           0.020 (0.004)          5.38          <0.001    77.32
indexmale−nostate   0.018 (0.006)          2.90          0.004     77.16
The reduced-form results reinforce the relevance of both instruments used earlier. As one
would expect, indexmale−nostate is less significant than indexmale. Based on our results,
causal effects of peer effects and social networks on political outcomes seem plausible.
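Because each 2SLS specification uses a single instrument, and specifications 6.2.4 and 6.2.9 (likewise 6.2.8 and 6.2.10) share the same controls and state dummies, the reduced-form coefficient equals the first-stage coefficient times the 2SLS coefficient. As a rough back-of-the-envelope check, assuming both regressions use the same sample, the first-stage coefficients implied by the rounded table values are shown below; these first-stage coefficients (denoted π here, a symbol introduced only for this illustration) are not reported in the text.

```latex
\hat{\beta}_{2SLS} = \frac{\hat{\beta}_{IV}}{\hat{\pi}}
\quad\Longrightarrow\quad
\hat{\pi}_{1} \approx \frac{0.020}{1.425} \approx 0.014,
\qquad
\hat{\pi}_{2} \approx \frac{0.018}{1.361} \approx 0.013
```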
7. Distance Effects
Individuals mostly form connections with others who live close to them (Graham, 2017).
It is not surprising that this regularity is reflected in Facebook networks which, as
previously mentioned, tend to approximate actual social networks well. For the average
individual, 40% of the total US population lives no further than 500 miles away, while
80% of their friend network lives within the same radius; the percentage is higher in
rural communities and lower in urban areas (Bailey et al., 2018).
The high correlation between distance and the formation of friendship links challenges
the relevance of our research. Our thesis would be trivial if the indices we construct
from the relative probability of friendship to study political change within the network
had no more explanatory power than a similar index that simply weights the connections
between counties by the distance between them.
We compare indexsc to indexdistance constructed in section 5 to address this concern.
Consider the two regression models below. Specification 7.1 is general: we run it once using
indexsc and again using indexdistance. In model 7.2 we use both indices simultaneously
and are interested in how that might change the coefficients, their significance, and
t-statistics.
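\[ \Delta(R-D)\%_i = \beta \ast \text{index}_i + \text{Controls}_i + \text{StateDummies} + \epsilon_i \tag{7.1} \]
\[ \Delta(R-D)\%_i = \beta_1 \ast \text{index}_{sc,i} + \beta_2 \ast \text{index}_{distance,i} + \text{Controls}_i + \text{StateDummies} + \epsilon_i \tag{7.2} \]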
Table 5: Distance Regression

Index           Specification   β (SE)         t-statistic   p-value
indexsc         7.1             1.29 (0.038)   34.26         <0.001
indexdistance   7.1             2.51 (0.168)   14.98         <0.001
indexsc         7.2             1.28 (0.043)   29.70         <0.001
indexdistance   7.2             0.06 (0.169)    0.36         0.720

Both indexsc and indexdistance significantly explain changes in voting when included
individually. This is unsurprising given the results established in section 6 and a
correlation of 0.76 between indexdistance and indexsc. However, the large coefficient for
indexdistance is slightly misleading, and comparing the t-statistics for the two runs of
7.1 suggests that indexsc is actually more significant in explaining the outcome variable.
In regression model 7.2, indexdistance loses its significance while indexsc remains highly
significant, providing further evidence that indexsc has superior explanatory power to
indexdistance.
The superiority of indexsc is driven by a more informative method of weighting the links
between distant counties. For counties close to one another, indexsc and indexdistance
produce essentially identical estimates of connection and influence; this is simply a
consequence of the fact that “people tend to make friends with their neighbours.” While
all counties are strongly linked to other counties close to them, the Facebook data set
from Bailey et al. (2018) reveals that many counties also share strong connections with a
number of geographically distant counties. The index constructed using Facebook data
accounts for these connections, while the index using distance fails to utilize this
information and instead treats all distant counties identically. Thus, the additional
influence that indexsc explains is the influence exerted by individuals located on the
periphery of a county’s network.
County-level heat maps (figure 1) help to visualize these results. We observe that indexsc
replicates the actual outcome remarkably closely, whereas indexdistance approximates the
actual change in political opinion very poorly. The maps not only confirm the regression
output visually but also make it apparent why indexdistance loses its significance when
regressed together with indexsc.
Figure 1: Comparison of indexdistance and indexsc,i with respect to ∆(R − D)%i
The ability of distant connections to influence the politics of a county is a surprising
result. Bond et al. (2012) present strong evidence from a randomized controlled trial
with over 61 million Facebook users that voting behavior can be transmitted over social
networks, but that the transmission happens almost exclusively between close friends who
are likely to have face-to-face relationships. However, our results seem to suggest that
distant connections are also capable of transmitting political preferences and influencing
voting.
We also modify indexsc, introduced in section 5, to investigate the effects of distant
network connections. We introduce a distance threshold and exclude from the calculation of
the index all friendship links to counties within that threshold; we consider thresholds of
100, 300, 600, 1,250, and 2,000 miles.
Table 6: Distance Regression

Index             β (SE)          t-statistic   p-value   Adj R2 (%)
Controls and FE   NA              NA            NA        77.11
indexsc           1.29 (0.038)    34.26         <0.001    83.61
indexsc−100       1.29 (0.082)    16.57         <0.001    79.05
indexsc−300       2.01 (0.107)    18.77         <0.001    79.54
indexsc−600       2.12 (0.130)    16.21         <0.001    78.97
indexsc−1250      0.38 (0.033)    11.57         <0.001    78.09
indexsc−2000      0.028 (0.042)    0.67         0.504     77.10
indexsc−nostate   1.20 (0.050)    23.82         <0.001    80.79
indexdistance     2.51 (0.168)    14.98         <0.001    78.71
The results of the regressions using indexsc with the distance threshold imposed confirm
what we began to suspect in the earlier regressions comparing indexsc and indexdistance.
While the network exerts less political influence when only distant friends are considered,
the rate at which this influence decays is surprisingly slow. For all thresholds up to 600
miles, the coefficient is significant and the adjusted R-squared is actually greater than
that of indexdistance. indexsc loses significance only after imposing the highest threshold
of 2,000 miles. This is strong evidence that distant friends are also important and may
transmit political preferences.
8. International Connectivity and Voting Patterns
If distant within-country connections have a significant association with the political
opinions of US citizens (the conclusion of section 7), a logical question to ask is whether
foreign connections share a similar association. We construct international connectivity
indices (ICIs) for countries (China, India, Mexico and Russia) and regions (Europe,
South America, Africa and Asia) to study their relationship to US politics. The indices
are based entirely on Facebook friendship links between domestic US users and foreign
Facebook users living in the countries and regions considered. However, we attempt no
causal statement about foreign influence on US politics; we use the ICIs only to identify
patterns (i.e. correlations) in political outcomes. Details regarding the construction and
intuition of the international indices are contained in appendix A.2.
We first run a baseline OLS regression using the same set of controls and state-level
dummies as in section 6. indexsc, which has established its significance in terms of
capturing the effect of domestic social networks, is included and serves as a control for
the ICIs added afterwards.
The baseline specification is:
\[ \Delta(R-D)\%_i = \beta_{sc} \ast \text{index}_{sc,i} + \text{Controls}_i + \text{StateDummies} + \epsilon_i \tag{8.1} \]
As all ICIs represent non-intersecting territories (for example: indexasia excludes
connectivity with Facebook users in China and India, which are captured by indexchina and
indexindia), we perform an OLS regression adding all eight ICIs simultaneously.
The specification now becomes:
\[ \Delta(R-D)\%_i = \beta_{sc} \ast \text{index}_{sc,i} + \sum_{p=1}^{8} \beta_p \ast ICI_{p,i} + \text{Controls}_i + \text{StateDummies} + \epsilon_i \tag{8.2} \]
The results of the above regressions are summarized in table 7.
indexsc becomes slightly less significant in the presence of the ICIs, suggesting that some
of the changes in political preference attributed to domestic social networks may be
co-transmitted through international social networks. This is plausible because domestic
social networks, although stronger, are not exclusive and overlap with international
social networks. Among the ICIs, indexmexico, indexeurope, indexsoutham, indexafrica and
indexasia are all significant, even in the presence of controls and state-level dummies.
The map below (figure 2) illustrates the explanatory power of two ICIs: indexmexico and
indexafrica.
Table 7: 2SLS Regression Reduced Form

Index          Specification   β (SE)           t-statistic   p-value
indexsc        8.1              1.293 (0.038)   34.26         <0.001
indexsc        8.2              1.193 (0.037)   32.18         <0.001
indexindia     8.2             -0.015 (0.021)   -0.73         0.468
indexchina     8.2             -0.030 (0.026)   -1.15         0.251
indexmexico    8.2             -0.161 (0.016)   -9.80         <0.001
indexrussia    8.2              0.003 (0.022)    0.15         0.878
indexeurope    8.2              0.259 (0.023)   11.43         <0.001
indexsoutham   8.2             -0.049 (0.019)   -2.62         0.009
indexafrica    8.2             -0.086 (0.017)   -4.97         <0.001
indexasia      8.2             -0.084 (0.025)   -3.32         0.001
Figure 2: International Connectivity
We can see that connectivity with Mexico and connectivity with Africa differ considerably
in their geographic distribution. However, both are significantly linked to counties that
shifted their political orientation towards the Democrats (i.e. away from the Republicans).
The map explains why: connectivity with Mexico is higher in the South-West (close to
Mexico), where the general tendency to shift towards the Democrats is the strongest.
Connectivity with Africa is more geographically dispersed but is clearly weaker in the
northern states, where the tendency to shift towards the Republicans is the strongest.
Unlike the case of Mexico, there is also a significant presence along the eastern coast,
where there is no particularly clear trend towards either party.
9. Conclusions and Future Work
While a substantial body of work already investigates the formation of social networks
and their effects, it is not obvious that its conclusions apply to the modern age.
Additionally, new data sets such as the one from Bailey et al. (2018) make it possible to
study real social networks on an aggregate level for the first time.
The idea that social networks directly influence political preferences and decision making is
highly intuitive, almost self-evident. Our work overcomes some of the difficulties inherent
to studying this phenomenon. We demonstrate a strong tendency of US counties to co-move
in political preferences within their nation-wide social network. Our 2SLS results suggest
that this co-movement is not purely a result of homophily but is in part caused by the
transmission of political preferences through social networks. Surprisingly, this
transmission appears to occur even from distant subsets of the network, indicating that
even distant connections can exert political influence. Finally, there is limited evidence
that connections between US residents and residents of foreign countries are associated,
to a rather small degree, with domestic political outcomes.
While we are among the first to study changes in political opinion with social media
data providing a comprehensive picture of aggregate network structures, there are many
potential ways to extend this analysis. We use this data to contextualize the behaviour
of voters and move toward a more complete understanding of how political preferences
communicated through social networks ultimately affect political outcomes. One major
difficulty throughout our analysis was the cross-sectional nature of the Facebook data (a
single observation year of the network structure).
In the future, researchers will have the opportunity to explore the network structure in
panel data form as more observation years are added. This will increase the number
of research designs possible; we consider one such design. An unexpected change in a
county’s social network (perhaps precipitated by the opening of a nationally popular ski
resort) would provide a natural experiment which could be compellingly used to investigate
changes in political preferences caused by the change in social connection. This is another
method of circumventing the reflection problem.
Researchers would also benefit immensely from more granular information in future
releases of Facebook data. A random sample of anonymized political data from each county
would enable a more accurate measure of both a county’s political orientation and its
precise political network structure. This data could be used to test models of network
formation and homophily at a national level. Researchers would also benefit from the
provision of data at the level of congressional districts, enabling elections to the House of
Representatives to also be studied.
As social media becomes a more important part of the way human beings connect with
one another, so do the underlying social networks that it represents. Research in
this area is fascinating and will only grow in importance over the coming years.
A. Appendices
A.1.
SCI: The “Social Connectedness Index” (SCI) is a measure of county-county
connectedness. It is calculated as the sum of all individual active Facebook connections
between any US county pair. Focusing only on active users adds to the credibility of the
data, as it excludes all users who were inactive in the 30 days before measurement.
Bailey et al. (2018) rescale all SCI values relative to the county pair with the largest
number of friendship links, which is assigned a maximum value of 1,000,000.
RFij: This index accounts for county sizes, meaning that Bailey et al. (2018) normalize
the SCI by the product of the numbers of Facebook users in the two counties of a pair. We
have tested, and Bailey et al. (2018) have specified in their paper, that normalizing by
the number of Facebook users instead of the population does not alter the results in any
meaningful way. Due to the previous rescaling of SCI, rescaling RFij by 10^12 is needed to
minimize the number of decimal places. This does not disturb its interpretation.
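As a minimal sketch of this normalization, the snippet below divides each SCI value by the product of the two counties' Facebook user counts and applies the 10^12 rescaling. The function name `relative_friendship` and the two-county numbers are hypothetical and serve only to illustrate the arithmetic.

```python
import numpy as np

RESCALE = 1e12  # rescaling factor mentioned in the text


def relative_friendship(sci, users):
    """RF_ij = SCI_ij / (users_i * users_j), rescaled by 10^12."""
    return sci / np.outer(users, users) * RESCALE


# Toy data (hypothetical): SCI matrix and Facebook user counts for two counties.
sci = np.array([[1_000_000.0, 250.0],
                [250.0, 400_000.0]])
users = np.array([50_000.0, 20_000.0])

rf = relative_friendship(sci, users)
print(rf)
```

Because the SCI matrix and the outer product of user counts are both symmetric, the resulting RF matrix is symmetric as well.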
International SCI: In addition to domestic county-county data, Bailey et al. (2018)
provide us with county-country data. There are pairwise SCI and RFij values between
all 3,136 counties and 157 countries. For each country, the SCI of the county with the most
connections to that country is normalized to 1,000,000. As a result, it is impossible to
compare the strength of connection across countries. While one can say that Lucas county
in Ohio is more connected to Spain than Lima county is, it is not possible to say whether
Lucas county is more connected to Spain than to Germany. The relative probability of
friendship for a county-country pair is normalized only by the county population, as a
country’s population is the same for every county.
Election data: We obtained Presidential election data from the MIT Election Data and
Science Lab at county level for the years 2004 until 2016; we only use 2012 and 2016 data.
Within that time period some independent cities have merged with counties or vice versa.
We accounted for such changes by using Census Bureau information that allowed us to
determine which FIPS codes we needed to merge through information on population sizes.
The election data is crucial as it allows us to merge connectivity with political opinion. In
particular, since our focus is mainly to explain the change in political preferences we take
first differences from 2012 to 2016 and add a column called did2012_2016 to the dataset,
which throughout will represent our dependent variable.
Manipulations: The domestic SCI dataset initially had a total of 9,834,496 observations.
This is because there are a total of 3,136 counties that have pairwise observations with
all other counties, including themselves. The election data set had 3,142 observations.
Hence we needed to make certain adjustments to make the two data sets compatible.
Throughout our analysis, we consider only 3,014 counties and therefore 9,084,196
observations. Hence, we dropped 750,300 observations (122 counties) from the SCI data set
and, after merging several independent cities with their counties in the election data
set, we dropped the same number there. We essentially excluded all counties of Alaska,
whose election data could not be matched to the SCI data, and 92 U.S. counties with a
population below 1,000 from our analyses. To ensure consistency throughout our analysis,
we also dropped all observations that involved any of the 122 counties from the
international SCI data set.
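The observation counts above follow directly from the number of counties, since every county is paired with every county including itself. A quick arithmetic check (the variable names are illustrative):

```python
counties_sci = 3136    # counties in the original SCI data
counties_kept = 3014   # counties retained for the analysis

pairs_all = counties_sci ** 2        # all ordered county pairs, self-pairs included
pairs_kept = counties_kept ** 2
dropped_counties = counties_sci - counties_kept
dropped_pairs = pairs_all - pairs_kept

print(pairs_all, pairs_kept, dropped_counties, dropped_pairs)
```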
Figure 3: A1
Control variables: Our county-level control variables were drawn from a pool of
diverse income, education, labour and demographic factors. The choice was based
on stability and multicollinearity considerations.
For example, white male % could not be included as its high variance had a severe destabi-
lizing effect on regression coefficients. However, it shared a significant (positive or negative)
correlation with several controls that were ultimately included: median income, percent-
age of adults with bachelor’s degree or higher, unemployment and median age, which are
expected to compensate for its absence.
Mean income could not be included due to its near-perfect correlation with median income.
Instead, we use median income and (mean income - median income) as a measure of
inequality. The final controls, their descriptions and their sources are listed below:
1. rd2012, which corresponds to (R-D)%2012, constructed based on the difference in
Republican and Democrat % vote shares in 2012 under assumption 1: MIT Election
Data and Science Lab
2. log_median_hh, logarithm of median household income: U.S. Census Bureau
3. unemployment, which is the unemployment rate: Bureau of Labor Statistics
4. log_total_population, logarithm of the total population: U.S. Census Bureau
5. pct_adult_bachelor, percentage of adults with bachelor’s degree or higher: U.S.
Census Bureau
6. male_population_pct: U.S. Census Bureau
7. log_mean_median_hh, logarithm of the difference between mean and median
household income levels, constructed based on mean and median income levels: U.S.
Census Bureau
8. median_age: U.S. Census Bureau
9. manufacturing_share, share of manufacturing industries in the economy: Bureau of
Labor Statistics
A.2.
No-State Index: This index specification is identical to indexsc, but it excludes counties
from the same state. Here, counties are indexed by i and j while states are indexed by
k and l. m_l represents the number of counties in state l, and RF_{ij}^{kl} represents the
relative probability of friendship between county i in state k and county j in state l.
\[ \text{index}_{sc-nostate,i} = \frac{\sum_{l=1,\, l \neq k}^{50} \sum_{j=1}^{m_l} RF_{ij}^{kl} \cdot [\Delta(R-D)\%]_j \cdot v_j}{\sum_{l=1,\, l \neq k}^{50} \sum_{j=1}^{m_l} RF_{ij}^{kl} \cdot v_j} \tag{1} \]
Distance Threshold Index: This next index specification is identical to indexsc, but it only
includes counties past a certain distance threshold in the calculation. I^d_{ij} is an
indicator variable which equals 1 if county i and county j are more than d miles apart.
\[ \text{index}_{sc-d,i} = \frac{\sum_{j=1,\, j \neq i}^{n} RF_{ij} \cdot [\Delta(R-D)\%]_j \cdot v_j \cdot I^{d}_{ij}}{\sum_{j=1,\, j \neq i}^{n} RF_{ij} \cdot v_j \cdot I^{d}_{ij}} \tag{2} \]
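The role of the indicator can be illustrated with a small sketch. The function name `threshold_index`, the three-county toy data, and the choice to return NaN when no county lies beyond the threshold are all assumptions made for illustration; they are not taken from the thesis code.

```python
import numpy as np


def threshold_index(rf, delta_rd, v, dist, d):
    """Weighted average of delta_rd over counties more than d miles away.

    rf       : (n, n) array, relative probability of friendship RF_ij
    delta_rd : (n,) array, change in (R - D)% of county j
    v        : (n,) array, weights v_j
    dist     : (n, n) array, distance between counties in miles
    d        : distance threshold in miles
    """
    keep = dist > d                 # indicator I^d_ij
    np.fill_diagonal(keep, False)   # enforce j != i
    weights = rf * v[None, :] * keep
    denom = weights.sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = (weights @ delta_rd) / denom
    # Assumption: the index is undefined (NaN) if no county is far enough away.
    return np.where(denom > 0, ratio, np.nan)


# Toy data (hypothetical): counties 0 and 1 are neighbours, county 2 is far away.
dist = np.array([[0.0, 50.0, 700.0],
                 [50.0, 0.0, 680.0],
                 [700.0, 680.0, 0.0]])
rf = np.array([[0.0, 10.0, 2.0],
               [10.0, 0.0, 3.0],
               [2.0, 3.0, 0.0]])
v = np.array([1.0, 2.0, 1.0])
delta_rd = np.array([4.0, 6.0, -2.0])

index_100 = threshold_index(rf, delta_rd, v, dist, d=100)
index_2000 = threshold_index(rf, delta_rd, v, dist, d=2000)
print(index_100, index_2000)
```

With a 100-mile threshold, counties 0 and 1 are influenced only by the distant county 2, even though their strongest link is to each other.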
International Indices: The SCI data requires some creativity to study international
network effects on American politics because of the way it is constructed. Appendix A.1
discusses the normalization of the country SCI values and the inability to compare the
strength of connection across countries.
This leaves us with two options. The connection between an individual country and all
counties can be studied. We do this for several large and influential countries such as
Russia, China, India, and Mexico. Alternatively, a regional index can be constructed
that combines the connections of relatively similar countries from a region; for example,
Europe, South America, Africa, and Asia.
Country Index: We use a simple index to study the effect of an individual country on
the U.S. network. It is the SCI between county i and a given country normalized by the
population of the county. This is essentially the relative probability of friendship between
the county and the foreign country as mentioned in appendix A.1; the only difference is
that the SCI is normalized by the county’s population instead of the product of the county’s
population and the country’s population. We test for country influences by regressing the
change in county political preferences on the country index with county controls.
\[ ICI_{country,i} = \frac{SCI_{ik}}{Pop_i} \tag{3} \]
SCIik is the measure of connection between county i and country k. We rescale this index
to be between zero and one.
Region Index: We also study the effects of being connected to international regions
(Europe, South America, Africa, Asia). The inability to compare the strength of a county’s
connection across countries prevents the construction of an index similar to those above.
Suppose that the county most connected to Germany is also the county most connected
to Luxembourg (Luxembourg is not in the data set, but that is not important for this
example). Furthermore, suppose, as is reasonable, that Germany has many more
connections to the county than Luxembourg. For this county, the SCI of both Germany and
Luxembourg would be normalized to 1,000,000 and would produce identical values for the
initial international index we constructed. Naively combining these two values in a manner
similar to our primary county-county index would overweight connections to Luxembourg
and underweight those to Germany.
The solution to this problem is to construct an index based on the product of the SCI values
for different countries. This maintains the proportionality of the connections between
counties and countries and gives us the ability to rank every county in terms of its total
connection to a region.
To make the index more credible, we select countries that are reasonably “homogenous.”
By homogenous we mean that they are not too small and reasonably share some common
cultural factors with the other countries in the region.
Our final index for county i is the product of all the simple international index values
between that county and every country in the region considered. We then take the natural
log and use the properties of the logarithm to split the product into a sum.
\[ ICI_{region,i} = \log\left(\prod_{k=1}^{m} \frac{SCI_{ik}}{Pop_i}\right) = \sum_{k=1}^{m} \log(SCI_{ik}) - m \cdot \log(Pop_i) \tag{4} \]
We rescale this index to be between zero and one.
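A minimal sketch of the region index on toy data follows. The function name `region_index` and the numbers are hypothetical, and the rescaling to the unit interval is implemented as min-max scaling, which is an assumption: the text only states that the index is rescaled to lie between zero and one.

```python
import numpy as np


def region_index(sci, pop):
    """Log of the product of SCI_ik / Pop_i over the m countries of a region.

    sci : (n_counties, m) array of county-country SCI values
    pop : (n_counties,) array of county populations
    Returns the raw log index and a min-max rescaled version (assumed rescaling).
    """
    m = sci.shape[1]
    # log(prod_k SCI_ik / Pop_i) = sum_k log(SCI_ik) - m * log(Pop_i)
    raw = np.log(sci).sum(axis=1) - m * np.log(pop)
    scaled = (raw - raw.min()) / (raw.max() - raw.min())
    return raw, scaled


# Toy data (hypothetical): three counties, a region made up of two countries.
sci = np.array([[1000.0, 200.0],
                [500.0, 800.0],
                [100.0, 100.0]])
pop = np.array([1000.0, 2000.0, 500.0])

raw, scaled = region_index(sci, pop)
print(raw, scaled)
```

Working with the sum of logs rather than the product itself avoids numerical underflow when many small ratios are multiplied together.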
A.3.
Throughout appendix A.3, the notation followed to denote p-values is:

* p < 0.05
** p < 0.01
*** p < 0.001
Figure 4: A3.1
Figure 5: A3.2
Figure 6: A3.3
Figure 7: A3.4
Figure 8: A3.5
Figure 9: A3.6
Figure 10: A3.7
A.4.
Figure 11: A4.1
Social connectedness remains closely linked to changes in political orientation even after
excluding influence of counties in the same state. In fact, excluding within state
connections does surprisingly little harm to the index; index_nostate still appears to
explain the change in political orientation with great accuracy.
Figure 12: A4.2
We contrast two distance thresholds in their ability to replicate the actual political
change from 2012 to 2016. We have established that distance may not be the most
significant determinant of influence, but it still remains a key component. The index
including all counties at least 100 miles away explains changes in political orientation
far better than the index that includes only counties more than 600 miles away. We can
see that as the distance threshold increases, the network becomes dominated by the
more distant friends, which leads to a deterioration of the index’s ability to replicate the
change in political opinion.
Figure 13: A4.3
This illustration ignores the actual change in voting and solely serves as a visualization
of the very different predictions the two indices make. Whereas index_sc’s colors are
fairly spread throughout the United States, index_distance splits the country in two
extremely concentrated areas.
References
Angrist, J. (2014). The perils of peer effects. Labour Economics, 30(C):98–108.
Bailey, M., Cao, R., Kuchler, T., and Stroebel, J. (2016). Social networks and housing markets.
Bailey, M., Cao, R., Kuchler, T., Stroebel, J., and Wong, A. (2018). Social connectedness: Measurement, determinants, and effects. Journal of Economic Perspectives, 32(3):259–80.
Bank, B. J., Slavings, R. L., and Biddle, B. J. (1990). Effects of peer, faculty, and parental influences on students’ persistence. Sociology of Education, 63(3):208–225.
Becker, G. S. (1958). Competition and democracy. Journal of Law and Economics, 1:105–109.
Bertrand, M., Luttmer, E. F., and Mullainathan, S. (2000). Network effects and welfare cultures. The Quarterly Journal of Economics, 115(3):1019–1055.
Bond, R. M., Fariss, C. J., Jones, J. J., Kramer, A. D. I., Marlow, C., Settle, J. E., and Fowler, J. H. (2012). A 61-million-person experiment in social influence and political mobilization. Nature, 489:295–298.
Bramoullé, Y., Currarini, S., Jackson, M. O., Pin, P., and Rogers, B. W. (2012). Homophily and long-run integration in social networks. Journal of Economic Theory, 147(5):1754–1786.
Bramoullé, Y., Djebbari, H., and Fortin, B. (2009). Identification of peer effects through social networks. Journal of Econometrics, 150(1):41–55.
Graham, B. S. (2017). An econometric model of network formation with degree heterogeneity. Econometrica, 85(4):1033–1063.
Halberstam, Y. and Knight, B. (2016). Homophily, group size, and the diffusion of political information in social networks: Evidence from Twitter. Journal of Public Economics, 143(C):73–88.
Jackson, M. O. and Rogers, B. W. (2007). Meeting strangers and friends of friends: How random are social networks? American Economic Review, (3):890–915.
Kolesár, M., Chetty, R., Friedman, J. N., Glaeser, E. L., and Imbens, G. W. (2017). Identification and inference with many invalid instruments. Journal of Business & Economic Statistics, 33(4):474–484.
Lazer, D. M. J., Rubineau, B., and Neblo, M. A. (2009). Picking people or pushing politics: Selection and influence on five network criteria.
Manski, C. F. (1993). Identification of endogenous social effects: The reflection problem. The Review of Economic Studies, 60(3):531–542.
McPherson, M., Smith-Lovin, L., and Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1):415–444.
Sacerdote, B. (2001). Peer effects with random assignment: Results for Dartmouth roommates. The Quarterly Journal of Economics, 116(2):681–704.
Shearer, E. (2018). Social media outpaces print newspapers in the U.S. as a news source. Pew Research Center, 63(3).
Sunstein, C. R. (2001). Echo Chambers: Bush v. Gore, Impeachment, and Beyond. Princeton University Press.
36

Masters Thesis Peer Effects and Politics

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Masters Thesis Peer Effects and Politics

Uploaded by

Copyright:

Available Formats

Master Project

Peer Effects and Politics

Sarat Chandra Akella

Barcelona Graduate School of Economics

We use data on US Facebook friendship networks to study changes in political

6. Identification Strategy and Results 8

8. International Connectivity and Voting Patterns 16

9. Conclusions and Future Work 19

Network formation is commonly modeled by considering individuals who make connections

When a researcher observing the distribution of behavior in a population

2. Representative Data: County-level data on number of Facebook users and friendship

3. Counties as agents: The availability of social connectivity data from Facebook at

5. Focus on change in political orientation: The regressions of section 6 are based on

6. Identification Strategy and Results

6.1. Ordinary Least Squares (OLS) Specifications

∆(R − D)%i = β ∗ indexsc,i + Controlsi + i (6.1.2)

∆(R − D)%i = β ∗ indexsc,i + StateDummies + i (6.1.3)

∆(R − D)%i = β ∗ indexsc,i + Controlsi + StateDummies + i (6.1.4)

6.1.2 Controls, No SFE 1.381 (0.020) 69.01 <0.001 81.87

6.1.3 No Controls, SFE 2.040 (0.035) 58.40 <0.001 72.50

6.1.4 Controls, SFE 1.293 (0.038) 34.26 <0.001 83.61

6.2. Two Stage Least Squares (2SLS) Specifications

∆(R − D)%i = β ∗ indexsc,i +i (6.2.1)

∆(R − D)%i = β ∗ indexsc,i +Controlsi + i (6.2.2)

∆(R − D)%i = β ∗ indexsc,i +StateDummies + i (6.2.3)

∆(R − D)%i = β ∗ indexsc,i +Controlsi + StateDummies + i (6.2.4)

Table 2: 2SLS Regression

6.2.2 2.305 (0.175) 13.17 <0.001 69.09 48.76 <0.001

6.2.3 2.158 (0.094) 22.96 <0.001 72.86 01.84 0.1747

6.2.4 1.425 (0.224) 06.37 <0.001 83.86 00.36 0.5472

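$$\Delta(R-D)\%_i = \beta \cdot \mathrm{index}_{sc,i} + \varepsilon_i \tag{6.2.5}$$

$$\Delta(R-D)\%_i = \beta \cdot \mathrm{index}_{sc,i} + \mathrm{Controls}_i + \varepsilon_i \tag{6.2.6}$$

$$\Delta(R-D)\%_i = \beta \cdot \mathrm{index}_{sc,i} + \mathrm{StateDummies} + \varepsilon_i \tag{6.2.7}$$

$$\Delta(R-D)\%_i = \beta \cdot \mathrm{index}_{sc,i} + \mathrm{Controls}_i + \mathrm{StateDummies} + \varepsilon_i \tag{6.2.8}$$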

6.2.6 2.376 (0.213) 11.14 <0.001 67.03 40.36 <0.001

6.2.7 2.317 (0.098) 23.62 <0.001 72.39 09.32 0.002

6.2.8 1.361 (0.394) 03.45 <0.001 83.91 00.03 0.861

outcomes is 1.361%, obtained in the last specification above.

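$$\Delta(R-D)\%_i = \beta_{IV1} \cdot \mathrm{index}_{male,i} + \mathrm{Controls}_i + \mathrm{StateDummies} + \varepsilon_i \tag{6.2.9}$$

$$\Delta(R-D)\%_i = \beta_{IV2} \cdot \mathrm{index}_{malenostate,i} + \mathrm{Controls}_i + \mathrm{StateDummies} + \varepsilon_i \tag{6.2.10}$$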

Table 4: OLS Regression: Reduced Form

indexmalenostate 0.018 (0.006) 2.90 0.004 77.16

We compare indexsc to indexdistance constructed in section 5 to address this concern.

$$\Delta(R-D)\%_i = \beta \cdot \mathrm{index}_i + \mathrm{Controls}_i + \mathrm{StateDummies} + \varepsilon_i \tag{7.1}$$

Table 5: Distance Regression

indexdistance 7.1 2.51 (0.168) 14.98 <0.001

indexsc 7.2 1.28 (0.043) 29.70 <0.001

indexdistance 7.2 0.06 (0.169) 0.36 0.720

Figure 1: Comparison of indexdistance and indexsc,i with respect to ∆(R − D)%i

The ability of distant connections to influence the politics of a county is a surprising

Table 6: Distance Regression

indexsc 1.29 (0.038) 34.26 <0.001 83.61

indexsc−100 1.29 (0.082) 16.57 <0.001 79.05

indexsc−300 2.01 (0.107) 18.77 <0.001 79.54

indexsc−600 2.12 (0.130) 16.21 <0.001 78.97

indexsc−1250 0.38 (0.033) 11.57 <0.001 78.09

indexsc−2000 0.028 (0.042) 0.67 0.504 77.10

indexsc−nostate 1.20 (0.050) 23.82 <0.001 80.79

indexdistance 2.51 (0.168) 14.98 <0.001 78.71

8. International Connectivity and Voting Patterns

If distant within-country connections have a significant association with political opin-

The baseline specification is:

$$\Delta(R-D)\%_i = \beta_{sc} \cdot \mathrm{index}_{sc,i} + \mathrm{Controls}_i + \mathrm{StateDummies} + \varepsilon_i \tag{8.1}$$
