
Essays on Education and the Marriage Market
Danyan Zha
Submitted in partial fulfillment of the

requirements for the degree of
Doctor of Philosophy
in the Graduate School of Arts and Sciences
COLUMBIA UNIVERSITY
2019
© 2019
Danyan Zha
All rights reserved
ABSTRACT
Essays on Education and the Marriage Market
Danyan Zha
Chapter one of this thesis examines one of the largest primary school construction programs, INPRES SD,
in the 1970s in Indonesia. Using the variation across regions in the number of schools constructed and the
variation across birth cohorts, I show that in densely populated areas, primary school construction did not
affect the primary school attainment rate. More surprisingly, the program decreased the secondary school
attainment rate for both men and women due to a crowding out of teacher resources.
Chapter two of this thesis examines how the education distribution affects the marriage market, in particular,
female marriage age. I first develop a two-to-one dimensional matching model with transferable utility in an
OLG framework, in which the marital surplus allows complementarity between men’s education and both
characteristics of women: education and youth, to understand how female marriage age is affected by others’
education. I then use INPRES SD as a quasi-natural experiment and find that a woman marries earlier and
the spousal age gap increases when fewer women in her birth cohort graduate from secondary school and
the education distribution of their potential husbands does not change. The empirical finding suggests that
men’s education and women’s young age are complementary in generating the marital surplus in the current
setting.
Chapter three of this thesis examines how the hukou system affects the marriage market in China. I build a
bidimensional matching model in which individuals are characterized by a continuous attribute (that indicates
socioeconomic status) and a discrete attribute (hukou status, either rural or urban). Urban hukou is more
valuable for men than women since it is more likely for a woman to move to her husband’s location upon
marriage in a patrilocal society. The model gives predictions on the matching patterns which are validated
using the China 2000 0.095% sample census.

Contents
List of Figures
List of Tables
Acknowledgements
1 Chapter 1. Schooling Expansion and Education
1.1 Introduction
1.2 School Construction and Education in Indonesia
1.3 Data
1.4 Empirical Strategy
1.5 Results
1.6 Mechanism
1.7 Conclusion
1.8 Figures
1.9 Tables
1.10 Appendix
2 Chapter 2. Schooling Expansion and the Female Marriage Age
2.1 Introduction
2.2 Model
2.3 The Marriage Market in Indonesia
2.4 Data
2.5 Main Results
2.6 Discussion
2.7 Conclusion
2.8 Figures
2.9 Tables
2.10 Appendix
3 Chapter 3. Multidimensional Matching: Hukou status in the Marriage Market
3.1 Introduction
3.2 Model
3.3 Empirical Application
3.4 Conclusion and Discussion
3.5 Figures
3.6 Tables
3.7 Appendix
References
List of Figures
1.1 Map of primary school construction intensity
1.2 Construction intensity for sparsely and densely populated areas
1.3 Coefficients of the interactions: age in 1974 * program intensity in the region of birth in the education equation for completing primary school
1.4 Coefficients of the interactions: age in 1974 * program intensity in the region of birth in the education equation for completing secondary school
1.5 Effect of the program on completing primary school (top) and completing secondary school (bottom) in sparsely populated areas
1.6 Effect of the program on completing primary school (top) and completing secondary school (bottom) in densely populated areas
1.7 Coefficients of the interactions: census year * program intensity in the regency in average number of secondary school teachers equation
1.8 Effect of the program on primary teacher education
1.A.1 Number of newly appointed primary school teachers, 1974-1998
2.1 Marriage frequencies (left) and marriage proportions (right) by education for females
2.2 Effect of the program on female age at first marriage (top) and the spousal age gap (bottom) in sparsely populated areas
2.3 Effect of the program on female age at first marriage (top) and the spousal age gap (bottom) in densely populated areas
3.1 Predicted matching pattern with symmetric population
3.2 Matching patterns under different parameter values
3.3 Predicted matching pattern with asymmetric population
3.4 Predicted matching pattern with asymmetric population (Detailed)
3.5 Utility in the quadratic example
3.6 Comparative statics
3.7 Heatmap for husbands’ and wives’ education
List of Tables
1.1 Summary statistics
1.2 Effect of school construction on education
1.3 Heterogeneous effect of school construction on education
1.4 Effect of school construction on number of teachers in secondary and primary education
1.A1 Inpres Sekolah Dasar
1.A2 Development grant to regions (in billion Rp.)
2.1 Summary statistics
2.2 First marriage age
2.3 Reduced-form effect of school construction on female marriage outcomes
2.4 Results of female education distribution on female marriage outcomes
3.1 Summary Statistics
3.2 Matching patterns by hukou status.
3.3 Positive assortative matching in each matching category
3.4 Selection into four matching categories
3.5 Explain individuals’ education using spousal characteristics, by hukou status.
3.6 Unconditional and conditional correlation between hukou and SES, by spousal type
Acknowledgements
I am deeply grateful and indebted to my three advisors, Pierre-André Chiappori, Bernard Salanié and
Cristian (Kiki) Pop-Eleches, for their continuous encouragement, guidance and support. Pierre-André sparked
my research interest in using matching theories to understand marriage markets. Our discussions and his
sharp intuition have helped me develop my own skills to tackle complicated issues with simple models.
Bernard taught me the importance of details and perseverance in research. He has spent enormous time
reading my drafts and helping make them better. Kiki helped me come up with the idea of using Indonesia
as a setting for the empirical analysis in the first two chapters. He encouraged me ceaselessly and was always
there to help.
I also thank other committee members, Suresh Naidu and Jack Willis. Suresh has helped me since my
second year and discussions with him always broadened my thinking. Jack has helped me improve the first
two chapters a lot since he joined Columbia. I have also learned a lot from other development faculty at
Columbia: Alex Eble, Jonas Hjort, Rodrigo Soares and Eric Verhoogen. I would also like to thank Esther
Duflo for generously sharing her original school construction data to make the first two chapters possible. I
also benefited a lot from email exchanges with Natalie Bau.
I have also benefited greatly from my fellow colleagues at Columbia, both in research and in life, including
So Yoon Ahn, Ashna Arora, Yoon Joo Jo, Sun Kyoung Lee, Xuan Li, Yifeng Luo, Anh Nguyen, Qiuying
Qu, Anurag Singh, Yue Yu, Qing Zhang and Weijie Zhong.
Lastly, I am immensely grateful to my family and Lihong for their unconditional love and support.
Dedicated to the memory of my mother, who has always believed in me.
Chapter 1
Schooling Expansion and Education
1.1 Introduction
Human capital is key to individuals’ lifetime outcomes and countries’ economic growth.¹ This provides
a rationale for worldwide schooling expansion, especially in low- and middle-income countries in the past
three decades (World Bank, 2018). Many papers have documented the positive effect of these education
policies on individuals’ education, wage, income, wealth and health outcomes (Malamud et al., 2018; Jürges
et al., 2011). Careful evaluations are necessary to guide governments and international organizations
in finding the most cost-effective policies.²
The potential existence of externalities makes program evaluation even harder. A program that targets
one particular population may affect another population through information transmission, resource allocation
or other channels. Untreated individuals may benefit if there is a positive social externality, or may
be harmed if the program reduces the resources available to them.
This chapter evaluates INPRES SD in Indonesia, one of the largest and most successful primary school
construction programs to date. Surprisingly, I find that building primary schools has an unintended consequence
for secondary education: it has a negative impact on the secondary school attainment rate for both
men and women in more densely populated areas, due to a crowding-out of teacher resources.³
This chapter builds upon earlier studies using the same schooling expansion program (Duflo, 2001; Ashraf
et al., 2016). I replicate some of their findings but also find some surprising results not mentioned in the
previous literature. Consistent with previous findings, there is a positive effect on the primary school attainment
rate for men but not for women. However, I find a negative effect on the secondary school attainment rate for
women in the full-sample analysis.⁴ As suggested in Duflo (2001), the program may have different effects
in sparsely populated and densely populated regions. Exploring the heterogeneity of the effects depending
on the population density of the regions considered, I find that in sparsely populated regions, the school
construction program had a positive effect on both primary school and secondary school attainment rates
for men but not for women; in densely populated regions, the school construction program did not affect
the primary school attainment rate but had a negative effect on the secondary school attainment rate for both men
and women.

¹ A large literature on economic growth (see Mankiw et al., 1992; Young, 1994,9; Barro and Sala-i-Martin, 1995) documents
the importance of endogenous human capital to economic growth.

² See Glewwe and Kremer (2006), Glewwe et al. (2013), McEwan (2015), and Glewwe and Muralidharan (2016) for good reviews
of research on the effectiveness of education policies.

³ The secondary school attainment rate is defined as the percentage of people completing at least secondary school for a
given birth cohort in one place. Similarly, the primary school attainment rate is defined as the percentage of people who
complete primary school or above.
I then investigate two potential mechanisms leading to the negative secondary school attainment result:
(1) a decrease in secondary school quality due to resources being crowded out; (2) a decrease in primary
school quality due to the massive scale of construction. The analysis supports the first mechanism. Building
a primary school increases the total demand for teachers in a region. In the 1970s, teacher hiring was very
centralized, hence a large demand from primary schools may have reduced the availability of potential new teachers
in secondary education. Moreover, competition for teachers can be stronger in densely populated
regions than in sparsely populated regions, since it is easier for teachers to relocate within the former. I show
that the total number of teachers and the average number of teachers in secondary school increase less in the
regions where more primary schools were constructed after the launch of the school construction program.
The negative effect on teacher availability in secondary education in future years only exists in densely
populated regions, not in sparsely populated regions. Moreover, rapidly constructing primary schools could
also decrease the quality of primary education.
Using the education level of primary school teachers in the censuses as a proxy for school quality, I show
that teacher education increases less in regions where more primary schools were constructed. However, I
do not find a difference between sparsely and densely populated regions. In summary, the negative result
on secondary school attainment rate is due primarily to the crowding out of teacher resources in secondary
education because of the primary school construction.
This chapter is closely related to other papers that have studied the effect of the INPRES SD program since
the seminal paper of Duflo (2001), which focuses on men and finds a positive effect on male years of schooling
and wages in 1995 using SUPAS 1995. Using the same dataset, Ashraf et al. (2016) instead look at
women and find that the program increases years of schooling for women, but only for the ethnicities that
practice bride price. Breierova and Duflo (2004) show that mother’s and father’s education are equally
important factors in reducing child mortality. Using the fifth wave of the Indonesian Family Life Survey
(IFLS, 2014), Bharati et al. (2018) show that school construction increases schooling for individuals
who experienced a negative shock (low rainfall) in the first year of life but not for those who did not experience the
adverse rainfall shock, partly due to deteriorating school infrastructure and increased competition. Akresh
et al. (2018) and Rosales-Rueda et al. (2019) examine the long-term and intergenerational effects of this
program on individuals’ work choice, household behavior, and children’s education. Martinez-Bravo (2017)
shows that local public good provision increases in the villages where the education of the village heads
increases due to this program. Dominguez (2014) uses structural estimation to show that an increase in
the number of primary school graduates increases the single rate and decreases the marital utility of primary school
graduates.

⁴ This is consistent with other papers evaluating this program using different datasets, e.g., Akresh et al. (2018).
This chapter also contributes to the literature studying the effect of teacher availability and quality on
student learning outcomes. Many papers have shown that teacher quality matters more to students’
achievement than other education inputs, including class size and school infrastructure (Rivkin et al., 2005;
Hanushek, 2011).
Finally, this chapter provides evidence for the existence of another type of externality in large-scale
interventions. Miguel and Kremer (2004) show a large positive externality of deworming for untreated
children in the treatment and neighboring schools. Bobonis and Finan (2009) find a positive externality
of the PROGRESA program on program-ineligible children’s secondary school participation in the program
communities. Castro and Esposito (2018) find a negative externality of the bonus paid to incentivize
teachers on nearby rural schools.
1.2 School Construction and Education in Indonesia
INPRES primary school construction program in Indonesia
The Indonesian government has consistently sought to broaden educational opportunity since the country’s
independence in 1945. However, due to financial difficulties and political conflict, in the country’s early
years, Indonesia remained backward relative to neighboring countries and to countries with similar levels
of income. As late as the 1971 population census, only 62% of primary school-aged children (ages 7-12
inclusive) were enrolled in any kind of school, while only 54% appeared on the rolls of public and private
schools reporting to the Ministry of Education (see Snodgrass, 1984). Due to the increased oil production
and the first OPEC-engineered price rise in 1972-1973, which unexpectedly raised government revenue,
a primary school construction aid program (Program Bantuan Pembangunan Sekolah Dasar), known as
INPRES Sekolah Dasar⁵ and more informally as INPRES SD, was inaugurated in 1973 to help raise the
primary school enrollment rate, which had been stagnant before 1973.
Between 1973/74 and 1978/79, 62,000 primary schools were scheduled to be built. Each school consists
of three classrooms, and each classroom has one teacher and can accommodate 40 pupils. The allocation rule
each year was as follows: (a) each district (kecamatan in Indonesian, one level below the regency
and two levels below the province) was allocated at least one school and each province at least 50; (b)
the remainder was distributed according to the estimated population of unenrolled 7-12 year old children
(Snodgrass et al., 1980). This creates variation in the construction intensity that I exploit in my empirical
analysis.
In addition to school construction, the government also provided textbooks and teacher training to ensure
that the buildings were used for education purposes. Moreover, the primary school fee was abolished in 1977.
By 1983, nearly all Indonesian children had at least begun to enroll in primary school, while the percentage
of 7-12 year olds enrolled exceeded 90%. INPRES SD has been a successful case of education policies in
developing countries.
This program spanned 1973/74-1988/89, as shown in Table 1.A1. However, the empirical analysis uses
only the first six years (1973/74-1978/79) because region-specific construction targets are
available only for these years. For later years, only the aggregate number of school construction targets
is available.⁶ The empirical work compares older cohorts who would not have been affected
by the program with younger cohorts who were at least age 7 in 1979. Not observing school construction
data after 1979 does not invalidate this comparison. However, it still creates two issues. First, the
effect of one additional school built may be overestimated if schools built after 1979 had a positive
effect on students who had enrolled before 1979. Second, it makes it more difficult to test the
mechanism behind the negative effect on secondary school attainment. If school construction had ended in
1979, we could test whether a short-run teacher shortage contributed to the negative effect on
secondary school education by comparing the children who were exposed (to school construction between
1973 and 1979) with even younger children. If the shortage were short-run, we should expect the negative
effect to diminish for the younger children.

⁵ INPRES stands for Instruksi Presiden (Presidential Instruction) and Sekolah Dasar means primary school.
Education system in Indonesia
In Indonesia, the education system consists of six years of primary school (sekolah dasar, SD), three years
of middle school (sekolah menengah pertama, SMP) and three years of high school (sekolah menengah atas,
SMA), followed by various kinds of higher education. Children generally begin primary school at age 7. Two
ministries are responsible for managing the education system, with 84 percent of schools being under the
Ministry of National Education and the remaining 16 percent being under the Ministry of Religious Affairs.
In the 2000 census, although 86.1 percent of the Indonesian population was registered as Muslim, only 15
percent of school-aged individuals attended religious schools (Frederick and Worden, 1993).
INPRES 1973 initiated Indonesia’s program of compulsory education, but six-year compulsory education
for primary school-aged children (7-12 age group) was not fully implemented until 1984. In May 1994, nine-
year compulsory education for the 7 to 15 age group was introduced. Of all pupils, 92% were enrolled in
public schools for primary education, and 50% were enrolled in public schools for secondary education. The
Indonesian government focused more on primary education than on the secondary level. In 1985, of public
spending on education, 62% went to primary education, while 27% went to secondary education (see Tan
and Mingat, 1992, table 3.1, table 6.5).

⁶ New primary school entrants kept increasing between 1973 and 1979, then fluctuated around 4.3 million (World Bank,
1989, page 16).
In the 1980s, although all children began primary school, only approximately 62% of pupils entering
primary school actually graduated from grade 6. Transition between primary school and junior secondary
school was also low, at approximately 60% (see Jones and Hagul, 2001, table 1, figure 2). Transition
between junior secondary and senior secondary was also low, at 53%. However, the survival rates of junior
secondary school and senior secondary school are fairly high in Indonesia, at more than 90% (see Tan and
Mingat, 1992, table 4.5, table 4.6, table A.1).
Teachers in Indonesia
Teachers used to be of high quality and the profession used to be regarded as highly prestigious before the early
1970s. However, with rapid school construction, there were not enough trained teachers and teachers were
prepared in a rush, which diluted teacher quality in the 1970s and 1980s in Indonesia (Jalal et al., 2009).
Figure 1.A.1 plots the number of newly appointed primary school teachers between 1974 and 1998. The
number of new hires kept increasing after 1974 and followed a trend similar to that of the funding in the last
column of Table 1.A1. The Indonesian government has devoted considerable effort to upgrading the teacher
profile, including the implementation of Law No. 14/2005 on Teachers and Lecturers, known as the Teacher
Law, which contains certification requirements for teachers (World Bank, 2016).
In 2004/05, teacher salaries were on average 6.5% higher in junior secondary school than in primary school, and
on average 15% higher in senior secondary school than in junior secondary school.⁷ Compared to others
with similar education levels, teachers with high education are paid less, while teachers with low education
are paid more.
Before the decentralization under Education Law 20/2003, teacher hiring, like the delivery of other public
services, was highly centralized. Central government agencies, the Ministry of National Education (MONE)
and the Ministry of Religious Affairs (MORA), were responsible for hiring teachers and paying salaries. Public
teachers have always been trained by centrally accredited teacher training institutions through public examinations.
In the 1970s, primary school teachers were prepared in the teacher education school called Sekolah
Pendidikan Guru (SPG) after completing junior secondary school. Junior secondary school teachers were
prepared in the institutes and faculties of teacher education (IKIP/FKIP) with Diploma 1 qualification after
completing senior secondary school. Senior secondary school teachers were prepared in the institutes and
faculties of teacher education (IKIP/FKIP) with Diploma 2 qualification after completing senior secondary
school (see Jalal et al., 2009, Table 1.11).

⁷ In 2004/05, salaries range from 2,733 to 3,941 US dollars for primary school teachers, from 2,913 to 4,281 for junior
secondary school teachers, and from 3,373 to 4,756 for senior secondary school teachers (Jalal et al., 2009, Table 1.5).
1.3 Data
Indonesian census data
For the main analysis, I use information from the 10% sample of the Indonesian Population Census 2010 and
the 0.51% sample of the Indonesian Intercensal Population Survey (SUPAS) 2005 downloaded from IPUMS
International.⁸ The two censuses were designed to be representative of the whole country. Moreover, the
birthplace (regency level) of individuals is recorded in both censuses, which can be used to proxy for their
exposure to the primary school construction program when they were of primary school age.
Table 1.1 displays the descriptive statistics of individuals’ education using the 2010 census for the older
cohort born between 1950 and 1961 (not exposed to the program) and younger cohort born between 1962
and 1972 (exposed to the program). The number of individuals with secondary school degrees is fairly small,
even for the younger cohort. An increase in education is observed for the younger cohort. Men are more
educated than women.
School construction data
The number of schools planned to be constructed across regencies is collected in Duflo (2001) from each
year’s presidential instruction. Intensity is defined as the average number of primary schools planned to
be constructed between 1973 and 1978 (inclusive) per 1000 children aged 5-14 at the regency level in the
1971 census. INPRES school construction began in 1973/74, the last year of Repelita I (the first five-year
development plan) and continued through Repelita II (1974/75-1978/79), Repelita III (1979/80-1983/84)
and Repelita IV (1984/85-1988/89).⁹ However, regency-specific plan data are available only for
1973/74-1978/79; hence I limit my sample to individuals born in or before 1972, who were at least age 7 in 1979.
For those born after 1972, I am unable to identify the primary school construction intensity they were
exposed to at age 7.

⁸ Minnesota Population Center. Integrated Public Use Microdata Series, International: Version 7.1 [dataset]. Minneapolis,
MN: IPUMS, 2018. https://doi.org/10.18128/D020.V7.1. I would also like to acknowledge the statistical agency that originally
produced the data: Statistics Indonesia.
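As a minimal sketch of the intensity measure defined in this subsection, the Python snippet below computes planned schools per 1,000 children aged 5-14 for each regency. The function name, the regency labels and all counts are hypothetical and serve only to illustrate the arithmetic; the sketch assumes the numerator is the total number of schools planned over 1973/74-1978/79.

```python
def construction_intensity(schools_planned, children_5_14):
    """Planned INPRES schools per 1,000 children aged 5-14 (1971 census).

    Both arguments are dicts keyed by regency. All values used below are
    hypothetical and are not taken from the actual planning data.
    """
    return {
        regency: 1000.0 * schools_planned[regency] / children_5_14[regency]
        for regency in schools_planned
    }


# Hypothetical regencies: total schools planned and 1971 child population.
planned = {"regency_A": 120, "regency_B": 90}
children = {"regency_A": 60_000, "regency_B": 30_000}

intensity = construction_intensity(planned, children)
print(intensity)  # {'regency_A': 2.0, 'regency_B': 3.0}
```

In this made-up example, regency_B receives fewer schools in total but has a higher intensity because its child population is smaller.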
Linking regency codes between censuses
School intensity data are available for 290 unique regencies in Duflo (2001), which were coded using 1995
labels. There were 304 regencies in 1995; the 14 missing regencies were in East Timor, which became part
of Indonesia as the 27th province in 1976.
Indonesia has experienced a substantial increase in the number of regions (Pemekaran Daerah) since the
enactment of Law No. 22 of 1999 concerning regional autonomy. The number of regencies increased from 271
in 1971, to 304 in 1995, to 437 in 2005 and to 494 in 2010. To tackle this issue, I use the GIS shapefiles
provided by IPUMS across census years to link regencies of birth in 2005 and 2010 back to the regency
of birth variable in 1995 to assign the proper program intensity to each individual. Since most of this
expansion is in the form of dividing existing regencies into several smaller regencies, I can link the majority of
the regencies.¹⁰
School infrastructure and teacher quality data
The number of schools and teachers at different levels is available from the Ministry of Education and is
also collected in the original dataset used in Duflo (2001). As for teacher quality, I adapt the method used
in Behrman and Birdsall (1983) and Bharati et al. (2018): as a proxy, I calculate the percentage of (self-reported)
teachers who completed secondary school or some college across regions in the Indonesian censuses of 1971, 1980, and
1990 and the intercensal surveys of 1976 and 1985.
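The teacher quality proxy described above can be sketched as a simple regional share. The record layout, the function name and the sample values below are hypothetical, and the sketch ignores census sampling weights, which an actual computation would likely need.

```python
from collections import defaultdict


def teacher_education_share(records):
    """Percentage of self-reported teachers per region who completed
    secondary school or some college.

    `records` is an iterable of (region, completed_secondary_or_above)
    pairs, one per teacher in a census sample (hypothetical layout,
    unweighted).
    """
    totals = defaultdict(int)
    educated = defaultdict(int)
    for region, completed in records:
        totals[region] += 1
        if completed:
            educated[region] += 1
    return {region: 100.0 * educated[region] / totals[region] for region in totals}


# Hypothetical teacher records for two regions in one census year.
sample = [
    ("region_A", True), ("region_A", False), ("region_A", True), ("region_A", True),
    ("region_B", False), ("region_B", True),
]
shares = teacher_education_share(sample)
print(shares)  # {'region_A': 75.0, 'region_B': 50.0}
```

Repeating this calculation for each census or intercensal year would give a region-by-year panel of the proxy.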
⁹ See Table 1.A2 for the number of grants given to primary school building in each Repelita.
¹⁰ Between 1995 and 2010, I can link all regencies except Sarmi regency (9419) in Papua Province. Between 1995 and 2000,
I can link all regencies except Ternate city (8271) in Maluku Utara province.
1.4 Empirical Strategy
To analyze how education is affected across regencies and birth cohorts, the empirical strategy is difference-in-differences,
as used in Duflo (2001). One difference comes from the school construction intensity, defined
as the average number of primary schools built between 1973 and 1978 in one regency per 1000 children
aged 5-14 in 1971. The other difference comes from birth cohorts. In Indonesia, children attend
primary school at ages 7-12. Those aged 13 or above in 1974 would not have been affected by the program
because they were already out of primary school. For those aged 12 or younger in 1974, the younger
they were, the more exposed they were to this school construction program. A valid difference-in-differences
strategy requires that, absent the program, the education of different birth cohorts would have followed parallel
trends across regions.
The quantitative effect of the school construction program on individuals born in birth cohort k and
regency j can be estimated with the following specification:
\[
y_{jk} = \alpha_j + \beta_k + \sum_{l=2}^{12} (P_j d_{kl})\gamma_l + \sum_{l=14}^{21} (P_j d_{kl})\gamma_l + \sum_{l=2}^{21} (C_j d_{kl})\delta_l + \varepsilon_{jk}
\]
where yjk is the percentage of individuals completing primary school (secondary school) born in regency j,
and in birth cohort k, dkl is a dummy that indicates whether birth cohort k individuals are age l in 1974
(year-of-birth dummy). αj denotes the regency fixed effect, and βk denotes the birth cohort fixed effect. Pj
is the school construction intensity in regency j. εjk is the error term. Cj represents other region-specific
variables.
The coefficients γl are the coefficients of interest. They represent the effect of one additional primary
school constructed on the dependent variable for individuals of age l in 1974. There is a testable restriction
on coefficients γl . A valid identification strategy would require that γl = 0 if l > 13, i.e., the variation in the
outcome variable is not correlated with the primary school available starting in 1974 for the children who
were already out of primary school in 1974. I should expect that for l ≤ 12, γl > 0, and that γl decreases
with l as the effects should be larger in the younger cohorts.
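The specification above can be illustrated on simulated data. The sketch below is not the estimation code behind the reported results: it assumes a balanced regency-by-cohort panel, omits the controls Cj, uses age 13 as the reference cohort, and draws all data (including the "true" coefficients) from hypothetical values. Because the panel is balanced, the regency and cohort fixed effects can be swept out by two-way demeaning before running OLS on the interactions.

```python
import numpy as np

def two_way_demean(a):
    """Sweep out row (regency) and column (cohort) means of a balanced J x K panel."""
    return a - a.mean(axis=1, keepdims=True) - a.mean(axis=0, keepdims=True) + a.mean()

def estimate_gammas(y, intensity, ages, ref_age=13):
    """OLS estimates of gamma_l for every age l except the reference age.

    y         : (J, K) outcome for regency j and birth cohort k
    intensity : (J,) program intensity P_j
    ages      : (K,) age in 1974 of each birth cohort
    """
    kept = [int(l) for l in ages if l != ref_age]
    columns = []
    for l in kept:
        d_kl = (ages == l).astype(float)         # cohort dummy d_kl
        interaction = np.outer(intensity, d_kl)  # P_j * d_kl
        columns.append(two_way_demean(interaction).ravel())
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, two_way_demean(y).ravel(), rcond=None)
    return dict(zip(kept, coef))

# Simulated example (all numbers hypothetical).
rng = np.random.default_rng(0)
ages = np.arange(2, 22)                          # ages 2..21 in 1974
n_regencies = 200
P = rng.uniform(0.5, 4.0, size=n_regencies)
# Effects that shrink with age and are zero for cohorts aged 13 or above.
true_gamma = {int(l): (0.01 * (13 - l) / 11 if l <= 12 else 0.0) for l in ages}
alpha = rng.normal(0.0, 0.1, size=n_regencies)   # regency fixed effects
beta = rng.normal(0.0, 0.1, size=ages.size)      # cohort fixed effects
gamma_vec = np.array([true_gamma[int(l)] for l in ages])
noise = rng.normal(0.0, 0.005, size=(n_regencies, ages.size))
y = alpha[:, None] + beta[None, :] + np.outer(P, gamma_vec) + noise

estimates = estimate_gammas(y, P, ages)
```

In this simulation the estimates for ages 14 to 21 play the role of the testable restriction: they should be close to zero when the parallel trends assumption holds, while the estimates for ages 2 to 12 should be positive and decline with age.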
1.5 Results
In this section, I present my empirical results on education. I first present the results for the full sample, then
show the results for two subsamples depending on population density. Finally, I provide further evidence for
the mechanisms behind the different results observed in the subsamples.
Full sample
In Figure 1.3, I plot γl when the dependent variable is the percentage of individuals who complete at least
primary school for men (or women), i.e., the effect of one additional primary school constructed per 1000
children on primary school attainment rate for men (or women) with age l in 1974. To simplify the graph,
I combine three birth cohorts together on the graph.
Two important results stand out from Figure 1.3. First, γl is not significantly different from 0 for l larger
than 13 for both men and women. This lends confidence in the identification assumption: the birth cohort
trend in the primary school attainment rate does not differ across regions with different school construction
intensities. Secondly, γl is positive for men with age l ≤ 12 in 1974, indicating a positive effect on the primary
school attainment rate for men; γl is zero for women except the youngest cohorts, aged l ≤ 3, indicating a
lagged effect on female primary school attainment rate. Both results are consistent with previous findings
in Duflo (2001) and Ashraf et al. (2016).
Difference-in-differences estimates are provided in columns (1)-(3) in Table 1.2. Following Duflo (2001),
the sample includes individuals born between 1950 and 1961 who are older than 12 in 1974, and individuals
born between 1968 and 1972, who are younger than 7 in 1974. “Post” indicates individuals born between
1968 and 1972. Column (3) suggests that one additional school per 1000 children increases the male primary school attainment
rate by 0.6 percentage points. This is smaller than the estimate in Figure 2 of Duflo (2001), which shows
that approximately 1.5% more individuals had at least 6 years of schooling in high program regencies
(where on average 2.44 schools were built) than in low program regencies (where on average 1.54 schools were
built). One potential reason for this divergence is the inclusion of more controls in
my analysis compared to Equation (4) in Duflo (2001).
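To put the two estimates on the same scale, a back-of-the-envelope conversion helps. It assumes that the 1.5 gap cited from Duflo (2001) is measured in percentage points and that the effect scales linearly with the number of schools built; both are assumptions of this sketch rather than statements from either study.

```latex
\[
\frac{1.5}{2.44 - 1.54} = \frac{1.5}{0.90} \approx 1.67 \ \text{percentage points per additional school}
\]
```

Under these assumptions, the implied per-school effect is roughly 2.8 times the 0.6 percentage points reported in column (3).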
In Figure 1.4, I plot the coefficients of the interactions of age in 1974 and program intensity for completing
secondary school or above. I find a negative impact on secondary school attainment rate, especially for
women. This is surprising because, if anything, one should expect positive spillover effects from primary
school completion to secondary school completion. This finding is also mentioned for men in Duflo (2001)
but not discussed in detail there. The difference-in-differences estimates in columns (4)-(6) in Table 1.2
suggest that one additional school being built decreases women’s secondary school attainment rate by 0.53
percentage points.
Heterogeneity results on education
Further insight into the effect of the program can be obtained by examining its impact on different types
of regions. In this section, I repeat the previous exercise on two subsamples divided by population density:
sparsely populated regions with densities below the median density and densely populated regions with
densities above the median. Population density is calculated as the population in the 1971 census divided
by the area of each region in 1971. The median density (the density for the region of birth for the median
person in the weighted sample) is 470 inhabitants per square kilometer. There are 183 regions in the sparsely
populated subsample, and the average number of schools constructed per 1000 children is 2.1. There are
91 regions in the densely populated subsample, and the average number of schools constructed per 1000
children is 1.67, which is somewhat lower than that in the sparsely populated subsample. Figure 1.2 shows
the distribution of the school construction intensity for the two subsamples.
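The sample split can be sketched as follows. The regency names, densities, and populations are hypothetical, and 1971 population stands in for the sample weights; the function returns the density at which the cumulative weight first reaches half of the total, which is one common definition of a weighted median and may differ in detail from the one used here.

```python
def weighted_median(values, weights):
    """Return the value at which cumulative weight first reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(weight for _, weight in pairs)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    raise ValueError("values and weights must be non-empty")

# Hypothetical regencies: density (inhabitants per square km) and population weight.
density = {"R1": 50, "R2": 120, "R3": 470, "R4": 900, "R5": 3000}
population = {"R1": 10, "R2": 20, "R3": 30, "R4": 25, "R5": 15}

names = list(density)
cutoff = weighted_median([density[n] for n in names], [population[n] for n in names])

# Regencies strictly below the cutoff are "sparse" and strictly above are "dense".
# How a regency exactly at the cutoff is classified is not specified, so it is left out.
sparse = [n for n in names if density[n] < cutoff]
dense = [n for n in names if density[n] > cutoff]
```

Because the median is person-weighted, the two subsamples hold roughly equal numbers of people but can hold very different numbers of regencies, which is consistent with the 183 and 91 regions reported above.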
In Figure 1.5 and Figure 1.6, I plot the coefficients on education γl for both sparsely populated and
densely populated subsamples. The difference-in-differences estimates are shown in Table 1.3.
As Figure 1.5 shows, in sparsely populated areas, the program increased the primary school attainment
rate (top) and secondary attainment rate (bottom) for men but did not affect women’s education. Difference-
in-differences estimates are provided in Panel A of Table 1.3. For men, one additional school constructed
per 1000 children increased the percentage completing primary school or above by 1 percentage point and
the percentage completing secondary school or above by 0.69 percentage points.
As Figure 1.6 shows, in densely populated areas, the program did not affect the primary school attainment
rate (top) but decreased the secondary school attainment rate (bottom) for both men and women. Difference-in-differences
estimates in Panel B of Table 1.3 suggest that one additional school built per 1000
children decreased the secondary school attainment rate by 2.3 percentage points for both men and women.
These heterogeneous effects are consistent with the finding in Duflo (2001) that the program increased
years of schooling in sparsely populated areas but not in densely populated areas for men. Duflo (2001)
interprets this as evidence that the program increased men’s education mainly by decreasing the average
distance to a school. This could explain the difference in the results on the primary school attainment rate
across the two subsamples, but has no explanatory power for the negative result on the secondary school
attainment rate in densely populated regions.
1.6 Mechanism
In this section, I investigate further the surprising finding of a negative effect on secondary school attainment
rate in densely populated regions. There are at least two possibilities: (1) building primary schools crowds
out resources available to secondary schools and deteriorates secondary school quality and (2) a sudden
increase in primary school availability may decrease primary education quality and hence the quality of
primary school graduates. I explore the heterogeneity in the results for sparsely and densely populated
regions and show that the first conjecture is more consistent with the data.
Deterioration in secondary education quality?
Teacher scarcity has long been a challenge in Indonesia’s education system. Building primary schools increases
the aggregate demand for teachers, which could reduce the availability of secondary school teachers. To test
this conjecture, I use the total number of teachers and the average number of teachers per school in secondary education
across regions in the years after the INPRES SD program to check whether there is a differential change
in regions where more primary schools were constructed. Specifically, I estimate the following specification:
\[
y_{jt} = \alpha_j + \beta_t + \sum_{l=2}^{6} (P_j d_{tl})\gamma_l + \sum_{l=2}^{6} (C_j d_{tl})\delta_l + \varepsilon_{jt}
\]
where j denotes the region and t denotes the survey year: 1 indicates 1973/74, 2 indicates 1978/79,
3 indicates 1983/84, 4 indicates 1988/89, 5 indicates 1993/94, and 6 indicates 1995/96. yjt
indicates the total or average number of secondary school teachers in year t in region j. dtl is a year dummy
indicating whether t = l. αj denotes the regency fixed effect, and βt denotes the year fixed effect. Pj is the school
construction intensity in regency j. εjt is the error term. Cj represents other region-specific variables. The
baseline year is 1973/74 (t = 1).
The results are presented in Table 1.4. The negative coefficients in columns (1) and (2) suggest that in
regions where more schools were constructed, a smaller increase
is observed in the total number of teachers and the average number of teachers per school in secondary education in
later years. Reassuringly, column (3) shows a positive effect of the program on the total number of teachers
in primary education, which is consistent with the teacher crowding-out story.
Moreover, since the negative effect on secondary school attainment is only observed in densely populated
regions, this negative effect on the number of teachers in secondary school should also only exist in such
regions. Figure 1.7 separately plots the coefficients on the interaction of the year dummies and school
construction intensity from the previous specification for sparsely and densely populated regions. A negative
effect on the average number of teachers in secondary education appears for densely populated regions but
not for sparsely populated regions. This supports my conjecture that primary school construction increases
the demand for teachers, which crowds out teacher resources available for secondary school education and
leads to a negative effect on the secondary school attainment rate, and that this phenomenon exists only
in densely populated regions.
Deterioration in primary education quality?
A second conjecture is that the deterioration in primary school quality leads to a decrease in student quality
among primary school graduates, and this in turn induces a lower secondary school attainment rate. To
meet the surge in demand for teachers created by the school expansion, primary school teacher quality may
have been sacrificed (Jalal et al., 2009; Bharati et al., 2018).
To test this conjecture, I use a similar empirical specification. The outcome variable is the percentage of primary
school teachers who completed secondary school (or some college) in one regency in that census year. The
baseline year is 1971, before the school expansion program started.
Figure 1.8 shows the coefficients on the interaction between the year dummies and school construction
intensity, separately for sparsely and densely populated regions, for my two proxies of teacher
quality: the percentage of teachers completing secondary school (top) and completing some college (bottom).
Consistent with the results in Bharati et al. (2018), I observe a negative impact of the program on
teacher quality in 1976, but not in later years. However, I do not find different patterns between sparsely
and densely populated regions. This suggests that deterioration in primary education quality is
not the main reason for the negative impact on the secondary school attainment rate.
1.7 Conclusion
In this chapter, I reevaluate the INPRES primary school construction program in Indonesia and show that
it had an unintended consequence for secondary school education: in densely populated regions,
the secondary school attainment rate declined for both men and women because primary school construction
crowded out teacher resources in secondary education.
1.8 Figures
Figure 1.1: Map of primary school construction intensity

[Plot: construction intensity; series: Sparsely Populated Areas, Densely Populated Areas]
Figure 1.2: Construction intensity for sparsely and densely populated areas
Note: This figure shows the distribution of the school construction intensity for the sparsely and densely populated regencies.
Density is calculated using population in 1971 divided by the total area using 1995 maps.
[Plot: estimated coefficient for completing primary school by age at 1974 (3-year cohorts from 24-22 to 3-2); series: males, females]
Figure 1.3: Coefficients of the interactions: age in 1974 * program intensity in the region of birth in the
education equation for completing primary school
Note: This figure reports estimates of the effect of school construction on primary school completion for 3-year cohorts separately
for males and females born in one regency. Dependent variable is the percentage of individuals completing primary school when
observed in 2010. The x-axis reports the age range (in 1974) for each cohort and the y-axis reports the estimated coefficient,
which can be interpreted as the effect of one additional primary school built per 1000 kids on primary school attainment rate in
that regency. The sample consists of individuals born between 1950 and 1972 observed in 2010 Indonesian census. The vertical
line indicates the youngest cohort that did not receive any treatment from school construction, since they were out of primary
school at 1974, when the first round of constructed primary schools became available. Confidence intervals of 95% were plotted.
The figure shows zero effect for individuals older than 13 at 1974, but an increasing positive effect for males younger than 13.
For females, the effect is smaller.
[Plot: estimated coefficient for completing secondary school by age at 1974 (3-year cohorts from 24-22 to 3-2); series: males, females]
Figure 1.4: Coefficients of the interactions: age in 1974 * program intensity in the region of birth in the
education equation for completing secondary school
Note: This figure is built like Figure 1.3 but considers secondary school attainment rate. It reports estimates of the effect of
school construction on secondary school completion for 3-year cohorts separately for males and females born in one regency.
Dependent variable is the percentage of individuals completing secondary school when observed in 2010. The x-axis reports the
age range (in 1974) for each cohort and the y-axis reports the estimated coefficient, which can be interpreted as the effect of
one additional primary school built per 1000 kids on secondary school attainment rate in that regency. The sample consists of
individuals born between 1950 and 1972 observed in 2010 Indonesian census. The vertical line indicates the youngest cohort that
did not receive any treatment from school construction, since they were out of primary school at 1974, when the first round of
constructed primary schools became available. Confidence intervals of 95% were plotted.
[Top panel plot: estimated coefficient by age at 1974 (3-year cohorts from 24-22 to 3-2); series: males, females]
[Bottom panel plot: estimated coefficient by age at 1974 (3-year cohorts from 24-22 to 3-2); series: males, females]
Figure 1.5: Effect of the program on completing primary school (top) and completing secondary school
(bottom) in sparsely populated areas
Note: This figure is similar to Figure 1.3 and Figure 1.4 but focuses on a subgroup: sparsely populated regions. It reports
estimates of the effect of school construction on primary school completion (top) and secondary school completion (bottom) for
3-year cohorts separately for males and females in this subgroup. Sparsely populated regions are defined as those regions with
population density smaller than the weighted median density in 1971.
[Top panel plot: estimated coefficient by age at 1974 (3-year cohorts from 24-22 to 3-2); series: males, females]
[Bottom panel plot: estimated coefficient by age at 1974 (3-year cohorts from 24-22 to 3-2); series: males, females]
Figure 1.6: Effect of the program on completing primary school (top) and completing secondary school
(bottom) in densely populated areas
Note: This figure is built like Figure 1.5 but focuses on the other subgroup: densely populated regions. It reports estimates of
the effect of school construction on primary school completion (top) and secondary school completion (bottom) for 3-year cohorts
separately for males and females in this subgroup. Densely populated regions are defined as those regions with population density
larger than the weighted median density in 1971.
[Plot: estimated coefficient by year (1973/74, 1978/79, 1983/84, 1988/89, 1993/94, 1995/96); series: Sparsely, Densely]
Figure 1.7: Coefficients of the interactions: census year * program intensity in the regency in average number
of secondary school teachers equation
Note: This figure reports estimates of the effect of school construction on average number of teachers in secondary school across
different years in sparsely populated areas and densely populated areas. The baseline year is 1973/74. The data was provided by
Indonesian Education Ministry and was collected in Duflo (2001). The dependent variable was the average number of teachers
in secondary school across different regencies. This figure supports the argument that the negative effect on secondary school
attainment is due to teacher resource crowding out in densely populated regions because of primary school construction.
[Top panel plot: seniorhigh, by year (1971, 1976, 1980, 1985, 1990); series: Sparsely, Densely]
[Bottom panel plot: somecollege, by year (1971, 1976, 1980, 1985, 1990); series: Sparsely, Densely]
Figure 1.8: Effect of the program on primary teacher education
Note: This figure reports estimates of the effect of school construction on the education level of primary school teacher in the two
subsamples: sparsely and densely populated regions. Dependent variable for the top panel is a dummy indicating the teacher
completes secondary school, for the bottom panel is a dummy indicating the teacher has some post-secondary education. The
baseline year is 1971. Primary school teacher information for each region is obtained from identifying those individuals who claim
their occupation is primary school teacher in the census year.
1.9 Tables
Table 1.1: Summary statistics
                         Old Cohorts                    Young Cohorts
2010 Census:             Born between 1950 and 1961     Born between 1962 and 1974
                         Males          Females         Males          Females
Education Attainment
  Some School            0.22           0.33            0.10           0.16
  Primary School         0.56           0.54            0.53           0.57
  Secondary School       0.17           0.11            0.30           0.23
  University or above    0.05           0.02            0.07           0.05
Observations             1,275,648      1,231,961       2,148,572      2,128,266
Source: Indonesian Census 2010.
Table 1.2: Effect of school construction on education
All sample:           Indicator for Completing at least:
                      Primary School                   Secondary School
                      (1)        (2)        (3)        (4)        (5)        (6)
Males:
Post × Intensity      0.022∗∗∗   0.0079∗    0.0062∗∗   -0.0077∗∗  -0.0012    -0.0014
                      (0.0060)   (0.0042)   (0.0025)   (0.0033)   (0.0030)   (0.0018)
Dep. var. mean        0.847                            0.306
Observations          6509       6302       6302       6509       6302       6302
Clusters              283        274        274        283        274        274
Adjusted R-squared    0.917      0.951      0.974      0.961      0.959      0.974
Duflo Controls:       No         Yes        Yes        No         Yes        Yes
Log-linear Trend:     No         No         Yes        No         No         Yes
Females:
Post × Intensity      0.020∗∗∗   0.0027     0.0041     -0.021∗∗∗  -0.0064∗   -0.0053∗∗
                      (0.0065)   (0.0052)   (0.0034)   (0.0049)   (0.0033)   (0.0023)
Dep. var. mean        0.766                            0.211
Observations          6509       6302       6302       6509       6302       6302
Clusters              283        274        274        283        274        274
Adjusted R-squared    0.934      0.956      0.979      0.949      0.960      0.976
Duflo Controls:       No         Yes        Yes        No         Yes        Yes
Log-linear Trend:     No         No         Yes        No         No         Yes
Notes: This table displays results on the effect of school building on education attainment (completing primary school and completing secondary school) for males and
females. Following the strategy of Duflo (2001), the sample consists of individuals born between either 1968 and 1972 or 1950 and 1961. Post refers to the treated cohort,
born between 1968 and 1972, while the untreated cohort was born between 1950 and 1961. Educational attainment data are taken from the Indonesian 2010 Census.
Intensity is the number of schools built in a region per 1,000 kids in the school-aged population. All columns include district fixed effect, school year fixed effect, school
year interacted with number of children at 1971. Duflo Controls consist of school year interacted with enrollment rate at 1971 and school year interacted with the water
and sanitation program. Standard errors are clustered at the birthplace district level. Significance levels: * 10%, ** 5%, *** 1%.
Source: Indonesian Census 2010
Table 1.3: Heterogeneous effect of school construction on education
                      Indicator for Completing at least:
                      Primary School            Secondary School
                      (1)          (2)          (3)          (4)
                      Male         Female       Male         Female
Panel A: Density < Median
Post × Intensity      0.010∗∗      0.0066       0.0069∗∗     -0.00066
                      (0.0051)     (0.0062)     (0.0032)     (0.0037)
Dep. var. mean        0.820        0.736        0.270        0.183
Observations          4209         4209         4209         4209
Clusters              183          183          183          183
Adjusted R-squared    0.949        0.952        0.937        0.941
Panel B: Density > Median
Post × Intensity      0.0054       -0.0060      -0.023∗∗∗    -0.023∗∗∗
                      (0.0066)     (0.0073)     (0.0055)     (0.0073)
Dep. var. mean        0.873        0.795        0.341        0.239
Observations          2093         2093         2093         2093
Clusters              91           91           91           91
Duflo Controls:       Yes          Yes          Yes          Yes
Log-linear trend:     Yes          Yes          Yes          Yes
Notes: This table is similar to Table 1.2 and displays the heterogeneous effect of school building on education attainment
in sparsely and densely populated regions. All columns include district fixed effect, school year fixed effect, school year
interacted with number of children at 1971. Duflo Controls consist of school year interacted with enrollment rate at 1971
and school year interacted with the water and sanitation program. Standard errors are clustered at the birthplace district level.
Significance levels: * 10%, ** 5%, *** 1%.
Source: Indonesian Census 2010
Table 1.4: Effect of school construction on number of teachers in secondary and primary education
                            Secondary School                 Primary School
                            (1)            (2)               (3)            (4)
                            Total number   Average number    Total number   Average number
INPRES Intensity ×
  year=1978/79              -13.9          -0.19∗            28.0∗∗∗        -0.16∗∗
                            (8.48)         (0.11)            (9.91)         (0.063)
  year=1983/84              -43.5          -0.14             61.1∗∗         -0.069
                            (31.8)         (0.18)            (29.2)         (0.11)
  year=1988/89              -65.1          -0.20             89.4∗          -0.031
                            (49.1)         (0.21)            (45.5)         (0.066)
  year=1993/94              -59.3          -0.086            95.1∗          -0.062
                            (52.0)         (0.22)            (57.0)         (0.075)
  year=1995/96              -51.7          -0.074            177.4∗∗        0.29∗
                            (60.0)         (0.19)            (68.6)         (0.17)
Dep. var. mean in 1973/74   555.996        14.723            1529.996       6.762
Dep. var. mean in 1995/96   2583.821       22.989            4207.180       8.345
Observations                1,656          1,656             1,664          1,664
R-squared                   0.928          0.929             0.942          0.829
Notes: This table displays the effect of school construction on the number of teachers in secondary and primary education
in later years. Baseline year is 1973/74. All columns include district fixed effect, school year fixed effect, school year
interacted with number of children at 1971. Duflo Controls consist of school year interacted with enrollment rate at 1971
and school year interacted with the water and sanitation program. Standard errors are clustered at the birthplace district level.
Significance levels: * 10%, ** 5%, *** 1%.
Source: Indonesian Education Ministry
1.10 Appendix
Table 1.A1: Inpres Sekolah Dasar Primary School Investment Program, 1973/74-1988/89
Financial  New primary  New classrooms   Primary schools  Primary principal  Primary school  Primary school  Total allocation
year       schools      for existing     to be            and teacher        books (mln)     sport kits      (billions of
                        primary schools  rehabilitated    housing                                            current rupiah)
1973/74    6,000        -                -                -                  6.6             -               17.2
1974/75    6,000        -                -                -                  6.9             -               19.7
1975/76    10,000       -                10,000           -                  7.3             -               49.9
1976/77    10,000       -                16,000           -                  8.6             -               57.3
1977/78    15,000       -                15,000           -                  7.3             -               85.0
1978/79    15,000       15,000           15,000           -                  8.5             -               111.8
1979/80    10,000       15,000           15,000           5,000              12.5            -               155.8
1980/81    14,000       20,000           20,000           7,500              14.0            -               249.8
1981/82    15,000       25,000           25,000           9,500              15.0            -               374.5
1982/83    22,600       35,000           25,000           20,000             30.0            50,000          267.4
1983/84    13,140       15,700           21,000           50,000             32.0            96,000          549.3
1984/85    2,200        12,500           31,000           60,000             32.0            157,799         526.1
1985/86    3,200        12,500           31,000           60,000             32.0            157,799         526.1
1986/87    2,200        10,000           95,000           44,070             32.6            120,000         495.9
1987/88    660          2,200            157,500          2,400              22.9            -               100.8
1988/89    250          1,350            6,000            2,650              18.5            -               112.5
Note: For the first time in 1980/81 new first-phase units were started while second-phase units were still being added to
first-phase units built in the preceding year. The 1980/81 targets were 4,000 first-phase units and 10,000 second-phase
units (Snodgrass et al., 1980, table 2).
Original source: Republik Indonesia, “Nota Keuangan dan Rancangan Anggaran Pendapatan dan Belanja Negara, Tahun
1988/1989”
Source: Annex Table 4 of World Bank (1989), page 109.
Table 1.A2: Development grant to regions (in billion Rp.)
                                             Grant to:
                                             Primary school  Health                  Market      Rural
Period        Villages  District  Provinces  building        improv.  Reforestation  place dev.  road    Total
REPELITA I    26.8      46.3      -          17.2            -        -              -           -       90.3
REPELITA II   94.8      304.0     317.4      323.6           94.6     76.5           12.5        -       1,223.4
REPELITA III  332.2     760.3     989.8      1,939.1         355.8    333.8          35.2        254.2   5,000.4
REPELITA IV   501.3     1,131.8   1,417.0    1,828.3         494.9    156.8          39.6        607.6   6,177.3
Original source: Bappenas
Source: Table 4.4 of Hady (1989), page 151.
Figure 1.A.1: Number of newly appointed primary school teachers, 1974-1998
[Bar chart: newly appointed primary school teachers by financial year, 1974/75 to 1998/99; vertical axis from 0 to 160,000; bar labels range from 4,100 to 141,324]
Original source: Ministry of National Education, Indonesia, 2005
Source: Jalal et al. (2009), Figure 1.1
Chapter 2
Schooling Expansion and the Female Marriage Age
2.1 Introduction
Female marriage age is important because it can affect fertility, maternal health, and even wives’ autonomy
within households (Jensen and Thornton, 2003; Kirbas et al., 2016). All of these aspects can have long-term
economic consequences. For example, fertility affects the future structure of the labor force, maternal health
affects individuals’ long-term achievements, and wives’ bargaining power affects within-household investment
decisions, including children’s education (Fergusson and Woodward, 1999; Chari et al., 2017; Sekhri and
Debnath, 2014).
What could affect female marriage age? We know that a woman’s own education is a key factor that
affects her marriage age (Carmichael, 2011; Ozier, 2018). But what about others’ education? Individuals of the
same gender compete among themselves for potential spouses on the other side of the market. Therefore, a
change in others’ education could have a spillover effect. Understanding this is also important
because education levels have changed worldwide in recent decades due to schooling expansion
policies.
In this chapter, I first build a two-to-one dimensional matching model, in which men differ in education but
women differ in both education and age, in a two-period overlapping generations framework, to understand
how female marriage age reacts to changes in the education distributions of men and women across
cohorts. I then exploit the primary school construction in Indonesia in the late 1970s, introduced
in Chapter 1, as a quasi-natural experiment to answer the question empirically. The model gives different
predictions depending on how male education and female youth interact in producing the marital
surplus. In the empirical analysis, I find that when other women’s education decreases,
holding everything else constant including their potential husbands’ education, a given woman is induced to
marry earlier and the spousal age gap, defined as the age difference between husbands and wives, increases.
Combined with the theoretical model, the empirical finding that women marry earlier when other women’s
education decreases suggests that in Indonesia, male education and female young age are complementary in
generating the marital surplus. One way to interpret this complementarity is that if young age
is valued in marriage, then more educated men value wives’ young age even more than less educated
men.¹
The theoretical model is a two-period OLG model in which women can choose to seek partners in either
the first period or the second, which makes marriage age a choice, whereas all men marry in the second period
to keep the model tractable. In any given year, the marriage market unfolds as in Choo and Siow (2006), where
the marital surplus generated by a couple depends on their types and some idiosyncratic draws modeled by
random vectors. Women differ in two dimensions (education and age) while men only differ in one dimension
(education). In a stationary equilibrium, a woman’s expected return from the marriage market should be
equalized between choosing to marry in the first period or the second.
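For orientation, the within-period matching function of the Choo and Siow (2006) framework can be written as below. The notation is an assumption of this sketch and may differ from the chapter's own: \mu_{xy} is the mass of marriages between men of type x (education) and women of type y (an education and age pair), \mu_{x0} and \mu_{0y} are the masses of those types who remain unmatched, and \Phi_{xy} is the systematic joint surplus of the match relative to both partners remaining unmatched, with idiosyncratic draws assumed to be type I extreme value.

```latex
\[
\mu_{xy} = \sqrt{\mu_{x0}\,\mu_{0y}}\;\exp\!\left(\frac{\Phi_{xy}}{2}\right)
\quad\Longleftrightarrow\quad
\Phi_{xy} = 2\ln\mu_{xy} - \ln\mu_{x0} - \ln\mu_{0y}
\]
```

The second form shows why observed matching patterns are informative about the surplus: types that match with each other more often than their unmatched masses would suggest are inferred to generate a larger joint surplus.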
How the percentage of women choosing to marry in the first period changes with respect to the education
distributions of men and women depends on how male education interacts with female age and female
education in generating marital surplus. If there is no interaction between male education and female age
in generating marital surplus, the female marriage age choice does not depend on the education distribution of
men or women. Intuitively, an individual’s gain from marriage comes from his or her marginal contribution to
the marital surplus. Hence, in this case, women fully capture the contribution of their age to the marital
surplus in their own utilities, and the marriage age decision is fully determined by how young
age and old age contribute differently to marital surplus.
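One hedged way to state the two cases formally, in notation that is an assumption of this sketch: let \Phi(x, e, a) denote the systematic marital surplus of a man with education x and a woman with education e and age a. "No interaction" between male education and female age means that the return to male education does not depend on the wife's age, which is equivalent to the surplus being additively separable in x and a. Complementarity between male education and female youth means that the return to male education is strictly larger with a young wife.

```latex
% No interaction: for all x' > x, all e, and all ages a, a'
\[
\Phi(x', e, a) - \Phi(x, e, a) = \Phi(x', e, a') - \Phi(x, e, a')
\quad\Longleftrightarrow\quad
\Phi(x, e, a) = f(x, e) + g(e, a)
\]
% Complementarity between male education and female youth: for all x' > x
\[
\Phi(x', e, \text{young}) - \Phi(x, e, \text{young}) > \Phi(x', e, \text{old}) - \Phi(x, e, \text{old})
\]
```

In the separable case the term g(e, a) is the part of the surplus that depends on the woman's age alone, which is the sense in which she captures the full contribution of her age.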
Suppose that women marrying at a young age is “good” for marital surplus; then, in the case in which
there is complementarity between men’s education and women’s young age, the model predicts that an
increase in the proportion of educated men would decrease the female marriage age (i.e., increase the percentage
of women marrying in the first period). If there is also complementarity between men’s education and
women’s education, an increase in the proportion of educated women would have the opposite effect: an
increase in the female marriage age (i.e., a decrease in the percentage of women marrying in the first period).
Intuitively, an increase in the share of educated women would create a relative shortage of educated men
when there is complementarity between men’s education and women’s education. Hence it would have the
opposite effect of an increase in educated men.
1 Of course, it can also be interpreted as female preference: if we think husband’s education is valuable in marriage, then
younger women would value the education even more than older women.
In the empirical analysis, I use the INPRES SD program as an instrumental variable for a change in
others’ education. The massive size of the primary school construction program in Indonesia provides a
large exogenous shock to other women’s education.
However, two main obstacles remain in the empirical analysis.
First, since the program affects both men and women, it would usually be difficult to separate the effect of a
change in male education from that of a change in female education. However, in the current
setting, the large spousal age gap in Indonesia, about 5 years, helps tackle this
challenge: for the first few cohorts of women who were impacted by the school construction,
their potential husbands’ education remained the same. By comparing these female cohorts with the older
cohorts who were not impacted by the program, I am able to observe how the female marriage age reacts to
the change in the female education distribution while holding the male education distribution unchanged.
In sparsely populated regions where there is no effect on female education, as expected, I do not observe
any effect on female age at first marriage or the spousal age gap. In densely populated regions where there
is a negative effect on secondary school attainment rate for women, I find a decrease in female age at first
marriage and an increase in the spousal age gap. A 10 percentage point decrease in the percentage of
secondary graduates led to a decrease of 1.1 years in the average female marriage age and an increase of 0.35 years in the spousal
age gap.
Second, since the program affects both a woman’s own education and other women’s education in her
birth cohort, I have to account for the effect of a change in her own education in order to isolate the spill-over effect. I
observe that in Indonesia, female secondary school graduates marry on average four years later than female
primary school graduates, both before and after the school expansion program. Hence one may think that
a decrease in female education would mechanically lead to a decrease in marriage age and an increase in
the spousal age gap. Empirically, I find the overall effect to be much larger than the mechanical effect. In the
reduced-form analysis, a 10 percentage point decrease in the percentage of secondary graduates led to a
decrease of 1.1 years in the average female marriage age, whereas the mechanical effect would be 10 percentage
points times 4 years, which is 0.4 years. Hence the additional 0.7 years is the effect of the education distribution
change on the female marriage age. A similar logic applies to the spousal age gap. A 10 percentage point
decrease in the percentage of secondary graduates led to an increase of 0.35 years in the average spousal age
gap, whereas the mechanical effect would be 10 percentage points times 1.5 years, which is 0.15 years. Hence the
additional 0.20 years is the effect of the education distribution change on the spousal age gap.
The empirical result is consistent with the model prediction when there exists complementarity both
between male education and female education, and between male education and female youth in generating
marital surplus.
This chapter is related to several distinct literatures. The modeling approach in this paper builds on previous
research studying marriage age using OLG models (Bhaskar, 2015; Iyigun and Lafortune, 2016; Zhang,
2018) and on models of matching with TU with separable idiosyncratic preferences in the marital surplus (Choo
and Siow, 2006; Chiappori et al., 2017; Galichon and Salanié, 2015). Of the OLG papers, some only focus on
age (Bhaskar, 2015), while others simultaneously study individuals’ educational and marriage age decisions.
It also contributes to a growing literature studying the impact of education reforms on the marriage market.
Hener and Wilson (2018) study a compulsory schooling reform in the UK and find that women decrease the marital age
gap to avoid marrying less-qualified men. André and Dupraz (2018) study school construction in Cameroon
and find that education increases the likelihood of being in a polygamous union for both men and women. In
contrast to both of these papers, the present paper analyzes the effect via a general equilibrium framework.
This chapter complements a large literature on the impact of marriage market conditions on individuals’
outcomes. Most of the existing literature focuses on the sex ratio in the marriage market (e.g., Abramitzky
et al., 2011; Angrist, 2002; Charles and Luoh, 2010). I focus on a distinct but equally important dimension
of marriage market conditions: the education distributions of men and women.
2.2 Model
In this section, I develop a two-period OLG matching model with Transferable Utility (TU) to study how a
change in the education distribution across birth cohorts may affect marriage market outcomes, in particular,
female marriage age. There are several important features:
• Individuals get utility from participating in the marriage market.
• Individuals’ education affects the marital surplus, for both men and women.
• Age plays an asymmetric role for men and women: women’s age matters in the surplus function, but men’s
does not. Much research has documented that female youth is more important than
male youth in the marriage market; this could be due either to a fundamental difference between female
age and male age in the household production function related to fertility, or to a stronger male
preference for youth-related beauty (Low, 2017; Siow, 1998; Edlund, 2006; Dessy and Djebbari, 2010;
Zhang, 2018; Arunachalam and Naidu, 2006).
• Women are allowed to choose to participate in the marriage market either early or late. However, a
woman who participated in period 1 cannot enter into the marriage market in period 2, whether she
remains married or single. This can be rationalized as the existence of a stigma associated with women
who have tried to seek partners in an early period.
• Each marriage market is modeled as a matching model with TU and idiosyncratic random preference
draws. The random preference draws allow couples of all types with
respect to male education, female education and female age to exist, which fits reality better than
the static model. In each marriage market, women differ in both education and age, while men only
differ in education.
Two-period OLG
There is an infinite number of periods, $r = 1, 2, \ldots$. At the beginning of each period, a unit mass of men and
a unit mass of women enter the economy. Assume people can only make marriage decisions in their first two
periods, so the problem simplifies to a two-period OLG problem. Furthermore, to focus on the female
marriage age decision, I assume that women choose whether to seek partners in period 1 (when
they are young) or to delay this process to period 2 (when they are old). Men always seek partners in period
2. Individuals differ in their education type, L or H. In the model, I focus on the utilities individuals
obtain from the marriage market.
Marriage market at one period
I will first discuss how the marriage market unfolds given women’s marriage timing choices in any given period.
Individual types
Women can choose to participate in one of the two periods; hence in any period there are at most four
types of women: Low education and Young ($L_1$), Low education and Old ($L_2$), High education and Young
($H_1$), and High education and Old ($H_2$). Men only participate in period 2; hence there are two types of men
in any period: Low education ($L$) and High education ($H$).
Utilities and matching surplus
Denote $x$ as the type of women and $X$ as the type set, i.e. $x \in X = \{L_1, L_2, H_1, H_2\}$. Similarly, denote
$y$ as the type of men and $Y$ as the type set, i.e. $y \in Y = \{L, H\}$. To include the possibility of being single,
denote $X_0 = X \cup \{\emptyset\}$, $Y_0 = Y \cup \{\emptyset\}$. Suppose that a woman $i$ with type $x$ and a man $j$ with type $y$ form a
couple. I assume their lifetime utilities are as follows:
woman $i$’s utility: $u_{ij} = \alpha_{xy} + \tau_{ij} + \varepsilon_{iy}$
man $j$’s utility: $v_{ij} = \gamma_{xy} - \tau_{ij} + \eta_{xj}$
$\alpha_{xy}$ and $\gamma_{xy}$ indicate the systematic part of the utility each individual gets from the marriage depending on their
types. $\tau_{ij}$ represents the transfer between $i$ and $j$,² which is determined in equilibrium. $\varepsilon_{iy}$ and $\eta_{xj}$
represent the individuals’ idiosyncratic tastes in partner types. Notice that they only depend on the partners’
types.
For single individuals, the utilities are:
$u_{i\emptyset} = \alpha_{x\emptyset} + \varepsilon_{i\emptyset}$
$v_{\emptyset j} = \gamma_{\emptyset y} + \eta_{\emptyset j}$
Without loss of generality, we can normalize $\alpha_{x\emptyset} = 0$ and $\gamma_{\emptyset y} = 0$.³ Then $\alpha_{xy}$ and $\gamma_{xy}$ can be interpreted
as the net systematic gain from marriage.
There are three important assumptions underlying my specification of individuals’ utility:⁴
• There exists a transfer technology within a couple that transfers utility one to one without loss,
which is the basic feature of a matching model with TU.
• Both the transfer and the random taste terms are additive to the systematic part.
• The random terms are individual specific but only depend on the partner’s type.
This utility specification may seem restrictive, but it allows for “matching on unobservables” and keeps the
model tractable. What it rules out is a “chemistry” term between two individuals conditional on their
types, i.e., unobserved preferences of one individual for unobserved characteristics of a
partner.
2 $\tau$ can be either positive or negative.
Stable Matching
Given the population and type distributions $G_x$, $G_y$ in a marriage market, a matching is defined as a
measure $\mu$ on the set $X \times Y$ and a set of payoffs $\{u_i, v_j, i \in I, j \in J\}$ such that $u_i + v_j = \alpha_{xy} + \gamma_{xy} + \varepsilon_{iy} + \eta_{xj}$
for any matched couple $(i, j)$. In other words, a matching specifies who marries whom and how each
matched couple divides the surplus. Notice that the female type distribution $G_x$ is endogenously determined
by female marriage timing choices and the exogenous education distribution, denoted as $E_f = (n_L, n_H)$. The
male type distribution $G_y$ is the same as the exogenous education distribution, denoted as $E_m = (m_L, m_H)$.
In a stable matching, there are two requirements:
• (Individual rationality) Any matched individual is weakly better off than being single:
$u_i \ge \varepsilon_{i\emptyset}, \quad v_j \ge \eta_{\emptyset j}, \quad \forall i \in I, j \in J$
• (No blocking pair) There do not exist any two individuals, woman $i$ and man $j$, who are currently
not matched to each other but would both rather match with each other than remain in their current
condition:
$u_i + v_j \ge \alpha_{xy} + \gamma_{xy} + \varepsilon_{iy} + \eta_{xj}, \quad \forall i \in I, j \in J$
3 Because we can always define $\tilde{\alpha}_{xy} = \alpha_{xy} - \alpha_{x\emptyset}$ and $\tilde{\gamma}_{xy} = \gamma_{xy} - \gamma_{\emptyset y}$ as the systematic utility surplus an individual obtains
from marriage compared to being single.
4 This is the “Separability” assumption in Galichon and Salanié (2015). As noted in that paper, what matters in the model
is the surplus a couple can jointly achieve, i.e. $\alpha_{xy} + \gamma_{xy} + \varepsilon_{iy} + \eta_{xj}$ in our case. How we attribute this surplus to male
preference or female preference doesn’t matter. For example, it can be the case that women don’t have any random taste for
men and their utilities without any transfer are $\alpha_{xy}$, while men’s utilities are $\gamma_{xy} + \varepsilon_{iy} + \eta_{xj}$, indicating that man $j$ not only has a
random draw $\eta_{xj}$ depending on the woman’s type, but also has an own-type-specific random taste for a particular woman $i$, represented
by $\varepsilon_{iy}$. The solution to the model is the same no matter how we attribute the joint surplus to people’s preferences. The same
assumption is also imposed in Choo and Siow (2006) and Chiappori et al. (2017).
Therefore, in any stable matching and given equilibrium transfers $\tau_{ij}$, the following conditions hold:
Woman $i$ chooses $j^*(i)$: $j^*(i) = \arg\max_{j \in J_0} u_{ij}$
Man $j$ chooses $i^*(j)$: $i^*(j) = \arg\max_{i \in I_0} v_{ij}$
where $J_0$ represents all men plus the possibility of being single, and $I_0$ represents all women plus the possibility of
being single.
Lemma 2.1. For any stable matching, there exist two vectors $U^{xy}$ and $V^{xy}$ such that:
(i) Woman $i$ of type $x$ achieves utility
$\tilde{u}_i = \max_{y \in Y_0} (U^{xy} + \varepsilon_{iy})$
and she matches some man whose type $y$ achieves the maximum;
(ii) Man $j$ of type $y$ achieves utility
$\tilde{v}_j = \max_{x \in X_0} (V^{xy} + \eta_{xj})$
and he matches some woman whose type $x$ achieves the maximum;
(iii) If there exist women of type $x$ matched with men of type $y$ at equilibrium, then
$U^{xy} + V^{xy} = \alpha_{xy} + \gamma_{xy}$
This lemma has been proved in Chiappori et al. (2017) and Galichon and Salanié (2015). I provide a short
version of the proof in the appendix. With TU, the additive structure and type-specific heterogeneity, this
two-sided matching problem is simplified to a one-sided discrete choice problem.
Solutions with Gumbel distribution
If we further assume a Gumbel distribution for $\varepsilon$ and $\eta$, a closed-form solution for the stable matching and the
expected utilities of each type can be derived. From now on, let’s assume the random terms $\varepsilon_{iy}$, $\eta_{xj}$ follow
independent Gumbel distributions $G(-k, 1)$, with $k \simeq 0.5772$ being the Euler constant. With the properties
of the Gumbel distribution and Lemma 2.1, for a given woman $i$ of type $x$,
$$\mu_{y|x} := \Pr(\text{woman } i \text{ of type } x \text{ is matched with a man of type } y) = \frac{\exp(U^{xy})}{1 + \sum_{y' \in Y} \exp(U^{xy'})}$$
$$\mu_{\emptyset|x} := \Pr(\text{woman } i \text{ of type } x \text{ is single}) = \frac{1}{1 + \sum_{y' \in Y} \exp(U^{xy'})}$$
Therefore,
$$\frac{\mu_{y|x}}{\mu_{\emptyset|x}} = \exp(U^{xy}), \quad \forall x \in X$$
Similar logic applies to the other side of the market, men:
$$\frac{\mu_{x|y}}{\mu_{\emptyset|y}} = \exp(V^{xy}), \quad \forall y \in Y$$
Denote $n_x$, $m_y$ as the population of each type. Note that $n_x$ depends on women’s participation choices.
Denote $\mu_{xy}$ as the mass of matched couples between women of type $x$ and men of type $y$ (note that $\mu_{xy} = \mu_{yx}$
by construction since matching is one-to-one), $\mu_{x0}$ as the mass of single women of type $x$, and $\mu_{0y}$ as the
mass of single men of type $y$. Since $\mu_{xy} = n_x \mu_{y|x} = m_y \mu_{x|y}$, $\mu_{x0} = n_x \mu_{\emptyset|x}$ and $\mu_{0y} = m_y \mu_{\emptyset|y}$, we have:
$$\frac{\mu_{xy}^2}{\mu_{x0}\,\mu_{0y}} = \exp(U^{xy} + V^{xy}) = \exp(\alpha_{xy} + \gamma_{xy})$$
Denote $\Phi_{xy} = \alpha_{xy} + \gamma_{xy}$. Then, given $\Phi_{xy}$, the previous equation provides a matching function between the
mass of any couple type and the probabilities of singlehood. Together with the following feasibility constraints, we
can construct a system of $|X| + |Y|$ equations in $|X| + |Y|$ unknowns (the probabilities of singlehood for each type).
Decker et al. (2013) show the existence and uniqueness of the solution to this system.
$$\mu_{x0} + \mu_{xL} + \mu_{xH} = n_x, \quad \forall x \in \{L_1, L_2, H_1, H_2\}$$
$$\mu_{0y} + \mu_{L_1 y} + \mu_{L_2 y} + \mu_{H_1 y} + \mu_{H_2 y} = m_y, \quad \forall y \in \{L, H\}$$
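The system above generally has no closed form, but it is straightforward to solve numerically. The following is a minimal sketch, not the procedure used in this chapter: it assumes an iterative-proportional-fitting style update on $a_x = \sqrt{\mu_{x0}}$ and $b_y = \sqrt{\mu_{0y}}$, so that $\mu_{xy} = a_x b_y \exp(\Phi_{xy}/2)$ and each feasibility constraint becomes a quadratic in one unknown. The function name `solve_matching`, the surplus matrix `phi` and the masses `n`, `m` are hypothetical and chosen only for illustration.

```python
import numpy as np

def solve_matching(phi, n, m, tol=1e-12, max_iter=10_000):
    """Solve mu_xy^2 = mu_x0 * mu_0y * exp(phi_xy) under the feasibility constraints.

    phi: (|X|, |Y|) joint surplus; n: masses of female types; m: masses of male types.
    Returns (mu_xy, mu_x0, mu_0y).
    """
    K = np.exp(phi / 2.0)          # mu_xy = a_x * b_y * K_xy, a = sqrt(mu_x0), b = sqrt(mu_0y)
    a, b = np.sqrt(n), np.sqrt(m)  # starting guess: everyone is single
    for _ in range(max_iter):
        s = K @ b                  # female constraint: a_x^2 + a_x * s_x = n_x
        a_new = (-s + np.sqrt(s**2 + 4.0 * n)) / 2.0
        t = K.T @ a_new            # male constraint: b_y^2 + b_y * t_y = m_y
        b_new = (-t + np.sqrt(t**2 + 4.0 * m)) / 2.0
        change = max(np.max(np.abs(a_new - a)), np.max(np.abs(b_new - b)))
        a, b = a_new, b_new
        if change < tol:
            break
    return np.outer(a, b) * K, a**2, b**2

# Hypothetical inputs. Rows: L1, L2, H1, H2 (women); columns: L, H (men).
phi = np.array([[1.0, 1.2],
                [0.5, 0.4],
                [0.8, 1.8],
                [0.6, 1.2]])
n = np.array([0.35, 0.25, 0.22, 0.18])  # masses of the four female types
m = np.array([0.60, 0.40])              # masses of the two male types

mu_xy, mu_x0, mu_0y = solve_matching(phi, n, m)
print("single rates of women:", mu_x0 / n)
print("single rates of men:  ", mu_0y / m)
```

Working with the square roots of the single masses keeps every update positive, so no constraint can be violated along the way.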
Moreover, we can recover the expected utility each type gets from participating in this marriage market.
With the properties of the Gumbel distribution,
$$u_x := E[\tilde{u}_i] = E\Big[\max_{y \in Y_0} (U^{xy} + \varepsilon_{iy})\Big] = \ln\Big(1 + \sum_{y \in Y} \exp(U^{xy})\Big) = -\ln(\mu_{\emptyset|x}) = -\ln\Big(\frac{\mu_{x0}}{n_x}\Big)$$
$$v_y := E[\tilde{v}_j] = E\Big[\max_{x \in X_0} (V^{xy} + \eta_{xj})\Big] = \ln\Big(1 + \sum_{x \in X} \exp(V^{xy})\Big) = -\ln(\mu_{\emptyset|y}) = -\ln\Big(\frac{\mu_{0y}}{m_y}\Big)$$
The expected utility thus has a one-to-one correspondence with the single rate:⁵ the smaller
the single rate is, the larger the expected utility is.
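The link between the Gumbel assumption, the logit matching probabilities and the expected utility can be checked by simulation. The sketch below is only an illustration under assumed values: the vector `U` of systematic payoffs (with the payoff of singlehood normalized to zero) is hypothetical, and the location parameter $-k$ is what makes each taste shock mean zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical systematic payoffs for one female type x: [single, y = L, y = H].
U = np.array([0.0, 0.3, 0.9])
draws = 400_000

# Gumbel(-k, 1) shocks, with k the Euler constant, so that each shock has mean zero.
eps = rng.gumbel(loc=-np.euler_gamma, scale=1.0, size=(draws, U.size))
values = U + eps

# Simulated choice shares and simulated expected utility of the best option.
mc_shares = np.bincount(values.argmax(axis=1), minlength=U.size) / draws
mc_utility = values.max(axis=1).mean()

# Closed forms from the text: logit shares and ln(1 + sum_y exp(U^{xy})).
logit_shares = np.exp(U) / np.exp(U).sum()
closed_form_utility = np.log(np.exp(U).sum())

print("simulated shares:", mc_shares, "logit shares:", logit_shares)
print("simulated E[max]:", mc_utility, "closed form:", closed_form_utility)
print("-ln(single rate):", -np.log(logit_shares[0]))
```

The last printed line equals the closed-form expected utility exactly, which is the one-to-one correspondence between the single rate and the expected utility used in the text.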
Stationary equilibrium with OLG
Before participating in any marriage market, the strategic choice for each woman in the model is
when to enter the marriage market, given the predetermined education distributions of women and men,
denoted by $(E_f, E_m)$. If a woman with education $e$ chooses to enter in period 2 instead of period
1, this increases the expected marital return of all women in the period 1 marriage market and decreases the
expected return of all women in the period 2 marriage market.⁶ In a stationary equilibrium, the percentage of
women who choose to wait until period 2 equates women’s expected returns in the two marriage markets.
Denote the percentage of women with education $e$ who choose to seek partners in period 1 (or period 2) as
$q_e^1$ (or $q_e^2$), with $e \in \{L, H\}$. Of course, $q_e^1 + q_e^2 = 1$, $\forall e$.
5 This is a specific property of the Gumbel distribution.
6 As proved in Galichon and Salanié (2017), an addition of one woman hurts all women and benefits all men; an addition of
one man hurts all men and benefits all women.
We say that a marriage market with female and male type distributions $(G_x, G_y)$ is the induced
marriage market of a strategy vector $q$ if the distribution of female types (four) and male types (two) in the
marriage market is $(G_x, G_y)$ when women adopt strategy $q$. Note that for the male distribution, $G_y = E_m$, $\forall q$.
Definition 2.1. Strategy vector $q = \{q_H^1, q_H^2, q_L^1, q_L^2\}$ forms a stationary equilibrium if $u_{H_1} = u_{H_2}$ and
$u_{L_1} = u_{L_2}$ in the induced marriage market, where $u_{e_1}$ ($u_{e_2}$) is the expected marriage payoff of women with
education $e$ who choose to enter the marriage market in period 1 (period 2).
Denote $\Phi_{xy} = \alpha_{xy} + \gamma_{xy}$. We have woman’s type $x \in \{L_1, L_2, H_1, H_2\}$, man’s type $y \in \{L, H\}$.
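Proposition 2.1. There exists a unique stationary equilibrium, and the equilibrium strategy $q$ satisfies:
$$\min(\Phi_{L_1 L} - \Phi_{L_2 L},\ \Phi_{L_1 H} - \Phi_{L_2 H}) \le \ln\Big(\frac{q_L^1}{q_L^2}\Big) \le \max(\Phi_{L_1 L} - \Phi_{L_2 L},\ \Phi_{L_1 H} - \Phi_{L_2 H})$$
$$\min(\Phi_{H_1 L} - \Phi_{H_2 L},\ \Phi_{H_1 H} - \Phi_{H_2 H}) \le \ln\Big(\frac{q_H^1}{q_H^2}\Big) \le \max(\Phi_{H_1 L} - \Phi_{H_2 L},\ \Phi_{H_1 H} - \Phi_{H_2 H})$$
Proof. See the appendix.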
Intuitively, the equilibrium percentage of women who decide to participate in period 1 depends on the
marital surplus difference between marrying in period 1 and period 2 given any partner type. The larger
the difference, the higher the percentage of women seeking partners in period 1.
One corollary of Proposition 2.1 is that the equilibrium strategy $q$ satisfies
$0 < q_e^1 < 1$, $0 < q_e^2 < 1$, $\forall e \in \{L, H\}$. In equilibrium, it never happens that all women of the same
education type choose to participate in period 1, or all in period 2, as long as the surplus terms $\Phi$ are bounded.
Intuitively, if all women of one education type chose to participate in period 1, a woman could benefit by
choosing to participate in period 2, which would make her the only older woman with that education. The scarcity
of this type would earn her large marital returns. Since the support of the Gumbel distribution is $\mathbb{R}$,
the potential return could be large enough that being the only one of the older type in period 2 is more
rewarding than participating in period 1, no matter how large the surplus difference $\Phi_{e_1 y} - \Phi_{e_2 y}$ is, as long as
it is finite.⁷
7 One can also understand this in terms of the probability of singlehood. In the model, the single probability has a one-to-one
correspondence with the expected utility: the lower the single probability, the higher the expected marital return. A woman
who is the only one of the older type in period 2 would almost surely get married, since the men who have very large draws
for this particular older type would compete fiercely among themselves to marry her.
Proposition 2.2. If, for a given education type $e \in \{L, H\}$, $\Phi_{e_1 H} - \Phi_{e_2 H} = \Phi_{e_1 L} - \Phi_{e_2 L}$, then $q_e^1$, $q_e^2$ are uniquely
pinned down by:
$$q_e^1 = \frac{\exp(\Phi_{e_1 L})}{\exp(\Phi_{e_1 L}) + \exp(\Phi_{e_2 L})}, \qquad q_e^2 = \frac{\exp(\Phi_{e_2 L})}{\exp(\Phi_{e_1 L}) + \exp(\Phi_{e_2 L})}$$
The condition $\Phi_{e_1 H} - \Phi_{e_2 H} = \Phi_{e_1 L} - \Phi_{e_2 L}$ indicates that the gain from female youth in the surplus is independent of men’s
education.⁸ This means that male education and female youth don’t interact in the marital surplus; hence
the marginal contribution of female youth to the surplus doesn’t depend on the partner’s education type
either. In a matching model, individuals’ marital gains come from their marginal contributions to the surplus.
In this case, women get all the benefit (or cost) of female youth if they choose to participate in period 1.
Their choice of marriage market is fully pinned down by this difference in marital surplus, independent of
the education distribution on both sides.
8 It can depend on female education, $e$. For example, the return to female youth may be larger for less educated women than
for more educated women, or the other way around. The empirical observation that less educated women marry earlier supports
the case that the gain is larger for less educated women.
Comparative statics
School construction would lead to a dynamic change in the population’s education. However, unlike in Bhaskar
(2015), the current model doesn’t focus on the transitory period, which is of less interest in this paper. I
will concentrate instead on how the stationary equilibrium changes in response to the change in the population’s
education. For simplicity, let’s assume the male and female populations are equal. Without loss of
generality, I can also normalize the population of each side to 1, since the model has constant returns to
scale. Let us analyze how the female marriage age decision would change when the education distribution of
men or women changes, respectively.
Proposition 2.3. Denote the female education distribution as $E_f = (n_L, 1 - n_L)$ and the male education distribution
as $E_m = (m_L, 1 - m_L)$.
Keeping $n_L$ constant, $\forall e \in \{L, H\}$, a decrease in $m_L$ would
• increase $q_e^1$, if $\Phi_{e_1 H} - \Phi_{e_2 H} > \Phi_{e_1 L} - \Phi_{e_2 L}$;
• decrease $q_e^1$, if $\Phi_{e_1 H} - \Phi_{e_2 H} < \Phi_{e_1 L} - \Phi_{e_2 L}$.
Proof. See the appendix.
If the percentage of more-educated men increases, the equilibrium percentage of women marrying in
period 1 increases if male education and female youth are complementary in the marital surplus;⁹ the
equilibrium percentage of women marrying in period 1 decreases if instead male education and female
maturity are complementary in the marital surplus. Notice that whether the marital surplus is super-modular
in male education and female education does not matter.
A stable matching maximizes the total social surplus in a TU framework (Shapley and Shubik, 1971).
When male education and female youth are complementary, the social surplus is larger if we pair more
educated men with younger women. Hence when there is a decrease in $m_L$, the presence of more educated
men induces more women to marry in period 1 to take advantage of the higher social surplus, and vice
versa.
Proposition 2.4. Denote the female education distribution as $E_f = (n_L, 1 - n_L)$ and the male education distribution
as $E_m = (m_L, 1 - m_L)$.
Further assume super-modularity in men’s education and women’s education. Holding $m_L$ constant, $\forall e \in
\{L, H\}$, a decrease in $n_L$ would
• decrease $q_e^1$, if $\Phi_{e_1 H} - \Phi_{e_2 H} > \Phi_{e_1 L} - \Phi_{e_2 L}$;
• increase $q_e^1$, if $\Phi_{e_1 H} - \Phi_{e_2 H} < \Phi_{e_1 L} - \Phi_{e_2 L}$.
A change in the female education distribution affects the equilibrium female choice by changing the potential
gain from female youth through the distribution of men a woman can potentially marry. If $n_L$ decreases,
then, from the point of view of a given woman, other women are more educated. They are more likely to marry more educated
men due to the complementarity in education. Therefore, more educated men become
scarcer on the market, which discourages all women from participating in period 1, as predicted in Proposition 2.3, if male
education and female youth are complementary.
9 There are at least four ways to interpret the complementarity between male education and female youth. For example, (1)
all men prefer female youth and more educated men value female youth more than less educated men. (2) All women prefer
more educated men and younger women value male education more than older women. (3) All men dislike female youth and
more educated men dislike female youth less than less educated men. (4) All women dislike more educated men and younger
women dislike more educated men less than older women. Of course, the first and second seem more plausible than the
last two.
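The stationary equilibrium and the two comparative statics can be illustrated numerically. The sketch below is not the proof strategy of the appendix and not an estimation routine. It assumes hypothetical surplus values `phi` in which male education is complementary with female youth and with female education, and it uses a simple fixed-point iteration that I introduce only for illustration: given the single masses of men, equal single rates for the young and old types of education $e$ require $n_{e_1}/n_{e_2} = (S_{e_1}/S_{e_2})^2$, where $S_x = \sum_y \sqrt{\mu_{0y}} \exp(\Phi_{xy}/2)$. The function names are hypothetical as well.

```python
import numpy as np

def single_roots(phi, n, m, tol=1e-13, max_iter=10_000):
    """Return b_y = sqrt(mu_0y) solving the within-period matching system."""
    K = np.exp(phi / 2.0)
    a, b = np.sqrt(n), np.sqrt(m)
    for _ in range(max_iter):
        s = K @ b
        a_new = (-s + np.sqrt(s**2 + 4.0 * n)) / 2.0
        t = K.T @ a_new
        b_new = (-t + np.sqrt(t**2 + 4.0 * m)) / 2.0
        change = max(np.max(np.abs(a_new - a)), np.max(np.abs(b_new - b)))
        a, b = a_new, b_new
        if change < tol:
            break
    return b

def stationary_q1(phi, n_L, m_L, tol=1e-10, max_iter=500):
    """Shares (q_L^1, q_H^1) of women entering in period 1 such that u_{e1} = u_{e2}."""
    K = np.exp(phi / 2.0)
    n_edu = np.array([n_L, 1.0 - n_L])
    m = np.array([m_L, 1.0 - m_L])
    q1 = np.array([0.5, 0.5])
    for _ in range(max_iter):
        # Induced market of strategy q. Rows: L1, L2, H1, H2.
        n = np.array([q1[0] * n_edu[0], (1 - q1[0]) * n_edu[0],
                      q1[1] * n_edu[1], (1 - q1[1]) * n_edu[1]])
        S = K @ single_roots(phi, n, m)
        # Equal single rates for e1 and e2 require n_e1 / n_e2 = (S_e1 / S_e2)^2.
        ratio = np.array([(S[0] / S[1])**2, (S[2] / S[3])**2])
        q1_new = ratio / (1.0 + ratio)
        change = np.max(np.abs(q1_new - q1))
        q1 = q1_new
        if change < tol:
            break
    return q1

# Hypothetical surplus. Rows: L1, L2, H1, H2 (women); columns: L, H (men).
phi = np.array([[1.0, 1.2],
                [0.5, 0.4],
                [0.8, 1.8],
                [0.6, 1.2]])
gain = phi[[0, 2]] - phi[[1, 3]]  # Phi_{e1 y} - Phi_{e2 y}; rows e = L, H; columns y = L, H

q_base = stationary_q1(phi, n_L=0.6, m_L=0.6)
q_more_educated_men = stationary_q1(phi, n_L=0.6, m_L=0.5)
q_more_educated_women = stationary_q1(phi, n_L=0.5, m_L=0.6)
log_odds = np.log(q_base / (1.0 - q_base))

print("bounds on ln(q1/q2):", gain.min(axis=1), gain.max(axis=1), "equilibrium:", log_odds)
print("q1 baseline:", q_base)
print("q1 with more educated men:  ", q_more_educated_men)
print("q1 with more educated women:", q_more_educated_women)
```

With these assumed values the gain from youth is larger with educated husbands for both education levels, so the printed shares can be compared directly with the first bullet of Propositions 2.3 and 2.4.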
2.3 The Marriage Market in Indonesia
Marriage traditions differ across Indonesia’s hundreds of ethnolinguistic groups. However, under the
influence of national policies, certain commonalities also emerge (Frederick and Worden, 1993).
More than 87% of the population is Muslim (according to the 2010 census), and polygamy is legal. However, only
2% of marriages are polygamous (Jones, 1994).
Arranged marriage still exists, but its share is decreasing. Most marriages require the consent of
the children, especially for the groom’s family (Malhotra, 1991). In Indonesia, the average female marriage age
is about 19. It is low but similar to that of other Southeast Asian countries.
The divorce rate used to be very high (about 60%) around the 1960s; however, it has been decreasing since the 1970s
and is now less than 40%. The fertility rate has also been declining since the 1970s as average education has increased.
There is no evidence of son preference in Indonesia (Frederick and Worden, 1993).
2.4 Data
To avoid truncation problems caused by the possibility that young men and women who are single in the survey
year may marry in future years, I choose the latest censuses available from IPUMS. I use information
from the 10% sample of the Indonesian Population Census 2010 and the 0.51% sample of the Indonesian
Intercensal Population Survey (SUPAS) 2005 downloaded from IPUMS International.¹⁰ Education, current
marital status and current spousal information are available in both censuses. However, only SUPAS
2005 records detailed lifetime marital outcomes such as age at first marriage and number of marriages;
moreover, only women were surveyed on those questions. Men’s marriage age can be proxied using the wife’s
information if both spouses are in their first marriage.
10 Minnesota Population Center. Integrated Public Use Microdata Series, International: Version 7.1 [dataset]. Minneapolis,
MN: IPUMS, 2018. https://doi.org/10.18128/D020.V7.1. I would also like to acknowledge the statistical agency that originally
produced the data: Statistics Indonesia.
Table 2.1 displays the descriptive statistics of individuals’ marital outcomes and characteristics of married
people using the 2010 census. For married couples, on average, husbands are 5 years older than wives, and
this is much larger than the gap of 2 years observed in the US.
Table 2.2 presents the detailed summary statistics of female first marriage age by education. More
educated women marry later. Younger cohorts tend to marry later, but the difference is very small. More
importantly, the difference in first marriage age between women with different education levels is very stable across
old and young cohorts.
Figure 2.1 displays the matching patterns with respect to education for the observed married couples in
which wives were aged 40-50 in 2010. The left panel shows the raw frequencies and the right panel shows
the weighted percentages. For all four education levels, the percentage of people who have spouses with the same
education is the largest, a universal phenomenon that is documented in the literature.
2.5 Main Results
In this section, I present my empirical results on female marriage age and the spousal age gap. I first
show reduced-form event study results on the impact of school construction on female marriage age and the
spousal age gap for the treated female cohorts, separately for sparsely and densely populated regions. I then
provide the 2SLS estimate of how female marriage age and the spousal age gap change with respect to the
female education distribution, using the school construction program as an instrumental variable for the female
education distribution.
Reduced-form results
The empirical specification for the reduced-form results is the same as the previous specification for the
education results.
Figure 2.2 presents the coefficients of the interaction between the birth cohort dummy and school
construction intensity on female age at first marriage (top) and the spousal age gap (bottom) by female age
group in 1974 in sparsely populated regions. None of the coefficients of the interaction between the birth cohort
dummy and school construction intensity is significantly different from zero. This is expected since
female education was not substantially affected by the school construction program in sparsely populated
regions. The results for densely populated regions are presented in Figure 2.3. The top panel shows a
negative effect of one additional primary school being built in the region on female age at first marriage.
Correspondingly, the bottom panel shows a positive effect on the spousal age gap.
Difference-in-differences estimates are presented in Table 2.3. The sample includes women born between
1953 and 1961 who were older than 12 in 1974, and women born between 1965 and 1970. “Post” indicates
women born between 1965 and 1970. Columns (1) and (2) show the estimates for sparsely populated regions.
Neither female age at first marriage nor the spousal age gap was impacted. Columns (3) and (4) present the
estimates for densely populated regions and suggest that one additional school being constructed decreased
the average female age at first marriage by 0.25 years and increased the spousal age gap by 0.075 years.
2SLS estimate
In chapter 1, I showed that:
• Result 1: The program has a positive effect on primary school attainment rate for men and a surprising
negative effect on secondary school attainment rate for women.
• Result 2: In sparsely populated regions, there is a positive effect on primary school attainment and
secondary school attainment rate for men but zero effect for women.
• Result 3: In densely populated regions, for both men and women, there is no effect on primary school
attainment rate, but negative effect on secondary school attainment rate.
In light of the different effects on education in sparsely and densely populated regions, I should expect
different results on marriage market outcomes in sparsely and densely populated regions. Moreover, I should
expect zero effect on female marriage age or spousal age gap in sparsely populated regions since female
education is not impacted.
Since I lack a first stage for female education in sparsely populated regions, in this subsection I
focus on densely populated regions. Consider the following equation that characterizes how own education
and the education distribution may affect an individual’s choice of marriage age and the spousal age gap:
$$y_{ijk} = \alpha_j + \beta_k + D_{ijk}\, c + E_{jk}\, b + \nu_{ijk}$$
where $\alpha_j$ is a region fixed effect and $\beta_k$ is a birth cohort fixed effect. $y_{ijk}$ denotes the marriage age or spousal
age gap of a woman $i$ born in year $k$ in region $j$, $D_{ijk}$ is a dummy variable denoting whether woman $i$
completes secondary school, and $E_{jk}$ denotes the female secondary school attainment rate for birth cohort
$k$ in region $j$.
The coefficient of interest is $b$, indicating the impact of an increase in the proportion of educated women
on female marriage age and the spousal age gap. However, ordinary least-squares (OLS) estimates of this
equation may be biased if there is correlation between $E_{jk}$ and $\nu_{ijk}$ or between $D_{ijk}$ and $\nu_{ijk}$.
Unobserved individual characteristics such as ability or family attitudes could affect both a woman’s educational
attainment and her marriage decisions, leading to a correlation between $D_{ijk}$ and $\nu_{ijk}$. Unobserved region-cohort
specific characteristics, such as the construction of entertainment facilities or the promotion of family planning
policies, could affect the educational attainment and marriage decisions of a few cohorts in the region, leading
to a correlation between $E_{jk}$ and $\nu_{ijk}$.
To address this issue, let us take the average across individuals $i$ given birth cohort $k$ and region $j$. Since the
average of $D_{ijk}$ within a region-cohort cell is the attainment rate $E_{jk}$, this gives:
$$\bar{y}_{jk} = \alpha_j + \beta_k + E_{jk}(b + c) + \bar{\nu}_{jk}$$
The school construction program provides a good instrumental variable for $E_{jk}$, and hence I can obtain a
valid estimate of $(b + c)$. OLS and 2SLS estimates of this specification are shown in Panel A of Table 2.4
for female age at first marriage and the spousal age gap. The IV estimate for female age at first marriage,
although imprecisely estimated, indicates that increasing the share of female secondary graduates by 10
percentage points would increase the average female marriage age by 1.09 years. The IV estimate for the
spousal age gap indicates that increasing the share of female secondary graduates by 10 percentage points
would decrease the average spousal age gap by 0.35 years.
Separating the Effects of Own Education and the Education Distribution. From the previous
specification, we know that:
$$E(y_{ijk} \mid D_{ijk} = 0) = \alpha_j + \beta_k + E_{jk}\, b$$
$$E(y_{ijk} \mid D_{ijk} = 1) = \alpha_j + \beta_k + c + E_{jk}\, b$$
Hence, $c = E(y_{ijk} \mid D_{ijk} = 1) - E(y_{ijk} \mid D_{ijk} = 0)$, which can be empirically estimated as the difference in the
outcome variables conditional on education level. From the summary statistics, we know that the difference
in age at first marriage between female secondary school graduates and female primary school graduates
is 4 years, while the difference in the spousal age gap between secondary school graduates and primary
school graduates is −1.5 years. Comparing this with the previous estimates indicates that, controlling for
a woman’s own education, increasing the percentage of female secondary graduates by 10 percentage points in
her birth cohort would increase her first marriage age by 0.69 years and decrease the spousal age gap by 0.2
years.
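The arithmetic behind this decomposition can be written out explicitly. The short sketch below only restates the numbers quoted in the text (the IV estimates of $(b + c)$ per 10 percentage points and the raw differences by education used for $c$); the helper name `spillover` is hypothetical.

```python
share_change = 0.10  # a 10 percentage point change in the cohort share E_jk

# IV estimates of (b + c), expressed per 10 percentage points (from the text).
total_age, total_gap = 1.09, -0.35
# Differences between secondary and primary graduates, used for c (from the text).
c_age, c_gap = 4.0, -1.5

def spillover(total_per_change, c, change=share_change):
    """Return b * change, where (b + c) * change equals total_per_change."""
    b = total_per_change / change - c
    return b * change

spill_age = spillover(total_age, c_age)  # effect of others' education on marriage age
spill_gap = spillover(total_gap, c_gap)  # effect of others' education on the spousal age gap

print(round(spill_age, 2), round(spill_gap, 2))
```

The own-education component is `c * share_change` in each case (0.4 years and −0.15 years), and the remainder is attributed to the education distribution.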
Interpretation
My findings on the marriage market are consistent with the model when there is complementarity between
higher education of husbands and younger age of wives in the marital surplus. A decrease in the percentage
of female secondary school graduates creates a relative abundance of secondary school graduate men, which
would encourage more women to marry earlier. In the Indonesian setting, the regions with more schools
constructed experienced a smaller increase in female secondary graduates, which created a relative abundance
of male secondary graduates in the marriage market, and this encouraged even more women to marry earlier.
2.6 Discussion
Since the places where people get married are not recorded in the data, the birthplace is used to define a
marriage market. This could create measurement error if many people marry partners from outside their
birthplace. In the sample, the percentage of couples with different birth regencies is 77%. In densely populated
regencies, it is about 73%, and in sparsely populated regencies it is somewhat larger, about 81%.
The previous model is one way to rationalize the empirical result. There could be other potential stories.
For example, suppose women prefer to marry earlier when others do, irrespective of marriage market
conditions. Then, when there are more less-educated women, who tend to marry early, all women marry
earlier. However, this sort of story relies on an ad hoc preference that should itself be rationalized by
some micro foundation. In this regard, I argue that the current model provides one possible
micro foundation for such stories.
As for the model, endogenizing both education choice and marriage age choice would be very rewarding
in future work.
2.7 Conclusion
I have shown that women adjust their marriage age when the average education of women changes in the local
marriage market. Exploiting a massive school construction program in the late 1970s in Indonesia, I analyze
the age at first marriage of the first few cohorts of women who were exposed to the school construction
program. Since the spousal age gap is on average 5 years, these women’s potential husbands’ educations
were minimally impacted. I find that women decrease their marriage age when there is a decrease in the
average secondary school attainment rate of other women in the same cohort. To explain this, I construct
a two-to-one dimensional matching model embedding female choice of marriage age into a two-period OLG
framework and show that if with respect to marital surplus, (1) there exists complementarity between
male education and female education, (2) there exists complementarity between male education and female
youth, then women will decrease their marriage age in response to a decrease in other women’s education.
Intuitively, when the education of other women decreases, they tend to marry less-educated men due to
the complementarity with education, and this creates an abundance of more-educated men. Due to the
complementarity between men’s education and women’s young age, the abundance of more-educated men
induces women to marry earlier.
This study is a step toward further understanding the effect of market conditions on individuals’ marriage
decisions and outcomes. Education expansion policies have been observed around the world. The empirical
finding that female marriage age responds to other women’s education has direct policy implications. When
evaluating education policies with potential market-level impacts, we as researchers should consider both
the direct effect on individuals and the indirect effect via changing market conditions.
2.8 Figures
Figure 2.1: Marriage frequencies (left) and marriage proportions (right) by education for females
[Plot with two panels: "Female first marriage age" (top) and "Spousal age gap" (bottom). x-axis: age in 1974, by 3-year cohort from 24-22 down to 3-2. y-axis: estimated coefficient, ranging from about -.2 to .3.]
Figure 2.2: Effect of the program on female age at first marriage (top) and the spousal age gap (bottom) in
sparsely populated areas
Note: This figure reports estimates of the effect of school construction on female first marriage age (top) and spousal age gap
(bottom) for 3-year cohorts of females in sparsely populated areas. The x-axis reports the age range (in 1974) for each cohort and
the y-axis reports the estimated coefficient, which can be interpreted as the effect of one additional primary school built per 1000
kids on the corresponding marriage outcome in that regency.
[Plot with two panels: "Female first marriage age" (top) and "Spousal age gap" (bottom). x-axis: age in 1974, by 3-year cohort from 24-22 down to 3-2. y-axis: estimated coefficient, ranging from about -.6 to .4.]
Figure 2.3: Effect of the program on female age at first marriage (top) and the spousal age gap (bottom) in
densely populated areas
Note: This figure reports estimates of the effect of school construction on female first marriage age (top) and spousal age gap
(bottom) for 3-year cohorts of females in densely populated areas. The x-axis reports the age range (in 1974) for each cohort and
the y-axis reports the estimated coefficient, which can be interpreted as the effect of one additional primary school built per 1000
kids on the corresponding marriage outcome in that regency.
2.9 Tables
Table 2.1: Summary statistics
                                     Old Cohorts                Young Cohorts
                                     (born 1950-1961)           (born 1962-1974)
2010 Census                          Males        Females       Males        Females
Marriage outcomes
  Never Married (=1)                 0.01         0.02          0.04         0.03
  Separated (=1)                     0.01         0.04          0.02         0.04
Married Couples
  Husband Age minus Wife Age         5.83         4.84          4.45         4.60
  Husband More Educated (=1)         0.26         0.27          0.27         0.25
  Same Education (=1)                0.62         0.63          0.57         0.59
  Wife More Educated (=1)            0.13         0.11          0.16         0.16
  Years of Schooling Gap             0.57         0.68          0.44         0.37
    (Husband’s minus Wife’s)
Observations                         1,275,648    1,231,961     2,148,572    2,128,266
Table 2.2: First marriage age
                        mean    sd       p10     p25     p50     p75     p90     count
Women born between 1950 and 1961:
  Some School           17.91   (4.19)   14.00   15.00   17.00   20.00   23.00   26,550
  Primary School        18.87   (4.21)   15.00   16.00   18.00   21.00   24.00   27,509
  Secondary School      22.19   (4.55)   17.00   19.00   22.00   24.00   27.00    6,971
  University or above   24.39   (4.45)   19.00   21.00   24.00   27.00   30.00      938
Women born between 1962 and 1974:
  Some School           17.75   (3.79)   14.00   15.00   17.00   20.00   22.00   22,872
  Primary School        18.73   (3.74)   15.00   16.00   18.00   20.00   24.00   51,265
  Secondary School      22.58   (4.00)   18.00   20.00   22.00   25.00   28.00   22,624
  University or above   25.43   (3.97)   20.00   23.00   25.00   28.00   30.00    3,080
Table 2.3: Reduced-form effect of school construction on female marriage outcomes
                          Density < Medium           Density > Medium
                          (1)          (2)           (3)          (4)
Panel A: Female first marriage age
Post × Intensity          -0.054       0.024         -0.27**      -0.25
                          (0.076)      (0.078)       (0.12)       (0.16)
Dep. var. mean                  19.231                     19.153
Observations              2664         2664          1365         1365
Clusters                  183          183           91           91
Panel B: Spousal age gap
Post × Intensity          0.030        0.057         0.18***      0.075*
                          (0.025)      (0.040)       (0.044)      (0.040)
Dep. var. mean                  4.838                      4.776
Observations              2745         2745          1365         1365
Clusters                  183          183           91           91
Log-linear trend:         No           Yes           No           Yes
Notes: This table displays the reduced-form effect of school construction on female first marriage age (top) and spousal age
gap (bottom) in sparsely populated regions (left) and densely populated regions (right). Post refers to the first few treated
cohorts that were affected by the school construction program, i.e., those born between 1965 and 1970, while the untreated
cohort was born between 1953 and 1961. Female first marriage data is taken from Indonesian SUPAS 2005. Spousal age
gap data is taken from the Indonesian 2010 Census. All columns include district fixed effect, school year fixed effect, school
year interacted with number of children at 1971. Duflo Controls consist of school year interacted with enrollment rate at
1971 and school year interacted with water sanitization program. Standard errors are clustered at the birthplace district
level. Significance levels: * 10%, ** 5%, *** 1%.
Source: Indonesian SUPAS 2005, Indonesian Census 2010
Table 2.4: Results of female education distribution on female marriage outcomes
Panel A: OLS and IV
                                Female first marriage age      Spousal age gap
                                (1) OLS       (2) IV           (3) OLS       (4) IV
Percentage of females           1.92          10.9*            -2.71***      -3.49*
with secondary degree           (1.37)        (6.53)           (0.38)        (2.03)
First Stage F statistics                      12.929                         12.929
Dep. var. mean                        19.647                         4.550
Observations                    1365          1365             1365          1365
Clusters                        91            91               91            91
Log-linear trend:               Yes           Yes              Yes           Yes

Panel B: First stage and Reduced form
                                Complete         Female age at       Spousal
                                Secondary        first marriage      age gap
Post × Intensity                -0.021***        -0.23*              0.075*
                                (0.0059)         (0.14)              (0.040)
Dep. var. mean                  0.261            19.647              4.550
Observations                    1365             1365                1365
Clusters                        91               91                  91
Adjusted R-squared              0.988            0.763               0.917
Duflo Controls:                 Yes              Yes                 Yes
Log-linear trend:               Yes              Yes                 Yes
Notes: This table displays the OLS and IV estimates of the effect of female education distribution on marriage market
outcomes. All columns include district fixed effect, school year fixed effect, school year interacted with number of children
at 1971. Post refers to the first few treated cohorts that were affected by the school construction program, i.e., those born
between 1965 and 1970, while the untreated cohort was born between 1953 and 1961. Duflo Controls consist of school year
interacted with enrollment rate at 1971 and school year interacted with water sanitization program. Standard errors are
clustered at the birthplace district level. Significance levels: * 10%, ** 5%, *** 1%.
Source: Indonesian SUPAS 2005, Indonesian Census 2010
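As a consistency check on Table 2.4, recall that with a single instrument and the same controls in every equation, the 2SLS coefficient equals the reduced-form coefficient divided by the first-stage coefficient. The Python sketch below applies that ratio to the rounded Panel B coefficients. The helper name `wald_ratio` is hypothetical, and the small gaps relative to the Panel A IV estimates presumably reflect rounding in the displayed coefficients.

```python
# Sketch: just-identified IV estimate as reduced form / first stage,
# using the rounded coefficients displayed in Panel B of Table 2.4.

def wald_ratio(reduced_form, first_stage):
    """Ratio of the reduced-form coefficient to the first-stage coefficient."""
    return reduced_form / first_stage

FIRST_STAGE = -0.021          # Post x Intensity on "Complete Secondary"

iv_marriage_age = wald_ratio(-0.23, FIRST_STAGE)   # compare with 10.9 in Panel A
iv_spousal_gap = wald_ratio(0.075, FIRST_STAGE)    # compare with -3.49 in Panel A

print(round(iv_marriage_age, 2), round(iv_spousal_gap, 2))
```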
2.10 Appendix
A. Proof for Lemma 2.1
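Proof. Denote by $\tilde{u}_i$, $\tilde{v}_j$ the equilibrium utilities individuals get. We know that if woman $i$ and man $j$ match in
equilibrium, then $\tilde{u}_i + \tilde{v}_j = \alpha_{xy} + \gamma_{xy} + \varepsilon_{iy} + \eta_{xj}$.
For woman $i$ of type $x$,
\[
\begin{aligned}
\tilde{u}_i &= \max_{j \in J} \left\{ \alpha_{xy} + \gamma_{xy} + \varepsilon_{iy} + \eta_{xj} - \tilde{v}_j,\ \varepsilon_{i0} \right\} \\
&= \max \left\{ \max_{y \in Y} \left[ \max_{j:\, y_j = y} \left( \alpha_{xy} + \gamma_{xy} + \eta_{xj} - \tilde{v}_j \right) + \varepsilon_{iy} \right],\ \varepsilon_{i0} \right\}
\end{aligned}
\]
Define $U_{xy} = \max_{j:\, y_j = y} (\alpha_{xy} + \gamma_{xy} + \eta_{xj} - \tilde{v}_j)$, $U_{x0} = 0$, then we get:
\[
\tilde{u}_i = \max_{y \in Y_0} \left( U_{xy} + \varepsilon_{iy} \right)
\]
Moreover,
\[
\tilde{u}_i \ge U_{xy} + \varepsilon_{iy}, \quad \forall y \in Y_0
\]
and it achieves equality when the set of women of type $x$ matched with men of type $y$ is nonempty.
With similar notations, define $V_{xy} = \max_{i:\, x_i = x} (\alpha_{xy} + \gamma_{xy} + \varepsilon_{iy} - \tilde{u}_i)$, $V_{0y} = 0$, then:
\[
\tilde{v}_j = \max_{x \in X_0} \left( V_{xy} + \eta_{xj} \right)
\]
\[
\tilde{v}_j \ge V_{xy} + \eta_{xj}, \quad \forall x \in X_0
\]
and it achieves equality when the set of men of type $y$ matched with women of type $x$ is nonempty.
If there exist women of type $x$ matched with men of type $y$,
\[
\tilde{u}_i = U_{xy} + \varepsilon_{iy}, \qquad \tilde{v}_j = V_{xy} + \eta_{xj}
\]
Hence $U_{xy} + V_{xy} = \alpha_{xy} + \gamma_{xy}$.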
B. An important lemma
To prove the propositions, I first establish an important lemma on how the probabilities of singlehood
change in response to shifts in the marginal distributions of types.
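Lemma 2.2. Assume the idiosyncratic tastes follow Gumbel distributions. Assume there are two types on
each side. Denote the female marginal by $n = (x, 1 - x)$, the male marginal by $m = (y, 1 - y)$, and the surplus
matrix by
\[
\Phi = \begin{pmatrix} \Phi_{LL} & \Phi_{LH} \\ \Phi_{HL} & \Phi_{HH} \end{pmatrix}.
\]
Denote the masses of single females (males) in equilibrium by $\mu_{L0}, \mu_{H0}$ ($\mu_{0L}, \mu_{0H}$). Then:
(a)
\[
\frac{\partial \mu_{L0}}{\partial x} > 0, \qquad \frac{\partial \mu_{H0}}{\partial x} < 0
\]
(b) If the marital surplus function is super-modular, i.e., $\Phi_{LL} + \Phi_{HH} > \Phi_{LH} + \Phi_{HL}$, then
(b1)
\[
\frac{\partial \mu_{0L}}{\partial x} > 0 \;\Rightarrow\; \frac{\partial \mu_{0H}}{\partial x} > 0
\]
\[
\frac{\partial \mu_{0H}}{\partial x} < 0 \;\Rightarrow\; \frac{\partial \mu_{0L}}{\partial x} < 0
\]
(b2) There exist some $\delta_x, \bar{\delta}_x, \delta_y, \bar{\delta}_y$ such that if $\delta_x < x < \bar{\delta}_x$, $\delta_y < y < \bar{\delta}_y$, then:
\[
\frac{\partial}{\partial x} \left( \frac{\mu_{0L}}{\mu_{0H}} \right) < 0
\]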
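Proof. Denote $a = \exp(\Phi_{LL}/2)$, $b = \exp(\Phi_{LH}/2)$, $c = \exp(\Phi_{HL}/2)$, $d = \exp(\Phi_{HH}/2)$;
denote $s_{L0} = \sqrt{\mu_{L0}}$, $s_{H0} = \sqrt{\mu_{H0}}$, $s_{0L} = \sqrt{\mu_{0L}}$, $s_{0H} = \sqrt{\mu_{0H}}$;
denote $D_{L0} = \partial s_{L0}/\partial x$, $D_{H0} = \partial s_{H0}/\partial x$, $D_{0L} = \partial s_{0L}/\partial x$, $D_{0H} = \partial s_{0H}/\partial x$.
Then we can rewrite the feasibility constraints with the matching function as:
\[
\begin{aligned}
s_{L0}^2 + s_{L0} s_{0L} a + s_{L0} s_{0H} b &= x \\
s_{H0}^2 + s_{H0} s_{0L} c + s_{H0} s_{0H} d &= 1 - x \\
s_{0L}^2 + s_{L0} s_{0L} a + s_{H0} s_{0L} c &= y \\
s_{0H}^2 + s_{L0} s_{0H} b + s_{H0} s_{0H} d &= 1 - y
\end{aligned}
\]
In the four equations above, taking the derivative with respect to $x$, we get:
\[ (2s_{L0} + a s_{0L} + b s_{0H}) D_{L0} + s_{L0} (a D_{0L} + b D_{0H}) = 1 \tag{2.1} \]
\[ (2s_{H0} + c s_{0L} + d s_{0H}) D_{H0} + s_{H0} (c D_{0L} + d D_{0H}) = -1 \tag{2.2} \]
\[ (2s_{0L} + a s_{L0} + c s_{H0}) D_{0L} + s_{0L} (a D_{L0} + c D_{H0}) = 0 \tag{2.3} \]
\[ (2s_{0H} + b s_{L0} + d s_{H0}) D_{0H} + s_{0H} (b D_{L0} + d D_{H0}) = 0 \tag{2.4} \]
Hence we can express $D_{0L}$, $D_{0H}$ using $D_{L0}$, $D_{H0}$ from Equation 2.3 and Equation 2.4:
\[ D_{0L} = - \frac{s_{0L} (a D_{L0} + c D_{H0})}{2s_{0L} + a s_{L0} + c s_{H0}} \tag{2.5} \]
\[ D_{0H} = - \frac{s_{0H} (b D_{L0} + d D_{H0})}{2s_{0H} + b s_{L0} + d s_{H0}} \tag{2.6} \]
Plugging these into Equation 2.1 and Equation 2.2, we get:
\[
\left( 2s_{L0} + \frac{a s_{0L} (2s_{0L} + c s_{H0})}{2s_{0L} + a s_{L0} + c s_{H0}} + \frac{b s_{0H} (2s_{0H} + d s_{H0})}{2s_{0H} + b s_{L0} + d s_{H0}} \right) D_{L0}
- \left( \frac{a c\, s_{L0} s_{0L}}{2s_{0L} + a s_{L0} + c s_{H0}} + \frac{b d\, s_{L0} s_{0H}}{2s_{0H} + b s_{L0} + d s_{H0}} \right) D_{H0} = 1 \tag{2.7}
\]
\[
\left( 2s_{H0} + \frac{c s_{0L} (2s_{0L} + a s_{L0})}{2s_{0L} + a s_{L0} + c s_{H0}} + \frac{d s_{0H} (2s_{0H} + b s_{L0})}{2s_{0H} + b s_{L0} + d s_{H0}} \right) D_{H0}
- \left( \frac{a c\, s_{H0} s_{0L}}{2s_{0L} + a s_{L0} + c s_{H0}} + \frac{b d\, s_{H0} s_{0H}}{2s_{0H} + b s_{L0} + d s_{H0}} \right) D_{L0} = -1 \tag{2.8}
\]
Adding Equation 2.7 and Equation 2.8, we get:
\[
\left( 2s_{L0} + \frac{2a s_{0L}^2}{2s_{0L} + a s_{L0} + c s_{H0}} + \frac{2b s_{0H}^2}{2s_{0H} + b s_{L0} + d s_{H0}} \right) D_{L0}
+ \left( 2s_{H0} + \frac{2c s_{0L}^2}{2s_{0L} + a s_{L0} + c s_{H0}} + \frac{2d s_{0H}^2}{2s_{0H} + b s_{L0} + d s_{H0}} \right) D_{H0} = 0 \tag{2.9}
\]
Hence $D_{L0}$ and $D_{H0}$ have opposite signs. With Equation 2.7, we know:
\[ D_{L0} > 0, \qquad D_{H0} < 0 \]
This completes the proof for (a).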
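For part (b1) of the lemma, with super-modularity, we know:
\[ a d > b c \]
Since $D_{L0} > 0$:
\[
\frac{a}{c} D_{L0} > \frac{b}{d} D_{L0}
\;\Rightarrow\;
\frac{a}{c} D_{L0} + D_{H0} > \frac{b}{d} D_{L0} + D_{H0}
\]
Hence:
\[ a D_{L0} + c D_{H0} < 0 \;\Rightarrow\; b D_{L0} + d D_{H0} < 0 \]
\[ b D_{L0} + d D_{H0} > 0 \;\Rightarrow\; a D_{L0} + c D_{H0} > 0 \]
Recalling Equation 2.5 and Equation 2.6, we have:
\[
\frac{\partial \mu_{0L}}{\partial x} > 0 \;\Rightarrow\; \frac{\partial \mu_{0H}}{\partial x} > 0
\]
\[
\frac{\partial \mu_{0H}}{\partial x} < 0 \;\Rightarrow\; \frac{\partial \mu_{0L}}{\partial x} < 0
\]
Proof for (b1) is complete.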
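Now let’s prove part (b2):
\[
\frac{\partial}{\partial x} \left( \frac{s_{0L}}{s_{0H}} \right) = \frac{D_{0L} s_{0H} - D_{0H} s_{0L}}{s_{0H}^2}
\]
Using Equation 2.5 and Equation 2.6,
\[
\begin{aligned}
D_{0L} s_{0H} - D_{0H} s_{0L}
&= - \frac{s_{0L} s_{0H} (a D_{L0} + c D_{H0})}{2s_{0L} + a s_{L0} + c s_{H0}} + \frac{s_{0L} s_{0H} (b D_{L0} + d D_{H0})}{2s_{0H} + b s_{L0} + d s_{H0}} \\
&= \frac{s_{0L} s_{0H} \left( [b(2s_{0L} + c s_{H0}) - a(2s_{0H} + d s_{H0})] D_{L0} + [d(2s_{0L} + a s_{L0}) - c(2s_{0H} + b s_{L0})] D_{H0} \right)}{(2s_{0L} + a s_{L0} + c s_{H0})(2s_{0H} + b s_{L0} + d s_{H0})}
\end{aligned}
\]
It has the same sign as:
\[
\begin{aligned}
&[2b s_{0L} - 2a s_{0H} + (bc - ad) s_{H0}] D_{L0} + [2d s_{0L} - 2c s_{0H} + (ad - bc) s_{L0}] D_{H0} \\
&\quad = 2s_{0L} (b D_{L0} + d D_{H0}) - 2s_{0H} (a D_{L0} + c D_{H0}) - (ad - bc)(D_{L0} s_{H0} - D_{H0} s_{L0})
\end{aligned}
\]
We know that $(ad - bc)(D_{L0} s_{H0} - D_{H0} s_{L0}) > 0$, since $ad - bc > 0$, $D_{L0} > 0$, $D_{H0} < 0$.
According to (b1), there are only three cases:
(Case 1): $a D_{L0} + c D_{H0} > 0$, $b D_{L0} + d D_{H0} < 0$; it’s straightforward to show:
\[
\frac{\partial}{\partial x} \left( \frac{s_{0L}}{s_{0H}} \right) < 0
\]
(Case 2): $a D_{L0} + c D_{H0} > 0$, $b D_{L0} + d D_{H0} > 0$;
in this case, from Equation 2.9, we know $s_{L0} D_{L0} + s_{H0} D_{H0} < 0$, hence:
\[
\frac{a}{c} > \frac{b}{d} > \frac{s_{L0}}{s_{H0}}
\]
Since we know $s_{L0}/s_{H0}$ increases with $x$, to satisfy the previous inequality $x$ must be relatively small in
this case.
There exist some $\delta_x$, $\bar{\delta}_y$ such that for $x > \delta_x$, $y < \bar{\delta}_y$,
\[
\frac{\partial}{\partial x} \left( \frac{s_{0L}}{s_{0H}} \right) < 0
\]
(Intuition: we need $x$ to be away from 0 and $y$ to be away from 1 to avoid a large value of $s_{0L}$ and a small value
of $s_{0H}$.)
(Case 3): $a D_{L0} + c D_{H0} < 0$, $b D_{L0} + d D_{H0} < 0$;
in this case, from Equation 2.9, we know $s_{L0} D_{L0} + s_{H0} D_{H0} > 0$, hence:
\[
\frac{s_{L0}}{s_{H0}} > \frac{a}{c} > \frac{b}{d}
\]
$x$ is relatively large in this case. There exist some $\bar{\delta}_x$, $\delta_y$ such that for $x < \bar{\delta}_x$, $y > \delta_y$,
\[
\frac{\partial}{\partial x} \left( \frac{s_{0L}}{s_{0H}} \right) < 0
\]
(Intuition: we need $x$ to be away from 1 and $y$ to be away from 0 to avoid a small value of $s_{0L}$ and a large value
of $s_{0H}$.)
Proof for part (b2) is complete.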
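The signs in Lemma 2.2 can also be checked numerically. The Python sketch below is an illustration only, not part of the proof: it solves the four feasibility constraints by repeatedly solving each quadratic for its own unknown (an iterative scheme chosen here for convenience), then approximates the derivatives with respect to x by central differences. The surplus matrix `PHI`, the evaluation point x = y = 0.5, and all function names are assumptions made for the example.

```python
import math

def solve_singles(x, y, phi, tol=1e-14, max_iter=100_000):
    """Masses of singles solving the four feasibility constraints.

    phi is the 2x2 surplus matrix [[LL, LH], [HL, HH]]. Each pass solves one
    quadratic constraint at a time for its own s, holding the others fixed.
    """
    a, b = math.exp(phi[0][0] / 2), math.exp(phi[0][1] / 2)
    c, d = math.exp(phi[1][0] / 2), math.exp(phi[1][1] / 2)

    def positive_root(coef, mass):
        # Positive root of s^2 + coef * s - mass = 0.
        return (-coef + math.sqrt(coef * coef + 4.0 * mass)) / 2.0

    sL0 = sH0 = s0L = s0H = 0.5  # arbitrary positive starting values
    for _ in range(max_iter):
        old = (sL0, sH0, s0L, s0H)
        sL0 = positive_root(a * s0L + b * s0H, x)
        sH0 = positive_root(c * s0L + d * s0H, 1.0 - x)
        s0L = positive_root(a * sL0 + c * sH0, y)
        s0H = positive_root(b * sL0 + d * sH0, 1.0 - y)
        if max(abs(u - v) for u, v in zip(old, (sL0, sH0, s0L, s0H))) < tol:
            break
    return {"L0": sL0 ** 2, "H0": sH0 ** 2, "0L": s0L ** 2, "0H": s0H ** 2}

def central_difference(f, x, h=1e-5):
    """Two-sided finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

PHI = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical super-modular surplus matrix
Y = 0.5

def ratio_0L_0H(x):
    mu = solve_singles(x, Y, PHI)
    return mu["0L"] / mu["0H"]

d_mu_L0 = central_difference(lambda x: solve_singles(x, Y, PHI)["L0"], 0.5)
d_mu_H0 = central_difference(lambda x: solve_singles(x, Y, PHI)["H0"], 0.5)
d_ratio = central_difference(ratio_0L_0H, 0.5)

print(d_mu_L0 > 0, d_mu_H0 < 0, d_ratio < 0)
```

At this symmetric point the example falls under Case 1 of the proof of (b2), so all three signs should agree with the lemma. Other parameter values would need to respect the bounds on x and y in (b2).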
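Lemma 2.3. An extension of Lemma 2.2:
Suppose there are two types on one side and $K > 2$ types on the other side. Denote the marginals
by $n = (x_1, x_2, \ldots, x_K)$, $m = (y, 1 - y)$, where $\sum_k x_k = r$ and $r$ is a constant. The surplus matrix is:
\[
\Phi = \begin{pmatrix} \Phi_{11} & \Phi_{12} \\ \vdots & \vdots \\ \Phi_{K1} & \Phi_{K2} \end{pmatrix}
\]
Denote the masses of singles in equilibrium by $\mu_{k0}$, $\mu_{01}$, $\mu_{02}$. Then:
(a)
\[
\frac{\partial \mu_{01}}{\partial y} > 0, \qquad \frac{\partial \mu_{02}}{\partial y} < 0
\]
(b) For any two types $k_1$, $k_2$, if we increase $x_{k_1}$ by decreasing $x_{k_2}$, then $\mu_{k_1 0}$ increases and $\mu_{k_2 0}$ decreases.
(c) For any two types $k_1$, $k_2$, if $\Phi_{k_1 1} + \Phi_{k_2 2} > \Phi_{k_2 1} + \Phi_{k_1 2}$, then there exist values $\delta_{x_1}, \bar{\delta}_{x_1}, \delta_{x_2}, \bar{\delta}_{x_2}, \delta_y, \bar{\delta}_y$ with
\[
x_{k_1} \in (\delta_{x_1}, \bar{\delta}_{x_1}), \qquad x_{k_2} \in (\delta_{x_2}, \bar{\delta}_{x_2}), \qquad y \in (\delta_y, \bar{\delta}_y)
\]
such that $\mu_{01}/\mu_{02}$ decreases if we shift some mass from type $k_2$ to type $k_1$, i.e.:
\[
\left. \frac{\mu_{01}}{\mu_{02}} \right|_{(n = (\ldots, x_{k_1} + \Delta, x_{k_2} - \Delta, \ldots),\, m)}
< \left. \frac{\mu_{01}}{\mu_{02}} \right|_{(n = (\ldots, x_{k_1}, x_{k_2}, \ldots),\, m)}, \quad \forall \Delta > 0
\]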
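Proof. The proof is very similar to the proof of Lemma 2.2. WLOG, assume we shift the mass from type
2 to type 1 and denote $x_1 = x$, $x_2 = \gamma - x$, then $n = (x, \gamma - x, x_3, \ldots, x_K)$, and $m = (y, 1 - y)$. Denote
$s_{i0} = \sqrt{\mu_{i0}}$, $s_{0j} = \sqrt{\mu_{0j}}$. First, write down the feasibility conditions:
\[
\begin{aligned}
s_{10}^2 + s_{10} s_{01} \phi_1 + s_{10} s_{02} \tilde{\phi}_1 &= x \\
s_{20}^2 + s_{20} s_{01} \phi_2 + s_{20} s_{02} \tilde{\phi}_2 &= \gamma - x \\
&\;\;\vdots \\
s_{K0}^2 + s_{K0} s_{01} \phi_K + s_{K0} s_{02} \tilde{\phi}_K &= x_K \\
s_{01}^2 + s_{10} s_{01} \phi_1 + s_{20} s_{01} \phi_2 + \cdots + s_{K0} s_{01} \phi_K &= y \\
s_{02}^2 + s_{10} s_{02} \tilde{\phi}_1 + s_{20} s_{02} \tilde{\phi}_2 + \cdots + s_{K0} s_{02} \tilde{\phi}_K &= 1 - y
\end{aligned}
\]
To prove part (a), let’s take the derivative with respect to $y$ for all $K + 2$ equations and denote $D_{i0} = \partial s_{i0}/\partial y$, $D_{0j} = \partial s_{0j}/\partial y$.
\[ D_{10} (2s_{10} + \phi_1 s_{01} + \tilde{\phi}_1 s_{02}) + s_{10} (\phi_1 D_{01} + \tilde{\phi}_1 D_{02}) = 0 \tag{2.10} \]
\[ D_{20} (2s_{20} + \phi_2 s_{01} + \tilde{\phi}_2 s_{02}) + s_{20} (\phi_2 D_{01} + \tilde{\phi}_2 D_{02}) = 0 \tag{2.11} \]
\[ \vdots \]
\[ D_{K0} (2s_{K0} + \phi_K s_{01} + \tilde{\phi}_K s_{02}) + s_{K0} (\phi_K D_{01} + \tilde{\phi}_K D_{02}) = 0 \tag{2.12} \]
\[ D_{01} (2s_{01} + \phi_1 s_{10} + \phi_2 s_{20} + \cdots + \phi_K s_{K0}) + s_{01} (\phi_1 D_{10} + \phi_2 D_{20} + \cdots + \phi_K D_{K0}) = 1 \tag{2.13} \]
\[ D_{02} (2s_{02} + \tilde{\phi}_1 s_{10} + \tilde{\phi}_2 s_{20} + \cdots + \tilde{\phi}_K s_{K0}) + s_{02} (\tilde{\phi}_1 D_{10} + \tilde{\phi}_2 D_{20} + \cdots + \tilde{\phi}_K D_{K0}) = -1 \tag{2.14} \]
We can rearrange Equation 2.10 - Equation 2.12 to express $D_{k0}$ as a function of $D_{01}$, $D_{02}$:
\[ D_{k0} = - \frac{s_{k0} (\phi_k D_{01} + \tilde{\phi}_k D_{02})}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}}, \quad \forall k = 1, 2, \ldots, K \tag{2.15} \]
We can substitute Equation 2.15 into Equation 2.13 and Equation 2.14:
\[
D_{01} \left( 2s_{01} + \sum_{k=1}^{K} \frac{\phi_k s_{k0} (2s_{k0} + \tilde{\phi}_k s_{02})}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}} \right)
- D_{02} \sum_{k=1}^{K} \frac{s_{01} \phi_k s_{k0} \tilde{\phi}_k}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}} = 1 \tag{2.16}
\]
\[
D_{02} \left( 2s_{02} + \sum_{k=1}^{K} \frac{\tilde{\phi}_k s_{k0} (2s_{k0} + \phi_k s_{01})}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}} \right)
- D_{01} \sum_{k=1}^{K} \frac{s_{02} \tilde{\phi}_k s_{k0} \phi_k}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}} = -1 \tag{2.17}
\]
Adding Equation 2.16 and Equation 2.17,
\[
D_{01} \left( 2s_{01} + \sum_{k=1}^{K} \frac{2 \phi_k s_{k0}^2}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}} \right)
+ D_{02} \left( 2s_{02} + \sum_{k=1}^{K} \frac{2 \tilde{\phi}_k s_{k0}^2}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}} \right) = 0 \tag{2.18}
\]
Therefore $D_{01}$ and $D_{02}$ must have opposite signs. Moreover, with Equation 2.16, we know:
\[ D_{01} > 0, \qquad D_{02} < 0 \]
Part (a) is proved.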
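Now let’s prove part (b). Let me abuse the notation $D_{i0}$ and $D_{0j}$. For the proof of part (b),
denote $D_{i0} = \partial s_{i0}/\partial x$, $D_{0j} = \partial s_{0j}/\partial x$. Let’s take the derivative with respect to $x$ for all $K + 2$ feasibility equations:
\[ D_{10} (2s_{10} + \phi_1 s_{01} + \tilde{\phi}_1 s_{02}) + s_{10} (\phi_1 D_{01} + \tilde{\phi}_1 D_{02}) = 1 \tag{2.19} \]
\[ D_{20} (2s_{20} + \phi_2 s_{01} + \tilde{\phi}_2 s_{02}) + s_{20} (\phi_2 D_{01} + \tilde{\phi}_2 D_{02}) = -1 \tag{2.20} \]
\[ D_{30} (2s_{30} + \phi_3 s_{01} + \tilde{\phi}_3 s_{02}) + s_{30} (\phi_3 D_{01} + \tilde{\phi}_3 D_{02}) = 0 \tag{2.21} \]
\[ \vdots \]
\[ D_{K0} (2s_{K0} + \phi_K s_{01} + \tilde{\phi}_K s_{02}) + s_{K0} (\phi_K D_{01} + \tilde{\phi}_K D_{02}) = 0 \tag{2.22} \]
\[ D_{01} (2s_{01} + \phi_1 s_{10} + \phi_2 s_{20} + \cdots + \phi_K s_{K0}) + s_{01} (\phi_1 D_{10} + \phi_2 D_{20} + \cdots + \phi_K D_{K0}) = 0 \tag{2.23} \]
\[ D_{02} (2s_{02} + \tilde{\phi}_1 s_{10} + \tilde{\phi}_2 s_{20} + \cdots + \tilde{\phi}_K s_{K0}) + s_{02} (\tilde{\phi}_1 D_{10} + \tilde{\phi}_2 D_{20} + \cdots + \tilde{\phi}_K D_{K0}) = 0 \tag{2.24} \]
Rearrange Equation 2.21 - Equation 2.22 to express $D_{k0}$ as a function of $D_{01}$, $D_{02}$ for $k > 2$:
\[ D_{k0} = - \frac{s_{k0} (\phi_k D_{01} + \tilde{\phi}_k D_{02})}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}}, \quad \forall k = 3, \ldots, K \tag{2.25} \]
Substitute Equation 2.25 into Equation 2.23 and Equation 2.24:
\[
D_{01} \left( 2s_{01} + \phi_1 s_{10} + \phi_2 s_{20} + \sum_{k=3}^{K} \frac{\phi_k s_{k0} (2s_{k0} + \tilde{\phi}_k s_{02})}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}} \right)
- D_{02} \sum_{k=3}^{K} \frac{s_{01} \phi_k s_{k0} \tilde{\phi}_k}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}}
+ s_{01} (\phi_1 D_{10} + \phi_2 D_{20}) = 0 \tag{2.26}
\]
\[
D_{02} \left( 2s_{02} + \tilde{\phi}_1 s_{10} + \tilde{\phi}_2 s_{20} + \sum_{k=3}^{K} \frac{\tilde{\phi}_k s_{k0} (2s_{k0} + \phi_k s_{01})}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}} \right)
- D_{01} \sum_{k=3}^{K} \frac{s_{02} \tilde{\phi}_k s_{k0} \phi_k}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}}
+ s_{02} (\tilde{\phi}_1 D_{10} + \tilde{\phi}_2 D_{20}) = 0 \tag{2.27}
\]
Then (Equation 2.19 + Equation 2.20) - (Equation 2.26 + Equation 2.27) gives us:
\[
2s_{10} D_{10} + 2s_{20} D_{20}
- D_{01} \left( 2s_{01} + \sum_{k=3}^{K} \frac{2 \phi_k s_{k0}^2}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}} \right)
- D_{02} \left( 2s_{02} + \sum_{k=3}^{K} \frac{2 \tilde{\phi}_k s_{k0}^2}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}} \right) = 0 \tag{2.28}
\]
Moreover, from Equation 2.26 and Equation 2.27, we can express $D_{01}$ and $D_{02}$ as linear combinations of
$D_{10}$ and $D_{20}$. We can also show that the coefficients are all negative. Combining this with Equation 2.28, $D_{10}$
and $D_{20}$ must have opposite signs. Therefore $D_{10} > 0$, $D_{20} < 0$. Part (b) is proved.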
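Now let’s prove part (c). Let’s follow the notation of the proof for part (b): $D_{i0} = \partial s_{i0}/\partial x$, $D_{0j} = \partial s_{0j}/\partial x$.
Rearrange Equation 2.19 - Equation 2.22 to express $D_{k0}$ as a function of $D_{01}$, $D_{02}$:
\[ D_{10} = \frac{1 - s_{10} (\phi_1 D_{01} + \tilde{\phi}_1 D_{02})}{2s_{10} + \phi_1 s_{01} + \tilde{\phi}_1 s_{02}} \tag{2.29} \]
\[ D_{20} = \frac{-1 - s_{20} (\phi_2 D_{01} + \tilde{\phi}_2 D_{02})}{2s_{20} + \phi_2 s_{01} + \tilde{\phi}_2 s_{02}} \tag{2.30} \]
\[ D_{k0} = - \frac{s_{k0} (\phi_k D_{01} + \tilde{\phi}_k D_{02})}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}}, \quad \forall k = 3, \ldots, K \tag{2.31} \]
Substitute Equation 2.29 - Equation 2.31 into Equation 2.23 and Equation 2.24:
\[
D_{01} \left( 2s_{01} + \sum_{k=1}^{K} \frac{\phi_k s_{k0} (2s_{k0} + \tilde{\phi}_k s_{02})}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}} \right)
- D_{02} \sum_{k=1}^{K} \frac{s_{01} \phi_k s_{k0} \tilde{\phi}_k}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}}
= s_{01} \left( \frac{\phi_2}{2s_{20} + \phi_2 s_{01} + \tilde{\phi}_2 s_{02}} - \frac{\phi_1}{2s_{10} + \phi_1 s_{01} + \tilde{\phi}_1 s_{02}} \right) \tag{2.32}
\]
\[
D_{02} \left( 2s_{02} + \sum_{k=1}^{K} \frac{\tilde{\phi}_k s_{k0} (2s_{k0} + \phi_k s_{01})}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}} \right)
- D_{01} \sum_{k=1}^{K} \frac{s_{02} \tilde{\phi}_k s_{k0} \phi_k}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}}
= s_{02} \left( \frac{\tilde{\phi}_2}{2s_{20} + \phi_2 s_{01} + \tilde{\phi}_2 s_{02}} - \frac{\tilde{\phi}_1}{2s_{10} + \phi_1 s_{01} + \tilde{\phi}_1 s_{02}} \right) \tag{2.33}
\]
Denote:
\[
A = 2 \frac{s_{01}}{s_{02}} + \sum_{k=1}^{K} \frac{2 \phi_k s_{k0}\, (s_{k0}/s_{02})}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}}
\]
\[
B = \sum_{k=1}^{K} \frac{\tilde{\phi}_k s_{k0} \phi_k}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}}
\]
\[
C = 2 \frac{s_{02}}{s_{01}} + \sum_{k=1}^{K} \frac{2 \tilde{\phi}_k s_{k0}\, (s_{k0}/s_{01})}{2s_{k0} + \phi_k s_{01} + \tilde{\phi}_k s_{02}}
\]
\[
F = s_{01} \left( \frac{\phi_2}{2s_{20} + \phi_2 s_{01} + \tilde{\phi}_2 s_{02}} - \frac{\phi_1}{2s_{10} + \phi_1 s_{01} + \tilde{\phi}_1 s_{02}} \right) \tag{2.34}
\]
\[
G = s_{02} \left( \frac{\tilde{\phi}_2}{2s_{20} + \phi_2 s_{01} + \tilde{\phi}_2 s_{02}} - \frac{\tilde{\phi}_1}{2s_{10} + \phi_1 s_{01} + \tilde{\phi}_1 s_{02}} \right) \tag{2.35}
\]
We know $A > 0$, $B > 0$, $C > 0$; moreover:
\[ D_{01} s_{02} (A + B) - D_{02} s_{01} B = F \tag{2.36} \]
\[ D_{02} s_{01} (C + B) - D_{01} s_{02} B = G \tag{2.37} \]
Therefore:
\[ D_{01} s_{02} - D_{02} s_{01} < 0 \iff C F - A G < 0 \]
One sufficient condition for $CF - AG < 0$ is that $F < 0$, $G > 0$. One sufficient condition for $F < 0$, $G > 0$
when $\tilde{\phi}_2/\tilde{\phi}_1 > \phi_2/\phi_1$ is that:
\[
\frac{\phi_2}{\phi_1} < \frac{s_{20}}{s_{10}} < \frac{\tilde{\phi}_2}{\tilde{\phi}_1}
\]
since we can rearrange Equation 2.34 and Equation 2.35:
\[
F = s_{01} \left( \frac{1}{\dfrac{2s_{20}}{\phi_2} + s_{01} + \dfrac{\tilde{\phi}_2}{\phi_2} s_{02}} - \frac{1}{\dfrac{2s_{10}}{\phi_1} + s_{01} + \dfrac{\tilde{\phi}_1}{\phi_1} s_{02}} \right) \tag{2.38}
\]
\[
G = s_{02} \left( \frac{\tilde{\phi}_2}{2s_{20} + \phi_2 s_{01} + \tilde{\phi}_2 s_{02}} - \frac{\tilde{\phi}_1}{2s_{10} + \phi_1 s_{01} + \tilde{\phi}_1 s_{02}} \right) \tag{2.39}
\]
Hence there exist $\delta_{x_1}, \bar{\delta}_{x_1}, \delta_{x_2}, \bar{\delta}_{x_2}$ such that when
\[
x \in (\delta_{x_1}, \bar{\delta}_{x_1}), \qquad (\gamma - x) \in (\delta_{x_2}, \bar{\delta}_{x_2})
\]
we have:
\[
\frac{\partial}{\partial x} \left( \frac{s_{01}}{s_{02}} \right) < 0
\]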
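The equivalence between $D_{01} s_{02} - D_{02} s_{01} < 0$ and $CF - AG < 0$ in the proof of part (c) is stated without its intermediate algebra. One way to fill in that step is to treat Equation 2.36 and Equation 2.37 as a linear system in $u = D_{01} s_{02}$ and $v = D_{02} s_{01}$. The symbols $u$, $v$ and $\Delta$ are shorthand introduced only for this sketch.

```latex
\begin{align*}
(A + B)\,u - B\,v &= F, \\
-B\,u + (C + B)\,v &= G, \\
\Delta &:= (A + B)(C + B) - B^2 = AC + AB + BC > 0, \\
u &= \frac{(C + B)F + BG}{\Delta}, \qquad
v = \frac{(A + B)G + BF}{\Delta}, \\
u - v &= \frac{CF - AG}{\Delta}.
\end{align*}
```

Since $A$, $B$, $C$ are all positive, $\Delta > 0$, so $u - v = D_{01} s_{02} - D_{02} s_{01}$ has the same sign as $CF - AG$.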
C. Proof for the Propositions
Proof for Proposition 2.1
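Proof. To prove the existence of a stationary equilibrium, we need to show that there is a solution to the
following equilibrium conditions given $G_f$, $G_m$, $\Phi$, where $G_f = (n_L, n_H)$, $G_m = (m_L, m_H)$:
\[
\mu_{0y} + \sqrt{\mu_{L^1 0}\, \mu_{0y}} \exp\!\left( \frac{\Phi_{L^1 y}}{2} \right) + \sqrt{\mu_{L^2 0}\, \mu_{0y}} \exp\!\left( \frac{\Phi_{L^2 y}}{2} \right) + \sqrt{\mu_{H^1 0}\, \mu_{0y}} \exp\!\left( \frac{\Phi_{H^1 y}}{2} \right) + \sqrt{\mu_{H^2 0}\, \mu_{0y}} \exp\!\left( \frac{\Phi_{H^2 y}}{2} \right) = m_y, \quad \forall y \in \{L, H\} \tag{2.40}
\]
\[
\mu_{e^1 0} + \sqrt{\mu_{e^1 0}\, \mu_{0L}} \exp\!\left( \frac{\Phi_{e^1 L}}{2} \right) + \sqrt{\mu_{e^1 0}\, \mu_{0H}} \exp\!\left( \frac{\Phi_{e^1 H}}{2} \right) = q_e^1 n_e, \quad \forall e \in \{L, H\} \tag{2.41}
\]
\[
\mu_{e^2 0} + \sqrt{\mu_{e^2 0}\, \mu_{0L}} \exp\!\left( \frac{\Phi_{e^2 L}}{2} \right) + \sqrt{\mu_{e^2 0}\, \mu_{0H}} \exp\!\left( \frac{\Phi_{e^2 H}}{2} \right) = q_e^2 n_e, \quad \forall e \in \{L, H\} \tag{2.42}
\]
\[ q_e^1 + q_e^2 = 1, \quad \forall e \in \{L, H\} \tag{2.43} \]
\[ \exp(-u_e^1) = \frac{\mu_{e^1 0}}{q_e^1 n_e}, \quad \forall e \in \{L, H\} \tag{2.44} \]
\[ \exp(-u_e^2) = \frac{\mu_{e^2 0}}{q_e^2 n_e}, \quad \forall e \in \{L, H\} \tag{2.45} \]
\[ u_e^1 = u_e^2, \quad \forall e \in \{L, H\} \tag{2.46} \]
Equation 2.40-Equation 2.42 characterize the equilibrium conditions of marriage market stability for a given
strategy $q$ under the assumption of Gumbel distribution. Equation 2.44-Equation 2.45 characterize the
expected marital utilities of females. Equation 2.43 comes from the property of stationarity. Equation 2.46
guarantees that women are indifferent between choosing to marry at period 1 or period 2.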
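Rearranging Equation 2.41 and Equation 2.42, we get:
\[
\frac{\mu_{e^1 0}}{q_e^1} + \sqrt{\mu_{0L}} \sqrt{\frac{\mu_{e^1 0}}{q_e^1}} \frac{1}{\sqrt{q_e^1}} \exp\!\left( \frac{\Phi_{e^1 L}}{2} \right) + \sqrt{\mu_{0H}} \sqrt{\frac{\mu_{e^1 0}}{q_e^1}} \frac{1}{\sqrt{q_e^1}} \exp\!\left( \frac{\Phi_{e^1 H}}{2} \right) = n_e
\]
\[
\frac{\mu_{e^2 0}}{q_e^2} + \sqrt{\mu_{0L}} \sqrt{\frac{\mu_{e^2 0}}{q_e^2}} \frac{1}{\sqrt{q_e^2}} \exp\!\left( \frac{\Phi_{e^2 L}}{2} \right) + \sqrt{\mu_{0H}} \sqrt{\frac{\mu_{e^2 0}}{q_e^2}} \frac{1}{\sqrt{q_e^2}} \exp\!\left( \frac{\Phi_{e^2 H}}{2} \right) = n_e
\]
Combining with Equation 2.44-Equation 2.46, we get:
\[
\begin{aligned}
\sqrt{\frac{q_e^1}{q_e^2}}
&= \frac{\sqrt{\mu_{0L}} \exp\!\left( \frac{\Phi_{e^1 L}}{2} \right) + \sqrt{\mu_{0H}} \exp\!\left( \frac{\Phi_{e^1 H}}{2} \right)}{\sqrt{\mu_{0L}} \exp\!\left( \frac{\Phi_{e^2 L}}{2} \right) + \sqrt{\mu_{0H}} \exp\!\left( \frac{\Phi_{e^2 H}}{2} \right)} \\
&= \exp\!\left( \frac{\Phi_{e^1 L} - \Phi_{e^2 L}}{2} \right) \frac{\sqrt{\mu_{0L}} + \sqrt{\mu_{0H}} \exp\!\left( \frac{\Phi_{e^1 H} - \Phi_{e^1 L}}{2} \right)}{\sqrt{\mu_{0L}} + \sqrt{\mu_{0H}} \exp\!\left( \frac{\Phi_{e^2 H} - \Phi_{e^2 L}}{2} \right)} \\
&= \exp\!\left( \frac{\Phi_{e^1 L} - \Phi_{e^2 L}}{2} \right) \frac{1 + \sqrt{\frac{\mu_{0H}}{\mu_{0L}}} \exp\!\left( \frac{\Phi_{e^1 H} - \Phi_{e^1 L}}{2} \right)}{1 + \sqrt{\frac{\mu_{0H}}{\mu_{0L}}} \exp\!\left( \frac{\Phi_{e^2 H} - \Phi_{e^2 L}}{2} \right)}
\end{aligned} \tag{2.47}
\]
There are three cases:
1. $\Phi_{e^1 H} - \Phi_{e^1 L} = \Phi_{e^2 H} - \Phi_{e^2 L}$
2. $\Phi_{e^1 H} - \Phi_{e^1 L} > \Phi_{e^2 H} - \Phi_{e^2 L}$
3. $\Phi_{e^1 H} - \Phi_{e^1 L} < \Phi_{e^2 H} - \Phi_{e^2 L}$
Case one: In the first case, we have:
\[
\sqrt{\frac{q_e^1}{q_e^2}} = \exp\!\left( \frac{\Phi_{e^1 L} - \Phi_{e^2 L}}{2} \right) \tag{2.48}
\]
Hence the equilibrium strategy $q$ is pinned down by Equation 2.48 and Equation 2.43. Moreover, we know that
given $q$, Equation 2.40-Equation 2.42 has a unique equilibrium solution according to Decker et al. (2013).
Hence the stationary equilibrium exists in this case and is unique.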
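Case two: In the second case, $q_e^1/q_e^2$ is an increasing function of $\mu_{0H}/\mu_{0L}$ in Equation 2.47. Moreover, according
to Lemma 2.3, we know that when $\Phi_{e^1 H} - \Phi_{e^1 L} > \Phi_{e^2 H} - \Phi_{e^2 L}$, indicating there is a complementarity
between male High type and female marrying at period 1, an increase in $q_e^1/q_e^2$ would lead to a decrease in $\mu_{0H}/\mu_{0L}$
from Equation 2.40-Equation 2.43.
Moreover, from Equation 2.47, we know that:
\[
\sqrt{\frac{q_e^1}{q_e^2}} \to \exp\!\left( \frac{\Phi_{e^1 L} - \Phi_{e^2 L}}{2} \right), \quad \text{as } \frac{\mu_{0H}}{\mu_{0L}} \to 0
\]
\[
\sqrt{\frac{q_e^1}{q_e^2}} \to \exp\!\left( \frac{\Phi_{e^1 H} - \Phi_{e^2 H}}{2} \right), \quad \text{as } \frac{\mu_{0H}}{\mu_{0L}} \to +\infty
\]
While from Equation 2.40 - Equation 2.43, we know $\mu_{0H}/\mu_{0L}$ is bounded by finite positive numbers when
\[
\exp\!\left( \frac{\Phi_{e^1 L} - \Phi_{e^2 L}}{2} \right) \le \sqrt{\frac{q_e^1}{q_e^2}} \le \exp\!\left( \frac{\Phi_{e^1 H} - \Phi_{e^2 H}}{2} \right).
\]
Hence equilibrium exists and is unique.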
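The monotonicity and the two limits used in case two can be illustrated numerically. The Python sketch below evaluates the right-hand side of Equation 2.47 as a function of the ratio of single High-type to single Low-type men. The surplus values are hypothetical, chosen only so that the case-two inequality holds, and the function name `sqrt_q_ratio` is not from the original.

```python
import math

def sqrt_q_ratio(r, phi_1L, phi_1H, phi_2L, phi_2H):
    """Right-hand side of Equation 2.47 as a function of r = mu_0H / mu_0L."""
    num = 1.0 + math.sqrt(r) * math.exp((phi_1H - phi_1L) / 2.0)
    den = 1.0 + math.sqrt(r) * math.exp((phi_2H - phi_2L) / 2.0)
    return math.exp((phi_1L - phi_2L) / 2.0) * num / den

# Hypothetical surpluses with phi_1H - phi_1L > phi_2H - phi_2L (case two).
PHI = dict(phi_1L=1.0, phi_1H=2.0, phi_2L=0.8, phi_2H=1.2)

grid = [10.0 ** k for k in range(-8, 9)]          # r from 1e-8 to 1e8
values = [sqrt_q_ratio(r, **PHI) for r in grid]

lower = math.exp((PHI["phi_1L"] - PHI["phi_2L"]) / 2.0)  # limit as r -> 0
upper = math.exp((PHI["phi_1H"] - PHI["phi_2H"]) / 2.0)  # limit as r -> infinity

is_increasing = all(v1 < v2 for v1, v2 in zip(values, values[1:]))
print(is_increasing, values[0] - lower, upper - values[-1])
```

In case three the same function is decreasing in r, so the two limits swap roles as upper and lower bounds.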

Case three: In the third case, $q_e^1/q_e^2$ is a decreasing function of $\mu_{0H}/\mu_{0L}$ in Equation 2.47. Moreover, according
to Lemma 2.3, we know that when $\Phi_{e^1 H} - \Phi_{e^1 L} < \Phi_{e^2 H} - \Phi_{e^2 L}$, indicating there is a complementarity
between male L type and female marrying at period 1, an increase in $q_e^1/q_e^2$ would lead to an increase in $\mu_{0H}/\mu_{0L}$
from Equation 2.40 - Equation 2.43. Applying the same logic as in case two, equilibrium exists and is unique.
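Moreover, we know that the equilibrium strategy satisfies:
\[
\min(\Phi_{L^1 L} - \Phi_{L^2 L},\ \Phi_{L^1 H} - \Phi_{L^2 H}) \le \ln\!\left( \frac{q_L^1}{q_L^2} \right) \le \max(\Phi_{L^1 L} - \Phi_{L^2 L},\ \Phi_{L^1 H} - \Phi_{L^2 H})
\]
\[
\min(\Phi_{H^1 L} - \Phi_{H^2 L},\ \Phi_{H^1 H} - \Phi_{H^2 H}) \le \ln\!\left( \frac{q_H^1}{q_H^2} \right) \le \max(\Phi_{H^1 L} - \Phi_{H^2 L},\ \Phi_{H^1 H} - \Phi_{H^2 H})
\]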
Proof for Proposition 2.2
Proof. This is our first case in the previous proof of Proposition 2.1. Hence from Equation 2.47, we know:
\[
\frac{q_e^2}{q_e^1} = \exp(\Phi_{e^2 L} - \Phi_{e^1 L})
\]
With $q_e^1 + q_e^2 = 1$, we have:
\[
q_e^2 = \frac{\exp(\Phi_{e^2 L})}{\exp(\Phi_{e^2 L}) + \exp(\Phi_{e^1 L})}, \qquad
q_e^1 = \frac{\exp(\Phi_{e^1 L})}{\exp(\Phi_{e^2 L}) + \exp(\Phi_{e^1 L})}
\]
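The closed form above has the familiar logit shape, which can be checked against the case-one condition on the square root of the strategy ratio. The Python sketch below does this for hypothetical surplus values. The function name `period_shares` and the numbers 1.0 and 0.4 are assumptions for illustration.

```python
import math

def period_shares(phi_1L, phi_2L):
    """Shares (q1, q2) of women marrying in period 1 and period 2 (case one)."""
    w1, w2 = math.exp(phi_1L), math.exp(phi_2L)
    total = w1 + w2
    return w1 / total, w2 / total

PHI_1L, PHI_2L = 1.0, 0.4          # hypothetical surplus values
q1, q2 = period_shares(PHI_1L, PHI_2L)

# The shares sum to one and reproduce sqrt(q1/q2) = exp((phi_1L - phi_2L)/2).
lhs = math.sqrt(q1 / q2)
rhs = math.exp((PHI_1L - PHI_2L) / 2.0)
print(q1 + q2, lhs, rhs)
```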
Proof for Proposition 2.3 and Proposition 2.4
Proof. From the proof of Proposition 2.1, we know that the equilibrium strategy is pinned down by both
Equation 2.47 and Equation 2.40-Equation 2.43. Hence how equilibrium strategies change depends on whether
$\Phi_{e^1 H} - \Phi_{e^2 H} > \Phi_{e^1 L} - \Phi_{e^2 L}$ or $\Phi_{e^1 H} - \Phi_{e^2 H} < \Phi_{e^1 L} - \Phi_{e^2 L}$, and on how $\mu_{0H}/\mu_{0L}$ changes in equilibrium.
Let’s first prove Proposition 2.3. According to Lemma 2.3 result (a), an increase in $m_H$ would increase
$\mu_{0H}$ and decrease $\mu_{0L}$, which increases $\mu_{0H}/\mu_{0L}$ given any strategy $q$. Hence an increase in $m_H$ would
• increase $q_e^1$, if $\Phi_{e^1 H} - \Phi_{e^2 H} > \Phi_{e^1 L} - \Phi_{e^2 L}$
• decrease $q_e^1$, if $\Phi_{e^1 H} - \Phi_{e^2 H} < \Phi_{e^1 L} - \Phi_{e^2 L}$
Then let’s prove Proposition 2.4. According to Lemma 2.3(b), an increase in $n_H$ would decrease $\mu_{0H}/\mu_{0L}$ given
any strategy $q$ if the following condition holds:
\[
\exp\!\left( \frac{\Phi_{e^1 H} - \Phi_{e^2 H}}{2} \right) \le \sqrt{\frac{\mu_{e^1 0}}{\mu_{e^2 0}}} \le \exp\!\left( \frac{\Phi_{e^1 L} - \Phi_{e^2 L}}{2} \right)
\]
Moreover, we know that:
\[
\frac{\mu_{e^1 0}}{\mu_{e^2 0}} = \frac{q_e^1}{q_e^2}
\]
and
\[
\exp\!\left( \frac{\Phi_{e^1 H} - \Phi_{e^2 H}}{2} \right) \le \sqrt{\frac{q_e^1}{q_e^2}} \le \exp\!\left( \frac{\Phi_{e^1 L} - \Phi_{e^2 L}}{2} \right)
\]
from Equation 2.47. Therefore the condition always holds in the neighborhood of the equilibrium. Hence
an increase in $n_H$ would
• decrease $q_e^1$, if $\Phi_{e^1 H} - \Phi_{e^2 H} > \Phi_{e^1 L} - \Phi_{e^2 L}$
• increase $q_e^1$, if $\Phi_{e^1 H} - \Phi_{e^2 H} < \Phi_{e^1 L} - \Phi_{e^2 L}$
Chapter 3
Multidimensional Matching: Hukou status in the Marriage Market
3.1 Introduction
Mating patterns affect social equality. For example, the empirical pattern that people marry partners with
similar socioeconomic status (SES), called positive assortative matching (PAM), increases social inequality
(Greenwood et al., 2014). In practice, people consider multiple characteristics when seeking partners. How
people match along all these dimensions affects social structure. In China, one particularly important
characteristic is hukou status, which classifies people into either the urban or the rural population. Urban
hukou holders enjoy better amenities, including education, health care and working opportunities, than rural
hukou holders. Understanding how people match along hukou status is a first stepping stone toward
understanding how this system affects individuals’ welfare. Moreover, multiple hukou reforms have taken place
in the past two decades to increase internal migration and to decrease social inequality. A matching framework
is essential to analyze how these reforms jointly affect people’s marital choices and labor migration in order
to have a more complete evaluation of their effect on social welfare.
This paper adopts the recently developed two-dimensional matching model under Transferable Utility
(TU) (Chiappori et al., 2017). In the model, each agent differs in two attributes: a continuous one that
indicates SES and a discrete one that indicates hukou status, either agriculture (rural) or non-agriculture
(urban). A key assumption of the TU model is that there exists a marital surplus that a couple jointly produces
and then bargains over how to divide. The marital surplus depends on the two characteristics.
The higher either the husband’s SES or the wife’s SES is, the higher the surplus. However, wives’ and
husbands’ hukou statuses play asymmetric roles in the surplus. I assume that a husband’s urban hukou status
is more valuable than a wife’s, on the rationale that in a patrilocal society a wife is much more likely
to move to her husband’s location than the other way around.
I first analyze the case where rural women, urban women, rural men and urban men share identical
SES distributions. I then analyze the case where the four groups have SES distributions suggested by the
empirical data, i.e., the urban population has higher SES than the rural population and the rural population is
three times as large as the urban population. The predictions have very similar qualitative properties. First,
mixed couples are less frequent than non-mixed couples. Second, PAM exists in each couple category by hukou
status. Third, for all four population categories (rural women, urban women, rural men and urban men), those
married to urban spouses have higher SES than those married to rural spouses. Fourth, when two urban men with
the same SES marry an urban woman and a rural woman respectively, the rural woman has higher SES than the
urban woman. When two urban women with the same SES marry an urban man and a rural man respectively,
the rural man has higher SES than the urban man. When two rural men with the same SES marry an urban
woman and a rural woman respectively, the rural woman and the urban woman have the same SES.
I then use the China 2000 0.095% sample Census to test the predictions. Education (years of schooling)
is used as a proxy for a person’s SES. I show that first, the frequency of mixed couples is much lower than
that predicted by random matching. Second, among all four marriage categories, own education is always
positively correlated with spousal education. Third, using both simple summary statistics and regression
analysis of explaining a person’s own education using spousal characteristics, a rural spouse is negatively
correlated with own education. Fourth, the correlation between a person’s own education and a rural hukou
conditional on spousal education is less negative than the unconditional correlation between a person’s own
education and a rural hukou status.
The theoretical analysis of matching with TU dates back to Shapley and Shubik (1971) and Becker
(1973). This paper contributes to a set of papers that consider multidimensional matching instead of
one-dimensional matching, which has received much more attention due to its tractability. Chiappori
et al. (2017) consider a bidimensional matching model with SES and smoking status as the discrete
variable. Ahn (2018) considers SES and a person’s nationality as the discrete variable to analyze cross-
border marriages. Low (2017) analyzes a one-to-two dimensional matching model in which fertility is the
other continuous dimension for women.
From a more applied perspective, this paper also contributes to a large literature that studies how the
hukou system affects rural to urban migration and migrant workers’ welfare. Research on its impact on
the marriage market is limited. One exception is Han et al. (2015), who analyze the policy change in 1998
under which new-born babies could freely choose their hukou status from either father or mother instead of
inheriting it from the mother automatically. They find that this policy change increases inter-provincial
marriage, benefits urban men but hurts urban women.
More broadly, studying the hukou system in China also allows us to link the labor market with the marriage
market and to study how migration decisions are made. Dupuy et al. (2014) build a marriage matching model of two
locations, each with a distinct labor market and marriage market, to analyse people’s decisions of migration
to work and migration to wed. Empirically, there are also many papers that document how the marriage market
influences individuals’ migration choices. Edlund (2005) argues that the attractiveness of high-income men
in urban areas contributes to the empirical stylized fact that young women outnumber young men in urban
areas, though a high-skilled labor market in urban areas may predict the opposite under the assumption that
there are more skilled men than women. Weiss et al. (2013) show that young women migrate from mainland
China to Hong Kong for better marriage prospects after the lifting of the migration ban, which causes more women
to emigrate from Hong Kong. Using Danish data, Gautier et al. (2010) argue that cities serve the role of
providing a dense marriage market, which influences singles’ and couples’ location choices.
The model is presented in Section 3.2, and Section 3.3 presents the empirical results. Section 3.4
concludes and discusses possible future work.
3.2 Model
The basic framework
Populations and surplus
There are two populations: women’s and men’s, denoted by X and Y. We normalize the size of female
population to 1, and denote the size of male population as r. Both men and women differ in two dimensions.
First, they are characterized by a continuous attribute: their socioeconomic status which is a proxy for
income, education, prestige and so on. Second, agents differ in terms of hukou status; it can be either
Agriculture (A) or Non-agriculture (N). A woman (man) thus is formally characterized by a pair (x, X)
( (y, Y )) where x, y is the individual’s continuous socioeconomic index, and X, Y defines the individual’s
hukou status.
For simplicity, assume that the continuous index x for women with agriculture hukou, denoted by xA, is
uniformly distributed over the interval [0, 1]; the continuous index x for women with non-agriculture hukou,
denoted by xN, is uniformly distributed over the interval [a, 1 + a], where a represents the average difference
between the agriculture population and the non-agriculture population. Similarly, for men, yA ∼ U[0, 1] and
yN ∼ U[b, 1 + b]. Assume a ≥ 0, b ≥ 0. Assume that the ratio of the agriculture-hukou population to the
non-agriculture-hukou population is α, the same for both men and women.
I then consider the workhorse model of the marriage market: a frictionless matching model with
transferable utility (TU) as in Becker (1973) and Shapley and Shubik (1971). The key assumption in this model
is that for any couple, there exists a marital surplus that the couple can decide how to divide upon marriage.
In this model, the marital surplus depends on both the socioeconomic status and the hukou status of each
partner. Moreover, given that (1) in a patrilocal society, wives most likely move to their husbands' place,
and (2) couples enjoy larger benefits in an urban place than in a rural place, I assume
that the surplus function Σ has the form:
\[
\Sigma((x, X), (y, Y)) =
\begin{cases}
f(x, y), & \text{if } X = N,\ Y = N \\
\lambda f(x, y), & \text{if } X = A,\ Y = N \\
\mu f(x, y), & \text{if } Y = A
\end{cases}
\]
where the function f is strictly increasing and supermodular, and satisfies f (0, 0) = 0. Here, λ < 1 represents
the cost of changing wife’s hukou status upon marriage, µ < 1 indicates a lower welfare of living in a rural
area compared to urban area. I furthermore assume that λ > µ because of the patrilocality. Notice that the
cost relative to rural hukou is assumed to be multiplicative rather than additive, because it is very likely that
people with higher SES value the additional benefits of urban hukou more, for example, children’s education
and health care. Moreover, I do not allow a fixed cost of migration in the surplus. This is because compared
to the role that hukou status plays in the marital surplus, the migration cost is trivial. What keeps rural
people from migrating to urban areas is the difficulty of living in urban areas without a local hukou rather
than the migration cost (either physical or cultural).
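The surplus specification above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `f` and `surplus` are hypothetical, f(x, y) = xy is the quadratic example used later in the chapter, and the default values λ = 0.95 and µ = 0.6 are borrowed from the numerical example in Figure 3.5.

```python
def f(x, y):
    # Assumed example: f(x, y) = x * y, the quadratic form used later in the chapter.
    return x * y


def surplus(x, X, y, Y, lam=0.95, mu=0.6):
    """Marital surplus for a woman (x, X) and a man (y, Y); hukou is "A" or "N"."""
    if not (mu < lam < 1):
        raise ValueError("the model assumes mu < lam < 1")
    if Y == "A":
        # Rural husband: the surplus is scaled by mu, whatever the wife's hukou.
        return mu * f(x, y)
    if X == "A":
        # Rural wife, urban husband: scaled by lam (cost of converting her hukou).
        return lam * f(x, y)
    # Both spouses urban: full surplus.
    return f(x, y)
```

For the same pair of SES values, the three cases are ranked urban-urban, then rural wife with urban husband, then any couple with a rural husband, which is the ordering implied by µ < λ < 1.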
A stable matching
A matching is defined as a measure µ on the set ([0, 1] × {A, N}) × ([b, 1 + b] × {A, N}) and four functions
(uA(x), uN(x), vA(y), vN(y)) that capture the utilities of individuals of different types. The only constraint
on the measure µ is that its marginals should be equal to the initial male and female distributions. A matching
is stable if it satisfies individual rationality and the no blocking pair condition. Individual rationality requires
that no matched individual would be better off remaining single. The no blocking pair condition requires
that no two individuals would prefer being matched together to their current situation. Hence stability
requires that for any (x, X), (y, Y) we have
\[
u_X(x) + v_Y(y) \ge
\begin{cases}
f(x, y), & \text{if } X = N,\ Y = N \\
\lambda f(x, y), & \text{if } X = A,\ Y = N \\
\mu f(x, y), & \text{if } Y = A
\end{cases}
\]
where the equality is satisfied on the support of the matching measure µ, i.e., where we observe a positive
probability of matching for that couple type.
Existence of a stable matching is guaranteed by the property of a TU model. Stability in a TU framework
is equivalent to the maximization of aggregate surplus over all possible assignments; therefore the problem
boils down to the existence of a solution to a simple linear programming problem, for which one can readily
check that the standard conditions are satisfied.
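The equivalence between stability and aggregate surplus maximization can be illustrated on a tiny discrete market. The sketch below is not the continuous model of this chapter: it assumes equal numbers of women and men, everyone marries, f(x, y) = xy, and a brute-force search over all assignments. The names `surplus` and `best_assignment` and the four-by-four example market are hypothetical.

```python
from itertools import permutations


def surplus(woman, man, lam, mu):
    """Surplus of a couple; each agent is a pair (SES, hukou) with hukou "A" or "N"."""
    (x, X), (y, Y) = woman, man
    scale = mu if Y == "A" else (lam if X == "A" else 1.0)
    return scale * x * y  # assumes f(x, y) = x * y


def best_assignment(women, men, lam, mu):
    """Return (total surplus, tuple giving the index of the man matched to each woman)."""
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(len(men))):
        total = sum(surplus(w, men[j], lam, mu) for w, j in zip(women, perm))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_total, best_perm


# Hypothetical market: one high and one low SES agent of each sex and hukou type.
women = [(0.9, "A"), (0.3, "A"), (0.9, "N"), (0.3, "N")]
men = [(0.9, "A"), (0.3, "A"), (0.9, "N"), (0.3, "N")]
total, perm = best_assignment(women, men, lam=0.99, mu=0.5)
```

Brute force is only feasible for a handful of agents; it is used here because it makes the surplus-maximization criterion explicit without relying on any solver.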
As for pureness, as defined in Chiappori et al. (2017), a matching is pure if almost all women with the
same attributes (x, X) are matched with probability one to exactly one type of agent (y, Y) = ρ(x, X),
and the same applies to men. In the one-dimensional case, where supermodularity yields positive assortative
matching, the stable matching is one-to-one and hence pure. However, in the current
model, we would need the "twisted" condition (Chiappori et al., 2010): for almost all (x0, X), the partial
derivatives of the surplus Σ((x, X), (y, Y)) with respect to x at two points ((x0, X), (y1, Y1)) and
((x0, X), (y2, Y2)) are equal to each other if and only if (y1, Y1) = (y2, Y2). This property is unlikely to hold
in the current setting. If a woman with index x0 has an agriculture hukou and marries an urban man with
index y1, the partial derivative of the surplus with respect to x is $\lambda \, \partial f(x_0, y_1)/\partial x$.
If she is matched with a rural man with index y2, the partial derivative is $\partial f(x_0, y_2)/\partial x$.
We may still have:
\[
\lambda \frac{\partial f(x_0, y_1)}{\partial x} = \frac{\partial f(x_0, y_2)}{\partial x}
\]
with y1 > y2 since λ < 1. Therefore, the stable matching may not be pure in the current setting.
Stable matching equilibrium: general properties
The stable matching depends on the following parameters: sex ratio r, the share of population with agricul-
ture hukou, α, the average difference in the socioeconomic status between people with Agriculture hukou and
Non-agriculture hukou, a and b, and the surplus function parameters, λ, µ, f (x, y). Let me first document
some general properties.
Let me denote pA(x) as the probability of marrying a rural husband for a rural woman with SES x. I
define similarly pN(x′), qA(y), qN(y′) as the probability of marrying a rural partner for an urban woman, a
rural man and an urban man, respectively.¹
Proposition 3.1. In any stable matching, consider two couples with same hukou status, ((x, X), (y, Y )) and
((x′ , X), (y ′ , Y )). Then x ≥ x′ if and only if y ≥ y ′ .
Proposition 3.1 says that couples are positively matched within each hukou category.
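Proof. Suppose not. Then there exist two couples with the same hukou status, (x, y) and (x′, y′), such that
x > x′ and y < y′. Exchanging partners between these two couples yields a higher total surplus because f
is supermodular:
\[
f(x, y') + f(x', y) > f(x, y) + f(x', y'),
\]
and the common hukou multiplier (1, λ or µ) scales both sides equally. This contradicts the fact that a
stable matching maximizes the total surplus.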
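The partner-exchange argument in the proof can be checked numerically. The sketch below assumes f(x, y) = xy; the function name `swap_gain` and the numbers used are hypothetical. With this f, the gain from re-matching equals scale · (x_hi − x_lo)(y_hi − y_lo), which is positive whenever the two couples are negatively sorted.

```python
def f(x, y):
    # Assumed supermodular example: f(x, y) = x * y.
    return x * y


def swap_gain(x_hi, x_lo, y_lo, y_hi, scale=1.0):
    """Surplus gained by re-matching (x_hi, y_lo), (x_lo, y_hi) into (x_hi, y_hi), (x_lo, y_lo).

    `scale` is the common hukou multiplier (1, lambda or mu) of the couple category.
    """
    before = scale * (f(x_hi, y_lo) + f(x_lo, y_hi))
    after = scale * (f(x_hi, y_hi) + f(x_lo, y_lo))
    return after - before


gain = swap_gain(x_hi=0.8, x_lo=0.2, y_lo=0.3, y_hi=0.7, scale=0.6)
```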
Proposition 3.2. In a stable matching, suppose there exists an open set Ox ⊂ X such that for any x ∈ Ox, a
rural woman x marries a rural man y or an urban man y′ with positive probability. Then the rural man y
marries a rural woman with probability one. Moreover, y > y′.
Similarly, suppose there exists an open set Oy ⊂ Y such that for any y ∈ Oy, a rural man y marries a rural
woman x or an urban woman x′ with positive probability. Then the rural woman x marries a rural man
with probability one. Moreover, x = x′.
1 Note that from now on, for simplicity of notation, I use "rural" to indicate people with Agriculture hukou
and "urban" to indicate people with Non-Agriculture hukou.
Proposition 3.2 states that certain types of randomization can’t exist together.
Proof. Suppose a rural woman x marries a rural man y or an urban man y′ with positive probability. I
first show that y > y′.
Denote by u(x) the rural woman's utility. By stability,
\[
u(x) = \max_{s} \left[ \mu f(x, s) - v_A(s) \right] = \max_{s'} \left[ \lambda f(x, s') - v_N(s') \right]
\]
where vA(s) is the utility of a rural man with SES s, and vN(s′) is the utility of an urban man with SES s′.
The maxima are achieved at s = y and s′ = y′ respectively. By the envelope theorem,
\[
u'(x) = \mu \frac{\partial f(x, y)}{\partial x} = \lambda \frac{\partial f(x, y')}{\partial x}
\]
Since µ < λ and ∂f(x, y)/∂x is a non-decreasing function of y due to supermodularity, we have y > y′.
The rest of the proof is by contradiction. Suppose that the rural man y also marries an urban woman x′
with positive probability; then we know that x′ = x. We then have a couple ((x, A), (y′, N)) and another couple
((x′, N), (y, A)), where y > y′ and x = x′. However,
\[
\begin{aligned}
\Sigma((x, A), (y, A)) + \Sigma((x', N), (y', N)) &= \mu f(x, y) + f(x', y') \\
&> \mu f(x', y) + \lambda f(x, y') \\
&= \Sigma((x, A), (y', N)) + \Sigma((x', N), (y, A))
\end{aligned}
\]
where the strict inequality uses x = x′ and λ < 1. This violates the property that a stable matching
maximizes the total social surplus. The proof of the second statement is similar.
Case one: Identical distribution
Let us first solve the case where SES and hukou status are independent, i.e., a = b = 0. All four types
(rural women, urban women, rural men and urban men) have the same SES distribution: U(0, 1). Let me
further assume that r = 1, α = 1: there are equal numbers of men and women, and equal numbers across
the two hukou types. Moreover, assume that λ is close to 1 and µ ≪ λ. The main result is as follows:
Proposition 3.3. There exists a function δ(λ) such that when µ < δ(λ) and λ < 1, in a stable matching
outcome there exist thresholds $\tilde{x}'$, $\tilde{y}'$, $\tilde{x}$ such that:
• Very top urban women and very top urban men marry each other:
$\forall x' \ge \tilde{x}'$, $p_N(x') = 0$; $\forall y' \ge \tilde{y}'$, $q_N(y') = 0$
• Top rural women only marry urban men:
$\forall x \ge \tilde{x}$, $p_A(x) = 0$
Proof is in the appendix. Figure 3.1 shows an example of the matching patterns when f (x, y) = xy. The
four colors indicate the four matching categories with respect to hukou. Blue indicates rural-women-rural-
men, purple indicates urban-women-urban-men, yellow indicates rural-women-urban-men and pink indicates
urban-women-rural-men.
Since λ = 0.99 < 1, the very top urban men (above $\tilde{y}'_1$) will only marry urban women (above $\tilde{x}'$).
However, λ is large enough to induce other top urban men ($y' \in (\tilde{y}'_2, \tilde{y}'_1)$) to marry both urban
women ($x' \in (\tilde{x}'_1, \tilde{x}_2)$) and rural women ($x \in (\tilde{x}, 1)$). Rural women with
$x \in (0, \tilde{x})$ marry up to rural men. Because
some urban men marry rural women, some urban women ($x' \in (0, \tilde{x}')$) have to marry rural men. They either
marry up to a rural man or marry down to an urban man with positive probability.
The cutoffs depend on µ and λ. In particular, in the case where f (x, y) = xy:
• When λ increases, $\tilde{y}'_1$ increases; $\tilde{x}$, $\tilde{x}'$, $\tilde{y}$, $\tilde{y}'_2$ decrease;
• When µ increases, $\tilde{x}$, $\tilde{x}'$, $\tilde{y}'_2$ increase; $\tilde{y}$ decreases; $\tilde{y}'_1$ does not change.
In the extreme case where λ = 1, $\tilde{y}'_1 = 1$. Moreover, in this case, urban women and rural women cannot be
differentiated in the marriage market.
Fix µ; there exists a threshold $\underline{\lambda}$ such that when $\lambda \le \underline{\lambda}$, the matching is pure, i.e.,
rural people marry rural people and urban people marry urban people. Fix λ; there exists a threshold $\bar{\mu}$
such that when $\mu \ge \bar{\mu}$, the matching is also
pure. This pattern is illustrated in Figure 3.2. Intuitively, when the cost of changing hukou status is too
high (i.e., λ is too small), we should expect no cross-hukou-type marriage. When the penalty of Agriculture
hukou is very small relative to the cost of changing hukou status (i.e., λ − µ is very small), we should also
expect no cross-hukou-type marriage.
Case two: Different distributions
Empirically, urban populations on average have higher SES than rural populations. Moreover, the rural
population is much larger than the urban population in China. To match these empirical observations, I
assume that α = 3, since in the China 2000 Census 76% of the population have an agriculture hukou and 24%
have a non-agriculture hukou. Further assume that a = b = 0.5, hence x ∼ U(0, 1), y ∼ U(0, 1),
x′ ∼ U(0.5, 1.5), y′ ∼ U(0.5, 1.5). This corresponds to the empirical statistic that, on average, the urban
populations have 50% more years of schooling than the rural populations. The main result is very similar to
Proposition 3.3, where SES and hukou status are independent and the urban and rural populations are of the
same size. Figure 3.3 shows an example of the matching patterns when f(x, y) = xy. Because of the difference
in average SES, top urban men and top urban women marry each other, and bottom rural men and bottom rural
women marry each other. The only difference is that, for urban women, the lowest type marries both urban and
rural husbands in the first case but only rural husbands in the second case. This is
because the smallest SES an urban woman may have is b > 0, instead of 0 as in the first case, and it is also
specific to the surplus function f(x, y) = xy.
A quadratic example
As stated above, the exact form of the stable matching depends on the distributions and the surplus function.
To give a better picture of the matching, the utilities and the comparative statics, let me now consider a
simple example in which the gain generated from marriage is quadratic. Specifically, I assume that:
\[
f(x, y) = xy
\]
which is the simplest form that captures PAM.
With this quadratic specification and the previous assumptions, (1) r = 1, α ≥ 1, a = b ≥ 0, λ < 1,
µ < 1, it is possible to solve the matching model completely in closed form.
With the following additional regularity conditions, (2) $\mu < \lambda^2$,
$b \le \frac{\mu \left(\lambda - \frac{\mu}{\lambda}\right)(1+\alpha)}{(1-\mu)(\alpha+\lambda)}$, the main result is as
follows:
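Proposition 3.4. In a stable matching outcome, there exist thresholds:
\[
\tilde{x} = \frac{\alpha + \frac{\mu}{\lambda}}{\alpha + \lambda}, \qquad
\tilde{x}'_1 = \frac{\alpha + \mu}{1 - \mu} \cdot \frac{\lambda - \frac{\mu}{\lambda}}{\alpha + \lambda}, \qquad
\tilde{x}'_2 = \frac{\alpha + \mu}{\alpha\mu + \mu}\, b,
\]
\[
\tilde{y}_1 = \frac{1 + \alpha}{1 - \mu} \cdot \frac{\lambda - \frac{\mu}{\lambda}}{\alpha + \lambda}, \qquad
\tilde{y}_2 = b, \qquad
\tilde{y}'_1 = \lambda, \qquad
\tilde{y}'_2 = \frac{\mu}{\lambda}
\]
such that:
• All agents marry;
• $\forall x \ge \tilde{x}$, a rural woman with SES x marries with probability 1 an urban man with SES:
\[
y' = \frac{\lambda - \frac{\mu}{\lambda}}{1 - \tilde{x}}\, x + \frac{\frac{\mu}{\lambda} - \lambda \tilde{x}}{1 - \tilde{x}}
\]
• $\forall x < \tilde{x}$, a rural woman with SES x marries with probability 1 a rural man with SES:
\[
y =
\begin{cases}
x + (1 - \tilde{x}), & \text{if } x \in (\tilde{x} - (1 - \tilde{y}_1),\ \tilde{x}) \\[4pt]
\dfrac{\tilde{y}_1 - \frac{b}{\mu}}{\tilde{x} - (1 - \tilde{y}_1) - \tilde{x}'_2}\, x
 + \dfrac{\frac{b}{\mu}(\tilde{x} - 1 + \tilde{y}_1) - \tilde{y}_1 \tilde{x}'_2}{\tilde{x} - (1 - \tilde{y}_1) - \tilde{x}'_2},
 & \text{if } x \in (\tilde{x}'_2,\ \tilde{x} - (1 - \tilde{y}_1)) \\[4pt]
\dfrac{\frac{b}{\mu} - b}{\tilde{x}'_2 - b}\, x + b\, \dfrac{\tilde{x}'_2 - \frac{b}{\mu}}{\tilde{x}'_2 - b},
 & \text{if } x \in (b,\ \tilde{x}'_2] \\[4pt]
x, & \text{if } x \in [0, b]
\end{cases}
\]
• $\forall x \ge \tilde{x}'_1$, an urban woman with SES x marries with probability 1 an urban man with SES:
\[
y' =
\begin{cases}
x, & \text{if } x \ge \lambda \\[4pt]
\dfrac{\lambda^2 - \mu}{\lambda^2 (1 - \tilde{x})}\, x - \dfrac{\lambda^2 \tilde{x} - \mu}{\lambda (1 - \tilde{x})},
 & \text{if } x \in [\lambda \tilde{x},\ \lambda) \\[4pt]
x + \dfrac{\mu}{\lambda} - \lambda \tilde{x}, & \text{if } x \in [\tilde{x}'_1,\ \lambda \tilde{x})
\end{cases}
\]
• $\forall x \in (\tilde{x}'_2, \tilde{x}'_1)$, an urban woman with SES x marries with probability
$p = \dfrac{\alpha (1 - \tilde{x}) - (\tilde{x}'_2 - b)}{\tilde{x}'_1 - \tilde{x}'_2}$ a rural man with SES:
\[
y = \frac{1}{\mu} \left( \frac{\mu \tilde{y}_1 - b}{\tilde{x}'_1 - \tilde{x}'_2}\, x
 + \frac{b \tilde{x}'_1 - \mu \tilde{y}_1 \tilde{x}'_2}{\tilde{x}'_1 - \tilde{x}'_2} \right)
\]
and marries with probability (1 − p) an urban man with SES:
\[
y' = \frac{\mu \tilde{y}_1 - b}{\tilde{x}'_1 - \tilde{x}'_2}\, x
 + \frac{b \tilde{x}'_1 - \mu \tilde{y}_1 \tilde{x}'_2}{\tilde{x}'_1 - \tilde{x}'_2}
\]
• $\forall x \le \tilde{x}'_2$, an urban woman with SES x marries with probability 1 a rural man with SES:
\[
y = \frac{b (1 - \mu)}{\mu (\tilde{x}'_2 - b)}\, x + b - \frac{b (1 - \mu)}{\mu (\tilde{x}'_2 - b)} \cdot b
\]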
Figure 3.4 shows the detailed matching pattern in this case. It shows how individuals marry among each
hukou category besides the four matching types as in Figure 3.3. The proof of this proposition is in the
appendix.
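As an illustration, the thresholds of Proposition 3.4 can be evaluated for given parameter values. The sketch below is written under assumptions: the threshold formulas are used as read here from the proposition, the function names and dictionary keys are hypothetical, and the parameter values α = 3, λ = 0.95, µ = 0.6, b = 0.4 are those of the numerical example in Figure 3.5.

```python
def thresholds(alpha, lam, mu, b):
    """Thresholds of the quadratic example (f(x, y) = x * y), as read from Proposition 3.4."""
    if not (alpha >= 1 and 0 <= b and mu < lam ** 2 and lam < 1):
        raise ValueError("requires alpha >= 1, b >= 0, mu < lam**2 and lam < 1")
    gap = (lam - mu / lam) / (alpha + lam)
    y1 = (1 + alpha) / (1 - mu) * gap
    if b > mu * y1:
        raise ValueError("regularity condition on b is violated")
    return {
        "x": (alpha + mu / lam) / (alpha + lam),    # x-tilde
        "x1p": (alpha + mu) / (1 - mu) * gap,       # x-tilde'_1
        "x2p": (alpha + mu) / (alpha * mu + mu) * b,  # x-tilde'_2
        "y1": y1,                                   # y-tilde_1
        "y2": b,                                    # y-tilde_2
        "y1p": lam,                                 # y-tilde'_1
        "y2p": mu / lam,                            # y-tilde'_2
    }


def urban_husband_of_rural_wife(x, t, lam, mu):
    """SES of the urban husband of a rural woman with SES x >= x-tilde."""
    slope = (lam - mu / lam) / (1 - t["x"])
    intercept = (mu / lam - lam * t["x"]) / (1 - t["x"])
    return slope * x + intercept


t = thresholds(alpha=3, lam=0.95, mu=0.6, b=0.4)
```

A simple consistency check on these formulas is that the lowest rural woman who marries an urban man (x equal to x-tilde) is matched to y-tilde'_2 = µ/λ, and the highest one (x = 1) is matched to y-tilde'_1 = λ.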
Individual utilities
Moreover, we can solve for individual utilities from this system. Denote uA(x) as the utility of a rural woman
with SES x; similarly, define uN(x′), vA(y), vN(y′) as the utility of an urban woman, a rural man and an
urban man. Consider a rural woman with $x < \tilde{x}$; her husband is a rural man with SES $y = \phi_A(x)$. The
stability condition implies that:
\[
u_A(x) = \max_{y} \left( \mu x y - v_A(y) \right)
\]
with the maximum achieved at $y = \phi_A(x)$. From the envelope theorem,
\[
u'_A(x) = \mu \phi_A(x)
\]
Then,
\[
u_A(x) = \int_0^x \mu \phi_A(s)\, ds + K, \qquad \forall x \in [0, \tilde{x}]
\]
Since f(0, 0) = 0 in this case, we know K = 0. For $x \ge \tilde{x}$, her husband is an urban man with SES
$y' = \varphi_A(x)$. The stability condition implies that:
\[
u_A(x) = \max_{y'} \left( \lambda x y' - v_N(y') \right)
\]
with the maximum achieved at $y' = \varphi_A(x)$. From the envelope theorem,
\[
u'_A(x) = \lambda \varphi_A(x)
\]
Then,
\[
u_A(x) = \int_{\tilde{x}}^{x} \lambda \varphi_A(s)\, ds + u_A(\tilde{x})
 = \int_{\tilde{x}}^{x} \lambda \varphi_A(s)\, ds + \int_0^{\tilde{x}} \mu \phi_A(s)\, ds,
 \qquad \forall x \in [\tilde{x}, 1]
\]
The other utilities uN(x′), vA(y), vN(y′) can be obtained in a similar way given the matching function.
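The envelope-theorem formula for a rural woman's utility can be evaluated numerically. The sketch below is an illustration under assumptions: it uses the piecewise matching function of Proposition 3.4 as read here, f(x, y) = xy, a simple midpoint rule for the integral, and the parameter values of Figure 3.5 as defaults. The function names are hypothetical.

```python
def rural_wife_match(x, alpha, lam, mu, b):
    """Return (husband's SES, surplus multiplier) for a rural woman with SES x."""
    gap = (lam - mu / lam) / (alpha + lam)
    x_t = 1.0 - gap                               # x-tilde = (alpha + mu/lam) / (alpha + lam)
    y_1 = (1 + alpha) / (1 - mu) * gap            # y-tilde_1
    x2p = (alpha + mu) / (alpha * mu + mu) * b    # x-tilde'_2
    c = x_t - (1 - y_1)                           # lower end of the top rural-rural segment
    if x >= x_t:                                  # urban husband, multiplier lambda
        return ((lam - mu / lam) * x + mu / lam - lam * x_t) / (1 - x_t), lam
    if x > c:
        return x + (1 - x_t), mu
    if x > x2p:
        return ((y_1 - b / mu) * x + (b / mu) * c - y_1 * x2p) / (c - x2p), mu
    if x > b:
        return ((b / mu - b) * x + b * (x2p - b / mu)) / (x2p - b), mu
    return x, mu


def rural_wife_utility(x, alpha=3, lam=0.95, mu=0.6, b=0.4, steps=2000):
    """Integrate u_A'(s) = multiplier * husband's SES from 0 to x with the midpoint rule."""
    h = x / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        y, scale = rural_wife_match(s, alpha, lam, mu, b)
        total += scale * y * h
    return total
```

On the bottom segment [0, b], where rural women marry rural men of the same SES, the integral reduces to µx²/2, which gives a closed-form value to compare against.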
A numerical example is given in Figure 3.5 when f (x, y) = xy, α = 3, λ = 0.95, µ = 0.6, b = 0.4. There
are several interesting patterns:
• Utilities increase with respect to SES for everyone.
• For urban women, those with (low) SES that marry a rural husband with positive probability obtain
the same utility as rural women with the same SES, because urban women and rural women make the
same contribution when they marry rural men.
• For urban women, those with (high) SES that only marry urban husbands have higher utility than the
rural women with the same SES, because urban women contribute more than rural women to the surplus
when they marry urban men.
• Urban men enjoy the largest utility compared to rural men, urban women and rural women with the same
SES. This is because of the positive value that an urban hukou generates for urban men.
• For rural men, those with (very low) SES that only marry rural wives have the same utility as their
rural wives. All other rural men with (high) SES have lower utility than the rural women with same
SES. This is because very top rural women marry urban men, which creates a scarcity of rural women
that helps them enjoy higher utilities.
Comparative statics
Figure 3.6 shows an example of the utility change for the four populations when we
change the parameters. The comparative statics in this case can be summarized as follows:
• A larger λ, by reducing the cost of transferring the wife's hukou from Agriculture to Non-agriculture,
decreases the thresholds $\tilde{x}$, $\tilde{y}'_2$ and increases $\tilde{x}'_1$, $\tilde{y}_1$, $\tilde{y}'_1$. It benefits
urban men and rural women, but hurts
urban women and top rural men by reducing their advantage. In Figure 3.6, the impact on the urban
population is larger than that on the rural population.
• A larger µ, by reducing the cost of living in a rural area, increases the thresholds $\tilde{x}$, $\tilde{y}'_2$ and
decreases $\tilde{x}'_1$, $\tilde{y}_1$. It
benefits rural men, rural women and urban women, but hurts urban men by reducing their advantage.
• A smaller α, by decreasing the share of the rural population, decreases the thresholds $\tilde{x}$, $\tilde{x}'_1$ and
increases $\tilde{y}_1$.
It hurts urban men by decreasing their relative scarcity and benefits rural women by increasing their
relative scarcity. Urban women also gain and rural men lose.
• A smaller b, by reducing the difference in average SES between the urban and rural populations,
hurts urban men but benefits rural men, urban women and rural women.
Model predictions
The frictionless model above is only an idealized illustration. Because some characteristics are
unobserved by the econometrician, observed matching patterns are mostly stochastic. However, the previous
equilibrium solution can still shed light on the general properties of the empirical matching patterns, which
I summarize as follows:
Prediction 3.1. Cross-type couples, in which wives and husbands have different hukou status, should be
less frequent. More precisely, both the share of A-N couples (rural wives and urban husbands) and the share
of N-A couples (urban wives and rural husbands) should be much smaller than implied by random matching,
i.e., than $\frac{1}{1+\alpha} \cdot \frac{\alpha}{1+\alpha}$. In the data, α ≈ 3, hence the threshold is about 0.1875.
Prediction 3.2. Matching within each category:
Among couples in the same hukou category, matching is positively assortative on SES.
Prediction 3.3. Selection into the four matching categories:
• Rural women with urban husbands have higher SES than rural women with rural husbands.
• Urban women with urban husbands have higher SES than urban women with rural husbands.
• Rural men with urban wives have higher SES than rural men with rural wives.
• Urban men with urban wives have higher SES than urban men with rural wives.
Prediction 3.4. Rural and urban partners for the same agent:
• When two urban women with the same (low) SES marry respectively a rural husband and a urban
husband, the rural husband should on average have higher SES than the urban husband.
• When two rural men with the same SES marry respectively a rural wife and a urban wife, the rural
wife and the urban wife should on average have the same SES.
• When two urban men with the same (low) SES marry respectively a rural wife and a urban wife, the
rural wife should on average have higher SES.
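The random-matching benchmark in Prediction 3.1 is simple arithmetic and can be reproduced as follows. The function name `random_mixed_share` is hypothetical, and α is interpreted as the ratio of the rural to the urban population, as in the calibration α = 3.

```python
def random_mixed_share(alpha):
    """Share of one mixed couple type (e.g. rural wife, urban husband) under random matching."""
    rural_share = alpha / (1 + alpha)   # probability that a spouse is rural
    urban_share = 1 / (1 + alpha)       # probability that a spouse is urban
    return rural_share * urban_share


benchmark = random_mixed_share(3)  # 0.75 * 0.25
```

Observed shares of A-N and N-A couples can then be compared with this benchmark, as is done in Table 3.2.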
3.3 Empirical Application
Hukou system in China
The hukou system has been the household registration system in China since the early 1950s. A person's
hukou status contains two parts: (1) residential location (hukou suozaidi), which indicates the place of hukou
registration, and (2) hukou type (hukou leibie), which categorizes people into either the Agriculture or the
Non-agriculture population. Hukou status is determined at birth. Before 1998, a child's hukou was
automatically inherited from the mother; since 1998, it can be inherited from either the mother or the father.
Between 1960 and 1984, internal migration in China was very limited. Changing registration place
required a permit to move and a migration certificate. Changing hukou type from Agriculture to Non-
agriculture (nongzhuanfei) was even more difficult and the conversion quotas were limited. Conversion
quotas for personal reasons including family reunion were exceedingly small, fixed at about 0.15 to 0.2
percent of the non-agriculture hukou population of each locale. As a result, separation of spouses was
very common (Chan and Zhang, 1999).
This strict migration policy could not be maintained during the economic reforms in China. After 1984,
it became possible for people to work outside their hukou registration places. Urban areas have attracted
more and more rural migrant workers. However, without local hukou, migrants encounter various difficulties
including housing, health care and children’s education. To alleviate these problems, local governments
implemented various hukou reforms since 1997 to make it easier for migrants to change their hukou status.
Data
Estimates of the matching patterns are based on the China 2000 0.095% sample census, a household survey
conducted by the National Bureau of Statistics of China in 2000. Hukou status, education and marital status
are available in this
dataset. The 2000 population census is used instead of the 1990 population census or the 2005 mini census
for the following two reasons. (1) Before 1990, migration was still very restricted; hence the benefit of a
match between a rural woman and an urban man was very small, so that λ should be small. (2) Between 2000 and
2005, switching hukou status from Agriculture to Non-agriculture became much easier. Because historical hukou
status is not recorded in the census, with later data I may misclassify rural people who have switched hukou
as part of the urban
population. However, before the early 2000s, converting a rural hukou to an urban one was still highly
restricted. Therefore, the 2000 population census is the best choice. Furthermore, I restrict my sample to
women between 20 and 33 and men between 22 and 35.
Results
In this section, I first show the basic summary statistics and then present the empirical results on the
matching pattern in the order of the four predictions.
Summary statistics
In the sample, 51% are male, 73% have agriculture hukou. 73.43% of male population have agriculture
hukou and 73.18% of female population have agriculture hukou. These basic statistics confirm the parameter
assumptions that r = 1, α = 3. The average years of schooling are 12.2, 12.1, 8.3 and 7.8 for urban men,
urban women, rural men and rural women respectively. 32% are never married.
Table 3.1 presents the summary statistics of the married and singles in the sample. Women and men
have similar years of schooling. Rural people are more likely to get married at the survey time for both men
and women. Not surprisingly, never married people are younger than married couples.
Frequencies of mixed couples
Table 3.2 provides support for Prediction 3.1. There are far more rural-rural and urban-urban couples than
random matching would imply. Mixed couples with different hukou are much less frequent than predicted
by random matching.
Positive assortative matching within each category
Table 3.3 provides support for Prediction 3.2. Among all four panels of the table that represent the four
types of couples, own education is positively correlated with spousal education after controlling own age and
prefecture fixed effect. Figure 3.7 shows the heat map of couples’ years of schooling for the four types.
Selection into mixed matching
Table 3.4 provides support for Prediction 3.3. Comparing rural wives in Panel A and Panel B, we can see
that rural women with urban husbands on average have 1.7 more years of schooling than rural women with
rural husbands. Similarly, by comparing Panel C and Panel D, Panel A and Panel C, Panel B and Panel D,
we can find that urban women with urban husbands on average have 1.79 more years of schooling than urban
women with rural husbands, rural men with urban wives on average have 1.28 more years of schooling than
rural men with rural wives, and urban men with urban wives on average have 1.6 more years of schooling
than urban men with rural wives.
Table 3.5 provides further support for this selection pattern. In this table, I divide married people into
four types by sex and hukou status. Individuals' education is regressed on spousal characteristics, including
hukou and SES. All regressions control for own age and prefecture fixed effects. Columns (1) and (3) in
Panels (A) and (B) replicate the finding in Table 3.4. Columns (2) and (4) present the results with spousal
education as an additional explanatory variable. For individuals married to all four types, conditional on SES,
those with a rural hukou tend to have a spouse with lower SES.
Moreover, in Column (4) of Panel B in Table 3.5, the small and insignificant coefficient of rural spouse,
-0.099, is also supportive of our assumption that λ is very close to 1. For those married to urban husbands,
rural women and urban women are married to husbands with similar SES, conditional on the wives’ SES.
This shows that wife’s rural hukou and wife’s urban hukou contribute similarly to the marital surplus.
Conditional and unconditional correlation between SES and hukou status.
In Table 3.6, I divide married people into four types by sex and spousal hukou status.
In the theoretical prediction, for two urban men who marry a rural woman and a urban woman respec-
tively, the rural woman has on average a higher SES than the urban woman. For two rural men who marry
a rural woman and a urban woman respectively, the rural woman has on average a similar SES to the urban
woman. For two urban women who marry a rural man and a urban man respectively, the rural man has
on average a higher SES than the urban man. This would predict that a rural hukou and SES should be
positively correlated for the spouses of urban men and urban women, and not correlated for the spouses
of rural men. However, in practice, people match on multiple characteristics, including many unobserved
ones. Hence, to turn the prediction into a hypothesis that can be tested with data, it is restated as follows:
the correlation between a rural hukou and SES conditional on spousal SES should be less negative
than the unconditional one, for the spouses of urban men, urban women and rural men. This is
supported by Columns (1) and (2) in Panels (A), (B), (D) in Table 3.6.
3.4 Conclusion and Discussion
In this chapter, I build a bidimensional matching model to understand the marriage matching
patterns along SES and hukou status in China. I test the model's predictions using the China 2000 0.095%
sample Census. The model can be a building block to evaluate the effect of the hukou policy reforms. In
future work, I plan to use the prefecture-level hukou reform data between 1997 and 2010 collected in Fan
(2019) to test the comparative statics of the model. More specifically, since hukou reform increased λ, we can
test how the percentage of mixed couples and individuals' SES in the mixed couples change. Another
potential direction is to explore the details of the hukou reforms. Reforms in different prefectures at different
times may focus on different public goods; for example, some put more emphasis on housing policies, others
on children's education. Empirical analysis combined with the model may be used
to estimate people's valuation of different public goods.
Moreover, rural-to-urban migration is not limited to China. Almost all societies have experienced ur-
banization in history and/or are still in the process of urbanization. A cross-country comparison analysis of
how the matching pattern among rural and urban population changes over time will be interesting future
work.
3.5 Figures
Figure 3.1: Predicted matching pattern with symmetric population
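[Figure: four bars for rural and urban women and rural and urban men, with the thresholds $\tilde{y}'_1$, $\tilde{y}$, $\tilde{x}$, $\tilde{x}'$, $\tilde{y}'_2$ marked.]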
This figure illustrates an example of the equilibrium matching pattern with the asymmetric surplus function and symmetric
population. Each bar represents population of rural women, urban women, rural men and urban men. The height indicates
SES, the higher, the larger SES the agent has. The width indicates the mass of one particular SES type, the wider, the
more populated this particular type is. Area of same color indicates matching in equilibrium. Agents match positively
assortatively among each color category.
Figure 3.2: Matching patterns under different parameter values
[Figure: regions of the parameter space labeled "N.A.", "Pure" and "Exist Mixed Couples".]
This figure illustrates the type of the equilibrium matching pattern with different values of λ and µ in the surplus function
and symmetric population. Cases where λ < µ are not applicable in the current setting. When µ is relatively large, there
is no mixed couple and the matching is pure. In the lower area, mixed couples are observed and the matching pattern is
similar to that depicted in Figure 3.1.
Figure 3.3: Predicted matching pattern with asymmetric population
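[Figure: bars for women and men, with the thresholds $\tilde{x}$, $\tilde{x}'_1$, $\tilde{x}'_2$, $\tilde{y}'_1$, $\tilde{y}_1$, $\tilde{y}'_2$, $\tilde{y}_2$ marked.]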
This figure illustrates an example of the equilibrium matching pattern with the asymmetric surplus function and asymmetric
population. Each bar represents population of rural women, urban women, rural men and urban men. The height indicates
SES, the higher, the larger SES the agent has. The width indicates the mass of one particular SES type, the wider, the
more populated this particular type is. Area of same color indicates matching in equilibrium. Agents match positively
assortatively among each color category.
Figure 3.4: Predicted matching pattern with asymmetric population (Detailed)
[Figure: detailed bars for women and men, with the matching thresholds marked.]
This figure is a detailed version of Figure 3.3. It shows how rural-rural and urban-urban couples match in detail.
Different shades of cyan and different shades of blue are used to indicate the different subsets of the rural population
and the urban population that marry spouses with the same hukou type.
Figure 3.5: Utility in the quadratic example
[Figure: utility (vertical axis) against SES (horizontal axis) for rural women, urban women, rural men and urban men.]
This figure illustrates the individuals’ utilities in the quadratic example when f (x, y) = xy. The parameter values
are: α = 3, λ = 0.95, µ = 0.6, b = 0.4. The four lines represent how utilities depend on SES for the four population:
rural women, urban women, rural men and urban men.
Figure 3.6: Comparative statics
[Figure: four panels (Increase λ, Increase µ, Decrease α, Decrease b), each plotting utility change against SES for rural women, urban women, rural men and urban men.]
This figure illustrates the comparative statics of individuals’ utilities with respect to the theoretical parameters in the quadratic example.
The baseline case is: α = 3, λ = 0.95, µ = 0.6, b = 0.4. Each panel changes one parameter at a time.
The four lines indicate the utility change at different SES levels for the four populations: rural women, urban women, rural men and urban men.
In the upper-left panel, λ is increased from 0.95 to 0.99.
In the upper-right panel, µ is increased from 0.6 to 0.7.
In the bottom-left panel, α is decreased from 3 to 2.
In the bottom-right panel, b is decreased from 0.4 to 0.3.
Figure 3.7: Heatmap for husbands’ and wives’ education
[Four heatmap panels: rows are Urban husband (top) and Rural husband (bottom); columns are Rural wife (left) and Urban wife (right). Both axes run from 0 to 15 years of schooling.]
The four panels of this figure show heatmaps of wives’ and husbands’ years of schooling for the four marriage types.
Red indicates higher frequency and blue indicates lower frequency. The dashed diagonal line has slope 1.
Source: China 2000 Census, men aged 22-35 and women aged 20-33.
3.6 Tables
Table 3.1: Summary Statistics
A. Married                Husbands    Wives
Age (years)               29.52       28.10
                          (3.04)      (3.05)
Education (years)         9.15        8.43
                          (2.69)      (2.99)
Agriculture Hukou (=1)    0.76        0.77
                          (0.43)      (0.42)
Observations              73,940      73,940

B. Never Married          Men         Women
Age (years)               24.35       21.38
                          (3.49)      (2.81)
Education (years)         9.81        10.25
                          (3.27)      (3.14)
Agriculture Hukou (=1)    0.68        0.66
                          (0.47)      (0.47)
Observations              50,505      47,088
Note: The sample consists of men aged 22-35 and women aged 20-33 in the China 2000
Census. Panel A is limited to married couples in which both the husband
and the wife are observed in the data.
Table 3.2: Matching patterns by hukou status.
A. Observed matching    Rural husband    Urban husband
Rural wife              74.03%           3.29%
                        (54,737)         (2,431)
Urban wife              1.83%            20.86%
                        (1,350)          (15,422)

B. Random matching      Rural husband    Urban husband
Rural wife              58.52%           18.48%
Urban wife              17.48%           5.53%
Note: Panel A shows the empirical frequencies of the four marriage types by
hukou status. Panel B shows the counterfactual matching frequencies when
people marry across hukou type randomly.
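The random-matching benchmark can be sketched in a few lines. The snippet below assumes that "random" means the wife's and the husband's hukou are drawn independently, so each cell is the product of two marginal shares; the function name and the use of the rounded agricultural-hukou shares from Table 3.1 (0.77 for wives, 0.76 for husbands) are illustrative assumptions, not a description of how Panel B was actually computed.

```python
def random_matching(rural_wife_share, rural_husband_share):
    """Cell frequencies when wife's and husband's hukou are independent."""
    w, h = rural_wife_share, rural_husband_share
    return {
        ("rural wife", "rural husband"): w * h,
        ("rural wife", "urban husband"): w * (1 - h),
        ("urban wife", "rural husband"): (1 - w) * h,
        ("urban wife", "urban husband"): (1 - w) * (1 - h),
    }


# Assumption: rounded agricultural-hukou shares of married wives and husbands
# taken from Table 3.1, Panel A.
cells = random_matching(rural_wife_share=0.77, rural_husband_share=0.76)

for (wife, husband), share in cells.items():
    print(f"{wife} / {husband}: {100 * share:.2f}%")
```

Under these assumed inputs the four cells sum to one, and the rural-rural cell is far below the observed 74.03%, which is the comparison the table is making.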
Table 3.3: Positive assortative matching in each matching category
                     A. Rural wife-Rural husband    B. Rural wife-Urban husband
                     Wife’s        Husband’s        Wife’s        Husband’s
                     education     education        education     education
Spousal education    0.44∗∗∗       0.33∗∗∗          0.29∗∗∗       0.41∗∗∗
                     (0.0056)      (0.0047)         (0.020)       (0.027)
N                    54,737        54,737           2,431         2,431
R²                   0.325         0.260            0.182         0.164

                     C. Urban wife-Rural husband    D. Urban wife-Urban husband
                     Wife’s        Husband’s        Wife’s        Husband’s
                     education     education        education     education
Spousal education    0.50∗∗∗       0.36∗∗∗          0.62∗∗∗       0.62∗∗∗
                     (0.036)       (0.028)          (0.0065)      (0.0064)
N                    1,350         1,350            15,422        15,422
R²                   0.195         0.227            0.422         0.414
Note: This table shows the correlation between husbands’ and wives’ education (years of schooling) among the four marriage
categories by hukou status.
Table 3.4: Selection into four matching categories
                     A. Rural wife-Rural husband    B. Rural wife-Urban husband
                     Wives         Husbands         Wives         Husbands
Age (years)          28.07         29.41            27.37         29.30
                     (3.12)        (3.08)           (3.14)        (3.05)
Education (years)    7.44          8.23             9.14          10.57
                     (2.33)        (1.93)           (2.15)        (2.53)

                     C. Urban wife-Rural husband    D. Urban wife-Urban husband
                     Wives         Husbands         Wives         Husbands
Age (years)          27.64         28.84            28.38         29.99
                     (3.02)        (3.13)           (2.75)        (2.80)
Education (years)    9.92          9.51             11.71         12.17
                     (2.50)        (2.16)           (2.77)        (2.76)
Note: This table presents the average characteristics of men and women in each marriage category by hukou status.
Table 3.5: Explain individuals’ education using spousal characteristics, by hukou status.
Panel A: Rural Population
                       Rural wives’                Rural husbands’
Dependent variable:    own education               own education
                       (1)           (2)           (3)           (4)
Rural spouse           -1.48∗∗∗      -0.53∗∗∗      -1.11∗∗∗      -0.40∗∗∗
                       (0.045)       (0.043)       (0.060)       (0.054)
Spousal education                    0.43∗∗∗                     0.33∗∗∗
                                     (0.0053)                    (0.0046)
N                      57,168        57,168        56,087        56,087
R²                     0.221         0.332         0.143         0.265

Panel B: Urban Population
                       Urban wives’                Urban husbands’
Dependent variable:    own education               own education
                       (1)           (2)           (3)           (4)
Rural spouse           -1.83∗∗∗      -0.17∗∗       -1.71∗∗∗      -0.099∗
                       (0.072)       (0.068)       (0.056)       (0.055)
Spousal education                    0.61∗∗∗                     0.61∗∗∗
                                     (0.0064)                    (0.0063)
N                      16,772        16,772        17,853        17,853
R²                     0.084         0.423         0.079         0.404
Note: This table shows regression results of using spousal characteristics to explain own education, separately for individuals
of different genders with different hukou status.
Notes: Significance levels: * 10%, ** 5%, *** 1%.
Table 3.6: Unconditional and conditional correlation between hukou and SES, by spousal type
Dependent variable: Own education

                       A. Women with rural husbands    B. Women with urban husbands
                       (1)           (2)               (3)           (4)
Rural                  -2.20∗∗∗      -1.70∗∗∗          -2.65∗∗∗      -1.64∗∗∗
                       (0.070)       (0.064)           (0.050)       (0.049)
Husband’s education                  0.44∗∗∗                         0.58∗∗∗
                                     (0.0055)                        (0.0062)
N                      56,087        56,087            17,853        17,853
R²                     0.224         0.337             0.146         0.446

                       C. Men with rural wives         D. Men with urban wives
                       (1)           (2)               (3)           (4)
Rural                  -2.23∗∗∗      -1.72∗∗∗          -2.70∗∗∗      -1.55∗∗∗
                       (0.053)       (0.050)           (0.064)       (0.062)
Wife’s education                     0.33∗∗∗                         0.61∗∗∗
                                     (0.0046)                        (0.0063)
N                      57,168        57,168            16,772        16,772
R²                     0.173         0.290             0.105         0.437
Note: This table presents the correlation between education and hukou status for individuals by gender and spousal hukou
status, both unconditionally (odd-numbered columns) and conditionally on spousal education (even-numbered columns).
Notes: Significance levels: * 10%, ** 5%, *** 1%.
3.7 Appendix
Proof. Let’s prove the first part of this proposition. Define
\[ \bar{x}' = \sup_{x'} \{ O_{x'} : \text{those urban women who marry a rural man with positive probability} \}, \]
\[ \bar{y}' = \sup_{y'} \{ O_{y'} : \text{those urban men who marry a rural woman with positive probability} \}. \]
We want to show that $\bar{x}' < 1$ and $\bar{y}' < 1$. Let’s prove this by contradiction. There are three cases.
• Suppose $\bar{x}' = 1$ and $\bar{y}' = 1$, and denote the two couples by $(x' = 1, y_A)$ and $(x_A, y' = 1)$. Their total
surplus is
\[ \Sigma = \mu f(1, y_A) + \lambda f(x_A, 1), \]
while exchanging the partners gives
\[ \Sigma_1 = f(1, 1) + \mu f(x_A, y_A). \]
Then
\begin{align*}
\Sigma_1 - \Sigma &= f(1, 1) - \lambda f(x_A, 1) + \mu \big( f(x_A, y_A) - f(1, y_A) \big) \\
&> \lambda \big( f(1, 1) - f(x_A, 1) \big) - \mu \big( f(1, y_A) - f(x_A, y_A) \big) \\
&> \lambda \Big( \big( f(1, 1) - f(x_A, 1) \big) - \big( f(1, y_A) - f(x_A, y_A) \big) \Big) > 0,
\end{align*}
where the last inequality is due to the supermodularity assumption. This violates the surplus maximization
property of a stable matching.
• Suppose $\bar{x}' = 1$ and $\bar{y}' < 1$, and denote the two couples by $(x' = 1, y_A)$ and $(x_N, y' = 1)$. The total
surplus is
\[ \Sigma = \mu f(1, y_A) + f(x_N, 1), \]
while exchanging the partners gives
\[ \Sigma_1 = f(1, 1) + \mu f(x_N, y_A). \]
Then
\begin{align*}
\Sigma_1 - \Sigma &= f(1, 1) - f(x_N, 1) + \mu \big( f(x_N, y_A) - f(1, y_A) \big) \\
&= \big( f(1, 1) - f(x_N, 1) \big) - \mu \big( f(1, y_A) - f(x_N, y_A) \big) \\
&> \mu \Big( \big( f(1, 1) - f(x_N, 1) \big) - \big( f(1, y_A) - f(x_N, y_A) \big) \Big) > 0.
\end{align*}
• Suppose $\bar{x}' < 1$ and $\bar{y}' = 1$, and denote the two couples by $(x' = 1, y_N)$ and $(x_A, y' = 1)$. The total
surplus is
\[ \Sigma = f(1, y_N) + \lambda f(x_A, 1), \]
while exchanging the partners gives
\[ \Sigma_1 = f(1, 1) + \lambda f(x_A, y_N). \]
Then
\begin{align*}
\Sigma_1 - \Sigma &= f(1, 1) - f(1, y_N) + \lambda \big( f(x_A, y_N) - f(x_A, 1) \big) \\
&= \big( f(1, 1) - f(1, y_N) \big) - \lambda \big( f(x_A, 1) - f(x_A, y_N) \big) \\
&> \lambda \Big( \big( f(1, 1) - f(1, y_N) \big) - \big( f(x_A, 1) - f(x_A, y_N) \big) \Big) > 0.
\end{align*}
Therefore, $\bar{x}' < 1$ and $\bar{y}' < 1$. According to Proposition 3.1, we also know that all urban women with
SES larger than $\bar{x}'$ and all urban men with SES larger than $\bar{y}'$ marry each other positively assortatively.
Moreover, in our case here, $\bar{x}' = \bar{y}'$. Define $h(\lambda)$ by $\bar{x}' = \bar{y}' = h(\lambda)$.
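The three exchange arguments can be checked numerically. The sketch below assumes the multiplicative surplus f(x, y) = xy from the quadratic example and the baseline values λ = 0.95 and µ = 0.6; the function names, the grid, and the restriction of the lower-SES partners to values strictly below 1 are illustrative choices, not part of the proof.

```python
import itertools


def f(x, y):
    """Illustrative supermodular surplus: the multiplicative form f(x, y) = xy."""
    return x * y


def swap_gains(x_a, y_a, x_n, y_n, lam, mu):
    """Sigma_1 - Sigma after exchanging partners, for each of the three cases."""
    case1 = (f(1, 1) + mu * f(x_a, y_a)) - (mu * f(1, y_a) + lam * f(x_a, 1))
    case2 = (f(1, 1) + mu * f(x_n, y_a)) - (mu * f(1, y_a) + f(x_n, 1))
    case3 = (f(1, 1) + lam * f(x_a, y_n)) - (f(1, y_n) + lam * f(x_a, 1))
    return case1, case2, case3


lam, mu = 0.95, 0.6                    # assumed baseline values, with mu < lam < 1
grid = [i / 10 for i in range(10)]     # SES values 0.0, ..., 0.9 (strictly below 1)

# Smallest gain from exchanging partners over the whole grid and all three cases.
min_gain = min(
    min(swap_gains(x_a, y_a, x_n, y_n, lam, mu))
    for x_a, y_a, x_n, y_n in itertools.product(grid, repeat=4)
)
print(min_gain > 0)
```

A positive minimum on the grid is consistent with the contradiction argument for this particular surplus function; it is a sanity check, not a substitute for the general supermodularity proof.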
Now let’s move on to prove the second part. Denote
\[ \bar{x} = \sup_{x} \{ O_{x} : \text{those rural women who marry a rural man with positive probability} \}. \]
We want to show that $\bar{x} < 1$. Let’s prove this by contradiction. Suppose $\bar{x} = 1$, hence $p_A(x = 1) > 0$. There are two cases:
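• $0 < p_A(x = 1) < 1$. For the partners of rural women with $x = 1$, denote the rural husband’s SES by
$y_A$ and the urban husband’s SES by $y_N$. Moreover, we know that $y_N = \bar{y}' = h(\lambda)$, and
\[ \mu \frac{\partial f(x, y)}{\partial x}(1, y_A) = \lambda \frac{\partial f(x, y)}{\partial x}(1, y_N) = \lambda \frac{\partial f(x, y)}{\partial x}(1, h(\lambda)). \tag{3.1} \]
If $\mu < \dfrac{\lambda \frac{\partial f(x, y)}{\partial x}(1, h(\lambda))}{\frac{\partial f(x, y)}{\partial x}(1, 1)}$, there doesn’t exist $y_A \in [0, 1]$ satisfying Equation 3.1.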
• $p_A(x = 1) = 1$. Denote by $y_A$ the SES of the rural man she marries. Then $y_A = 1$. Otherwise, the rural man
with $y_A = 1$ must marry an urban woman with $x_N < 1$, and the couples $(x = 1, y_A)$ and $(x_N, y_A = 1)$ have smaller
total surplus than the couples $(x = 1, y_A = 1)$ and $(x_N, y_A)$.
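At the same time, the urban man with $\bar{y}' = h(\lambda)$ marries some rural woman with $x_A < 1$. The couples
$(x = 1, y_A = 1)$ and $(x_A, y' = \bar{y}' = h(\lambda))$ have joint surplus
\[ \Sigma = \mu f(1, 1) + \lambda f(x_A, h(\lambda)), \]
while exchanging the partners gives
\[ \Sigma_1 = \mu f(x_A, 1) + \lambda f(1, h(\lambda)), \]
so that
\begin{align*}
\Sigma_1 - \Sigma &= \lambda \big( f(1, h(\lambda)) - f(x_A, h(\lambda)) \big) - \mu \big( f(1, 1) - f(x_A, 1) \big) \\
&= \lambda \int_{x_A}^{1} \frac{\partial f(x, y)}{\partial x}(t, h(\lambda)) \, dt - \mu \int_{x_A}^{1} \frac{\partial f(x, y)}{\partial x}(t, 1) \, dt. \tag{3.2}
\end{align*}
If $\mu < \min_{t \in [0, 1]} \dfrac{\lambda \frac{\partial f(x, y)}{\partial x}(t, h(\lambda))}{\frac{\partial f(x, y)}{\partial x}(t, 1)}$, the RHS of Equation 3.2 is positive.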
Therefore, we show that if $\mu < \min_{t \in [0, 1]} \dfrac{\lambda \frac{\partial f(x, y)}{\partial x}(t, h(\lambda))}{\frac{\partial f(x, y)}{\partial x}(t, 1)}$, then $\bar{x} < 1$.
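To see what this sufficient condition looks like in a concrete case, take the multiplicative form $f(x, y) = xy$ from the quadratic example. This is only an illustrative special case worked out under that assumption; $h(\lambda)$ is left unspecified, exactly as in the proof.

```latex
\[
\frac{\partial f(x, y)}{\partial x}(t, y) = y
\quad \Longrightarrow \quad
\min_{t \in [0, 1]}
\frac{\lambda \frac{\partial f(x, y)}{\partial x}(t, h(\lambda))}
     {\frac{\partial f(x, y)}{\partial x}(t, 1)}
= \frac{\lambda h(\lambda)}{1}
= \lambda h(\lambda),
\]
so the condition reads $\mu < \lambda h(\lambda)$, and the surplus gain from exchanging partners becomes
\[
\Sigma_1 - \Sigma
= \lambda \int_{x_A}^{1} h(\lambda) \, dt - \mu \int_{x_A}^{1} 1 \, dt
= (1 - x_A) \big( \lambda h(\lambda) - \mu \big) > 0
\quad \text{for } x_A < 1 .
\]
```

In this special case the ratio does not depend on $t$, so the minimum over $t$ is immaterial.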
Proof. Let’s break the proof into two parts. In the first part, I’ll show the equality conditions that pin down
the cutoffs. In the second part, I’ll prove that this matching is a stable matching.
To pin down the cutoffs, besides the original seven cutoffs in the proposition, $\tilde{x}, \tilde{x}'_1, \tilde{x}'_2, \tilde{y}_1, \tilde{y}_2, \tilde{y}'_1, \tilde{y}'_2$, let
me add seven auxiliary cutoff parameters as shown in Figure 3.4: $\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{x}'_3, \tilde{x}'_4, \tilde{y}_3, \tilde{y}'_3$. Therefore we have
14 unknown parameters to be determined in the equilibrium. There are two types of equality conditions.
The first set requires that, for a person who marries both rural and urban spouses with positive probability, his/her
marginal contribution be the same in the two types of marriages. The second set consists of feasibility
constraints: for any type of marriage, the total mass of women should be equal to the total mass of men.
Here are eight equality conditions in the first set:
• Rural woman $(\tilde{x}, A)$ marries both rural man $(1, A)$ and urban man $(\tilde{y}'_2, N)$ with positive probability:
\[ \mu \cdot 1 = \lambda \tilde{y}'_2 \tag{3.3} \]
• Urban woman $(\tilde{x}'_1, N)$ marries both rural man $(\tilde{y}_1, A)$ and urban man $(\tilde{y}'_3, N)$ with positive probability:
\[ \mu \tilde{y}_1 = \tilde{y}'_3 \tag{3.4} \]
• Urban woman $(\tilde{x}'_2, N)$ marries both rural man $(\tilde{y}_3, A)$ and urban man $(b, N)$ with positive probability:
\[ \mu \tilde{y}_3 = b \tag{3.5} \]
• Rural man $(\tilde{y}_1, A)$ marries both rural woman $(\tilde{x}_1, A)$ and urban woman $(\tilde{x}'_1, N)$ with positive probability:
\[ \mu \tilde{x}_1 = \mu \tilde{x}'_1 \tag{3.6} \]
• Rural man $(\tilde{y}_2, A)$ marries both rural woman $(\tilde{x}_3, A)$ and urban woman $(b, N)$ with positive probability:
\[ \mu \tilde{x}_3 = \mu b \tag{3.7} \]
• Rural man $(\tilde{y}_3, A)$ marries both rural woman $(\tilde{x}_2, A)$ and urban woman $(\tilde{x}'_2, N)$ with positive probability:
\[ \mu \tilde{x}_2 = \mu \tilde{x}'_2 \tag{3.8} \]
• Urban man $(\tilde{y}'_1, N)$ marries both rural woman $(1, A)$ and urban woman $(\tilde{x}'_3, N)$ with positive probability:
\[ \lambda \cdot 1 = \tilde{x}'_3 \tag{3.9} \]
• Urban man $(\tilde{y}'_2, N)$ marries both rural woman $(\tilde{x}, A)$ and urban woman $(\tilde{x}'_4, N)$ with positive probability:
\[ \lambda \tilde{x} = \tilde{x}'_4 \tag{3.10} \]
Here are the additional equality conditions in the second set:
• Rural women with $x \in (\tilde{x}_1, \tilde{x})$ should have the same mass as rural men with $y \in (\tilde{y}_1, 1)$:
\[ \tilde{x} - \tilde{x}_1 = 1 - \tilde{y}_1 \tag{3.11} \]
• Rural women with $x \in (0, \tilde{x}_3)$ should have the same mass as rural men with $y \in (0, \tilde{y}_2)$:
\[ \tilde{x}_3 - 0 = \tilde{y}_2 - 0 \tag{3.12} \]
• Urban women with $x' \in (\tilde{x}'_3, 1 + b)$ should have the same mass as urban men with $y' \in (\tilde{y}'_1, 1 + b)$:
\[ 1 + b - \tilde{x}'_3 = 1 + b - \tilde{y}'_1 \tag{3.13} \]
• The mass of urban women with $x' \in (\tilde{x}'_4, 1 + b)$ should be equal to the mass of urban men with
$y' \in (\tilde{y}'_2, 1 + b)$ minus the total mass of rural women with $x \in (\tilde{x}, 1)$:
\[ (1 + b - \tilde{x}'_4) = (1 + b - \tilde{y}'_2) - \alpha (1 - \tilde{x}) \tag{3.14} \]
• The mass of urban women with $x' \in (b, \tilde{x}'_1)$ should be equal to the mass of urban men with $y' \in (b, \tilde{y}'_3)$
plus the mass of rural women with $x \in (\tilde{x}, 1)$:
\[ \tilde{x}'_1 - b = \tilde{y}'_3 - b + \alpha (1 - \tilde{x}) \tag{3.15} \]
• The mass of urban women with $x' \in (b, \tilde{x}'_2)$ should be equal to the mass of rural men with $y \in (0, \tilde{y}_3)$
minus the mass of rural women with $x \in (0, \tilde{x}_2)$:
\[ (\tilde{x}'_2 - b) = \alpha (\tilde{y}_3 - \tilde{x}_2) \tag{3.16} \]
There are 14 unknowns and 14 conditions.
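From Equation 3.3: $\tilde{y}'_2 = \frac{\mu}{\lambda}$.
From Equation 3.4: $\tilde{y}'_3 = \mu \tilde{y}_1$.
From Equation 3.5: $\tilde{y}_3 = \frac{b}{\mu}$.
From Equation 3.6 and Equation 3.11: $\tilde{x}'_1 = \tilde{x}_1 = \tilde{x} - (1 - \tilde{y}_1)$.
From Equation 3.7: $\tilde{x}_3 = b$.
From Equation 3.8 and Equation 3.16: $\tilde{x}'_2 = \tilde{x}_2 = \frac{b + \alpha \tilde{y}_3}{1 + \alpha} = \frac{\alpha + \mu}{\alpha \mu + \mu} b$.
From Equation 3.9 and Equation 3.13: $\tilde{x}'_3 = \tilde{y}'_1 = \lambda$.
From Equation 3.10: $\tilde{x}'_4 = \lambda \tilde{x}$.
From Equation 3.12: $\tilde{y}_2 = \tilde{x}_3 = b$.
From Equation 3.14: $\tilde{x} = \frac{\alpha + \frac{\mu}{\lambda}}{\alpha + \lambda}$.
From Equation 3.15: $\tilde{y}_1 = \frac{1 + \alpha}{1 - \mu} (1 - \tilde{x}) = \frac{1 + \alpha}{1 - \mu} \cdot \frac{\lambda - \frac{\mu}{\lambda}}{\alpha + \lambda}$.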
With the cut-offs pinned down and the positive assortativeness property as shown in Proposition 3.1, the
matching function can be easily written down. The matching function is linear due to the nice property of
the uniform distributions.
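The closed-form cutoffs can be checked against the fourteen conditions numerically. The following is a minimal sketch that assumes the baseline parameters of the quadratic example (α = 3, λ = 0.95, µ = 0.6, b = 0.4); the function names and dictionary keys (`xp1` for the primed cutoff x̃′1, and so on) are hypothetical labels introduced only for this illustration.

```python
def cutoffs(alpha, lam, mu, b):
    """Closed-form cutoffs implied by Equations 3.3 to 3.16."""
    x = (alpha + mu / lam) / (alpha + lam)      # x-tilde, from (3.14)
    y1 = (1 + alpha) / (1 - mu) * (1 - x)       # from (3.15)
    y3 = b / mu                                 # from (3.5)
    x1 = x - (1 - y1)                           # from (3.6) and (3.11)
    x2 = (b + alpha * y3) / (1 + alpha)         # from (3.8) and (3.16)
    return {
        "x": x, "x1": x1, "x2": x2, "x3": b,
        "xp1": x1, "xp2": x2, "xp3": lam, "xp4": lam * x,
        "y1": y1, "y2": b, "y3": y3,
        "yp1": lam, "yp2": mu / lam, "yp3": mu * y1,
    }


def residuals(c, alpha, lam, mu, b):
    """Left-hand side minus right-hand side of each equality condition."""
    return {
        "3.3": mu * 1 - lam * c["yp2"],
        "3.4": mu * c["y1"] - c["yp3"],
        "3.5": mu * c["y3"] - b,
        "3.6": mu * c["x1"] - mu * c["xp1"],
        "3.7": mu * c["x3"] - mu * b,
        "3.8": mu * c["x2"] - mu * c["xp2"],
        "3.9": lam * 1 - c["xp3"],
        "3.10": lam * c["x"] - c["xp4"],
        "3.11": (c["x"] - c["x1"]) - (1 - c["y1"]),
        "3.12": c["x3"] - c["y2"],
        "3.13": (1 + b - c["xp3"]) - (1 + b - c["yp1"]),
        "3.14": (1 + b - c["xp4"]) - ((1 + b - c["yp2"]) - alpha * (1 - c["x"])),
        "3.15": (c["xp1"] - b) - (c["yp3"] - b + alpha * (1 - c["x"])),
        "3.16": (c["xp2"] - b) - alpha * (c["y3"] - c["x2"]),
    }


params = dict(alpha=3, lam=0.95, mu=0.6, b=0.4)   # assumed baseline values
c = cutoffs(**params)
res = residuals(c, **params)
max_abs_residual = max(abs(v) for v in res.values())
print(max_abs_residual < 1e-12)
```

If every residual is numerically zero, the closed forms are at least internally consistent with the conditions as stated; the check says nothing about whether the resulting cutoffs are ordered as drawn in Figure 3.4 for other parameter values.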
Now let’s proceed to the second part: proving that this matching is indeed stable. We have to show that,
for all women and men, the sum of their current utilities is not less than the surplus they can produce
together. Define $\Delta((x, X), (y, Y)) = u(x, X) + v(y, Y) - \Sigma((x, X), (y, Y))$; we have to show that $\Delta \geq 0$.
Let’s start with the rural women with $x \in [\tilde{x}, 1]$ who currently marry urban men with $y' \in [\tilde{y}'_2 = \frac{\mu}{\lambda}, \tilde{y}'_1]$:
• Urban men with $y' \in (\tilde{y}'_1, 1 + b]$ who currently marry urban women with $x' \in (\tilde{x}'_3 = \lambda, 1 + b]$:
\[ \Delta(x, y') = u_A(x) + v_N(y') - \lambda x y' \]
\[ \frac{\partial \Delta}{\partial x} = u_A'(x) - \lambda y' = \lambda y^*(x) - \lambda y' < 0 \]
\[ \frac{\partial \Delta}{\partial y'} = v_N'(y') - \lambda x = x^*(y') - \lambda x > 0 \]
Hence $\Delta$ achieves its infimum when $x = 1$ and $y' = \tilde{y}'_1$, and we know that
\[ \Delta(1, \tilde{y}'_1) = 0. \]
Therefore stability holds for this case.
• Urban men with $y' \in [\tilde{y}'_2, \tilde{y}'_1]$: stability holds trivially since the matching is positively assortative.
• Urban men with $y' \in [b, \tilde{y}'_2)$ who currently marry urban women with $x' \in [\tilde{x}'_2, \tilde{x}'_4 = \lambda \tilde{x})$:
\[ \Delta(x, y') = u_A(x) + v_N(y') - \lambda x y' \]
\[ \frac{\partial \Delta}{\partial x} = u_A'(x) - \lambda y' = \lambda y^*(x) - \lambda y' > 0 \]
\[ \frac{\partial \Delta}{\partial y'} = v_N'(y') - \lambda x = x^*(y') - \lambda x < 0 \]
Hence $\Delta$ achieves its infimum when $x = \tilde{x}$ and $y' = \tilde{y}'_2$, and we know that
\[ \Delta(\tilde{x}, \tilde{y}'_2) = 0. \]
• All rural men with $y \in [0, 1]$:
\[ \Delta(x, y) = u_A(x) + v_A(y) - \mu x y \]
\[ \frac{\partial \Delta}{\partial x} = u_A'(x) - \mu y = \lambda y^*(x) - \mu y \geq \lambda \cdot \frac{\mu}{\lambda} - \mu y \geq 0 \]
\[ \frac{\partial \Delta}{\partial y} = v_A'(y) - \mu x = \mu x^*(y) - \mu x \leq 0 \]
Hence $\Delta$ achieves its infimum when $x = \tilde{x}$ and $y = 1$, and we know that
\[ \Delta(\tilde{x}, 1) = 0. \]
In all, we prove that no rural woman with $x \in [\tilde{x}, 1]$ can form a blocking pair with any man. For other
rural women and all urban women, we can apply the same logic to prove stability.
References
Abramitzky, R., A. Delavande, and L. Vasconcelos (2011). Marrying up: the role of sex ratio in assortative
matching. American Economic Journal: Applied Economics 3(3), 124–57.
Ahn, S. Y. (2018). Matching across markets: Theory and evidence on cross-border marriage.
Akresh, R., D. Halim, and M. Kleemans (2018). Long-term and intergenerational effects of education:
Evidence from school construction in Indonesia. Technical report, National Bureau of Economic Research.
André, P. and Y. Dupraz (2018). Education and polygamy: Evidence from Cameroon. Technical report.
Angrist, J. (2002). How do sex ratios affect marriage and labor markets? Evidence from America’s second
generation. The Quarterly Journal of Economics 117 (3), 997–1038.
Arunachalam, R. and S. Naidu (2006). The price of fertility: marriage markets and family planning in
Bangladesh. University of California, Berkeley.
Ashraf, N., N. Bau, N. Nunn, and A. Voena (2016). Bride price and female education. Technical report,
National Bureau of Economic Research.
Barro, R. and X. Sala-i-Martin (1995). Economic Growth. New York: McGraw-Hill.
Becker, G. S. (1973). A theory of marriage: Part i. The Journal of Political Economy, 813–846.
Behrman, J. R. and N. Birdsall (1983). The quality of schooling: quantity alone is misleading. The American
Economic Review 73(5), 928–946.
Bharati, T., S. Chin, and D. Jung (2018). Recovery from an early life shock through improved access to
schools: Evidence from Indonesia.
Bhaskar, V. (2015). The demographic transition and the position of women: A marriage market perspective.
Bobonis, G. J. and F. Finan (2009). Neighborhood peer effects in secondary school enrollment decisions.
The Review of Economics and Statistics 91(4), 695–716.
Breierova, L. and E. Duflo (2004). The impact of education on fertility and child mortality: Do fathers really
matter less than mothers? Technical report, National bureau of economic research.
Carmichael, S. (2011). Marriage and power: Age at first marriage and spousal age gap in lesser developed
countries. The History of the Family 16(4), 416–436.
Castro, J. F. and B. Esposito (2018). The effect of bonuses on teacher behavior: A story with spillovers.
Technical report, Discussion Paper 104, Peruvian Economic Association, August.
Chan, K. W. and L. Zhang (1999). The hukou system and rural-urban migration in China: Processes and
changes. The China Quarterly 160, 818–855.
Chari, A., R. Heath, A. Maertens, and F. Fatima (2017). The causal effect of maternal age at marriage on
child wellbeing: Evidence from India. Journal of Development Economics 127, 42–55.
Charles, K. K. and M. C. Luoh (2010). Male incarceration, the marriage market, and female outcomes. The
Review of Economics and Statistics 92(3), 614–627.
110
Chiappori, P.-A., R. J. McCann, and L. P. Nesheim (2010). Hedonic price equilibria, stable matching, and
optimal transport: equivalence, topology, and uniqueness. Economic Theory 42(2), 317–354.
Chiappori, P.-A., S. Oreffice, and C. Quintana-Domeque (2017). Bidimensional matching with heteroge-
neous preferences: education and smoking in the marriage market. Journal of the European Economic
Association 16(1), 161–198.
Chiappori, P.-A., B. Salanié, and Y. Weiss (2017). Partner choice, investment in children, and the marital
college premium. American Economic Review 107 (8), 2109–67.
Choo, E. and A. Siow (2006). Who marries whom and why. Journal of political Economy 114(1), 175–201.
Decker, C., E. H. Lieb, R. J. McCann, and B. K. Stephens (2013). Unique equilibria and substitution effects
in a stochastic model of the marriage market. Journal of Economic Theory 148(2), 778–792.
Dessy, S. and H. Djebbari (2010). High-powered careers and marriage: can women have it all? The BE
Journal of Economic Analysis & Policy 10(1).
Dominguez, C. (2014). Aggregate effects on the marriage market of a big increase in educational attainment.
Dissertation, Yale University.
Duflo, E. (2001). Schooling and labor market consequences of school construction in Indonesia: Evidence
from an unusual policy experiment. American Economic Review 91(4), 795–813.
Dupuy, A., A. Galichon, and L. Zhao (2014). Migration in China: to work or to wed? Technical report,
working paper.
Edlund, L. (2005). Sex and the city. The Scandinavian Journal of Economics 107 (1), 25–44.
Edlund, L. (2006). Marriage: past, present, future? CESifo Economic Studies 52(4), 621–639.
Fan, J. (2019). Internal geography, labor mobility, and the distributional impacts of trade. American
Economic Journal: Macroeconomics. Forthcoming.
Fergusson, D. M. and L. J. Woodward (1999). Maternal age and educational and psychosocial outcomes in
early adulthood. The Journal of Child Psychology and Psychiatry and Allied Disciplines 40(3), 479–489.
Frederick, W. H. and R. L. Worden (1993). Indonesia: A country study, Volume 550. Washington, DC:
Federal Research Division, Library of Congress.
Galichon, A. and B. Salanié (2015). Cupid’s invisible hand: Social surplus and identification in matching
models.
Galichon, A. and B. Salanié (2017). The econometrics and some properties of separable matching models.
American Economic Review 107 (5), 251–55.
Gautier, P. A., M. Svarer, and C. N. Teulings (2010). Marriage and the city: Search frictions and sorting of
singles. Journal of Urban Economics 67 (2), 206–218.
Glewwe, P., E. A. Hanushek, S. Humpage, and R. Ravina (2013). School resources and educational outcomes
in developing countries: A review of the literature from 1990 to 2010. Education Policy in Developing
Countries 4(14,972), 13.
Glewwe, P. and M. Kremer (2006). Schools, teachers, and education outcomes in developing countries.
Handbook of the Economics of Education 2, 945–1017.
Glewwe, P. and K. Muralidharan (2016). Improving education outcomes in developing countries: Evidence,
knowledge gaps, and policy implications. In Handbook of the Economics of Education, Volume 5, pp.
653–743. Elsevier.
Greenwood, J., N. Guner, G. Kocharkov, and C. Santos (2014). Marry your like: Assortative mating and
income inequality. American Economic Review 104(5), 348–53.
111
Hady, H. (1989). Capter iv : Regional development review of policies and achievements 1 hariri hady.
Indonesia, two decades of economic development, 142–168.
Han, L., T. Li, and Y. Zhao (2015). How status inheritance rules affect marital sorting: Theory and evidence
from urban China. The Economic Journal 125(589), 1850–1887.
Hanushek, E. A. (2011). The economic value of higher teacher quality. Economics of Education review 30(3),
466–479.
Hener, T. and T. Wilson (2018). Marital age gaps and educational homogamy-evidence from a compulsory
schooling reform in the UK. Technical report, Ifo Working Paper.
Iyigun, M. and J. Lafortune (2016). Why wait? a century of education, marriage timing and gender roles.
Jalal, F., M. Samani, M. C. Chang, R. Stevenson, A. B. Ragatz, and S. D. Negara (2009). Teacher cer-
tification in Indonesia: A strategy for teacher quality improvement. Departemen Pendidikan Nasional,
Republik Indonesia.
Jensen, R. and R. Thornton (2003). Early female marriage in the developing world. Gender & Develop-
ment 11(2), 9–19.
Jones, G. W. (1994). Marriage and divorce in Islamic South-East Asia.
Jones, G. W. and P. Hagul (2001). Schooling in Indonesia: crisis-related and longer-term issues. Bulletin of
Indonesian Economic Studies 37 (2), 207–231.
Jürges, H., S. Reinhold, and M. Salm (2011). Does schooling affect health behavior? evidence from the
educational expansion in Western Germany. Economics of Education Review 30(5), 862–872.
Kirbas, A., H. C. Gulerman, and K. Daglar (2016). Pregnancy in adolescence: is it an obstetrical risk?
Journal of pediatric and adolescent gynecology 29(4), 367–371.
Low, C. (2017). A �reproductive capital� model of marriage market matching. Manuscript, Wharton School
of Business.
Malamud, O., A. Mitrut, and C. Pop-Eleches (2018). The effect of education on mortality and health: Evi-
dence from a schooling expansion in Romania. Technical report, National Bureau of Economic Research.
Malhotra, A. (1991). Gender and changing generational relations: Spouse choice in Indonesia. Demogra-
phy 28(4), 549–570.
Mankiw, N. G., D. Romer, and D. N. Weil (1992). A contribution to the empirics of economic growth. The
quarterly journal of economics 107 (2), 407–437.
Martinez-Bravo, M. (2017). The local political economy effects of school construction in indonesia. American
Economic Journal: Applied Economics 9(2), 256–89.
McEwan, P. J. (2015). Improving learning in primary schools of developing countries: A meta-analysis of
randomized experiments. Review of Educational Research 85(3), 353–394.
Miguel, E. and M. Kremer (2004). Worms: identifying impacts on education and health in the presence of
treatment externalities. Econometrica 72(1), 159–217.
Ozier, O. (2018). The impact of secondary schooling in Kenya: A regression discontinuity analysis. Journal
of human resources 53(1), 157–188.
Rivkin, S. G., E. A. Hanushek, and J. F. Kain (2005). Teachers, schools, and academic achievement.
Econometrica 73(2), 417–458.
Rosales-Rueda, M., B. Mazumder, and M. Triyana (2019, May). Intergenerational human capital spillovers:
Indonesia�s school construction and its effects on the next generation. AEA Papers and Proceedings 109.
112
Sekhri, S. and S. Debnath (2014). Intergenerational consequences of early age marriages of girls: Effect on
children�s human capital. The Journal of Development Studies 50(12), 1670–1686.
Shapley, L. S. and M. Shubik (1971). The assignment game i: The core. International Journal of game
theory 1(1), 111–130.
Siow, A. (1998). Differntial fecundity, markets, and gender roles. Journal of Political Economy 106(2),
334–354.
Snodgrass, D. (1984, October). Development Program Implementation Studies No.5: Inpres Sekolah Dasar.
Cambridge, MA.: Harvard Institute for International Development.
Snodgrass, D., L. Hutagalung, and S. Dasar (1980). Inpres Sekolah Dasar: An Analytical Study. Economics
and Human Resources Research Center, Faculty of Economics,Padjadjaran University.
Tan, J. P. and A. Mingat (1992). Education in Asia : a comparative study of cost and financing (English).
World Bank regional and sectoral studies. Washington, DC : The WorldBank.
Weiss, Y., J. Yi, and J. Zhang (2013). Hypergamy, cross-boundary marriages, and family behavior.
World Bank (1989). Indonesia - Basic education study (English). Washington, DC: World Bank.
World Bank (2016, January). Indonesia Teacher Certification and Beyond. World Bank, Jakarta.
World Bank (2018). World Development Report, LEARNING to Realize Education’s Promise.
Young, A. (1994). Lessons from the East Asian NICs: a contrarian view. European economic review 38(3-4),
964–973.
Young, A. (1995). The tyranny of numbers: confronting the statistical realities of the East Asian growth
experience. The Quarterly Journal of Economics 110(3), 641–680.
Zhang, H. (2018, October). Human Capital Investments, Differential Fecundity, and the Marriage Market.
Working Papers 2018-7, Michigan State University, Department of Economics.
113
