Investigate Gender Equality and Empowerment of Women Status in Bangladesh
2, 2013
Nahida Sultana received her MSc in Computer Science from North South
University in 2011. She is currently investigating different spatial data
mining techniques on spatial and statistical data on Dhaka, especially on the
women literacy rate and women employment in the service and agriculture
sectors of Bangladesh. She is also interested in regression models, both
classical and spatial.
1 Introduction
Geospatial data is the data or information that identifies the geographic location of
features and boundaries on earth. Geospatial data mining basically performs exploratory
spatial data analysis (ESDA) on large computerised data repositories that have
geographic metadata. At present ESDA is more important than ever before due to the
remarkable processing power of modern computers and the availability of enormous
amounts of geospatial data. However, applications of this field are scarce and almost
unheard of in Bangladesh. In this analysis we apply ESDA techniques to Bangladeshi
geospatial data, especially data related to women literacy and empowerment.
The objective of this paper is to check whether the geographic distributions of women
education indicators, gender equality and the GDP growth rate, which are related to
women literacy, are consistent with Waldo Tobler’s (1970) first law of geography:
“Everything is related to everything else, but near things are more related than distant
things”. Our research aims to find the nature of the spatial patterns of these variables.
We are interested in whether they produce any interesting outcome under spatial
autocorrelation and bivariate analysis. We are also interested in finding probable ‘hot
spots’ that are surrounded by high women literacy rates and ‘cold spots’ that are
surrounded by low women literacy rates. Moreover, we investigate whether a classical
or a spatial regression model is more appropriate for the data we have.
This paper highlights one of the significant parts of the millennium development goals
(MDGs) of Bangladesh. We focus on Goal 3, that is, the promotion of gender
equality and empowerment of women. We analyse the women employment condition in
different sectors of the country. Women in Bangladesh are becoming increasingly visible
in economic spheres. In practically all spheres of development, women are
contributing to the growth of the economy. Women’s increasing involvement in both
agricultural work and non-farm activities has provided them with increased
opportunities for wage work and a certain economic independence.
In this paper we introduce global autocorrelation. We use the univariate Moran I and
apply various bivariate ESDA techniques. Using those techniques we illustrate the
women literacy rate within the 64 zillas of Bangladesh. We also look at the local
indicators of spatial association (LISA), in particular univariate LISA. We compute the
Moran I of the primary, secondary and tertiary education of men and women. The
employment rates of women and men are also presented in the analysis. Finally, we use
regression analysis on the total women literacy rate and the primary, secondary and
adult literacy rates of women at the 64 zillas. We fit the data with traditional and
spatial regression and compare the results.
168 R.M. Rahman and N. Sultana
The paper is organised as follows: Section 2 describes related work in the area of
geospatial data mining, studies on gender inequality and its causes, effects and
consequences. Section 3 presents the software package used in this research. Section 4
describes the data, the sources and the process of data acquisition. Section 5 outlines the
methodologies used in this research. A detailed analysis of our research findings is
presented in Section 6. Finally, Section 7 concludes and gives directions for future research.
2 Related work
There are numerous works on gender inequality and its causes, effects and consequences.
Geospatial data mining and ESDA, however, are relatively new fields. A review of
some selected studies follows.
Chaudhry and Rahman (2009) explore the impact of gender inequality in education on
rural poverty in Pakistan using logit regression analysis on primary datasets. It is
concluded that gender inequality in education has an adverse impact on rural poverty.
The empirical findings suggest that the female-male enrolment ratio, female-male
literacy ratio, female-male ratio of total years of schooling, female-male ratio of earners
and education of the household head have a significant negative impact on rural poverty.
Ahmad et al. (2004) explore the relationship between different levels of education
and poverty through an analysis of household-level data from 60 villages in Bangladesh.
First the study depicts the overall trend in school enrolment at primary and secondary
level over 1988–2000, and confirms the inequality that exists in the access to education
at post-primary level. This is followed by a presentation of income and occupation data
that show a strong positive correlation with the level of education. In the second part, an
income function analysis is carried out to assess the impact of education along with
other determinants. The third part analyses the effects of education on the child/woman
ratio, and on the secondary school participation rate of male and female children.
Ahmad (2001) analyses the private benefits and costs of primary versus secondary
education in rural Bangladesh on the basis of household-level data. The study indicates
that while social benefits of primary education are high in Bangladesh, private benefits
are higher for secondary-level education than for primary level. On the other hand,
private costs are lower for primary education than for secondary education. Poor
households in Bangladesh cannot afford to keep their children in school until they
complete the secondary level because of high costs, both direct costs and opportunity
costs. The study argues that inequality in the access to secondary education is the main
cause of persistent poverty in Bangladesh. The recent improvement of female
participation rates at both primary and secondary levels confirms the favourable impact
of the targeted approach. Policies should be directed to both boys and girls from poor
households.
Paraguas et al. (2005) studied the spatial relationship between the proportion of the
population with a standard of living below the poverty line and soil condition. de
Dominicis et al. (2007) compared traditional measurements of geographic concentration
of economic activities with spatial data analysis techniques. They also presented a
comprehensive analysis of the manufacturing industry and a set of hypotheses.
Türkcan et al. (2009) present applications of spatial data analysis alongside their
research findings. The authors elaborate on the potential application of spatial data
analysis techniques for policy making and discuss some interesting related projects
before delving into their research. This research is quite similar to that of de Dominicis
et al. (2007). It analyses the spatial properties of Turkey’s manufacturing industry. At
the end, the paper presents examples of the successful application of spatial analysis for
generating clustering policies.
Geospatial data mining techniques to investigate gender equality 169
Oliveau (2005) used different techniques to find the spatial patterns of India’s
contemporary demography. District level data on population density, urbanisation level,
fertility rate, child mortality rate and gender inequality were derived from India’s 2001
census. The data was used to identify the spatial patterns and spatial correlation that
described the demographic trend in India.
Uthman (2008) examined the impact of state-level access to basic environmental
services and neighbourhood deprivation on the under-five mortality rate in Nigeria. The
author concluded that the spatial distribution of under-five mortality rates was
non-random and clustered, with a Moran’s I = 0.654 (p = .001). Spatial clustering
suggested that the North-East and North-West of Nigeria could be grouped as an
under-five mortality ‘hot-spot’, and the South-West, South-South and South-East of
Nigeria as an under-five mortality ‘cold-spot’. The results outlined the consistent finding
that access to safe water, proper sanitation and low-pollution cooking fuel are important
factors that could increase the chances of child survival.
3 Data acquisition
Collecting geospatial data is not easy. It is more difficult in the context of a developing
country like Bangladesh. On top of that, our research is about the geospatial analysis
of statistical data, not geographic or topographic data. So the data we collected did not
have any geographical metadata for geographic information systems, such as longitudes
and latitudes. We had to georeference the data ourselves by associating elements from
the dataset with geographic polygons in a zilla-level digital map of Bangladesh.
First we had to prepare this digital map: we used Python 2.7 software to create the
shapefile, together with an Excel file. We got our statistical data from the
Bangladesh Bureau of Statistics (BBS), which is the Bangladesh government’s official
organisation for the collection and dissemination of statistical data. We also collected
data from the Bangladesh Bureau of Educational Information and Statistics (BANBEIS)
and the websites of UNDP and UNICEF. Literacy rates are collected at ten-year
intervals as part of the National Population Census. We also collected data from the
gender statistics book (GSB, 2008).
Our research deals with spatial autocorrelation and spatial regression. For preliminary
visualisation we used a choropleth map drawn using the equal interval classification
scheme (Choropleth, 2010) and a Moran scatter plot (Anselin, 1993). We used the global
Moran statistic to get a sense of the global spatial autocorrelation and LISA to find spatial
clusters. To visualise LISA we used significance maps and cluster maps. We also fit the
data to an appropriate spatial regression model.
$$I = \frac{N}{\sum_i \sum_j W_{ij}} \times \frac{\sum_i \sum_j W_{ij} (X_i - \bar{X})(X_j - \bar{X})}{\sum_i (X_i - \bar{X})^2}$$
Here, I is the Moran’s I index. The sign and magnitude of this index give the nature of
the correlation among the N points, which are indexed by i and j. $X_i$ is the value at
the point currently under consideration and $\bar{X}$ is the mean of all the points in
the dataset. $W_{ij}$ is an individual element of the weights matrix: $W_{ij}$ is 1 if
$X_i$ is a neighbour of $X_j$, otherwise it is 0.
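As a minimal sketch of the index defined above, the following Python function evaluates Moran's I directly from the double sum, assuming a binary contiguity matrix stored as a list of lists. The function name `morans_i` and the four-region example are illustrative assumptions and are not taken from the study.

```python
def morans_i(values, weights):
    """Global Moran's I for values X_i and a weights matrix W (list of lists)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]            # X_i minus the overall mean
    w_sum = sum(sum(row) for row in weights)    # sum over i and j of W_ij
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)             # sum of squared deviations
    return (n / w_sum) * (cross / denom)


# Hypothetical data: four regions in a row, each a neighbour of the next one.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
rates = [10.0, 20.0, 30.0, 40.0]
print(morans_i(rates, W))
```

Because neighbouring regions in this toy example have similar values, the index comes out positive, which is the situation described as positive spatial autocorrelation.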
$$I = \frac{x^T W x}{x^T x}$$
From the above equation it is clear that I is the slope of the regression of Wx on x,
where x holds the mean-centred values. The plot of the constructed lag variable Wx
versus x is called the Moran scatter plot and it provides a nice interpretation of
Moran’s I. Here Wx is the standardised weighted average of the neighbouring values of x.
The Moran Scatter Plot is basically a two dimensional grid which displays points
representing all the geospatial entities. It calculates the Moran’s I index from all the
points and displays a straight line whose gradient is the Moran’s I. The values on the X
axis are standardised so that their mean is zero and their variance is one. Along the Y axis
are the spatial lags of the chosen variable. This is calculated using the given contiguity
matrix. Spatial lag shows how much the neighbours of a point of interest are affecting
that point.
In Anselin (1993), this was developed and shown to be a good ESDA tool for
studying how the local spatial behaviour of the variable builds up the global Moran’s I
statistic. The scatter plot shows how spatially dispersed the data is. It also gives a hint as
to where the potential spatial clusters and outliers lie. The scatter plot is divided into four
quadrants, namely high-high, low-low, high-low and low-high. Points that lie in the
low-low and high-high quadrants indicate that positive correlation exists between the
data in those regions. Similarly, points in the high-low and low-high quadrants indicate
that negative correlation exists between the data in those regions.
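The construction of the Moran scatter plot can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the contiguity matrix is row-standardised so that the lag is the average of the neighbours, values are standardised with the population standard deviation, and the helper names (`row_standardise`, `moran_scatter`, `quadrant`) and the example data are hypothetical.

```python
import math


def row_standardise(weights):
    """Scale each row of a contiguity matrix so that it sums to one."""
    out = []
    for row in weights:
        total = sum(row)
        out.append([w / total if total else 0.0 for w in row])
    return out


def moran_scatter(values, weights):
    """Return standardised values z, their spatial lags Wz and the line slope."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    z = [(v - mean) / sd for v in values]
    w = row_standardise(weights)
    lag = [sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]
    # z has mean zero, so the least-squares slope of lag on z reduces to this.
    slope = sum(zi * li for zi, li in zip(z, lag)) / sum(zi * zi for zi in z)
    return z, lag, slope


def quadrant(z_i, lag_i):
    """Label a point by its own value and by the average of its neighbours."""
    own = "high" if z_i >= 0 else "low"
    neighbours = "high" if lag_i >= 0 else "low"
    return own + "-" + neighbours


# Hypothetical data: four regions in a row, each a neighbour of the next one.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
rates = [10.0, 20.0, 30.0, 40.0]
z, lag, slope = moran_scatter(rates, W)
labels = [quadrant(zi, li) for zi, li in zip(z, lag)]
print(slope, labels)
```

In this toy example every point falls in the low-low or high-high quadrant, so the fitted line has a positive slope.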
$$I_i = \frac{x_i}{m_2} \sum_j W_{ij} x_j$$
where $m_2 = \sum_j x_j^2 / N$. The relation between the global Moran I and the local
Moran $I_i$ is defined by $I = \sum_i I_i / N$, where N is the number of data points.
A large positive $I_i$ indicates clustering of data values around the ith location that
deviate strongly from the average.
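A small sketch may help to see how the local statistics add up to the global one. It assumes that x denotes deviations from the mean and that the weights matrix is row-standardised, which is the setting in which the average of the local values equals the global index; the function name `local_morans` and the data are hypothetical.

```python
def local_morans(values, weights):
    """Local Moran I_i for every location; weights are assumed row-standardised."""
    n = len(values)
    mean = sum(values) / n
    x = [v - mean for v in values]          # deviations from the mean
    m2 = sum(xj * xj for xj in x) / n       # m2 = sum_j x_j^2 / N
    return [(x[i] / m2) * sum(weights[i][j] * x[j] for j in range(n))
            for i in range(n)]


# Hypothetical row-standardised weights for four regions in a row.
W = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
rates = [10.0, 20.0, 30.0, 40.0]
local = local_morans(rates, W)
global_i = sum(local) / len(local)          # I = sum_i I_i / N
print(local, global_i)
```

The two end regions contribute the largest local values here because both they and their single neighbour lie far from the mean on the same side.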
         X−       X+       Total
Y−       a        b        a+b
Y+       c        d        c+d
Total    a+c      b+d      n = a+b+c+d
We can understand the odds ratio (OR) by first noticing what the odds are in each row of
the table. The odds for row Y− are a/b. The odds for row Y+ are c/d. The OR is simply
the ratio of the two odds.
$$\mathrm{OR} = \frac{a/b}{c/d}$$
which can be simplified to
$$\mathrm{OR} = \frac{ad}{bc}$$
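The two forms of the odds ratio can be checked against each other with a short function. The sketch below assumes the cell layout of the 2×2 table above (first row a, b; second row c, d); the function name `odds_ratio` and the counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table with first row (a, b) and second row (c, d)."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical counts, only to compare the simplified and the unsimplified form.
a, b, c, d = 20, 10, 15, 30
simplified = odds_ratio(a, b, c, d)     # ad / bc
ratio_of_odds = (a / b) / (c / d)       # (a/b) / (c/d)
print(simplified, ratio_of_odds)
```

An odds ratio above one means the odds in the first row exceed those in the second row, and a value below one means the opposite.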
$$r = \frac{n \sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n \sum x^2 - \left(\sum x\right)^2\right]\left[n \sum y^2 - \left(\sum y\right)^2\right]}}$$
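For reference, the standard computational form of Pearson's correlation coefficient r can be written as a small Python function. This is a sketch that assumes the formula is the usual sums-based expression; the function name `pearson_r` and the sample values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient from raw sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator


# Hypothetical values: the second series is an exact multiple of the first.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```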
5 Result analysis
Figure 1 A percentile choropleth map showing six percentile ranges of the women literacy
rate of the 64 zillas of Bangladesh (see online version for colours)
In the percentile map we can observe that in remote hill tract areas, such as Bandarban,
the women literacy rate is low, while the Dhaka, Gajipur, Jhalokathi, Barisal, Pirozpur
and Bagerhut districts show high women literacy rates.
From Table 1 we see the Moran’s I values of the primary, secondary and adult women
tertiary literacy rates of 2006 and 2009. Universal access to basic education and the
achievement of primary education by the world’s children is one of the most important
goals of the MDGs. The primary school level group includes persons who have
completed up to five years of schooling. Persons attending 1st and 6th year of schooling
have also been included in this group. The secondary school level group includes
persons who have completed six to nine years of schooling. Persons attending the 10th
year of schooling have also been included in this group. Table 1 lists the Moran I values
for the primary, secondary and 15 to 24 years aged women literacy rates. The literacy
rate of those aged 15 to 24 is the percentage of persons aged 15 to 24 who can both read
and write, with understanding, a short simple statement on their everyday life. The
indicator has a special significance in reflecting the recent outcomes of the basic
education process.
Table 1 Calculated Moran’s I values for woman literacy rate
From Table 1 we see that all the Moran’s I values of the primary, secondary and 15 to 24
years aged women literacy rates in 2009 are higher than the corresponding values in 2006.
In Figure 2, we plot the standardised primary girls literacy rates along X and the
standardised average literacy rates of the neighbours along Y. The regression line with
Moran’s I as its slope fits reasonably well. So using spatial autocorrelation we have
found that the women literacy rates in both years had a positive global spatial
autocorrelation. This is also true for Figures 3 and 4.
Figure 2 Moran scatter plot of the primary girls literacy rate of the 64 zillas in Bangladesh
in (a) 2009 and (b) 2006 (see online version for colours)
Figure 3 Moran scatter plot of the secondary girls literacy rate of the 64 zillas in Bangladesh
in (a) 2009 and (b) 2006 (see online version for colours)
Figure 4 Moran scatter plot of the 15 to 24 years aged women literacy rate of the 64 zillas in
Bangladesh in (a) 2006 and (b) 2009 (see online version for colours)
Variable                                                        Moran I value
                                                                2001      1991
School attendance girls (5 to 24 years age) vs. literacy rate   0.3721    0.4470
School attendance boys (5 to 24 years age) vs. literacy rate    0.3697    0.4556
7+ and above aged girls vs. literacy rate                       0.4349    0.4536
7+ and above aged boys vs. literacy rate                        0.4028    0.4552
In Table 2 we see that in 2001 the bivariate Moran I for school attendance of girls vs.
literacy rate is higher than that for school attendance of boys vs. literacy rate. Also in
2001, the Moran I value of 7+ and above aged girls versus literacy rate is higher than that
of 7+ and above aged boys vs. literacy rate.
For example, the multivariate Moran scatter plot relates the literacy rate of 2001 at
each location (LR2001, horizontal axis) to the average school attendance of girls in
2001 at the neighbouring locations (W_WLR2001, vertical axis). The observed
Moran’s I value of 0.3721 is highly significant and not compatible with a notion of
spatial randomness. So zillas that were surrounded by zillas with high school attendance
of girls in 2001 tended to show a higher overall literacy rate in 2001.
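The bivariate statistic can be sketched as the slope obtained when the spatial lag of one standardised variable is regressed on another standardised variable. The code below is an illustration only: it assumes row-standardised weights and population standard deviations, and the function names and the toy data are hypothetical rather than the LR2001 and school attendance series used in the study.

```python
import math


def standardise(values):
    """Centre on the mean and scale by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def bivariate_morans_i(x, y, weights):
    """Slope of the spatial lag of standardised y on standardised x."""
    n = len(x)
    zx, zy = standardise(x), standardise(y)
    lag_y = [sum(weights[i][j] * zy[j] for j in range(n)) for i in range(n)]
    # zx has unit variance, so dividing by n gives the regression slope.
    return sum(zx[i] * lag_y[i] for i in range(n)) / n


# Hypothetical row-standardised weights for four regions in a row.
W = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
literacy = [10.0, 20.0, 30.0, 40.0]      # plays the role of the horizontal axis
attendance = [1.0, 2.0, 3.0, 4.0]        # its lag plays the role of the vertical axis
print(bivariate_morans_i(literacy, attendance, W))
```

When both arguments are the same variable the function reduces to the univariate Moran's I under the same weights.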
Figure 5 Univariate LISA cluster map of zilla level woman literacy rate of 2001 (see online
version for colours)
Figure 5 shows that there are two types of significant clusters: low-low and
high-high. The low-low cluster of the women literacy rate of 2001 covers Jamalpur,
Kurigram, Rangpur, Lalmonirhut, Mymensingh, Netrokona, Chitagong, Bandarban,
Rangamati and Cox’s Bazar. This might be due to the fact that a large portion of tribal
people live in those zillas, and tribal people are reluctant to avail themselves of the
education facilities. For the women literacy rate of 2001 the high-high cluster covers
Narayangonj, Comilla, Gopalgonj, Norail, Bagerhut, Pirojpur, Barisal, Jhalokathi,
Barguna and PotuaKhali. Due to the scarcity of farmland in those areas, people need to
survive on service and industry oriented jobs. Those jobs generally require at least ten
years of education. This forces the people of those areas to be literate.
Figure 6 presents the univariate LISA Moran scatter plot of the zilla level women literacy
rate of 2001, where the Moran I value is 0.6055. The figure has four quadrants:
low-low, high-low, high-high and low-high. Most of the districts fall in only two
quadrants, high-high and low-low.
Figure 6 Univariate LISA Moran scatter plot of zilla level woman literacy rate of 2001
(see online version for colours)
                Female    Male
Literate        66.2      65.6
Non-literate    33.8      34.4
Table 4 The employment data in the agricultural and non-agricultural sector from
1999 to 2006
The odds ratio of female to male literacy (FTMR) is 1.02, which indicates that the
female literacy rate is slightly higher than the male literacy rate.
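As a worked check, and assuming the literacy table above is read as a 2×2 table with a = 66.2, b = 65.6, c = 33.8 and d = 34.4 (an assumption about how the cells map onto the odds ratio formula), the arithmetic is:

$$\mathrm{OR} = \frac{ad}{bc} = \frac{66.2 \times 34.4}{65.6 \times 33.8} = \frac{2277.28}{2217.28} \approx 1.027$$

The same value follows from the ratio of the female odds 66.2/33.8 to the male odds 65.6/34.4. Under this reading the computed value is about 1.027, which agrees with the reported 1.02 if the latter is truncated to two decimals.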
The latest Labour Force Survey 2008 shows that the total labour force participation rate
for females is around 29.2%. The male-female ratio of non-agricultural employment
was 77:23 in 1995–1996 and went up to 80:20 in 2005–2006, indicating a relative
decline of the females’ share in non-agricultural employment. Creation of opportunities
for the women labour force remains the major bottleneck for wage employment of
women in the non-agricultural sector, with the exception of the garment sector. Table 4
shows the employment in the agricultural and non-agricultural sectors from 1999 to 2006.
Our study also illustrates the man versus woman employment ratio (MWER). We
calculate the odds ratio to find this employment ratio. In 2002–2003 the odds ratio is
3.13 and in 2005–2006 it is 6.28, as presented in Table 5.
Table 5 The man versus woman employment ratio (MWER)
        2002–2003    2005–2006
MWER    3.13         6.28
An odds ratio greater than one shows that more men than women are employed, and
the ratio is not decreasing as the years pass, which hinders the MDG of Bangladesh.
From Table 7, we see that the adult women education rate is significant because its
prob < 0.04. The other variables and the constant are not significant as their prob > 0.04.
The R-squared for the linear regression is 0.770252, which is close to 1 and demonstrates
the goodness of fit between the linear model and the data we have. As the adult women
education is significant, we regress the women literacy rate on the adult women
education, and the regression equation is
$$\mathrm{ALFE01} = \mathrm{CONST} + a_1 \, \mathrm{FEADU01}$$
The regression analysis is in Table 8.
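A one-regressor least-squares fit of this form, together with its R-squared, can be sketched as follows. The function name `simple_ols` and the four data points are hypothetical and only illustrate the calculation; they are not the zilla-level data of the study.

```python
def simple_ols(x, y):
    """Fit y = const + a1 * x by least squares and return (const, a1, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    const = mean_y - a1 * mean_x
    ss_res = sum((yi - (const + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return const, a1, 1.0 - ss_res / ss_tot


# Hypothetical data standing in for the regressor and the dependent variable.
adult_rate = [1.0, 2.0, 3.0, 4.0]
women_rate = [2.0, 4.0, 5.0, 8.0]
const, a1, r_squared = simple_ols(adult_rate, women_rate)
print(const, a1, r_squared)
```

An R-squared near 1 means that most of the variation of the dependent variable around its mean is reproduced by the fitted line.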
Before including the spatial characteristics into consideration we first test whether any
spatial autocorrelation exists in data. A total of five test statistics are reported in Table 9
to test for the spatial dependence.
The first statistic is Moran’s I, which is reported here as 0.178132 and is significant.
The significance is tested by the value of the probability (PROB). If the value is <0.04
then we consider it to be significant (Moran, 1950). As the Moran’s I statistic is
significant, it suggests spatial dependence. However, this test, reported in Table 9,
cannot say whether a spatial lag model or a spatial error model would fit the data best.
Four Lagrange multiplier test statistics are used for this purpose. The following
workflow (OpenGeoda, 2011) is used to decide between the two alternatives, i.e.,
spatial lag model or spatial error model.
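One common reading of such a decision workflow can be sketched as a small function over the four Lagrange multiplier p-values. This is an assumption about the rule, not a reproduction of the workflow figure: the function name, the tie-breaking on the robust tests, and every p-value in the example except the LM lag probability of 0.0003288 reported in this paper are hypothetical.

```python
def choose_spatial_model(p_lm_lag, p_lm_error, p_robust_lag, p_robust_error,
                         alpha=0.05):
    """Pick OLS, spatial lag or spatial error from the LM test p-values."""
    lag_significant = p_lm_lag < alpha
    error_significant = p_lm_error < alpha
    if not lag_significant and not error_significant:
        return "OLS"                  # no evidence of spatial dependence
    if lag_significant and not error_significant:
        return "spatial lag"
    if error_significant and not lag_significant:
        return "spatial error"
    # Both plain tests reject: compare the robust versions instead.
    if p_robust_lag <= p_robust_error:
        return "spatial lag"
    return "spatial error"


# Only the first p-value comes from the paper; the other three are made up.
decision = choose_spatial_model(0.0003288, 0.30, 0.01, 0.60, alpha=0.04)
print(decision)
```

The threshold is passed as a parameter so that the 0.04 level used in this paper can be supplied explicitly.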
With respect to the above workflow we can see from Table 9 that the probability value of
LM Lag is <0.04, so it is the significant one among the statistics. As the probability of
the LM lag statistic is 0.0003288, we fit the data with this model. The spatial lag model
is presented below:
$$\mathrm{ALFE01} = \mathrm{CONST} + a_1 \, \mathrm{FEADU01} + \beta \, \mathrm{W\_ALFE01}$$
Here, W_ALFE01 is the spatially lagged dependent variable for the weight matrix W,
ALFE01 is the women literacy rate in 2001, FEADU01 is the adult literacy rate of 2001,
CONST is a constant term, and a1 and β are the parameters or coefficients. Running the
spatial lag model we find the following values of the coefficients and their
corresponding significances.
From this regression analysis we get an R-squared of 0.816588, which is close to 1 and
indicates a good fit. In Table 10, the probabilities of W_ALFE01 and FEADU01 are
<0.04, so both are significant.
From Table 10, we see all the coefficients are significant including the autoregressive
coefficient with the value, β = 0.3650573.
Table 10 Summary result of linear regression of adult literacy rate versus women literacy rate
Table 11 presents a relative performance measure between the classical and the spatial
lag model. The R-squared value increases relative to the classical model, indicating a
better fit of the linear spatial model than of the classical model. Besides this, there is an
increase of the log likelihood in the spatial model from –175.44 to –168.583. Even after
penalising the improved fit for the added variable (the spatially lagged dependent
variable), the AIC and SC decrease relative to OLS.
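How the information criteria trade fit against model size can be illustrated with the two log likelihoods quoted above. The sketch uses the usual definitions AIC = −2 logL + 2k and SC = −2 logL + k ln(n); the parameter counts k (two for the classical model, three for the spatial lag model) and n = 64 zillas are assumptions, so the resulting numbers need not match those reported in Table 11.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * k


def schwarz(log_likelihood, k, n):
    """Schwarz criterion (SC) for k parameters and n observations."""
    return -2.0 * log_likelihood + k * math.log(n)


N_ZILLAS = 64          # assumed number of observations
K_OLS, K_LAG = 2, 3    # assumed parameter counts, not taken from the paper

ols_scores = (aic(-175.44, K_OLS), schwarz(-175.44, K_OLS, N_ZILLAS))
lag_scores = (aic(-168.583, K_LAG), schwarz(-168.583, K_LAG, N_ZILLAS))
print(ols_scores, lag_scores)
```

Under these assumptions both criteria are lower for the spatial lag model, because the gain in log likelihood outweighs the penalty for the extra coefficient.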
Table 12 presents the true women literacy rate of 2001 (ALFE01), the predicted
literacy rate ($\widehat{\mathrm{ALFE01}}$), the prediction error and the residuals for
the first ten observations.
The residuals are the estimates of the model error term, i.e.,
$(1 - \beta W)\,\mathrm{ALFE01} - (\mathrm{CONST} + a_1\,\mathrm{FEADU01})$, whereas
the prediction error is $\mathrm{ALFE01} - \widehat{\mathrm{ALFE01}}$.
We also calculate the Moran’s I test statistic for the residuals.
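The difference between the residual and the prediction error can be made concrete with a short sketch. It assumes that the predicted value is the reduced-form solution of ŷ = βWŷ + CONST + a1·x, obtained here by fixed-point iteration (which converges for row-standardised weights and |β| < 1); this definition of the predicted value is an assumption about how the software computes it. The function name, the weights, the data and the coefficients CONST and a1 are hypothetical; only β = 0.3650573 is taken from the text.

```python
def spatial_lag_diagnostics(y, x, w, const, a1, beta, iterations=200):
    """Return predicted values, prediction errors and residuals of a lag model."""
    n = len(y)
    xb = [const + a1 * xi for xi in x]                  # CONST + a1 * x
    wy = [sum(w[i][j] * y[j] for j in range(n)) for i in range(n)]
    # Residual: (1 - beta W) y - (CONST + a1 x), using the observed lag W y.
    residuals = [y[i] - beta * wy[i] - xb[i] for i in range(n)]
    # Predicted value: iterate y_hat = beta W y_hat + xb until it settles.
    y_hat = xb[:]
    for _ in range(iterations):
        y_hat = [beta * sum(w[i][j] * y_hat[j] for j in range(n)) + xb[i]
                 for i in range(n)]
    prediction_errors = [y[i] - y_hat[i] for i in range(n)]
    return y_hat, prediction_errors, residuals


# Hypothetical row-standardised weights and data for three regions.
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
adult = [30.0, 40.0, 50.0]
women = [45.0, 52.0, 60.0]
y_hat, errors, residuals = spatial_lag_diagnostics(
    women, adult, W, const=5.0, a1=0.8, beta=0.3650573)
print(y_hat, errors, residuals)
```

The residual uses the observed neighbouring values, whereas the prediction error compares each observation with a value predicted from the explanatory variable alone, so the two generally differ.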
Figure 8 presents the Moran scatter plot of the residuals. The Moran’s I test statistic is
–0.0570, which is slightly negative (dispersed) but close to 0. This indicates that
including the spatially lagged term in the model removes practically all the spatial
autocorrelation, as it should.
Figure 8 Moran scatter plot for spatial lag residuals (see online version for colours)
Finally, if we plot the prediction errors against the predicted values we get a line almost
parallel to the x axis with a slope of 0.0135 (Figure 9). The slope is close to zero,
indicating the goodness of fit of the data to the model. As a result the prediction error is
almost zero everywhere.
Figure 9 Scatter plot of prediction errors against predicted values (see online version
for colours)
6 Conclusions
The analysis presented in this paper clearly shows that there is some spatial consistency
in the distribution of the women literacy rate. It also reports the impact of the women
literacy rate on GDP growth in Bangladesh. Besides, we present a comparative study of
the women and men literacy rates. In this study, we observe that although the women
literacy rate has increased, the women employment rate is not satisfactory. On the other
hand, women literacy affects the agricultural and service growth rates and thereby has an
impact on the GDP growth rate of this country. Taking policy level decisions with these
spatial properties in mind can lead to uniform positive development throughout the
country.
References
Ahmad, A. (2001) ‘Inequality in the access to education and poverty in Bangladesh’, Poverty
Conference, Stockholm organized by Sida, 17–18 October.
Ahmad, A., Hossain, M. and Bose, M.L. (2004) ‘Inequality in the access to secondary education
and rural poverty in Bangladesh: an analysis of household and school level data’, Workshop
on Equity and Development in South Asia organized by the World Bank in New Delhi, India,
7–8 December.
Anselin, L. (1993) ‘The Moran scatter plot as an ESDA tool to assess local instability in spatial
association’, in GIS DATA Specialist Meeting on GIS and Spatial Analysis, paper 9330.
Anselin, L. and McCann, M. (2009) ‘OpenGeoDa, open source software for the exploration and
visualization of geospatial data’, in Proc. ACMGIS ‘09, pp.550–551.
Anselin, L., Syabri, I. and Smirnov, O. (2002) ‘Visualizing multivariate spatial correlation with
dynamically linked window’, in Anselin, L. and Rey, S. (Eds.): New Tools for Spatial Data
Analysis: Proceedings of the Specialist Meeting, Center for Spatially Integrated Social Science
(CSISS) University of California, Santa Barbara.
Chaudhry, S. and Rahman, S. (2009) ‘The impact of gender inequality in education on rural
poverty in Pakistan: an empirical analysis’, European Journal of Economics, Finance and
Administrative Sciences, ISSN 1450-2275, No. 15, pp.174–188.
Choropleth (2010) Choropleth Mapping with Exploratory Data Analysis [online]
http://www.locationintelligence.net/articles/718.html (accessed 20 October 2010).
de Dominicis, L., Arbia, G. and de Groot, H.L.F. (2007) The Spatial Distribution of Economic
Activities in Italy, Tinbergen Institute Discussion Papers, 07-094/3, December.
GSB (2008) Gender Statistics of Bangladesh-2008, Bangladesh Bureau of Statistics.
Moran, P.A.P. (1950) ‘Notes on continuous stochastic phenomena’, Biometrika, Vol. 37, No. 1,
pp.17–33.
Oliveau, S. (2005) ‘Spatial correlation and demography: exploring India’s demographic patterns’,
25th IUSSP Conference, Tours, France.
OpenGeoda (2011) OpenGeoDa [online] http://geodacenter.asu.edu/software/downloads
(accessed 3 January 2011).
Paraguas, F.J., Kamil, A.A., Pheng, K.S., Dey, M.M. and Bose, M.L. (2005) ‘Exploration and
visualization of poverty-environment relationship using exploratory spatial data analysis
techniques’, in Proc. IRCMSA 2005, p.357.
Scrucca, L. (2005) Clustering Multivariate Spatial Data Based on Local Measures of
Spatial Autocorrelation, Technical report, Università di Perugia, Dipartimento Economia,
Finanza e Statistica.
Tobler, W. (1970) ‘A computer movie simulating urban growth in the Detroit region’, Economic
Geography, Vol. 46, No. 2, pp.234–240.
Türkcan, B., Çalışkan, E.T. and Kaya, A.A. (2009) ‘Industrial clusters as a regional development
tool: a spatial analysis on Turkey’, in Proc. Econ Anadolu.
Uthman, O.A. (2008) ‘Environmental factors, neighbourhood deprivation, and under-five mortality
in Nigeria: an exploratory spatial data analysis’, The Internet Journal of Paediatrics and
Neonatology, Vol. 9, No. 1.