Data Report by Shahan Majid, 21890

Data Report of Pakistan Social Indicator by PBS

Computation & Research Methods (6567)
Who was responsible for collecting the information? Who collects this data set, and
why do they bother to provide it?
The Pakistan Bureau of Statistics (PBS) released the 8th edition of its ‘Social Indicators of
Pakistan’ report in 2021. PBS is the official institution formed by the Government of
Pakistan that is responsible for conducting population censuses, collecting data, and publishing
reports on different sectors of the country. These reports help the government understand
its population and the conditions in which it lives.
When was the information collected? How frequently is it updated (if at
all)? What is its geographic and temporal coverage? How would one be
able to access it (public, subscription, purchase, upon formal request, or various
other forms)?

The Social Indicators of Pakistan report was first published in 1985, and subsequent editions
have been released every five years, except for the 2003 edition, which came seven years after
the previous one. However, only the 2021 and 2016 reports are available on the PBS website; the
other editions have no digital footprint. It is unclear how a researcher could obtain the
earlier editions, or whether they still exist.

What kind of data is it? Quantitative, qualitative, or mixed? Which formats is the data
in (normal formats include xls, sas, csv, txt, dta, etc., but data may be in other
forms, such as maps, photos, speeches, audio, video, GIS layers, etc.)?

The data set is provided as a PDF document and is presented in tables and figures, the latter
including graphs and charts. A source is stated after each table and graph, but not as a
formal citation, which can make it hard for a researcher to trace the data back to its origin.
The format is also not convenient for the researcher: the 2021 version has 210 pages, and
although it includes a table of contents and lists of figures and tables, it is hard to navigate
because the entries are not linked to the pages they refer to, and the pages do not link back.

The report has eleven chapters, each covering one main social concept: 1-Population,
2-Income and expenditure, 3-Labour force and employment, 4-Education, 5-Health, 6-Water
supply and sanitation, 7-Leisure and periodicals, 8-Public safety, 9-Transport and
communication, 10-Core ICT indicator, 11-Tourism.

What methodology was employed in obtaining the data? Is the documentation such
as questionnaire, sampling details or other information available for public and
researchers to get the idea of the authenticity of data?

This secondary data report is compiled from different surveys that the Pakistan Bureau of
Statistics has conducted itself, such as the Census and the Labour Force Survey. The details of
these surveys are given on the PBS website: the questionnaires, the sample sizes, the sampling
procedure, and the detailed data sets are available to everyone. Some of the data comes
from other institutions in the country, e.g., the Police Department, schools and universities, and
the Medical Councils and Commissions. PBS did not state the procedure for collecting and
compiling the data from these other sources. Thus, the authenticity of some of that data cannot
be checked unless one contacts the named sources directly, for instance by asking each college
for its enrollment figures. Although some of these institutions do publish their data online for
public access, the report does not mention or cite this. A few sources are global organizations
and agencies, such as the Asian Development Bank and the World Bank, and these too are not
formally cited.

What is the purpose of the data set? How can the data be useful?

The report is mainly for policymakers, so that they have reliable information about the
social sectors of different areas before they make policies. The government plans different
ventures to develop the social sector each year, and the budget is passed in view of these
statistics. Information about the social and economic conditions of the population is
imperative for gauging the needs of the people when making important decisions. It is also
important that people in other institutions and the general public can access this information,
so that they can scrutinise the government fairly and justice is maintained for everyone.
Regular updating of the data is essential too, as it gives insight into the effectiveness of the
policies and of the budget spent on them. The report is also intended to give researchers
knowledge about the population of the region. It is detailed secondary data, which is useful
for quantitative and qualitative research. One can reach new conclusions about the social affairs
of Pakistan by interpreting and analyzing this data, and one can also use it to validate their

Besides research, this data holds great significance because it represents the country at a global
level and affects global politics too: many international organizations calculate other
indexes from these data, e.g., the Human Development Index, the World Happiness Index,
etc. These indexes rank each country and, on that basis, countries and global organizations help
the countries that usually rank lower. This data is also imperative for foreign and local
investors and businesspeople to make financial decisions in the region and keep the money

How many indicators/themes/concepts are covered in the data [just to give you a
hint, see page No. 12 of this report as an example of how you can summarize the
questionnaire that you are observing]? Are you satisfied with its
comprehensiveness? What are some of the good points or matters of concern about
this data set?

The Social Indicators of Pakistan report includes statistics about the social sector of Pakistan.
The variables are diverse and cover most of the areas required to judge societal
conditions. The report has eleven chapters, and each chapter covers a concept such as population,
tourism, or income. Each chapter consists of different indicators that give us insight into the
concept through numerical values. The indicators are comprehensive enough to inform us
about the concepts, but the report lacks data for smaller regional units within Pakistan.

Here are a few examples of the indicators and the concepts that they fall under:

Indicators for the concept population: population density, sex ratio, ratio of childbearing women,
and percentage distribution of population by age, etc.

Indicators for the concept income: unemployment rate, crude activity rate, refined activity rate,
total dependency ratio, etc.

Indicators for the concept education: number of teachers, literacy rate, number of students
enrolled in colleges, number of students enrolled in universities, expenditure on education, etc.

There are hundreds of such indicators in this report about the eleven mentioned concepts.

General review of the variables contained in the data set. Any idea on
their definition, measurement errors, or other concerns that you can
identify. Can you propose a few hypotheses that can be tested based on
the variables contained in this data set?

There are multiple ratio variables in this data set, and they lend themselves to a range of
statistical procedures for testing hypotheses. These are a few of the variables and the way they are
calculated:
1. Population Density = Number of People / Area

This variable tells us how many people are expected per unit area of a particular
region.
2. Sex Ratio = Male population x 100 / Female population
3. Total Dependency Ratio: This ratio tells us how many people in the population are
expected to be dependent on others. A higher total dependency ratio means a higher burden
on the population of working age. People aged under 15 and over 65 are generally
considered dependents. However, since this is an international convention, I
wonder how these standards could apply to Pakistan, where child labor is common and
people of old age also work.
4. Rural-Urban Ratio (Income): This ratio tells us the income of the rural
population compared to that of the urban population. It is an important variable because it
can show where the better job opportunities lie.
5. Unemployment Rate: This is the ratio of unemployed people to the total
work force.
6. Literacy Rate: This tells us how many people are literate compared to the entire
population.
7. Migration Rate: This is the ratio of the net number of people who migrated to the country
to the entire population of the country. A negative migration rate suggests that
more people are leaving the country than are coming to it. It can give us
multiple insights into the social conditions of the region, as people prefer regions with
good social conditions.
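
The formulas in the list above can be sketched in Python as follows. This is only an illustrative sketch: the function names are hypothetical, the scaling of the dependency, unemployment, and literacy figures to "per 100" and of the migration rate to "per 1,000" is an assumption based on common convention rather than on the report, and the numbers in the usage example are made up and are not taken from the PBS data.

```python
def population_density(population: float, area: float) -> float:
    """Number of people per unit area."""
    return population / area


def sex_ratio(males: float, females: float) -> float:
    """Male population x 100 / female population."""
    return males * 100 / females


def total_dependency_ratio(under_15: float, over_65: float, working_age: float) -> float:
    """Dependents (under 15 and over 65) per 100 people of working age (assumed scaling)."""
    return (under_15 + over_65) * 100 / working_age


def unemployment_rate(unemployed: float, work_force: float) -> float:
    """Unemployed people as a percentage of the total work force (assumed scaling)."""
    return unemployed * 100 / work_force


def literacy_rate(literate: float, population: float) -> float:
    """Literate people as a percentage of the entire population (assumed scaling)."""
    return literate * 100 / population


def net_migration_rate(arrivals: float, departures: float, population: float,
                       per: int = 1000) -> float:
    """Net migrants per `per` inhabitants; negative when more people leave than arrive."""
    return (arrivals - departures) * per / population


if __name__ == "__main__":
    # Made-up figures for a hypothetical region, NOT taken from the PBS report.
    print("density:", population_density(population=1_000_000, area=5_000))
    print("sex ratio:", sex_ratio(males=510_000, females=490_000))
    print("dependency:", total_dependency_ratio(350_000, 50_000, 600_000))
    print("net migration:", net_migration_rate(2_000, 5_000, 1_000_000))
```

Keeping each indicator as a separate small function makes it easy to see which raw counts a researcher would need to extract from the report's tables before any of the ratios can be recomputed or checked.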

The report does not explain what these variables mean; it only gives the formulas used to
calculate them. Moreover, the formulas for the migration rate, the urbanization rate, and
other such variables taken from foreign sources are not mentioned.

By applying different statistical methods to these variables, many hypotheses can be tested. For
example: the unemployment rate in Balochistan is very high because of its low literacy
rate; the growth rate of Balochistan is the lowest because of its low population density and low
ratio of childbearing women; the increase in broadband and connectivity is one of the reasons
that the literacy rate is rising; and the number of cases registered is correlated with the
unemployment rate, which can be examined through regression analysis. Many more such hypotheses
can be tested because the data is vast and diverse.
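
As a rough illustration of how one of these hypotheses could be examined, the sketch below computes a Pearson correlation and a simple least-squares line between two indicator series. The helper names are hypothetical, and the two lists in the usage example are placeholder values chosen only to show the mechanics; they are not figures from the report.

```python
from math import sqrt


def _centered_sums(x, y):
    """Return the means and the centered sums of squares and cross-products."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return mean_x, mean_y, sxx, syy, sxy


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    _, _, sxx, syy, sxy = _centered_sums(x, y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mean_x, mean_y, sxx, _, sxy = _centered_sums(x, y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Placeholder values for illustration only, NOT taken from the PBS report.
    literacy = [40.0, 50.0, 60.0, 70.0]      # hypothetical literacy rates (%)
    unemployment = [9.0, 7.5, 6.0, 4.5]      # hypothetical unemployment rates (%)
    print("r =", pearson_r(literacy, unemployment))
    print("slope, intercept =", fit_line(literacy, unemployment))
```

A strong correlation or a well-fitting line would only show that two indicators move together; the "because" in each hypothesis would still need further evidence.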

Select between 12-16 variables and try to critically evaluate them using
level of measurement concept, i.e., nominal, ordinal, interval, and ratio –
at least three variables for each level of measurement. The kind of
statistical analysis these are suited to. Make sure that you select
variables from each level of measurement.

Nominal Scale of Measurements:

The urban and rural categories are a nominal scale of measurement used with different variables
concerning population, such as population density. The names of the different languages used
to report the number of newspapers by language are another example of a
nominal scale of measurement. The categories of cases reported by type are also a nominal scale
of measurement. The report is replete with nominal measurements paired with
ratio variables. The nominal variables are mostly types, names of areas, etc. These categories
are nominal because they are known to us by their names, they don’t have any order or
hierarchy, and they do not have a true zero value. Nominal variables are important,
although not statistically in themselves, because they tell us more about the other variables:
they are often paired with ratio variables such as population density, income, and employment
rate. We can compare different categories, like the population density of Balochistan and
Punjab, and this can give us significant insights; here the nominal values are Balochistan and
Punjab.

Ordinal Scale of Measurements

This document does not have a variety of ordinal variables; however, year is the most common
ordinal measurement, used in presenting most of the data. Ordinal
variables resemble nominal ones in that their values serve as labels rather than quantities,
except that they have a specific order, like the years 2017, 2018, and 2019. Ordinal
measurement is important because we compare other variables across the order of the
ordinal variable. An important insight of this kind could be the increase in broadband and
bandwidth over this decade. The report presents both the ordinal and the nominal variables in
tabular and graphical form.

Interval Scale of Measurements:

The interval scale quantifies variables with equal intervals between values but has no true zero
value. No such interval variable appears in this report.

Ratio Scale of Measurements:

This report has multiple ratio variables. These variables are important because a wide
range of statistics can be applied to them to reach new conclusions. Some of the ratio variables
used are population density, unemployment rate, crime rate, and migration rate. We can run
regression analysis on these variables and test multiple hypotheses. They are
ratio variables because they can be ordered, they have numeric values,
and they have a true zero value.
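
The link between a variable's level of measurement and the statistics it supports can be summarized in a small lookup, sketched below. The names are hypothetical, the list of statistics per level follows the usual textbook convention rather than anything stated in the report, and the variables are classified as in the discussion above (including year as ordinal).

```python
# Levels ordered from least to most informative.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Statistics that first become meaningful at each level (textbook convention).
NEW_STATISTICS = {
    "nominal": ["frequency counts", "mode"],
    "ordinal": ["median", "percentiles", "rank correlation"],
    "interval": ["mean", "standard deviation", "Pearson correlation"],
    "ratio": ["geometric mean", "coefficient of variation"],
}

# Example variables, classified as in the text above.
REPORT_VARIABLES = {
    "urban/rural category": "nominal",
    "newspaper language": "nominal",
    "type of case reported": "nominal",
    "year": "ordinal",
    "population density": "ratio",
    "unemployment rate": "ratio",
    "migration rate": "ratio",
}


def suitable_statistics(level: str) -> list[str]:
    """Each level inherits every statistic allowed at the levels below it."""
    position = LEVELS.index(level)
    allowed = []
    for name in LEVELS[: position + 1]:
        allowed.extend(NEW_STATISTICS[name])
    return allowed


if __name__ == "__main__":
    for variable, level in REPORT_VARIABLES.items():
        print(f"{variable} ({level}): {', '.join(suitable_statistics(level))}")
```

The cumulative design reflects why ratio variables are the most flexible for analysis: anything that can be computed for a nominal, ordinal, or interval variable can also be computed for them.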

Social Indicators of Pakistan is a very comprehensive report that details social
factors using statistical indicators. The social concepts are population, income and expenditure,
labour force and employment, education, health, water supply and sanitation, leisure and
periodicals, public safety, transport and communication, Core ICT indicator, and tourism. Within
each concept there are multiple indicators, and together
they cover almost every aspect of social life in Pakistan.


Although the report is comprehensive, it has some limitations. It gives us data at a
macro level, whereas researchers and policy planners often need reports at the district
level. As a result, the government still depends on international sources when making
policies and decisions, and these may often not be representative. We are also
still short of many other useful variables, as Pakistan is short of digital connectivity
and, therefore, progress in the field of data science is slow. It could also be a hassle
for a researcher to check the validity of this data, since the report does not formally cite the
sources from which the data was taken. However, if we consider this data authentic, it can
be a great resource for social scientists to study and to test different hypotheses and
perhaps some theories. Social scientists can also devise new theories and knowledge using this
data set.

Link to the reports:

Social Indicators of Pakistan (

