B.S Education
Course: Educational Statistics (8674)
Level: B.Ed (1.5 Years) Semester: Autumn, 2020

ASSIGNMENT No. 1
(Units: 1-5)

Q.1 Describe level of measurement. Give five examples of each level and explain the role of level
of measurement in decision-making. ANS: Levels of Measurement The level of measurement refers to the
relationship among the values that are assigned to the attributes for a variable. What does that mean? Begin
with the idea of the variable, in this example "party affiliation." That variable has a number of attributes. Let's
assume that in this particular election context the only relevant attributes are "republican", "democrat", and
"independent". For purposes of analyzing the results of this variable, we arbitrarily assign the values 1, 2, and 3 to
the three attributes. The level of measurement describes the relationship among these three values. In this case,
we simply are using the numbers as shorter placeholders for the lengthier text terms. We don't assume
that higher values mean "more" of something and lower numbers signify "less". We don't assume that the
value of 2 means that democrats are twice something that republicans are. We don't assume that
republicans are in first place or have the highest priority just because they have the value of 1. In this case, we
only use the values as a shorter name for the attribute. Here, we would describe the level of measurement as
"nominal". Why is Level of Measurement Important? First, knowing level of measurement helps you
decide how to interpret the data from that variable. When you know that a measure is nominal (like the
one just described), then you know that the numerical values are just short codes for the longer names. Second, knowing the level of measurement helps you decide what statistical analysis is appropriate on the values that were assigned. If a measure is nominal, then you know that you would never average the data values or do a t-test on the data. There are typically four levels of measurement that are defined: nominal, ordinal, interval, and ratio. In nominal measurement the numerical values just "name" the attribute uniquely. No ordering of the cases is implied. For example, jersey numbers in basketball are measures at the nominal level. A player with number 30 is not more of anything than a player with number 15, and is certainly not twice whatever number 15 is. In ordinal measurement the attributes can be rank-ordered. Here, distances between attributes do not have any meaning. For example, on a survey you might code Educational Attainment as 0=less than high school; 1=some high school; 2=high school degree; 3=some college; 4=college degree; 5=post college. In this measure, higher numbers mean more education. But is the distance from 0 to 1 the same as from 3 to 4? Of course not. The interval between values is not interpretable in an ordinal measure. In interval
measurement the distance between attributes does have meaning. For example, when we measure temperature (in Fahrenheit), the distance from 30 to 40 is the same as the distance from 70 to 80. The interval between
values is interpretable. Because of this, it makes sense to compute an average of an interval variable,
where it doesn't make sense to do so for ordinal scales. But note that in interval measurement ratios don't
make any sense - 80 degrees is not twice as hot as 40 degrees (although the attribute value is twice as large).
Finally, in ratio measurement there is always an absolute zero that is meaningful. This means that you can
construct a meaningful fraction (or ratio) with a ratio variable. Weight is a ratio variable. In applied social
research most "count" variables are ratio, for example, the number of clients in past six months. Why?
Because you can have zero clients and because it is meaningful to say that "...we had twice as many
clients in the past six months as we did in the previous six months." It's important to recognize that there is a hierarchy implied in the level of measurement idea. At lower levels of measurement, assumptions tend to be less restrictive and data analyses tend to be less sensitive. At each level up the hierarchy, the current level includes all of the qualities of the one below it and adds something new. In general, it is desirable to have a higher level of measurement (e.g., interval or ratio) rather than a lower one (nominal or ordinal).
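To make the hierarchy concrete, here is a minimal sketch in Python (the four level names and which statistics they permit follow the discussion above; the helper function itself is illustrative, not part of the original text):

# Which summary statistics are meaningful at each level of measurement.
# Each level inherits every operation of the level below it and adds more.
VALID_STATS = {
    "nominal":  {"frequency", "mode"},
    "ordinal":  {"frequency", "mode", "median", "percentile"},
    "interval": {"frequency", "mode", "median", "percentile", "mean", "sd"},
    "ratio":    {"frequency", "mode", "median", "percentile", "mean", "sd", "ratio"},
}

def is_meaningful(stat: str, level: str) -> bool:
    """Return True if `stat` can be interpreted for data at `level`."""
    return stat in VALID_STATS[level]

print(is_meaningful("mean", "nominal"))    # False: never average nominal codes
print(is_meaningful("mean", "interval"))   # True: intervals are interpretable
print(is_meaningful("ratio", "interval"))  # False: 80 degrees is not "twice" 40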
Data will always take a back seat in the drive towards better decision-making. As companies spend increasing amounts of time and effort in capturing organizational and market data, the return on these investments will depend on our ability to transform the data into impactful decisions. Data alone doesn't produce pertinent decisions, for decision-making is constantly handicapped by uncertainty, ambiguity, and complexity. Measuring the quality of our decision-making may well prove more important than improving the quality of our data. Let's look both at why measuring decision-making is so difficult, and why it is so potentially rewarding. One of the obstacles in improving decision-making today comes from the diversity of managerial challenges. On an "objective" level, we refer to a problem as simple when the data at hand is sufficient to identify the best way forward, and complex when the data provides nothing more than a best answer in a given context. On a "subjective" level, a manager gauges risk when he or she understands the probabilities and outcomes of the choices before them. A manager confronts uncertainty when for one reason or another the probabilities and outcomes
cannot be precisely determined. The factor of "ambiguity" weighs into the equation when the decision-maker questions the clarity of the problem itself. As a result, the goal of management is rarely about finding the right answer, and largely about helping managers and customers take better decisions through reducing the sources of risk, uncertainty, and ambiguity. Measuring decision-making is complicated by the fact that the "best" choice depends as much on the manager's state of mind as on the nature of the problem itself. In Decision Science, we refer to four mindsets that condition human decision-making.[i] The optimist, like the lotto player, is always betting on the largest payoff possible regardless of the probabilities. Inversely, visions of the worst haunt the pessimist who will religiously try to minimize potential losses.

The opportunist will employ
reasoning similar to Jeff Bezos' regret minimization framework by focusing on outcomes that provide the
greatest peace of mind.[ii] Finally, the realist calculates the potential return of each available outcome
and tries to maximize expected value. Because each vision relies on a different line of reasoning, it is
necessary to understand how each manager's mindset corresponds to the prevailing organizational culture.
The metrics used to measure the decision-making process provide a third level of complexity. What exactly are we trying to optimize (or minimize) when looking for the better decision? We can focus on yield: what are the potential benefits of each of the alternatives? We might focus on the effort: how many resources, and how much energy, must be mobilized to put the choice into practice? Perhaps we should focus on velocity: given our other responsibilities, how quickly can we find a suitable decision? Finally, we could evaluate the pertinence of the model itself: how squarely does the proposed solution address the problem at hand? The proper metrics depend upon how the organization, and the industry, measure value.[iii] Finally, we can consider the operational challenges managers face when trying to take the best decision in a given context. Let's take the self-serving example of improving the organization's decision-making processes. The first complication comes from the problem itself: is there an organizational consensus on what we mean by decision-making? There is the issue of the pertinence of the data at hand: what proof do we have today that we are taking poor decisions? There can be discussion on the objective we are trying to reach (the pertinence, the velocity, the effort, or the yield of each decision?). Finally, how can we study what might be the best means of testing our proposals (sampling, simulation, a survey, face-to-face interviews)? Like any skill, improving decision-making requires both analysis and practice. In the Business Analytics Institute you will study context-specific challenges, mindsets, and metrics of managerial decision-making. In our Summer School in Bayonne, as well as in our Master Classes in Europe, we put data science to work for you and for your organization. The Institute focuses on five applications of data science for managers: working in the digital age, data-driven decision making, machine learning, community management, and visual communications. Data-driven decision making can make a difference in your future work and career.

Q.2 Differentiate between primary and secondary data. Give meaningful examples with explanation. ANS: In a time
when data is becoming easily accessible to researchers all over the world, the practicality of utilizing secondary data for research is becoming more prevalent, as is concern over its questionable authenticity when compared with primary data. These 2 types of data, when considered for research, are a double-edged sword, because they can equally make a research project as they can mar it. In a nutshell, primary data and secondary data both have their advantages and disadvantages. Therefore, when carrying out research, it is left for the researcher to weigh these factors and choose the better one. It is therefore important for one to study the similarities and differences between these data types so as to make proper decisions when choosing a better data type for research work. What is Primary Data?

AIOU Studio ‫یوٹیوب چینل کو سبسکرائب کریں۔‬


‫عالمہ اقبال اوپن یونیورسٹی کی معلومات کے لیے‬
9
B.S Education 0334-5515779,0344,5515779,0345-7308411
Primary data is the kind of data that is collected directly from the data source without going through any existing sources. It is mostly collected specifically for a research project and may be shared publicly to be used for other research. Primary data is often reliable, authentic, and objective, inasmuch as it was collected with the purpose of addressing a particular research problem. It is noteworthy that primary data is not commonly collected because of the high cost of implementation.

PRIMARY DATA

A common example of primary data is the data collected by organizations during market
research, product research, and competitive analysis. This data is collected directly from its original
source which in most cases are the existing and potential customers. Most of the people who collect
primary data are government-authorized agencies, investigators, research-based private
institutions, etc.

Pros
- Primary data is specific to the needs of the researcher at the moment of data collection. The researcher is able to control the kind of data that is being collected.
- It is accurate compared to secondary data. The data is not subjected to personal bias and as such its authenticity can be trusted.
- The researcher exhibits ownership of the data collected through primary research. He or she may choose to make it available publicly, patent it, or even sell it.
- Primary data is usually up to date because it is collected in real-time and does not come from old sources.
- The researcher has full control over the data collected through primary research. He can decide which design, method, and data analysis techniques to use.
Cons
- Primary data is very expensive compared to secondary data. Therefore, it might be difficult to collect primary data.
- It is time-consuming.
- It may not be feasible to collect primary data in some cases due to its complexity and the required commitment.
What is Secondary Data?
Secondary data is the
data that has been collected in the past by someone else but made available for others to use. They
are usually once primary data but become secondary when used by a third party. Secondary data are
usually easily accessible to researchers and individuals because they are mostly shared publicly. This, however,
means that the data are usually general and not tailored specifically to meet the researcher's needs as primary data does.


SECONDARY DATA

For example, when conducting a research thesis, researchers need to consult past works done in the field and add findings to the literature review. Some other things like definitions and theorems are secondary data that are added to the thesis to be properly referenced and cited accordingly. Some common sources of secondary data include trade publications, government statistics, journals, etc. In most cases, these sources cannot be trusted as authentic.
Pros
- Secondary data is easily accessible compared to primary data. Secondary data is available on different platforms that can be accessed by the researcher.

- Secondary data is very affordable. It requires little to no cost to acquire because it is
sometimes given out for free.
- The time spent on collecting secondary data is usually very little compared to that of primary data.
- Secondary data makes it possible to carry out longitudinal studies without having to wait for a long time to draw conclusions.
- It helps to generate new insights into existing primary data.
Cons
- Secondary data may not be authentic and reliable. A researcher may need to further verify the data collected from the available sources.
- Researchers may have to deal with irrelevant data before finally finding the required data.
- Some of the data is exaggerated due to the personal bias of the data source.
- Secondary data sources are sometimes outdated with no new data to replace the old ones.
Here are 15 differences between primary and secondary data:
Definition
Primary data is the type of data that is collected by researchers directly from main sources, while secondary data is data that has already been collected through primary sources and made readily available for researchers to use for their own research. The main difference between these 2 definitions is the fact that primary data is collected from the main source of data, while secondary data is not. The secondary data made available to researchers from existing sources was formerly primary data collected for research in the past. The availability of secondary data is highly dependent on the primary researcher's decision to share their data publicly or not.
Examples
An example of primary data
is the national census data collected by the government while an example of secondary data is the data collected
from online sources. The secondary data collected from an online source could be the primary data collected by
another researcher. For example, after successfully conducting the national census, the government shares the results in newspapers, online magazines, press releases, etc. Another government agency that is trying to allocate the state budget for health care, education, etc. may need to access the census results. With access to this information, the number of children who need education can be analyzed and used to determine the amount that should be allocated to the education sector. Similarly, knowing the number of old people will help in allocating funds for them in the health sector.
Data Types
The type of data provided by primary data is real-time, while the data provided by secondary data is stale. Researchers are able to access the most recent data when conducting primary research, which may not be the case for secondary data. Secondary research has to depend on primary data that was collected in the past. In some cases, the researcher may be lucky that the data was collected close to the time that he or she is conducting research, thereby reducing the amount of difference between the secondary data being used and the recent data.
Process
Researchers are usually very involved in the primary data collection process, while secondary data is quick and easy to collect. This is due to the fact that primary research is mostly longitudinal. Therefore, researchers have to spend a long time performing research, recording information, and
analyzing the data. This data can be collected and analyzed within a few hours when conducting
secondary research. For example, an organization may spend a long time analyzing the market size for
transport companies looking to break into the ride-hailing sector. A potential investor will take this data and use it to inform his decision of investing in the sector or not.
Availability
Primary data is available in crude form while secondary data is available in a refined form. That is, secondary data is usually made available to the
public in a simple form for a layman to understand while primary data are usually raw and will have to be
simplified by the researcher. Secondary data are this way because they have previously been broken down by
researchers who collected the primary data afresh. A good example is the Thomson Reuters annual market reports that are made available to the public. When Thomson Reuters collects this data afresh, it is usually raw and may be difficult to understand. It simplifies the results by visualizing them with graphs,
charts, and explanations in words.
Data Collection Tools
Primary data can be collected using surveys and questionnaires, while secondary data are collected using the library, bots, etc. The differences between these data collection tools are glaring, and they cannot be used interchangeably. When collecting primary data, researchers look out for a tool that can be easily used and can collect reliable data. One of the best primary data collection tools that satisfies this condition is Formplus. Formplus is a web-based primary data collection tool that helps researchers collect reliable data while simultaneously increasing the response rate from respondents.
Sources
Primary data sources include surveys, observations, experiments, questionnaires, focus groups, interviews, etc., while secondary data sources include books, journals, articles, web pages, blogs, etc. These sources vary explicitly and there is no intersection between the primary and secondary data sources. Primary data sources require a deep commitment from researchers and interaction with the subject of study. Secondary data, on the other hand, do not require interaction with the subject of study before they can be collected. In most cases, secondary researchers
do not have any interaction with the subject of research.
Specific
Primary data is always specific to the
researcher's needs, while secondary data may or may not be specific to the researcher's need. It
depends solely on the kind of data the researcher was able to lay hands on.
Secondary researchers may be
lucky to have access to data tailored specifically to meet their needs, which may not always be the case. For example, a market researcher researching the purchasing power of people from a particular community may not have access to the data of the subject community. Alternatively, there may be another community with a similar standard of living to the subject community whose data is available. The researcher may choose to settle for this data and use it to inform his conclusion on the subject community.
Advantages
Some common advantages of primary data are its authenticity, specific nature, and up to date
information while secondary data is very cheap and not time-consuming. Primary data is very reliable
because it is usually objective and collected directly from the original source. It also gives up to date
information about a research topic compared to secondary data. Secondary data, on the other hand, is not expensive, making it easy for people to conduct secondary research. It doesn't take so much time and most of the secondary data sources can be accessed for free.
Disadvantages
The disadvantage of primary
data is the cost and time spent on data collection while secondary data may be outdated or irrelevant. Primary
data incurs so much cost and takes time because of the processes involved in carrying out primary research.
For example, when physically interviewing research subjects, one may need one or more professionals,
including the interviewees, videographers who will make a record of the interview in some cases and the
people involved in preparing for the interview. Apart from the time required, the cost of doing this may be
relatively high. Secondary data may be outdated and irrelevant. In fact, researchers have to surf through
irrelevant data before finally having access to the data relevant to the research purpose.
Accuracy and Reliability
Primary data is more accurate and reliable while secondary data is relatively less reliable and accurate. This is mainly because secondary data sources are not regulated and are subject to personal bias. A good example of this is business owners who pay bloggers to write good reviews about their products just to gain more customers. This is not the case with primary data, which is collected by the researcher himself. One of the objectives of the researcher when gathering primary data is to collect accurate data so as to arrive at correct conclusions. Therefore, biases will be avoided at all costs (e.g., as businesses do when collecting feedback from customers).
Cost-effectiveness
Primary data is very expensive while secondary data is economical. When working on a low budget, it is better for researchers to work with secondary data, then
analyze it to uncover new trends.

In fact, a researcher might
work with both primary data and secondary data for one research. This is usually very advisable in cases
whereby the available secondary data does not fully meet the research needs. Therefore, a little
extension on the available data will be done and cost will also be saved. For example, a researcher
may require a market report from 2010 to 2019 while the available reports stop at 2018.
Collection Time
The time required to collect primary data is usually long while that required to collect secondary
data is usually short. The primary data collection process is sometimes longitudinal in nature. Therefore,
researchers may need to observe the research subject for some time while taking down important data.
For example, when observing the behavior of a group of people or particular species, researchers have
to observe them for a while. Secondary data can, however, be collected in a matter of minutes and analyzed to reach conclusions, taking a shorter time when compared to primary data. In some rare cases, especially when collecting little data, secondary data may take a longer time because of the difficulty of consulting different data sources to find the right data.
Similarities Between Primary & Secondary Data
Contains Same Content: Secondary data was once primary data when it was newly collected by the first
researcher. The content of the data collected does not change and therefore has the same content with primary
data. It
doesn't matter if it was further visualized in the secondary form; the content does not change. A common example of these are definitions, theorems, and postulates that were made years ago but still remain the same.
Uses: Primary data and secondary data are both used in research and statistics. They can be used to carry out the same kind of research in these fields depending on data availability. This is because secondary data and primary data have the same content. The only difference is the method by which they are collected. Since the method of collection does not directly affect the use of data, they can be used to perform similar research. For example, whether collected directly or from an existing database, the demography of a particular target market can be used to inform similar business decisions.
Conclusion:
When performing research, it is important to consider the
available data options so as to ensure that the right type of data is used to arrive at a feasible conclusion. A good understanding of the different data types, similarities, and differences is required to do this. Primary data and secondary data both have applications in business and research. They may, however, differ
from each other in the way in which they are collected, used, and analyzed. The most common setback
with primary data is that it is very expensive, which is not the case for secondary data. Secondary data,
on the other hand, has authenticity issues. Q.3 Explain advantages and disadvantages of bar charts
and scatter plot. ANS: Data Presentation: Bar Graphs
Bar graphs are good for showing how data change over time. Example:
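Since the original example figure did not survive, here is a minimal sketch of such a bar graph (assuming matplotlib is available; the yearly figures are invented purely for illustration):

# A bar graph showing how a value changes over time.
import matplotlib.pyplot as plt

years = ["2017", "2018", "2019", "2020"]   # hypothetical time categories
enrollment = [120, 150, 180, 210]          # hypothetical yearly counts

plt.bar(years, enrollment)
plt.xlabel("Year")
plt.ylabel("Enrollment")
plt.title("Enrollment by year")
plt.show()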

Advantages: bar graphs can
- show each data category in a frequency distribution
- display relative numbers or proportions of multiple categories
- summarize a large data set in visual form
- clarify trends better than do tables
- estimate key values at a glance
- permit a visual check of the accuracy and reasonableness of calculations
- be easily understood due to widespread use in business and the media
Disadvantages: bar graphs can
- require additional explanation
- be easily manipulated to yield false impressions
- fail to reveal key assumptions, causes, effects, or patterns

Scatter Plot
A Scatter Plot is a straightforward yet powerful tool for visualizing data, especially for those who are new to the field of statistics and data science. Today we are going to learn everything about Scatter Plots. So what is a Scatter Plot? Well, "A Scatter Plot is a graphical tool for visualizing the relation between two different variables of the same or different data groups, by plotting the data values along a two-dimensional Cartesian system." The above definition will become more precise with the scatter graph below. Scatter Plots are also known as Scatter Charts or Scatter Graphs.

87884 K OBAAS
BS Education
0345-7308411

AIOU Studio ‫یوٹیوب چینل کو سبسکرائب کریں۔‬


9 ‫رسٹی کی معلومات کے لیے‬T‫عالمہ اقبال اوپن یونیو‬
B.S Education 0334-5515779,0344,5515779,0345-
7308411
[Scatter plot figure: diameter vs. height for a group of trees]
The above graph is made with two different variables: diameter (in centimeters) and height (in meters) for a group of trees. While the horizontal X-axis depicts the width, the longitudinal Y-axis represents the height, with each dot specifying a tree. We can derive various correlations between the variables using such plots.

When to use a Scatter Plot?


A Scatter Chart or Plot analyzes the relation between two discrete variables. That is why, when we plot the aggregate data, we find different forms in which the data presents itself. The most widely used application of a Scatter Plot lies, however, in finding out whether or not a correlation exists between the two variables. For example, say we know the values for one variable, best represented along the horizontal axis, and we need to figure out the best possible prediction for the vertical axis. A Scatter Graph is very useful at such an impasse.
A Scatter Chart can be useful in the following scenarios (a short plotting sketch follows this list):
- for paired numerical data;
- in cases where the dependent variable has multiple values for a single value of the independent variable;
- while trying to find out the correlation between two variables, etc.
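Here is a minimal sketch of plotting paired numerical data as a scatter chart (assuming matplotlib is available; the tree measurements are invented for illustration, echoing the diameter/height example above):

# Each point pairs one tree's diameter (x) with its height (y).
import matplotlib.pyplot as plt

diameter_cm = [10, 14, 18, 22, 26, 30, 34]      # independent variable
height_m = [4.1, 5.0, 6.2, 6.8, 7.9, 8.5, 9.6]  # dependent variable

plt.scatter(diameter_cm, height_m)
plt.xlabel("Diameter (cm)")
plt.ylabel("Height (m)")
plt.title("Tree diameter vs. height")
plt.show()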

Advantages:
- They are straightforward to draw, even when the dependent variable has multiple values.
- They are easy to interpret and understand.
- Maxima and minima are easily isolated, so they do not affect the graph much.
Disadvantages:
- Calculation errors can lead to faulty plotting, which in turn can lead to the wrong analysis of the data.
- The precise extent of correlation cannot always be determined from them.
- Over-plotting is a big issue while working with such graphs, as it can significantly lead to the discretization of the values.

Correlation & Correlation coefficient: The term correlation is defined as the nature of the relationship between two variables (in this case, discrete variables) in any statistical study or survey. A correlation coefficient is a statistical measure of the extent or degree of this correlation. Positive, negative, and no correlation are the three types; thus one can say that a correlation coefficient will be positive, negative, or 0. We will look into these shortly.
Line of Best Fit: The Line of Best Fit is drawn up according to previously collected data and is used to predict the ideal correlation between two given variables. It acts as a reference while plotting a Scatter Graph.
Types of correlations (a small computational sketch follows this list):
A. Positive Correlation: When the value of the dependent variable increases with an increase in the value of the independent variable, we say there is a positive correlation between the two.
B. Negative Correlation: When the value of the dependent variable decreases with an increase in the value of the independent variable, and vice-versa, we say that the two variables have a negative correlation.
C. No Correlation: In case we don't find any apparent relationship between the variables under study, we say there is no correlation between them.
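As a small computational sketch of these three types, the sign of Pearson's correlation coefficient can be checked directly (statistics.correlation requires Python 3.10+; the temperature/bill values are invented for illustration and anticipate Eg. I below):

# Pearson's r: positive, negative, or approximately 0 identifies the type.
from statistics import correlation

temperature = [20, 24, 28, 32, 36, 40]                   # hypothetical degrees
electricity_bill = [1800, 2400, 3100, 4200, 5600, 7200]  # hypothetical Rs

r = correlation(temperature, electricity_bill)
print(round(r, 2))  # close to +1: a strong positive correlation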
Scatter Plot Examples
Eg. I: Positive Correlation: Problem: To find the relation between the electricity bill and temperature. Solution: The data is gathered and tabulated, and the values are plotted in a Scatter Chart as follows:
[Scatter plot: Electricity bill (in Rs) vs. Temperature]
From the above Scatter Plot, we can see that the electricity bill is less when the temperature is comparatively lower. However, it rises with a rise in temperature. There are other factors involved as well, which means the relation is not perfectly linear. Still, we can infer that there is a positive correlation between the rise in temperature and electricity bills.
Eg. II: Negative correlation: Problem: To find the relation between age and hours of sleep needed. Solution: Once again, the data is gathered after a survey, and a Scatter Graph is created as follows:
[Scatter plot: Hours of sleep vs. Age]
We can see from the graph
that as age increases, the amount of sleep decreases. Thus we can say that there is a clear negative
correlation. However, the data here gets restricted until the age of 20, which means the relation might or
might not change for higher values of the age: Eg. III: No Correlation: In such a scenario, there is no relation
between the two variables, and we can see it from a Scatter Chart as there is no direction for the values.
Here we have taken two independent variables like say height and hours of study. They have no
apparent relation to the graph if drawn will look something like this:
[Scatter plot: no correlation between height and hours of study]
How to create a Scatter Plot with Edraw Max Online?
Nowadays, creating a Scatter Chart has become very easy. You no longer need to do it with pen and paper, even though that is how we learn. Then again, at a professional level, the best results are always seen when you use a diagramming tool like Edraw Max Online to create Scatter Plots. It is a great tool to have in your inventory. Moreover, being an online tool, you don't need to download it on your computer. Before drawing a Scatter Graph you need to understand the different correlations and correlation coefficients as discussed above:
- "+1" means positive linear correlation;
- "0" means no correlation;
- "-1" means negative linear correlation;
- if the value of the coefficient is 0 < x < +1, then there is a positive correlation but not linear;
- if the value of the coefficient is -1 < x < 0, then there is a negative correlation but not linear.
Secondly, get familiar with the interface of Edraw Max Online. With that taken care of, let us see how we can create a Scatter Plot using Edraw Max Online:
Step 1: In your web browser open the home page and log in with your credentials
Step 2: From the 'Graphs & Charts' menu, select the 'Scatter' option, and a drawing window opens

Step 3: To the left of the drawing
canvas, you will find some predefined templates for use; select the one you want to drag & drop on the canvas
Step 4: You can manually plot each data point, or you can extract it from a saved file; for the latter, hover above the action button to the top-right of the chart and click on the 'Load Data from File' option
Step 5: Select the file in question; it supports .csv, .txt, .xls, .xlsx, etc. file types
Step 6: As you do so, you will see the scatter chart change accordingly
Step 7: You can show or hide data labels and even change them by double-clicking; to do the former, you need to select the same option from the action button on the top-right
Step 8: You can add or delete a point from the action button as well
Step 9: You can set minimum and maximum values from the same place
Step 10: Once you finalize the chart, save your work on the Google cloud
Common issues that you may face
Over-plotting: It happens when there are too many data values; it makes it hard to understand the correlation between the variables, and as such the measure becomes difficult to calculate. Solution: do random sampling of the data values and plot these samples, as they are a subset of the whole data set.
Correlation does not imply causation: Even though we can find a correlation between two variables, it does not mean that they are responsible for each other's behavior; a third variable can be affecting the action, and it can go unnoticed. Solution: in such cases, different tools, such as Pearson's Correlation Coefficient, come in handy.

Q.4 Explain normal distribution. How does
normality of data affect the analysis of data? ANS: What is Normal Distribution? Normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. In graph form, a normal distribution appears as a bell curve.
[Figure: Normal distribution bell curve]
Understanding Normal Distribution
The normal distribution is the most common type of distribution assumed in technical stock market analysis and in other types of statistical analyses. The standard normal distribution has two parameters: the mean and the standard deviation. For a normal distribution, 68% of the observations are within +/- one standard deviation of the mean, 95% are within +/- two standard deviations, and 99.7% are within +/- three standard deviations.
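A quick empirical check of this 68-95-99.7 rule can be made by simulation (a minimal sketch using only the Python standard library; the mean of 100 and SD of 15 are arbitrary choices):

# Sample from a normal distribution and count observations within k SDs.
import random

mu, sigma, n = 100, 15, 100_000
xs = [random.gauss(mu, sigma) for _ in range(n)]

for k in (1, 2, 3):
    share = sum(abs(x - mu) <= k * sigma for x in xs) / n
    print(f"within +/- {k} SD: {share:.3f}")  # ~0.683, ~0.954, ~0.997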

The normal distribution
model is motivated by the Central Limit Theorem. This theory states that averages calculated from
independent, identically distributed random variables have approximately normal distributions,
regardless of the type of distribution from which the variables are sampled (provided it has finite
variance). Normal distribution is sometimes confused with symmetrical distribution. Symmetrical distribution
is one where a dividing line produces two mirror images, but the actual data could be two humps or a
series of hills in addition to the bell curve that indicates a normal distribution.
Skewness and Kurtosis
Real-life data rarely, if ever, follow a perfect normal distribution.
The skewness and kurtosis coefficients measure how different a given distribution is from a normal
distribution. The skewness measures the symmetry of a distribution. The normal distribution is symmetric and
has a skewness of zero. If the distribution of a data set has a skewness less than zero, or negative
skewness, then the left tail of the distribution is longer than the right tail; positive skewness implies that the right tail of the distribution is longer than the left.
The kurtosis statistic measures the thickness of the tail ends of a distribution in relation to the tails of
the normal distribution. Distributions with large kurtosis exhibit tail data exceeding the tails of the normal
distribution (e.g., five or more standard deviations from the mean). Distributions with low kurtosis exhibit
tail data that is generally less extreme than the tails of the normal distribution. The normal distribution has a
kurtosis of three, which indicates the distribution has neither fat nor thin tails. Therefore, if an observed
distribution has a kurtosis greater than three, the distribution is said to have heavy tails when compared
to the normal distribution. If the distribution has a kurtosis of less than three, it is said to have thin tails
when compared to the normal distribution. How Normal Distribution is Used in Finance
The assumption of a normal distribution is applied to asset prices as well as price action. Traders may plot price points over time to fit recent price action into a normal distribution. The further price action moves from the mean, in this case, the more likely it is that an asset is being over- or undervalued. Traders can use the standard deviations to suggest potential trades. This type of trading is generally done on very short time frames, as larger timescales make it much harder to pick entry and exit points. Similarly, many statistical theories attempt to model asset prices under the assumption that they follow a normal distribution. In reality, price distributions tend to have fat tails and, therefore, have kurtosis greater than three. Such assets have had price movements greater than three standard deviations beyond the mean more often than would be expected under the assumption of a normal distribution. Even if an asset has gone through a long period where it fits a normal distribution, there is no guarantee that past performance truly informs future prospects.
Part b: A data set is a collection of the data of individual cases or subjects. Usually, it is
meaningless to present such data individually because that will not produce any important conclusions.
In place of individual case presentation, we present summary statistics of our data set with or without
analytical form which can be easily absorbed by the audience. Statistics, which is a science of collection,
analysis, presentation, and interpretation of data, has two main branches, which are descriptive statistics and
inferential statistics. Summary measures or summary statistics or descriptive statistics are used to
summarize a set of observations, in order to communicate the largest amount of information as simply as
possible. Descriptive statistics are the kind of information presented in just a few words to describe the basic
features of the data in a study, such as the mean and standard deviation (SD). The other is inferential
statistics, which draw conclusions from data that are subject to random variation (e.g., observational
errors and sampling variation). In inferential statistics, most predictions are for the future and
generalizations about a population by studying a smaller sample. To draw the inference from the study
participants in terms of different groups, etc., statistical methods are used. These statistical methods have
some assumptions including normality of the continuous data. There are different methods used to test the
normality of data, including numerical and visual methods, and each method has its own advantages and
disadvantages. Descriptive statistics and inferential statistics both are employed in scientific analysis of
data and are equally important in the statistics. In the present study, we have discussed the summary measures
to describe the data and methods used to test the normality of the data. To understand the descriptive statistics
and tests of the normality of the data, an example [Table 1] with a data set of 15 patients whose mean arterial pressure (MAP) was measured is given below. Further examples related to the measures of central tendency, dispersion, and tests of normality are discussed based on the above data.
Table 1
Distribution of mean arterial pressure (mmHg) as per sex
Patient number:  1   2   3   4   5   6   7   8   9   10   11   12   13   14   15
MAP (mmHg):      82  84  85  88  92  93  94  95  98  100  102  107  110  116  116
Sex:             M   F   F   M   M   F   F   M   M   F    M    F    M    F    M
MAP: Mean arterial pressure, M: Male, F: Female
Descriptive Statistics
There are three major types of descriptive statistics: measures of frequency (frequency, percent), measures of central tendency (mean, median, and mode), and measures of dispersion or variation (variance, SD, standard error, quartile, interquartile range, percentile, range, and coefficient of variation). These provide simple summaries about the sample and the measures. A measure of frequency is usually used for categorical data while the others are used for quantitative data.
Measures of Frequency
Frequency statistics
simply count the number of times that each value of a variable occurs, such as the number of males and females
within the sample or population. Frequency analysis is an important area of statistics that deals with the
number of occurrences (frequency) and percentage. For example, according to Table 1, out of the 15 patients, the frequencies of males and females were 8 (53.3%) and 7 (46.7%), respectively.
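A minimal sketch of this frequency computation for the Sex variable of Table 1, using the standard-library Counter:

# Frequency and percentage of each category of a categorical variable.
from collections import Counter

sex = list("MFFMMFFMMFMFMFM")  # the 15 patients of Table 1
for category, freq in Counter(sex).items():
    print(category, freq, f"{100 * freq / len(sex):.1f}%")
# M 8 53.3%
# F 7 46.7%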
Measures of Central Tendency
Data are commonly described by a measure of central tendency, also called a measure of central location, which is used to find out the representative value of a data set. The mean, median, and mode are three types of measures of central tendency. Measures of central tendency give us one value (mean or median) for the distribution and this value represents the entire distribution. To make comparisons between two or more groups, representative values of these distributions are compared. This helps in further statistical analysis because many techniques of statistical analysis, such as measures of dispersion, skewness, correlation, t-test, and ANOVA, are calculated using the value of measures of central tendency. That is why measures of central tendency are also called measures of the first order. A representative value (measure of central tendency) is considered good when it is calculated using all observations and is not affected by extreme values, because these values are used to calculate further measures.
Computation of Measures of Central Tendency
Mean
Mean is the mathematical average value of a set
of data. Mean can be calculated using summation of the observations divided by number of observations. It
is the most popular measure and very easy to calculate. It is a unique value for one group, that is, there
is only one answer, which is useful when comparing between the groups. In the computation of mean, all
the observations are used. One disadvantage with mean is that it is affected by extreme values (outliers). For
example, according to Table 2, the mean MAP of the patients was 97.47, indicating that the average MAP of the patients was 97.47 mmHg.
Table 2
Descriptive statistics of the mean arterial pressure (mmHg)
Mean   SD     SE    Q1  Q2  Q3   Minimum  Maximum  Mode
97.47  11.01  2.84  88  95  107  82       116      116
SD: Standard deviation, SE: Standard error, Q1: First quartile, Q2: Second quartile, Q3: Third quartile
Median
The median is defined as the middle-most observation if the data are arranged in either increasing or decreasing order of magnitude. Thus, it is one of the observations that occupies the central place in the distribution of the data. This is also called the positional average. Extreme values (outliers) do not affect the median. It is unique, that is, there is only one median of one data set, which is useful when comparing between groups. One disadvantage of the median over the mean is that it is not as popular as the mean. For example, according to Table 2, the median MAP of the patients was 95 mmHg, indicating that 50% of the observations are less than or equal to 95 mmHg and the rest of the 50% of observations are equal to or greater than 95 mmHg.
Mode
Mode is the value that occurs most frequently in a set of observations, that is,
the observation, which has maximum frequency is called mode. In a data set, it is possible to have multiple
modes or no mode exists. Due to the possibility of the multiple modes for one data set, it is not
used to compare between the groups. For example, according to Table 2, the maximum repeated value is 116 mmHg (occurring 2 times) while the rest are repeated one time only, so the mode of the data is 116 mmHg.
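The three measures can be verified against Table 2 with a minimal sketch using the standard-library statistics module and the Table 1 MAP values:

# Mean, median, and mode of the mean arterial pressure data.
from statistics import mean, median, mode

map_values = [82, 84, 85, 88, 92, 93, 94, 95, 98, 100, 102, 107, 110, 116, 116]

print(round(mean(map_values), 2))  # 97.47 mmHg
print(median(map_values))          # 95 mmHg (the 8th of 15 ordered values)
print(mode(map_values))            # 116 mmHg (the only value occurring twice)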
Measures of Dispersion
Measures of dispersion show how spread out the values in a data set are (their variation); they are also called measures of variation. They quantitatively express the degree of variation or dispersion of values in a population or in a sample; more specifically, they show the lack of representativeness of a measure of central tendency, usually the mean or median. These are indices that give us an idea about the homogeneity or heterogeneity of the data. Common measures are variance, SD, standard error, quartile, interquartile range, percentile, range, and CV.
Computation of Measures of Dispersion
Standard deviation and variance
The SD is a measure of how spread out values are from the mean value. Its symbol is σ (the Greek letter sigma) or s. It is called the SD because we have taken a standard value (the mean) to measure the dispersion:

s = √[Σ(xᵢ − x̄)² / (n − 1)]

where xᵢ is an individual value and x̄ is the mean value. If the sample size is ≤30, we use "n − 1" in the denominator; for a sample size >30, use "n" in the denominator. The variance (s²) is defined as the average of the squared differences from the mean; it is equal to the square of the SD (s).
For example, in the above, the SD is 11.01 mmHg (using n − 1, since n <30), which shows that the approximate average deviation between the mean value and the individual values is 11.01. Similarly, the variance is 121.22 [i.e., (11.01)²], which shows that the average squared deviation between the mean value and the individual values is 121.22 [Table 2].
Standard error
The standard error is the approximate difference between the sample mean and the population mean. When we draw many samples from the same population with the same sample size through a random sampling technique, the SD among the sample means is called the standard error. If the sample SD and the sample size are given, we can calculate the standard error for the sample using the formula:

Standard error = sample SD / √(sample size)

For example, according to Table 2, the standard error is 2.84 mmHg, which shows that the average difference between the sample means and the population mean is 2.84 mmHg [Table 2].
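A minimal sketch verifying the SD, variance, and standard error reported in Table 2 (statistics.stdev and statistics.variance use the n − 1 denominator, matching the rule above for n ≤ 30):

# Sample SD, sample variance, and standard error of the MAP data.
from math import sqrt
from statistics import stdev, variance

map_values = [82, 84, 85, 88, 92, 93, 94, 95, 98, 100, 102, 107, 110, 116, 116]

s = stdev(map_values)           # ~11.01 mmHg
s2 = variance(map_values)       # ~121.1 (the text's 121.22 squares the rounded SD)
se = s / sqrt(len(map_values))  # ~2.84 mmHg
print(round(s, 2), round(s2, 2), round(se, 2))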
Quartiles and interquartile range
The quartiles are the three points that divide the data set into four equal groups, each group comprising a quarter of the data, for a set of data values arranged in either ascending or descending order. Q1, Q2, and Q3 represent the first, second, and third quartile's value. The ith quartile is the [i × (n + 1)/4]th observation, where i = 1, 2, 3. For example, in the above, the first quartile Q1 = [1 × (n + 1)/4] = 4th observation from the initial = 88 mmHg (i.e., the first 25% of the observations are ≤88 and the remaining 75% are ≥88). Q2 (also called the median) = [2 × (n + 1)/4] = 8th observation from the initial = 95 mmHg, that is, the first 50% of the observations are ≤95 and the remaining 50% are ≥95. Similarly, Q3 = [3 × (n + 1)/4] = 12th observation from the initial = 107 mmHg, which indicates that the first 75% of the observations are ≤107 and the remaining 25% are ≥107. The interquartile range (IQR) is a measure of variability, also
called the midspread or middle 50%, which is a measure of statistical dispersion, being equal to the difference between the 75th (Q3 or third quartile) and 25th (Q1 or first quartile) percentiles. For example, in the above example, the three quartiles, that is, Q1, Q2, and Q3, are 88, 95, and 107, respectively. As the first and third quartiles in the data are 88 and 107, the IQR of the data is 19 mmHg (107 − 88) [Table 2].
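The [i × (n + 1)/4] position rule above can be written as a small helper; the same function with a divisor of 100 gives percentiles, which the next section uses (a minimal sketch, not from the original text):

# Value at a 1-based, possibly fractional, ordered position (linear
# interpolation between neighbours when the position is fractional).
def position_value(sorted_data, pos):
    i = int(pos)
    frac = pos - i
    if frac == 0 or i >= len(sorted_data):
        return sorted_data[i - 1]
    return sorted_data[i - 1] + frac * (sorted_data[i] - sorted_data[i - 1])

map_values = [82, 84, 85, 88, 92, 93, 94, 95, 98, 100, 102, 107, 110, 116, 116]
n = len(map_values)

q1 = position_value(map_values, 1 * (n + 1) / 4)      # 4th value  = 88 mmHg
q2 = position_value(map_values, 2 * (n + 1) / 4)      # 8th value  = 95 mmHg
q3 = position_value(map_values, 3 * (n + 1) / 4)      # 12th value = 107 mmHg
p10 = position_value(map_values, 10 * (n + 1) / 100)  # 1.6th value = 83.2 mmHg
print(q1, q2, q3, q3 - q1, p10)                       # IQR = 19 mmHg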
Percentile
The percentiles are the 99 points that divide the data set into 100 equal groups, each group comprising 1% of the data, for a set of data values arranged in either ascending or descending order. The 25th percentile is the first quartile, the 50th percentile is the second quartile (also called the median value), while the 75th percentile is the third quartile of the data. The ith percentile is the [i × (n + 1)/100]th observation, where i = 1, 2, 3, …, 99. Example: In the above, the 10th percentile = [10 × (n + 1)/100] = 1.6th observation from the initial, which falls between the first and second observations from the initial = 1st observation + 0.6 × (difference between the second and first observation) = 83.20 mmHg, which indicates that 10% of the data are ≤83.20 and the remaining 90% of observations are ≥83.20.
Coefficient of Variation
Interpretation of SD without considering
the magnitude of the mean of the sample or population may be misleading. To overcome this problem, the CV gives the result in terms of the ratio of the SD with respect to its mean value, expressed in %: CV = 100 × (SD/mean). For example, in the above, the coefficient of variation is 11.3%, which indicates that the SD is 11.3% of its mean value [i.e., 100 × (11.01/97.47)] [Table 2].
Range
The difference between the largest and smallest observations is called the range. If A and B are the smallest and largest observations in a data set, then the range (R) is equal to the difference of the largest and smallest observations, that is, R = B − A. For example, in the above, the minimum and maximum observations in the data are 82 mmHg and 116 mmHg. Hence, the range of the data is 34 mmHg (116 − 82) [Table 2].
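A minimal sketch verifying the range and the coefficient of variation for the Table 1 data:

# Range and CV of the MAP data.
from statistics import mean, stdev

map_values = [82, 84, 85, 88, 92, 93, 94, 95, 98, 100, 102, 107, 110, 116, 116]

data_range = max(map_values) - min(map_values)   # 116 - 82 = 34 mmHg
cv = 100 * stdev(map_values) / mean(map_values)  # ~11.3%
print(data_range, round(cv, 1))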
Descriptive statistics can be calculated in the statistical software "SPSS" (Analyze → Descriptive Statistics → Frequencies or Descriptives).
Normality of data and testing
The standard normal distribution is the most important continuous probability distribution; it has a bell-shaped density, is described by its mean and SD, and extreme values in the data set have no significant impact on the mean value. If continuous data follow a normal distribution, then 68.2%, 95.4%, and 99.7% of the observations lie between mean ± 1 SD, mean ± 2 SD, and mean ± 3 SD, respectively.
Why test the normality of data?
Various statistical methods used for
data analysis make assumptions about normality, including correlation, regression, t-tests, and analysis of
variance. The central limit theorem states that when the sample size is 100 or more observations, violation of normality is not a major issue, although for meaningful conclusions the assumption of normality should be met irrespective of the sample size. If continuous data follow a normal distribution, then we present
this data as a mean value.
Further, this mean value is used to compare between/among the groups to calculate the significance level (P
value). If our data are not normally distributed, resultant mean is not a representative value of our data. A
wrong selection of the representative value of a data set and further calculated significance level using this
representative value might give wrong interpretation.[9] That is why, first we test the normality of the data, then we
decide whether mean is applicable as representative value of the data or not. If applicable, then means
are compared using parametric test otherwise medians are used to compare the groups, using
nonparametric methods.
Methods used for testing the normality of data
An assessment of the normality of
data is a prerequisite for many statistical tests because normal data is an underlying assumption in
parametric testing. There are two main methods of assessing normality: graphical and numerical (including statistical tests). Statistical tests have the advantage of making an objective judgment of normality but have the disadvantage of sometimes not being sensitive enough at low sample sizes or overly sensitive to large sample sizes. Graphical interpretation has the advantage of allowing good judgment to assess normality in situations when numerical tests might be over- or under-sensitive, although normality assessment using graphical methods needs a great deal of experience to avoid wrong interpretations. If we do not have good experience, it is best to rely on the numerical methods. There are various methods available to test the normality of continuous data; the most popular methods are the Shapiro-Wilk test, Kolmogorov-Smirnov
test, skewness, kurtosis, histogram, box plot, P-P plot, Q-Q plot, and mean with SD. The two well-known tests of normality, namely the Kolmogorov-Smirnov test and the Shapiro-Wilk test, are the most widely used methods to test the normality of data. Normality tests can be conducted in the statistical software "SPSS" (Analyze → Descriptive Statistics → Explore → Plots → Normality plots with tests). The Shapiro-Wilk test is the more appropriate method for small sample sizes (<50 samples), although it can also handle larger sample sizes, while the Kolmogorov-Smirnov test is used for n >50. For both of the above tests, the null hypothesis states that the data are taken from a normally distributed population. When P > 0.05, the null hypothesis is accepted and the data are called normally distributed (a small sketch of this rule follows).
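A minimal sketch of this decision rule (assuming SciPy is available; scipy.stats.shapiro returns the test statistic and the P value):

# Shapiro-Wilk normality test on the Table 1 MAP data (n = 15 < 50).
from scipy import stats

map_values = [82, 84, 85, 88, 92, 93, 94, 95, 98, 100, 102, 107, 110, 116, 116]

stat, p = stats.shapiro(map_values)
# Null hypothesis: the data come from a normally distributed population.
if p > 0.05:
    print(f"P = {p:.3f} > 0.05: treat as normal, compare means (parametric)")
else:
    print(f"P = {p:.3f} <= 0.05: not normal, compare medians (nonparametric)")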
normally distributed. Skewness is a meacut of symmetry. or more Art Lisbly. the lack of symmetry of the
Skewness is a measure of symmetry, or more precisely, of the lack of symmetry, relative to the normal distribution. Kurtosis is a measure of the peakedness of a distribution. The original kurtosis value is sometimes called kurtosis (proper); most statistical packages such as SPSS report "excess" kurtosis (also called kurtosis [excess]), obtained by subtracting 3 from the kurtosis (proper). A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. If the mean, median, and mode of a distribution coincide, it is called a symmetric distribution, that is, skewness = 0 and kurtosis (excess) = 0. A distribution is called approximately normal if the skewness or kurtosis (excess) of the data lie between -1 and +1. This is, however, a less reliable method for small-to-moderate sample sizes (n < 300) because it does not adjust for the standard error (as the sample size increases, the standard error decreases). To overcome this problem, a z-test is applied for normality testing using skewness and kurtosis: a z score is obtained by dividing the skewness value or excess kurtosis value by its standard error. For a small sample size (n < 50), an absolute z value below 1.96 is sufficient to establish

normality of the data. However, for medium-sized samples (50 <= n < 300), an absolute z value below 3.29 lets us conclude that the distribution of the sample is normal. For sample sizes above 300, normality of the data is judged from the histogram and the absolute values of skewness and kurtosis: an absolute skewness value below 2 or an absolute kurtosis (excess) below 4 may be used as reference values for determining considerable normality.
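To illustrate the z-score approach just described, here is a hedged sketch; the standard-error formulas are the usual textbook approximations (the text does not spell them out), and the data are hypothetical. Since the SEs depend only on n, with n = 15 they come out to 0.580 for skewness and 1.12 for excess kurtosis, matching Table 3 below.

    # Sketch: z-test for normality via skewness and kurtosis
    import math
    import numpy as np
    from scipy import stats

    x = np.array([85, 92, 101, 97, 88, 95, 103, 99, 91, 107,
                  96, 89, 100, 94, 98])  # hypothetical values
    n = len(x)

    skew = stats.skew(x)        # sample skewness
    exkurt = stats.kurtosis(x)  # excess kurtosis (kurtosis proper minus 3)

    # Textbook standard-error approximations (an assumption of this sketch)
    se_skew = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2.0 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

    z_skew, z_kurt = skew / se_skew, exkurt / se_kurt
    cutoff = 1.96 if n < 50 else 3.29  # thresholds given in the text
    print(f"z(skewness) = {z_skew:.2f}, z(kurtosis) = {z_kurt:.2f}, cutoff = +/-{cutoff}")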
A histogram is an estimate of the probability distribution of a continuous variable. If the graph is approximately bell-shaped and symmetric about the mean, we can assume normally distributed data. In statistics, a Q-Q plot is a scatterplot created by plotting two sets of quantiles (observed and expected) against one another. For normally distributed data, the observed quantiles approximate the expected quantiles, that is, they are statistically equal. A P-P plot (probability-probability plot or percent-percent plot) is a graphical technique for assessing how closely two data sets (observed and expected) agree; it forms an approximately straight line when the data are normally distributed, and departures from this straight line indicate departures from normality. A box plot is another way to assess the normality of the data. It shows the median as a horizontal line inside the box and the IQR (the range between the first and third quartiles) as the length of the box. The whiskers (lines extending from the top and bottom of the box) represent the minimum and maximum values when they lie within 1.5 times the IQR from either end of the box (i.e., Q1 - 1.5 x IQR and Q3 + 1.5 x IQR). Scores more than 1.5 times and more than 3 times the IQR beyond the box fall outside the box plot and are considered outliers and extreme outliers, respectively. A box plot that is symmetric, with the median line at approximately the center of the box and with symmetric whiskers, indicates that the data may have come from a normal distribution. If many outliers are present in the data set, either the outliers need to be removed or the data should be treated as nonnormally distributed.
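The graphical checks can be sketched as well; the example below assumes matplotlib and scipy and uses simulated data.

    # Sketch: histogram and Q-Q plot for a visual normality check
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    x = np.random.default_rng(0).normal(loc=97, scale=12, size=200)  # simulated data

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
    ax1.hist(x, bins=15)                      # roughly bell-shaped if normal
    ax1.set_title("Histogram")
    stats.probplot(x, dist="norm", plot=ax2)  # points hug the line if normal
    ax2.set_title("Q-Q plot")
    plt.tight_layout()
    plt.show()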
Another method of assessing the normality of the data is the relative value of the SD with respect to the mean. If the SD is less than half of the mean (i.e., CV < 50%), the data are considered normal. This is a quick method to test normality; however, it should only be used when the sample size is at least 50. For example, in Table 1, data on the MAP of 15 patients are given, and the normality of these data was assessed. Results showed that the data were normally distributed, as the skewness (0.398) and kurtosis (-0.825) were individually within +/-1, and the critical ratios (z values) of the skewness (0.686) and kurtosis (-0.737) were within +/-1.96, also evidence of a normal distribution. Similarly, the Shapiro-Wilk test (P = 0.454) and the Kolmogorov-Smirnov test (P = 0.200) were statistically insignificant, that is, the data were considered normally distributed. As the sample size is <50, we have to take the Shapiro-Wilk test result, and the Kolmogorov-Smirnov test result must be avoided, although both methods indicated that the data were normally distributed. As the SD of the MAP was less than half of the mean value (SD < 48.73), the data were considered normally distributed by this criterion as well, although with a sample size <50 we should avoid this method, because it should be used only when the sample size is at least 50 [Tables 2 and 3].
Table 3: Skewness, kurtosis, and normality tests for mean arterial pressure (mmHg)

Variable   Skewness (Value / SE / Z)   Kurtosis (Value / SE / Z)   K-S test with Lilliefors correction   Shapiro-Wilk test
MAP        0.398 / 0.580 / 0.686       -0.825 / 1.12 / -0.737      P = 0.200                             P = 0.454

K-S: Kolmogorov-Smirnov; SD: standard deviation; SE: standard error.
Descriptive statistics are a statistical method for summarizing data in a valid and meaningful way. A good and appropriate measure is important not only for the data but also for the statistical methods used for hypothesis testing. For continuous data, testing for normality is very important because the normality status determines the measures of central tendency and dispersion and the selection of a parametric or nonparametric test. Although there are various methods for normality testing, for a small sample size (n < 50) the Shapiro-Wilk test should be used, as it has more power to detect nonnormality; it is the most popular and widely used method. When the sample size (n) is at least 50, any of the other methods (Kolmogorov-Smirnov test, skewness, kurtosis, z value of the skewness and kurtosis, histogram, box plot, P-P plot, Q-Q plot, and SD with respect to the mean) can be used to test the normality of continuous data.
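Pulling the section's decision rule together, here is a hedged sketch of the workflow: choose the test by sample size, then report the mean if normality holds and the median otherwise. The helper function is hypothetical, not a standard API.

    # Sketch of the decision rule described above
    import numpy as np
    from scipy import stats

    def representative_value(x, alpha=0.05):
        """Return ('mean', value) if x passes a normality test, else ('median', value)."""
        x = np.asarray(x, dtype=float)
        if len(x) < 50:
            _, p = stats.shapiro(x)  # Shapiro-Wilk for small samples
        else:                        # K-S (approximate without a Lilliefors correction)
            _, p = stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1)))
        return ('mean', x.mean()) if p > alpha else ('median', float(np.median(x)))

    print(representative_value([85, 92, 101, 97, 88, 95, 103, 99, 91, 107]))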

Q.5 How is mean different from median? Explain the role of level of measurement in measures of central tendency.

ANS: The difference between mean and median is explained in detail here. In statistics, the mean is the average of a set of data and the median is the middle value of the arranged set of data. Both values have their own importance and play a distinct role in data collection and organisation. Let us see the other differences between them with the help of definitions, a table, and an example.

Definition of Mean and Median

Mean: Mathematically, the mean is the average of a list of numbers, used to describe the central tendency of that list. The method to calculate the mean is simple enough: the values need to be added together and then divided by the number of items in the overall sample. Assume data values x1, x2, ..., xn. The formula for calculating the mean is then: Mean = (x1 + x2 + ... + xn) / n, where n is the total number of sample items.

Median: The median, on the other hand, can simply be defined as the number that is found in the middle of the set. The median is a quantity used for separating the available sample into two halves: the higher half and the lower half. To find the median of the given data, first arrange the given set of numbers in ascending order, and then take the middle value from the centre of the distribution. This works if we have an odd number of observations.
But in the case of an even number of observations, there is no single middle value; so, in this case, we add the two numbers in the middle and then divide by 2, and the obtained value is taken as the median. What is the difference between mean and median? The major differences are listed below. Go through the following differences.

Difference between Mean and Median

Mean: The arithmetic average of a given set of numbers is called the mean.
Median: The value that separates the higher half from the lower half, usually of a probability distribution, is termed the median.

Mean: The mean is applied mainly to normal distributions.
Median: The primary application of the median is to skewed distributions.

Mean: There are many external factors (such as extreme values) that limit the use of the mean.
Median: The median is much more robust and reliable for measuring uneven data.

Mean: The mean is found by adding all the values and dividing the total by the number of values.
Median: The median is found by listing all the numbers in ascending order and then finding the number at the centre of the distribution.

Mean: The mean is considered an arithmetic average.
Median: The median is considered a positional average.

Mean: The mean is highly sensitive to outlier data.
Median: The median is not very sensitive to outlier data.

Mean: The mean defines the centre of gravity of the data set.
Median: The median defines the midpoint of the data set.

Thus, these are the major differences between mean and median, and it is essential to know them; the outlier-sensitivity difference is illustrated numerically below.
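A quick numerical illustration of the outlier-sensitivity rows above, using Python's standard statistics module and a made-up data set:

    # Sketch: the mean is pulled by an outlier, the median barely moves
    import statistics

    scores = [12, 14, 15, 16, 18]
    with_outlier = scores + [95]  # one extreme value added

    print(statistics.mean(scores), statistics.median(scores))              # 15.0 and 15
    print(statistics.mean(with_outlier), statistics.median(with_outlier))  # ~28.3 and 15.5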

Part b: Measures of central tendency tell us what is common or typical about our variable. Three measures of central tendency are the mode, the median, and the mean. The mode is used
almost exclusively with nominal-level data, as it is the only measure of central tendency available for such variables. In statistics, a central tendency (or measure of central tendency) is a central or typical value for a probability distribution. It may also be called a center or location of the distribution. Colloquially, measures of central tendency are often called averages. The term central tendency dates from the late 1920s. The most common measures of central tendency are the arithmetic mean, the median, and the mode. A middle tendency can be calculated for either a finite set of values or for a theoretical distribution, such as the normal distribution. Occasionally authors use central tendency to denote the tendency of quantitative data to cluster around some central value. The central tendency of a distribution is typically contrasted with its dispersion or variability; dispersion and central tendency are the often-characterized properties of distributions. Analysts may judge whether data have a strong or a weak central tendency based on their dispersion.

Measures: The following may be applied to one-dimensional data. Depending on the circumstances, it may be appropriate to transform the data before calculating a central tendency; examples are squaring the values or taking logarithms. Whether a transformation is appropriate, and what it should be, depends heavily on the data being analyzed.

Arithmetic mean (or simply, mean): the sum of all measurements divided by the number of observations in the data set.
Median: the middle value that separates the higher half from the lower half of the data set. The median and the mode are the only measures of central tendency that can be used for ordinal data, in which values are ranked relative to each other but are not measured absolutely.
Mode: the most frequent value in the data set. This is the only central tendency measure that can be used with nominal data, which have purely qualitative category assignments.
Geometric mean: the nth root of the product of the n data values. This measure is valid only for data that are measured absolutely on a strictly positive scale.
Harmonic mean: the reciprocal of the arithmetic mean of the reciprocals of the data values. This measure, too, is valid only for data that are measured absolutely on a strictly positive scale.
Weighted arithmetic mean: an arithmetic mean that incorporates weighting of certain data elements.
Truncated mean (or trimmed mean): the arithmetic mean of the data values after a certain number or proportion of the highest and lowest data values have been discarded.
Interquartile mean: a truncated mean based on the data within the interquartile range.
Midrange: the arithmetic mean of the maximum and minimum values of a data set.

Midhinge: the arithmetic mean of the first and third quartiles.
Trimean: the weighted arithmetic mean of the median and the two quartiles.
Winsorized mean: an arithmetic mean in which extreme values are replaced by values closer to the median.

Any of the above may be applied to each dimension of multi-dimensional data, but the results may not be invariant to rotations of the multi-dimensional space. In addition, there are:

Geometric median: the point minimizing the sum of distances to the data points. This is the same as the median when applied to one-dimensional data, but it is not the same as taking the median of each dimension independently, and it is not invariant to different rescalings of the different dimensions.
Quadratic mean (often known as the root mean square): useful in engineering, but not often used in statistics, because it is not a good indicator of the center of the distribution when the distribution includes negative values.
Simplicial depth: the probability that a randomly chosen simplex with vertices from the given distribution will contain the given center.
Tukey median: a point with the property that every halfspace containing it also contains many sample points.

A short computational sketch of several of these measures follows below.
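Most of the one-dimensional measures listed above are one-liners in Python's numpy/scipy stack (assumed available for this sketch; the data values are arbitrary):

    # Sketch: several of the central-tendency measures listed above
    import numpy as np
    from scipy import stats

    x = np.array([2.0, 3.0, 5.0, 7.0, 11.0, 13.0, 90.0])

    print("arithmetic mean:", np.mean(x))
    print("median:         ", np.median(x))
    print("geometric mean: ", stats.gmean(x))           # strictly positive data only
    print("harmonic mean:  ", stats.hmean(x))           # strictly positive data only
    print("trimmed mean:   ", stats.trim_mean(x, 0.2))  # drop 20% from each tail
    print("midrange:       ", (x.min() + x.max()) / 2)
    print("winsorized mean:", stats.mstats.winsorize(x, limits=(0.2, 0.2)).mean())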
Solutions to variational problems: Several measures of central tendency can be characterized as solving a variational problem, in the sense of the calculus of variations, namely minimizing variation from the center. That is, given a measure of statistical dispersion, one asks for the measure of central tendency that minimizes it, such that variation from the center is minimal among all choices of center. In a quip, "dispersion precedes location". These measures are initially defined in one dimension, but can be generalized to multiple dimensions. This center may or may not be unique. In the sense of Lp spaces, the correspondence is:
dispersion                   central tendency
variation ratio              mode [a]
average absolute deviation   median (geometric median) [b]
standard deviation           mean (centroid) [c]
maximum deviation            midrange [d]

The associated functions are called p-norms: respectively, the 0-"norm", 1-norm, 2-norm, and infinity-norm. The function corresponding to the L0 space is not a norm, and is thus often referred to in quotes: 0-"norm".

In equations, for a given (finite) data set X, thought of as a vector x = (x1, ..., xn), the dispersion about a point c is the distance from x to the constant vector c = (c, ..., c) in the p-norm, normalized by the number of points n:

f_p(c) = \|\mathbf{x} - \mathbf{c}\|_p = \left( \frac{1}{n} \sum_{i=1}^{n} |x_i - c|^{p} \right)^{1/p}

For p = 0 and p = \infty these functions are defined by taking limits, respectively as p \to 0 and p \to \infty. For p = 0 the limiting values are 0^0 = 0 and a^0 = 1 for a \neq 0, so the difference becomes simple equality, and the 0-"norm" counts the number of unequal points. For p = \infty the largest number dominates, and thus the infinity-norm is the maximum difference.
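This correspondence can be checked numerically. The sketch below grid-searches for the center c minimizing each dispersion; the data set and grid resolution are arbitrary choices for illustration.

    # Sketch: which center c minimizes the p-norm dispersion f_p(c)?
    import numpy as np

    x = np.array([1.0, 2.0, 2.0, 4.0, 10.0])
    grid = np.linspace(x.min(), x.max(), 10001)  # candidate centers

    def dispersion(c, p):
        return np.mean(np.abs(x - c) ** p) ** (1.0 / p)

    best_l1 = grid[np.argmin([dispersion(c, 1) for c in grid])]
    best_l2 = grid[np.argmin([dispersion(c, 2) for c in grid])]
    best_linf = grid[np.argmin([np.max(np.abs(x - c)) for c in grid])]

    print(best_l1, np.median(x))               # ~2.0: the median minimizes L1
    print(best_l2, x.mean())                   # ~3.8: the mean minimizes L2
    print(best_linf, (x.min() + x.max()) / 2)  # ~5.5: the midrange minimizes L-infinity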
Uniqueness: The mean (L2 center) and midrange (L-infinity center) are unique (when they exist), while the median (L1 center) and mode (L0 center) are not in general unique. This can be understood in terms of the convexity of the associated functions (coercive functions). The 2-norm and infinity-norm are strictly convex, and thus (by convex optimization) the minimizer is unique (if it exists), and exists for bounded distributions. Thus, the standard deviation about the mean is lower than the standard deviation about any other point, and the maximum deviation about the midrange is lower than the maximum deviation about any other point. The 1-norm is not strictly convex, whereas strict convexity is needed to ensure uniqueness of the minimizer. Correspondingly, the median (in this sense of minimizing) is not in general unique, and in fact any point between the two central points of a discrete distribution minimizes the average absolute deviation. The 0-"norm" is not convex (hence not a norm). Correspondingly, the mode is not unique; for example, in a uniform distribution any point is the mode.
Clustering: Instead of a single central point, one can ask for multiple points such that the variation from these points is minimized. This leads to cluster analysis, where each point in the data set is clustered with the nearest "center". Most commonly, using the 2-norm generalizes the mean to k-means clustering, while using the 1-norm generalizes the (geometric) median to k-medians clustering. Using the 0-"norm" simply generalizes the mode (the most common value) to using the k most common values as centers. Unlike the single-center statistics, this multi-center clustering cannot in general be computed in a closed-form expression, and instead must be computed or approximated by an iterative method; one general approach is expectation-maximization algorithms.
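As a hedged illustration of the k-means case, here is a bare-bones Lloyd iteration in one dimension (numpy only; real work would typically use a library implementation such as scikit-learn):

    # Sketch: one-dimensional k-means (Lloyd's algorithm), generalizing the mean to k centers
    import numpy as np

    def kmeans_1d(x, k, iters=100):
        # deterministic init: spread the k centers over the data's quantiles
        centers = np.quantile(x, np.linspace(0.0, 1.0, k))
        for _ in range(iters):
            # assignment step: each point joins its nearest center (2-norm)
            labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
            # update step: each center becomes the mean of its cluster
            new = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        return centers

    x = np.array([1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 9.8, 10.1, 10.0])
    print(kmeans_1d(x, k=3))  # converges to roughly [1.0, 5.07, 9.97]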
Information geometry: The notion of a "center" minimizing variation can be generalized in information geometry as a distribution that minimizes divergence (a generalized distance) from a data set. The most common case is maximum likelihood estimation, where the maximum likelihood estimate (MLE) maximizes likelihood (minimizes expected surprisal); this can be interpreted geometrically by using entropy to measure variation: the MLE minimizes cross-entropy (equivalently, relative entropy or Kullback-Leibler divergence). A simple example of this is the center of nominal data: instead of using the mode (the only single-valued "center"), one often uses the empirical measure (the frequency distribution divided by the sample size) as a "center". For example, given binary data, say heads
or tails: if a data set consists of 2 heads and 1 tail, then the mode is "heads", but the empirical measure is 2/3 heads, 1/3 tails, which minimizes the cross-entropy (total surprisal) from the data set. This perspective is also used in regression analysis, where least squares finds the solution that minimizes the distances from it, and analogously in logistic regression, where a maximum likelihood estimate minimizes the surprisal (information distance).

Relationships between the mean, median and mode: For unimodal distributions the following bound is known and is sharp:

\frac{|\theta - \nu|}{\sigma} \leq \sqrt{3}

where \mu is the mean, \nu is the median, \theta is the mode, and \sigma is the standard deviation. For every distribution,

\frac{|\nu - \mu|}{\sigma} \leq 1.
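The second bound is easy to check numerically; a minimal sketch, assuming numpy and using a deliberately skewed exponential sample:

    # Sketch: checking |median - mean| / SD <= 1 on a skewed sample
    import numpy as np

    x = np.random.default_rng(1).exponential(scale=2.0, size=10_000)
    ratio = abs(np.median(x) - x.mean()) / x.std(ddof=1)
    print(round(ratio, 3))  # about 0.3 here, comfortably below the bound of 1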