Stat 1


Population

A population is any entire collection of people, animals, plants or things from which we may collect data. It is the entire group we are interested
in, which we wish to describe or draw conclusions about.

In order to make any generalisations about a population, a sample, that is meant to be representative of the population, is often studied. For
each population there are many possible samples. A sample statistic gives information about a corresponding population parameter. For
example, the sample mean for a set of data would give information about the overall population mean.

It is important that the investigator carefully and completely defines the population before collecting the sample, including a description of the
members to be included.

Example 
The population for a study of infant health might be all children born in the UK in the 1980's. The sample might be all babies born on 7th May in
any of the years.

Sample

A sample is a group of units selected from a larger group (the population). By studying the sample it is hoped to draw valid conclusions about
the larger group.

A sample is generally selected for study because the population is too large to study in its entirety. The sample should be representative of the
general population. This is often best achieved by random sampling. Also, before collecting the sample, it is important that the researcher
carefully and completely defines the population, including a description of the members to be included.

Example 
The population for a study of infant health might be all children born in the UK in the 1980's. The sample might be all babies born on 7th May in
any of the years.

Parameter

A parameter is a value, usually unknown (and which therefore has to be estimated), used to represent a certain population characteristic. For
example, the population mean is a parameter that is often used to indicate the average value of a quantity.

Within a population, a parameter is a fixed value which does not vary. Each sample drawn from the population has its own value of any statistic
that is used to estimate this parameter. For example, the mean of the data in a sample is used to give information about the overall mean in
the population from which that sample was drawn.

Parameters are often assigned Greek letters (e.g. σ), whereas statistics are assigned Roman letters (e.g. s).

Statistic

A statistic is a quantity that is calculated from a sample of data. It is used to give information about unknown values in the corresponding
population. For example, the average of the data in a sample is used to give information about the overall average in the population from which
that sample was drawn.

It is possible to draw more than one sample from the same population and the value of a statistic will in general vary from sample to sample.
For example, the average value in a sample is a statistic. The average values in more than one sample, drawn from the same population, will not
necessarily be equal.

Statistics are often assigned Roman letters (e.g. m and s), whereas the equivalent unknown values in the population (parameters) are assigned
Greek letters (e.g. µ and σ).
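
The distinction between a fixed parameter and a statistic that varies from sample to sample can be illustrated with a short sketch. The population below is made up purely for illustration (10,000 simulated values), and the names mu, m_a and m_b are assumptions chosen to mirror the Greek/Roman convention above.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated measurements (illustrative only).
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]

# The parameter: one fixed value for the whole population
# (in a real study it is usually unknown).
mu = statistics.mean(population)

# Two separate samples of size 100 drawn from the same population.
sample_a = rng.sample(population, 100)
sample_b = rng.sample(population, 100)

# The statistic: one value per sample, used to estimate mu.
m_a = statistics.mean(sample_a)
m_b = statistics.mean(sample_b)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample A mean (statistic):   {m_a:.2f}")
print(f"sample B mean (statistic):   {m_b:.2f}")
```

The two sample means will generally differ from each other and from the population mean, while the population mean itself does not change between draws.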

Discrete Data

A set of data is said to be discrete if the values / observations belonging to it are distinct and separate, i.e. they can be counted (1, 2, 3, ...).
Examples might include the number of kittens in a litter; the number of patients in a doctor's surgery; the number of flaws in one metre of cloth;
gender (male, female); blood group (O, A, B, AB).

kmvc BSN3-1
Continuous Data

A set of data is said to be continuous if the values / observations belonging to it may take on any value within a finite or infinite interval. You
can count, order and measure continuous data. For example height, weight, temperature, the amount of sugar in an orange, the time required
to run a mile.

Quantitative Variable

A variable that takes numerical values for which arithmetic makes sense, for example, counts, temperatures, weights, amounts of
money, etc. For some variables that take numerical values, arithmetic with those values does not make sense; such variables are not
quantitative. For example, adding and subtracting social security numbers does not make sense. Quantitative variables typically have
units of measurement, such as inches, people, or pounds.

Qualitative Variable

A qualitative variable is one whose values are adjectives, such as colors, genders, nationalities, etc. Cf. QUANTITATIVE
VARIABLE and CATEGORICAL VARIABLE.

Independent Variable

In REGRESSION, the independent variable is the one that is supposed to explain the other; the term is a synonym for "explanatory
variable." Usually, one regresses the "dependent variable" on the "independent variable." There is not always a clear choice of the
independent variable. The independent variable is usually plotted on the horizontal axis. Independent in this context does not mean
the same thing as STATISTICALLY INDEPENDENT .

Variable

A numerical value or a characteristic that can differ from individual to individual. See also CATEGORICAL VARIABLE, QUALITATIVE
VARIABLE, QUANTITATIVE VARIABLE, DISCRETE VARIABLE, CONTINUOUS VARIABLE, and RANDOM VARIABLE.

Primary Data
Primary data are original data that have been collected specifically for the purpose in mind. When an authorized organization, an investigator
or an enumerator collects the data for the first time, either directly or with the help of an institution or an expert, the data thus
collected are called primary data.
Research where one gathers this kind of data is referred to as field research.
For example: data collected with a questionnaire.
Secondary Data
Secondary data are data that were originally collected for another purpose, usually by someone else, and are then reused in the present study.
Results obtained by performing statistical operations on primary data are also commonly treated as secondary data.
Research where one gathers this kind of data is referred to as desk research.
For example: data from a book.

*scales of measure

Nominal Data

A set of data is said to be nominal if the values / observations belonging to it can be assigned a code in the form of a number where the
numbers are simply labels. You can count but not order or measure nominal data. For example, in a data set males could be coded as 0, females
as 1; marital status of an individual could be coded as Y if married, N if single.
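
As a small sketch of why nominal codes can be counted but not ordered or averaged, the example below tallies a made-up list of 0/1 codes. The data and the names codes and labels are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical coded responses: the numbers are labels only (0 = male, 1 = female).
codes = [0, 1, 1, 0, 1, 1, 0, 1]
labels = {0: "male", 1: "female"}

# Counting how often each label occurs is meaningful for nominal data.
counts = Counter(labels[c] for c in codes)
print(counts["male"], counts["female"])

# Swapping the codes (male = 1, female = 0) changes any "average" of the codes
# but leaves the counts untouched, which is why only counting makes sense here.
swapped = [1 - c for c in codes]
swapped_labels = {1: "male", 0: "female"}
swapped_counts = Counter(swapped_labels[c] for c in swapped)
```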

Ordinal Data

A set of data is said to be ordinal if the values / observations belonging to it can be ranked (put in order) or have a rating scale attached. You
can count and order, but not measure, ordinal data.

The categories for an ordinal set of data have a natural order, for example, suppose a group of people were asked to taste varieties of biscuit
and classify each biscuit on a rating scale of 1 to 5, representing strongly dislike, dislike, neutral, like, strongly like. A rating of 5 indicates more
enjoyment than a rating of 4, for example, so such data are ordinal.

However, the distinction between neighbouring points on the scale is not necessarily always the same. For instance, the difference in
enjoyment expressed by giving a rating of 2 rather than 1 might be much less than the difference in enjoyment expressed by giving a rating of 4
rather than 3.

Interval Scale

An interval scale is a scale of measurement where the distance between any two adjacent units of measurement (or 'intervals') is the same but
the zero point is arbitrary. Scores on an interval scale can be added and subtracted but cannot be meaningfully multiplied or divided. For
example, the time interval between the starts of years 1981 and 1982 is the same as that between 1983 and 1984, namely 365 days. The zero
point, year 1 AD, is arbitrary; time did not begin then. Other examples of interval scales include the heights of tides, and the measurement of
longitude.
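
A brief sketch of why differences are meaningful on an interval scale while ratios are not: moving the arbitrary zero point leaves differences unchanged but alters ratios. The second calendar and its 500-year offset are hypothetical, chosen only for illustration.

```python
# Two years expressed on the AD calendar.
years_ad = [1000, 2000]

# The same two years on a hypothetical calendar whose zero point lies 500 years earlier.
offset = 500
years_shifted = [y + offset for y in years_ad]

# Differences do not depend on where zero is placed.
diff_ad = years_ad[1] - years_ad[0]
diff_shifted = years_shifted[1] - years_shifted[0]

# Ratios do depend on it, so "twice as late" has no real meaning.
ratio_ad = years_ad[1] / years_ad[0]
ratio_shifted = years_shifted[1] / years_shifted[0]

print(diff_ad, diff_shifted)
print(ratio_ad, ratio_shifted)
```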

LOGICAL STEPS IN STATISTICAL INVESTIGATION

Saturday, February 14, 2009


1. Defining the problem
2. Gathering relevant information
3. Presenting/organizing data
4. Analyzing data
5. Interpreting results

Sampling Techniques

Random Sampling

Random sampling is a sampling technique where we select a group of subjects (a sample) for study from a larger group (a population). Each
individual is chosen entirely by chance and each member of the population has a known, but possibly non-equal, chance of being included in
the sample. By using random sampling, the likelihood of bias is reduced.

Simple Random Sampling

Simple random sampling is the basic sampling technique where we select a group of subjects (a sample) for study from a larger group (a
population). Each individual is chosen entirely by chance and each member of the population has an equal chance of being included in the
sample. Every possible sample of a given size has the same chance of selection; i.e. each member of the population is equally likely to be
chosen at any stage in the sampling process.
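
A minimal sketch of simple random sampling, assuming the population is available as a complete list (a sampling frame). The frame of 500 ID numbers and the helper name simple_random_sample are hypothetical; random.sample draws without replacement so that every subset of the requested size is equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical sampling frame: ID numbers 1..500.
frame = list(range(1, 501))
sample = simple_random_sample(frame, 20, seed=1)
print(sorted(sample))
```

Passing a seed only makes the draw repeatable for checking; in a real study the draw would normally be left unseeded.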

Convenience, Haphazard or Accidental sampling 

Members of the population are chosen based on their relative ease of access. Sampling friends, co-workers, or shoppers at a single mall are all
examples of convenience sampling.

Judgmental sampling or Purposive sampling 

The researcher chooses the sample based on who they think would be appropriate for the study. This is used primarily when there is a limited
number of people that have expertise in the area being researched.

Multistage sampling 
Multistage sampling is a complex form of cluster sampling. Using all the sample elements in all the selected clusters may be prohibitively
expensive or unnecessary. Under these circumstances, multistage cluster sampling becomes useful. Instead of using all the elements contained in
the selected clusters, the researcher randomly selects elements from each cluster. Constructing and selecting the clusters is the first stage.
Deciding what elements within each selected cluster to use is the second stage. The technique is used frequently when a complete list of all
members of the population does not exist or would be inappropriate to use.
In some cases, several levels of cluster selection may be applied before the final sample elements are reached. For example, household surveys
conducted by the Australian Bureau of Statistics begin by dividing metropolitan regions into 'collection districts', and selecting some of these
collection districts (first stage). The selected collection districts are then divided into blocks, and blocks are chosen from within each selected
collection district (second stage). Next, dwellings are listed within each selected block, and some of these dwellings are selected (third stage).
This method means that it is not necessary to create a list of every dwelling in the region, only for selected blocks. In remote areas, an
additional stage of clustering is used, in order to reduce travel requirements.[1]
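
The two-stage case described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any particular agency: the district names, dwelling labels and the helper two_stage_sample are all hypothetical.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick elements within each."""
    rng = random.Random(seed)
    stage_one = rng.sample(sorted(clusters), n_clusters)
    result = {}
    for name in stage_one:
        members = clusters[name]
        # Never ask for more elements than the cluster contains.
        k = min(n_per_cluster, len(members))
        result[name] = rng.sample(members, k)
    return result

# Hypothetical districts, each listing its dwellings.
districts = {
    f"district {d}": [f"dwelling {d}-{i}" for i in range(30)]
    for d in range(1, 11)
}
chosen = two_stage_sample(districts, n_clusters=3, n_per_cluster=5, seed=3)
for district, dwellings in chosen.items():
    print(district, dwellings)
```

Only the selected districts need a list of dwellings, which is the practical advantage the text describes.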

Although cluster sampling and stratified sampling bear some superficial similarities, they are substantially different. In stratified sampling, a
random sample is drawn from each of the strata, whereas in cluster sampling only the selected clusters are studied, either in a single stage or in
multiple stages.

Stratified Sampling

There may often be factors which divide up the population into sub-populations (groups / strata) and we may expect the measurement of
interest to vary among the different sub-populations. This has to be accounted for when we select a sample from the population in order that
we obtain a sample that is representative of the population. This is achieved by stratified sampling.

A stratified sample is obtained by taking samples from each stratum or sub-group of a population.

When we sample a population with several strata, we generally require that the proportion of each stratum in the sample should be the same
as in the population.

Stratified sampling techniques are generally used when the population is heterogeneous, or dissimilar, where certain homogeneous, or similar,
sub-populations can be isolated (strata). Simple random sampling is most appropriate when the entire population from which the sample is
taken is homogeneous. Some reasons for using stratified sampling over simple random sampling are:

a. the cost per observation in the survey may be reduced;
b. estimates of the population parameters may be wanted for each sub-population;
c. increased accuracy at given cost.

Example 
Suppose a farmer wishes to work out the average milk yield of each cow type in his herd which consists of Ayrshire, Friesian, Galloway and
Jersey cows. He could divide up his herd into the four sub-groups and take samples from these.
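
A sketch of stratified sampling with proportional allocation, using the farmer's herd as the setting. The herd sizes, cow labels and the helper stratified_sample are hypothetical; the point is only that each stratum contributes to the sample in the same proportion as it appears in the population.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Proportional allocation: sample each stratum in proportion to its size."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding may make the overall size differ slightly from total_n.
        k = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical herd of 200 cows split by breed.
herd = {
    "Ayrshire": [f"A{i}" for i in range(40)],
    "Friesian": [f"F{i}" for i in range(80)],
    "Galloway": [f"G{i}" for i in range(20)],
    "Jersey": [f"J{i}" for i in range(60)],
}
chosen = stratified_sample(herd, total_n=20, seed=7)
for breed, cows in chosen.items():
    print(breed, len(cows))
```

If separate estimates are wanted for each breed, as in the farmer's case, a fixed number per stratum could be used instead of proportional allocation.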

Cluster Sampling

Cluster sampling is a sampling technique where the entire population is divided into groups, or clusters, and a random sample of these clusters
is selected. All observations in the selected clusters are included in the sample.

Cluster sampling is typically used when the researcher cannot get a complete list of the members of a population they wish to study but can get
a complete list of groups or 'clusters' of the population. It is also used when a random sample would produce a list of subjects so widely
scattered that surveying them would prove to be far too expensive, for example, people who live in different postal districts in the UK.

This sampling technique may well be more practical and/or economical than simple random sampling or stratified sampling.

Example 
Suppose that the Department of Agriculture wishes to investigate the use of pesticides by farmers in England. A cluster sample could be taken
by identifying the different counties in England as clusters. A sample of these counties (clusters) would then be chosen at random, so all farmers
in those counties selected would be included in the sample. It can be seen here then that it is easier to visit several farmers in the same county
than it is to travel to each farm in a random sample to observe the use of pesticides.
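
The county example can be sketched in a few lines. The county names, farm labels and the helper cluster_sample are hypothetical; the sketch selects whole clusters at random and then keeps every member of each selected cluster.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick clusters at random, then include every member of those clusters."""
    rng = random.Random(seed)
    selected = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in selected for m in clusters[name]]
    return selected, members

# Hypothetical counties (clusters) and the farms within each.
counties = {
    "County A": ["farm A1", "farm A2", "farm A3"],
    "County B": ["farm B1", "farm B2"],
    "County C": ["farm C1", "farm C2", "farm C3", "farm C4"],
    "County D": ["farm D1"],
}
selected, farms = cluster_sample(counties, n_clusters=2, seed=5)
print(selected)
print(farms)
```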

Quota Sampling

Quota sampling is a method of sampling widely used in opinion polling and market research. Interviewers are each given a quota of subjects of a
specified type to attempt to recruit. For example, an interviewer might be told to go out and select 20 adult men and 20 adult women, 10
teenage girls and 10 teenage boys so that they could interview them about their television viewing.

It suffers from a number of methodological flaws, the most basic of which is that the sample is not a random sample and therefore the
sampling distributions of any statistics are unknown.
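
A sketch of how an interviewer might fill the quotas in the example above. The stream of passers-by is made up, and the helper quota_sample is hypothetical. Note that people are recruited in the order they happen to be encountered, not at random, which is the flaw just described.

```python
def quota_sample(stream, quotas):
    """Recruit people in arrival order until every quota is filled."""
    remaining = dict(quotas)
    recruited = []
    for person, group in stream:
        if remaining.get(group, 0) > 0:
            recruited.append((person, group))
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas are full
    return recruited

quotas = {"adult man": 20, "adult woman": 20, "teenage girl": 10, "teenage boy": 10}

# Hypothetical passers-by, encountered in a fixed (non-random) order.
groups = list(quotas)
passersby = [(f"person {i}", groups[i % 4]) for i in range(200)]

recruited = quota_sample(passersby, quotas)
print(len(recruited))
```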

*methods in data collection

Quantitative and Qualitative Data collection methods

Quantitative data collection methods rely on random sampling and structured data collection instruments that fit diverse experiences
into predetermined response categories. They produce results that are easy to summarize, compare, and generalize.

Quantitative research is concerned with testing hypotheses derived from theory and/or being able to estimate the size of a phenomenon of
interest. Depending on the research question, participants may be randomly assigned to different treatments. If this is not feasible, the
researcher may collect data on participant and situational characteristics in order to statistically control for their influence on the dependent, or
outcome, variable. If the intent is to generalize from the research participants to a larger population, the researcher will employ probability
sampling to select participants.

Typical quantitative data gathering strategies include:

• Experiments/clinical trials.
• Observing and recording well-defined events (e.g., counting the number of patients waiting in emergency at specified times of the
day).
• Obtaining relevant data from management information systems.
• Administering surveys with closed-ended questions (e.g., face-to-face and telephone interviews, questionnaires etc).
(http://www.achrn.org/quantitative_methods.htm)

Interviews

In a structured interview, the researcher asks a standard set of questions and nothing more (Leedy and Ormrod, 2001).

Face-to-face interviews have the distinct advantage of enabling the researcher to establish rapport with potential participants and therefore
gain their cooperation. These interviews yield the highest response rates in survey research. They also allow the researcher to clarify ambiguous
answers and, when appropriate, seek follow-up information. Disadvantages are that they are impractical when large samples are involved, and
that they are time consuming and expensive (Leedy and Ormrod, 2001).

Telephone interviews are less time consuming and less expensive, and the researcher has ready access to anyone on the planet who has a
telephone. Disadvantages are that the response rate is not as high as for the face-to-face interview, although it is considerably higher than for
the mailed questionnaire. The sample may also be biased to the extent that people without phones are part of the population about whom the
researcher wants to draw inferences.

Computer Assisted Personal Interviewing (CAPI) is a form of personal interviewing, but instead of completing a questionnaire, the interviewer
brings along a laptop or hand-held computer to enter the information directly into the database. This method saves time involved in processing
the data, as well as saving the interviewer from carrying around hundreds of questionnaires. However, this type of data collection method can
be expensive to set up and requires that interviewers have computer and typing skills.

Questionnaires

Paper-and-pencil questionnaires can be sent to a large number of people and save the researcher time and money. People are more truthful when
responding to questionnaires, particularly on controversial issues, because their responses are anonymous. But questionnaires also
have drawbacks. The majority of the people who receive questionnaires don't return them, and those who do might not be representative of the
originally selected sample (Leedy and Ormrod, 2001).

Web-based questionnaires: A new and inevitably growing methodology is the use of Internet-based research. This would mean receiving an
e-mail containing a link that takes you to a secure web site to fill in a questionnaire. This type of research is often
quicker and less detailed. Some disadvantages of this method include the exclusion of people who do not have a computer or are unable to
access one. Also, the validity of such surveys is in question, as people might be in a hurry to complete them and so might not give accurate
responses. (http://www.statcan.ca/english/edu/power/ch2/methods/methods.htm)

Questionnaires often make use of checklists and rating scales. These devices help simplify and quantify people's behaviors and
attitudes. A checklist is a list of behaviors, characteristics, or other entities that the researcher is looking for. Either the researcher or the
survey participant simply checks whether each item on the list is observed, present or true, or not. A rating scale is more useful when a
behavior needs to be evaluated on a continuum. Rating scales are also known as Likert scales (Leedy and Ormrod, 2001).

Qualitative data collection methods play an important role in impact evaluation by providing information useful to understand the processes
behind observed results and assess changes in people’s perceptions of their well-being. Furthermore, qualitative methods can be used to
improve the quality of survey-based quantitative evaluations by helping generate evaluation hypotheses, strengthening the design of survey
questionnaires, and expanding or clarifying quantitative evaluation findings. These methods are characterized by the following attributes:

• they tend to be open-ended and have less structured protocols (i.e., researchers may change the data collection strategy by adding,
refining, or dropping techniques or informants)
• they rely more heavily on interactive interviews; respondents may be interviewed several times to follow up on a particular issue,
clarify concepts or check the reliability of data
• they use triangulation to increase the credibility of their findings (i.e., researchers rely on multiple data collection methods to check
the authenticity of their results)
• generally their findings are not generalizable to any specific population; rather, each case study produces a single piece of evidence
that can be used to seek general patterns among different studies of the same issue

Regardless of the kinds of data involved, data collection in a qualitative study takes a great deal of time. The researcher needs to record any
potentially useful data thoroughly, accurately, and systematically, using field notes, sketches, audiotapes, photographs and other suitable
means. The data collection methods must observe the ethical principles of research.

The qualitative methods most commonly used in evaluation can be classified in three broad categories:

• in-depth interviews
• observation methods
• document review

Experiments

Scientists use two broad methods to gather information about the world: correlational studies (a.k.a. observational research) and experiments.
Experiments, no matter the scientific field, all have two distinct variables. First, an independent variable (IV) is manipulated by the
experimenter so that it exists in at least two levels (usually "none" and "some"). Then the experimenter measures the second variable, the
dependent variable (DV).
A simple example:
Suppose the experimental hypothesis that concerns the scientist is that reading a Wiki will enhance knowledge. Notice that the hypothesis is
really an attempt to state a causal relationship like, "if you read a Wiki, then you will have enhanced knowledge." The antecedent condition
(reading a Wiki) causes the consequent condition (enhanced knowledge). Antecedent conditions are always IVs and consequent conditions are
always DVs in experiments. So the experimenter would produce two levels of Wiki reading (none and some, for example) and record
knowledge. If the subjects who got no Wiki exposure had less knowledge than those who were exposed to Wikis, and the two groups were
otherwise identical, the difference can be attributed to the IV.
So, the reason scientists use experiments is that they are the only way to determine causal relationships between variables. Experiments tend
to be artificial because they try to make both groups identical with the single exception of the levels of the independent variable.
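
One common way to make the two groups comparable is to assign subjects to the levels of the IV at random. The sketch below shows only that assignment step; the subject labels and the helper assign_groups are hypothetical, and measuring the DV afterwards is left out.

```python
import random

def assign_groups(subjects, seed=None):
    """Randomly split subjects between the two levels of the IV."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # "none" = no Wiki reading, "some" = some Wiki reading.
    return {"none": shuffled[:half], "some": shuffled[half:]}

# Hypothetical subjects.
subjects = [f"subject {i}" for i in range(1, 21)]
groups = assign_groups(subjects, seed=11)
print(len(groups["none"]), len(groups["some"]))
```

Because chance alone decides who lands in which group, pre-existing differences between subjects tend to balance out across the two levels.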
