Introduction To STATISTICS


Statistics is the science of conducting studies to collect, organize, summarize, analyze, present, interpret and draw conclusions from data.

What is data?
Data is a collection of facts, concepts or instructions in a formalized manner suitable for communication or processing by humans. A collection of data is known as a data set, and a single observation is a data point.

Statistics- Introduction
Most people become familiar with probability and statistics through radio, television, newspapers, and magazines. For example, the following statements were found in newspapers.
o Based on the 2000 census, 40.5 million households have two vehicles.
o The average age of the top 50 powerful persons in India decreased from 58 years in 2003 to 54 years in 2006.
o The average cost of a wedding is nearly Rs 10,00,000.
o Women who eat fish once a week are 29% less likely to develop heart disease.

Basic Concepts
Data
Information coming from observations, counts, measurements, or responses.
The basic idea behind all statistical methods of data analysis is to make inferences about a population by studying a small sample chosen from it.

Population
The complete collection of measurements, outcomes, objects or individuals under study.

Parameter
A number that describes a population characteristic.

Sample
A subset of a population, containing the objects or outcomes that are actually observed.

Statistic
A number that describes a sample characteristic.
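The four terms above can be illustrated with a short Python sketch. The population here is simulated purely for illustration (the heights, the seed and the sample size of 50 are assumptions, not figures from the slides); the point is only that a parameter is computed from the whole population while a statistic is computed from the observed sample.

```python
import random
import statistics

# Hypothetical population: heights (cm) of 1,000 individuals,
# simulated for illustration only.
rng = random.Random(42)
population = [rng.gauss(165, 10) for _ in range(1000)]

# Parameter: a number that describes a population characteristic.
population_mean = statistics.mean(population)

# Sample: a subset of the population that is actually observed.
sample = rng.sample(population, 50)

# Statistic: a number that describes a sample characteristic.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

In practice the parameter is usually unknown, and the statistic is used to estimate it.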

Samples and Populations

Descriptive Statistics
Consists of the collection, organization, classification, summarization, and presentation of data obtained from the sample.
o Used to describe the characteristics of the sample.
o Used to determine whether the sample represents the target population by comparing the sample statistic with the population parameter.

Inferential Statistics
Consists of generalizing from samples to populations, performing estimations and hypothesis testing, determining relationships among variables, and making predictions.
o Used when we want to draw a conclusion from the data obtained from the sample.
o Used to describe, infer, estimate, or approximate the characteristics of the target population.

Inferences
Consider:
Average length of females and males: 90 cm and 100 cm respectively.
o Descriptive statistics: the values.
o Inference: males are (in general) larger than females.

[Figure: An overview of descriptive statistics and statistical inference]

Data Collection
o Collect data, e.g. survey
o Present data, e.g. tables and graphs
o Characterize data, e.g. sample mean

Sample mean = (sum of all observations) / (number of observations)

Mean weight is 120 pounds
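The sample mean is the sum of the observations divided by the number of observations. A minimal Python sketch follows; the five weights are made-up values chosen so that the result matches the 120-pound example, and the function name is only illustrative.

```python
def sample_mean(values):
    """Return the sum of the observations divided by their count."""
    if not values:
        raise ValueError("sample mean is undefined for an empty sample")
    return sum(values) / len(values)


# Hypothetical sample of weights in pounds.
weights = [110, 115, 120, 125, 130]
print(sample_mean(weights))  # 120.0
```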

Types of data
o Qualitative/Categorical and Quantitative/Numerical
o Nominal, Ordinal, Interval and Ratio
  Discrete: Nominal and ordinal
  Continuous: Interval and ratio
o Cross-sectional, Temporal and Spatial

Data Types
[Diagram: Data are either Qualitative or Quantitative. Levels of measurement: Qualitative data are Nominal or Ordinal (discrete); Quantitative data are Interval or Ratio (discrete or continuous).]

Qualitative/ Categorical variables


Here, data are classified on the basis of some attribute or quality such as gender, literacy, religion, employment etc. These attributes under study cannot be measured. One can only find out whether it is present or absent in the units of population under study.

Example
Attribute under study: blindness. Here, we can determine how many persons are blind in a given population. It is not possible to measure the degree of blindness in each case.
Attributes can be:
o Gender (males and females)
o Literacy (literates and illiterates)
o Employment (employed and unemployed)

Two types of categorical variables
o Nominal
o Ordinal

Nominal data
Nominal data are labels or assigned numbers.
o Car number
o Roll number
o STD code
o Color of bike
o House number
Such data are used for identifying individuals and places.

Ordinal data
Ordinal data can be arranged in order, such as worst to best or best to worst.
o Same as nominal, but there is an order within the groups into which the data are classified.
o It is not possible to say by how much the groups differ from each other.
o Examples: ratings of hotels, restaurants and movies.

Quantitative/Numerical variables
Here, the data are classified on the basis of some characteristic capable of quantitative measurement, such as:
o Marks scored by students in a class
o Height of individuals
o Income of individuals
o Age of individuals
o Expenditure of individuals

Two types of Quantitative variables
o Interval data
o Ratio data
Quantitative variables can be discrete or continuous.

Interval data
Interval data are measured on a numerical scale on which the zero point does not mean absence of the property.
o Temperature

Ratio data
Ratio data possess all the properties of interval data, and in addition the ratio of two values is meaningful. Ratio data differ from interval data in that there is a definite zero point (nothing exists for the variable at the zero point).
o Height
o Weight
o Price
o Length
o Sales revenue
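The difference between the two scales can be checked numerically. The sketch below is an illustration under stated assumptions: it takes temperature in degrees Celsius as the interval example and height as the ratio example, and all readings are made-up values. Changing the unit of an interval variable keeps differences intact but changes ratios, whereas changing the unit of a ratio variable keeps ratios intact.

```python
import math


def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to kelvin (this shifts the zero point)."""
    return celsius + 273.15


def cm_to_inches(cm):
    """Convert centimetres to inches (zero stays at zero)."""
    return cm / 2.54


# Interval scale: the zero of the Celsius scale is arbitrary.
warm_c, cool_c = 20.0, 10.0
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)
difference_preserved = math.isclose(warm_c - cool_c, warm_k - cool_k)
ratio_preserved_temp = math.isclose(warm_c / cool_c, warm_k / cool_k)

# Ratio scale: height has a true zero, so ratios survive a change of unit.
tall_cm, short_cm = 180.0, 90.0
ratio_preserved_height = math.isclose(
    tall_cm / short_cm, cm_to_inches(tall_cm) / cm_to_inches(short_cm)
)

print("temperature differences preserved:", difference_preserved)
print("temperature ratios preserved:", ratio_preserved_temp)
print("height ratios preserved:", ratio_preserved_height)
```

This is why "20 degrees is twice as hot as 10 degrees" is not a meaningful statement on the Celsius scale, while "180 cm is twice as tall as 90 cm" is.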

Discrete variables
A variable is said to be discrete if it assumes only some specific values. Discrete variables arise in situations where counting is involved.
o number of credit cards held by an individual
o number of defective items in boxes of 100 items
o number of students in the class

Continuous variables
Continuous variables arise in situations where some sort of measurement over a range is involved.
o life of an electric bulb
o waiting time for customers at a bank's counter
o rainfall
o temperature

Caselet
The ABC Marketing Corporation has asked you for information about the car you drive. For each question, identify the type of data requested as either qualitative or quantitative. When numeric data are requested, identify the variable as discrete or continuous.
1. What is the weight of your car?
2. In which city was your car made?
3. How many people can be seated in your car?
4. What's the distance traveled from your home to your school?
5. What's the color of your car?
6. How many cars are in your household?
7. What's the length of your car?

Levels of Measurement
Level      Put in categories   Arrange in order   Subtract values   Divide values
Nominal    Yes                 No                 No                No
Ordinal    Yes                 Yes                No                No
Interval   Yes                 Yes                Yes               No
Ratio      Yes                 Yes                Yes               Yes
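The levels-of-measurement table can be encoded as a small lookup, which makes it easy to check whether an operation is meaningful for a given level. This is only a sketch: the dictionary keys and the function name are illustrative choices, not standard terminology.

```python
# Operations permitted at each level of measurement,
# following the levels-of-measurement table.
LEVELS = {
    "nominal":  {"categorize": True, "order": False, "subtract": False, "divide": False},
    "ordinal":  {"categorize": True, "order": True,  "subtract": False, "divide": False},
    "interval": {"categorize": True, "order": True,  "subtract": True,  "divide": False},
    "ratio":    {"categorize": True, "order": True,  "subtract": True,  "divide": True},
}


def allows(level, operation):
    """Return True if the operation is meaningful at the given level."""
    return LEVELS[level][operation]


print(allows("ordinal", "order"))    # True
print(allows("interval", "divide"))  # False
```

Each level permits everything the level before it permits, plus one more operation.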

Cross-sectional Data
Cross-sectional data comprise a variable recorded at the same point or period of time for many individuals, organizations, places etc.
o Ages of all students at the time of joining IMS in the year 2008.
o Number of students enrolled in IIM in the year 2008.
o Stock prices of Infosys Technologies, TCS, and Wipro on 31st March 2008.
o Population of Delhi, Mumbai, Chennai and Kolkata as per the 2001 census.

Temporal Data
Temporal data, also referred to as time-series data, are data about an individual, organization, place etc. over a period of time.
o Marks obtained by a student from standard I to XII.
o Total business of ICICI Bank as at the end of each of the last five years.
o Population of India from the year 1931 to 2001.

Spatial Data
Spatial data are data organized by geographical location.
o Income tax collection from various states.
o Sales of Times Of India in Delhi.
o Production of wheat in different states of the country.
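The same table of records can be viewed cross-sectionally or temporally, depending on which dimension is held fixed. The Python sketch below uses made-up sales figures for three hypothetical stores; the record layout and function names are assumptions for illustration.

```python
# Hypothetical records: (unit, year, sales). All values are made up.
records = [
    ("Store A", 2006, 10), ("Store A", 2007, 12), ("Store A", 2008, 15),
    ("Store B", 2006, 20), ("Store B", 2007, 22), ("Store B", 2008, 25),
    ("Store C", 2006, 7),  ("Store C", 2007, 8),  ("Store C", 2008, 9),
]


def cross_section(rows, year):
    """Many units observed at one point in time."""
    return {unit: value for unit, y, value in rows if y == year}


def time_series(rows, unit):
    """One unit observed over a period of time."""
    return {y: value for u, y, value in rows if u == unit}


print(cross_section(records, 2008))    # all stores in 2008
print(time_series(records, "Store A")) # Store A over the years
```

If the units are geographical locations (states, cities), the cross-sectional view is also an example of spatial data.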

Data Collection Techniques


Method of Data Collection

o Data collected directly from the field of enquiry (primary data)
o Data collected and recorded by others (secondary data)

Primary Data
Data originally collected in the process of investigation are known as primary data. Primary data consist of figures collected at first hand in order to satisfy the purpose of a particular statistical enquiry.

Merits:
o Original in nature.
o More reliable and accurate.
o Can be used with greater confidence because the enquirer knows its origin.
o Exactly matches the needs of the project.

Demerits:
o Expensive.
o Time-consuming.
o Collection of data involves creating new definitions and measuring instruments, such as questionnaires or interview forms, and training people to use these specifically designed instruments.

Data Collection Techniques


Collection of Primary Data
o Direct Personal Investigation: Interview, Observation
o Indirect Oral Observation
o Mailed Questionnaire Method
o Schedule Sent Through Investigator

Collection of primary data


Direct personal investigation
o Personal interview (the investigator personally approaches each informant and gathers the required information)
o Personal observation (rather than asking anybody, the investigator personally observes and records the information related to a particular field)

Indirect oral observation (instead of directly approaching the actual field or person, data are collected from a third-party informant)

Questionnaire method (a well-prepared questionnaire is given to a list of persons with the request to return it duly filled in)

Designing a Questionnaire
o The number of questions should be as few as possible.
o Questions should be of objective type. Yes-or-no or simple tick-mark answers are preferred.
o Questions should be properly arranged to give a systematic and easy flow of answers.
o Questions affecting the sentiment and pride of the respondent should be avoided.
o Necessary instructions and guidelines

Types of Questionnaires
o Structured or non-structured questionnaire
o Disguised or non-disguised questionnaire

Structured or Non structured questionnaire


Structured questionnaire: consists of a set of questions arranged in a predetermined order. Each question requires the respondent to make a choice among a few predetermined responses.
Example: How frequently do you go to watch a movie? Choices: very frequently, often, sometimes, never.
Such questions are called closed questions.

Non-structured questionnaire: consists of what are called open-ended questions.
Examples: How do you spend your free time? How do you describe the ambience of the new store?
Such questions give respondents the freedom to answer according to their views and opinions.

Disguised and Non disguised questionnaire


Non-disguised questionnaire: the purpose or objectives of the study are made known to the respondent.

Disguised questionnaire: respondents are not taken into confidence regarding the purpose or objectives of the study. The disguised questionnaire is not very popular, as respondents may not be forthcoming in their answers when they do not know the objectives or relevance of the questions or the study.

Secondary data
Secondary data consist of figures which were originally collected to satisfy a particular enquiry but are now being used for a different enquiry.
Sources of secondary data:
o Journals
o Reports
o Government and non-Government publications

Data Collection Techniques


Collection of Secondary Data
o Publications by Government / International Organizations
o Internet
o Books
o Universities and Research Organizations
o Journals, Newspapers

Merits:
o Readily available.
o Less expensive compared to primary data.
o Less time-consuming compared to primary data.

Demerits:
o These may not be relevant in the present context.
o These may not have the needed accuracy or reliability.
o These may not be adequate.

Types of secondary data


Internal or external
o Internal: company reports, intranet
o External: newspapers, magazines, websites, RBI publications

Summary
The two major areas of statistics are descriptive and inferential. When the populations to be studied are large, statisticians use subgroups called samples. Data can be classified as qualitative or quantitative. The four basic types of measurement are nominal, ordinal, interval, and ratio.
