Research Methodology


CHAPTER –I

1. INTRODUCTION TO SPSS

1.1. OVERVIEW OF SPSS:

SPSS Statistics is a software package used for interactive or batched statistical
analysis. Long produced by SPSS Inc., it was acquired by IBM in 2009. The current
versions (2015) are named IBM SPSS Statistics. The software name originally stood
for Statistical Package for the Social Sciences (SPSS), reflecting the original market,
although the software is now popular in other fields as well, including the health
sciences and marketing.

SPSS is a widely used program for statistical analysis in social science. It is also
used by market researchers, health researchers, survey companies, government,
education researchers, marketing organizations, data miners, and others. The original
SPSS manual (Nie, Bent & Hull, 1970) has been described as one of "sociology's most
influential books" for allowing ordinary researchers to do their own statistical analysis. In
addition to statistical analysis, data management (case selection, file reshaping,
creating derived data) and data documentation are features of the base software.

SPSS (the Statistical Package for the Social Sciences) was developed by three
students at Stanford University: Norman H. Nie, C. Hadlai (Tex) Hull and
Dale H. Bent. After graduation, Nie moved to the University of Chicago, where he was
joined by Hull (National Opinion Research Center). Although SPSS was initially not
meant for distribution outside their home university, the publication of the first
manual made it widely known and used. The first version was released in 1968.
Initially developed for IBM mainframe computers, versions for most other important
mainframe brands (Univac, CDC, Honeywell, ...) and later for so-called minicomputers
became available. The principals incorporated as SPSS Inc. in 1975. In 2009 IBM
acquired SPSS; it is now fully integrated into the IBM Corporation Business Analytics
Software portfolio.

Early versions of SPSS Statistics were written in Fortran and designed for batch
processing on mainframes, including, for example, IBM and ICL versions, originally
using punched cards for data and program input. A processing run read a command file
of SPSS commands and either a raw input file of fixed-format data with a single
record type, or a 'get file' of data saved by a previous run.

To save precious computer time, an 'edit' run could be done to check
command syntax without analysing the data. From version 10 (SPSS-X) in 1983,
data files could contain multiple record types. Prior to SPSS 16.0, different
versions of SPSS were available for Windows, Mac OS X and UNIX. SPSS
Statistics version 13.0 for Mac OS X was not compatible with Intel-based
Macintosh computers, due to the Rosetta emulation software causing errors in
calculations. SPSS Statistics 15.0 for Windows needed a downloadable hot fix to
be installed in order to be compatible with Windows Vista.

From version 16.0 the same version runs under Windows, Mac, and Linux.
The graphical user interface is written in Java. The Mac OS version is provided
as a Universal binary, making it fully compatible with both PowerPC and Intel-
based Mac hardware.

1.2. FUNCTIONS OF SPSS:

SPSS offers four programs that assist researchers with their complex data
analysis needs.

• Statistics Program: It provides a wide range of basic statistical functions, such as
frequencies and cross tabulation.

• Modeler Program: It enables researchers to build and validate predictive models using
advanced statistical procedures.

• Text Analytics for Surveys Program: It helps survey administrators uncover insights
from responses to open-ended survey questions.

• Visualization Designer: It allows researchers to use their data to create a wide variety
of visuals, such as density charts and radial box plots, very easily.

1.3. BENEFITS OF SPSS:

• SPSS is a powerful tool for manipulating and interpreting survey data.

• It makes the process of pulling, manipulating and analyzing data clean and easy.

• It provides a wide range of procedures for statistical analysis.

1.4. LIMITATIONS OF SPSS:

The major limitation of SPSS is that it is not well suited to analyzing very large
data sets. In fields such as insurance, where researchers often work with large data
sets, they generally use SAS or R instead of SPSS to analyze the data.
2. OPENING OF SPSS

STEPS TO OPEN SPSS:

1. Click Start → Programs → SPSS.
2. A dialogue box opens in front of the SPSS grid, listing several options to choose from.
3. The following options appear in the dialogue box:
a. Run the tutorial.
b. Type in data.
c. Run an existing query.
d. Create new query using Database Wizard.
e. Open an existing data source.
f. Open another type of file.

Fig.1.1: OPENING OF SPSS

3. DETAILS OF MENU
STEPS TO OPEN FILE MENU:

1) Open SPSS → File → New.
2) A dialogue box will appear in front of the file grid.
3) The following options will appear in the dialogue box under the NEW head:
a) Data
b) Syntax
c) Output
d) Script
4) The following options will appear in the dialogue box under the OPEN head:
a) Data
b) Syntax
c) Output
d) Script

Fig.1.2: FILE MENU- NEW OPTION


Fig.1.3: FILE MENU- OPEN OPTION
4. DETAILS OF VIEW

STEPS TO OPEN DETAILS IN VIEW MENU

1) On the menu bar of the SPSS window, click VIEW.
2) A drop-down box will appear with the following options:
a) STATUS BAR
b) TOOLBARS
c) MENU EDITOR
d) FONTS
e) GRID LINES
f) CUSTOMIZE VARIABLE VIEW
g) VARIABLES

Fig.1.4: VIEW MENU


5. DETAILS OF EDIT

STEPS TO EDIT AN SPSS FILE

1) On the menu bar of the SPSS window, click EDIT.
2) A drop-down box will appear with the following options:
a) COPY
b) INSERT VARIABLE
c) FIND
d) INSERT CASES
e) OPTIONS

Fig.1.5: EDIT MENU


6. PREPARATION OF QUESTIONNAIRE
A questionnaire is a research instrument consisting of a series
of questions (or other types of prompts) for the purpose of
gathering information from respondents. The questionnaire was invented
by the Statistical Society of London in 1838.

Although questionnaires are often designed for statistical analysis of
the responses, this is not always the case.

Questionnaires have advantages over some other types of surveys in
that they are cheap, do not require as much effort from the questioner as
verbal or telephone surveys, and often have standardized answers that
make it simple to compile data. However, such standardized answers may
frustrate users. Questionnaires are also sharply limited by the fact that
respondents must be able to read the questions and respond to them.
Thus, for some demographic groups, conducting a survey by questionnaire
may not be practical.

CHARACTERISTICS OF A GOOD QUESTIONNAIRE:

• It should deal with an important or significant topic to create interest
among respondents.
• It should seek only data that cannot be obtained from other sources.
• It should be as short as possible but still comprehensive.
• It should be attractive.
• Directions should be clear and complete.
• It should be presented in a good psychological order, proceeding from general to
more specific responses.
• Double negatives in questions should be avoided.
• Putting two questions in one should also be avoided. Every question
should seek only one specific piece of information.
• It should avoid annoying or embarrassing questions.
• It should be designed to collect information that can be used subsequently as
data for analysis.
• It should consist of a written list of questions.

Fig.1.6: QUESTIONNAIRE
7. DATA COLLECTION
Data collection is the process of gathering and measuring information
on targeted variables in an established system, which then enables one to
answer relevant questions and evaluate outcomes. Data collection is a
component of research in all fields of study including physical and social
sciences, humanities, and business. While methods vary by discipline, the
emphasis on ensuring accurate and honest collection remains the same.
The goal for all data collection is to capture quality evidence that allows
analysis to lead to the formulation of convincing and credible answers to
the questions that have been posed.

7.1. TYPES OF DATA

1) Primary Data – refers to the data that the investigator collects for the
very first time. This type of data has not been collected either by this or any
other investigator before. Primary data provide the investigator with
the most reliable first-hand information about the respondents. The
investigator has a clear idea about the terminology used, the
statistical units employed, the research methodology and the size of the
sample. Primary data may be either internal or external to the organization.

2) Secondary Data – refers to the data that the investigator collects from
another source. Past investigators or agents collected the data for their own
study, so the investigator is not the first researcher or statistician to collect
this data. Moreover, the investigator does not have a clear idea about the
intricacies of the data. There may be ambiguity in terms of the sample size
and sampling technique. There may also be doubts about the
accuracy of the data.
METHODS OF PRIMARY DATA COLLECTION:

a) Direct Personal Investigation

This method consists of the collection of data by the investigator in a direct manner.
The investigator (or researcher) is responsible for personally approaching a
respondent and gathering the appropriate information. In
other words, the researcher himself enters the field and solicits the data that he
requires to take the research forward. Thus, this method of data collection
ensures first-hand information. The data are all the more reliable for intensive
research, but for extensive research they are inadequate and prove to be
unreliable. This method of data collection is time-consuming, so it is
handicapped when time is scarce. However, its greatest
demerit is that it is very subjective in nature and is not suitable for
objective, extensive research.

b) Indirect Oral Interview


This method consists of the collection of data by the investigator in an indirect manner.
The investigator (or enumerator) approaches (for example, through telephone interviews) an
indirect respondent who possesses the appropriate information for the research.
This method of data collection helps ensure reliable information because the
interviewers can cross-question the respondent to obtain the right and appropriate information.

c) Mailed Questionnaire
Consists of mailing a set or series of questions related to the research. The
respondent answers the questionnaire and forwards it back to the investigator
after marking his/her responses. This method of collection of data has proven to
be time-saving. It is also a very cost-efficient manner of collecting the required
data. An investigator who has
SECONDARY DATA – SOURCES OF DATA

a) Published Sources
Many national organizations, international agencies and official publications
collect statistical data. They collect data related to business, commerce,
trade, prices, economy, production, services, industries, currency and foreign affairs.
They also collect information related to various (internal and external) socio-economic
phenomena and publish it. Central Government official publications, publications
of research institutions, committee reports and international publications are some
published sources of secondary data.

b) Unpublished Sources
Some statistical data are never published. Such data are stored by
institutions and private firms. Researchers often make use of these unpublished data in
order to make their research all the more original.

Fig.1.7: DATA VIEW


CHAPTER –II

ANALYSIS OF DATA
Analysis of data plays an important role in the fulfillment of research
objectives. Data is summarized and observed to find patterns or
relationships. Data is analyzed using various statistical techniques that
require substantive theoretical as well as practical knowledge. A researcher
should first acquire this knowledge and then proceed to analyze the real
data collected. The techniques vary depending
on the nature of the research (qualitative/quantitative study). This step of
the research process also includes the interpretation of findings and writing
down the results and conclusions.

TYPES OF ANALYSIS USED:

1. Frequency Distribution.

2. Cross Tabulation & Chi-Square.

3. Arithmetic Mean.

4. Median.

5. Mode.

6. T-test.
2.1. FREQUENCY DISTRIBUTION

Frequency distribution is a method of displaying the frequency
(the number of times a particular value of a variable repeats in the data) of
the different values of a categorical/nominal variable in a dataset. It
represents the counts of all outcomes of a variable in a sample. The
frequency distribution of a variable can be represented in tabular as well as
graphical forms (bar charts, pie charts, etc.).

Frequency distribution is a very common and important method for
analyzing the nominal (categorical) and ordinal (ranking) variables in a
dataset.

The required procedure is as follows:

Step 1: Click ‘Analyze’ → ‘Descriptive Statistics’ → ‘Frequencies’.

Fig.2.1: OPENING OF FREQUENCY

Step 2: Next, transfer the variable ‘Education’ to the ‘Variable(s)’
window and click ‘Charts’, as shown in figure 5.3.

Fig.2.2: INSERTING VARIABLES

Step 3: Next, select the type of chart (e.g. bar charts), as shown in
figure 5.4.
Step 4: Finally, click ‘Continue’ and then ‘OK’. The final SPSS output
in tabular form is shown in table 5.4.

Gender        Frequency   Percent
MALE                 14      56.0
FEMALE               11      44.0
Total                25     100.0

[Chart: Total Respondents. X-axis: Gender (MALE, FEMALE). Y-axis: Percentage of male and female.]
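
The tabulation that the ‘Frequencies’ procedure produces can also be sketched outside SPSS. The following Python snippet is only an illustration: the list `responses` is a hypothetical reconstruction of the raw gender responses from the reported counts (14 male, 11 female), and the helper name `frequency_table` is an assumption, not part of SPSS.

```python
from collections import Counter

# Hypothetical raw responses rebuilt from the reported counts
# (14 MALE, 11 FEMALE); the real survey file is not reproduced here.
responses = ["MALE"] * 14 + ["FEMALE"] * 11


def frequency_table(values):
    """Return (category, frequency, percent) rows in order of first appearance."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, round(100.0 * n / total, 1)) for cat, n in counts.items()]


rows = frequency_table(responses)
for category, frequency, percent in rows:
    print(f"{category:<8}{frequency:>5}{percent:>8.1f}")
print(f"{'Total':<8}{len(responses):>5}{100.0:>8.1f}")
```

Each percentage is simply the category count divided by the total number of valid responses, multiplied by 100, which is how the ‘Percent’ column in the output tables is obtained.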
Age           Frequency   Percent
24-30                 4      16.0
31-36                 5      20.0
37-45                 6      24.0
46-54                 7      28.0
55-60                 3      12.0
Total                25     100.0

[Chart. X-axis: Age Group (24-30, 31-36, 37-45, 46-54, 55-60). Y-axis: Frequency (in percentage).]
Authority     Frequency   Percent
YES                   9      36.0
NO                    3      12.0
SOMETIMES            13      52.0
Total                25     100.0

[Chart: Authority. X-axis: Response from the respondents (1, 2, 3). Y-axis: Percentage.]
Order From More Than One Boss   Frequency   Percent
YES                                     8      32.0
NO                                     12      48.0
SOMETIMES                               5      20.0
Total                                  25     100.0

[Chart: Order From More Than One Boss. Axis: Respondents (in percentage).]


Top priority to interests of company   Frequency   Percent
YES                                           19      76.0
NO                                             2       8.0
SOMETIMES                                      4      16.0
Total                                         25     100.0

[Chart: Top Priority To Interests Of Company. X-axis: YES, NO, SOMETIMES. Y-axis: Percentage.]
Remuneration Received   Frequency   Percent
MONTHLY                        25     100.0

[Chart: Remuneration Received. Y-axis: Percentage of respondents.]
Daily Decisions Made   Frequency   Percent
YES                           11      44.0
NO                             6      24.0
I DON'T KNOW                   8      32.0
Total                         25     100.0

[Chart: Daily Decisions Made. YES 44%, NO 24%, I DON'T KNOW 32%.]
Equal Remuneration To All   Frequency   Percent
YES                                10      40.0
NO                                  4      16.0
I DON'T KNOW                       11      44.0
Total                              25     100.0

[Chart: Equal Remuneration To All. X-axis: YES, NO, I DON'T KNOW. Y-axis: Percentage.]
Importance to Suggestions   Frequency   Percent
YES                                12      48.0
NO                                  3      12.0
SOMETIMES                          10      40.0
Total                              25     100.0

[Chart: Importance To Suggestions. YES 48%, NO 12%, SOMETIMES 40%.]
Feedback From peers and seniors   Frequency   Percent
YES                                       9      36.0
NO                                        7      28.0
SOMETIMES                                 9      36.0
Total                                    25     100.0

[Chart: Feedback From Peers And Seniors. Categories: YES, NO, SOMETIMES. Axis: Percentage.]
Regular Review of The Progress   Frequency   Percent
YES                                     16      64.0
NO                                       2       8.0
SOMETIMES                                7      28.0
Total                                   25     100.0

[Chart: Regular Review Of The Progress Made. X-axis: 1, 2, 3. Y-axis: Percentage.]
Meeting Conducted    Frequency   Percent
TWICE IN A MONTH             9      36.0
ONCE IN A MONTH             16      64.0
Total                       25     100.0

[Chart: Meeting Conducted. Categories: 1 (36), 2 (64). Axis: Percentage.]
Equitable Distribution Of work   Frequency   Percent
YES                                      5      20.0
NO                                       5      20.0
SOMETIMES                               15      60.0
Total                                   25     100.0

[Chart: Equitable Distribution Of Work. YES 20, NO 20, SOMETIMES 60. Axis: Percentage.]
Satisfaction towards working environment   Frequency   Percent
YES                                               11      44.0
NO                                                 1       4.0
NO RESPONSE                                       13      52.0
Total                                             25     100.0

[Chart: Satisfaction With Working Environment. YES 44, NO 4, NO RESPONSE 52. Axis: Percentage.]
Satisfaction towards rewards and compensation   Frequency   Percent
YES                                                     8      32.0
NO                                                      3      12.0
SOMETIMES                                              14      56.0
Total                                                  25     100.0

[Chart: Satisfaction Towards Rewards And Compensation. YES 32%, NO 12%, SOMETIMES 56%.]
Satisfied with job           Frequency   Percent
1-4 LESS SATISFIED                   3      12.0
5-7 MODERATELY SATISFIED            12      48.0
8-10 HIGHLY SATISFIED               10      40.0
Total                               25     100.0

[Chart: Satisfaction Towards Job. X-axis: 1-4 LESS SATISFIED, 5-7 MODERATELY SATISFIED, 8-10 HIGHLY SATISFIED. Y-axis: Percentage.]
2.2. CROSS TABULATION & CHI-SQUARE

Cross tabulation is one of the most popular methods of representing the joint frequency
distribution of the cases of two or more nominal variables in the dataset. For example,
the cross tabulation of the variables ‘educational background’ and ‘familiarity with
the internet’ can be analyzed as given below:

CHI-SQUARE TEST – THE TEST OF ASSOCIATION

It is one of the most popular non-parametric tests. It is used in two cases, which are as
follows:

• To test the association between two nominal variables in research.

• To test the difference between expected and observed frequencies of an event.

The chi-square test compares the actual observed frequencies with the
calculated expected frequencies of different combinations of nominal variables. The
difference between observed and expected frequencies indicates a possible
association between the categorical variables. The chi-squared statistic compares the
observed count in each table cell with the count that would be expected under the
assumption of no association between the row and column classifications.
Question:-

CODES PROVIDED TO SUB-CATEGORIES

CODES FOR THE VARIABLE ‘LEVEL OF         CODES FOR THE VARIABLE
FAMILIARITY WITH THE INTERNET’           ‘EDUCATION BACKGROUND’
1 = LOW FAMILIARITY                      1 = HUMANITIES
2 = MEDIUM FAMILIARITY                   2 = MANAGEMENT
3 = HIGH FAMILIARITY                     3 = TECHNOLOGY
                                         4 = IT

Step 1: Click ‘Analyze’ → ‘Descriptive Statistics’ → ‘Crosstabs…’.

Fig.2.4: OPENING OF CROSSTAB

Step 2: Transfer ‘Educational background’ to the ‘Row(s)’ window and
‘Familiarity with the internet’ to the ‘Column(s)’ window. Click ‘Statistics’.

Fig.2.5: INSERTING VARIABLES

Step 3: Select ‘Chi-square’ and click ‘Continue’.

Fig.2.6: COMMAND FOR CROSSTAB

Step 4: Click on ‘Cells’ and select ‘Observed’ and ‘Expected’. Click ‘Continue’.

Fig.2.7: COMMAND FOR CROSSTAB

Step 5: Finally, select ‘OK’. The chi-square test results will appear.

EDUCATIONAL BACKGROUND * FAMILIARITY CROSSTABULATION

                                               Familiarity
S.No  Education Background               Low      Medium     High     Total
1.    Humanities    Count                  1         5         7        13
                    Expected Count       4.4       4.7       3.9      13.0
2.    Management    Count                  6         4         4        14
                    Expected Count       4.8       5.0       4.2      14.0
3.    Technology    Count                  5         8         3        16
                    Expected Count       5.4       5.8       4.8      16.0
4.    IT            Count                  5         1         1         7
                    Expected Count       2.4       2.5       2.1       7.0
      Total         Count                 17        18        15        50
                    Expected Count      17.0      18.0      15.0      50.0
Chi-Square Tests
                                  Value      df    Asymp. Sig. (2-sided)
Pearson Chi-Square                11.638a     6    .071
Likelihood Ratio                  12.101      6    .060
Linear-by-Linear Association       7.034      1    .008
N of Valid Cases                  50
a. 9 cells (75.0%) have expected count less than 5. The minimum expected count is 2.10.
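
To make the link between the observed counts, the expected counts and the Pearson chi-square value concrete, the calculation can be sketched in plain Python. This is only an illustrative sketch, not the procedure SPSS runs internally; the function name `pearson_chi_square` is an assumption, and the p-value is left out because it needs the chi-square distribution, which the standard library does not provide.

```python
# Observed counts from the crosstabulation (rows: education background;
# columns: low, medium, high familiarity with the internet).
observed = {
    "Humanities": [1, 5, 7],
    "Management": [6, 4, 4],
    "Technology": [5, 8, 3],
    "IT": [5, 1, 1],
}


def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom, expected counts)."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    # Expected count under no association: row total * column total / N.
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(rows, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


chi2, df, expected = pearson_chi_square(observed)
small_cells = sum(e < 5 for row in expected for e in row)
print(f"chi-square = {chi2:.3f}, df = {df}")
print(f"cells with expected count < 5: {small_cells}")
```

The sketch gives the same Pearson chi-square value (11.638) and degrees of freedom (6) as the SPSS table. Since the reported significance (.071) is above the conventional 0.05 level, the null hypothesis of no association would not be rejected at that level; the note about 9 cells with expected counts below 5 also suggests treating the result with caution.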

2.3. ARITHMETIC MEAN

Arithmetic Mean: The arithmetic mean is commonly known as the average. The average
of a given set of numbers is called the arithmetic mean, or simply the mean, of the given
numbers.

Thus, the arithmetic mean of a group of observations is defined as

$\bar{x} = \dfrac{\text{Sum of observations}}{\text{Number of observations}}$

where $\bar{x}$ is the symbol of the arithmetic mean.
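
As a small worked illustration of this definition, the following Python sketch computes the mean directly from the formula and compares it with the standard library function `statistics.mean`. The values in `observations` are illustrative only and are not taken from the survey data.

```python
import statistics

# Illustrative observations (an assumption, not survey data).
observations = [4, 1, 7]

# Mean from the definition: sum of observations / number of observations.
mean_by_formula = sum(observations) / len(observations)

# The same quantity from the standard library, as a cross-check.
mean_by_library = statistics.mean(observations)

print(mean_by_formula, mean_by_library)
```

Here the sum of the observations is 12 and there are 3 observations, so the mean is 12 / 3 = 4.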

2.4. MEDIAN

The median is the middle number, found by ordering all data points and picking out the
one in the middle (or, if there are two middle numbers, taking the mean of those two numbers).

Example: The median of 4, 1, and 7 is 4 because when the numbers are put
in order (1, 4, 7), the number 4 is in the middle.

2.5. MODE

The mode is the most frequent number, that is, the number that occurs the highest number
of times.

Example: The mode of {4, 2, 4, 3, 2, 2} is 2 because it occurs three times, which is
more than any other number.
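
The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not what SPSS does internally; the helper names `median` and `mode` are assumptions, and when several values tie for the highest frequency this sketch simply returns the first one encountered.

```python
from collections import Counter


def median(values):
    """Middle value of the sorted data; mean of the two middle values if even."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def mode(values):
    """Most frequent value (the first one encountered if several tie)."""
    return Counter(values).most_common(1)[0][0]


print(median([4, 1, 7]))          # median example from the text
print(mode([4, 2, 4, 3, 2, 2]))   # mode example from the text
```

For an even number of observations, such as 1, 2, 3, 4, the function averages the two middle values (2 and 3) and returns 2.5.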

STEPS:

Step 1: Click ‘Analyze’ → ‘Descriptive Statistics’ → ‘Frequencies’.

Fig.2.6: COMMAND FOR MEAN, MEDIAN & MODE

Step 2: Next, transfer the variable to the ‘Variable(s)’ window and click ‘Statistics’.

Fig.2.7: COMMAND FOR MEAN, MEDIAN & MODE

Step 3: Select the options ‘Mean’, ‘Median’, ‘Mode’ and ‘Quartiles’. Next, click
‘Continue’ and then ‘OK’.

Fig.2.8: COMMAND FOR MEAN, MEDIAN & MODE

RESULT OF MEAN, MEDIAN & MODE

Statistics
education background
N              Valid          50
               Missing         1
Mean                      2.3400
Median                    2.0000
Mode                        3.00
Percentiles    25         1.0000
               50         2.0000
               75         3.0000

2.6. ONE-SAMPLE T-TEST

In many situations, we come across claims made by marketers about their
products. For example, a car manufacturer may claim that the average
mileage of a car is, say, 19.9 kmpl, or a business school may claim that
the average package offered to its students is Rs. 12 lakhs per annum. A
researcher may be interested in analyzing the truthfulness of these claims.
For this analysis, the researcher needs to randomly pick a small sample from the
population and compare its mean with the claimed population mean. The
sample mean and the population mean may be different from each other. In
order to test whether this difference is statistically significant, we should
apply the one-sample t-test.

The null hypothesis of the one-sample t-test is:

“Ho: There is no significant difference between the sample mean and the population
mean.”

Step 1: Click ‘Analyze’ → ‘Compare Means’ → ‘One-Sample T Test’.

Fig.2.9: COMMAND FOR T-TEST

Step 2: Next, transfer the test variable (‘Education’ in the output below) to the
‘Test Variable(s)’ window, enter the test value (5 in the output below) in the
‘Test Value’ box and click ‘OK’, as shown in figure 7.2:

Fig.2.10: COMMAND FOR T-TEST


One-Sample Statistics

              N      Mean      Std. Deviation    Std. Error Mean
Education     100    2.5200    1.10536           .11054

One-Sample Test (Test Value = 5)

                                                                  95% Confidence Interval
                                                                  of the Difference
              t          df    Sig. (2-tailed)   Mean Difference  Lower      Upper
Education     -22.436    99    .000              -2.48000         -2.6993    -2.2607
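
The figures in the one-sample test table can be retraced from the summary statistics alone. The sketch below is illustrative and assumes only the values reported in the output (N = 100, mean = 2.52, standard deviation = 1.10536, test value = 5). The critical value `T_CRIT_95_DF99` is an assumption taken from standard t tables (about 1.9842 for 99 degrees of freedom), since the Python standard library has no t distribution.

```python
import math

# Summary statistics as reported in the SPSS output.
n = 100
sample_mean = 2.52
sample_sd = 1.10536
test_value = 5

std_error = sample_sd / math.sqrt(n)   # standard error of the mean
mean_diff = sample_mean - test_value   # sample mean minus test value
t_stat = mean_diff / std_error         # one-sample t statistic
df = n - 1                             # degrees of freedom

# Assumption: two-sided 95% critical value of Student's t for df = 99,
# read from standard t tables (approximately 1.9842).
T_CRIT_95_DF99 = 1.9842
ci_lower = mean_diff - T_CRIT_95_DF99 * std_error
ci_upper = mean_diff + T_CRIT_95_DF99 * std_error

print(f"SE = {std_error:.5f}, t = {t_stat:.3f}, df = {df}")
print(f"95% CI of the difference: ({ci_lower:.4f}, {ci_upper:.4f})")
```

The computed values agree with the SPSS output (t = -22.436, df = 99, interval from -2.6993 to -2.2607). Because the reported two-tailed significance (.000) is below 0.05 and the confidence interval does not contain zero, the null hypothesis of no difference between the sample mean and the test value of 5 would be rejected.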