Research Methodology by Vinod Chandra


Research Methodology

S. S. Vinod Chandra
Director
Computer Centre
University of Kerala
Kerala

S. Anand Hareendran
Associate Professor
Department of Computer Science and Engineering
Muthoot Institute of Technology and Science, Kochi
Kerala
Contents
Preface

Acknowledgements

About the Authors

Testimonials

1 Introduction

1.1 Objectives of Research

1.2 Definition and Motivation

1.2.1 Variables

1.3 Types of Research

1.4 Research Approaches

1.4.1 Quantitative Research Approach

1.4.2 Qualitative Research Approach

1.5 Steps in Research Process

1.5.1 Problem Definition

1.5.2 Setting Out a Plan

1.5.3 Literature Review

1.5.4 Analysis and Hypothesis Formulation

1.5.5 Presentation and Interpretation

1.5.6 Decision Making

1.5.7 Case Study: Sample Research Problem

1.6 Criteria of Good Research

1.6.1 Problems Faced by Researchers

1.7 Ethics in Research

1.7.1 Morals in Ethics

Exercises

2 Research Formulation and Literature Review

2.1 Problem Definition and Formulation


2.1.1 Problem Selection

2.1.2 Necessity of Problem Definition

2.2 Literature Review

2.2.1 Critical Analysis

2.2.2 Critical Thinking

2.2.3 Critical Evaluation

2.2.4 Objectives of Literature Review

2.2.5 Importance of Literature Review

2.3 Characteristics of a Good Research Question

2.4 Literature Review Process

2.4.1 Primary and Secondary Sources of Literature

2.4.2 Identifying Literature Review Gaps

2.4.3 Literature Review Pitfalls

Exercises

3 Data Collection

3.1 Primary and Secondary Data

3.1.1 Primary Data

3.1.2 Secondary Data

3.2 Primary and Secondary Data Sources

3.3 Data Collection Methods

3.3.1 Questionnaire

3.3.2 Questionnaire Design

3.3.3 Types of Questionnaires

3.3.4 Interviews

3.3.5 Types of Interview

3.3.6 Observation

3.3.7 Record Reviews

3.3.8 Schedules

3.4 Data Processing


3.4.1 Types of Data Processing

3.4.2 Data Processing Stages

3.5 Classification of Data

3.5.1 Quantitative and Qualitative Data

3.5.2 Discrete and Continuous Data

3.5.3 Univariate and Bivariate Data

Exercises

4 Basic Statistical Measures

4.1 Types of Scales

4.1.1 Nominal Scale

4.1.2 Ordinal Scale

4.1.3 Interval Scale

4.1.4 Ratio Scale

4.2 Measures of Central Tendency

4.2.1 Mean, Median and Mode

4.2.2 Geometric and Harmonic Mean

4.3 Skewness

4.3.1 Measuring Skewness

4.3.2 Relationship of Mean and Median

4.3.3 Kurtosis

4.4 Measure of Variation

4.4.1 Range

4.4.2 Absolute Deviation

4.4.3 Standard Deviation

4.4.4 Average Deviation

4.4.5 Quartile Deviation

4.4.6 Coefficient Deviation

4.5 Probability Distribution

4.5.1 Binomial Distribution


4.5.2 Poisson Distribution

4.5.3 Uniform Distribution

4.5.4 Exponential Distribution

4.5.5 Normal Distribution

Exercises

5 Data Analysis

5.1 Statistical Analysis

5.2 Multivariate Analysis

5.3 Correlation Analysis

5.4 Regression Analysis

5.5 Principal Component Analysis

5.6 Sampling

5.6.1 Important Terms

5.6.2 Characteristics of Good Sample Design

5.6.3 Types of Sampling

5.6.4 Steps in Sampling Process

5.7 SPSS: A Statistical Analysis Tool

5.7.1 Opening SPSS

5.7.2 File Types

5.7.3 Analysis of Functions

Exercises

6 Research Design

6.1 Need for Research Design

6.2 Features of a Good Design

6.3 Types of Research Designs

6.3.1 Exploratory Research Design

6.3.2 Conclusive Research Design

6.3.3 Experimental Research Design

6.4 Induction and Deduction


6.4.1 Deduction

6.4.2 Induction

Exercises

7 Hypothesis Formulation and Testing

7.1 Hypothesis

7.1.1 Null Hypothesis H0

7.1.2 Alternate Hypothesis H1

7.2 Important Terms

7.3 Types of Research Hypothesis

7.3.1 Characteristics of Qualitative Methods

7.3.2 Characteristics of Quantitative Methods

7.3.3 Generation of Research Hypothesis

7.4 Hypothesis Testing

7.4.1 Hypothesis Testing

7.4.2 Test the Level of Significance

7.5 Z-Test

7.6 t-Test

7.7 f-Test

7.8 Making a Decision: Types of Errors

7.8.1 Confusion Matrix

7.8.2 Quantification of Classification

7.9 ROC Graphics

Exercises

8 Test Procedures

8.1 Parametric and Non-parametric Tests

8.2 ANOVA

8.2.1 Tricks and Technique – ANOVA

8.2.2 One-way and Two-way ANOVA

8.3 Mann-Whitney Test


8.3.1 How to Perform Mann-Whitney Test in SPSS?

8.4 Kruskal–Wallis Test

8.4.1 Step-by-Step Kruskal-Wallis Test

8.4.2 Steps for Kruskal-Wallis Test in SPSS

8.5 Chi-Square Test

8.5.1 Test Procedure

8.5.2 Chi-Square Test Procedure in SPSS Statistics

8.5.3 Example – Chi-Square Test

8.6 Multi-Variate Analysis

8.6.1 Test Procedure in SPSS-MANOVA

Exercises

9 Models for Science and Business

9.1 Algorithmic Research

9.1.1 Analysis of Algorithm

9.1.2 Design of Algorithms

9.2 Methods of Scientific Research

9.3 Modelling

9.3.1 Steps in Modelling

9.3.2 Research Models

9.4 Simulations

9.4.1 Types of Simulation Models

9.4.2 Tools for Simulations

9.5 Industrial Research

9.5.1 Operational Research

Exercises

10 Social Research

10.1 Theory of Social Research

10.1.1 Social Research Characteristics

10.1.2 Scope of Social Research


10.1.3 Objectives of Social Research

10.2 Perspectives of Social Research

10.2.1 Complementary Perspectives

10.3 Methods of Social Research

10.4 Social Science Approaches

10.4.1 Alternative Approaches

10.4.2 Interdisciplinary Approaches

10.4.3 Role of Statistics

10.5 Social Research Design

10.6 Quantitative and Qualitative Social Research

10.6.1 Quantitative Research Methods

10.6.2 Qualitative Research Methods

10.6.3 Comparison of Quantitative and Qualitative Research

10.6.4 Social Surveys

10.7 Ethics and Politics in Social Research

10.7.1 Principles in Research Ethics

10.7.2 Politics in Research

Exercises

11 Presentation of the Research Work

11.1 Business Report

11.1.1 Planning a Business Report

11.1.2 Structuring of Business Report

11.2 Technical Report

11.2.1 Components of Technical Report

11.3 Research Report

11.3.1 Preliminary Section

11.3.2 Body of the Report

11.3.3 Supplementary Materials

11.4 General Tips for Writing Report


11.4.1 Technical Writing

11.4.2 Goal of Technical Writing

11.4.3 Foundations of Effective Technical Writing

11.4.4 Qualities of Good Technical Writing

11.5 Presentation of Data

11.6 Oral Presentation

11.6.1 General Tips

11.6.2 Advantages of Oral Communication

11.6.3 Disadvantages of Oral Communication

11.6.4 Effective Oral Presentation

11.7 Bibliography and References

11.7.1 Harvard Style

11.7.2 Numeric (Vancouver) Style

11.7.3 References Organization

11.8 Intellectual Property Rights

11.8.1 Copyright©

11.8.2 Patents

11.8.3 Layout Design

11.8.4 TrademarkTM

11.8.5 Geographical Indications (GI)

11.9 Open-Access Initiatives

11.9.1 Open-access Publishing

11.10 Plagiarism

Exercises

12 LaTeX-Document Generation Tool

12.1 Document Generation Tools

12.2 Getting Started

12.3 How LaTeX Works

12.4 Document Creation


12.4.1 Page Setup, Page Numbering and Headings

12.4.2 Creating a Title Page

12.4.3 Sections

12.4.4 Font Size and Formatting

12.4.5 List – Enumerate

12.4.6 List – Itemize

12.4.7 Comments

12.4.8 Special Characters

12.5 Tables

12.6 Figures

12.7 Math Mode

12.8 Algorithm Mode

12.9 Bibliographic References

12.9.1 BibTeX File

12.10 Preparation of Presentation

12.11 Templates

12.11.1 Articles

12.11.2 Thesis and Conference Proceedings

12.11.3 Books

12.12 Nuts and Bolts

Exercises

Laboratory Questions

LaTeX Mathematical Symbols

Symbols Used in this Book

Index
Preface
What is research? What are the ways to formulate a research problem? Is my work progressing? How will I
present my results? These are the major questions that any researcher faces during the early phases of work.
This is where research methodology comes into play. A research method is a systematic plan for conducting
research. In this book, all major research scopes and dimensions are considered, which will help researchers
to self-evaluate the kind of work they are carrying out. Starting from the basic concepts and
methods, this book focusses on each step of research – beginning with problem identification and ending with
report generation. This book has 12 chapters and is organized into three parts – the first section deals with an
introduction to research, followed by the methods and tools used, and the final section details how a report can be
generated and presented.

The first chapter deals with the basic definition of research, how it can be carried out and the most commonly used
research definitions. The historic aspects, various applications and areas where research can be applied are also
discussed. How can a good research topic evolve from reading the literature? The advantages of a literature survey,
primary and secondary references, and how a new problem is created from reading related topics are included in
Chapter 2. Even with a research problem in hand, the most important issue faced by a researcher is lack of data.
The methods by which data can be collected, how these data are processed and, after processing, how they are
represented – all this information is included in Chapter 3. Starting from the basic concepts, almost all
possible data collection methods and tools are included in Chapter 3. Data representation, numerical data
distributions and other scaling methods are detailed in Chapter 4.

The second part of the book starts by explaining various data analysis methods. Interpreting the data collected
is a vital part of research; only well-analyzed data can produce valid results. Data analysis
and the various analytic tools are covered in Chapter 5. The lack of a proper research design can be the sole
reason for undesired research outcomes. The various methodologies that need to be considered during the design
stage, the system architecture and methods to handle agile data are detailed in Chapter 6. Hypothesis formulation is a
major step while working on a research program. The various hypothesis testing and validation
methods are explained with examples in Chapter 7. Chapter 8 deals with various test procedures, which include
both parametric and non-parametric methods. Models for science and business and the relevance of social research
are emphasized in Chapters 9 and 10.

The final portion, on document generation and presentation, is covered in Chapters 11 and 12. Chapter 11
describes the various methods by which results are presented – presentation modes, styles and how to engage
the right audience. Chapter 12 gives a brief description of the document generation tool LaTeX, a professional
documentation tool that helps produce well-formatted documents with little manual effort. Almost all features of
the tool are explained in Chapter 12.

Various example-oriented problems and case studies are discussed, which will be useful for master's degree level
students as well as researchers. Our primary aim is to cover the end-to-end process that happens in typical
research. From an engineering perspective, there is no single book that elaborately specifies all the
methodologies and procedures behind this. We have also included some possible research problem hints, which
could be used profitably by researchers.

The broad coverage of the book supports various undergraduate and specialized master's courses. More than 100
university syllabi were reviewed when selecting the chapter titles and text contents. The book provides case studies
for most of the scenarios and is compiled so that no specific prerequisite is necessary for learning this course
content.

This book is helpful for engineering researchers as well as students from other fields like business and social
science.

In this text, we have made commitments to the readers regarding the various methods and procedures used in
research. Students and researchers expecting an engineering perspective on research will not be disappointed by
this book. We are confident that it will fulfill the requirements of the readers who select it, and we hope the
contents will bring you closer to practical research.

ABOUT THE BOOK

Current books in the Indian market dealing with the subject of research are primarily intended for those involved
in social research. While the basic aspects remain the same, research in science and engineering is
quite different from that in the social sciences. This book offers a standardized approach for research aspirants
working in various areas. At the same time, all the major topics in social research are also detailed
thoroughly, which makes this book a very good frame of study for researchers in diverse fields. It charts the
new and evolving terrain of social research by covering qualitative, quantitative and mixed approaches. The
chapters contain an extensive number of case studies that help researchers understand the practical implications of
research, along with plenty of diagrammatic representations for easy understanding of the various theories and
procedures. The book describes historical and current directions in research by debating crucial subjects. Each
phase of research is explained in detail so that even beginners can effectively utilize this book. It is written
in a highly interactive manner, which makes for an interesting read. We have tried to incorporate as many new
ideas and concepts as possible in this book so that new possibilities in research can be explored. The different
methods of testing are depicted meticulously, with exclusive detailing of problem solving using each of these
methods. Templates of technical, business and research reports are also included in the book. The
document generation tool LaTeX and its various options are explained fully with sample code and outputs, giving
the reader hands-on experience. Numerous exercises, case studies and worked-out examples
make this book unique when compared to other books of its genre.

SPECIAL FEATURES OF THE BOOK

The following points are the key features of this book:

1. The text is organized from an introduction to research, followed by literature review, data collection and
   selection of the appropriate research design. Testing procedures and representation of results are covered
   in the same flow in which actual research is carried out. In most of the scenarios, we have chosen
   common real-world problems for better understanding.
2. Throughout the book, we have presented solution-oriented research problems to emphasize the importance
   of practical knowledge gained through systematic problem solving.
3. This book offers a standardized approach for research aspirants working in the various engineering fields.
4. Most of the sections are supported with highlighted examples and illustrations. Most of the case studies
   used in this book are real-world problems, which could be expanded into research problems in the future.

We have tried to incorporate as many new ideas and concepts as possible in this book so that new possibilities in
research can be explored.

S. S. Vinod Chandra
S. Anand Hareendran
Acknowledgements
We express our sincere gratitude to our Guru, Dr Achuthsankar S. Nair, who gave many sparks to this textbook
in terms of case studies. He spent a lot of his precious time on this work in long discussions, technology
deliberations and improving the writing. We visited various universities and engaged in discussions regarding
their syllabi and questions; we thank all these universities, faculties and other academicians who spent time
with us in this preparation. Our research articles are our confidence. On this occasion, we are thankful to our
research journal publishers and the anonymous reviewers who took pains in refining our articles. The Machine
Intelligence Research (MIR) group has been our primary critic; we thank each and every member of the MIR
group who supported us in each phase of the compilation of the book. We extend our gratitude to our family and
friends for their kind cooperation during the manuscript preparation. Finally, we would like to thank the entire
Pearson team for their contribution to this project and for carefully handling the manuscript through the
editorial and production stages.

S. S. Vinod Chandra
S. Anand Hareendran
About the Authors

Dr S. S. Vinod Chandra is working as the Director, Computer Centre, University of Kerala. Since 1999, he has
taught in various engineering colleges in Kerala. He holds a Ph.D. from the University of Kerala and an M.Tech.
from CUSAT, with first rank. He has discovered four microRNAs in the human cell and holds five IPRs in
algorithms. He has authored five books and a modest number of research publications, and is a reviewer for many
international journals and conferences. His research areas include machine intelligence algorithms,
nature-inspired algorithms and computational biology. He heads the Machine Intelligence Research (MIR) group, a
focused research group in machine intelligence and nature-inspired techniques, leads many e-Governance projects
associated with universities and the Government, and undertakes consultancy activities for Government
organizations.

Dr S. Anand Hareendran is currently working as an Associate Professor, Department of Computer Science and
Engineering, Muthoot Institute of Technology and Science, Kochi. He obtained his Ph.D. in Computer Science
from the University of Kerala. His current areas of research include machine learning algorithms, association
rule mining and bio-inspired methodologies. He has a modest number of research journal publications and two
IPRs in algorithm formulation. He has designed and implemented various rule-mining algorithms, which find
applications in the medical field, route mapping and frequent item search. He is also an active member of the
Machine Intelligence Research group.
Testimonials
The presentation of the book is succinct and the pedagogy used is novel. For the last 8 years I have been
teaching Research Methodology to MBA students, so I am familiar with research methodology books by different
authors; in this book I haven't seen any citation from other books – it is an original manuscript, and that is
its major strength. The explanations, contents and examples are entirely new. Congratulations to the authors.

— Dr Ajesh Kumar, Associate Professor, Department of MCA,


SRM University

This book will be a helpful heuristic for the advancement of knowledge in the research area. Students and
practitioners will welcome it, especially for its clarity and the way in which the contents are arranged. The
chapters will prompt the reader a step forward on the research journey. I am sure that this book will assist
readers in successfully navigating the research methodology process and the challenges which would otherwise deter
them from pursuing their research.

— Sri. Vinod Sivasankaran, Sr. System Architect, Infosys Campus,


Thiruvananthapuram

The coverage of the topics provided in the text is fairly elaborate, simple, lucid and easy to understand. The
material can be used not only for teaching, but also for self-study. The questions at the end of the chapters will
help readers check their understanding of the topics and feel confident to move ahead. The detailed explanation
given in the text for SPSS with screenshots is helpful for those who might be intimidated by the software.

— Dr Jubilant J. Kizhakkethottam, HOD, Department of CSE, St. Gits College of


Engineering, Kottayam

The text is simpler compared to similar titles I have been using to teach at PG level. Students who use the very
popular similar titles often complain that the statistical part of those texts is intimidating and that more
explanation is needed in other areas as well. I feel the chapters in this text address a lot of these issues
effectively. The coverage is comprehensive and detailed without assuming any prior knowledge of the topics. This
will be helpful to students joining the management stream from other subject areas.

— Dr C. G. Raji, Professor CSE and IT HOD, MEA Engineering College,


Perinthalmanna

Simple, lucid and easy to follow, the book Research Methodology by Vinod Chandra S.S. & Anand Hareendran
S. starts from the basics and does not expect you to have prior knowledge of the subject before delving into it.
Simplicity of the text and detailed explanations is a definite strength of the book.

— Sri. C. K. Suvish, Senior Country Service Architect - Enterprise


Service Solutions and Transformations, Bengaluru, India

The content value is good and apt for postgraduate and undergraduate programmes. The language used is
simple and comprehensible. In a single word, my overall view of the features, pedagogy and presentation is that
it is just 'creative'.

— Dr K. Vijayakumaran Nair, Formerly Associate Professor,


Mar Ivanios College, Thiruvananthapuram

The content of the chapters is sound. The language is easy to understand and the flow is well maintained. The
coverage is good and touches upon the various aspects of each topic. The chapters give a simple
explanation of each topic with a case study. The pedagogy and presentation of the chapters are easy to understand.
— Dr Devu Manikantan Shila, Sr. Research Scientist, Taylor Harbor E
Unit 1, Racine, WI 53403, USA

This book provides an excellent understanding of research methodology, including the relevant computer software.
It covers every aspect of research methodology in simple and smooth language and will help students connect with
research methodology concepts.

— Sri. Brijesh Gopinath, Sr. Account Director & Client Partner at L&T Infotech,
Greater New York City, USA

The strength of the book is that it is prepared with case studies and examples. The coverage of various
up-to-date topics makes it unique. The simplicity in the presentation of topics and language will increase the
acceptability of the book.

— Sri. Jijith Somasundaram, Manager ECommerce Solutions


at Etihad Airways, United Arab Emirates

The quality of content is good; in particular, the sequence of topics covered is completely aligned with the
research process. The authors have closely followed the research process in a general way, which is one of the
key factors making the book usable to readers. They have covered all the topics required for a research syllabus
in a standard B-school. Some topics, especially the review of literature, differentiate this book from others.

— Dr Smitha Sunil Kumaran Nair, Professor, Middle East


College University, Oman

Concepts are very concisely explained. Many similar titles are already available in the market, but the topics
Research Formulation and Literature Review, Social Research, Presentation of the Research Work and LaTeX-
Document Generation Tool will differentiate this book from others and may support its acceptance by the
intended audience.

— Dr K. Bindu Kumar, Professor and Head, Department of Mechanical Engineering,


Government Engineering College, Idukki, Kerala

This is a relevant textbook, with a presentation of the material and explanations that would fit an introductory
course on Research Methodology. The coverage follows the research process and also includes other supporting
aspects such as literature review. The authors have appropriately used applications of SPSS. The content of the
book is supplemented by diagrams and pictures that support the communication of conceptual meaning. This book is
sure to meet the requirements of students.

— Mr A. Joseph, Research Scholar, IISc Bangalore


chapter 1
INTRODUCTION
Objectives:

After completing this chapter, you will understand the following:

The objectives of research


The definition and motivation of research
Different types of research
The research approaches and its limitations
Various steps in the research process
The criteria of good research
The ethics and morals in research

From a novice's perspective, research can be defined as the search for knowledge. The Oxford dictionary defines
research as the systematic investigation and study of materials and sources in order to establish facts and reach
new conclusions. Research is pursued within most professions. More than a set of skills, it is a critical way of
observing, examining, thinking, questioning and formulating principles that hold true, at least for the given space.
Almost all professions affirm the need for research, either for the advancement of business or for the
enlightenment of knowledge. Whatever profession we are in, we ask ourselves a lot of questions in the search for
new knowledge and ideas. For example, suppose you are running a hotel; there are a lot of questions whose
answers may help you in increasing your business:

How many customers do I serve daily?
Which are the most ordered dishes?
Which combo meal is more popular?
At what time does a particular meal hit its maximum orders?
How do customers rate our service?
What is the average price a customer spends on a dish?

Just by finding the answers to these questions, one can say that a valid investigation of the domain has been
done, and the results will truly help the owner make positive progress. This is a very raw example
of the research we practice in everyday life. Consider the graph in Fig. 1.1, with time on the x axis and
knowledge on the y axis. In our hotel example, there is a single point on the time axis at which a new concept
can trigger the management toward advancement of the business. That is the concept generation point,
which is represented as a steep peak.
Fig. 1.1 Concept generation graph
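The hotel questions above are, at bottom, simple aggregations over order records. The following is a minimal sketch in Python; the order log, the dish names and the prices are invented purely for illustration:

```python
# Answering three of the hotel questions from a hypothetical order log.
from collections import Counter
from statistics import mean

# Hypothetical one-day order log: (customer_id, dish, price)
orders = [
    (1, "biryani", 180.0),
    (2, "dosa", 60.0),
    (1, "biryani", 180.0),
    (3, "biryani", 180.0),
    (4, "thali", 120.0),
]

# How many distinct customers were served?
customers_served = len({cid for cid, _, _ in orders})

# Which dish was ordered most often?
most_served = Counter(dish for _, dish, _ in orders).most_common(1)[0][0]

# What is the average price spent per dish ordered?
avg_spend = mean(price for _, _, price in orders)

print(customers_served, most_served, round(avg_spend, 2))  # prints: 4 biryani 144.0
```

Even this tiny tabulation answers three of the questions; systematically collecting such records and analyzing them at scale is exactly what the later chapters on data collection and data analysis formalize.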

Redman and Mory defined research as "a systematized effort to gain new knowledge". Some professionals consider
research as a movement – a movement from the known to the unknown. It is actually a voyage of discovery, with
pleasure and satisfaction. Considered as an academic activity, research involves a number of steps such as
problem definition, literature review, data collection, analysis, drawing inferences, forming hypotheses and
arriving at a solution. This book deals with each and every activity that a researcher has to perform while
engaged in research. Research is not just the gathering of information from books and other sources; nor does
the mere transportation of knowledge from one form to another constitute good research. In short, we can define
research as "the systematic process of collecting and analyzing information (data) in order to increase our
understanding of the phenomenon about which we are concerned or interested".

Research is not confined to science, engineering or technology. The research area covers many disciplines such
as history, sociology, linguistics and so on. Whatever the subject, research discovers, interprets or revises
facts, events, behaviours and theories, and its outcomes enhance the quality of human life.

The research process is a combination of study, experiment, observation, analysis, design and reasoning. How do
we know that cigarette smoking is injurious to health? How do we know that Plasmodium vivax causes malaria?
How do we know that X-rays can capture internal images? All these are outcomes of research. Research provides
precise prediction of events, explanations of facts and theories, and various other information for humanity.

Social, scientific and engineering research all focus on mathematical models and theories. The output of
engineering research can be a product, a process or even a methodology. Thus, engineering research can be summed
up in two words: explore and develop. Figure 1.2 shows the engineering research hierarchy. Social research is
conducted by social scientists through a systematic plan, and these researchers give it a quantitative or
qualitative dimension. The social research area includes statistics, political science, sociology, media studies
and market research. Sampling is one of the key areas in the social research process.
Fig. 1.2 Research hierarchy

Scientific research includes those investigations utilized in acquiring new knowledge or in correcting and
integrating previous knowledge. Basically, scientific research forms questions, prepares hypotheses, makes
predictions, tests the observations and finally analyses the investigated results. Formal science, physical
science, life science, social science, applied science and the interdisciplinary sciences are some of the areas
in which scientific research is carried out.

1.1 OBJECTIVES OF RESEARCH

It is quite natural to set goals before you start a task. Similarly, setting goals in your research specifies
the objectives related to your work. You can find the answers to all your questions through a series of
scientific, recurrent queries; thus, the hidden truth is exposed. So, let us look at how to set the goals of
research. Goal setting can be broadly classified into four steps, as shown in Fig. 1.3.

Fig. 1.3 Setting goals

Research objective
Out of knowledge
You can do it
Usefulness

The research objective is the primary aim while carrying out research, and it raises general questions such as:
what are the aims, and how do they differ from objectives? A statement of what you intend to achieve by
undertaking your research is the aim, whereas a statement of what is to be achieved by the end of the research
is the objective. So, goal setting always starts with the objective, that is, what you intend to find out. Then
the focus should be on some out-of-the-box thinking, which will provide insight into how to achieve the set
objectives. The usefulness of your research is also a major consideration while setting goals; equal or, if
possible, greater importance should be given to the steps needed to prevent the unhealthy use of your research
products. Einstein's famous mass–energy equation, E = mc², was the basis of the atom bomb, whose use remains one
of the darkest episodes in the history of the world and is often quoted as an example of this. The impact of
such a research outcome can destroy the world. Figure 1.4 shows the remains of Hiroshima city, which was
completely destroyed by the atom bomb. So the usefulness and wise use of research outcomes hold a prime role in
setting research goals.
Fig. 1.4 Remains of Hiroshima episode

Source: old.seattletimes.com

Michael Faraday's discovery of electromagnetic induction, which made the generation of electricity practical, is
an example of the output of diligent scientific research. But at times unexpected circumstances have led to
major discoveries that changed the course of the world. The discovery of X-rays by Wilhelm Roentgen is such an
example. While conducting a study of electrical rays, he accidentally noticed that a fluorescent-coated screen
was illuminated by the rays. He also found that the image thus produced could be captured, leading to the
world's first X-ray image.

Before determining the objective of a research project, you should identify the scope of your work. After this,
you can determine what you want to achieve and what decisions you want to make. This will help save time and
effort in the later stages of your research. We then focus on the primary objectives of the research:

New fact discovery
Testing or verifying important facts
Cause-and-effect relationship analysis
Development of new scientific tools, equipment and software that can address scientific and non-scientific
problems
Identification of solutions to scientific and non-scientific problems
Solving problems in day-to-day life

Even though research objectives may appear scattered, listing them is useful: it helps in identifying and
sorting the objectives, which in turn helps the researcher specify his/her aim.

The objectives of a study should always be realistic, feasible, and brief but descriptive. By generalizing these
facts, we can summarize the objectives as follows:

1. To get familiarized with a new theory or idea by extensive investigation and reading, mastering each and
   every concept related to it
2. To generalize the facts that you have studied for a specific class and apply them to a wider population
3. To find the relationships between the various events and factors that can influence the study under focus
4. To draw conclusions by conducting rigorous tests and thereby formulating hypotheses with solid
   mathematical proof

1.2 DEFINITION AND MOTIVATION

The purpose of research is to foresee future problems through the pursuit of truth, as a “global center of
excellence for intellectual creativity”. The scope of the research is basically defined by the researcher himself/herself.
Defining it helps to prioritize tasks and can even avoid issues that are likely to consume much of our time.
Try asking yourself the following questions before you start a research project.

What is the purpose of your research?
What information is being sought?
How will the information be used?

The answers to these questions are likely to guide you along the right path of research. Research is not just acquiring
knowledge; it is the creation of a body of knowledge around the researcher from which he/she can mine any sort of
information. In short, it is not the breadth of knowledge but the depth that is important. The basic purposes of
research are to learn something and to gather evidence for an observation or theory. The first is to learn
something for your own benefit. Learning is not restricted to studying theories and formulating
hypotheses; it can be observing the technique your favourite batsman uses while playing a straight
drive in cricket. We may therefore generalize research as organized learning. Reading an encyclopaedia for
the latest innovations and reading the sports section for last night's results are both information gathering
and, in other words, research.

What you have learned is the source of the background information that you use to communicate with others.
In any conversation, you talk about the things you know and the things you have learned. When you
write or speak formally, you share what you have learned, backed up with evidence to show that what
you learned is correct. If, however, you have not learned more than your audience already knows, there is
nothing for you to share. Thus, with recursive learning, you do your research.

So, it is high time to give a complete definition of the term “research”. It refers to the systematic method of
solving a problem: formulating a hypothesis, collecting the facts, analyzing them and reaching certain
conclusions, either in the form of solution(s) to a specific problem or as generalizations for some
theoretical formulation.

Striving to attain new knowledge needs self-motivation and creativity. The attitude of the researcher and the
commitment that he/she puts into the completion of the work are really important for a perfectly
designed and well-organized work. There can be many answers to the question, what makes people undertake research?
The most common answers are as follows:

To earn a degree, with all the comforts it can provide
To have the intellectual joy of solving seemingly exhaustive problems
To attain respect in public
To serve the public through new discoveries and improvements of processes
To bring about something new in your field that will pave the way for future generations to explore deeper

Besides motivation, there can also be compelling factors such as a government order,
employment conditions and so on. Another important requirement for rigorous research is a
guide/mentor who can steer you safely through the sea of research. The guidance and encouragement that the mentor
provides keep the fire of motivation lit in every researcher.

The only way to do great work is to love what you do.

1.2.1 Variables

A variable is any factor that varies in a place-time-object context. Variables are inter-connected: a change in a
single variable causes simultaneous changes in related variables while maintaining equilibrium. For example, if we
would like to buy a new mobile phone, there are various variables that we look for, such as brand, colour, size,
features, operating system and so on. Each attribute is viewed as a variable. Once the research problem is
identified, the major sets of variables should be fixed as objectives. Figure 1.5 shows the major variables to consider in
the construction of a new dam. Each variable is a separate objective for the research. Industry,
water supply, electricity, agriculture, health issues: all the important factors need to be
carefully studied before starting a project like a dam. As mentioned above, all the variables are inter-
connected, so the effect of one variable on another, and the various factors influencing these variables, need to be
studied.

Fig. 1.5 Variables on a particular research

There are various types of variables: independent, dependent, micro, macro, continuous, discrete,
qualitative and quantitative. Certain quantities depend largely on one another. For example,
when salary increases, expense also increases; here expense is the dependent variable, while salary, the cause of the
change, is the independent variable. Depending on size, variables are divided into micro and macro: the annual national income is a macro
variable, whereas the annual family income is a micro variable. Depending on order, variables are
divided into continuous (such as age and income) and discrete (such as religion and occupation). According to
measurability, variables are classified as qualitative or quantitative. Attributes such as colour (black, white,
etc.) cannot be measured and are qualitative, whereas variables with unit measures, which can be measured, are quantitative.
Figure 1.6 shows the variety of variables used in research.
Fig. 1.6 Types of variables
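The qualitative/quantitative split described above can be sketched in code. The following toy illustration is an assumption-laden sketch, not from the text: the variable names and values are invented, and the "numeric values imply a quantitative variable" heuristic is only a rough illustration of the classification, not a general rule.

```python
def classify_measure(values):
    """Rough illustrative heuristic: numeric observations suggest a
    quantitative variable; anything else is treated as qualitative."""
    if all(isinstance(v, (int, float)) for v in values):
        return "quantitative"
    return "qualitative"

# Hypothetical observations for four study variables
observations = {
    "age":        [23, 41, 35, 57],        # continuous, quantitative
    "income":     [32000, 45000, 28000],   # continuous, quantitative
    "religion":   ["Hindu", "Christian"],  # discrete, qualitative
    "occupation": ["farmer", "teacher"],   # discrete, qualitative
}

for name, values in observations.items():
    print(f"{name}: {classify_measure(values)}")
```

A real study would record each variable's classification explicitly rather than inferring it from value types; the point here is only that qualitative variables carry labels while quantitative variables carry measurable magnitudes.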

1.3 TYPES OF RESEARCH

Research comprises “creative work undertaken on a systematic basis in order to increase the stock of knowledge,
including knowledge of man, culture and society, and the use of this stock of knowledge to devise new
applications”. Research can be classified in many different ways on the basis of the methodology of research.
Some of the common research types are as follows:

Pure research
Applied research
Quantitative research
Descriptive research
Experimental research
Qualitative research
Action research
Historic research
Comparative research
Exploratory research
Conceptual and empirical research

Pure research

Pure research is also called basic or fundamental research. It is exploratory research done within a prescribed
boundary. The mindset and willpower of the researcher are what advance this type of research;
the entire work is driven by the motivation and commitment of the researcher. The work progresses by identifying
the various variables and the relationships between them. Pure research always serves as the foundation for
applied research. The primary concern of this type of research is the design of internal logic and
architecture, because of the limited knowledge space. Research concerning some natural phenomenon, work relating
to pure mathematics, or work that tries to generalize facts are examples of fundamental research.
Applied research

A more glorified form of research is applied research. The fundamentals encountered in pure
research are applied to produce end products. In short, it is the application of theory to practical solutions
of problems. Applied research is basically a group task, aiming to create something or solve a particular problem. The
research becomes truly acceptable when third parties or sponsors use the product with full satisfaction. For
example, studying the radio communication channel is pure research, but creating a product that uses radio
communication for a particular task comes under applied research. In western countries, about 80% of the
research carried out is applied research.

Quantitative research

Quantitative research is research whose findings can be represented or described in terms of a numerical system.
It is normally associated with large-scale analysis and is typically chosen to compare and fine-tune
products by considering the amount of output that the research process delivers. Statistical tests are
performed to measure validity and reliability. Experimental and descriptive research are the major
classifications of quantitative research.

Descriptive research provides an accurate profile of a group. It describes a process, mechanism or
relationship; it gives information and stimulates new explorations. It does not involve any in-depth study, but
reveals just the concepts of theories. For example, while studying the economic activities of a village, one will
look at the day-to-day activities of the villagers, analyse them and generate a report. There are no deep evaluations;
just what he/she perceives is reflected in the report.

Experimental research measures variations under varied conditions. For example, if we are trying to find the effect
of a particular fertilizer, we take a plot and divide it into two halves. In the first half, we apply the fertilizer, and
the other half is kept untreated. At harvest time, we can directly see the effect of the fertilizer. Thus,
we have a control group and an experimental group. Throughout the before-and-after impact study, the research work
is supervised very closely.
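The fertilizer experiment above can be quantified with a simple two-sample comparison. The sketch below is illustrative only: the yield figures are invented, and Welch's t statistic is one conventional choice for comparing a treated group to a control group, not a method prescribed by the text.

```python
import math

def welch_t(a, b):
    """Welch's t statistic for two independent samples: the difference in
    means scaled by the combined standard error of the two groups."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)  # sample variance
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

# Hypothetical yields (kg per sub-plot) from the two halves of the field
fertilized = [41, 44, 43, 46, 42]   # experimental group
control    = [36, 38, 35, 37, 39]   # control group

t = welch_t(fertilized, control)
print(f"t = {t:.2f}")   # a large |t| suggests a real fertilizer effect
```

In practice the t statistic would be compared against a t distribution to obtain a p-value; statistical packages do this step automatically.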

Qualitative research

Qualitative research takes an inductive approach. This type of research is common in the social sciences,
where researchers intend to study social and cultural phenomena. Qualitative research does not have a well-
settled definition, but it can be viewed as work from a “world-view” angle, which carries assumptions whose
meaning changes from situation to situation. Unlike quantitative research, no numerical measures are incorporated;
instead, an in-depth analysis approach is used, taking case studies or events to study
situations. It is not concerned with investigating and developing hypotheses. A common perception of qualitative
research is its emphasis on discovery rather than proof. Action research and historical research are the major
examples of qualitative research.

Action research studies a situation while an intervention programme is going on. It involves simultaneous intervention
and observation or measurement of impact. Historical research explores historical facts by adopting
historical methods. It relies on historical documents and other evidence. Studies of stone
inscriptions, palm-leaf readings, etc., come under historical research. Data collection and data synthesis are the
major steps in historical research. The renowned Indian anthropologist R.K. Mukherjee defines qualitative
methods as theory generation. Figure 1.7 shows R.K. Mukherjee's view on research methods.
Fig. 1.7 R.K. Mukherjee’s view of research

Comparative research

Comparative research deals with the comparison of similarities or differences under the same or varying conditions. For example, studies
of the economic condition of two cities, or of the health condition of people in nearby villages, come under comparative
research.

Exploratory research

Exploratory research is carried out when the problem is not clearly stated. It helps in finding the
best design and data collection methods. Even though the results are not well enough defined for decision making, they
can provide insight into a particular situation. This type of research cannot be generalized and
is not applicable to large populations. Research on finding the average age of people in a village comes under
this type.

Conceptual and empirical research

Conceptual research is related to some abstract idea(s) or theory. It is generally used by philosophers and
thinkers to develop new concepts or to reinterpret existing ones. On the other hand, empirical research relies on
experience or observation alone, often without due regard for system and theory. It is data-based research,
coming up with conclusions that are capable of being verified by observation or experiment.

1.4 RESEARCH APPROACHES

The various types of research discussed in the previous section introduce the reader to the two basic research
approaches, namely qualitative and quantitative. There can also be more real-world research approaches, but
every other approach will be a mixture of quantitative and qualitative elements. In a generalized view, we can add
logical and participatory approaches to the aforementioned ones (Fig. 1.8).
Fig. 1.8 Research approaches

There have always been heated conversations and debates on selecting the apt approach for each research problem.
But in any research, a mixed approach can be used to obtain results and to generate the hypothesis. Strauss
and Corbin divided qualitative approaches into three main categories: non-interpretive, interpretive and
theory building. Non-interpretive studies focus on describing the observations that the researcher has
investigated; the observations need not be analyzed but can be left for the readers to interpret. Interpretive studies
emphasize the analysis performed by the researcher: the researcher draws on the
empirical observations and creates descriptive as well as analytical results. Thus, a “real world” experience is
provided to the readers.

1.4.1 Quantitative Research Approach

Quantitative research is generally associated with the collection and conversion of data into numerical form so that
statistical calculations can be made, from which generalized or specific conclusions are drawn to formulate a hypothesis. In
quantitative research, the objective is given prime importance. There may be many hypotheses, but the real
question is selecting the one that gives the most relevant outcome (result). For this, researchers make use of
various instruments and materials such as computers, observation checklists, statistical analysis tools and so
on.

A structured procedure is followed to accomplish the analysis task, through which the various relationships between variables
are found. Researchers also take care to control all external factors that could bias the
result. This helps to obtain an undisputed final output. The main emphasis of quantitative research is on
deductive reasoning, which tends to move from the general to the specific and is also known as the top-down approach.
The validity of the hypothesis is proved by combining one or more valid observations or rules, as in the famous
deductive example:

Generalized Statement: All men are mortal.

Example: Socrates is a man.

Specific Conclusion: Socrates is mortal.

Researchers will not have access to the entire population; hence, it is of prime importance to study sample
data drawn from the population. Generalizing the results should extend not only to groups of people but also to
situations. In short, the qualitative research approach can be used to develop the understanding required for
evaluating whether a variable is relevant to a given problem situation.
Limitations

It fails to take account of people's unique ability to interpret their experiences, construct their own
meanings and act on these.
It leads to the assumption that facts are true and the same for all people at all times.
Quantitative research often produces banal and trivial findings of little consequence owing to the restriction
and controlling of variables.
It is not totally objective, because the researcher is subjectively involved in the very choice of a problem as
worthy of investigation and in the interpretation of the results.

1.4.2 Qualitative Research Approach

Qualitative research is about recording, analyzing and understanding the deeper meaning and significance of
the observable variables. The research approach adopted is an inductive method, wherein the researcher develops
a theory or looks for a pattern on the basis of the data that he/she has collected. It involves a strategic and
conventional move from the specific to the general and is sometimes called the bottom-up approach. There is no pre-
determined hypothesis; researchers are instead guided by a set of rules/theories, which provide them the framework to
investigate further along the required axis. The data is collected through observations, interviews and focus
groups.

In short, if quantitative research is seen as counting, qualitative research can be seen as
proposing which variables to count.

Limitations

The problem of adequate validity or reliability is a major criticism. Because of the subjective nature of
qualitative data and its origin in single contexts, it is difficult to apply conventional standards of reliability
and validity.
Contexts, situations, events, conditions and interactions cannot be replicated to any extent, nor can
generalizations be made to a wider context than the one studied with any confidence.
The time required for data collection, analysis and interpretation is lengthy.
The researcher's presence has a profound effect on the subjects of study.

1.5 STEPS IN RESEARCH PROCESS

Before exploring the various corners of research, let us get a general idea of how the research process is
carried out. There are many feed-forward and feedback loops within the system, which help in attaining a complete and fresh
piece of research. Figure 1.9 shows the process flow involved in a research process.
Fig. 1.9 Research process flow

We can basically divide research into a six-step process. Figure 1.10 shows the various steps involved in the
research process.

Fig. 1.10 Steps in research process

1.5.1 Problem Definition


Defining the problem is the initial step of research. The researcher must find his/her area of interest and
should read widely in order to find the topic he/she would like to work on. One major issue
during this phase is the perspective with which the researcher narrows the generic area down to a specific topic.
The researcher needs to be aware of the feasibility of the study and should also look out for the facilities that he/she
may get; so when a topic is chosen, all sorts of feasibility studies need to be done. The best way to
understand a particular problem is to discuss it with friends or colleagues working in the same area. This
will give you a wider angle on the area of your interest. Research problems are generally of two types: those in which
the association of several variables is studied, and those that relate to states of
nature. Whatever the research area, the researcher should be able to explain the work and the area in simple
language, which shows his/her ability to gain deep insight into the work. Once the researcher feels that
he/she is ready with a research area, he/she should write a sample draft stating the aim of the work.
The synopsis thus written should be reworked and corrected into a final polished synopsis, which acts as a
guide for future work. The synopsis should chart the way he/she should work in order to
attain the goal. In the course of the research, the goals or objectives can change, so it is always preferable to have a
dynamic draft in which you can make sufficient changes throughout the work. The changes should be made in
such a way that the final focus never changes.

1.5.2 Setting Out a Plan

Once the problem is defined, the researcher needs to set out a plan. A research plan is a thoughtful, compelling and
well-written document that outlines your exciting, unique research idea. A typical research plan has four main
sections:

Specific aims
Significance
Preliminary studies and progress report
Research design and methods

The specific aims section is a formal statement of the objectives and milestones of a research project. The significance
section states the research problem, including the proposed rationale, the current state of knowledge, and the potential
contributions and significance of the research to the field. The preliminary studies section describes prior work
relevant to the proposed project; it is important for establishing the experience and competence of the
researcher to pursue the proposed activity. The purpose of the research design and methods section is to
describe how the research will be carried out. This section is critical for demonstrating that the researcher has
developed a clear, organized and thoughtful study design.

It should provide an overview of the proposed design and conceptual framework.
Study goals should relate to the proposed study hypotheses.
Include details of the specific methodology; explain why the proposed methods are the best to
accomplish the study goals.
Describe any novel concepts, approaches, tools or techniques.

1.5.3 Literature Review

Once the plan is set, it is time to act on it. The researcher must learn more about the topic
under investigation. To do this, the researcher must review the literature related to the research problem. This
step provides foundational knowledge about the problem area. The review of literature also educates the
researcher about what studies have been conducted in the past, how those studies were conducted, and the
conclusions reached in the problem area. A good library or resource websites will be of great help in accomplishing this task.

1.5.4 Analysis and Hypothesis Formulation


Once the researcher has finished the literature survey, he/she should be very clear about, and able to explain, the
hypothesis behind the work. The hypothesis should be very specific and limited to the piece of research in hand,
because it has to be tested. A simple mathematical or working-model proof should also be presented to
show the consistency of the work. The researcher then analyses the data according to the plan. Data
collection methods and related topics of sampling are detailed in Chapter 3. The results of this analysis are then
reviewed and summarized in a manner directly related to the research questions.

1.5.5 Presentation and Interpretation

The hypothesis developed and tested now needs to be interpreted to build a theory. The real task of research lies
here: building a generalized theory from the various facts that have been discovered. This step also deals
with how the proposed system is presented before the audience. The presentation needs to be self-explanatory, and the theory and
the hypothesis should give the audience a feeling of freshness and simplicity. More accessible work
captures more attention. The presentation is as important as the interpretation.

1.5.6 Decision Making

Decision making is the final step in research, in which the researcher has a very small role. The group to which
he/she presents the new theory needs to approve it. The genuineness and validity of the work will be questioned by the experts,
and the researcher needs to provide answers that quench their doubts. A valid and informative piece
of research will always be welcomed, and finally the researcher is rewarded for his/her tedious and hectic work.

1.5.7 Case Study: Sample Research Problem

Now let us look at how a research problem can be developed from the environment around us through the
aforementioned steps. Consider a social research study of primary child education. In this case, we need
to define our problem very clearly. By primary education, are we looking at a population that just knows how to
read and write, or at a population that has completed schooling up to a particular class, say standard four? Also, by
“child”, which category are we focussing on: children below 8, 10 or 14 years? Our problem definition should
be apt and clear, so that a third person could easily help or guide us just by hearing our topic. After
defining the objective, we need to create a work plan: which area are we focussing on, what is the
sample population, what kind of tools are we using for analysis, how will the final result be interpreted and
presented, what advantage does this study provide, and so on.

The next step is the analysis of the results produced by the chosen tool. For example, if we are using
IBM SPSS for data analysis, we could easily conduct an ANOVA test and get the result. Similarly, if we want to
present the data as graphs or charts, the tool has all the options. So selecting the analysis tool also has an
important effect. The results of the research should not be just an output that satisfies your work; they should have
some impact on society that creates a positive effect from your effort. In this case, after the research
analysis and approval of the results, we should disseminate the findings widely in society, which will help in
uplifting the children, thereby creating more opportunities for their primary education.
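For readers without access to SPSS, the one-way ANOVA mentioned above can also be computed by hand. The sketch below is a minimal illustration under stated assumptions: the literacy test scores and the three village groups are invented, and a full analysis would additionally convert the F statistic into a p-value, which statistical packages report automatically.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of sample groups.
    F compares variation between group means to variation within groups."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total observations
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations inside each group
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical literacy test scores from three villages
scores = [[52, 48, 55, 50, 51], [61, 58, 63, 60, 59], [49, 47, 50, 48, 52]]
f_stat, df_b, df_w = one_way_anova(scores)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value indicates that the group means differ by more than within-group variation alone would explain, which is exactly the comparison an SPSS one-way ANOVA reports.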

1.6 CRITERIA OF GOOD RESEARCH

We have discussed the various types of research; whatever the type, the research must follow certain
guidelines. The purpose of the research should be well defined, and the methodologies that we intend to follow
should be clearly stated. This helps other researchers who try to repeat the experiments in the future for further
advancement. A systematic piece of research, well structured and sequentially arranged, thus makes
research work easier.

Another important criterion of good research is the logical flow of ideas. There needs to be logical reasoning,
a process of induction or deduction, in representing the work. The ideas need to be fully supported by
well-formed hypotheses or theories. This also leaves room for the researcher's creative thinking in arriving at valid
conclusions. A good research report should always describe the types of errors and the flaws in the design. The analysis
methods used and the reliability of the results should also be published. The conclusions obtained from the research
should be substantiated by valid proofs, and the results should be reproducible. Thus, systematic, logical,
replicable research work can be considered quality work.

1.6.1 Problems Faced by Researchers

Various problems are faced by researchers engaged in dedicated and sincere work. The problems
can be either external or internal. Issues that the researcher has to face from external factors, such as
co-workers, the institute or similar situations, can be categorized as external problems. Stress, work pressure and
loss of interest due to continuous failure account for internal problems. This section details the most common
issues found among researchers.

Lack of support

Researchers may not get proper support from the institution or department where they work. The ambience that
the institution offers plays a major role in enabling researchers to follow their dreams. The support of co-
workers, the guide and the family is necessary to complete a successful work. Interaction thus becomes a key
point.

Lack of funds

Almost all research activities require substantial financial support, usually from a funding agency or an external
agency. It becomes difficult when funds get blocked or an agency withdraws midway, and
searching for a new sponsor at that stage is a tedious task. Likewise, inadequate supply of the materials needed
for research can also be a major issue.

Lack of scientific training

Researchers may have no proper knowledge of research methodologies. A sudden jump into
research without proper training may lead to a bad outcome. Ethics, design principles and data collection
schemes are well defined within the methodologies, and a lack of knowledge of these can have an
ill effect on the research.

Lack of study materials

For higher-level study, proper books need to be available. Libraries may lack sufficient
collections; online access to top-class journals, and even archived copies of works, also needs to be
available in libraries. When adequate materials are not available, the literature survey becomes a big question mark
for researchers.

Issues in publications

Once a valid result or output is obtained, it needs to be published in renowned journals. But almost all journals
take a long processing time, which can make researchers restless or even demotivate them. Review
comments from publishers need to be taken with an open heart; they can contain major suggestions that will be
useful in improving the work.

1.7 ETHICS IN RESEARCH


We have read in many books about the invention of the telescope by Galileo Galilei. But it is said that a
Dutchman, Hans Lippershey, created the first telescope and was denied a patent. Galileo caught wind of
the idea and made his own. The controversy over this still exists. Here, the idea of the telescope was put
forward and implemented by Lippershey, but the credit was taken by Galileo. From the standpoint of pure theoretical research, this is
an ethical issue.

When people hear about ethics, they think of a set of rules to differentiate right from wrong. Ethics can also
be defined as the norms of conduct that distinguish between acceptable and unacceptable behaviour. Such
learning starts at home, at school, in church or even in society itself. Consider the society around us: it has
legal rules that govern behaviour, but ethical norms take a broader scope and are more informal than laws.
Ethical norms are also applicable in the research arena, guiding the people who conduct scientific
or creative activities. Research ethics is the specialized branch that studies such norms.

Maintaining ethical norms is one of the most important requirements to be kept in mind while undertaking research.
First and foremost, this helps in upholding the principal aims of research: knowledge, truth and the avoidance
of error. Prohibiting the fabrication or falsification of data decreases the possibility of error.

Second, carrying out research requires the co-operation and co-ordination of several people, who
may be from different disciplines or even from different institutes. In such situations, we need ethical
measures that help to generate trust, mutual respect and accountability. The various ethical norms, such as
copyright policies, patents, maintaining confidentiality while reviewing, and handling sensitive data carefully, are designed
to protect intellectual rights in collaborative research. Nobody wants their idea
implemented by others; ethical norms prevent such acts.

Third, ethics makes the researcher accountable to the public. For example, if any misconduct or
breach of policy or protection is observed during research, the public can question the researcher. There are many laws that
prevent unnecessary experiments on animals and birds. As the researcher is funded by the public, he/she is
supposed to be answerable to them.

Fourth, ethical norms in research also help to build public support. People are more likely to fund a research
project if they can trust the quality and integrity of the research, and to gain that trust, ethical norms need to be
strictly followed. Researchers also need to promote social values and responsibilities, and to adhere
strictly to the health and safety of the subjects under observation.

For example, a researcher who fabricates data in a clinical trial may harm or even kill patients. Similarly, a
radiologist conducting studies on UV or IR rays without following the regulations and guidelines may
jeopardize his/her own health and safety, or that of staff and students.

1.7.1 Morals in Ethics

This section briefs the various characteristics that a good researcher should hold.

Honesty

Researchers should strive for honesty in all scientific communications.
Should honestly report data, results, methods and procedures.
Fabrication, falsification or misrepresentation of data can lead to many severe issues.
Should not try to deceive colleagues, funding agencies, the public or even their own conscience.

Frankness

A researcher should avoid bias in experimental design, data analysis and data interpretation;
No personal relationships should come into play while doing peer review, awarding grants or even writing testimonies;
Should make sure that financial or personal interests never affect the research;
Should be willing to share data, results, ideas, tools and resources;
Should be open to criticism and new ideas.

Integrity

Should always keep his/her promises and agreements;


Should never make false promises or offer flimsy excuses while others are trying hard for his/her success;
Should be sincere and consistent in thought and action.

Carefulness

Should always avoid careless errors and negligence;


Should carefully and critically examine all work by himself/herself before submitting it for peer review;
Should keep a research diary for almost all activities, such as data collection, research design and correspondence with guides, friends, agencies or journals.

Respect for intellectual property

Should honour patents, copyrights and other forms of intellectual property;


Should not use unpublished data, methods or results without permission;
Should give credit where it is due;
Should give proper acknowledgement or credit to all contributors to the research; never plagiarize.

Confidentiality

Sharing data is well acceptable, but when handling sensitive data, confidential communications must be protected with great care. Papers or grants submitted for publication, personnel records and patient records should not be disclosed.

Responsible

Researchers should always have a commitment to society and should strive to promote social good and prevent or mitigate social harms through research and public education. They should try to improve their own competence along with the upliftment of society.

Non-discrimination

Discrimination against colleagues and students on the basis of sex, race or ethnicity must be avoided. Colleagues should be judged only on factors related to their scientific competence and integrity.

Subjects protection

When conducting research on human subjects, we need to minimize the harm that may be caused. The risk factors need to be well studied, and the work needs to progress in a way that maximizes the benefits. While dealing with vulnerable populations, maximum care must be taken so that they are not exploited or put at risk. Proper care and respect need to be given to animals when using them in research; unnecessary or poorly designed animal experiments should be avoided.

There are various other matters that come under this category, some of which are the following:
Publishing the same paper in two different journals without informing the editors,
Not informing a collaborator of the intent to file a patent, thus making sure that he/she is the sole inventor,
Including a colleague as an author on a paper even though he/she did not make a serious contribution,
Discussing confidential data from a paper under review for a journal with colleagues,
Manipulating datasets without providing significant reasons,
Using an inappropriate statistical technique in order to enhance the significance of research,
Bypassing the peer review process and announcing results through a press conference without giving peers adequate information to review the work,
Conducting a review of the literature that fails to acknowledge the contributions of other people in the field or relevant prior work,
Stretching the truth on a job application or curriculum vitae,
Giving the same research project to two graduate students in order to see who can do it faster,
Overworking, neglecting or exploiting graduate or post-doctoral students,
Failing to maintain research records properly,
Making arrogant comments and personal attacks while reviewing an author’s submission,
Harbouring a personal grudge while working.

We have now laid out the basic ideas regarding research: its objectives, motivation, types, approaches, etc. We have also detailed the steps in the research process. It is now time to study each of these steps in detail.

EXERCISES

1. Explain the various steps involved in research process.


2. What is scientific research? What is its significance in modern age?
3. Differentiate research methodology from research methods.
4. Explain the various problems that the researchers face.
5. “Research is a dedicated sequential process”. Discuss this statement.
6. Differentiate the following:
Pure and applied research
Conceptual and empirical research
7. Explain the following:
Criteria of good research
Motivation of good research
8. Motivation and confidence are the keys to success. How can you substantiate this statement?
9. With an example, explain the various steps in research process.
10. Write notes on research ethics.
11. Explain the different criteria in selecting a research objective.
12. Explain the different types of research.
13. What is the relationship of research to science? What do you mean by technology research?
14. What are the different items to be included in research proposal? Explain each in detail.
15. Define and distinguish between theory, law and hypothesis.
16. Explain the ethical considerations related to empirical research.
17. Explain the differences between social, scientific and engineering research.
18. List out the social impacts of research inventions.
19. What are variables? List out the different variables used in a social research.
20. What are qualitative and quantitative methods? How are they useful in research?
21. Explain how scientific collaborations are made while performing research.
22. What are the limitations of quantitative research approach?
23. What is the bottom-up approach in qualitative research?
24. What are the steps involved in the research process for the social/economic surveys?
25. What are the ethical norms in research? Explain the reasons.
26. Explain the ethical principles in research.
27. How is animal care ensured in research studies? Is it important?
28. How is data confidentiality maintained in the research process?
Chapter 2
RESEARCH FORMULATION AND LITERATURE REVIEW
Objectives:

After completing this chapter, you will understand the following:

The definition of problem and its formulation


The review of literature
The characteristics of a good research question
The process of literature review

The primary step in the research process is the selection of a research problem. The researcher needs vast knowledge of the domain in which he/she wishes to work. Only then can he/she analyze the gaps in the present situation and put forward a new proposal. To gain immense knowledge of the field, he/she should carry out a vast literature survey. This chapter details how to select a research problem and the various aspects related to the literature survey.

2.1 PROBLEM DEFINITION AND FORMULATION

The best explanation of a research problem was provided by Northrop in 1966. He stated that “Inquiry starts
only when something is unsatisfactory, when traditional beliefs are inadequate or in question, when the facts
necessary to resolve one’s uncertainties are not known, when the likely relevant hypotheses are not even
imagined. What one has at the beginning of inquiry is merely the problem”.

In simple words, we can say that a research problem defines the destination before starting the journey. It specifies what to do, how to do it, where to do it and what the outcomes are. Thus, without a properly defined problem, research cannot progress. Formulation of a research problem is not just finding a topic within our interest; it is the remodelling, reshaping or even reconstruction of facts, theories or hypotheses. The problem thus formulated should also be compact enough for data collection and analysis. Figure 2.1 shows the directions of problem formulation from a situation.

We can visualize this process like a magician’s magic box: whenever we open the box, it contains another one. The process continues, and the magician takes something very precious from the innermost box. Likewise, in research we start with a big domain, narrow it down to smaller pieces and finally get to the core, obtaining the real research problem.

Fig. 2.1 Research problem formulation


The sole aim of problem definition is the creation of research questions and hypotheses. So, what exactly is a research question? It is a question that can be answered through the analysis of data. The data can be either qualitative or quantitative; it can also be the answers to survey questions. Hypotheses, on the other hand, are the informed guesses or conclusions generated from the theory created during data analysis.

In short, we can summarize the goal of formulating a research problem as the generation of measurable, well-defined, directed and in-scope research questions for creating the desired hypotheses.

Case Study

Domain: Alcohol consumption among school children.

Theory formulation: An increase in price may reduce the demand.

Research question: Will a price hike cause lower consumption of alcohol among school children?

Hypothesis: An increase in tax reduces the consumption to a great extent.

Here, a social survey is conducted to study the use of alcohol among school children. One of the major causes is the easy availability of the product; the students have ample money with them to buy it. But what if the prices are increased? Not all, but a certain percentage of students will no longer be able to obtain the product regularly. Thus, the consumption can be reduced, and a hypothesis is generated. In our social situation, the real question is: what will they do to get more money? A serious question to think about….

There are certain guidelines that need to be followed while formulating a research problem. The single statement that needs to be in every researcher’s mind is that there is no shortcut in research; only hard work and determination give you the perfect result. Many scholars hastily skip the primary step of problem formulation, which makes them face difficulties at a later stage. Problem formulation cannot be borrowed; the guide can only throw light on the vast domain, and it is the responsibility of each researcher to find a problem of his/her own interest. Having spotted the domain, the right questions should be asked and answered. An unbiased and detached approach will surely lead to a core research problem. Formulation should start with a quick preliminary study of the topic; some alternative problems should also be kept in mind to help in formulating the main scope. The problem thus formulated should be novel, significant and useful to practitioners.

Always keep the five W’s in mind.

1. What is my research?
2. Why do I want to do this research?
3. Who are my research participants?
4. Where am I going to do the research?
5. When am I going to do the research?

Research problem identification is like treasure hunting. The hunting field may have any shape (circle, square, rectangle, etc.); these fields are the research topic areas selected by a researcher. A shapeless field corresponds to having no idea of the research problem that has been selected. The researcher starts from outside the field, slowly enters the area and searches for the core (the actual research problem). On reaching the core, he/she tries to understand it and then moves back out to the surface. This journey is problem solving. So there are two journeys: identification of the research problem through a wide search, and solving the identified problem. During the first journey, he/she may conduct literature surveys and reviews and become a scholar in that field. In the second journey, he/she has the research problem(s) and tries to identify how it can be solved. When researchers reach the surface, they win appreciation (publications, patents, etc.).
Figure 2.2 shows how a researcher identifies a research problem and solves it by suitable methods. Basic confusions always arise: where to start? Will any result be obtained? Am I on the right track?

Fig. 2.2 Identification of research problem

2.1.1 Problem Selection

The research problem undertaken for study must be carefully selected. Major sources of research problems are experience, observation, interest and needs. While choosing a problem, you should make sure that the area chosen is of real interest; an area with practical value will also help in gaining wide acceptance. A general feasibility study should be done before selection. Figure 2.3 shows a sample feasibility check-list. For almost all research, a pilot study adheres to such a check-list to evaluate feasibility, and work is carried out only when the feasibility score card is passed.
Fig. 2.3 Feasibility check-list

One major criterion while selecting the problem is to avoid areas that have already been overdone. For example, nowadays much of the research activity in computer science deals with cloud computing and its security. Even though selecting a problem from such an area provides a lot of literature, the probability of finding a truly valid research problem is low. Also, subjects of sensitive scope or controversial topics should not be chosen unless you are ready to face the consequences. Vague and overly broad areas that blur our working space should also be avoided. Figure 2.4 shows how a research problem evolves.

Another major factor that needs to be considered is the availability of resources to perform the research. For example, if we do not have any particular arrangements or resources to study the presence of water on the planet Pluto, the research will simply lead nowhere. So the availability of resources should be checked before selecting a problem. Before selecting, experts and universities working in your area should be contacted; this can help you a lot in the future.
Fig. 2.4 Research problem evolution

The cost and time involved in the research should also be key points while selecting the problem. Even while conducting a pilot study of the problem, a researcher must know the present situation, its drawbacks and the current state of the art. Reading the latest articles can also help him/her achieve these goals. The genuineness of the research problem also needs to be checked. Anticipate the results, question critically, discuss with colleagues and finally focus on the goal. All this helps you get the diamond (research problem) out of the shell (domain).

If the research problem is selected appropriately, conforming to the aforementioned points, then the research will not be boring drudgery; rather, it will be exciting and educative. The selected subject/problem must engage the researcher and be the prime priority in his/her mind, so that he/she may give the best effort required for the study.

2.1.2 Necessity of Problem Definition

“A problem well-defined is a problem half-solved” holds strong even today. Proper definition of a research problem is a prerequisite for any research study. Formulation of a problem is often of more significance than its solution. The entire direction in which the research moves is decided by the way in which the problem is defined. The definition helps in differentiating relevant data from irrelevant data. The process model, the design of the solution and the steps involved in solving the research problem can easily be spotted if the problem is well defined.

What data needs to be collected?


What characteristics of that data are relevant and need to be studied?
What relations have to be explored?
What techniques have to be used for the purpose?

All these questions can easily be answered if the problem is well defined. The overall quality and efficiency of the research will increase only when the problem is well defined. Ill-defined problems can cause hurdles in the path of research. The techniques to be incorporated in the study can be initialized and selected if the problem statement is clear. In short, we can say that a research problem should be FINER (Feasible, Interesting, Novel, Ethical, Relevant). A well-defined research problem keeps the researcher on track and helps to focus his/her work. Figure 2.5 gives the steps involved in research problem making by a researcher.
Fig. 2.5 Research problem loop

Problem formulation

This section illustrates an example of how to formulate a research problem. Consider a general problem:

Why is productivity in the USA greater than in India?

While working in this area, there are many questions that need to be resolved: what kind of productivity is the domain talking about? For which industry does the comparison take place? For what period of time is the comparison valid? By answering such questions, the general domain becomes much more specific. Now we can modify our question to: why was the productivity of the USA greater than that of India in the IT industry during the period 2005–2008? We can further rethink and reorganize our questions: what are the major factors? What amount of increase occurred? And so on.

2.2 LITERATURE REVIEW

Literature means writings, and a body of literature refers to all the published writings in a particular style on a particular subject. In research, a body of literature is a collection of published information and data relevant to a research question. A literature survey is not a compilation of every work written about a particular topic; it is a survey or overview of the literature found to be significant to the area under study. A review of literature always helps you to increase your knowledge of your topic, identify important authors and works in your area of research, and identify new research, theories and/or methodologies in your area. A literature review should not stand as a stand-alone document. It should include an introduction defining your topic and the purpose of your review of the literature, and it needs to be organized by common themes or categories. It should also contain a summary and analysis of each work, including its importance to the overall topic as well as its relationship to the other referred works.

2.2.1 Critical Analysis

A critic is a person who evaluates somebody’s work (e.g., book, essay, movie, painting, article, etc.) in order to increase understanding of it. Critical analysis is the expressed viewpoint of a writer who evaluates the work; it is a subjective text that breaks down and studies the parts. Critical reading and critical writing are the two parts of critical analysis, and critical writing depends on critical reading. The involvement of an interested researcher is reflected in his/her written text. The interpretations are like judgements that formulate findings based on your approach. In critical analysis, the following types of questions may be considered by a researcher.

Theoretical questions

Questions are prepared based on a theory.

For example,

How do the authors conclude their views in this situation?


What is the relevance of sigma value?

Definitional questions

Questions are prepared based on definitions.

For example,

Does the author consider two different aspects?


Is there any relevance in comparing communism and democracy in China?

Evidence questions

Questions are prepared based on selected evidence.

For example,

Do the authors give evidential support for their arguments?


Are regulatory mechanisms applicable to this gene?

2.2.2 Critical Thinking

Critical thinking, according to the National Council for Excellence in Critical Thinking, is the intellectually disciplined process of actively and skilfully conceptualizing, applying, analyzing and/or evaluating information gathered from, or generated by, observation, experience, reflection, reasoning or communication, as a guide to belief and action. One of the key questions of research is: how do you arrive at a correct decision from your problem? Thinking is one of the basic components of problem-formulating skill. Is all thinking formulated into correct problems? Approaching your problem as a critic can give much insight into your research problems, and this calls for certain skills in your thinking. Figure 2.6 explains the required skill sets of a researcher.
Fig. 2.6 Critical thinking

Interpretation is the ability to understand the information you are being presented with and to communicate the meaning of that information to others. During the analyzing step, a critic separates or breaks a combined idea into parts to notice their nature, functions and relationships. Reasoning is the ability to understand and recognize what elements you will need in order to determine an accurate conclusion or hypothesis from the information you have at your disposal. At the evaluation stage, the identified statements are judged, or an opinion is formed, to measure the validity of the presented information. Problem solving is the research skill of arriving at a set of conclusions from a question.

2.2.3 Critical Evaluation

A critical evaluation of a work must consider provenance, objectivity, persuasiveness and value. Provenance concerns the author’s credentials and whether the author’s arguments are supported by evidence.

For example, a researcher reveals his/her ideas about historical materials with some case studies, or a researcher needs to publish his/her recent scientific findings. Objectivity concerns whether the author’s perspective is even-handed or prejudiced in the secondary data analysis. For example, a researcher discusses the “cause and effects of atomic energy” but significantly ignores the first atom bomb blast. Persuasiveness concerns the author’s convincing points in narrating the subject. For example, a researcher wants to explain the growth of a new political party among middle-class people; the author notes convincing points such as inflation or scams produced by other parties, which are major reasons for the new party’s growth. Value concerns the author’s overall argument and conclusions. For example, the researcher gives the ultimate contribution of his/her understanding of the subject that he/she has discussed.

2.2.4 Objectives of Literature Review

Authors should try to accomplish the following four important objectives in preparing a literature review:

1. The review should provide a thorough overview of previous research on the topic. This should be a
helpful review for readers who are already familiar with the topic and an essential background for readers
who are new to the topic. The review should provide a clear sense about how the author’s current research
fits into the broader sociological understanding of the topic. When the reader completes the literature review, he/she should be able to say, “I now know what previous research has learned about this topic”.
2. The review should contain references to important previous studies related to the research question that
are found in high-quality sources such as scholarly books and journals. A good literature review conveys to the readers that the author has been conscientious in examining previous research. Authors in research build on what is already known. In this process, highly interested readers are also provided with a set of
references that they may wish to read themselves.
3. The review should be succinct and well organized. Most scholarly journals stipulate a maximum length for
papers submitted for publication, and often this is only about 20 pages. After all the work that has gone
into a paper, authors typically feel that they could write at least twice that much. Thus, every page is
precious, and authors must learn how to write succinctly. A typical literature review is only about 3, 4 or 5
of the 20 pages, and it must contain a lot of information. Therefore, it is necessary and helpful to learn
how to do it succinctly.

Many authors like to begin with a short “Introduction” section that identifies the general topic and its
importance. This is followed by the “Literature Review” section that provides the overview of the
previous research and explains what has and what has not already been learned. Much of the focus of
literature review is on previous research related to the dependent variable. This includes use of
sociological theories to explain the dependent variable. It may also be appropriate to focus on research
related to the independent variables.

4. The review should follow generally established stylistic guidelines. This conveys to readers that the author
is familiar with scholarly publication style and that can add legitimacy to the author’s work. In addition,
when the typical style is used, it is easier for readers to immediately follow the paper’s organization.

2.2.5 Importance of Literature Review

“Literature review is a scholarship”

How will you improve your knowledge before conducting experimental or theoretical work? The available literature related to your study and requirements should be reviewed. A literature review is a survey of scholarly articles, books and other sources relevant to a particular area of research, issue or theory. The output of the literature review is a description, summary and critical evaluation of the selected works. For example, a researcher who wishes to develop an expert system for power distribution substations should review most of the works related to power plant and power distribution expert systems before his/her actual work design. Without a proper literature review, the researcher will not be aware of similar work done in that area, which may end up in plagiarism.

A researcher becomes a scholar in his/her area through a proper review of the literature. A literature review offers a simple summary of the key sources with organizational patterns and results. The following are the purposes of literature reviews:

Study others’ work and its contribution to the research in the problem domain.
Identify the relationship of each work to the others and note them before the actual work.
Identify the gaps in previous research through new interpretations and shed light on any identified gap.
Analyze conflicts, if any, in the previous studies and resolve them.
Prevent duplication of effort through awareness of prior scholarship in the field.
Identify additional research areas that can be pursued after a successful literature review.
Reveal common findings among studies and disclose inconsistencies between them.
Identify factors not previously considered and provide suggestions for further research.

A literature review is important because

its output describes how the proposed research work is distinguished from previous research,
it establishes the originality of the content and the relevance of the proposed research problem,
it justifies the proposed research methodology,
the literature survey pre-demonstrates the research proposal.

Case Study

A researcher wishes to develop a language translator for scrambled English text. He/She would develop a system that identifies Malayalam words in an SMS or e-mail message, where the message is typed in English letters. How can he/she identify Malayalam words typed in English letters? Before starting the work design, he/she should review all the literature that offers related studies. The available literature may give ideas from English-to-Arabic or English-to-Hindi translators, that is, systems for Arabic or Hindi words typed in English letters. These literature reviews accelerate his/her research work and produce better results. He/She can also write a survey article from the above studies.

2.3 CHARACTERISTICS OF A GOOD RESEARCH QUESTION

Certain characteristics are required in the creation of good questions in research.

1. A good research question can be answered by collecting and analyzing data.

Your literature search will be limited to scholarly research, which nearly always attempts to answer questions by gathering and analyzing data. If you ask a question that cannot be addressed by data, you will not have a research question. An example of a non-research question would be one like this:

Will text messaging be the end of spelling as we know it?

That is an interesting question, but it is inappropriate as a research question because data cannot answer a question about future behaviour.

Another non-research question is:

Do parents have a moral obligation to be involved in their children’s schools?

Although worthwhile for individual consideration, questions of morals, values and religious faith cannot be answered by data collection and analysis. However, you could survey a group of people to determine whether they believe that parents have a moral obligation to be involved in their children’s education, but you would then be asking a different question.

2. A good research question assumes the possibility of different outcomes or opinions.

What is the Individuals with Disabilities Education Act? There is no room for opinion in this question.
Any credible source will provide identical information (though some sources will provide more and some
less). There is no way to draw a conclusion from it. This question might be the start of a report, but not of
a literature review or research design, which requires presentation of various points of view in order to add
new insights.

How did No Child Left Behind affect the mainstreaming of students receiving special education services?

This is a better research question because it will likely generate more than one viewpoint, which allows the writer to reach a conclusion (positive impact, negative impact, no impact, various impacts, etc.). The question can also be addressed by the collection and analysis of data, and it is limited in scope (mainstreaming, special education).

3. A good research question is narrow.

What makes a good teacher? Think about what kind of research could answer this question. First of all, what does “good” mean? How could “good” be measured or defined? Second, who should answer the question: teachers, students, parents or administrators? Third, would the answers be the same for each grade level? Fourth, what kind of teacher is to be included: a science teacher, tutor, band director, economics professor or parent? You get the picture: the question is too broad.

4. A good research question is clear.

What is the best way to teach sex education? A question like this would certainly involve a diversity of viewpoints; however, it is also too broad and vague to be meaningful. There is the problem of deciding what “best” means and how it would be measured. There is the question of who would be doing the teaching (public school, parents, religious institution, etc.). There is the problem of context: age and gender of children, type of school and so on.

A better way to ask a sex education question is:

Can public middle school sex education classes significantly reduce student pregnancy rates?

This is a better question because it is limited (public middle school, “good” = significant reductions in
student pregnancy rates). Although this is a yes/no question, you are likely to uncover a pattern of
successful sex education; in other words, what kind of instruction resulted in the greatest reductions. In
that case, you will have a rich field to harvest for your discussion and conclusion sections of the literature
review.

5. A good research question is a single question.

Sometimes research questions have corollaries (closely related sub-questions), but for now just focus on a single guiding question for which you attempt to find an answer in the existing literature. The single guiding question should not have multiple parts. For example, do not ask: how have larger class sizes affected student test scores, student behaviour and teacher job satisfaction? Obviously, these are three questions, not one (in addition to being too broad and vague). You will not be able to manage a paper-length literature review if multiple questions are embedded in your research question.

How do teachers and students benefit from teacher in-service training? That is two questions requiring two literature reviews.

6. A good research question is built on sound assumptions.

To save money, I am going to buy a hybrid car. Where can I find the cheapest hybrid? The underlying
assumption is that buying a hybrid car will save you money. If so, finding the cheapest hybrid would be
worth the search. But will buying a hybrid save you money? Independent consumer research shows
probably not, depending upon what you are driving now, how much you drive, and the price of fuel. If
saving money is the goal, then why research hybrid prices instead of researching cheaper cars that would
save you more money?

How can schools get parents involved in their children’s education?

The question contains a strong assumption that children would benefit from their parents getting
involved. Once you have clarified the meaning of “involved in their children’s education”, you must make
sure that the benefits of such involvement have been supported by research and that the drawbacks of such
involvement do not overshadow the benefits. If your assumption is faulty, you have lost your readers in the
first paragraph.

Check your assumptions before you decide on the question. The validity of the assumption need not be
established in the paper, but most readers will know if it is faulty or weak.

2.4 LITERATURE REVIEW PROCESS


Most primary research studies begin with a review of the literature. A literature review offers the current
researcher in a field the big picture of the research topic as drawn from previous studies. The literature
review serves as a path for additional research on present problems by explaining the topic of research,
and it fosters critical thinking about the summarized articles on the topic. The major steps involved in the
literature review are given in Figure 2.7

1. Research topic identification

Students may choose research topics that are well conceptualized but sometimes not reachable. They start
reading the available materials and investigate their scope within the research problem. For a narrowly
defined problem, it may not be possible to identify any prior research that addressed that precise topic. A
variety of sources can be selected with respect to the researcher’s interests, knowledge of social
conditions, observations, challenges and the specific topic. This may lead the researcher to a primary
research topic. The researcher can then compile a comprehensive review of the literature for later use.

2. Review secondary sources to get an overview

To gather an overview of the research, it is important to identify secondary sources on the topic. Many
literature reviews are themselves available as secondary sources: someone may already have written and
published a good literature review on your research topic, which can be useful for further reading.
Journals, conference proceedings, research communities, magazines, etc. are such resources for a researcher.

Fig. 2.7 Process of literature review

3. Develop a search strategy


Using appropriate primary resources such as journals, conference proceedings, community networks,
etc., a researcher can develop a search strategy. Four important search strategies are: identifying
preliminary sources, identifying primary research journals, accessing personal networks and involving
community members. Databases are preliminary sources that contain compiled indexes of bibliographic
information, abstracts and sometimes full-text articles. These resources cover a wide range of topics and
are accessible in print form, on electronic media (CD-ROM/DVD) or on the Internet. To identify primary
research articles, first examine the reference list at the end of relevant journal articles or books. Then go
directly to the journals that are related to your topic of interest. For example, “Nucleic Acids Research” is
a journal for researchers in bioinformatics or molecular chemistry, while “Nature” addresses a wide range
of research readers.

4. Perform the search process in the selected articles to review

A combination of self-knowledge with community knowledge and skills results in effective research and
evaluation methods. For example, research guides can organize internal peer-group meetings among those
with similar topics of interest. In these meetings, researchers can communicate with each other and
discuss their topics, and a researcher may get feedback from his or her peer group before communicating a
work to a conference or journal.

5. Collect the full-text references

Many journals, articles, conference proceedings and books are available as online texts. Materials that are
not available online can be accessed through the library; librarians can provide articles with the help of
complete bibliographic information. Most libraries provide digital facilities to their members.

6. Prepare bibliographic notes from the articles and notes on each article

Read the collected documents and check whether they are required for your research. Relevant materials
related to your topics need to be recorded as bibliographic information. These materials can be sorted and
kept for further reference. Several options are available for recording the collected materials. One of the
citation options is as follows:

Journal:

Author’s Last Name, Initials. (year). “Title of journal article”, Title of Journal, volume number(issue
number), page numbers.

For example:

Chandra V. (2010). “MTar: a computational microRNA target prediction architecture for human
transcriptome”, BMC Bioinformatics, 11(S2), 1–9.

Book:

Author’s Last Name, Initials. (year). “Title of book”, edition, Publisher, Place of publication.

For example:

Chandra, V; Hareendran, A. (2014). “Artificial Intelligence and Machine Learning”, 1st edn, Prentice Hall,
India.
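The citation patterns shown above are mechanical enough to automate. The following is a minimal Python sketch (not from the text; the function name and field names are our own) that assembles a journal reference in the stated pattern:

```python
def format_journal_citation(last_name, initials, year, article_title,
                            journal, volume, issue, pages):
    """Format a reference in the pattern:
    Author's Last Name, Initials. (year). "Title", Journal, volume(issue), pages.
    """
    return (f'{last_name} {initials}. ({year}). "{article_title}", '
            f'{journal}, {volume}({issue}), {pages}.')

# Reproduces the journal example given in the text
ref = format_journal_citation(
    "Chandra", "V", 2010,
    "MTar: a computational microRNA target prediction architecture for human transcriptome",
    "BMC Bioinformatics", 11, "S2", "1-9")
print(ref)
```

Keeping such a helper alongside your bibliographic notes ensures every recorded reference follows one consistent style.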

7. Evaluate research reports and make a note

Evaluating research reports, while making sufficient notes and back references, is one of the special tasks
of a researcher. These notes are useful for further reading.
8. Identify and record your findings

Recording and storing the material is the bookkeeping of research work. Develop a flexible framework
that synthesizes material stage by stage. It is flexible because you can add, delete and modify your
organized materials, which is done through a continuous review process. For example, in the second
edition of a book, the author may use a different organizational style that was not even mentioned in the
first edition of the same book; after publishing the first edition, the author records these findings.

9. Use the literature review

The researcher must formulate a conceptual framework and plan research questions, hypotheses or both.
The conceptual framework influences the high-level planning and conduct of the research review. The
researcher keeps an open mind for self-argument and question making, and can form hypotheses for the
study and test whether they are correct.

A literature review is structured around an overview of the subject, issue or theory and the objectives of the
study. The body of the review is organized into categories or subsections. A literature review focuses on
explaining the similarities and differences with others’ work. The written article needs conclusions that
contribute to your understanding and to the development of the area of research. The reviewed literature is
useful for later reference.

2.4.1 Primary and Secondary Sources of Literature

Primary sources are the first-hand evidence left behind by participants or observers at the time of events. They
are from the time period involved and have not been filtered through interpretation or evaluation. They are the
original materials on which other research is built. They are usually the first formal appearance of results in
physical, print or electronic format, and they present original thinking, report a discovery or share new
information. Artefacts (e.g., coins, plant specimens, fossils, furniture, tools, clothing, all from the time under
study), audio recordings (e.g., radio programs), diaries, Internet communications such as email, interviews
(e.g., oral histories, telephone, e-mail), journal articles published in peer-reviewed publications, newspaper
articles written at the time, original documents, etc. come under primary sources of literature.

Secondary sources are less easily defined than primary sources. Generally, they are accounts written after the
fact with the benefit of hindsight. They are interpretations and evaluations of primary sources. Secondary
sources are not evidence, but rather commentary on and discussion of evidence. Bibliographies (also considered
tertiary), Biographical works, Commentaries, Criticisms, Dictionaries, Encyclopaedias, Histories, etc. are the
examples of secondary literature. Figure 2.8 shows primary and secondary literatures.
Fig. 2.8 Primary and secondary literatures

2.4.2 Identifying Literature Review Gaps

A gap in the literature is a research question relevant to a given domain that has not been answered at all or
adequately in existing peer-reviewed scholarship. A gap in the literature may emerge if,

The question has not been addressed in a given domain, although it may have been answered in a similar
or related area.
The question has never been asked before, but it now merits exploration due to changes in accepted
theory, data collection technology or culture.
The question has been asked and tested in peer-reviewed research, but the methods were either of
questionable validity or limited the applicability of the results. Alternatively, a replication study
could be run to verify a published study’s results if appropriate.

2.4.3 Literature Review Pitfalls

Is the Internet a good literature resource? The Internet certainly provides a huge amount of data and
information, and its service to a researcher is invaluable. But we cannot fully trust the Internet, because the
ownership of the data and information shared on it is often not verifiable. The Internet also offers pseudo-
science and poor research, and documentation found there may be incorrect and may lead a researcher down
the wrong path. Therefore, identifying correct materials is one of the challenges for a researcher.

Another pitfall in the literature review is trade magazines and journals. Some articles are published or
recommended for publication without much review or evaluation. These articles may not give any results, or
they may give false results to those who follow them, and they may mislead a researcher who selects such an
article as a base material.

In certain cases, students may not have access to certain information and may spend unnecessary time and
resources searching for the review. If the supervisor does not provide the right direction and feedback, the
student may enter a confusing stage or lose valuable time.

Good and poor literature reviews result from the researcher’s views and hard work. Table 2.1 shows a
comparison of good and poor literature review outputs.

Table 2.1 Comparison of good and poor literature review

EXERCISES

1. Explain the scientific importance of primary and secondary sources of literature.


2. Can we trust the Internet as a primary source of literature?
3. What is a literature review? How does it differ from a literature survey?
4. How do you conclude that a literature review is poor?
5. Explain the steps involved in formulating a research problem.
6. What are the major criteria for selecting a research problem?
7. Explain the feasibility of a research problem.
8. Why is it so important to have a clear and specific research problem definition?
9. Explain critical thinking.
10. What are the major objectives of a literature review?
11. What are the basic purposes of a literature survey?
12. Explain in detail the various characteristics of a good research question.
13. Explain the literature review process.
14. Define and differentiate good and poor literature surveys.
15. What do you mean by the statement, “a good research question is a single question”?
Chapter 3
DATA COLLECTION
Objectives:

After completing this chapter, you will understand the following:

The definition of primary and secondary data


The sources of primary and secondary data
Various methods of data collection
The definition of data processing
Different classifications of data

The Oxford dictionary defines data as “facts, figures and statistics collected together for analysis”. For most
real-world projects, these facts and figures are unorganized and unprocessed. Consider the sentence, “the price
of crude oil is $180 per barrel”. Can any specific information be gained from this sentence? Is it the current
price of crude oil in India? Such questions cannot be answered from these data fragments. Data can be a
number, a word, an object, a picture, a graph or even a recorded sound; it can also be characters, integers, real
numbers, strings, etc. Data by itself has no meaning; the fragments only convey information once they are
placed in an organized structure. Such organized structures are called databases.

3.1 PRIMARY AND SECONDARY DATA

From a researcher’s point of view, data can be categorized into primary and secondary datasets, so researchers
need to consider two choices of data sources: primary and secondary. Suppose a researcher prepares a
questionnaire on his topic of research interest. The data thus obtained are used directly in the analysis and are
termed a primary dataset. If the researcher obtains the same data from official documents, they are called a
secondary dataset.

3.1.1 Primary Data

Primary data are generated directly by the researcher’s own data gathering techniques. By contrast, an
economics researcher preparing an analysis report from the annual reports of companies or the official
statistics of a statistical office is working with secondary data. Researchers use different methods in collecting
data sources and in their usage. Figure 3.1 shows the difference between primary and secondary data sources.

Fig. 3.1 Primary and secondary data


There exists a high-level link between primary and secondary data sources. Researchers can interchange and
apply these links based on how the data need to be used. If the researcher seeks data directly, he/she makes use
of primary data sources. The primary data collection methods are as follows:

Questionnaire
Interviews
Observations
Surveys
Action research
Longitudinal studies
Life histories

Advantages

Primary data are original and apt to the research topic, so the degree of accuracy is high.
A variety of methods can be used to collect primary data, and coverage of a large population and
geographical area is possible. For instance, a web-based questionnaire gives wide geographical coverage
and reaches a broader population; consider a common question like, “Is Aadhar required for subsidies on
domestic LPG cylinders?”
Primary data provide a realistic view because the data are current in the selected topic.
Primary data offer high reliability and robustness.

Disadvantages

Interviewing is a primary data collection technique that becomes limiting and expensive when wider
coverage or a massive number of responses is needed.
More time is required for data collection, and even analysis and report generation from a primary source
are time consuming.
Many design problems are involved in primary data collection techniques. For example, to conduct a
survey, careful design is required because questions must be simple to understand and easy to answer.
Timely responses may be inadequate in primary data collection. Respondents may give fake or flattering
answers and try to cover up the facts or realities.
The time, effort and cost involved in the data collection method are high, and much manpower is required.
Logical consistency may be lost in some primary data collection techniques, which can have a negative
impact.
Trained persons are required in the data collection steps; such personnel are scarce and costly.

3.1.2 Secondary Data

Secondary data are generated by alternative methods and are collected by the researcher for supporting
purposes. For example, a researcher collects the 2011 census data of India to study the impact of education on
career selection and earnings. Here, the census data are the secondary dataset for his/her purpose; thus, he/she
obtains the needed data from secondary sources.

A variety of secondary data is available from different sources in written, typed or electronic forms.
Researchers can gather these data from industries, organizations, databases, etc., and use them to gain initial
insight into their research problems. The collected data may come in different forms and from different sources
as raw data, which need to be organized for the research purpose.

Secondary data can be either internal or external. Internal data are acquired within an organization. For
example, if a researcher conducts his/her work in a demography department and the department holds the
census data needed for analysis, this is internal data. External data are those obtained from outside the
department, such as the national population register, records from the Government of India and so on.

Secondary data are collected from internal and external sources (Fig. 3.2) such as accounts, periodicals,
government records, internet sources and so on. Secondary documents can be generated from supporting
documents, historical documents, etc. (Fig. 3.3).

Fig. 3.2 Types of secondary data sources

In essence, primary and secondary sources have no tight separation, but a researcher always keeps a narrow
gap between the two while engaged in deep research. For example, data collected from the Internet are
sometimes considered primary and sometimes secondary. So, shall we consider the Internet a primary data
source or a secondary data source?

Advantages

The data population is already available to the investigator, so data collection time can be saved.
It is less expensive, and access to the data sources is faster.

Fig. 3.3 Document cycle

Responsibility for the quality of data does not fall on the investigator.
Secondary data give a frame to the researcher’s mind and a direction for specific research.
Secondary data are collective and voluminous.
Disadvantages

Investigators may be confused about what is to be collected due to the bulkiness of the data.


Quality of the data may be doubtful.
Additional data related to the collected data may not be available.
Due to variable environmental factors, the data collection locations may not be suitable.
Special care is required when modifying, analysing or changing secondary data, because it may be
copyright protected.

Case Studies

1. A researcher needs to conduct a social survey on the “requirement of bank accounts among Below Poverty
Line (BPL) people”. Initially, how can he/she conclude that “these are the people who are under the BPL
category”? First, the researcher collects the relevant background data. Next, he/she collects the list of
people who are under the BPL category (address, location, house number or any other useful links). This
source may be the population register, the voters register or the panchayat register (any government record
source). This dataset is the secondary data for the survey. He/She can then prepare a set of questions or
conduct personal meetings with the selected people (or families) to collect the data regarding the
requirements. The resultant dataset is primary data and is used for the analysis.
2. Suppose an environmental researcher wants to study “the corals in Lakshadweep”. Initially, he/she collects
data on coral types, geographical locations, availability, etc., from the Internet or from government
agencies; these are secondary data sources. In the next step, he/she may visit selected locations in
Lakshadweep to collect the required samples. Features of these samples are primary data obtained from
primary data sources.

3.2 PRIMARY AND SECONDARY DATA SOURCES

A primary source is an original object or document: raw information, material or the first-hand account of an
event. Primary source material may be created by participants or researchers at the time of their study. Some
primary sources are interviews, current newspapers, manuscripts, government documents, etc.

Secondary sources are any published or unpublished works that are a step removed from the original source.
Usually they are obtained by summarization, analysis or evaluation of primary source materials, or are
otherwise derived from them; a secondary source may also be a criticism or interpretation of a primary source.
Some examples of secondary sources are textbooks, review articles, biographies, music clips, articles about
events, etc.

A secondary source may also serve as a primary source, depending on how it is used. For example, a researcher
collects some scientific data from past researchers and comes to know that it is useful for his/her study; he/she
then treats this dataset as a primary data source. The distinction between primary and secondary sources
depends on how a researcher is using the source and on the nature of the research. The various sources of data
are shown in Fig. 3.4, and the various sources of secondary data are given in Fig. 3.5.

Case Studies

1. Suppose a researcher wants to work in speech and signal processing and would like to build a “mimic
system” from a set of voice data. Initially, he/she makes the design model, then trains it and finally tests it
with some input voice. The “text-to-speech” conversion is carried out using a trained man’s voice. For this
study, he/she can collect voice clips from the Internet or from recorded radio sources. The source may be a
secondary source, but for this study it is considered a primary source of data.
Fig. 3.4 Sources of data

Fig. 3.5 Sources of secondary data

2. A researcher wishes to work on a subject of public interest, say the “effects of war”. This work will
basically be a public survey and, as we know, articles in newspapers and magazines are considered
secondary sources. If the researcher is conducting a study on the “Afghanistan war”, an eyewitness account
is a primary source for the work: he/she can collect data from the witness, while the newspaper reports
about the war become supporting evidence or additional material. Through interviews or similar
investigation methods, the relevant data can be collected, sorted and stored for later use. These latter
sources may be secondary sources.

3.3 DATA COLLECTION METHODS


Data collection is an important aspect of any type of research study. Inaccurate data collection can affect the
results of a study and ultimately lead to invalid conclusions. An empirical method of data collection is used by
researchers engaged in quantitative work. The various methods used for data collection are discussed in the
following sections.

3.3.1 Questionnaire

A questionnaire is a set of questions that has been prepared, often for a large number of people. Most of the
time the questions are in printed or electronic form, to be answered by individuals. The questions should be
very clear and easy to understand. After completing the questionnaire, a report needs to be prepared. The
following are important points to remember while designing a good questionnaire:

1. Group items into logically coherent sections,


2. Begin with non-threatening and interesting questions,
3. Do not put important items at the very end of the questionnaire,
4. Do not crowd a page with too many items (if paper is the medium of communication),
5. Do not crowd too many questions or choices in an electronic page,
6. Avoid abbreviations and biased items or terms,
7. Number the questions to avoid confusion,
8. Provide anonymity to respondents,
9. Test all questions in the questionnaire to avoid confusing ones.

For example, a typical questionnaire item looks like this:

1. How often do you back up your computer files (on your hard disk)?
1. Frequently
2. Some times
3. Hardly at all
4. Never

Advantages

1. Questionnaires are very useful when used for a specific purpose rather than for general information
gathering.
2. This data collection method typically includes closed-ended questions.
3. A questionnaire must have clarity and a logical sequence so that doubts are cleared.
4. An objective format is preferred, because it helps even less educated respondents to answer.
5. Data can be gathered from a large group, so data collection scales well with the number of respondents
at relatively low cost.
6. A variety of communication channels may be used, such as telephone, e-mail, post, web and so on.
7. It is easy to reach people in a large geographical area, even if they are spread over remote locations.
8. Face-to-face questionnaires are possible and are appropriate when consulting disabled people.

Disadvantages

1. The response rate may be low because some people refuse to respond.
2. Postal questionnaires are very slow.
3. Technical surveys require high-level design skills in order to simplify the questionnaire.
4. Trained professionals are required for face-to-face questionnaires.
5. Face-to-face questionnaires are time consuming and costly due to intensive labour.
6. Evaluation of the questionnaire is difficult if the answers are in written form.
7. Common mistakes can lead to information fabrication.

Case Study

An economic survey is conducted on the average income of families over a population of 10,000 people living in
a housing board colony. The questions are designed in the closed-end format.

1. Gender:

M: Male F: Female

2. Age group:
1. 20–30 years
2. 30–40 years
3. 40–50 years
4. Above 50 years
3. Yearly household income:
1. Below 2 lakhs
2. 2–5 lakhs
3. 5–10 lakhs
4. Above 10 lakhs
4. Your occupation:
1. Professional
2. Government
3. Business
4. Self-employed
5. Average annual income tax paid:
1. Below 1 lakh
2. 1–3 lakhs
3. 3–5 lakhs
4. Above 5 lakhs
6. Number of members in your family:
1. 2
2. 2–4
3. 4–6
4. Above 6
7. Average monthly saving expenses for a family in saving schemes:
1. Below Rs. 50,000
2. Rs. 50,000 – 1 lakh
3. Rs. 1 – 3 lakhs
4. Above 3 lakhs
8. Do you have child insurance plan:
1. Yes
2. No
9. Average travel expenses for a month:
1. Below Rs. 1000
2. Rs. 1000 to 3000
3. Rs. 3000 to 5000
4. Above Rs. 5000
10. Average educational expenses in a month:
1. Below Rs. 1000
2. Rs. 1000 to 5000
3. Rs. 5000 to 10,000
4. Above Rs. 10,000
The questions may or may not be answered by every respondent: some respondents may answer completely
while others may not. Analysis becomes very difficult if most of the questions are left unanswered.
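Because closed-ended answers map to a fixed set of options, tallying them is straightforward even when some questions are skipped. A minimal Python sketch (the respondent records and field names below are hypothetical, not from the survey above) that counts the chosen options for one question and reports the non-response rate:

```python
from collections import Counter

def tally(responses, question):
    """Count each chosen option for one question; None marks a skipped answer.
    Returns (option counts, fraction of respondents who skipped)."""
    answers = [r.get(question) for r in responses]
    counts = Counter(a for a in answers if a is not None)
    skipped = sum(1 for a in answers if a is None)
    return counts, skipped / len(answers)

# Hypothetical respondent records keyed by question name
responses = [
    {"gender": "F", "income": "2-5 lakhs"},
    {"gender": "M", "income": None},        # question left unanswered
    {"gender": "M", "income": "2-5 lakhs"},
]
counts, skip_rate = tally(responses, "income")
print(counts, skip_rate)
```

Tracking the skip rate per question tells the analyst early which items respondents found unclear or intrusive.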

3.3.2 Questionnaire Design

A questionnaire is a tool for collecting and recording information about an issue of interest. A list of questions
is included in a simple, understandable format for the respondents. The information gathered is essential, so
careful consideration must be given to designing the questionnaire. The stages involved in questionnaire
design are given in Fig. 3.6.

Step 1. Initial Considerations

First, a researcher must decide and be clear about the type and nature of the information that needs to be
collected and what exactly is targeted. For example, to collect economic survey data on below poverty line
families, we must decide which geographical locations to target, how many districts to include, and so on.

Fig. 3.6 Steps involved in the design of questionnaire

Step 2. Question Content

This is an important step, as the questionnaire design is highly dependent on the kind of questions to be
included. The prepared questions must be clear and easy to understand, and should not cause any confusion to
the readers. Question preparation is a creative process that needs a certain skill set. The questions thus
prepared should always adhere to standard formats.

Step 3. Question Sequence and Layout

The questions should be numbered and ordered with a logical flow, so that the respondent feels at ease while
answering them. Funnelling is a technique that begins with general questions before moving to specific ones.
Some questions may need routing, for example: if yes, go to question 5; else go to question 10. Care should be
taken while creating routing, as too much routing creates a complex structure.
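A routing rule of the kind described above is essentially a jump table from an answer to the next question number. A minimal sketch, with hypothetical question numbers and routing rules of our own invention, showing how a routed questionnaire can be walked through:

```python
# Each entry: question text plus a routing rule mapping an answer to the
# next question number; by default flow falls through to the next number.
questions = {
    1: ("Do you own a computer?", {"yes": 2, "no": 5}),
    2: ("How often do you back up your files?", {}),
    5: ("Do you plan to buy one?", {}),
}

def next_question(current, answer):
    """Apply the routing rule; otherwise fall through to the next existing number."""
    _, routes = questions[current]
    if answer in routes:
        return routes[answer]
    later = [n for n in sorted(questions) if n > current]
    return later[0] if later else None  # None means end of questionnaire

print(next_question(1, "no"))    # routed past question 2, prints 5
print(next_question(2, "daily")) # falls through, prints 5
```

Sketching the routing as data like this makes it easy to spot overly complex skip patterns before the questionnaire is printed or published.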
Equalities: Questions on gender, age, ethnic group, etc. may concern only a part of the population. It is bad
practice to ask these questions and simply store the data rather than using it for a specific purpose; while
writing the questions, always make sure such equality considerations are met. Equality statements are
sometimes translated into minority ethnic languages for people who are not native English speakers; this
explains the focus of the survey and allows them to request a translated copy of the questionnaire.

Confidentiality: Sometimes respondents need an assurance of the confidentiality of the information they
provide on the questionnaire. Personal details and identities must not be disclosed to others except for research
purposes and data analysis. For example, in election surveys the responses are very important, but revealing
people’s identities to others may cause serious harm.

Step 4. Piloting the Questionnaire

A pre-test of the questionnaire is required on a small population before publishing it. This pilot should check
people’s understanding of, and ability to answer, the questions. Various other concerns, such as response time
and sources of respondent confusion such as routing errors, question clarity, equalities, etc., can also be studied
by conducting the pilot study.

Design of research questionnaire

For social research, the questionnaire is an important data collection tool. Basically, it serves four purposes:

1. Collecting appropriate data


2. Preparing the data in a comparable form amenable to analysis
3. Formulating minimally biased questions
4. Preparing questions in varied forms

The following are the important points to be noted while designing a questionnaire for research purpose:

Keep the objective of the research and the structure of the questions in mind.
Provide clear directions on how to answer, and compile the questionnaire with examples (if necessary).
The format of the questionnaire should be neat and clear and contain instructions on how to complete the
questions.
Use easy and simple English throughout the questionnaire.
An element of motivation can be given to increase the response rate.
Pre-preparation is essential to ensure that the questionnaire can be completed within the time limit.
Avoid confusing words and highly technical words in the questionnaire.
As far as possible, the questionnaire should be structured and standardized.
Questions should have a logical order and sequential flow.
Questions should be numbered in order.
Biased contents, themes and concepts should be avoided.
The questionnaire should not embarrass or hurt the respondent.
The questionnaire must be focussed on the objectives and should not lead the respondent to a specific
answer.
Questions should be refined for accuracy, as far as possible, from respondent feedback.
Anticipating respondents’ questions in the instruction phase will be helpful to the respondents.
Enough time needs to be allotted to each question and to the entire questionnaire.
A pre-test of the same or a similar questionnaire should be conducted to identify problems and threats and
to improve the quality of the questionnaire.
A logical structure and page design are essential in the questionnaire, as they give a good “look and
feel”.
If the aforementioned guidelines are followed while preparing a research questionnaire, the accuracy and clarity of the
data collection will increase. A research problem that requires data collection by questionnaire should first decide on
the type of information to be sought; the questionnaire should be problem specific. After the
construction of the questionnaire, it must be scaled and set in a logical framework, followed by a pre-test. Based on
the resulting recommendations, questions can be corrected, adapted or deleted to increase the quality of
the final questionnaire.

3.3.3 Types of Questionnaires

Based on the format of the questions, there are two types of questionnaire: open format and closed
format.

Open format questions

If we want free-flowing opinions from the audience, expressed in their own words, we can design open format
questions.

For example,

1. Provide your opinion regarding the quality of XYZ Company’s products and services.

Ans: …

The open format questions give true insight and unexpected suggestions for improvement.

Closed format questions

Multiple-choice and true-or-false questions restrict the respondent to choosing one answer. This format is
called the closed format. Analysis and calculation of statistics become easy if closed format
questions are used in the questionnaire.

For example,

1. Rate the quality of XYZ product and service.


1. Poor
2. Satisfactory
3. Good
4. Excellent

There are seven ways in which one can create closed format questions. The responses are amenable to accurate
statistical analysis. The types differ in the choice of answer options
offered.

1. Leading questions: Questions phrased so as to steer the respondent towards a particular answer; the
answer options are not balanced.

For example,

1. How would you rate our service?


1. Fair
2. Good
3. Excellent
4. Super
2. Importance questions: The respondent rates the importance of an issue on a scale of 1 to 5, choosing freely among the options.
For example,

1. Cost-benefits of our service are …


1. Extremely important
2. Very important
3. Sometimes important
4. Not very important
5. Not at all important
3. Likert questions: These questions measure how strongly the respondent agrees or disagrees with a particular
statement about a product/service.

For example,

1. Do you agree that XYZ company's products have to improve in quality?


1. Strongly agree
2. Agree
3. Neither agree nor disagree
4. Disagree
5. Strongly disagree
4. Dichotomous questions: These are "true or false" or "yes or no" type questions. The scope for analysis is very
limited, which is the major drawback of these types of questions.

For example,

1. Do you like the products of XYZ company?


1. Yes
2. No
5. Bipolar questions: The question offers a scale whose two extreme answers are written at opposite ends, and the
respondent marks a position between them.

For example,

1. How would you describe the service of XYZ company?


Fast – X – – Slow
Reliable – X – – Unreliable
Efficient – – – X Inefficient
Excellent X – – – Poor
6. Rating scale questions: Most companies in the service sector use these types of questions. The
respondents are asked to rate a particular issue on a scale ranging from poor to good.

For example,

1. How do you rate the service of XYZ company?


1. Good
2. Fair
3. Poor
4. Very poor
7. Buying propensity questions: These questions identify the future intentions of a customer and determine the
respondent's intention to buy. A particular product review or requirement can be addressed in these types
of questions.

For example,

1. If TV channels are provided in your cell phones, would you prefer to buy it?
1. Definitely
2. Probably
3. Probably not
4. Not sure
5. Definitely not
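Because closed format responses come from a fixed set of options, they summarize directly into counts and percentages. A minimal sketch using the Likert item above; the response data themselves are invented for illustration.

```python
from collections import Counter

# The five options of the Likert example above.
SCALE = ["Strongly agree", "Agree", "Neither agree nor disagree",
         "Disagree", "Strongly disagree"]

# Hypothetical responses collected from seven respondents.
responses = ["Agree", "Agree", "Strongly agree", "Disagree",
             "Agree", "Neither agree nor disagree", "Strongly agree"]

# Tally each option; closed format answers reduce to a frequency table.
counts = Counter(responses)
for option in SCALE:
    n = counts[option]
    print(f"{option:<27} {n:>2} ({100 * n / len(responses):.0f}%)")
```

The same tally works unchanged for dichotomous, importance or rating scale items, which is why closed format questions lend themselves to statistical analysis.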

Another classification of questionnaire based on organization or involvement of questions is discussed here.

Structured questionnaire

If the questions are listed in a prearranged order, the questionnaire is called structured. The questions follow a
fixed sequence and maintain a flow.

Non-structured questionnaire

The questions in the questionnaire do not follow a fixed order or flow, although they retain some structure. Researchers
are free to ask them in any sequence as per their need/wish.

Disguised questionnaire

The respondents involved in the questionnaire are not told the purpose of the data collection or the need for the
information being gathered. Such a questionnaire is called a disguised questionnaire.

Non-disguised questionnaire

The respondents are made aware of the purpose of the questionnaire and the need for conducting the survey.

Case Study

A botany researcher wishes to collect data regarding "medicinal plants used by tribes". Initially, he/she locates
the places to visit; then identifies the tribal groups who can contribute to his/her research
work, the plants to be included in the study, and so on. He/She can prepare a questionnaire whose questions may be
open format or closed format, based on the requirements. The questions need not maintain
any flow because he/she wishes to collect data from different tribal people/groups, most of whom are unaware of
some of the medicinal plants and their usage. The targeted group may contain different types of respondents, and they
may not be familiar with standard terms. So the questions may be disguised as well.

3.3.4 Interviews

The interview is a powerful data collection method based on personal meetings and a prepared set of questions. Usually,
interviews are qualitative research aids for data gathering. Two people are involved in the encounter: the
interviewer and the interviewee. The interviewer collects the data efficiently by cross examination and must be
sharp enough to obtain accurate data.

Advantages

Information can be gathered from literate as well as illiterate people.


The non-responding population will be very small.
Reliable data sources are obtained in a structured interview.
Provides an excellent way to probe and explore questions.
Disadvantages

Require skilled staff and ample time for conducting the interviews.
Sometimes special equipment is required to record and transcribe interviews.
Chances of bias and threats are higher.
Cost and manpower requirements are high.
Responses to personal questions can be difficult to judge objectively.

Interview process

For successful data collection, interviews should follow a correct format. The basic steps to be considered while
performing an interview are mentioned below.

Step 1. Identify what information is required. What exactly are you looking for?

Step 2. Determine the audience for the interview. Interviews can be face to face or telephonic or chat based.
Prepare an invitation that may contain purpose of the interview, participants’ role and expected outcome.

Step 3. Prepare the interview schedule, contents, format and its layout.

Step 4. A pilot interview schedule needs to be conducted in order to rectify errors.

Step 5. Conduct interview.

Step 6. Transcribe interview.

Step 7. Analyze the outcomes.

3.3.5 Types of Interview

Basically, two types of interviews are used in research data gathering: structured and unstructured. In structured
interviews, researchers prepare a standard set of questions. Interviews may contain open format and closed format
questions. Open format questions give a greater sense of involvement but take a long time to answer and are difficult to
summarize. Such questions invite broad answers, for example, "what do you think about …", "how can it be
improved" and so on.

Closed format questions offer a set of specified responses, such as true or false, multiple choices, rating a response,
etc. Useful information may be overlooked, but more topics can be covered in less time.
Figure 3.7 shows basic types of interview questions with examples. While preparing questions for the interview,
either methodology can be adopted, viz. top-down or bottom-up question preparation (Fig. 3.8). The two
methodologies have their own advantages.
Fig. 3.7 Types of questions

Fig. 3.8 Methods – top-down and bottom-up

Case Study

Consider an interview with a doctor conducted while building expert system software. The questions may be closed and
open format. Finally, the expert system itself may ask questions and draw conclusions.

Patient: I have a headache.

System: I suggest you take a pain killer. Do you have a sore throat?

Patient: Yes, I do.

System: Have you been coughing?

Patient: No, why do you ask?

System: If you had a headache and a sore throat and had been coughing, I would have to conclude that you
might have the flu. It does not sound as though you have the flu. Have you been feeling tired recently?

Patient: Yes, I have.


System: It sounds as though you may have glandular fever or tonsillitis.

3.3.6 Observation

Observation is a complex research method that demands a greater role and more attention from the researcher at the time of
data collection. The observer must watch carefully, because his/her skill set and experience drive
the data collection: there is no close contact with the respondents, so the data collection depends on the
observer's talent. The observation method is useful when the subject cannot directly provide
accurate information. For specific topics of study, observation is suitable and fulfils the requirements of data
collection properly. Figure 3.9 shows the various types of observations.

Fig. 3.9 Types of observation

A major drawback of the observation technique is the need for skilled staff, as there is no chance to ask questions
during observation. When large samples are the subject of study, this method is inappropriate.

Case Study

Consider a researcher who is doing his/her research work on power plants (generating electricity). Being an
electrical engineer, he/she knows the technical terms related to power plants and turbines well.
He/She needs to observe actual power plant operations because most of the collected data have been studied
theoretically (or from textbooks). To understand the components and clarify the data, he/she needs careful
observation in the power plant. The observation helps him/her place the data in his/her research slots.

3.3.7 Record Reviews

Reviewing records/reports from secondary sources is another method of data collection. Through this
method, a researcher can get a clear picture of the exact area of the information source. For example,
historical research may need to search old records or reports. Another example is medical
research conducting a survey on "heart attacks in middle-aged people": record reviews from hospitals are
essential for such a study to obtain data. Here, the patients' case histories serve as the data source.

3.3.8 Schedules

This method is a time-lined process carried out by appointed and trained personnel. Consider research
focussed on collecting information from a large uneducated and non-responsive group. A questionnaire may not
be good enough for data collection. Here, questionnaires are prepared and sent to enumerators who are
appointed by the investigators. The enumerators can explain the objective, scope and purpose of the survey and seek the
respondents' co-operation. To complete the data collection, the questionnaire is filled in by the enumerators. Such methods are
called schedules and are used in extensive studies. The accuracy of the collected information depends heavily on the
honesty of the enumerators, so they must be trained. This method is more time consuming and costly. Some
examples of schedules are as follows:
Village or community schedule:
For example, census researchers collect general information on population etc.

Household schedule:
For example, demographic details of households, education, relations, etc.

Attitude schedule:
For example, views of population, a particular event, etc.

3.4 DATA PROCESSING

Data processing consists of operations performed by automatic means, such as the collection, recording, streamlined
storage and analysis of useful information. Data processing is an important activity that involves five steps
(Fig. 3.10).

1. Input step

In this step, data are collected and transformed into a computer-understandable format, because correct
output entirely depends on the input data. The collected data should be verified, and any required
corrections should be made. The next step is coding. Coding converts the collected data into a machine-readable
format for computerized processing. These data are stored on secondary storage in the form of files or
databases.

2. Processing step

Manipulations such as classification, sorting, calculation, comparison and summarization are performed on the
collected data. This processing yields information from the data.
Classification is the process of grouping the data into subgroups that can be handled separately. Sorting is
the arrangement of data in some order for faster access. Arithmetic operations can be performed on
numerical data to obtain required results. Top management typically requires a summary of selected data, for
example, the pass or fail percentage of students in examinations.

Fig. 3.10 Data processing

3. Output step

In the output step, processed data are visualized in different forms. Each view is user dependent, based on
actual requirements. A hard or soft copy of the processed data may be required as an output view,
for example, a graphical representation comparing the degree results of different colleges in a
university.

4. Storage step

An output is generated from the processed data and can be kept in secondary storage
for future use. For example, the first three ranks of BSc Computer Science students can be obtained in the data
processing step and stored in secondary storage for retrieval at any time. This processing includes
consolidation of all marks obtained in past examinations. A comparison of consolidated marks
across all registered candidates can then be made for rank calculation, and a sorting process
applied to display the first three ranks.

5. Communication step

The output needs to be stored on media from which it can be retrieved at any time. Some conversions may also be
required to store the data in different forms, for example, graphical representations of examination results.
Finally, the processed data are communicated to different users. For example, a weather forecast report is
obtained by a series of data processing steps, and these reports are sent to government agencies or newspapers
when required.
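The five steps above can be sketched end to end in a few lines. The records, pass mark and field names below are illustrative assumptions, not taken from the text; the example mirrors the pass-percentage and ranking illustrations used in the steps.

```python
import json

# Input step: collected data, already coded into machine-readable records.
raw = [
    {"name": "Anu",  "marks": 82},
    {"name": "Ben",  "marks": 39},
    {"name": "Cara", "marks": 67},
]

# Processing step: classify (pass/fail), sort, and summarize.
PASS_MARK = 40  # assumed threshold
for rec in raw:
    rec["result"] = "pass" if rec["marks"] >= PASS_MARK else "fail"
ranked = sorted(raw, key=lambda r: r["marks"], reverse=True)
pass_pct = 100 * sum(r["result"] == "pass" for r in raw) / len(raw)

# Output step: a human-readable view of the processed data.
for rec in ranked:
    print(f'{rec["name"]:<5} {rec["marks"]:>3}  {rec["result"]}')
print(f"Pass percentage: {pass_pct:.1f}%")

# Storage step: keep the processed output for future retrieval.
stored = json.dumps(ranked)

# Communication step: the stored form can be sent on to other users.
report = json.loads(stored)
```

Each comment marks one of the five steps; in a real system the "storage" and "communication" steps would write to a database or transmit over a network rather than use an in-memory JSON string.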

3.4.1 Types of Data Processing

Data processing may be manual, electronic, real-time or batch. In
manual processing, many errors may occur, such as data-capturing and operator mistakes. This type of data
processing is expensive due to heavy labour cost. In electronic data processing (EDP), computers play an
important role in data processing. An information system (IS) is an example of an EDP system that evolved from a
data processing unit. For example, an Automatic Teller Machine (ATM) works as an EDP system that gives
reports as output.

In real-time processing, output is produced continuously as input arrives, with only a small time period
between the processes. For example, in online banking the processing time between transactions is very small,
and the balance is updated as soon as possible for secure transactions.

In batch processing, a group of transactions is collected over a period of time: data are collected, entered
and processed in batches, and the results are produced. Separate programs are required for input and output data
processing. When a high volume of data is involved in the research process, this technique is highly suitable.
For example, in a university examination system, online mark entry and the decoding process are handled
separately (in parallel or batch wise) for efficient data processing.
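A minimal sketch of the batch idea, assuming invented (account, amount) transactions: entries accumulate during a collection phase and are then processed together in a single run, in contrast to real-time processing, which would handle each entry as it arrives.

```python
# The batch being collected; nothing is processed until the run starts.
transactions = []

def collect(entry):
    """Collection phase: entries simply accumulate over a period of time."""
    transactions.append(entry)

def process_batch(batch):
    """Processing phase: the whole batch is handled in one run."""
    total = sum(amount for _, amount in batch)
    balances = {}
    for account, amount in batch:
        balances[account] = balances.get(account, 0) + amount
    return total, balances

# Entries arrive over a period of time...
for entry in [("A", 100), ("B", 250), ("A", -40)]:
    collect(entry)

# ...and are then processed together.
total, balances = process_batch(transactions)
```

The separation between `collect` and `process_batch` mirrors the text's point that separate programs handle the input and the processing of a batch.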

3.4.2 Data Processing Stages

Data preparation involves editing, coding, verifying, analyzing and displaying the data. Data need to be
processed before they are used for analysis. Figure 3.11 shows the stages in data processing.
Fig. 3.11 Stages of data processing

1. Editing

Editing is the process of checking and adjusting the obtained data. Editing is required to review the
collected data and to maximize its accuracy and clarity. Early editing gives the advantage of permitting
accurate analysis. The editing process normally ensures consistency and uniformity in the treatment of the data. Editing
tools are available to researchers in various forms, depending on their use. A good editing process should
ensure the following:

Legibility of entries: The collected data should be legible; otherwise the data should not be used.
Completeness of entries: Ambiguous or incomplete data may create many problems in the research, so
completeness must be ensured before the records are used.
Consistency of entries: Inconsistent entries raise questions. If the responses are incorrect,
editing tools are used to correct the data.
Accuracy of entries: Editing needs to look for any indication of inaccurate data obtained from the
interview.

Types of editing

Major types of editing are as follows:

Field editing: The field supervisor can edit the data on the same day as the data collection. Typical errors are
illegible handwriting and poor responses that are logically inconsistent.

In-house editing: An in-house team in the central office can perform the editing of the collected data. For
example, rearrangement of the data after the questionnaire or interview process.

The major purpose of editing is to ensure consistency between responses. "No response" entries may be corrected in
order to reduce errors; such legitimate corrections are made before the coding process.

2. Coding

Coding is the process of identifying and classifying the answers with a numerical representation. A numerical
score or symbol is a code that serves as a rule for interpreting, classifying and recording the data. The coding process
translates answers into numbers for later evaluation. It can be supported by a code book,
code sheets or computer software. A data matrix arranged row and column wise (organized into fields,
records and files) is one translated form of the coding stage. Here, fields are collections of characters that
represent a single type of data; records are collections of related fields for the same respondent; and a file is
a collection of records. For example, students' data are stored in a file named "student". Each student's details
are stored in a record, and each record contains the student's data, such as "name", "date of birth" and so on, which
are the fields.

The coding process may raise some issues. For fixed alternative questions, a fixed coding
scheme is assigned before the data collection. Another issue is the difficulty of maintaining a
code book: different code books may be required if the process is lengthy, and questions such as how many books are
required, and which books, are difficult to address.
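The translation of answers into a numeric data matrix of fields, records and a file can be sketched as follows. The code book values, field names and respondents are hypothetical, chosen to echo the gender and marital status codes used later in this section.

```python
import csv
import io

# Hypothetical code book: textual answers mapped to numerical codes.
GENDER_CODE = {"male": 1, "female": 0}
MARITAL_CODE = {"married": 1, "unmarried": 2, "widower": 3}

# Raw answers; each dictionary is a record and each key is a field.
answers = [
    {"name": "R1", "gender": "female", "marital": "married"},
    {"name": "R2", "gender": "male", "marital": "unmarried"},
]

# Coding step: translate the answers into a numeric data matrix.
coded = [
    {"name": a["name"],
     "gender": GENDER_CODE[a["gender"]],
     "marital": MARITAL_CODE[a["marital"]]}
    for a in answers
]

# The "file": all records written out, one row per respondent.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "gender", "marital"])
writer.writeheader()
writer.writerows(coded)
```

The CSV text in `buf` plays the role of the file, its rows the records, and its columns the fields described above.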

Case Study

A research survey is conducted in three districts of Kerala state to identify the "role of youth contribution in political
parties". Figure 3.12 shows the 300 people of the same age group who were involved in the survey from the three districts:
Kannur, Thrissur and Kollam.

The corresponding graphical representation is given in Fig. 3.13.

In the questionnaire, gender is a field used to obtain the "role of woman youth". Figure 3.14 shows the data
distribution over the three selected districts and Fig. 3.15 shows the marital status of the respondents.

Respondents may not provide some data, such as marital status. Suppose this field is left unanswered by a
respondent; then it must be filled in during the editing process. In this questionnaire, the marital status item may look
like

Fig. 3.12 Data table


Fig. 3.13 Graphical representation

Fig. 3.14 Data table 1

Fig. 3.15 Data table 2

Married … Unmarried … Widower

This must be translated into counts (Fig. 3.14), which is the actual coding process. These tabular forms are good
enough for the data analysis. Similarly, the questionnaire contains many questions for the respondents. Before the
data analysis, editing and cleaning processes are required. Data cleaning is an intermediate step before the data
analysis step, in which errors are removed. For example, in a questionnaire the age of
a person is given as 120: is it 20 or 12? This must be resolved. As another example, 1 = male and 0 = female is coded
in a questionnaire, but a value may be missing because the person forgot to answer the question. Such
mistakes must be cleared.
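The cleaning step just described, flagging the implausible age of 120 and the missing gender code, can be sketched like this; the records and the plausibility limits are assumptions for illustration.

```python
# Invented records; 1 = male, 0 = female as in the code book above.
raw = [
    {"id": 1, "age": 25, "gender": 1},
    {"id": 2, "age": 120, "gender": 0},   # implausible age: 12 or 20?
    {"id": 3, "age": 34, "gender": None}, # question left unanswered
]

def problems(record, min_age=0, max_age=110):
    """Return a list of detected problems; empty means the record is clean."""
    found = []
    if record["age"] is None or not (min_age <= record["age"] <= max_age):
        found.append("age missing or out of range")
    if record["gender"] not in (0, 1):
        found.append("gender code missing or invalid")
    return found

# Flag every record that needs attention before the analysis stage.
flagged = {r["id"]: problems(r) for r in raw if problems(r)}
```

Flagged records are then resolved by going back to the source (which digit was intended?) or treated as missing values, rather than silently guessed.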
3.5 CLASSIFICATION OF DATA

There are different classifications of data with respect to organization, storage and usage. This classification
helps the researcher to physically collect, logically organize and technically arrange the datasets.

3.5.1 Quantitative and Qualitative Data

The qualitative approach includes historical research, which collects narrative data to attain insights.
Qualitative analysis transforms data into findings; though guidelines exist, there is no formula for the transformation.

A quantitative method of data analysis draws meaningful results from a large body of data. For example, a
university examination database is very complex and quantitative in nature because it may contain data, images,
etc. The result, or consolidated mark list, is an output from the quantitative data; this consolidation is a
quantitative analysis.

Quantitative analysis allows results to be summarized in numerical terms with a certain degree of confidence. For example,
a statement such as "65% of households use an unprotected water source for drinking" may carry a 95%
confidence level; it is then possible to conclude, with 95% confidence, that more than 50% of households have no
access to a protected water source.

Quantitative and qualitative data can be summarized in many ways. Quantitative data carry information
measured in numerical values, for example, "age of your parent" or "number of pens you have".

Qualitative data, by contrast, are not meaningfully expressed as numbers,
for example, "colour of your skin" or "softness of your cake". Figures 3.16 and 3.17 show the difference between
quantitative and qualitative data using a frequency distribution from 20 observations.

Fig. 3.16 Data over an interval of 20 population 1


Fig. 3.17 Data over an interval of 20 population 2

3.5.2 Discrete and Continuous Data

Consider the data obtained from flipping a coin. The possible outcomes are Head (H), Tail (T) and, rarely, Edge (E).
Can we predict the outcome of the next toss?

Discrete data take a finite number of possible values. In the example above, flipping a coin and recording the result
has two common outcomes, head (H) or tail (T), so the observation is treated as discrete. Another example is the number of
students in a class.

If there is no clear separation between possible values, but the values occupy a continuous range, the data are
considered continuous.

For example, “Height of a person”, “Length of a leaf”, etc. Distance measured is an example of continuous data.

Km: 0.1…0.5…0.8…1.3…1.5…2.0 etc.

Shoe size is an example of discrete data, because a size may be 8, 9 or 10, but never 8.2.

3.5.3 Univariate and Bivariate Data

Suppose we are working with a group of people and measure their heights. The heights can be measured in centimetres
(cm). A list of heights in cm:

160…152…163…175…181…164

These are univariate data: we observe only one aspect of a person at a time. In the bivariate case, we
consider more than one aspect and can arrange the data in a table with rows and columns (Table 3.1). For
example, the height and weight of each person can be tabulated.

Table 3.1 Height–weight table


The example is a bivariate observation concerning two aspects (height and weight).

Univariate analysis uses common statistical methods for summarizing continuous variables: mean, mode,
median, range of values, etc. Another common statistical measure is the standard deviation (SD).
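These univariate measures can be computed directly on the height list given above, for example with Python's standard statistics module:

```python
import statistics

# The univariate height data (cm) listed above.
heights = [160, 152, 163, 175, 181, 164]

mean = statistics.mean(heights)              # arithmetic mean
median = statistics.median(heights)          # middle value
sd = statistics.stdev(heights)               # sample standard deviation
value_range = max(heights) - min(heights)    # range of values
print(mean, median, round(sd, 2), value_range)
```

With an even number of observations the median is the average of the two middle values, here (163 + 164) / 2 = 163.5.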

EXERCISES

1. What are primary data? What is the difference between primary and secondary data?
2. What are the sources of secondary data? Can a secondary source be considered a primary source?
3. What are schedules, and what are their advantages in research methodology?
4. How do schedules differ from questionnaires?
5. Explain different types of questionnaire methodologies.
6. Explain the similarities between schedule and questionnaire.
7. What are the advantages of questionnaire over other data collection methods?
8. Explain the various sources of records review.
9. How will you select open-ended questions while conducting an interview?
10. What are closed questions? How are closed questions processed?
11. How is observation useful in research design?
12. Classify each set of data as discrete or continuous.

Number of boxes in a moving train

Height of Everest

Time taken for a car battery to die

Number of cars manufactured by BMW in 2013

Production of rice by weight

Number of tails while flipping a coin

13. Distinguish between discrete and continuous data.


14. What are quantitative data? How do they differ from qualitative data?
15. Company employee details are given in Table 3.2.

How many of each category of staff should be included in a simple random sample of a given size?

Table 3.2 Home sales


16. Classify each set of data as discrete or continuous.
1. Number of children in a household
2. Height of children
3. Weight of cars
4. Speed of trains
5. Number of people in a train
6. Weight of aeroplane
17. Prepare a questionnaire limited to 10 questions to conduct the survey "Is Aadhar essential for people?"
18. What are the major drawbacks of open-ended questionnaire?
19. Prepare a questionnaire that measures the performance of a class teacher who is handling the
UGC post graduate classes.
Chapter 4
BASIC STATISTICAL MEASURES
Objectives:

After completing this chapter, you can understand the following:

Different types of scales


Measures of central tendency
The definition of skewness and its measure
Measure of variation
The definition of probability distribution

Science fiction author H. G. Wells stated in 1903, "Statistical thinking will one day be as necessary for efficient
citizenship as the ability to read and write". Statistics is a set of concepts, procedures and rules that help to
organize numerical information in forms such as graphs, tables or charts. It also helps us apply statistical
techniques to the decisions that affect our lives, that is, to make informed decisions. Simple statistical tools
are available for analytical work such as inspection and comparison, and for assessing the quality and precision of the data.

Statistics may be descriptive or inferential. To organize or summarize a particular set of measurements, we
use descriptive statistics; that is, a descriptive statistic describes a set of measurements. For example, a
cricket player's batting average is a descriptive statistic: it describes the player's past
ability to hit the ball at any point in time. Descriptive statistics share the common property that they organize,
summarize and describe a set of measurements. To make inferences about the larger population from which a sample was
drawn, we use inferential statistics. For example, we could generalize from the service
satisfaction measured on a set of service organizations. Other uses of inferential statistics are opinion polls and
television rating systems: a limited number of people are polled during an election, and this
information is used to describe the voters as a whole.

4.1 TYPES OF SCALES

Numbers and sets of numbers formulate statistical information with specified qualities in research. Before analysis,
the data in research are converted into certain types, and these types have caused a great deal of
confusion in education and social research, where they arise in the measurement of behaviour. A scale of measure is a
classification that describes the nature of the information carried by the numbers assigned to variables. The relevant qualities are
magnitude, absolute zero and equal intervals. The scale of measurement determines which statistical procedures are best
selected. Magnitude is the ability to know whether one score is greater than, equal to or less than another score.
Absolute zero refers to a point where none of the quantity being measured exists, so a score of zero can be
assigned. Equal intervals mean that the possible scores are equally spaced from each other. By combining the
three scale qualities, we can determine the four scales of measurement: nominal, ordinal, interval and ratio.
Basically, a nominal scale is a classification into named categories, such as a list prepared in alphabetical order. An
alphabetically sorted list of students in a class and a list of favourite actors are
representations of nominal scales. In ordinal data, the magnitude of the data is considered: the data are placed
in order from greatest to lowest, but without absolute zero and without equal intervals. The Likert scale
and the Thurstone technique are examples of ordinal scale types. The interval scale possesses magnitude
and equal intervals but no absolute zero. Temperature is an example of an interval scale, since temperature
measurement (in Celsius or Fahrenheit) has no absolute zero. The ratio scale is the highest scale of
measurement. It possesses all three qualities and is preferred by statisticians because of its ease of
analysis. For example, age, height and the score on a 100-point test are ratio scales. Table 4.1 gives a comparison of
each scale.
4.1.1 Nominal Scale

From a statistical point of view, the nominal scale is the lowest level of measurement. It simply places the data into
categories without any order or structure. A yes/no item is a nominal scale commonly used in research activities. For example, in a
research survey, answers from the participants can be managed through a yes/no scale in order to ease the
evaluation. In statistics, nominal scales belong to the non-parametric group. The mode and cross tabulation with chi-
square are statistical measures that use the nominal scale.
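As a small illustration of the point above, the mode is just the most frequent category; a mean or median of nominal data would be meaningless. The yes/no responses here are invented.

```python
from collections import Counter

# Invented nominal (yes/no) survey responses; with nominal data only
# category counts and the mode are meaningful statistics.
responses = ["yes", "no", "yes", "yes", "no", "yes"]

counts = Counter(responses)
mode, mode_count = counts.most_common(1)[0]
```

The resulting counts per category are also exactly what a cross tabulation with chi-square would operate on.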

4.1.2 Ordinal Scale

An ordinal scale is more important in terms of power of measurement. For example, consider the ranking of beers based on
quality and demand. When a market study ranks five types of beer from most flavourful to least
flavourful, an ordinal scale of preference is created. There is no objective distance between
any two points on this subjective scale: to one respondent, the top brand of beer may be far superior to the
second preferred beer, while to another respondent with the same top and second choices the distance may be
subjectively small. An ordinal scale conveys gross order, not
the relative positional distances. In statistics, ordinal data call for non-parametric techniques: median and mode,
rank order correlation and non-parametric analysis of variance.

4.1.3 Interval Scale

One of the standard survey ratings is an interval scale. Suppose satisfaction with a car service is rated on a 10-
point scale from dissatisfied to most satisfied, in intervals of 1. It is called an interval scale because the
points between the scale elements are assumed to be equidistant; differences along the scale can therefore be
interpreted as differences in amount, not merely in order, although there is no true zero.
Interval scales can also be defined by metrics such as logarithms: there the distances are not equal, but they are
strictly definable based on the metric used. Usually in statistics, interval scale data would use parametric
statistical techniques such as correlation, regression, analysis of variance and factor analysis.

4.1.4 Ratio Scale

It is the top level of measurement and is not frequently available in social research. A true zero point is the factor
which clearly defines a ratio scale. Measurement of length is the simplest example of the ratio scale. Temperature
measurement is the best way to contrast interval and ratio scales. The Centigrade scale has a zero
point, but it is an arbitrary one; the Fahrenheit scale places the same point at 32°. Because neither zero is
absolute, ratios of such temperatures are meaningless, whereas ratios on a true ratio scale, such as length, are
directly interpretable.

The fundamental difference between the four scales of measurement is given in Table 4.1.

Table 4.1 Comparison of the four scales of measurement


4.2 MEASURES OF CENTRAL TENDENCY

A central tendency or measure of central tendency is a typical value for a probability distribution. It is also
called the centre or location of the distribution. Measures of central tendency are commonly contrasted with
measures of variability or dispersion. For normally distributed data, the common measures of central tendency
coincide. The most common measures of central tendency are the arithmetic mean and the median.

In one-dimensional data, the following techniques may be applied. Before calculating a central tendency, the
circumstances under which the data should first be transformed must be identified.

Mean (arithmetic mean): It is the sum of all measurements divided by the number of observations in the
dataset.

Median: It is the middle value that separates the higher half from the lower half of the dataset. The ordinal
dataset is used for identification of the median and the mode because values are ranked relative to each other but
are not measured absolutely.

Mode: It is the most frequent value in the dataset. In nominal data, this central tendency measure is used
because of purely qualitative category assignments.

Geometric mean: For n items in the dataset, it is the nth root of the product of the data values. This measure is
valid only for data that are measured absolutely on a strictly positive scale.

Harmonic mean: It is the reciprocal of the arithmetic mean of the reciprocals of the data values. This measure is
valid only for data that are measured absolutely on a strictly positive scale.

Weighted mean: It is the arithmetic mean that incorporates weighting to certain data elements.

Truncated mean (or trimmed mean): It is the arithmetic mean of data values after a certain number or
proportion of the highest and lowest data values that have been discarded.

Interquartile mean: It is the truncated mean based on data within the interquartile range.

Midrange: It is the arithmetic mean of the maximum and minimum values of a dataset.

Midhinge: It is the arithmetic mean of two quartiles.

Trimean: It is the weighted arithmetic mean of the median and two quartiles.

Winsorized mean: It is the arithmetic mean in which extreme values are replaced by values closer to the
median.

These techniques can be applied to each dimension of multi-dimensional data, but the results may not be
invariant to rotations of the multi-dimensional space.

Other measures used for multi-dimensional data are as follows:

Geometric median: It minimizes the sum of distances to the data points. Applied to one-dimensional data, the
geometric median coincides with the median, unlike the coordinate-wise approach, which calculates the median
of each dimension independently.

Quadratic mean: It is also known as the root mean square; it is used in engineering but not commonly in
statistics. The major reason is that it is not a good indicator of the centre of the distribution when the
distribution includes negative values.

Parameter: It is a measure concerning a population (e.g., population mean).
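Several of the one-dimensional measures above can be sketched directly in Python. The snippet below is a minimal illustration with a made-up dataset; the variable names are ours, not from the text.

```python
import statistics
from math import prod

data = [4, 36, 45, 50, 75]  # illustrative dataset

mean = sum(data) / len(data)                      # arithmetic mean
median = statistics.median(data)                  # middle value of the ordered data
geometric = prod(data) ** (1 / len(data))         # nth root of the product
harmonic = len(data) / sum(1 / x for x in data)   # reciprocal of mean of reciprocals
midrange = (min(data) + max(data)) / 2            # mean of the extremes

print(mean, median, midrange)  # 42.0 45 39.5
```

For this particular dataset, the geometric mean works out to 30 and the harmonic mean to 15, illustrating that harmonic mean ≤ geometric mean ≤ arithmetic mean.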

4.2.1 Mean, Median and Mode


Simply mean is the average of data, median is the middle value of the ordered data and mode is the value that
occurs most often in the data. The measure of central tendency from a population is illustrated in Fig. 4.1.

Due to cost and time factors, most research situations do not examine every member of a
population but rather a random sample, that is, a representative subset of the population. Parameters are
descriptive measures of a population; for example, a sample mean is a statistic, while a population mean is a
parameter. The sample mean is usually denoted by x̄:

x̄ = (x1 + x2 + … + xN)/N = (1/N) Σ xi

Fig. 4.1 Measures of central tendency

where N is the sample size and xi are the measurements.

For example,

Consider the dataset 1, 1, 2, 3, 13.

Here, mean = 4, median = 2, mode = 1

How did we arrive at these results?

Steps for finding the median for a set of data:

1. Arrange the data in increasing order


2. Find the location of median in the ordered data by (n + 1)/2
3. The value that represents the location found in Step 2 is the median

For example, consider the aptitude test scores of 10 students,

91, 76, 69, 95, 82, 76, 78, 80, 88, 86

Mean = (91 + 76 + 69 + 95 + 82 + 76 + 78 + 80 + 88 + 86)/10 = 82.1

If the entry 91 is mistakenly recorded as 9, the mean would be 73.9, which is very different from 82.1.

On the other hand, let us see the effect of the mistake on the median value:

The original dataset in increasing order is follows:


69, 76, 76, 78, 80, 82, 86, 88, 91, 95

With n = 10, the median position is found by (10 + 1)/2 = 5.5. Thus, the median is the average of the fifth (80)
and sixth (82) ordered value and the median is 81.

The dataset (with 91 coded as 9) in increasing order is as follows:

9, 69, 76, 76, 78, 80, 82, 86, 88, 95

where the median is 79.

Note that the medians of the two sets differ only slightly (81 versus 79). Unlike the mean, the median is barely affected by the extreme value 9.
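The aptitude-score example can be reproduced in a few lines of Python using the standard-library `statistics` module:

```python
import statistics

scores = [91, 76, 69, 95, 82, 76, 78, 80, 88, 86]
miscoded = [9 if s == 91 else s for s in scores]   # 91 mistakenly recorded as 9

# The mean shifts dramatically, the median barely moves.
print(statistics.mean(scores), statistics.median(scores))      # 82.1 81.0
print(statistics.mean(miscoded), statistics.median(miscoded))  # 73.9 79.0
```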

4.2.2 Geometric and Harmonic Mean

Geometric mean is a particular type of average where we multiply the numbers together and then take the square
root (for two numbers), cube root (for three numbers), and so on. For example, what is the geometric mean of 2
and 18?

Multiply them: 2 × 18 = 36; then take the square root: √36 = 6.

The geometric mean indicates the central tendency or typical value of a set of numbers by using the product of
their values. That is, for a set of numbers x1, x2, …, xn, the geometric mean is given as follows:

GM = (x1 × x2 × … × xn)^(1/n)

For example, the geometric mean of the three numbers 4, 1 and 1/32 is the cube root of their product, which is
equal to 1/2.

The harmonic mean is a type of average that represents the central tendency of a set of numbers. The harmonic mean
of x1, x2, x3, …, xn is

HM = n / (1/x1 + 1/x2 + … + 1/xn)

For example, for the two numbers 4 and 9, the harmonic mean is 2/(1/4 + 1/9) = 72/13 ≈ 5.54.
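Both means can be sketched as small Python functions (the function names are illustrative):

```python
def geometric_mean(xs):
    # nth root of the product of the values
    product = 1.0
    for x in xs:
        product *= x
    return product ** (1 / len(xs))

def harmonic_mean(xs):
    # reciprocal of the arithmetic mean of the reciprocals
    return len(xs) / sum(1 / x for x in xs)

print(geometric_mean([2, 18]))        # 6.0
print(geometric_mean([4, 1, 1/32]))   # ≈ 0.5
print(harmonic_mean([4, 9]))          # ≈ 5.54
```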

4.3 SKEWNESS

It is a measure of the degree of asymmetry of a distribution; that is, skewness measures symmetry, or more
precisely, the lack of symmetry. A distribution of a dataset is symmetric if it looks the same to the left and right
of the centre point. Distributions may be symmetric, skewed left or skewed right. The qualitative interpretation
of skew is complicated, and skewness alone does not determine the relationship between the mean and the
median. Skewness in a data series can often be observed by simple inspection of the values, without graphical
figures. Consider the numeric sequence (99, 100, 101), whose values are evenly distributed around a central
value (100). A negatively skewed distribution can be obtained by adding a value far below the mean, giving
(90, 99, 100, 101). A positively skewed distribution is obtained by adding a value far above the mean, giving
(99, 100, 101, 110).

1. Symmetric: The mean, median and mode coincide, the distribution is mound shaped, and no skewness is
apparent; such a distribution is described as symmetric (Fig. 4.2).
Fig. 4.2 Symmetric distribution

2. Skewed left: In this situation, mean is to the left of the median, long tail on the left (Fig. 4.3). In unimodal
distribution, negative skew indicates that the tail on the left side of the probability density function is
longer or fatter than the right side. In negative skew, the left tail is longer. That is, the mass of the
distribution is concentrated on the right. The distribution is said to be left-skewed, left-tailed or skewed to
the left.
3. Skewed right: In this situation, mean is to the right of the median, long tail on the right (Fig. 4.4). The
positive skew indicates that the tail on the right side is longer or fatter than the left side. In the positive
skew, the right tail is longer. That is, mass of the distribution is concentrated on the left. The distribution is
said to be right-skewed, right-tailed or skewed to the right.

Fig. 4.3 Left-skewed distribution

Fig. 4.4 Right-skewed distribution

4.3.1 Measuring Skewness

For univariate data X1, X2, …, XN, the formula for skewness is:

skewness = [Σ (Xi − x̄)³ / N] / S³

where x̄ is the mean, S is the standard deviation, and N is the number of data points. This is referred to as the
Fisher–Pearson coefficient of skewness.
In normal distribution for any symmetric data, skewness is zero or near to zero. The negative values indicate that
data is left-skewed and the positive values indicate that data is right-skewed.

Another formula for skewness, defined by Galton (also known as Bowley’s skewness), is

skewness = (Q3 + Q1 − 2Q2) / (Q3 − Q1)

where Q1 is the lower quartile, Q3 is the upper quartile, and Q2 is the median.

The Pearson 2 skewness coefficient is defined as

Sk2 = 3(x̄ − median) / S
4.3.2 Relationship of Mean and Median

Skewness has no strict connection with the relationship between the mean and the median. Typically, a
negatively skewed distribution has a mean less than the median, and a positively skewed distribution has a mean
greater than the median. If the distribution is symmetric, the mean equals the median and the skewness is zero;
if the distribution is also unimodal, then mean = median = mode. The converse is not true in general: zero
skewness does not imply that the mean is equal to the median. What happens to the mean and median if we add
a constant to, or multiply by a constant, each observation in a dataset? For example, suppose a teacher adjusts an
examination dataset by adding five points to each student’s score. What effect does this have on the mean and
the median? Adding a constant to each value shifts both the mean and the median by that constant. Consider the
10 aptitude scores above, with an original mean of 82.1 and a median of 81. If 5 is added to each score, the
mean of the new dataset is 87.1 and the new median is 86. Multiplication by a constant behaves similarly: the
new mean and median change by the factor of this constant.
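The effect of shifting and scaling can be checked numerically with the same 10 aptitude scores:

```python
import statistics

scores = [91, 76, 69, 95, 82, 76, 78, 80, 88, 86]
shifted = [s + 5 for s in scores]    # add a constant to every score
scaled = [s * 2 for s in scores]     # multiply every score by a constant

print(statistics.mean(scores), statistics.median(scores))    # 82.1 81.0
print(statistics.mean(shifted), statistics.median(shifted))  # 87.1 86.0
print(statistics.mean(scaled), statistics.median(scaled))    # 164.2 162.0
```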

4.3.3 Kurtosis

It is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. Datasets
with high kurtosis tend to have heavy tails, and datasets with low kurtosis tend to have light tails; a uniform
distribution is an extreme light-tailed case. A histogram is an effective graphical technique for showing both the
skewness and kurtosis of a dataset.

For univariate data X1, X2, …, XN, the formula for kurtosis is given as follows:

kurtosis = [Σ (Xi − x̄)⁴ / N] / S⁴

where x̄ is the mean, S is the standard deviation, and N is the number of data points.

Many classical statistical techniques rest on the assumption of normality. Significant skewness or kurtosis
clearly indicates that the data are not normal.
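A minimal sketch of the kurtosis formula, again taking S as the population standard deviation. The flat dataset below is illustrative of a light-tailed, low-kurtosis case (a normal distribution has kurtosis about 3).

```python
def kurtosis(data):
    # [sum of fourth-power deviations / N] / S^4
    n = len(data)
    mean = sum(data) / n
    s = (sum((x - mean) ** 2 for x in data) / n) ** 0.5
    return sum((x - mean) ** 4 for x in data) / n / s ** 4

print(kurtosis([1, 2, 3, 4, 5]))  # 1.7 (light tails, below the normal value of 3)
```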

4.4 MEASURE OF VARIATION

John asked his classmates how many glasses of water they drink on a typical day. From this question, the
average water consumption of his six friends is marked (Fig. 4.5).
From the analysis of data, one-fourth of the data lie below the Lower Quartile (LQ) and one-fourth of the data lie
above the Upper Quartile (UQ). Figure 4.6 gives the placement of data. Measures of variation are used to
describe the distribution of the data. The range is the difference between the greatest and the least data values.
Quartiles are values that divide the dataset into four equal parts. From these data, questions have been
formulated.

1. What is the median of the dataset?


2. Organize the data into two groups: the top half and the bottom half. How many data values are in each
group?
3. What is the median of each group?

Fig. 4.5 Observation of consuming water

Fig. 4.6 Data distribution from observation of consuming water

For a dataset, measures of average such as the mean, mode and median give a typical value. Within the dataset,
the actual values usually differ from one another and from the average value itself. A dataset with high
dispersion contains values considerably higher and lower than the mean value. Several measures of variation
describe the distribution of the data.

4.4.1 Range

One of the simplest measures of variation is the range, calculated as the highest value minus the lowest value.

RANGE = MAXIMUM − MINIMUM

The range is not resistant to change: because it uses only the largest and smallest values, it is strongly affected
by extreme values. In descriptive statistics, the range is the size of the smallest interval which contains all the
data, and it provides a signal of statistical scattering. It is most useful for small datasets, because it depends on
only two of the observations.

For example, consider the set A = 4, 6, 9, 3, 7; the lowest value is 3, and the highest value is 9. So the range is 9
− 3 = 6 (Fig. 4.7).
Fig. 4.7 Range

The range can be misleading due to extreme elements in the set. For example, A = 7, 11, 5, 9, 12, 8, 3600.

Here, the lowest value is 5 and the highest is 3600. Then the range is 3600 − 5 = 3595.

The single value of 3600 makes the range large, but most values are around 10. This can be addressed by using
the interquartile range or the standard deviation.

For example, find the measures of variation for the data in Table 4.2.

Range = 70 − 1 = 69 mph

To find the quartiles, order the numbers from least to greatest.

Table 4.2 Speed of each animal

Interquartile range = UQ − LQ = 50 − 8 = 42

The range is 69, the median is 27.5, the lower quartile is 8, the upper quartile is 50, and the interquartile range is
42. The median preparation is shown in Fig. 4.8.

Fig. 4.8 Calculation of median

4.4.2 Absolute Deviation

The normal way of calculating deviation from the mean is to subtract the mean score from each score. For
example, the mean score for the group of 100 students we used earlier was 48.75 out of 100. For a student who
scored 50 out of 100, the deviation of the score from the mean is 50 − 48.75 = 1.25. We can perform the same
calculation across the dataset to find the total variability of the 100 students’ records. But since the deviations
include both positive and negative values, their sum is zero. How, then, can we analyse the data? Taking the
absolute value removes the sign and gives the absolute deviation. Summing the absolute deviations and dividing
by the total number of scores gives the mean absolute deviation. In our example,

mean absolute deviation = Σ |X − µ| / N

where µ is the mean, X is a score, Σ denotes the sum, N is the number of scores, and | | takes the absolute value.
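The calculation can be sketched as a short Python function; the small dataset is illustrative.

```python
def mean_absolute_deviation(scores):
    # average of the absolute deviations from the mean
    mu = sum(scores) / len(scores)
    return sum(abs(x - mu) for x in scores) / len(scores)

print(mean_absolute_deviation([2, 2, 3, 4, 14]))  # 3.6
```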

4.4.3 Standard Deviation

Standard deviation is a useful measure of variation: it captures whether the values of a dataset are closely
bunched about the mean or widely dispersed. If a set of numbers x1, x2, x3, …, xn constitutes a sample with the
mean x̄, then the differences

x1 − x̄ , x2 − x̄ , …, and xn − x̄

are called the deviations from the mean.

Since the xi are not all equal, some deviations are negative and some are positive, but their sum is always zero:
Σ(x − x̄) = 0. If only the magnitudes of the deviations are considered, that is, the signs are simply left out, a
measure of variation can be defined in terms of the absolute values of the deviations from the mean. For n
elements in a dataset, this statistical measure is called the mean deviation.

If we work with the squares of the deviations from the mean, the signs are eliminated, because the square of a
real number cannot be negative. The squared deviations are averaged and the square root of the result is taken.
That is,

σ = √[ Σ (x − x̄)² / n ]

This is the traditional calculation of the standard deviation; since it is a mathematically derived formula, it is
also called the root-mean-square deviation. The formula may instead divide the squared deviations from the
mean by n − 1 rather than n. That is, for a sample from a population, the sample standard deviation is

s = √[ Σ (x − x̄)² / (n − 1) ]

The population standard deviation is

σ = √[ Σ (x − µ)² / N ]

where µ is the population mean and N is the number of elements in the population.

What is the purpose of calculating the mean, standard deviation and variance? These sample statistics estimate
the corresponding population parameters. If we draw many samples from a population that has mean µ and
calculate the mean x̄ of each, the average of those sample means will be close to µ. We can likewise calculate
the variance of each sample by the formula

s² = Σ (x − x̄)² / n

If we take the average of these estimates, it turns out to be less than σ²; theoretically, we compensate for this by
dividing by n − 1 instead of n in the formula for s².

Example: A biologist found 7, 13, 9, 11, 10, 11, 8 and 7 microorganisms of a certain kind in eight cultures.
Calculate S.

Solution: Calculating the mean, we get x̄ = (7 + 13 + 9 + 11 + 10 + 11 + 8 + 7)/8 = 9.5. Now find Σ(x − x̄)²,
which may be arranged as in Table 4.3.

Table 4.3 Calculation

By dividing 32.00 by (8 − 1) = 7 and taking the square root, we will get S = 2.14.

In this example, it is very easy to calculate S, because the data are whole numbers and the mean is exact to one
decimal. Otherwise, the calculations required by the defining formula can be quite tedious. We can get S
directly with a statistical calculator or a computer by use of the shortcut formula

S = √[ (Σx² − (Σx)²/n) / (n − 1) ]
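The biologist's calculation can be verified in Python:

```python
counts = [7, 13, 9, 11, 10, 11, 8, 7]  # microorganisms per culture
n = len(counts)

mean = sum(counts) / n                      # 9.5
ss = sum((x - mean) ** 2 for x in counts)   # sum of squared deviations = 32.0
s = (ss / (n - 1)) ** 0.5                   # sample standard deviation

print(round(s, 2))  # 2.14
```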

4.4.4 Average Deviation

The average deviation or mean absolute deviation of a dataset is the average of the absolute deviations from a
central point. First, compute the central point m(X) and find the distance between each score and that point. The
average deviation is defined as the mean of these absolute values. The mean absolute deviation of a set
x1, x2, …, xn is

MAD = (1/n) Σ |xi − m(X)|

The measure of central tendency, m(X), has a marked effect on the value of the mean deviation. For example, for
the dataset {2, 2, 3, 4, 14}, see Table 4.4:

Table 4.4 Central tendency and absolute deviation

Mean absolute deviation about the mean is thus calculated by the formula Σ |xi − x̄| / n.

4.4.5 Quartile Deviation

Quartile deviation is based on the lower quartile Q1 and the upper quartile Q3. The difference Q3 − Q1 is called
the interquartile range. The difference Q3 − Q1 divided by 2 is called the semi-interquartile range or the quartile
deviation. That is, QD = (Q3 − Q1)/2.

Compared with the range, the quartile deviation is a slightly better measure of absolute dispersion, but it ignores
the observations in the tails. The values of the quartile deviation calculated from different samples of the same
population are quite likely to differ.

Coefficient of quartile deviation

It is a relative measure of dispersion based on the quartile deviation and is given as

Coefficient of QD = (Q3 − Q1) / (Q3 + Q1)

It is a pure number, free of any units of measurement, and can be used for comparing the dispersion in two or
more sets of data.

Example: Calculate the quartile deviation and coefficient of quartile deviation from the data given in Table 4.5.

Table 4.5 Sample database for calculation


Solution: The necessary calculations are given in Table 4.6:

Q1 = Value of (n/4)th item = Value of (60/4)th item = 15th item

Q1 lies in the class 10.25 − 10.75, therefore Q1 = l + h/f(n/4 − c)

where l = 10.25, h = 0.5, f = 12, n/4 = 15 and c = 7, therefore

Table 4.6 Calculation

Q1 = 10.25 + 0.5/12 × (15 − 7) = 10.25 + 0.33 = 10.58

Q3 = Value of (3n/4)th item = Value of (3 × 60/4)th item = 45th item

Q3 lies in the class 11.25 − 11.75, therefore Q3 = l + h/f(3n/4 − c)

where l = 11.25, h = 0.5, f = 14, 3n/4 = 45 and c = 36, therefore

Q3 = 11.25 + 0.5/14 × (45 − 36) = 11.25 + 0.32 = 11.57

Quartile deviation = (Q3 − Q1)/2 = (11.57 − 10.58)/2 = 0.99/2 = 0.495

Coefficient of quartile deviation = (Q3 − Q1)/(Q3 + Q1) = (11.57 − 10.58)/(11.57 + 10.58) = 0.99/22.15 = 0.045
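The interpolation formula Q = l + h/f(n/4 − c) used in the example can be written as a small helper; the parameter values below follow the worked example.

```python
def grouped_quartile(l, h, f, target, c):
    """Interpolated quartile for grouped data: Q = l + (h/f)(target - c),
    where l is the lower class boundary, h the class width, f the class
    frequency, target the quartile position (n/4 or 3n/4), and c the
    cumulative frequency below the class."""
    return l + (h / f) * (target - c)

n = 60
q1 = grouped_quartile(10.25, 0.5, 12, n / 4, 7)        # ≈ 10.58
q3 = grouped_quartile(11.25, 0.5, 14, 3 * n / 4, 36)   # ≈ 11.57

qd = (q3 - q1) / 2                 # quartile deviation
coeff = (q3 - q1) / (q3 + q1)      # coefficient of quartile deviation

print(round(qd, 3), round(coeff, 3))
```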
4.4.6 Coefficient of Variation

Coefficient of Variation (CV) or Relative Standard Deviation (RSD) is a standardized measure of dispersion of a
probability distribution or frequency distribution. For a distribution with standard deviation σ and mean µ, the
coefficient of variation is defined as

CV = σ / µ

It shows the extent of variability in relation to the mean of the population.

For example, calculate the coefficient of standard deviation (Fig. 4.9) and the coefficient of variation for the
following sample data: 2, 4, 8, 6, 10, and 12.

Fig. 4.9 Calculation
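The example can be worked in Python, using the sample standard deviation (dividing by n − 1):

```python
data = [2, 4, 8, 6, 10, 12]
n = len(data)

mean = sum(data) / n                                        # 7.0
s = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5   # sample standard deviation
cv = s / mean                                               # coefficient of variation

print(round(s, 2), round(cv * 100, 1))  # 3.74 53.5
```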

4.5 PROBABILITY DISTRIBUTION

The outcomes of a statistical experiment, together with their probabilities of occurrence, can be arranged in a
table. This is a probability distribution, which deals with variables, random variables and notations. Random
variables are subject to variation by random chance. They are obtained from random experiments such as
tossing a coin, rolling a die, selecting a card from a pack or picking a number from a given interval. A
probability function is a function that assigns probabilities between 0 and 1 inclusive to the values of a random
variable, such that the probabilities of all outcomes sum to 1. If these conditions are not satisfied, the function is
not a probability function. A probability distribution is a function that describes how likely we are to obtain the
different possible values of the random variable.

A discrete variable takes values from a discrete set. Consider rolling a six-sided die; the values lie in 1, 2, 3, 4, 5
and 6. For a discrete random variable X and any number x, we can form a probability distribution function P(x).
That is, P(x) is the probability that the random variable X equals the given number x, which is given by

P(x) = Pr(X = x).

The random variable must take on some value in the set of possible values with probability 1, so the values of
P(x) must sum to 1. In equations, the requirements are P(x) ≥ 0 for all x, and Σx P(x) = 1, where the sum is
implicitly over all possible values of X.

For the example of rolling a six-sided die, the probability mass function is

P(x) = 1/6; if x in 1, 2, 3, 4, 5, 6
0; otherwise

A continuous random variable takes values in a continuum, such as the real numbers or an interval. We cannot
make a probability distribution function for a continuous random variable X by directly assigning a probability
to each value. Instead, a Probability Density Function (PDF) assigns the probability that X is near each value.

Given the probability density function p(x) for X, we determine the probability that X lies in any set A (i.e., X in
A) by integrating p(x) over the set A:

Pr(X in A) = ∫A p(x) dx.

With a subscript identifying the random variable, the probability density function can be written as

Pr(X in A) = ∫A pX(x) dx.

4.5.1 Binomial Distribution

A binomial experiment satisfies a fixed number of trials, each with two possible outcomes. Each of these trials
is independent, and the probability of each outcome remains constant. Simply put, binomial tests are
experiments with a fixed number of independent trials, each of which can have only two possible outcomes.
Examples include tossing a coin 20 times to see how many tails occur, asking 200 people if they watch ABC
news and rolling a die to see if a 5 appears. The binomial distribution describes the behaviour of a count
variable X under the following conditions:

1. The number of observations n is fixed.


2. Each observation is independent.
3. Each observation represents one of two outcomes (“success” or “failure”).
4. The probability of success (p) is the same for each outcome.

For example, suppose a coin is tossed twice and its outcome is in Table 4.7. These four outcomes are probability
of 1/4. Note that the tosses are independent. Hence, the probability of a head on Flip 1 and a head on Flip 2 is the
product of P(H) and P(H), which is 1/2 × 1/2 = 1/4. The same calculation applies to the probability of a head on
Flip 1 and a tail on Flip 2. Each is 1/2 × 1/2 = 1/4.
Table 4.7 Possible outcomes

Based on the number of occurrence of heads, the four possible outcomes can be classified. The number could be
2 (Outcome 1), 1 (Outcomes 2 and 3) or 0 (Outcome 4). Table 4.7 furnishes the probabilities of these
possibilities. Graphically we can represent the probabilities in Fig. 4.10. Since two of the outcomes represent the
case in which just one head appears in the two tosses, the probability of this event is equal to 1/4 + 1/4 = 1/2.
The situation is summarized in Table 4.8.

Table 4.8 Probabilities of getting 0, 1 or 2 heads

Figure 4.10 shows the probability for each of the values on the x-axis. With a head counted as a “success”, it
gives the probability of 0, 1 and 2 successes for two trials, where the event has a probability of 0.5 of being a
success on each trial. This makes Fig. 4.10 an example of a binomial distribution.

Fig. 4.10 Probabilities of 0, 1 and 2 heads

For N trials, the binomial distribution gives the probabilities for independent events, each of which has a
probability π of occurring. For the coin tossing example, N = 2 and π = 0.5. The formula for the binomial
distribution is

P(x) = [N! / (x!(N − x)!)] π^x (1 − π)^(N−x)

where N is the number of trials, P(x) is the probability of x successes out of N trials, and π is the probability of
success on a given trial.
Applying this to the coin tossing example,

Notations

The following notations are helpful while dealing with binomial distribution:

x: The number of successes that result from the binomial experiment.

n: The number of trials in the binomial experiment.

P: The probability of success on an individual trial.

Q: The probability of failure on an individual trial. (This is equal to 1 − P.)

n!: The factorial of n.

b(x; n, P): Binomial probability − the probability that an n-trial binomial experiment results in x successes,
when the probability of success on an individual trial is P.

nCr: The number of combinations of n things, taken r at a time.

Example: Suppose a die is tossed 5 times. What is the probability of getting exactly 2 fours?

Solution: In this binomial experiment, the number of trials is equal to 5, the number of successes is equal to 2,
and the probability of success on a single trial is 1/6 or 0.167. Therefore, the binomial probability is

b(2; 5, 0.167) = 5C2 * (0.167)2 * (0.833)3


b(2; 5, 0.167) = 0.161
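The die example can be verified in Python; using the exact probability 1/6 instead of the rounded 0.167 gives the same answer to three decimals.

```python
from math import comb

def binomial_prob(x, n, p):
    # b(x; n, p) = C(n, x) * p^x * (1 - p)^(n - x)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

print(round(binomial_prob(2, 5, 1/6), 3))  # 0.161
```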

4.5.2 Poisson Distribution

The Poisson distribution is a discrete probability distribution used to model the number of events occurring
within a given time interval. The average number of events in the interval is represented by λ, the shape
parameter. Let X be the number of events in a given interval and e be the mathematical constant e ≈ 2.718282.
The Poisson probability mass function is

P(X = x) = e^(−λ) λ^x / x!

Example: Average rates of 1.8 births per hour in a hospital occur randomly. What is the probability of observing
4 births in a given hour at the hospital?

Solution: Let X = number of births in a given hour.

Births occur randomly at a mean rate λ = 1.8.

Using the Poisson distribution formula, the probability of observing exactly 4 births in a given hour is

P(X = 4) = e^(−1.8) × 1.8⁴ / 4! ≈ 0.0723

What about the probability of observing 2 or more births in a given hour at the hospital?

P(X ≥ 2) = P(X = 2) + P(X = 3) + …
= 1 − P(X < 2)
= 1 − P(X = 0) − P(X = 1) = 1 − e^(−1.8)(1 + 1.8) ≈ 0.537

Figure 4.11 gives the Poisson probability density function for four values of λ.

Fig. 4.11 Poisson probability distribution function for (λ = 5, λ = 15, λ = 25 and λ = 35)

Sum of Two Poisson Variables

Consider the previous example, birth rate in a hospital.

Example: Suppose there are two hospitals, A and B. In hospital A, births occur randomly at an average rate of 2.3
births per hour, and in hospital B, births occur randomly at an average rate of 3.1 births per hour. What is the
probability of observing 7 births in total from both hospitals in a 1-hour period?
Solution: The following rules have been formed:

If X ∼ Po(λ1) on 1 unit interval and Y ∼ Po(λ2) on 1 unit interval, then X + Y ∼ Po(λ1 + λ2) on 1 unit interval.
Let X = No. of births in a given hour at hospital A and Y = No. of births in a given hour at hospital B

Then, X ∼ Po(2.3), Y ∼ Po(3.1) and X + Y ∼ Po(5.4)
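The rule can be checked numerically; `poisson_pmf` is an illustrative helper implementing the Poisson mass function, with λ = 2.3 + 3.1 as in the example.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    # P(X = x) = e^(-lam) * lam^x / x!
    return exp(-lam) * lam ** x / factorial(x)

lam = 2.3 + 3.1            # X + Y ~ Po(5.4)
p7 = poisson_pmf(7, lam)   # probability of 7 births in total in one hour

print(round(p7, 3))  # ≈ 0.120
```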

4.5.3 Uniform Distribution

A uniform distribution, or rectangular distribution, is a distribution with constant probability. The continuous
uniform or rectangular distribution has a random variable X restricted to a finite interval [a, b], with a density
f(x) that is constant over the interval. An illustration is given in Fig. 4.12. There are discrete and continuous
uniform distributions. The outcome of throwing a fair die is a simple example of the discrete uniform
distribution: each of the possible values 1, 2, 3, 4, 5 and 6 has probability 1/6. The sum of two such dice,
however, is no longer uniform, since not all sums have equal probability.

Fig. 4.12 Rectangular distribution

The density and the distribution function F(x) are defined by

f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise

F(x) = 0 for x < a, (x − a)/(b − a) for a ≤ x ≤ b, and 1 for x > b

The expectation and variance are given by the formulae

E(X) = (a + b)/2, V(X) = (b − a)²/12

Example: The electric current (in mA) measured in a piece of copper wire is known as uniform distribution over
the interval [0, 25]. Write down the formula for the probability density function P(x) of the random variable X
representing the current. Calculate the mean and variance of the distribution and find the cumulative distribution
function F(x).

Solution: Over the interval [0, 25], the probability density function f(x) is given by the formula

f(x) = 1/25 for 0 ≤ x ≤ 25, and 0 otherwise

Using the formulae developed for the mean and variance gives

E(X) = (0 + 25)/2 = 12.5 mA, V(X) = (25 − 0)²/12 ≈ 52.08

The cumulative distribution function is obtained by integrating the probability density function. Hence,
considering the three distinct regions x < 0, 0 ≤ x ≤ 25 and x > 25 gives

F(x) = 0 for x < 0, x/25 for 0 ≤ x ≤ 25, and 1 for x > 25
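The copper-wire example can be checked with a short sketch (variable names are illustrative):

```python
a, b = 0.0, 25.0                  # current uniformly distributed on [0, 25] mA

pdf = 1 / (b - a)                 # f(x) = 1/25 on [0, 25]
mean = (a + b) / 2                # expectation
variance = (b - a) ** 2 / 12      # variance

def cdf(x):
    # cumulative distribution function of the uniform distribution
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

print(mean, round(variance, 2), cdf(10))  # 12.5 52.08 0.4
```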

4.5.4 Exponential Distribution

The exponential distribution is the probability distribution that describes the time between events in a Poisson
process, that is, a process in which events occur continuously and independently at a constant average rate.
Figure 4.13 illustrates exponential distributions: it plots the exponential probability density function and the
exponential cumulative distribution function. The equation for the standard exponential distribution is
f(x) = e^(−x) for x ≥ 0.

Fig. 4.13 Exponential distribution

The formula for the cumulative distribution function of the exponential distribution with mean θ is F(x) = 1 − e^(−x/θ) for x ≥ 0.
Example: The number of kilometres that a particular car can run before its battery wears out is exponentially
distributed with an average of 15,000 km. The owner of the car needs to take a 7500-km trip. What is the
probability that he will be able to complete the trip without having to replace the car battery?

Solution: Let X denote the number of kilometres that the car can run before its battery wears out. For an
exponential distribution, the following memoryless property holds:

P(X ≥ x + y | X ≥ x) = P(X ≥ y)

That is, the probability that the battery lasts more than y = 7,500 additional kilometres does not depend on
whether the battery has already been running for x = 0 km, x = 1,500 km or x = 22,500 km, because X is
exponentially distributed.

If X is exponentially distributed with mean θ, then

P(X ≥ k) = e^(−k/θ).

Therefore, the probability in question is simply

P(X ≥ 7,500) = e^(−7,500/15,000)

= e^(−1/2) ≈ 0.607

From this, we can conclude that the probability is large enough to give the owner comfort that he won’t be
stranded somewhere along a remote desert highway.
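The trip example can be verified numerically; `survival` is an illustrative helper for P(X ≥ k).

```python
from math import exp

theta = 15000          # mean kilometres before the battery wears out

def survival(k, theta):
    # P(X >= k) = e^(-k/theta) for an exponential distribution with mean theta
    return exp(-k / theta)

print(round(survival(7500, theta), 3))  # 0.607
```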

4.5.5 Normal Distribution

The normal distribution or Gaussian distribution is a very common continuous probability distribution, highly
applicable in the social and natural sciences.

The normal distribution formula is

f(x) = (1 / (σ√(2π))) e^(−(x − µ)²/(2σ²))

where µ is the mean and σ² is the variance.

Empirical rule

For bell-shaped distributions, about 68% of the data will be within one standard deviation of the mean, about
95% will be within two standard deviations of the mean, and about 99.7% will be within three standard
deviations of the mean (Fig. 4.14)

Example: The ages of the employees hired by an IT company during the last 5 years are normally distributed.
Within this curve, 95.4% of the ages, centred about the mean, lie between 24.6 and 37.4 years. Find the mean
age and the standard deviation of the data.

Solution: The mean age is symmetrically located between −2 standard deviations (24.6) and +2 standard
deviations (37.4), so the mean age is (24.6 + 37.4)/2 = 31 years.

From 31 to 37.4 (a distance of 6.4 years) is 2 standard deviations. Therefore, 1 standard deviation is 6.4/2 =
3.2 years.
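The same arithmetic can be written as a short sketch; the 95.4% figure corresponds to ±2 standard deviations under the empirical rule:

```python
lo, hi = 24.6, 37.4           # the central 95.4% band of ages
mean = (lo + hi) / 2          # the mean sits at the midpoint of the band
sd = (hi - lo) / 4            # the band spans 4 standard deviations (+/- 2)
print(mean, round(sd, 1))     # 31.0 3.2
```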

Fig. 4.14 Empirical rule

EXERCISES

1. What is the best measure of central tendency?


2. In a strongly skewed distribution, what is the best indicator of central tendency?
3. Does all data have a median, mode and mean?
4. When is the mean the best measure of central tendency?
5. When is the mode the best measure of central tendency?
6. When is the median the best measure of central tendency?
7. What is the most appropriate measure of central tendency when the data have outliers?
8. In a normally distributed dataset, which is greatest: mode, median or mean?
9. For any dataset, which measures of central tendency have only one value?
10. Table 4.9 shows a set of scores on a science test in two different classrooms. Compare and contrast their
measures of variation.

Table 4.9 Database


11. The normal monthly rainfalls in inches for a city are given in Table 4.10. What are the outlier values in the
table?

Table 4.10

12. Suppose a die is tossed. What is the probability that the die will land on 5?
13. Data on the number of minutes that a particular train service was late have been summarised in Table 4.11

(Times are given to the nearest minute.)

Table 4.11 Database

How many journeys have been included?


What is the modal group?
Estimate the mean number of minutes the train is late for these journeys.
Which of the two averages, mode and mean, would the train company like to use in advertising its
service? Why does this give a false impression of the likelihood of being late?
Estimate the probability of a train being more than 20 minutes late on this service.
14. What is the Geometric Mean of 10, 51.2 and 8?
15. The amount of time that John plays video games in any given week is normally distributed. If John plays
video games an average of 15 hours per week, with a standard deviation of 3 hours, what is the probability
of John playing video games between 15 and 18 hours a week?
Chapter 5
DATA ANALYSIS
Objectives:

After completing this chapter, you will be able to understand the following:

The definition of statistical analysis and its types


The description of multivariate analysis
The definition of correlation analysis and its limitations
The definition of regression analysis and its case study
The detailed explanation of principal component analysis
The definition of sampling and its various types
The description of SPSS (a statistical analysis tool) and its file types and analysis functions

What is the human development index (HDI) value of India in 2013? Who analyzes the HDI? What is the scale
rating of the HDI? Does the stock market play any role in the development of the HDI? What is the health rate
score in India?

These questions have common elements. The answers have a data component that helps in determining facts.
Data analysis is a method or procedure that helps to describe facts, detect patterns, provide explanations and test
hypotheses. It is widely used in scientific research, business and policy administration.

The output of data analysis may be numeric results, graphs, etc. Numerical results include values such as the
average income of a group or the temperature difference from year to year. For example, between 2001 and
2011, the population of India increased by 181 million people, of which 91 million are males and 90 million are
females. This is a fact, from which we can interpret that there was a growth of about 18 million people per year.
From this, can we conclude that after 30 years there will be a growth of 543 million people over the present
population? This interpretation may be wrong. The Indian population cannot be
extrapolated in such an approximate way.

Different data analysis techniques are given in Fig. 5.1. The techniques include statistical analysis, multivariate
analysis (MVA), regression analysis, etc. Using these analyses, a researcher finds facts, relations and outputs
from the collected data.
Fig. 5.1 Data analysis techniques

5.1 STATISTICAL ANALYSIS

How will we collect, explore and present a large amount of data? To discover patterns and trends in a large
amount of data, we can use statistical data analysis. In our day-to-day life, statistics plays a significant role in
decision making. The use of statistics helps research, industry, business and government to make important
decisions.

Statistical analysis is a component of data analytics that involves collecting and scrutinizing every data sample
in a set of items from which samples can be drawn. Trend identification is one of the goals of statistical
analysis, which identifies patterns in unstructured and semi-structured data for business analysis. In statistical
analysis, two important terms need to be noticed – population and sampling. A population is the total inclusion
of all people or items with similar characteristics. Sampling is central to the discipline of statistics. From the
population, samples are drawn in order to obtain selected observations. This is a feasible approach that saves
cost and time because sampling gives a snapshot of a particular moment. When the population is uniform, a
perfect representation is possible in the sample. Sampling errors occur when the measured value of an attribute
in a sample differs from the “true value” in the population. A detailed explanation of population and sampling
is given in Section 5.6.

Figure 5.2 gives the steps involved in statistical analysis. Hypothesis testing plays a major role in statistical
analysis. Chapter 6 gives a detailed account of statistical test procedures.

Qualitative and quantitative data are two types of data in statistical analysis. Qualitative data is a categorical
measurement expressed not in terms of numbers but described by means of a natural language; for example,
“height = tall”, “colour = white”. Quantitative data is a numerical measurement expressed in terms of numbers,
not natural language; for example, “length = 450 m”, “height = 1.8 m”.

Inferential statistics and descriptive statistics are two types of statistical analysis. Inferential statistics
(confirmatory data analysis) uses information from a sample to draw conclusions about the population from
which it was drawn. Descriptive statistics (exploratory data analysis) investigates the measurements of the
variables in a dataset and characterizes the attributes of a set of measurements. Summarized data are used to
explore patterns of variation and describe changes over time. Inferential statistics is designed to allow inference
from a statistical measure on a sample of cases to a population parameter. A hypothesis test on the population is
used in data analysis. Figure 5.3 shows different types of statistical methods.
Fig. 5.2 Statistical analysis steps

Fig. 5.3 Types of statistical methods

Different types of descriptive and inferential statistical methods are given in Figs. 5.4 and 5.5, respectively.

Fig. 5.4 Types of descriptive methods

Fig. 5.5 Types of inferential methods


The details of these methods are given in Chapter 7. The examples of some commonly used statistical tests are
given in Fig. 5.6. These tests are based on level of measurements. Most of the tests are covered in this book.

Fig. 5.6 Examples of some commonly used statistical tests

5.2 MULTIVARIATE ANALYSIS

How can we analyze two sets of data in a simultaneous statistical process? When more than two variables are
involved, how is statistical analysis done? An MVA technique is a statistical process that allows more than
two variables to be analyzed at once. In other words, multivariate data analysis is a statistical technique used to
analyze data that arise from more than one variable. There are basically two general types of MVA technique –
analysis of dependence and analysis of interdependence. In analysis of dependence, one or more dependent
variables are explained or predicted by others. Multiple regression, Partial Least Squares (PLS) regression and
Multiple Discriminant Analysis (MDA) are methods of dependence MVA. In analysis of interdependence, we
do not choose any variables that are thought of as “dependent” but look at the relationships among variables,
objects or cases. Cluster analysis and factor analysis are interdependence MVA techniques. The MVA technique
is useful where each situation, product or decision involves more than a single variable.

The MVA is used in the following areas:

Market and consumer research


Quality control and quality assurance across a range of industries. Industries such as food and
beverage, pharmaceuticals, telecommunications, paint, chemicals and energy use MVA techniques for
information formulation.
Process control and optimization
Research and development

The choice of MVA technique depends upon the question: Are some variables dependent upon others? If the
answer is “yes”, we can use dependence methods; otherwise, we can use interdependence methods. To select a
classification technique, two more questions are required to understand the nature of multivariate techniques.
First, if there are dependent variables, how many are there? Second, are the data metric or non-metric? Metric
data are quantitative data collected on an interval or ratio scale, whereas non-metric data are qualitative,
collected on a nominal or ordinal scale. Figure 5.7 gives a flow chart that explains how to arrive at a
multivariate technique from the variables. Various MVA techniques are given in the following.
Factor analysis

Consider a research design with many variables; factor analysis reduces the variables to a smaller set of factors.
There is no dependent variable used in this technique. A researcher looks into the underlying structure of the
data matrix for a given sample size. Normally, the independent variables are normal and continuous. Common
factor analysis and principal component analysis are the two major factor analysis methods. The first is used to
look into underlying factors, while the second is used to find the fewest number of variables that explain the
most variance.

Cluster analysis

To reduce a large dataset to meaningful subgroups of individuals or objects, we can use cluster analysis. The
division is made based on the similarity of the objects across a set of specified characteristics. The correlation
of the data in the population gives clusters. Hierarchical and non-hierarchical clustering are two clustering
techniques used in data analysis. For smaller datasets, the hierarchical technique is used; the non-hierarchical
technique is used for larger datasets with priorities.

Multiple regression analysis

This technique examines the relationship between a single metric dependent variable and two or more metric
independent variables. It determines the linear relationship with the lowest sum of squared residuals. The
assumptions must be carefully observed. The weights reflect the impact of each variable, and the size of a
weight can be interpreted directly.

Logistic regression analysis

It is a choice model that allows prediction of an event. The objective is to arrive at a probabilistic assessment
of a binary choice. The variables chosen are either discrete or continuous. An event match is seen in the
classification of observations for observed and predicted events. These matches are tabulated and used
in data analytics.
Fig. 5.7 Types of multivariate techniques from a variable

Discriminant analysis

For correct classification of observations or people into homogeneous groups, we can use this technique. The
independent variables have a high degree of normality and are seen as metric. To classify the observations, the
discriminant analysis builds a linear discriminant function. A partial value is calculated to determine variables
that have the most impact on the discriminant function. The higher the partial value, the more impact the
variable has on the discriminant function.

Multivariate Analysis of Variance (MANOVA)

To examine the relationship between two or more metric dependent variables and several categorical
independent variables, we can use MANOVA. Across a set of groups, this technique examines the dependence
relationship between a set of dependent measures. The MANOVA analysis is useful in experimental design.
MANOVA uses the hypothesis tests in problem domains which can be practically solved.

Multi-dimensional Scaling (MDS)

It is useful in visualizing the level of similarity of individual cases in a dataset. An MDS algorithm places the
objects in an N-dimensional space such that between-object distances are preserved as well as possible. Each
object is then assigned coordinates in each of the N dimensions.

Correspondence analysis

This technique provides a dimensional reduction of objects by examining the independent variables and
dependent variables at the same time. When we have many companies and attributes, this technique is useful.

Conjoint analysis

It is a trade-off analysis method used in market research to determine how people value different attributes. It is
a particular application of regression analysis. The objective of conjoint analysis is to determine what
combination of a limited number of attributes is most influential on respondent choice. A controlled set of
potential products or services is shown to respondents, and by analyzing how they make preferences between
these products, the implicit valuation of the individual elements making up the product is determined.

Figure 5.8 shows the various attributes and levels of attributes in a typical conjoint analysis technique. The
figure explains the various variables that a customer typically uses while buying a television.

Fig. 5.8 Conjoint analysis

The various steps in conjoint analysis are as follows:

Choose product attribute


Choose values for each attribute
Define products as a combination of attribute options
A value of relative utility is assigned to each level of an attribute called part-worth utility
The combination with the highest utilities should be the one that is most preferred.
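The steps above can be sketched with hypothetical part-worth utilities. The attribute names and utility values below are illustrative, not taken from the text:

```python
# Hypothetical part-worth utilities for two television attributes
utilities = {
    "screen": {"32 inch": 0.4, "42 inch": 1.1},
    "price": {"20,000": 0.9, "30,000": 0.2},
}

def total_utility(profile):
    # Sum the part-worth utility of each chosen attribute level
    return sum(utilities[attr][level] for attr, level in profile.items())

# Enumerate every combination of attribute options and pick the
# one with the highest total utility (the most preferred product)
profiles = [{"screen": s, "price": p}
            for s in utilities["screen"] for p in utilities["price"]]
best = max(profiles, key=total_utility)
print(best, round(total_utility(best), 2))
```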

There are basically two methods by which conjoint analysis can be done: one is paired comparison and the
other is full profile comparison. In paired comparison, just two attributes/features are selected, while multiple
attributes are selected in full profiling. Paired comparison makes it easy for respondents to comment, but it can
sometimes produce unrealistic inputs.

Limitation to conjoint analysis


Assumes that important attributes can be identified
Assumes that consumers evaluate choice alternatives based on these attributes
Assumes that consumers make trade-offs (compensatory model)
Trade-off model may not represent choice process (non-compensatory models)
Data collection can be difficult and complex

Canonical correlation

Suppose there are two vectors X = (X1, …, Xn) and Y = (Y1, …, Ym) of random variables, and there exists a
correlation among these variables. Canonical-correlation analysis is used to find linear combinations of Xi and Yj
which have maximum correlation with each other. A typical use for canonical correlation in the experimental
context is to take two sets of variables and see what is common amongst the two sets.

Structural Equation Modelling (SEM)

To identify multiple relationships between sets of variables simultaneously, we can use SEM. For example,
human intelligence cannot be measured directly the way height or weight can. A psychologist develops
theories to measure intelligence through observed variables. The SEM technique is used to test such theories
with data gathered from people who took an intelligence test.

5.3 CORRELATION ANALYSIS

Correlation and regression analysis are related measures because both of them deal with relationships between
variables. Correlation is a measure of the linear association between two variables. The correlation value always
lies between −1 and +1. If two variables are perfectly related in a positive linear sense, the correlation co-
efficient is +1. Similarly, a correlation co-efficient of −1 indicates two variables that are perfectly related
in a negative linear sense. If there is no linear relation between the two variables, the correlation co-efficient is 0.

Suppose a random variable X is observed in a series of n measurements and Y = X². Here, Y is perfectly
dependent on X. The two variables X and Y are written as Xi and Yi, where i = 1, 2, …, n.

The sample correlation co-efficient can be estimated as the Pearson correlation r between X and Y:

r = Σ(Xi − x̄)(Yi − ȳ) / ((n − 1) Sx Sy),

where x̄ and ȳ are the sample means of X and Y, respectively, and Sx and Sy are the sample standard deviations
of X and Y, with Sx = √(Σ(Xi − x̄)² / (n − 1)) and Sy defined similarly.
The correlation co-efficient is between −1 and +1. The degree of correlation is shown in Fig. 5.9. Strong
positives and strong negatives tend to +1 and −1, respectively. When weak positives or weak negatives occur,
the points plotted are scattered across. A pure scatter diagram is obtained for a zero correlation.

Fig. 5.9 Degree of correlation

In positive correlation, high values of X are associated with high values of Y and in negative correlation high
values are associated with low values of Y. For a null correlation, the values of X cannot predict the values of Y.
That is, they are independent of each other. Figure 5.10 shows positive, negative and null correlation.

When all points fall directly on a downward-sloping line, then r = −1. Similarly, when the scatter plot falls
directly on an upward-sloping line, then r = +1. These are called perfectly negative and perfectly positive
correlations (Fig. 5.11).

Strong correlations lie close to the fitted line. When the value of r is closer to +1, a stronger positive correlation
occurs, and when it is closer to −1, a stronger negative correlation occurs. Figure 5.12 shows stronger and
weaker correlations for certain plotted points. However, correlation strength can never be judged precisely by
eye, as visual inspection is a subjective measurement.
Fig. 5.10 Positive, negative and null correlation

Fig. 5.11 Perfect correlation

Based on the Pearson correlation co-efficient, we can identify correlation strength by the value of r.

0.0 ≤ |r| < 0.3 Weak correlation

0.3 ≤ |r| < 0.7 Medium correlation

0.7 ≤ |r| ≤ 1.0 Strong correlation

For example, +0.881 is a strong positive correlation.
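A minimal sketch of the Pearson correlation co-efficient; the data values below are illustrative:

```python
import math

def pearson_r(xs, ys):
    # r = sum((x - mean_x)(y - mean_y)) / sqrt(sum dx^2 * sum dy^2)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den

# 0.775 falls in the "strong correlation" band above
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))
```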


Fig. 5.12 Strong–weak correlation

The co-efficient discussed here is the Pearson correlation co-efficient. Do you know any other correlation
co-efficient? Another such correlation co-efficient is Spearman’s rank correlation co-efficient, denoted ρ for
the population parameter and rs for the sample statistic. It is used when one or both variables are skewed or
when extreme values are present. Consider X and Y as variables; the formula for Spearman’s correlation co-
efficient is shown in Eq. 5.5:

rs = 1 − (6 Σ di²) / (n(n² − 1)),

where di is the difference between the ranks of Xi and Yi.
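Equation 5.5 can be sketched for the tie-free case (rank ties would need averaged ranks, which this sketch omits):

```python
def spearman_rs(xs, ys):
    # rs = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), d_i = rank difference
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(spearman_rs([3, 1, 4, 2], [30, 10, 40, 20]))  # 1.0: identical orderings
```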

Difference between correlation and regression

We understand that correlation and regression are two measures of the dependency of variables. How can we
discriminate one from the other? To understand the difference, take the example of two variables, crop yield
and rainfall. These variables are measured at different places and on different scales – one at the farmer’s field
and the other at the weather forecasting station. Correlation analysis shows a high degree of association between
these two variables. Regression analysis shows the dependence of crop yield on rainfall. But careless data
handling may suggest that rainfall is dependent on crop yield, and such an analysis would conclude that a heavy
rainy season guarantees a big crop yield. This may be an erroneous conclusion.

Limitations of correlation

One major disadvantage of the correlation co-efficient is that it can be computationally intensive. The
correlation co-efficient is also sensitive to extreme values. It measures the linear relationship between X and Y;
that is, it assumes a change in X will proportionally change Y. If the relationship is non-linear, the result is
inaccurate. It is meaningless for categorical data such as colour or gender.

5.4 REGRESSION ANALYSIS

Regression is one of the powerful statistical analysis techniques used to predict a continuous dependent variable
from a number of independent variables. In other words, regression is a technique to determine the linear
relationship between two or more variables. The primary use of the regression technique is prediction, whether
the variables are naturally occurring or experimentally manipulated. The natural terminology is predicting Y
from the known variable X. The variables used in regression analysis can be either continuous or discrete. The
simplest form of regression shows a relationship between the above said variables X and Y formulated as

Yi = W0 + W1Xi + Ui,

where i = 1, 2, …, n, Ui is the random error associated with the ith observation, W0 is the intercept parameter and
W1 is the slope parameter. The slope parameter gives the magnitude and direction of the relation. Linear
regression computes W1 from a dataset so as to minimize the error in fitting the data (Fig. 5.13).

Fig. 5.13 Linear regression

For example, let “income” and “educational level” be two datasets whose relationship needs to be analyzed.
This will create a distribution over the independent variable. Consider another example, in which a researcher
has a set of collected data on home listing prices and the actual sales prices. Table 5.1 furnishes the data from a
collected resource useful in regression analysis.

Table 5.1 Home sales

The data are collected for a month from a particular geographical area. We want to know the relationship
between X and Y. Figure 5.14 shows the relationship plotted in a graph by using the formula
Y = W0 + W1X + U,

where Y is the observed random variable (response variable), X is the observed non-random variable (prediction
value), W0 is the population parameter – intercept, W1 is the population parameter – slope co-efficient, U is the
unobserved random variable.

Fig. 5.14 Graph plotted

Here, no straight line can be plotted exactly through all the points. For each observation, the vertical distance
from the point to the line is called the residual error, and the line that fits “best” is the one that minimizes these
distances. How will you calculate W0 and W1 here? The answer is the Ordinary Least Squares (OLS) regression
procedure. Using OLS, one can calculate the values of W0 and W1 (intercept and slope) that give the best fit to
the observations. In order to apply OLS, certain assumptions are made in the linear regression technique, as
mentioned in the following:

1. Selected linear model is correct.


2. Non-linear effects are omitted (e.g., area of circle = πr², where r is the radius).
3. The mean of the unobservable variable does not depend on the observed variable.
4. No independent variable predicts another one exactly.
5. In repeated sampling, independent variables are fixed or random.

Under these assumptions, we need to identify the best fit associated with the n points (x1, y1), (x2, y2), …,
(xn, yn) in two-dimensional space. The fitted line has the form

y = mx + b, where

m = (n Σ xiyi − Σ xi Σ yi) / (n Σ xi² − (Σ xi)²) and b = (Σ yi − m Σ xi) / n.

In our example (home sales collected data, Table 5.2), the values of slope and intercept are as follows:

Table 5.2 Home sales


Now y = mx + b becomes

y = 249.8571 − 0.7929x

and this is our least squares line.
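The OLS slope and intercept can be computed directly from their summation formulas. The data points below are illustrative, not the home sales table:

```python
def least_squares(xs, ys):
    # m = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2), b = (Sy - m*Sx) / n
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - m * sx) / n
    return m, b

# These points lie exactly on y = 2x + 1, so OLS recovers that line
m, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b)  # 2.0 1.0
```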

Conclusion

For every one unit increase in the price of household items, 0.7929 fewer household items are sold. This is our
co-efficient. If the regression were performed repeatedly with the same variables on different datasets, there
would be a standard error in the estimated co-efficient. In this regression analysis, we have to consider only W1
and U due to its linearity. Figure 5.15 shows how the line fits the points given its slope and intercept.

Case Study

The human Body Mass Index (BMI) is calculated as the ratio of weight (kg) to the square of height (m²).

A data collection contains 10 people’s heights and weights in tabular form. Can you plot these values on a
graph with weight on the x-axis and height on the y-axis? How will you calculate the linear relationship between
height and weight? How will you fit a straight line to the graph? Figure 5.16 shows the points plotted on the
graph. You should see the points cluster with an upward slope: taller people tend to weigh more. Regression
analysis finds the equation of the line that fits the identified cluster of points with minimum deviation. This
deviation is called the error. With your regression equation, if a person’s weight is known, then the height can
be predicted.
Fig. 5.15 Curve that fitted to the points

Fig. 5.16 BMI plot

Let us take the case of a researcher who wants to study an elephant’s details from its footprints. Samples of leg
lengths and skull sizes from a population of elephants are collected. The two variables selected in the study are
“leg length” and “skull size”, which are associated in some way. Elephants with short legs may have big heads,
so an association may be found between these variables. Regression is an appropriate tool to describe the
relationship between head size and leg length. Is it correct that if the skull size increases, the length of the leg
increases? Is leg size a cause of a small skull? The answers to these questions can be obtained by regression
analysis of the two variables.

Failures of linear regression

Linear regression may fail when false assumptions are made – for example, when you include more independent
variables than observations. Another limitation is that linear regression only looks at linear relationships
between dependent and independent variables. That is, it assumes a straight-line relationship between two
variables (e.g., income and age), whereas the true relationship may be a curve, so a straight-line fit would be
incorrect.
5.5 PRINCIPAL COMPONENT ANALYSIS

Principal Component Analysis (PCA) is a widely used mathematical procedure in exploratory data analysis,
signal processing, etc. It was introduced by Pearson (1901) and Hotelling (1933) to describe the variation in a
set of multivariate data in terms of uncorrelated variables. It is widely used as a mathematical tool for high-
dimensional data analysis. However, it is often treated as a black box whose results and procedures are difficult
to understand. PCA has been used for face recognition, motion analysis, clustering, dimension reduction, etc. It
provides a guideline for how to reduce a complex dataset to a lower dimensional one to reveal the underlying
hidden and simplified structures. There are a lot of mathematical proofs and theories involved in PCA, but we
are not discussing them here; we just outline how the analysis is carried out.

The following are the various steps involved in PCA:

1. Get the data

The aim of our study is to analyze the data; so the primary requirement is to get the dataset. We have
already mentioned the various methods of data collection schemes. Any of those approaches may be used
for collecting the data. In case of result comparisons, already processed datasets will be available in the
logs and data bank. Anyhow, acquiring the dataset is the primary step in PCA.

2. Data adjustment

For the analysis to be carried out smoothly, the data need to be polished. For this process, you have to subtract
the mean from each data dimension. That is, all x values have x̄ subtracted and all y values have ȳ subtracted
from them, respectively (if the selected data is a 2D dataset). This produces a dataset with zero
mean.

3. Covariance matrix is calculated

Covariance is usually measured between two dimensions. Whenever we have more dimensions, more
covariance values need to be calculated. In general, for an n-dimensional dataset, we need to calculate
n(n − 1)/2 distinct covariance values, one for each pair of different dimensions. These are arranged in an n × n
covariance matrix; for example, the entry in row 2, column 3 is the covariance between the second and third
dimensions.

4. Calculate the eigen vectors and eigen values of the covariance matrix

To do this, we find the values of λ which satisfy the characteristic equation of the matrix. If A is the matrix, we
form A − λI and set the determinant of (A − λI) to zero. Solving the resulting equation gives the eigenvalues.
Once the eigenvalues of the matrix A have been found, the corresponding eigenvectors can be found by
Gaussian elimination.

5. Project and derive the new dataset

Once we have chosen the components (eigen vectors) that need to be incorporated in our data, we find the
new feature by taking the transpose of the vector and multiply it on the left of the original dataset. That is,

Final Data = Row Feature Vector × Row data Adjust.

Principal component analysis accounts for the total variance of the observed variables. If you want to see
the arrangement of points across many correlated variables, you can use PCA to show the most prominent
directions of the high-dimensional data. Using PCA, the dimensionality of a dataset can be reduced.
The principal component representation is important in visualizing multivariate data by reducing it to two
dimensions. Principal components are a way to picture the structure of the data as completely as possible by
using as few variables as possible.
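The five steps above can be sketched with NumPy. This is an illustrative sketch on randomly generated data; a real analysis would typically use a library implementation:

```python
import numpy as np

def pca(data, n_components):
    centred = data - data.mean(axis=0)        # step 2: subtract the mean
    cov = np.cov(centred, rowvar=False)       # step 3: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # step 4: eigenvalues/vectors
    order = np.argsort(eigvals)[::-1]         # keep largest-variance directions
    components = eigvecs[:, order[:n_components]]
    return centred @ components               # step 5: project the data

rng = np.random.default_rng(0)
data = rng.normal(size=(100, 3))              # step 1: get the data
reduced = pca(data, 2)
print(reduced.shape)  # (100, 2)
```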

5.6 SAMPLING

Suppose we need to collect the names of all developed countries in the world. How are we going to do this?
What are the criteria for this selection?

How will you select a group of zoology scholars from a large scholars’ group? These are generations of a
subset from a larger set. So, sampling is a technique of subset creation from a population. A population can be
defined as the total inclusion of all people or items with similar characteristics. The selected population group
that researchers are interested in studying is called the target population, and researchers draw conclusions
about this target population. For example, consider a social research survey regarding heart attacks in men aged
between 35 and 40 years. The purpose of the study is to compare the effectiveness of certain drugs in delaying
or preventing future heart attacks. All men who meet the general criteria are selected into the target group and
included in the actual study.

Samples are subsets of the population (Fig. 5.17a) obtained from subgroups (Fig. 5.17b). Sample selection
becomes important in practice. Suppose our population is unmanageably large (geographically scattered);
studies on this group may be considerably expensive and consume more effort. So, sample selection becomes
more important as the population size increases. The sampling steps include population identification, sample
size identification and sample selection.
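The idea of drawing a sample from a population can be sketched with a simple random sample; the frame of 1,000 numbered units is hypothetical:

```python
import random

population = list(range(1, 1001))        # a sampling frame of 1,000 units
random.seed(7)                           # reproducible draw for illustration
sample = random.sample(population, 50)   # simple random sample, no replacement
print(len(sample), len(set(sample)))     # 50 distinct units drawn
```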

Fig. 5.17 Population and sample

5.6.1 Important Terms

Universe: All the items or units in any field of enquiry.

Population: Items that possess some common characteristics or information of interest. This may be real or
hypothetical.

Elementary units: Selected units that possess the relevant characteristics of a population. These constitute the
attributes in the object of study.

Sampling frame: A frame constructed by the researcher, or an existing list covering the population, used in the
study. This frame must be complete, accurate, adequate and up to date.

Sample design: Selecting the target sample requires a plan based on the sampling frame. Many techniques
provide a good sample design.

Parameters: Certain characteristics of a population.

Statistics: Characteristics of a sample, obtained by analysis to meet specific objectives.

Sampling errors: Errors that occur at the time of sampling. The errors can be measured as

Total error = Non-sampling error + sampling error + measurement error


Sampling error = Response error + frame error + chance error

Precision and accuracy: The degree of closeness of a measurement system; accuracy is closeness to the true
value, precision is closeness of repeated measurements to each other.

Confidence level: The expected percentage of times that the actual result falls within the fixed precision range.

Significance level: The expected percentage of times that predicted results fall outside that range.

5.6.2 Characteristics of Good Sample Design

In order to select good samples from a population, a sample design should have certain characteristics.

A representative sample: If a researcher selects a small set of samples that closely match the population, the
results can be generalized to the larger universe being studied.

Minimal sampling error: Sampling error arises in small samples drawn from a population; it is the discrepancy
between the true and the obtained values. Efficient design and estimation strategies reduce such errors.

Economically viable: If sample collection is expensive, it will not fit the research budget. Ensure that sampling
stays within budget and keep its expenses down.

Marginal systematic bias: A systematic approach keeps bias in the sampling procedure small without having to
reduce the sample size.

Generalized samples: The population may be large and geographically distributed. Samples created for the
research must be generalized so that errors come down and the sample covers the whole universe.

Applicable to the population: A selected sample should represent the entire population of the study; it should
not be limited to one part of the population.

5.6.3 Types of Sampling

Basically, there are two sampling techniques, probability sampling and non-probability sampling (Fig. 5.18).
Fig. 5.18 Types of sampling

Probability sampling

In this type of sampling, sample selection is based on probability theory: every unit in the population has a
known, non-zero chance of selection.

1. Simple random sampling

This is one of the most widely known types of sampling, characterized by the same selection probability
for every unit in the population. Simple random sampling selects n items from a population of size N such
that every possible sample of size n has an equal chance of being chosen. For example, to conduct a survey
regarding the next municipal election, we might select 1,000 voters in a small town with a population of
1,00,000 eligible voters. The survey is conducted with the help of paper slips: 1,000 slips are drawn at
random across locations in the town, giving a random selection procedure for each person in the town.

Case Study

Suppose a social researcher wants to conduct a survey in a medium-sized town of 1.5 million people regarding
the waste management and disposal system of the municipal corporation. For this, the researcher collects
telephone numbers from a directory that contains 3 lakh entries. The researcher may then select 1,000 numbers
across locations: starting at a random point between 1 and N/n (here 3,00,000/1,000 = 300) and taking every
300th name thereafter.
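The paper-slip procedure of simple random sampling can be imitated in a few lines of Python. This is a hypothetical sketch (the voter names are made up); the standard library's `random.sample` draws without replacement, so every voter has the same chance of selection.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units from the population so every unit has equal probability."""
    rng = random.Random(seed)          # seed only to make the sketch reproducible
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical voter roll: 100,000 eligible voters, survey 1,000 of them
voters = [f"voter_{i}" for i in range(100_000)]
survey = simple_random_sample(voters, 1_000, seed=42)
```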

2. Stratified random sampling

In this method, the entire population is divided into two or more mutually exclusive segments based on
the research interest. Data in the population may be scattered, so homogeneous subsets are formed before
sampling; each such subgroup is called a “stratum” (plural “strata”). The researcher systematically divides
the population into subgroups based on his/her interest. For example, an institute in South India needs a
sample of students from the southern region of the country as well as of foreign origin. The population
contains 10,000 students, stratified as 6,000 from Tamil Nadu, 2,000 from Kerala, 1,000 from Karnataka,
500 from other regions and 500 foreign students. To ensure each group (such as the Kerala students) is
represented in further study, a fixed percentage of students in each stratum is selected by the random
sampling method. Stratified sampling is a popular technique and is chosen for a wide range of applications
because:

Sampling is done independently in each stratum, so each subgroup has its own precision.
Managing sampling from a large population is convenient under stratification. Example: branch
officers conduct surveys on parts of the main population.
Sampling may be inherently restricted to certain subpopulations. Example: students in a college living in hostels.
Estimates of the characteristics of the entire population may improve through stratification.
Division into homogeneous and heterogeneous subpopulations is possible.
The method has a statistical advantage: small variance for the parameters of interest.
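A proportional version of this scheme can be sketched as follows, using the strata sizes from the example above. The student identifiers are invented for illustration; each stratum contributes to the sample in proportion to its share of the 10,000-student population.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Proportional stratified sampling: each stratum contributes in
    proportion to its share of the population."""
    rng = random.Random(seed)
    population = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(total_n * len(units) / population)   # proportional allocation
        sample[name] = rng.sample(units, k)            # random within the stratum
    return sample

# Strata sizes from the example: 10,000 students in all
strata = {
    "Tamil Nadu": [f"TN_{i}" for i in range(6_000)],
    "Kerala":     [f"KL_{i}" for i in range(2_000)],
    "Karnataka":  [f"KA_{i}" for i in range(1_000)],
    "Other":      [f"OT_{i}" for i in range(500)],
    "Foreign":    [f"FR_{i}" for i in range(500)],
}
sample = stratified_sample(strata, total_n=1_000, seed=1)
# A 1,000-student sample keeps the population proportions: 600/200/100/50/50
```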

3. Systematic random sampling

This sampling method resembles simple random sampling, but units are selected at a fixed interval rather
than fully at random. For example, suppose you have a population of 10,000 records and you want a sample
of 1,000. We create a systematic random sample as follows:

Divide the number of cases in the population by the required sample size. In the above example,
10,000/1,000 gives a value of 10.
Select a random value between 1 and the value obtained in the previous step. In the above example,
the value should be between 1 and 10 (e.g., 5).
Add the step factor to the selected value repeatedly for the successive records. In the above example,
we select record numbers 5, 15, 25, …

Systematic sampling has several advantages over simple random sampling. One is that fewer mistakes are
made while designing the sample. Precise samples can be expected within a stratum, and some sources of bias
are eliminated by this sampling technique.
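The three-step procedure described above can be sketched in Python. The records are hypothetical; note the sampling interval k = N/n and the random start within the first interval.

```python
import random

def systematic_sample(population, n, seed=None):
    """Systematic random sampling: random start in the first interval,
    then every k-th unit, where k = N / n."""
    rng = random.Random(seed)
    k = len(population) // n           # sampling interval (step factor)
    start = rng.randrange(k)           # random start within the first interval
    return population[start::k][:n]    # every k-th record from the start

records = list(range(10_000))          # population of 10,000 records
sample = systematic_sample(records, 1_000, seed=7)
# 1,000 records, each 10 apart (e.g., a start of 5 gives 5, 15, 25, ...)
```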

4. Cluster sampling

A cluster is a group of items of a similar nature. In certain cases, the sampling units need to be grouped
into smaller units. For example, general election opinion is collected for South India, North India,
Southwest India, etc. and combined into the total election result. The states can be grouped together, or
clustered, at the time of sampling. When cluster sampling is chosen, certain points should be noted:

Clustering is used in large-scale surveys.

Cluster sampling can be combined with other types of random sampling (e.g., clustering with strata).
Generally, for a given sample size, cluster sampling is less accurate than the other types of sampling.

In cluster sampling, all items of a chosen cluster are included in the study; each item belongs to exactly one
of the groups or clusters. Figure 5.19 shows cluster sampling.
Fig. 5.19 Cluster sampling

Case Study

Suppose the agriculture department, Government of India, wishes to investigate the use of pesticides by farmers
in India. The different states of India can form clusters. For example, the South Indian states (Kerala, Tamil
Nadu, Karnataka and Andhra Pradesh) produce more rice than other clusters such as Central India (Madhya
Pradesh, Chhattisgarh, Maharashtra). A sample of clusters (states) is chosen at random, and all farmers in the
chosen clusters are included in the sample. It is then easy to visit several farmers (even if they differ in crop
plantation) to understand their use of pesticides, and the survey team gains a good idea of what the farmers
produce.
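A one-stage cluster sample like the one in this case study might be sketched as follows. The cluster contents are invented for illustration; whole clusters (states) are drawn at random and every farmer in a chosen cluster enters the sample.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling: choose whole clusters at random and
    include every unit in each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # pick cluster names
    return {name: clusters[name] for name in chosen}    # keep all their units

# Hypothetical clusters of farmers grouped by state
clusters = {
    "Kerala":         [f"farmer_KL_{i}" for i in range(50)],
    "Tamil Nadu":     [f"farmer_TN_{i}" for i in range(80)],
    "Karnataka":      [f"farmer_KA_{i}" for i in range(60)],
    "Madhya Pradesh": [f"farmer_MP_{i}" for i in range(70)],
    "Chhattisgarh":   [f"farmer_CG_{i}" for i in range(40)],
}
sample = cluster_sample(clusters, n_clusters=2, seed=3)
```

This contrasts with stratified sampling, which draws a few units from *every* stratum; cluster sampling instead takes *all* units from a few clusters, which is cheaper to survey in the field but generally less accurate.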

Non-probability Sampling

Large-scale surveys are conducted in social research. Suppose, for example, we want to study problems related
to homeless people; here, gathering a complete list of the population is a tedious task. In such cases,
non-probability methods are adopted for sampling. The primary difference between probability and
non-probability methods is how the elements of the population are selected for the study. Four major
non-probability sampling methods are discussed here.

1. Availability sampling

Availability sampling is otherwise called convenience sampling. The researcher selects units that are close
at hand or encountered by chance. A primary advantage of such sampling is the ease of handout surveys; for
example, all students completing a survey about an introductory course before it is redesigned. This is
audience specific and gets attention much like an interview. However, we use this sampling method without
knowing how well the available respondents represent the population under study. In convenience sampling,
we simply find people who are easy to reach. For example, consider surveying the popularity of a minister
through an SMS or web-based poll. Is that the correct way? Many people may vote several times, and some
important respondents may not vote at all. Such a survey is based purely on our convenience.

2. Quota sampling
Quota sampling starts from what the population looks like in terms of certain qualities. This technique
overcomes a disadvantage of availability sampling: whereas availability sampling takes whoever is at hand,
quota sampling ensures that the selected participants match a pre-defined profile of the population. Quota
samples represent particular characteristics that have been set in advance, which is also a major practical
difficulty. For example, suppose we need to conduct a survey amongst youth aged 20–35 years. How do we
divide this into subranges such as 20–25, 25–30 and 30–35? How many people are required in each quota?
How many females are required in the study? Such practical issues are difficult to address.

3. Purposive sampling

In purposive sampling, the population for the study is selected based on a purpose, and may be limited to
certain groups. For example, in a study of educational growth among the people of Nagaland, purposive
sampling does not produce a large representative sample of the whole population but is limited to that group.

4. Theoretical sampling

How will you select a sample for a group of keen interest? Certain studies need a theoretical background,
such as research on algorithms, where the selected sampling methods must be applied theoretically.
Figure 5.20 shows the steps involved in theoretical sample selection.

Fig. 5.20 Theoretical sample selection

5.6.4 Steps in Sampling Process

The major steps involved in sampling process are as follows:

1. Define the population
2. Identify the sampling frame
3. Choose a sampling design or procedure
4. Determine the sample size
5. Draw the sample

The major steps involved in the sampling process are discussed in Fig. 5.21.
Fig. 5.21 Sampling process

A quick comparison of random and non-random sampling is given in Table 5.3.

Table 5.3 Comparison of random and non-random sampling

Case Study

A survey is conducted to study “the communication behaviour of users in different educational levels”.
Parameters selected in this study are as follows (Tables 5.4 and 5.5):

1. People with UG, PG and PhD degree


2. Males and females
3. Sample size

Distribution matrix shows the particulars of population in percentage.

Table 5.4 Qualification table 1

Table 5.5 Qualification table 2


We have to design the quota sample sizes as per these parameters.

The qualification ratio in the population is

UG:PG:PhD = 4:7:9

and the gender ratio in the population is

M:F = 3:2

So UG males = (4/20) × 60% = 12% of the sample.
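The quota arithmetic can be made concrete with a small sketch. The shares below are assumptions for illustration only (derived from the stated ratios, with a hypothetical sample size of 100); the chapter's Tables 5.4 and 5.5 would supply the actual figures.

```python
def quota_cells(total_n, qual_shares, gender_shares):
    """Cross two quota dimensions into per-cell sample counts."""
    cells = {}
    for qual, qs in qual_shares.items():
        for gender, gs in gender_shares.items():
            cells[(qual, gender)] = round(total_n * qs * gs)
    return cells

# Assumed shares consistent with the ratios UG:PG:PhD = 4:7:9 and M:F = 3:2
qual_shares = {"UG": 4 / 20, "PG": 7 / 20, "PhD": 9 / 20}
gender_shares = {"M": 3 / 5, "F": 2 / 5}
cells = quota_cells(100, qual_shares, gender_shares)
# e.g. UG males = 100 x 0.20 x 0.60 = 12
```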

5.7 SPSS: A STATISTICAL ANALYSIS TOOL

SPSS is a software package from IBM for the statistical analysis of data. It is most commonly used in social
science research. Survey companies, marketing organizations and even government institutions make use of
this package for efficient and fast data analysis. The history of SPSS can be traced to 1968, when SPSS
Version 1 was released. Originally, SPSS stood for Statistical Package for the Social Sciences, but the name
was later changed to Statistical Product and Service Solutions. The first developers of SPSS were Norman H.
Nie, Dale H. Bent and C. Hadlai Hull. It took nearly 15 years for the next major version, SPSS-X, to appear.
IBM acquired the SPSS package in 2009, and from then on six updated releases were made in the span of
four years.

In general, SPSS is a Windows-based program used to perform data entry and analysis. It handles large volumes
of data and can perform all sorts of analyses in a very short time. SPSS looks like a spreadsheet, similar to
Microsoft Excel. All commands and options of SPSS can be accessed through the pull-down menus at the top of
the SPSS editor window. This means that once you have learned the basic steps, it is very easy to extend your
knowledge of SPSS through the help files. Let us now look in detail at how SPSS does data analysis.

5.7.1 Opening SPSS

After installing the software, SPSS will be included in the IBM SPSS Statistics folder. There will normally be a
shortcut on the desktop or we can access the software from the Start menu. Figure 5.22 shows the entire layout
of SPSS.

Start Menu → All programs → IBM SPSS Statistics → IBM SPSS


Fig. 5.22 Layout of SPSS

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

Editor window of SPSS basically has two views, which can be selected from the lower left-hand side of the
screen: the Data view and the Variable view. Data view is where you see the data that you are using. Variable
view is where you can specify the format of your data when you are creating a file or while you load a pre-
existing file. The default saving extension is .sav. But we can also import data through the Microsoft Excel.
SPSS also has a SPSS Viewer window, which displays the output from any analyses that have been run. It also
displays the error messages. Information from the Output Viewer is saved in a file with the extension .spo.

When prompted to open the software, you will see a pop-up window with a bunch of options. If you are new to
SPSS, you can select the “Run the Tutorial” radio button on the right half of the dialogue box. If you want to
analyze a dataset, you can choose either “type in data” or “open an existing data source” to select data from your
computer. If your data file is shown in the list “More Files”, click the corresponding item and get it loaded. You
can also add data files from the file menu, without disturbing the current data session. However, SPSS can only
have one data file open at a time, so it is best to save the already opened data file before you try to open another
one. Figure 5.23 details the screen shot while SPSS is being opened.

Now let us look at how data is loaded into SPSS (Fig. 5.24). From a beginner's point of view, we can load the
data through Excel: go to Files → Open → Data → Browse for the file. Once the data is loaded, the Editor
window shows the loaded data and the output window shows the log details of the recently loaded data
(Fig. 5.25).
Fig. 5.23 Opening SPSS

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

Fig. 5.24 Loading data to SPSS

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014
Fig. 5.25 SPSS after loading the data

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

After loading the data, if we want to see the details of the variables (Fig. 5.26) and its properties, we can go to
the variable view in the Editor window. In our example, the variable view will be as shown in Fig. 5.27.

We can alter or add variables from this Variable view (Fig. 5.26). Double-clicking a particular variable opens a
new dialogue box showing the various actions that can be performed at that step. The screenshot (Fig. 5.27)
shows the variable properties of Name. The Variable view always allows the user to understand the current
statistics of the data snapshot.

5.7.2 File Types

Data files: A file with an extension of .sav is assumed to be a data file in SPSS for Windows format. A file with
an extension of .por is a portable SPSS data file. The contents of a data file are displayed in the Data Editor
window.

Viewer (Output) file: A file with an extension of .spo is assumed to be a Viewer file containing statistical
results and graphs.

Syntax (Command) files: A file with an extension of .sps is assumed to be a Syntax file containing SPSS
syntax and commands.
Fig. 5.26 Variable view

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

Fig. 5.27 Variable property

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

5.7.3 Analysis of Functions


There are more than 200 analysis functions available in IBM SPSS V21. All the analysis functions are user
friendly, and each output can easily be stored and exported; this makes SPSS globally acceptable. In this
section, you are guided through two analysis processes: a one-sample t test (Fig. 5.28) and graph plotting.
While working with the software, try each data analysis option and see how quickly the process takes place.

Fig. 5.28 Sample T test

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

One-sample t test

The one-sample t test is used to determine whether a sample comes from a population with a specific mean.
This population mean is not always known, but is sometimes hypothesized. Whenever this test is done, the
sample mean, variance, etc. are calculated and the result is shown in the output window. The steps for
performing the t test are explained in the figures. After loading the data into SPSS, you can select the t test
from the Analysis tab. Figure 5.29 shows the output of a one-sample t test.

Once the process is run, it takes up to five seconds (depending on the size of the data) to print the results
in the output screen. The output can be saved in .spv format.
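The statistic SPSS reports can be reproduced with the Python standard library. This is an illustrative sketch with made-up scores and a hypothesized mean of 50; SPSS (or a library routine such as scipy.stats.ttest_1samp) would additionally report the p-value.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """One-sample t statistic: does the sample mean differ from mu0?"""
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)                 # sample standard deviation
    t = (mean - mu0) / (s / math.sqrt(n))      # t = (x̄ - μ0) / (s / √n)
    return t, n - 1                            # t statistic and degrees of freedom

# Hypothetical measurements tested against a hypothesized mean of 50
scores = [52.0, 49.5, 51.2, 53.1, 50.4, 48.9, 52.7, 51.6]
t_stat, df = one_sample_t(scores, mu0=50)
```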

Graph representation

Now let us use another dataset for plotting a graph. There are various options: pie chart, line graph, bar graph,
scatter plot, line plot, etc. In this section, we show how to obtain a scatter plot. The primary steps are the same
as for the t test: open and load the data, then choose Graphs from the menu bar and select the required type of
graph, such as scatter, bar or histogram. Now select the x- and y-axes and click the start button. The output
graph, in .spv format, will be available on the output screen. Figure 5.30 shows the plotted graph. You can save
it or use it directly from the output screen.
Fig. 5.29 Output

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

Fig. 5.30 Plotted graph

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

The aforementioned are two sample analyses that can be done with SPSS. To get to know the tool better, gain
hands-on experience with IBM SPSS Statistics.

EXERCISES
1. What are the differences between Pearson's correlation coefficient and Spearman's correlation coefficient?
Under what conditions would a researcher select one of these coefficients for data analysis?
2. Suggest an appropriate sampling design and sample size for each of the following:
1. Health officials want to estimate the number of babies infected with polio.
2. The number of offices required to support the "Kissan Centre" scheme of the Government of India.
3. An economic analyst wants to study educational bank loans and failures in loan recovery.
4. Collecting data on women scientists whose area of interest is marine science.
3. What do you mean by data analysis? Explain the various techniques used in data analysis.
4. Define and differentiate regression analysis and correlation analysis.
5. What do you mean by degree of correlation?
6. How is PCA significant in data analysis?
7. Explain in detail, strong–weak correlation.
8. Explain the concept of population and sample in terms of sampling.
9. What are sampling errors?
10. What are the characteristics of a good sample design?
11. In detail, explain the various types of sampling.
12. Differentiate stratified and systematic random sampling.
13. What are the various steps involved in sampling process?
14. In detail, explain the various data analysis tools available.
15. What are the various file types available in SPSS?
16. Define one-sample t test.
chapter 6
RESEARCH DESIGN
Objectives:

After completing this chapter, you will understand the following:

The detailed description regarding the need for research design


The description of features of a good design and its case study
The various types of research designs and its case studies
The description of methods of reasoning – induction and deduction
The case study of “Maternal mortality rate in below poverty line (BPL) families”

The principal task of all science is to find a systematic explanation for the phenomena of the material universe.
We derive such explanations and understandings from the data around us and by its analysis. Major task in every
research activity is to find such interesting relationships between attributes that make up the raw data. Research
design emphasizes techniques for an organized research process to achieve data collection, its development and
techniques for data analysis. This chapter describes the research design and specific methodologies adopted.

Research design is often confused with the choice of research method – the decision to use either qualitative or
quantitative methods. These decisions are part of the research design process, but they are not the whole of it. It
is easiest to think of research design as having two levels. The first level describes the logic of the research, its
framework or structure. It is at this level that we come to know the nature of the research: whether it is
exploratory, descriptive or explanatory. We also decide what type of study to take forward: whether to use a
cross-sectional, longitudinal or experimental design, or a case-study-oriented work plan. The second level is
about the "mechanics" of the research: what type of data is used (primary or secondary, qualitative or
quantitative, or a combination), what methods of data collection are employed, what sampling strategy is used,
and so on. The first level is about designing the overall structure of the research, and the second level concerns
decisions about how to collect that evidence. Figure 6.1 shows the two levels of research design.

Level one defines the research problem. It also outlines how the collected data are to be used and the sort of
evidence the researcher wants to establish. Deciding the logic and structure of the research is also part of level
one of research design.

Fig. 6.1 Research design levels


Level two covers the various methods involved in data collection along with the type of data to be selected. The
data collection instruments needed and the design of the sampling strategy also come under level two.

6.1 NEED FOR RESEARCH DESIGN

Research design is highly important because we rely on it to deliver the necessary evidence to answer the
research problem as accurately and clearly as possible. A sound research design is a framework on which
good-quality research is built. It also facilitates the smooth sailing of the various research operations, making
the process efficient with minimal expenditure. We can compare research design to constructing a house. We
never start building a house from scratch; we need to be thoroughly aware of the entire plan, the elevation, the
electric and water lines, etc. before beginning. With a good and intact blueprint of the house, it is easy to build
the house very cost effectively. In this analogy, research design is the initial planning phase that creates the
blueprint; the entire research activity then runs in accordance with this plan.

Preparation of the research design should be done with great care, as any error here may upset the entire project.
Even though the importance of design is known, researchers often find it very difficult to plan their work, so
they just go with the flow; then they face problems along the way and may arrive at unfavourable conclusions.
Flaws in designing may even render the whole research exercise futile. Thus, a very appropriate and efficient
design of the research activity needs to be developed before starting the work. You can consult your teachers,
researchers from your own area, etc. to share your thoughts with before finalizing the design. When the ideas are
arranged in order, flaws and inadequacies can be spotted easily. Your co-researchers can also offer valuable
comments and evaluation reports on your design. If this is not done properly, it may be difficult to provide a
comprehensive review of the study.

6.2 FEATURES OF A GOOD DESIGN

A good research design is characterized by features such as flexibility, efficiency and cost effectiveness. The
design should describe each level of the work, thereby providing a comfort zone for researchers. In many
investigations, the design that gives the smallest experimental error is supposed to be the best. A design that
includes the maximum possible outcomes and incorporates almost all scenarios and perspectives that can occur
during the research can be seen as well formed. For example, suppose an organization is carrying out a survey
on the effect of road widening on people. It should consider all aspects: not just the development it can bring to
the area but also the scheme to relocate and rehabilitate those who lose land. It should also mention all the
barriers that can arise in the work: protests from people, how to face a natural calamity, the extra costs that can
occur, and so on. A good design should incorporate all these key features (Fig. 6.2). The nature of the problem
and the objective of the research hold the key while defining the features of a design; a design suitable for one
problem may not suit another. Thus, it can be said that research designs are unique.

Fig. 6.2 Features of good design

Before finalizing a design, the following factors need to be well studied.


Objective of the work
Nature of research
Skill set of researcher and his team
Methods for obtaining information
Time and money involved in the work

If a researcher is keen on formulating new ideas, then the research must be designed with windows for
considering all perspectives. The design should be very flexible; it should be able to incorporate and implement
any new idea, or remove a faulty one, even at a later stage. There are also types of research in which already
established facts and figures are studied. In such research, the skill of the research team is very important: they
need to gather all the information on the subject, and their skills are always under review. Certain types of
research may produce high-impact findings, but the researcher should always design his/her work according to
the time and money that can be put into it for its successful completion. Otherwise, the project may halt midway.

Validity is another key concept in assessing the quality of research. It refers to how well a research design
measures what it claims to measure, and how well it gives us clear and unequivocal evidence with which to
answer the research problem.

Internal validity in the context of research design refers to ability of the research to deliver credible evidence to
address the research problem. In other words, the job of the research design is to ensure that the research has
internal validity. In causal or explanatory research, for example, it is about the ability of the research design to
allow us to make links or associations between variables, to rule out alternative explanations or rival hypotheses
and to make inferences about causality. Internal validity is important when designing questions and
questionnaires. In this context, internal validity refers to the ability of the questions to measure what it is we
think they are measuring.

When a piece of research has external validity, it means that we can generalize from the research conducted
among the samples (or in the specific setting) to the wider population (or setting).

To summarize, the main points of a good research design are:

Appropriateness to the research question


Lack of bias
Plan and strategy
Precision, power and budget

Case Study

Good research should always provide an outcome that is desirable to society, and one of the best features of a
research design is that it prevents illicit usage of the outcome. A study of the by-products of the result is
therefore always necessary. For example, the painkiller drug morphine has a great drawback: one of its
derivatives is heroin, among the most commonly abused drugs. The same is the case with amphetamines,
developed for treating depression and nasal congestion, whose derivatives are also abused as drugs.

6.3 TYPES OF RESEARCH DESIGNS

There are several research designs, and the researcher must decide, in advance of data collection and analysis,
which design would be most appropriate for the research project. The various types of research design include
the following:
1. Exploratory research design
2. Conclusive research design
3. Experimental research design

There are also other research designs such as descriptive design, causal design, cross-sectional design,
longitudinal design, action design, case study design and historic design. All of these come under conclusive
research design. The major categories of research design are shown in Fig. 6.3.

Fig. 6.3 Types of research designs

6.3.1 Exploratory Research Design

Exploratory research design is also called formulative design. In this type of design, a working hypothesis is
developed through keen investigation by the team from an operational point of view. From such investigations,
new ideas and aspects develop. As the name suggests, each idea thus evolved is studied in depth by exploring
all the possibilities, and a final conclusion is made (Fig. 6.4). The process needs great flexibility, as the research
may start from one point and, as time passes, acquire new objectives and scope that lead to more desirable
solutions.

Fig. 6.4 Exploratory research design

In exploratory research design, the major ways by which the researcher explores a particular subject are either
by

Survey of related literature


Survey of experienced researchers
Insight ideas

Survey of the literature is the easiest and simplest method of formulating a hypothesis. A literature survey helps
the researcher understand the theories that have already been stated, as well as the evaluations done on theories
in his/her area. Reading and reviewing other works helps him/her know whether the planned work is new or not.
The survey of literature should not stray too far outside the scope of the subject.

By conducting a survey of experienced researchers in the same field, new researchers can gather many ideas and learn about the hardships their predecessors faced. The major issue here is to spot the right people in the area. Once they are identified, the researcher can schedule an official meeting to get to know them and their work in detail. It is always good to prepare a list of questions that the researcher needs to ask. If the researcher can share these questions with the interviewee in advance, so much the better, as the interviewee can be well prepared and the researcher can obtain more knowledge. Such an interview experience may enable the researcher to define the problem more concisely and help in the formulation of the research hypothesis.

There are situations where the researcher needs to make decisions based on the studies and other literature available. He/She needs to choose a particular path for further work, but sometimes there may not be a fully proven theory on which to base that choice. In such cases, he/she needs to go along with his/her insight, stick to it and move ahead. Even then, the researcher should never blindly follow his/her heart; it should always be a logical decision. A hard heart is needed to jump a broad river.

Case Study

Consider a researcher who wants to study migratory birds. He/She first needs to observe for a whole year to find out at which time of the year migration happens. This gives him/her primary knowledge of the conditions required for migration. He/She then has to find out the general species or families of birds that migrate in groups. From these, he/she focusses on those birds that migrate during a particular season of the year. Again, he/she can explore the special characteristics of those birds. That is, starting from a general point, he/she explores the situation and arrives at a specific conclusion.

6.3.2 Conclusive Research Design

Conclusive research is more likely to use statistical tests, advanced analytical techniques and larger sample sizes than exploratory studies. It provides information to the manager for making a correct decision. It consists of formal research procedures with clearly defined goals and needs. Unlike other research methods, a questionnaire is designed in conjunction with a sampling plan. Various research designs such as descriptive, causal, cross-sectional and longitudinal come under conclusive research. Figure 6.5 shows the types of conclusive research design.
Fig. 6.5 Conclusive research design

Malhotra and Birks (2000) divided conclusive research design into two further categories: descriptive research, which is used to describe functions or characteristics, and causal research, which is used to investigate cause and effect relationships.

Descriptive research

Descriptive research studies are those concerned with describing the characteristics of a particular individual or group. Studies concerned with specific predictions, or with the narration of facts and characteristics concerning an individual, group or situation, are all examples of descriptive research studies. Most social research comes under this category.

There are certain requirements for carrying out descriptive research. The researcher should be able to clearly define his/her population, the amount of data he/she expects, and adequate methods for measuring it. Since it is a descriptive study, the whole aim is to obtain a well-defined step-by-step procedure; hence, more careful planning is required in this type of research design. Certain points need particular focus.

The primary need is to keep track of the various objectives. The objectives need to be studied precisely to make sure that the data collected are sufficient and relevant. Selecting the method by which data can be obtained is the secondary concern. Several methods such as observation, questionnaires, interviewing and examination of records can be used to collect the data. One major issue that needs to be addressed is the identification and avoidance of biased data; only then can reliable data be obtained.

In most studies, researchers take a sample population that can be handled easily and yields maximum information with minimum effort. The researcher may not be able to make a field visit and collect data most of the time, so it is a good idea to have a field officer who takes care of the data collection, which in turn helps to provide a valid and error-free data source.

The data collected must be processed and analyzed. This includes steps such as coding the interview replies and observations, tabulating the data and performing several statistical computations. To the extent possible, the processing and analyzing procedure should be planned in detail before the actual work is started. This will prove economical in the sense that the researcher may avoid unnecessary labour, such as preparing tables for which he/she later finds no use or, on the other hand, having to redo tables because relevant data were omitted. Coding should be done carefully to avoid coding errors, and for this purpose the reliability of coders needs to be checked. Probability and sampling analysis may be used as well. The appropriate statistical operations, along with the appropriate tests of significance, should be carried out to safeguard the drawing of conclusions from the study.
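As a toy illustration of the coding and tabulating steps described above, a few lines of Python suffice (the reply categories and numeric codes here are invented for the example, not taken from any real study):

```python
from collections import Counter

# Hypothetical interview replies and an invented coding scheme
# mapping each reply category to a numeric code.
replies = ["agree", "disagree", "agree", "neutral", "agree", "disagree"]
codes = {"agree": 1, "neutral": 0, "disagree": -1}

# Coding step: convert each verbal reply to its numeric code.
coded = [codes[r] for r in replies]

# Tabulation step: count how often each category occurs.
table = Counter(replies)

print(table["agree"])  # 3
```

Planning this pipeline in advance, as the text recommends, means fixing the coding scheme and the table layout before any replies are processed.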

Finally, there comes the question of reporting the findings. This is the task of communicating the findings to
others and the researcher must do it in an efficient manner. The layout of the report needs to be well planned so
that all things relating to the research study may be well presented in simple and effective style.

Causal research

The aim of causal research is to provide explanations. It is also known as explanatory research for that reason.
For example, it might be used to find out why people buy brand A and not brand B or why some people are in
favour of capital punishment and others are not. It can be used to rule out rival explanations and come to a
conclusion.

It may be thought of as understanding a phenomenon in terms of conditional statements in the form, “If X, then
Y”. This type of research is used to measure what impact a specific change will have on existing norms and
assumptions. Most social scientists seek causal explanations that reflect tests of hypotheses. Causal effect occurs
when variation in one phenomenon, an independent variable, leads to or results, on average, in variation in
another phenomenon, the dependent variable.

Causal research designs assist researchers in understanding the links between variables and in eliminating possibilities that are found to be ambiguous. Not all relationships are causal. The possibility always exists that, by sheer coincidence, two unrelated events appear to be related, so we can never fully trust the outcome of causal research. This is also one of the major reasons why this type of research is not widely practised for sensitive subjects. Conclusions about causal relationships are difficult to draw due to the variety of extraneous and confounding variables that exist in a social environment.

Cross-sectional design

Cross-sectional research design allows you to collect data from a cross-section of a population at one point in
time. A single cross-sectional design involves only one wave or round of data collection; data are collected from
a sample on one occasion only. A repeated cross-sectional design involves conducting more than one wave of
(more or less) the same research with an independent or fresh sample each time. The use of an independent
sample at each round of data collection is what distinguishes repeated cross-sectional design from longitudinal
research. In longitudinal research, data are collected from the same sample on more than one occasion.
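The distinction between the two designs can be sketched in code. In this hypothetical example (the participant IDs and years are invented for illustration), a longitudinal study re-interviews the same sample at each wave, whereas a repeated cross-sectional study draws a fresh, independent sample each time:

```python
# Longitudinal: the SAME sample is measured at each wave.
longitudinal_waves = {
    2021: ["P1", "P2", "P3"],
    2022: ["P1", "P2", "P3"],   # same people re-interviewed
}

# Repeated cross-sectional: an independent, fresh sample each wave.
repeated_cross_sectional = {
    2021: ["P1", "P2", "P3"],
    2022: ["P4", "P5", "P6"],   # fresh, independent sample
}

same_sample = set(longitudinal_waves[2021]) == set(longitudinal_waves[2022])
fresh_sample = set(repeated_cross_sectional[2021]).isdisjoint(
    repeated_cross_sectional[2022])
```

Only the longitudinal structure lets the researcher trace change within individuals; the repeated cross-sectional structure can only compare population-level snapshots.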

For example, a cross-sectional design can be used to provide data for an exploratory or descriptive research
enquiry – to understand the health information needs of older people. It can also be used to look for and examine
relationships between variables; to test out ideas and hypotheses; to help decide which explanation or theory best
fits with the data; and to help establish causal direction but not to prove cause. For example, it might be used to
determine what factors are involved in the decision to take out critical illness benefit insurance, and the
relationship between the factors.

Cross-sectional studies provide a clear “snapshot” of the outcome and the characteristics associated with it at a specific point in time. Unlike an experimental design, where there is an active intervention by the researcher to produce and measure change or to create differences, cross-sectional designs focus on studying and drawing inferences from existing differences between people, subjects or phenomena. Cross-sectional studies are capable of using data from a large number of subjects. One of the major disadvantages is that, since the analysis is only a snapshot, there is always the possibility that the study would have produced different results if another time-frame had been chosen (Fig. 6.6).

Fig. 6.6 Cross-sectional design

Longitudinal design

Longitudinal research involves collecting data from the same sample (for example, individuals or organizations) on more than one occasion. The number and frequency of the snapshots or data collection points depend largely on the research objectives. For example, if the purpose of the research is to look at the immediate, short-term impact of an advertising campaign, then a relatively small number of data collection points, fairly closely spaced in time, may be required. Examining the longer-term impact of advertising on a brand may require a relatively large number of data collection points over many years.

The main application of longitudinal design is to monitor changes in the marketing or social environment, both changes that occur in the normal course of things and events that are planned – for example, changes as a result of an advertising campaign, a new product launch or an election. Longitudinal design can be used to provide data for a descriptive research enquiry. Although it cannot be used to prove cause, it can be used to

Explore and examine relationships between variables
Establish the time order of events or changes, and age or historical effects
Help decide which explanation or theory best fits with the data
Help establish causal direction (rather than prove cause)

Table 6.1 shows the differences between longitudinal and cross-sectional study.

Table 6.1 Difference between longitudinal and cross-sectional study

Action research design

Action research is a type of qualitative research that takes action to improve practice and studies the effects of the action that was taken. It can be seen as a cyclic process. The essentials of action research design follow a characteristic cycle: initially an exploratory stance is adopted, an understanding of the problem is developed and plans are made for some form of interventional strategy. Then these interventions are carried out and the results are observed. If the results are positive, the same approach is repeated until sufficient results are obtained; it is essentially an iterative cycle. Computer science students can easily relate this to the iterative waterfall model. Action research helps in developing a deeper understanding of a given situation, starting with conceptualizing and particularizing the problem and moving through several interventions and evaluations.

In this case, the design is focussed on a solution-based approach rather than on theoretical science. There is no hidden control by the researcher; it is simply the researcher's action that gives a positive output and is taken as a step. This type of research design has various disadvantages. It is fully dependent on a trial-and-error method: if the researcher's actions are not in the correct direction, he/she may never reach a valid conclusion. As it does not rest on any theoretical background, a rollback will not always be possible. Action research is also much harder to write up, because it is less likely that you can use a standard format to report your findings effectively. Moreover, the personal involvement of the researcher can bias the results.

Participatory action research (PAR) is a special kind of community-based action research in which there is
collaboration between the study participants and the researcher in all steps of the study: determining the
problem, the research methods to use, the analysis of data, and how the study results will be used. The
participants and the researcher are co-researchers throughout the entire research study.

Case Study Design


A case study is an in-depth investigation of a “case” for exploratory, descriptive or explanatory research
purposes, or a combination. A “case” might be, for example, a household, an organization, a situation, an event
or an individual’s experience. Case study research may involve examining all aspects of a case – the case as a
whole and its constituent parts. For example, a case study of a particular household may involve data collection
from individual members; in an organization, the elements of the case might be departments and individuals
within departments. A case study design might be made up of several case studies, not just one. A variety of
methods of data collection can be used in a case study, including analysis of documents, observation and
qualitative and quantitative interviewing.

Data may be collected in case studies through various means such as questionnaires, interviews, observations or
written accounts by the subjects. Content analysis is used in evaluating the data from case studies. Content
analysis involves the examination of communication messages.

One of the major disadvantages of case studies is that they are time consuming and may be quite expensive. Additionally, subject drop-out may occur during this type of study. Whenever a study is carried out over an extended period, loss of subjects must be considered: a person may move from the locality or simply decide to discontinue participation in the study. If the criterion for selecting the case is simply that it represents an unusual or unique phenomenon, then the interpretation of the study will only be applicable to that particular case.

Case Study

There are various treatment methods available to cure a disease. All of these methodologies have a common goal, which is to prevent or treat the disease, but each has a different origin and a different way of treatment. Here, let us take the major branches, Homoeopathy and Allopathy, for a comparative study.

Allopathy is a drug-oriented methodology. It mainly depends on three things: hypothesis, experimentation and the outcome of the experiment. This methodology basically depends on experimentation; doctors treat a disease based on its symptoms, not on its causes. Treatment in Homoeopathy, by contrast, follows a case-based reasoning approach. The doctor treats the cause and not the disease, and the treatment is highly individualized: two persons having the same disease are treated differently. The doctor makes a set of test cases, which are the primary questions to the patient. From the primary test cases, he/she builds a hypothesis by drawing inferences from the answers obtained. This helps the doctor to frame a secondary set of questions, and the process continues until he/she identifies the correct root cause of the disease.

As we mentioned, case-based studies are time consuming, but they yield very promising results. The validity of the result depends fully on the cases made and the inferences drawn from those case studies.

Historic research design

Leininger (1985) wrote that “Without a past, there is no meaning to the present, nor can we develop a sense of
ourselves as individuals and as members of groups.” Historical studies concern the identification, location,
evaluation and synthesis of data from the past. Historical research seeks not only to discover the events of the
past but to relate these past happenings to the present and to the future. The process of historical research is
basically the same as in many other types of scientific research. The problem area or area of interest is clearly
identified and the literature is reviewed. Research questions are formulated. Finally, the data are collected and
analyzed. Historic researchers need the curiosity, perseverance, tenacity and scepticism of a detective. The data for historical research are usually found in documents or in relics and artefacts. Documents may include a wide range of printed material. Relics and artefacts are items of physical evidence. The material may be found in libraries, archives or personal collections.

The sources of historical data are frequently referred to as primary and secondary sources. Primary sources are
those that provide first-hand information or direct evidence. Secondary sources are second-hand information.
Primary sources should be used in historical research when possible. There are many examples of primary
sources: oral histories, written records, diaries, eyewitnesses, pictorial sources and physical evidence. The data
for historical research should be subjected to two types of evaluation. These evaluations are called external
criticism and internal criticism. External criticism is concerned with the authenticity or genuineness of the data
and should be considered first. Internal criticism examines the accuracy of the data and is considered after the
data are considered to be genuine. While external criticism establishes the validity of the data, internal criticism
establishes the reliability of the data. Internal criticism of historical data is more difficult to conduct than
external criticism. In the case of a written document, internal criticism would evaluate the material contained in
the document. Motives and possible biases of the author must be considered in trying to determine if the material
is accurate.

Consider the example of a researcher working in the area of vaccination research. If the researcher is keen on tracing the history of a particular drug, then he/she has to go through a lot of literature to find information regarding it, such as the place where it was developed, the period in which it was created and so on. While visiting such a laboratory, he/she may get hold of personal diaries or records of the doctors who were involved in the process of drug development; these can be used as primary sources. Secondary literature can include information or articles written by external agents regarding the drug development.

So far we have seen the various subcategories in conclusive research design.

Figure 6.7 details the major difference between exploratory and conclusive research design.

6.3.3 Experimental Research Design

Two identical samples or groups are recruited: one is known as the test group and the other as the control group. The test and control groups are matched on key criteria – in other words, the two are the same on all key characteristics. The independent variable – the one that is thought to cause or explain the change – is manipulated to see the effect that this change has on the dependent variable. This is referred to as the treatment. The treatment is applied to the test group but not to the control group. The purpose of the test group is to observe the effect of the treatment, whereas the purpose of the control group is to act as a comparison. Since the treatment is not applied to the control group, any changes that take place in it will not be due to the independent variable but to some other factor(s). The design of the experiment should be such that the effect of other factors is limited or controlled. Comparison of the test and control groups allows us to determine the extent of the change that is due to the independent variable alone. This type of experimental design is called the “after with a control group” design. There are variations to this design: when the independent and dependent variables are measured in both groups before the “treatment” takes place, the design is called “before and after”; if a control group is used, it is called, not surprisingly, “before and after with a control”.
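The logic of the “before and after with a control” design can be sketched numerically. In this minimal example (all group means are invented for illustration), the treatment effect is estimated as the change in the test group minus the change in the control group, since the control group's change reflects factors other than the treatment:

```python
# Hypothetical "before and after with a control" experiment.
# The dependent variable is measured in both groups before and
# after the treatment is applied to the test group only.
test_before, test_after = 54.0, 63.0        # test group means
control_before, control_after = 55.0, 58.0  # control group means

# Change in each group over the experiment period.
test_change = test_after - test_before           # 9.0
control_change = control_after - control_before  # 3.0

# The control group's change is due to other factors, so the
# treatment effect is the difference between the two changes.
treatment_effect = test_change - control_change

print(treatment_effect)  # 6.0
```

Without the control group, the researcher would have attributed the full 9-point change to the treatment, overstating its effect.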
Fig. 6.7 Difference between exploratory and conclusive research design

The main application of experimental research designs is to determine whether a causal relationship exists and
the nature of the relationship, and to rule out the effects of other variables and to establish the time order or
sequence of events (which is the cause and which is the effect). It is used in marketing experiments, for example,
to make decisions about elements of the marketing mix, to evaluate effectiveness of advertisement A or B, or the
weight of advertising spend, or the combination of media to be used in a campaign.

For example, an experimental design was used to examine the effects of monetary incentives on response rates to a mail survey – more specifically, to examine the relative effectiveness of prepaid cash incentives, a cash prize and an equivalent-value non-cash prize in increasing mail survey response rates.

Scientific control group

In the social sciences, control groups are the most important part of the experiment, because it is practically
impossible to eliminate all of the confounding variables and bias. There are two main types of control, positive
and negative, both providing researchers with ways of increasing the statistical validity of their data. Figure 6.8
shows the confounding effect on dependent and independent variables.

Positive scientific control groups are those where the control group is expected to have a positive result; they allow the researcher to show that the set-up was capable of producing results. Positive scientific control groups reduce the chances of false negatives.
Fig. 6.8 Confounding variable effect

Negative scientific control is the process of using the control group to make sure that no confounding variable
has affected the results, or to factor in any likely sources of bias. It uses a sample that is not expected to work. A
negative control can also be a way of setting a baseline.

Research design principles

Let us now focus on the three basic principles of experimental research design.

1. Principle of replication

According to the principle of replication, the experiment should be repeated more than once. Doing so increases the accuracy of the experiment. This is similar to school-level chemistry practicals, where we repeat an experiment many times, for example to find an acid–base balance. Conceptually, replication does not present any difficulty, although there are some computational difficulties. Since replication increases the accuracy of the study, it is widely accepted in almost all experimental research designs.

2. Principle of randomization

This principle provides protection against the effects of extraneous factors when we conduct an experiment. In other words, we should design or plan the experiment in such a way that the variations caused by extraneous factors can all be combined under the general heading of “chance”.

3. Principle of local control

According to this principle, we should plan the experiment in such a manner that we can perform a two-way analysis of variance, in which the total variability of the data is divided into three components: experimental groups, the extraneous factor and experimental error. Here the extraneous factor (a source of variability) is deliberately divided amongst all groups, so that the variation it causes can be measured and eliminated from the experimental error.
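The variance split behind the principle of local control can be sketched with a small randomized-block layout. All the numbers below are invented for illustration: rows are blocks of the extraneous factor, columns are the experimental groups, and the total sum of squares decomposes exactly into block, treatment and error components:

```python
# Rows = blocks of the extraneous factor, columns = treatment groups.
data = [
    [12.0, 15.0, 14.0],   # block 1
    [10.0, 13.0, 13.0],   # block 2
    [11.0, 14.0, 12.0],   # block 3
]
rows, cols = len(data), len(data[0])
grand = sum(sum(r) for r in data) / (rows * cols)

row_means = [sum(r) / cols for r in data]
col_means = [sum(data[i][j] for i in range(rows)) / rows
             for j in range(cols)]

# Total variability of the data.
ss_total = sum((x - grand) ** 2 for r in data for x in r)

# Variation due to the extraneous factor (blocks), measured so it
# can be removed from the error term.
ss_block = cols * sum((m - grand) ** 2 for m in row_means)

# Variation due to the experimental groups (treatments).
ss_treatment = rows * sum((m - grand) ** 2 for m in col_means)

# What remains is the experimental error.
ss_error = ss_total - ss_block - ss_treatment
```

Because the block variation is measured separately, it does not inflate the error term against which the treatment effect is judged; without local control, `ss_block` would have been absorbed into `ss_error`.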

6.4 INDUCTION AND DEDUCTION

In logic, the two methods of reasoning may be classified as deductive and inductive approaches. They are two
different methods that can be utilized to arrive at a solution.

6.4.1 Deduction

In this method, we start reasoning from a generalized approach and then narrow our thinking down to a more specific one. This can also be referred to as the top-down method. Here we start with a theory in our area and then mould it into a more specific hypothesis. We then add observations to this hypothesis. Finally, we test the hypothesis and arrive at a confirmation of our theory. Figure 6.9 shows the method flow of deduction.
Fig. 6.9 Deduction

In deduction, the conclusions drawn are necessary and true. Deduction can also be mentioned as a process of
arriving at a conclusion based on situations that you know to be true.

An example of deductive logic is: “To earn a master’s degree, a student must have 32 credits. Tim has 40 credits, so Tim will earn a master’s degree.” Here, from a generalized theory, we narrow the concept down to a specific point. The generalized fact is that to earn a master’s degree one needs 32 credits. We then observe a specific specimen, Tim. He has 40 credits, so according to our hypothesis we can arrive at the confirmation that Tim will earn a master’s degree. This type of logical reasoning is termed deduction.
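The example above can be written as a few lines of code: the general rule is encoded as a condition, and the conclusion about the specific case follows necessarily from applying it (the function name and credit values are just those of the worked example):

```python
# General theory: a master's degree requires 32 credits.
REQUIRED_CREDITS = 32

def earns_masters(credits: int) -> bool:
    """Deduce from the general rule whether a student earns the degree."""
    return credits >= REQUIRED_CREDITS

# Specific observation: Tim has 40 credits.
tim_credits = 40

print(earns_masters(tim_credits))  # True: the conclusion is certain
```

Because the rule is taken as true, the conclusion for any given student is necessarily true as well; this certainty is what distinguishes deduction from induction.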

6.4.2 Induction

This may be considered to be a method working in the opposite direction of deduction. That is, we go from a
specific observation or from our own experience towards a generalized theory. Induction is based on situations
that we assume to be true. So the conclusions drawn here are only probable and may not always be true. This is
also called the bottom-up method. Figure 6.10 represents the method flow in inductive reasoning.

Here we make a specific observation and study it to see whether there is a definite pattern; if so, we form a tentative hypothesis based on our assumption and end up making a generalized theory.

Consider an example: “This ball from the bag is red. That ball from the bag is red. A third ball from the bag is red. Therefore all the balls in the bag are red.” This statement is an example of inductive generalization, which uses evidence about a limited number of things to make an overall assumption about most things of that type. The strength of this type of statement depends on the number of things used to make the assumption relative to the total number of things.
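The red-ball example can be made concrete in code. The point of the sketch (ball colours invented) is that an inductive generalization holds only for the observations made so far, and a single later observation can refute it:

```python
# Three balls drawn from the bag, all red so far.
observed = ["red", "red", "red"]

# Tentative hypothesis formed by induction: "all balls in the bag are red".
hypothesis_holds_so_far = all(b == "red" for b in observed)

# A fourth draw turns out to be blue: one observation refutes
# the generalization, which was only ever probable.
observed.append("blue")
hypothesis_refuted = not all(b == "red" for b in observed)
```

Contrast this with the deduction example: the deductive conclusion was guaranteed by the rule, whereas the inductive conclusion here was always provisional.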

Fig. 6.10 Induction


The deductive method is mainly concerned with forming and testing hypotheses from a limited point of view, whereas the inductive method is more permissive and open-ended.

Case Study: Maternal mortality rate in below poverty line (BPL) families

So far we have learned the various research design techniques. Now let us take a general case study and see how
the process works. We can divide the whole process into 10 steps.

Step 1: Determine the key research questions or hypotheses. What do you need to know? What relationships are
you interested in investigating?

Here, the topic under study is “Maternal mortality rate in below poverty line families”.

Step 2: Determine very clearly what your dependent variables are. It is always easier and less costly to
investigate a one-to-one relationship. However, it is often the case that we want to know either how multiple
causes lead to a single effect, or to multiple effects.

In this study, for example, we are looking at one dependent variable and one independent variable. The dependent variable is the maternal mortality rate, which we aim to reduce; the independent variable is women in BPL families.

Step 3: Identify crucial intervening or confounding variables. These are variables that may intrude themselves
between your purported dependent and independent variables.

Here, below vs above poverty line status, the woman's marital status and her age during pregnancy can be confounding variables.

Step 4: Define and identify specific and measurable (can be qualitative or quantitative) indicators for the
dependent variables.

A study on maternal mortality rate in below poverty line (BPL) families shows that the various reasons may include: lack of proper nutrition for the mother during pregnancy; lack of proper antenatal checkups or hospital visits; more home deliveries compared with institutional deliveries; unforeseen pregnancy-related complications that cannot be managed in a home set-up; and lack of proper hygiene in the delivery area leading to infections.

Step 5: Data sources to the research need to be determined.

In our project, the participating women, the earning member of the family, secondary family members, government records of past studies done in this area, non-participating members (for comparative study), and the village doctor and his/her care staff can be included as data sources.

Step 6: Determine the methods that you need in order to gather the information and data required. It should also
meet levels of rigour that will satisfy the intended audience(s) of the research.

Principle: Adopt methods that are as complex as needed, but simple in implementation. In our case,

Quantitative survey that allows us to make statistically valid comparisons between participants and non-
participants on the dependent variable (Maternal Mortality Rate)
Secondary data review (results of similar studies in similar contexts at different places)
Key informant interviews (on socio-cultural and gender context along with family background)
Semi-structured interviews with women and men (qualitative and participatory numbers)
Focus group discussions
Study on government aids and other socio welfare groups.
Step 7: Determine the overall research design strategy: longitudinal (data will be collected at least twice over some period of time) or cross-sectional (a single point in time). To a very large extent, this decision is determined by your actual research questions from step one. It can also be influenced by the resources you have available and by a longer term evaluation strategy. In our case, we can use a cross-sectional research design.

Step 8: Determine the appropriate sampling population. Who or what is the largest population that you wish to
be able to describe and/or account for in relation to your hypothesis? The key is to be very clear with ourselves
and our stakeholders about who or what we are leaving out, and why and what population our research actually
represents in terms of its findings.

Here the sample group can be taken to be those women who are expected to deliver in the next 6 months. The
rate and number can differ according to the various data sources that we have taken in our previous steps.

Step 9: Select a sampling strategy for every level identified in step eight. There are basically two broad types of sample: probability and non-probability samples. Probability samples, also known as random samples, allow every analytical unit an equal chance of being selected. They allow you to generalize to a larger population and are also best for avoiding researcher bias. Non-probability samples, also known as purposive samples, cannot, on their own, allow you to generalize to a wider group. They are more subject to researcher bias, although this can be minimized by establishing strict, objective criteria for choosing data sources.
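The two sampling types can be sketched side by side. In this hypothetical example (population size, seed and the "first 50" purposive criterion are all invented for illustration), the probability sample gives every unit an equal chance of selection, while the purposive sample is chosen by a fixed rule:

```python
import random

# Hypothetical sampling frame of 1000 households.
population = [f"household_{i}" for i in range(1000)]

# Probability (random) sample: every unit has an equal chance of
# selection, so results can generalize to the population.
random.seed(7)  # fixed seed only so the sketch is reproducible
probability_sample = random.sample(population, 50)

# Non-probability (purposive) sample: units chosen by a criterion
# the researcher sets, here simply "the first 50 in the frame".
# On its own, this cannot generalize to the wider population.
purposive_sample = population[:50]
```

The purposive sample is cheaper to obtain, but any systematic ordering in the frame (by village, by registration date) becomes a built-in bias, which is exactly the risk the text describes.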

Step 10: Select a comparison group. Identify at least one comparison group – sometimes called a control group – then discuss the importance of the result, reach a conclusion and finally frame the hypothesis.

EXERCISES

1. What is a research design and what are the types of basic research designs?
2. How can the basic research designs be compared and contrasted?
3. What are the major sources of errors in a research design?
4. How does the researcher co-ordinate the budgeting and scheduling aspects of a research project?
5. What elements make up the marketing research proposal?
6. What factors should the researcher consider while formulating a research design in international marketing
research?
7. How can technology facilitate the research design process?
8. What ethical issues arise when selecting a research design?
9. Why is research design important?
10. What do the following terms mean: internal validity and external validity?
11. Describe what is meant by exploratory and descriptive research. Give examples of each type of enquiry.
12. What is the aim of causal research?
13. To make sound causal inferences, what sort of evidence must a research design provide?
14. (a) What is involved in a cross-sectional research design? Give examples.
(b) What is involved in a longitudinal research design? Give examples.
15. Describe the main stages in an experimental research design. Give an example of the application of an
experimental design.
16. What is case study research? What methods of data collection are suited to a case study approach?
17. What type(s) of research design are suitable for a descriptive research enquiry?
Chapter 7
HYPOTHESIS FORMULATION AND TESTING
Objectives:

After completing this chapter, you will understand the following:

The description of hypothesis: Null and Alternate
The definitions of various important terms
The various types of research hypothesis
The detailed explanation of hypothesis testing
The description of the Z-test
The description of the t-test
The description of the f-test
The description of types of errors in making a decision
The definition of ROC graphics

Are more than 50% of Indians in the Above Poverty Line (APL) category?
How can I purchase a new car with a mileage above 20 km per litre?
How do we confirm that the number of tigers in Indian forests has declined recently?

Such questions are answered by conducting studies on a large population. Samples are selected from the
population, analysed, and results are proposed. The above questions can be formulated as queries or hypotheses.
We use inferential statistics in hypothesis formulation because it allows us to measure behaviour in samples and
learn about populations that are too large or inaccessible to study directly. We select samples from the population
because we know how the sample statistics are related to the population parameters. Inferential statistical
significance is a measure of the difference between samples that would likely be observed in a population. It is
identified as the ratio of the size of a relationship or effect to the sampling error measured. Hence, the test can be
defined as

test statistic = (size of the observed relationship or effect)/(sampling error).

7.1 HYPOTHESIS

A hypothesis is an assumption about a population parameter, such as a population mean or proportion. The
parameter must be identified before a critical analysis. In order to test assumptions about these parameters, a
procedure called hypothesis testing is used. Basically, two outputs are obtained as a result of testing a
hypothesis: “reject” or “fail to reject”.

Suppose a car manufacturer claims that their new model gets a mileage of 24 km per litre. This
advertisement may be correct or wrong. Assume that the dealer may be overrating the mileage. How can we
identify whether the rating is correct? We can form a hypothesis and conduct a test whose result rejects or fails
to reject the car manufacturer’s claim. We can formulate a statement about the value of the population parameter
regarding the cars manufactured:

“Average mileage of the newly branded car is above 24 km per litre.”

An alternative statement can also be considered:

“Average mileage of the newly branded car is below 24 km per litre.”


There are two formulations for a population parameter: the null hypothesis and the alternate hypothesis. The
formulation is chosen based on the question asked by the researcher.

7.1.1 Null Hypothesis H0

For testing a hypothesis, we have to formulate an assumption. A null hypothesis (H0) is such an assumption
statement, asserting “no effect”, “no difference” or “no change”. The output of the test is “reject” or “fail to
reject”; we never “accept” the null hypothesis H0. If the evidence in the data is insufficient, the test fails to
reject H0. If we reject H0, the result is statistically significant: there is significant evidence in the data to justify
the rejection. For example, a null hypothesis is,

“Average number of cars in Delhi homes is at least two (H0: µ ≥ 2)”,

where µ is the population mean.

7.1.2 Alternate Hypothesis H1

It is the opposite of the null hypothesis. The statement adopted when there is strong evidence against the null
hypothesis is called the alternate hypothesis (H1). A statistical test must be designed and conducted against the
null hypothesis to assess the strength of this evidence. For example,

“Average number of cars in Delhi homes is less than two (H1: µ < 2)”.

In a clinical trial, a new drug is tested against an existing drug. The null hypothesis might be that “the new
drug is no better than the current drug”, on average. The null hypothesis can be written as H0: “there is no
difference between the two drugs on average”.

It must be noted that an alternative hypothesis is required to compare the two drugs. The alternative hypothesis
here is that the two drugs do not have the same effect on average. It is denoted as

H1: “the two drugs have different effects, on average”.

Normally, we give special consideration to the null hypothesis because it is the statement being tested.
But if the null hypothesis is rejected, the alternative hypothesis related to the statement is accepted. Once the test
has been carried out, we can either “reject H0 in favour of H1” or “do not reject H0”. We cannot conclude “reject
H1” or “accept H1”.

If the null hypothesis is true, we should arrive at “do not reject H0”. This means that there is not sufficient
evidence against H0 in favour of H1. If the null hypothesis is rejected, then the alternate hypothesis may be true.

7.2 IMPORTANT TERMS

Independent variable: The variables that are manipulated, controlled or changed.

Dependent variable: The variables that change on account of independent variable.

Probability: A numerical measure of the uncertainty of an event.

Population: A population is a complete set of items, which share at least one property in common that is the
subject of a statistical analysis.
Sample distribution: The distribution of a sample statistic over repeated samples from the population.

Mean: In mathematics and statistics, the arithmetic mean is simply the sum of a collection of numbers divided
by the number of numbers in the collection.

Standard deviation: In statistics and probability theory, the standard deviation (SD) measures the amount of
variation or dispersion from the average.

Power: It is the probability of correctly rejecting a false null hypothesis.

Hypothesis testing: A systematic procedure for testing claims or ideas about a group or population.

Test statistic: A quantity computed from the observed data that is used to judge whether the data agree with the hypothesized model.

Null hypothesis (H0): The hypothesis stating that the initial assumption is true.

Alternate hypothesis (H1): The claim made against the null hypothesis.

Upper tail test: If the population characteristic is greater than the hypothesized value, we select the upper tail
test.

Lower tail test: If the population characteristic is less than the hypothesized value, then we select the lower tail
test.

Two-tailed test: If the population characteristic is not equal to the hypothesized value, then we can select the
two-tailed test.

Level of significance: The probability of a type I error; it sets the threshold of the hypothesis test (denoted by α).

P-value: It is the smallest level of significance at which the null hypothesis can be rejected.

P-value ≤ α means we reject the null hypothesis at level α.

P-value > α means we fail to reject the null hypothesis at level α.

Critical region: The region of test-statistic values for which the null hypothesis is rejected.

Critical value: The value that determines the boundary of the critical region.

Type I error: When we reject the null hypothesis but null hypothesis is true, then type I error occurs.

Type II error: When we are not rejecting the null hypothesis but the null hypothesis is false (alternate
hypothesis is true), then type II error occurs.

Type III error: Type III error occurs during the one-tailed test, when a rejection region is located in the wrong
tail.
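The p-value decision rule above can be sketched in a few lines of code. This is an illustrative sketch (the function name `decide` is our own, not from the text), following the convention that we “fail to reject” rather than “accept” H0.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Hypothesis-test decision from a p-value and significance level alpha."""
    if p_value <= alpha:
        return "reject H0"        # statistically significant at level alpha
    return "fail to reject H0"    # insufficient evidence against H0

print(decide(0.03))   # reject H0
print(decide(0.20))   # fail to reject H0
```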

7.3 TYPES OF RESEARCH HYPOTHESIS

Formulation of the hypothesis is an important process in research, and it varies with the type of research
conducted. Figure 7.1 shows the division of research hypotheses. Deductive hypothesis testing starts from a
theory, which is confirmed after a successful test. In inductive testing, the theory is formulated from observations
after a number of steps. In both cases, we have to show that H0 is “rejected” or that we “fail to reject” it. The
observation is an important step and shapes the testing procedure for proving our assumption. Another way of
classifying the research approach is based on the size of the work in which a researcher is involved.
Fig. 7.1 Deductive and inductive hypotheses

An example of deductive hypothesis statements: Every day, I leave for work in my car at 7 o’clock. The drive
takes one hour and I arrive at the office on time. So, if I leave for work at 7 o’clock today, I will be on time.
These statements are logical, but they rely on the initial premise being correct. We cannot prove the hypothesis
completely because there is always some initial premise that may be wrong or conditional. An example of
inductive hypothesis statements: Today I left for the office at 7 o’clock and arrived on time. So, if I leave the
house at 7 o’clock every day, I will always be on time at the office. These types of statements are commonly used
in science and engineering. In principle, they are not always accurate; they assume that what is generally true
holds in every case. In the above statements, if there is heavy traffic today, I will not reach the office on time. It
is illogical to assume that all conditions hold for a specific dataset.

Hypothesis formulation may be quantitative or qualitative, depending on the research approach.

7.3.1 Characteristics of Qualitative Methods

Qualitative studies frequently use research questions rather than objectives. The questions mainly start with “what”
or “how” and seek to explore or describe experiences rather than to compare groups or variables. The questions
are under continual review and may be revised during the study. Usually the questions are open ended and do not
presuppose any particular literature or theory.

Qualitative research design strategies include naturalistic inquiry (e.g., studying real-world situations), emergent
design flexibility (e.g., avoiding rigid designs) and purposeful sampling (e.g., case studies of organizations or
people). Selected characteristics of qualitative research methods are as follows:

According to Patton (1985)1, “Qualitative research is an effort to understand situations in their uniqueness
as part of a particular context and the interactions there”.
The researcher is the primary instrument for data collection and analysis.
Qualitative research involves fieldwork for data collection.
Qualitative research uses an inductive strategy: the researcher builds abstractions, concepts, hypotheses or
theories rather than testing existing theory.

7.3.2 Characteristics of Quantitative Methods

Research questions frequently appear in the objectives of survey projects, and hypotheses are used even more
frequently in quantitative experimental research on variables. Comparisons and relationships between variables
are represented in experiments. Theories are used to deduce testable propositions. Dependent and independent
variables are separated in the experiments and measured separately. Objectives and hypotheses are not combined.
Alternate forms of experiments are created to suit the audience of the research, and comparisons of relationships
between variables are analysed frequently.
Selected characteristics of quantitative research methods are

Quantitative methods emphasize collecting and analyzing information in the form of numbers.
They emphasize collecting scores that measure distinct attributes of people and organizations.
Quantitative methods emphasize procedures for comparing groups or relating factors about people
or groups in experiments, correlation studies and surveys.

7.3.3 Generation of Research Hypothesis

Figure 7.2 shows how a research hypothesis is formulated. For example, a hypothesis can be formulated from the
association between two sample variables, x and y.

1. If there is an association between x and y, then “x and y are associated”.
2. If x is dependent on y, then “y is related to x”.
3. If an increase in the value of x appears to result in a fall in the value of y, then “as x increases, y decreases”.

From this scenario, we have different hypothesis formulations from the two variables. We can formulate three
different hypotheses:

1. A simple statement of association between the two variables x and y. There is no indication that the
association of x and y causes a change in any other variable.
2. A statement of association between the two variables x and y in which the value of y is conditional,
contingent upon the value of the variable x.
3. A relation between the variables x and y stated with reference to their values. The values may depend
on the nature of the association between the variables.

Fig. 7.2 Formulation of research hypothesis

7.4 HYPOTHESIS TESTING

Hypothesis testing is a procedure, based on probability theory and sample evidence, used to determine whether
the hypothesis is a realistic account and should not be rejected, or is unreasonable and should be rejected. In
other words, hypothesis testing (or significance testing) is a systematic procedure for testing claims or ideas
about a group or population.

7.4.1 Hypothesis Testing


The following steps are used for hypothesis testing:

1. Hypothesis formulation (null hypothesis and alternate hypothesis)
2. Formulate the decision criteria
3. Data collection and ordering
4. Evaluate the null hypothesis

In your research work, hypothesis formulation is important. Some important points that can be used to formulate
a good hypothesis are as follows:

1. Identify the dependent and independent variables
2. Identify the relationship between the variables
3. Prefer a simple hypothesis to a complicated hypothesis
4. Use statistical procedures for analysis
5. Identify the population in your study
6. Develop testing strategies

7.4.2 Test the Level of Significance

The significance level provides the criterion for making a decision about the value stated in a null hypothesis.
The decision is based on the probability of the sample statistic occurring when the null hypothesis is true.

We have to consider probability theory for the level of significance. The level of significance, α, is a probability
value, typically between 0.01 and 0.05 (1% to 5%), of rejecting the null hypothesis when it is in fact true.
Depending on the alternative, the significance test takes one of three forms: left-tailed, right-tailed or two-tailed.
The level of significance is marked in the tails of a curve (Fig. 7.3). In research, we use the sample mean of a
population, as explained in Fig. 7.3, where 1000 is the population mean. The empirical rule states that 95% of
samples selected from the population fall within two standard deviations of the mean. So, there is less than a 5%
probability that a sample mean falls beyond two standard deviations from the population mean.

Fig. 7.3 Significance test curve

Left-tailed test

In a one-tailed test, there are two possibilities for the population mean: it is either lower than or higher than
the hypothesized value. In hypothesis testing, we should be clear about when to reject the null hypothesis. In a
left-tailed test, the rejection region is on the left side of the curve.

Let µ be the mean of the population, x̄ the sample mean and k the hypothesized value used in the test. Figure
7.4a shows a left-tailed test with H0: µ = k and H1: µ < k.
Fig. 7.4 Significance curve left-tailed, right-tailed and two-tailed tests

Right-tailed test

Figure 7.4b shows a right-tailed test with H0: µ = k and H1: µ > k. The dark area is the rejection region and is
marked at the right side of the curve.

Two-tailed test

In a two-tailed test, if the sample mean is significantly higher or lower than the hypothesized value of the
population mean, then the null hypothesis is rejected. This test is appropriate when the null hypothesis specifies
a value and the alternative hypothesis is that the parameter is not equal to that value. Figure 7.4c shows a two-
tailed test with H0: µ = k and H1: µ ≠ k. Here, we consider α/2 in each region, so there are two rejection regions
in a two-tailed test. If the significance level is 5%, the rejection area is distributed as 2.5% at each end, and the
acceptance region is 95%.

For example, a survey shows that children in the New Delhi city watch TV for an average of 3 hours per week.
A group of 50 children is sampled and the time (in hours) is recorded across locations in the New Delhi city. In
this study, we may believe that children watch more than (>) or less than (<) 3 hours of TV per week. Figure 7.5
shows the level of significance in one and two tails of the sampling distribution. When the value stated for the
population mean in the null hypothesis is true, sample means in the tails are unlikely to occur, with less than a
5% probability.

Fig. 7.5 Watching TV example in left-tailed, right-tailed and two-tailed tests
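The rejection regions described above correspond to critical z-values that can be computed directly. The sketch below is illustrative (variable names are our own) and uses Python's standard library to reproduce the left-, right- and two-tailed cut-offs at α = 0.05, assuming a standard normal sampling distribution.

```python
from statistics import NormalDist

# Critical z-values for the three test types at a 5% significance level.
alpha = 0.05
z = NormalDist()                          # standard normal: mean 0, sd 1

left_critical = z.inv_cdf(alpha)          # left-tailed: reject if Z < this
right_critical = z.inv_cdf(1 - alpha)     # right-tailed: reject if Z > this
two_tailed = z.inv_cdf(1 - alpha / 2)     # two-tailed: reject if |Z| > this

print(round(left_critical, 3))   # -1.645
print(round(right_critical, 3))  # 1.645
print(round(two_tailed, 2))      # 1.96
```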

7.5 Z-TEST
The Z-test is used to compare a sample mean against a known population mean (µ) to determine whether there
is a significant difference. To use the Z-test, the population must have a known mean (µ) and standard deviation
(σ). The Z-test gives a Z-score associated with the sample. The Z-score becomes large when there is a greater
difference between the sample mean and the population mean, and small when the sample is close to the
population. The Z-test is appropriate when the population is normally distributed and the sample is large. The
Z-score can be obtained by

z = (x̄ − µ)/(σ/√n),

where x̄ is the mean of a sample of size n, µ is the population mean and σ is the population standard deviation.
Figure 7.6 shows a sample Z-test with two conditions.

Fig. 7.6 Conditions in the Z-test in left-tailed, right-tailed and two-tailed tests

Case Study

The average plus-two result of schools in district A on an aptitude test in mathematics is 75, with a
standard deviation of 8.1. A random sample of 100 students selected from one school has a mean score of 71.
The basic question is: “Does this indicate that these students are significantly less skilled than the average
students in the same district in mathematical aptitude?” (Use a level of significance of 5%.)

Solution: The population mean and standard deviation for district A are given as µ = 75 and σ = 8.1. The marks
of 100 students are selected as a sample from a certain school in district A, so we test the sample mean against
the population mean with a known standard deviation. We can use the Z-test, which is based on the normal curve
(normal distribution).

Step 1: Hypothesis formulation

The null hypothesis contains =, ≥ or ≤, and the alternate hypothesis contains the complementary relation (≠, <
or >). Consider the question of whether “students are significantly less skilled than the average students in the
same district in mathematical aptitude”. “Less skilled” may be taken as scoring less than 75 marks out of 100.
The null hypothesis and alternate hypothesis can be
H0: µ ≥ 75
H1: µ < 75

Step 2: Significant level

Select a level of significance; here 5%, or α = 0.05, is suggested.

Step 3: Identify statistical test

Select the statistical test; here we select the Z-test because of the large sample (n = 100) and known σ.

Step 4: Formulate a decision rule

This is a one-tailed test to the left because the alternate hypothesis states µ < 75. For α = 0.05, we find the Z-value
on the normal curve with probability 0.05 to its left. The critical value of Z corresponds to a table area of
0.5 − 0.05 = 0.4500. Because 0.4500 falls exactly half way between the table entries 0.4495 and 0.4505, Z lies
half way between 1.640 and 1.650, giving Z = 1.645. Since 71 is to the left of 75, the critical value is Z = −1.645;
that is, P(Z < −1.645) = 0.05. Figure 7.7 shows the test of significance at the 5% level.

Fig. 7.7 Left-tailed test with 5% level of significance

We reject the null hypothesis if Z < −1.645 and then accept the alternate hypothesis that the students in the
school sampled are “less skilled” in mathematics aptitude than those in district A.

Step 5: Decision

From the sample of 100 students, it is found that the mean score is 71. Using the statistical test (Z-test),

z = (71 − 75)/(8.1/√100) = −4/0.81 = −4.938.

We can reject the null hypothesis because the computed Z = −4.938 < −1.645 (the critical Z-value). That is, the
students in the school sampled are less skilled in mathematical aptitude than the average in district A.
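The arithmetic of this case study can be checked with a short script. This is an illustrative sketch (variable names are our own), reproducing the computed Z-value and the decision against the critical value −1.645.

```python
import math

# Case-study data: district mean 75, sd 8.1, sample of 100 students with mean 71.
mu, sigma, n, x_bar = 75, 8.1, 100, 71

z = (x_bar - mu) / (sigma / math.sqrt(n))   # z = -4 / 0.81
print(round(z, 3))                          # -4.938

critical = -1.645                           # left-tailed critical value at alpha = 0.05
print(z < critical)                         # True -> reject H0
```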

7.6 t-TEST
If the standard deviation of the population is unknown, then we can use the t-test. It applies when the population
is normally distributed (or only slightly skewed and a large sample is taken). The t-statistic is

t = (x̄ − µ)/(S/√n),

where x̄ is the sample mean, µ is the population mean, S is the standard deviation of the sample, n is the sample
size and n − 1 is the degrees of freedom. For example, does an average box of seed oil contain more than 368
grams? A random sample of 36 boxes showed x̄ = 372.5 and S = 15. Test at the α = 0.01 level. Here we do not
know the population standard deviation. The hypothesis can be formulated by

H0: µ ≤ 368
H1: µ > 368

Test statistics:

H0: µ ≤ 368
H1: µ > 368
α = 0.01
n = 36

t = (372.5 − 368)/(15/√36) = 4.5/2.5 = 1.8

Critical value: 2.4377

Figure 7.8 shows the test criteria.

Fig. 7.8 t-Test criteria

From the values, we cannot reject H0 at α = 0.01 since the computed t-value does not exceed the critical value.
So we can conclude that there is no evidence that the true mean is more than 368 grams.
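The seed-oil example can be reproduced in a few lines. This is an illustrative sketch (variable names are our own); the critical value 2.4377 is the t-table value quoted in the text for 35 degrees of freedom at α = 0.01.

```python
import math

# Seed-oil example: H0: mu <= 368 vs H1: mu > 368; sigma unknown, so a t-test is used.
mu0, x_bar, s, n = 368, 372.5, 15, 36

t = (x_bar - mu0) / (s / math.sqrt(n))   # 4.5 / 2.5
print(t)                                 # 1.8

critical = 2.4377                        # t-table value, 35 degrees of freedom, alpha = 0.01
print(t > critical)                      # False -> cannot reject H0
```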

Case Study
A cake manufacturer, concerned about the mean fat content of a certain grade of cake, submits a random sample
of 12 cakes from different lots to an independent laboratory for analysis. The percentage of fat in each of the
cakes is as follows:

21, 18, 19, 16, 18, 24, 22, 19, 24, 14, 18, 15.

The manufacturer claims that the mean fat content of this grade of cake is less than 20%. Assume that the
percentage of fat content is normally distributed with a standard deviation of 3. Formulate a hypothesis and test
whether this claim is correct.

Let X be the percentage of fat content in a cake. The null and alternate hypotheses can be

H0: μ ≥ 20%
H1: μ < 20%

Select the significance level, α = 0.05.

The critical region is Z < −1.645.

Figure 7.9 shows the Z-test criteria.

Fig. 7.9 Z-test criteria with 5% significance level

By the test statistic, with sample mean x̄ = 228/12 = 19,

z = (19 − 20)/(3/√12) = −1.155.

The resulting value is not in the critical region, so there is no evidence to support the manufacturer’s claim. Now
suppose the standard deviation (σ) is unknown for the above case, while the percentage of fat content remains
normally distributed. Then we cannot use the Z-test and must select the t-test.

H0: μ ≥ 20%
H1: μ < 20% (one-tailed test)

Selected significance level, α = 0.05

Degrees of freedom v = n − 1 = 11

Critical region: t < −1.796

By the t-test statistics,

S² = (1/11)(4448 − 228²/12) = 10.545
S = 3.25

t = (19 − 20)/(3.25/√12) = −1.07

The value is not in the critical region (it is not below −1.796); hence, there is no evidence to support the
manufacturer’s claim from this test either.
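As a check on these computations, the sketch below (variable names are our own) recomputes S² and the t-statistic for the cake data using Python's standard library.

```python
import math
from statistics import mean, variance

# Fat percentages from the 12 sampled cakes.
fat = [21, 18, 19, 16, 18, 24, 22, 19, 24, 14, 18, 15]

x_bar = mean(fat)             # 19
s2 = variance(fat)            # sample variance, (1/11)(4448 - 228**2/12)
s = math.sqrt(s2)             # about 3.25

# t-statistic for H0: mu >= 20 against H1: mu < 20.
t = (x_bar - 20) / (s / math.sqrt(len(fat)))
print(round(s2, 3))           # 10.545
print(round(t, 2))            # -1.07 -> not below -1.796, so H0 is not rejected
```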

Types of t-test

There are two types of t-tests, known under various names: paired and unpaired, dependent and independent,
or one-sample and two-sample. In the two-sample case, the null hypothesis states that the difference between
the population means is 0. In an unpaired t-test, two quite unrelated samples are compared. The test makes
certain assumptions: the samples are obtained from normal populations, and the populations have a common
standard deviation. The null hypothesis tested is that the means of the populations are the same. For two samples
of sizes m and n, the standard error (SE) of the difference in means can be computed as

SE = σ √(1/m + 1/n),

where σ is the common standard deviation of the two normal populations. In a paired test, two samples are
compared and there is a link between them. If there is a meaningful association between the samples, the test
statistic is (m1 − m2)/SE, where m1 − m2 is the mean of the sample differences and SE is the standard error,
computed as the standard deviation of the differences divided by the square root of the sample size.

7.7 f-TEST
The f-test is a statistical test in which the test statistic has an f-distribution under the null hypothesis. The
f-distribution is a continuous probability distribution (also known as the Fisher-Snedecor distribution). The
f-test is most commonly used for models fitted to data using least squares. The f-test hypothesis is based on
comparing standard deviations. That is, it first tests the null hypothesis

H0: σ1² = σ2², against the alternate hypothesis H1: σ1² ≠ σ2².

The f-test is a comparison of the variances of two sets of data, which can lead to many predictions:

F = S1²/S2²,

where S1² and S2² are the sample variances, computed about the means x̄ and ȳ of the two distributions.

For example, we would like to compare the alcohol concentration of two different rums using the same
instrument. The test was conducted on different days. On the first day, a standard deviation of S1 = 9 ppm was
found; on the next day, the standard deviation was S2 = 2 ppm. Six measurements were made in each case.
Before combining the datasets, we must check whether they differ significantly; if they do, one of them should
be discarded.

We begin with the null hypothesis H0: σ1² = σ2² and the alternate hypothesis H1: σ1² ≠ σ2², which gives a two-
tailed test because there are two possible cases, σ1² > σ2² and σ1² < σ2².

The f-value can be calculated as the ratio of the squared standard deviations S1 and S2 of the given samples:

fcal = 9²/2² = 20.25.

We check at the 10% significance level, with 5% in the left and right tail each. The tabulated value for v = 5
degrees of freedom at the 90% confidence level is 5.05.

Since fcal > ftab, we reject the null hypothesis: we can be 90% certain that the standard deviations of the two
samples differ.
deviation of two hypotheses.

Steps to follow while doing f-test

Step 1: If you are given standard deviations, then go to Step 2. If you are given variances to compare, then move
to Step 3.

Step 2: Square both standard deviations to get the variances. For example, if S1 = 9.6 and S2 = 10.9, then the
variances (S1² and S2²) would be 9.6² = 92.16 and 10.9² = 118.81.

Step 3: Take the largest variance and divide it by the smallest variance to get the f-value. For example, if your
two variances were S1² = 2.5 and S2² = 9.4, divide 9.4/2.5 = 3.76. Placing the largest variance on top forces the
f-test into a right-tailed test, which is much easier to work with than a left-tailed test.
Step 4: Find the degrees of freedom. Degrees of freedom is your sample size minus 1. As you have two samples
(variance 1 and variance 2), you will have two degrees of freedom: one for the numerator and one for the
denominator.

Step 5: Check the f-value that you have calculated in Step 3 with f-table value. If the f-table value is smaller than
the calculated value, you can reject the null hypothesis.

Example – f test

Calculate the f-test for the sets 10, 20, 30, 40, 50 and 5, 10, 15, 20, 25.

Calculate the variance of the first set

For 10, 20, 30, 40, 50:

Total inputs (N) = 5
Mean = 30
Variance = SD²
= 15.8114²
= 250

Calculate the variance of the second set

For 5, 10, 15, 20, 25:

Total inputs (N) = 5
Mean = 15
Variance = SD²
= 7.9057²
= 62.5

To calculate the f-test

f-test = (variance of 10, 20, 30, 40, 50)/(variance of 5, 10, 15, 20, 25)
= 250/62.5
= 4

The f-test value is 4.
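The worked example above can be verified with Python's standard library; a minimal sketch (variable names are our own):

```python
from statistics import variance

# Sample variances of the two datasets from the example.
a = [10, 20, 30, 40, 50]
b = [5, 10, 15, 20, 25]

var_a = variance(a)                        # 250
var_b = variance(b)                        # 62.5

# Larger variance on top forces a right-tailed test (Step 3 above).
f = max(var_a, var_b) / min(var_a, var_b)
print(f)                                   # 4.0
```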

7.8 MAKING A DECISION: TYPES OF ERRORS

Most of the time we believe that the decisions we make while testing a hypothesis are correct, but sometimes a
decision falls on the wrong side. Thus, there are two possible decision errors. After stating the null hypothesis,
we conduct a test with the test data; there are basically four outcomes (Fig. 7.10).
Fig. 7.10 Four outcomes of test

The decision to retain the null hypothesis may be correct or incorrect. A null result (or null finding) occurs when
the decision is to retain a true null hypothesis; here our assumption as well as the test are correct. The incorrect
decision is to retain a false null hypothesis, which is an example of a Type II error (a false negative).

When we reject a null hypothesis, the decision can likewise be correct or incorrect. The incorrect decision to
reject a true null hypothesis is an example of a Type I error (a false positive). That is, a Type I error is the
probability of rejecting a true null hypothesis, and a Type II error is the probability of retaining a false null
hypothesis.

For example, a wine manufacturer claims that his brand contains 18% alcohol per bottle. Sixteen samples are
collected, showing a standard deviation of 1% and a sample mean of 15. For a significance level of 0.05, test the
claim:

Null hypothesis H0: µ = 18

Alternate hypothesis H1: µ ≠ 18

From the test statistics, n = 16 and the mean x̄ = 15. Thus, z = (15 − 18)/(1/√16) = −12.0. Since |z| far exceeds
the critical value 1.96 for a two-tailed test at α = 0.05, we reject H0 and conclude that the alcohol content of the
bottles is not 18%.

This may be a Type I error because, naturally, the process of wine making is standardized and most of the
bottles contain the same volume of alcohol. The error may be due to the selection of wrong samples.

7.8.1 Confusion Matrix

The Type I and Type II errors can be mapped to a confusion table. The outcome of the test must be one of four:
true positive, true negative, false positive or false negative (Fig. 7.11).
Fig. 7.11 Confusion matrix

We can illustrate Type I and Type II errors with an example. Consider a mechanic who inspects the brake pads
of a car for the minimum specified thickness. The null and alternate hypotheses are

H0: “The car brakes meet the standard for the minimum specified thickness”.

H1: “The car brakes do not meet the standard for the minimum specified thickness”.

Here two types of errors can occur:

Type I (false positive): The brake pads are fine, but the inspection report says “replace the brake pads”.

Type II (false negative): The brake pads are below the minimum specified thickness, but the mechanic cannot
identify anything wrong with them. The brake pads are actually damaged, yet he reports “no replacement of
brake pads needed”.

Case Study

For example, a class of 100 people is selected for a study. Suppose our null hypothesis is that the average age of
the whole class is 31. We now observe a value of 20 in a sample we have randomly chosen. That is very unlikely
to have happened by chance if our null hypothesis was true. It is much more likely that our null hypothesis is
false. So we decide to reject our null hypothesis in favour of an alternative, which is that the true average age of
the whole class is less than 31. So the procedure involves us deciding on a rule for when we should reject and
when we should not. We could say that in this example with a NULL of 31, if we find a value below 26, then we
will not believe our NULL of 31. But there is still a chance that the true value could be 31 even though we
observe a value below 26. So maybe we want to reduce the chances of that Type I error. To do that we decide on
a new rule; it says that we reject if we find a value below 23. Now if we find a sample value below 23 (that is 8
years younger than the hypothesized value of 31) we reject and this time we are “more sure” of our decision than
when we used 26 as our cut off value. So formally these regions “below 26” and “below 23” are our rejection
regions and it is possible to mathematically construct the intervals so that the Type I error they correspond to is
equal to some given value (e.g., 0.1 or 0.05 or 0.01).

7.8.2 Quantification of Classification

A standard method is available to quantify the classification performance of a learned model. A classification model is a
mapping from instances to predicted classes. A scoring classifier produces real-valued outputs that are compared
against a threshold. Consider a two-class prediction problem (binary classification), where each outcome is labelled as
positive (p) or negative (n), and the model's prediction is labelled Y or N. For such a binary classifier, there are four
possible outcomes. If the predicted and the actual values are both positive, the instance is counted as a true positive
(TP); if the prediction is positive but the actual value is negative, it is counted as a false positive (FP). Similarly, if
the predicted and the actual values are both negative, it is counted as a true negative (TN); if the prediction is negative
but the actual value is positive, it is counted as a false negative (FN). Figure 7.12 shows the
two-by-two contingency table representing the dispositions of the set of instances.

Fig. 7.12 Confusion matrix

The hit rate or True Positive Rate (TPR) of a classifier is estimated as

TPR = TP / (TP + FN) = TP / P

True positive rate is also called Sensitivity (Se).

The False Positive Rate (FPR) of the classifier is

FPR = FP / (FP + TN) = FP / N

False positive rate is also called 1 − Specificity (Sp), i.e.,

FPR = 1 − Sp, where Sp = TN / (TN + FP)

The accuracy can be estimated as

Accuracy = (TP + TN) / (P + N)
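The four rates can be computed directly from the cells of the confusion matrix. A minimal Python sketch follows; the counts passed in are hypothetical and chosen only for illustration.

```python
# Compute TPR (sensitivity), FPR, specificity and accuracy from the
# four cells of a binary confusion matrix.
def classifier_metrics(tp, fp, tn, fn):
    p = tp + fn           # actual positives
    n = fp + tn           # actual negatives
    tpr = tp / p          # hit rate / sensitivity (Se)
    fpr = fp / n          # false positive rate = 1 - specificity
    specificity = tn / n  # Sp
    accuracy = (tp + tn) / (p + n)
    return tpr, fpr, specificity, accuracy

# Hypothetical counts for illustration:
tpr, fpr, sp, acc = classifier_metrics(tp=96, fp=5, tn=95, fn=4)
print(tpr, fpr, sp, acc)  # 0.96 0.05 0.95 0.955
```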

7.9 ROC GRAPHICS

Receiver Operator Characteristic (ROC) graph is a technique for visualizing, organizing and selecting classifiers
based on their performance (Fig. 7.13).
Fig. 7.13 Graph created by thresholding a test set. False positive rate (1 − specificity) on x-axis and sensitivity
(TPR) on y-axis. There are 17 threshold scores assigned to the classifier

The ROC analysis is widely used to analyze classifiers and is generally useful as a performance graphing
method. ROC is a graph with false positive rate (1 − specificity) on the x-axis and true positive rate (sensitivity)
on the y-axis. Figure 7.13 shows a basic ROC graph. The lower left point (0, 0) represents a strategy that makes no
false positive errors but also gains no true positives. The point (1, 1) represents the opposite strategy of unconditionally
issuing positive classifications. The point (0, 1) represents perfect classification.

Some classifiers, such as neural network or hidden Markov model classifiers, yield a score: a numeric value that
represents the degree to which an instance is a member of the class. Such a ranking or scoring classifier can
be used with a threshold to produce a discrete (binary) classifier (if the classifier output is above the threshold,
the classifier produces a Y, else an N). In the ROC space, each threshold produces a different point. We can vary
the threshold and trace a curve through the ROC space. Any ROC curve generated from a finite set of instances
is actually a step function, which approaches a true curve as the number of instances approaches infinity. In Fig.
7.13, a threshold of +∞ produces the point (0, 0). As we lower the threshold to 0.99, the first positive instance is
classified. As the threshold is further reduced, the curve climbs up to the right and ends up at (1, 1) with a
threshold of 0.01. The ROC point at (0.1, 0.8) produces the highest accuracy (84%). At this point, the
classifier is better at identifying likely positives than at identifying likely negatives, so the best accuracy
is found at a threshold of 0.85.

Classifiers appearing on the left-hand side of an ROC graph near the x-axis issue positive classifications only
with strong evidence, so they make few false positive errors, but they also have low true positive rates. Classifiers on the
upper right-hand side of an ROC graph may issue positive classifications with weak evidence, so they
classify nearly all positives correctly, but they often have high FPRs.
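Tracing an ROC curve from a scoring classifier can be sketched as below. The scores and labels are hypothetical (not the 17-threshold example of Fig. 7.13); each distinct score is used in turn as the threshold, producing one (FPR, TPR) point.

```python
# Trace ROC points by sweeping a threshold over classifier scores.
def roc_points(scores, labels):
    """Return (fpr, tpr) pairs, one per distinct threshold, high to low."""
    p = sum(labels)           # number of actual positives
    n = len(labels) - p       # number of actual negatives
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / n, tp / p))
    return points

# Hypothetical scores and true labels (1 = positive, 0 = negative):
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1,    1,   1,   0,   1,   0,   0,   0]
pts = roc_points(scores, labels)
print(pts)
```

As the threshold falls, the curve can only move up or to the right, which is why an ROC curve built from a finite sample is a step function.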

Case Study

Consider a person who is conducting an interview for selecting people from a college for a company. There are
200 students in the population: 100 students are correct selections matching his requirements, and 100 students are
not required. During the interview, a mix of these students arrives at his desk. How can we determine his
performance in terms of specificity and sensitivity?
Fig. 7.14 Diseases mapping and test results

Among the population, 100 students are from the correct selection list (P = 100) and 100 students are from the wrong
selection list (N = 100). If this interviewer selects 96 students from P, then we can say that his sensitivity (Se) is
96/100 = 96%.

If he selects 5 students from N, then we can say that his specificity (Sp) is (100 − 5)/100 = 95%.

The accuracy of his selection is (96 + 95)/200 = 95.5%; that is, the accuracy is (TP + TN)/(P + N).

If the classification is binary, the test result can be analyzed directly. For example, a diagnostic test for a
disease may produce two outcomes, positive (diseased) and negative (not diseased). Figure 7.14 shows the test
result and disease state mapping.

Consider a test over a candidate population, where D = 1 if the candidate is diseased and D = 0 otherwise. If the
classifier predicts that a candidate is diseased and the candidate is indeed a patient, the system is highly sensitive. If the
candidate is not a patient and the system says that she is not diseased, the system is highly specific. If the
system predicts wrongly, a Type I or Type II error occurs. That is, if the system predicts that a candidate is diseased
but he is not a patient, a Type I error occurs.

If the system wrongly classifies a candidate as not diseased when he is in fact a patient (diseased), then a Type II
error occurs. Such a test can be evaluated by setting up a threshold. The correct classifications (true positive and true
negative) are shown in Fig. 7.15. The wrong classifications are shown in Fig. 7.16.
Fig. 7.15 True negative and true positive

In order to have a correct classification, we need to fix a boundary, called the threshold, against which the final value
produced by the classifier is compared. For example, our expectation may be +1 for a positive classification, but
most of the results may lie close to 0.85 for a positive instance. Then we can fix the threshold at 0.85 for positive
results to obtain better sensitivity. The threshold can be tuned by successive experiments. Figure
7.17 shows this threshold fixation, moving the threshold repeatedly from the positive output class towards the
negative output class. Optimum accuracy, sensitivity and specificity are obtained by fixing a proper threshold.

Fig. 7.16 False negative and false positive


Fig. 7.17 Threshold fixations in the test result to obtain a maximum accuracy

Case Study

Suppose a researcher wishes to design a new edge detection algorithm. After analyzing the dataset, he may try the
technique to predict whether a scene contains a chair (for example). How well did he succeed with his technique?
Suppose his claim is 95% accuracy; that means he correctly classified 95% of the data at the correct pixels and
missed 5% of the pixels. Figure 7.18 shows his classification.

Fig. 7.18 Prediction matrix and result matrix

In this problem, suppose we have 90 examples with edges (Class 1) and 100 examples without edges (Class 0).
The total number of examples is therefore 190. Then,

Sensitivity = 60/90 = 66.67%

Specificity = 80/100 = 80%

The prediction and result matrices are shown in Fig. 7.18.

EXERCISES

1. State the steps involved in hypothesis testing.


2. What are two decisions that a researcher makes in hypothesis testing?
3. What is a Type I error? Give an example.
4. What is a Type II error? Give an example.
5. What is the power in hypothesis testing?
6. What are the critical values for a non-independent sample non-directional (two-tailed) Z-test at a 0.05
level of significance?
7. Define null hypothesis and alternate hypothesis.
8. Differentiate upper tailed and lower tailed test.
9. What are the types of research hypothesis?
10. What are the major characteristics of quantitative method?
11. Explain the various steps involved in the formulation of research hypothesis.
12. Explain in detail the various hypothesis testing methods.
13. Explain a situation where the Z-test is used. Also mention why a t-test would be used in such a situation.
14. What are the various types of t-test?
15. Explain confusion matrix.
16. What do you mean by ROC graphics?
Chapter 8
TEST PROCEDURES
Objectives:

After completing this chapter, you will be able to understand the following:

The definition of parametric and non-parametric tests


The detailed explanation of ANOVA
The definition of Mann-Whitney test and its performance in SPSS
The definition of Kruskal-Wallis test and its step by step procedure in SPSS
The definition of chi-square test and its test procedure in SPSS statistics
The definition of multivariate analysis and its test procedure in SPSS-MANOVA

After collecting data, we need ways of analyzing them to retrieve results and facts. Researchers are very much
interested in analyzing such data. For example, suppose data on the tiger population in India are
collected. The data may comprise totals from reserve forests, tropical forests or plantation forests. From these
numbers, how can we say that the population of tigers is declining day by day? Or how can we say that
Bengal tigers are not distributed throughout the reserve forests? Such questions are answered with the help of test
procedures. There are two types of test procedures, depending upon the operational content of the data:
parametric and non-parametric tests. The two differ in the assumptions they make about the population of data.

8.1 PARAMETRIC AND NON-PARAMETRIC TESTS

Non-parametric statistics covers data that are not assumed to belong to any particular distribution. That is, you can use
non-parametric statistics if your measurement scale is nominal or ordinal. Non-parametric statistics are less powerful
because they use less information in their calculations; they use the ordinal positions of pairs of scores rather than the
mean or standard deviation. Non-parametric methods are used in the study of populations that take on ranked
order. For example, suppose a survey asks participants to review a movie by rating it from one to five stars. Here,
non-parametric methods are easy for the researcher to use, as not much is known about the population. The Mann-Whitney
test, the rank sum test and the Kruskal-Wallis test are examples of non-parametric tests.

In a parametric test, the population is assumed to follow a known distribution whose parameters are of interest. Also,
the researcher is aware of which test is suitable for his/her application. Parametric tests use certain assumptions that
produce more accurate and precise estimates; they are more powerful, but may mislead if the
assumptions are not valid. The t-test, z-test, f-test and ANOVA are examples of parametric tests. This chapter
discusses various test procedures used in research data analysis. Table 8.1 gives the major differences between
parametric and non-parametric test procedures.

Table 8.1 Difference between parametric and non-parametric tests


8.2 ANOVA

ANOVA stands for Analysis of Variance. It is a statistical model developed by R. A. Fisher to analyze the
variation among and between groups. In ANOVA, we use variance as a quantity to study the equality or
non-equality of the means of particular populations. There are various test methods available to compare and study
the means of two groups, but ANOVA is important in situations where we need to compare the means
of more than two groups. Thus, this method is very useful for researchers and scientists in various fields such as
biology, statistics, business and education. Using ANOVA, we can infer whether a particular group is drawn
from a population whose mean is under investigation. ANOVA is essentially a procedure for testing the
difference among different groups of data for homogeneity. If a group varies widely from the common mean, we
conclude that it is not drawn from the same parent population.

The logic used in ANOVA to compare means of multiple groups is similar to that used with the t-test to compare
means of two independent groups. The assumptions needed for ANOVA are as follows:

1. Random, independent sampling from the k populations


2. Normal population distributions
3. Equal variances within the k populations

The first assumption is the critical one, whereas assumptions 2 and 3 can be relaxed if the sample size is very
large.

8.2.1 Tricks and Technique – ANOVA

The stepwise technique for working out ANOVA is as follows:

1. Obtain the mean of each sample, i.e., obtain X1, X2, X3, …, Xk when there are k samples.
2. Take the deviations of the sample means from the mean of the sample means and calculate the square of
such deviations which may be multiplied by the number of items in the corresponding sample, and then
obtain their total. This is known as the sum of squares for variance between the samples.
3. Divide the result of step (2) by the degrees of freedom between the samples to obtain variance or Mean
Square (MS) between samples.
4. Obtain the deviations of the values of the sample items for all the samples from corresponding means of
the samples and calculate the squares of such deviations and then obtain their total. This total is known as
the sum of squares for variance within samples (or SS within).
5. Divide the result of step (4) by the degrees of freedom within samples to obtain the variance or MS within
samples.
6. Now find the total variance, SS for total variance = SS between + SS within
7. Finally, find the F-ratio,

F = MS between samples / MS within samples
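The stepwise technique above can be sketched in pure Python. The three sample groups below are hypothetical, used only to show the arithmetic of the between/within decomposition.

```python
# One-way ANOVA F-ratio, following the stepwise technique in the text.
def one_way_anova_f(groups):
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n = len(all_values)
    grand_mean = sum(all_values) / n
    # Step 2: SS between = sum of n_i * (group mean - grand mean)^2
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Step 4: SS within = squared deviations of items from their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)   # step 3: MS between
    ms_within = ss_within / (n - k)     # step 5: MS within
    return ms_between / ms_within       # step 7: F-ratio

# Hypothetical data: three groups of six observations each.
groups = [[6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8], [13, 9, 11, 8, 7, 12]]
f_value = one_way_anova_f(groups)
print(round(f_value, 3))  # 9.265
```

A large F-ratio indicates that the variation between group means is large relative to the variation within groups, which is evidence against the null hypothesis of equal means.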

8.2.2 One-way and Two-way ANOVA

One-way ANOVA is used to compare means from at least three groups of one variable (Fig. 8.1). The null
hypothesis is, “all the population group means are equal” and the alternative hypothesis is, “at least one of the
population means differs from the others”. This may seem confusing, as we call it Analysis of Variance even
though we are comparing means. The reason is that the test statistic uses evidence about two types of variability. We
will only consider the reasoning behind it instead of the complex formula used to calculate it.

Fig. 8.1 One-way ANOVA

Step-by-Step ANOVA

The method used today for comparisons of three or more groups is called analysis of variance (ANOVA). This
method has the advantage of testing whether there are any differences between the groups with a single
probability associated with the test. The hypothesis tested is that all groups have the same mean. Before we
present an example, notice that there are several assumptions that should be met before using ANOVA.

Essentially, we must have independence between groups (unless a repeated measures design is used); the
sampling distributions of sample means must be normally distributed; and the groups should come from
populations with equal variances (called homogeneity of variance). The basic principle of ANOVA is to test for
differences among the means of populations by examining the amount of variation within each of the samples,
relative to the amount of variation between the samples.

In short, we have to make two estimates of the population variance, viz., one based on between-samples variance
and the other based on within-samples variance. The said two estimates of population variance are then
compared with the F-test,

F = variance between samples / variance within samples

Two-way ANOVA (Fig. 8.2) technique is used when the data are classified on the basis of two factors.
Fig. 8.2 Two-way ANOVA

ANOVA test example

Consider the situation where ANOVA is used for the statistical analysis. The details are given in Table 8.2.

Table 8.2 Group details

The calculated mean and standard deviations can be represented in Table 8.3.

Table 8.3 Mean and standard deviation


Table 8.4 Significant-probability table

According to the F significance/probability table (Table 8.4) with df = (2, 21), F must be at least 3.4668 to reach p ≤
0.05, so the F score is statistically significant. In other words, the null hypothesis is rejected and the research
hypothesis is supported.

8.3 MANN-WHITNEY TEST

The Mann-Whitney U test is the non-parametric counterpart of the independent sample t-test. It is a non-parametric
test of the null hypothesis, used to compare differences between two independent groups
when the dependent variable is either ordinal or continuous. For example, one might compare the speed at which
two different groups of people can run 100 metres, where one group has trained for six weeks and the other has
not. Unlike the independent sample t-test, the Mann-Whitney U test allows one to draw different conclusions about
the data depending on the assumptions made about the data's distribution.

Requirements

Two random, independent samples


The data is continuous – in other words, it must, in principle, be possible to distinguish between values at
the nth decimal place
Scale of measurement should be ordinal, interval or ratio. For maximum accuracy, there should be no ties,
though this test – like others – has a way to handle ties

Null hypothesis: The null hypothesis asserts that the medians of the two samples are identical.

Case Study

For more understanding, let us look at a basic case study. A general market study of two products under the same
category is being done: consider Brand X tea and Brand Y tea. A voting scheme is carried out where each
participant rates just one product, and the results need to be compared. Before proceeding, how do we
confirm that the test to be used is the Mann-Whitney?

Here we have two conditions, with each participant taking part in only one of the conditions. The data are ratings
(ordinal data), and hence a non-parametric test is appropriate, leading to the conclusion that the test to be done is the
Mann-Whitney U test. Table 8.5 shows the input obtained from an audience rating of two brands X and Y.

Table 8.5 Audience rating of the brands

Step 1:

Rank all scores together (Table 8.6), ignoring which group they belong to.

Step 2:

Add up the ranks for Brand X to get T1, and add up the ranks for Brand Y to get T2. The larger rank total is the one
used in the equation.

Table 8.6 Ranking of the brands


T1 = 3 + 4 + 1.5 + 7.5 + 1.5 + 5.5 = 23
T2 = 11 + 9 + 5.5 + 12 + 7.5 + 10 = 55

T2 = 55 is the larger rank total, so it is selected for the calculation.

Step 3:

We have to initialize N and M: the number of participants in each group, and the number of participants in the
group that gave the larger rank total, respectively. Here, both values are equal to six.

Step 4:

Perform the calculation using the formula

U = N × M + M(M + 1)/2 − Tx

where Tx is the larger rank total. Here, U = 6 × 6 + 6 × 7/2 − 55 = 36 + 21 − 55 = 2.

Step 5:

Compare the resultant U with the critical value from the Mann–Whitney table. For our result to be significant, the
obtained U has to be equal to or less than this critical value. From the critical table,

The critical value for a two-tailed test at 0.05 significance level = 5

The critical value for a two-tailed test at 0.01 significance level = 2

So, our obtained U is less than the critical value of U for a 0.05 significance level. It is also equal to the critical
value of U for a 0.01 significance level.

This means that there is a highly significant difference (p ≤ 0.01) between the ratings given to each brand.
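The steps above can be sketched in Python. The rating values below are hypothetical, chosen only so that they reproduce the rank totals T1 = 23 and T2 = 55 of Table 8.6 (and hence U = 2).

```python
# Mann-Whitney U, following steps 1-4 in the text.
def mann_whitney_u(sample1, sample2):
    combined = sorted(sample1 + sample2)
    # Average rank for each value; tied values share the mean of their ranks.
    def rank(v):
        first = combined.index(v) + 1
        count = combined.count(v)
        return first + (count - 1) / 2
    t1 = sum(rank(v) for v in sample1)   # step 2: rank total for group 1
    t2 = sum(rank(v) for v in sample2)   # step 2: rank total for group 2
    n1, n2 = len(sample1), len(sample2)
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - t1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - t2
    return min(u1, u2), t1, t2           # the smaller U is compared to the table

# Hypothetical ratings for Brand X and Brand Y:
u, t1, t2 = mann_whitney_u([2, 3, 1, 5, 1, 4], [8, 6, 4, 9, 5, 7])
print(u, t1, t2)  # 2.0 23.0 55.0
```

The obtained U is then compared against the tabulated critical value; significance requires U to be at or below that value.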

8.3.1 How to Perform Mann-Whitney Test in SPSS?

The Mann-Whitney test can easily be performed in the SPSS software. The following series of steps needs to be
performed to calculate the Mann-Whitney test.
Step 1:

Go to Analyze > Non-parametric Tests > Legacy Dialogues > 2 Independent Samples … on the top menu, as
shown in Fig. 8.3.

Fig. 8.3 Step 1 – SPSS

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

Step 2:

Now a dialogue box of two independent samples tests appears, as shown in Fig. 8.4.

Fig. 8.4 Step 2 – Independent sample test

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014
Now select the two datasets that need to be compared, make sure that the Mann-Whitney test is ticked, and
click OK. This will generate the output for the Mann-Whitney U test.

8.4 KRUSKAL–WALLIS TEST

The Kruskal-Wallis one-way analysis of variance by ranks was named after William Kruskal and W. Allen
Wallis. It is a non-parametric method for testing whether samples originate from the same distribution. It is used
for comparing two or more samples that are independent, and that may have different sample sizes, and extends
the Mann-Whitney U test to more than two groups. The parametric equivalent of the Kruskal-Wallis test is the
one-way analysis of variance (ANOVA).

To conduct the Kruskal-Wallis test, using the K independent samples procedure, cases must have scores on an
independent or grouping variable and on a dependent variable. The independent or grouping variable divides
individuals into two or more groups, and the dependent variable assesses individuals on at least an ordinal scale.

Assumptions

Because the analysis for the Kruskal-Wallis test is conducted on ranked scores, the population distributions for
the test variable do not have to be of any particular form. However, these distributions should be continuous and
have identical form.

1. The continuous distributions for the test variable are exactly the same (except for their medians) for the
different populations.
2. The cases represent random samples from the populations, and scores on the test variable are independent
of each other.
3. The chi-square statistic for the Kruskal-Wallis test is only approximate and becomes more accurate with
larger sample sizes.

8.4.1 Step-by-Step Kruskal-Wallis Test

This test is appropriate for use under the following circumstances:

1. When we have three or more conditions that we want to compare.


2. When each condition is performed by a different group of participants; i.e., we have an independent-
measures design with three or more conditions.
3. When the data do not meet the requirements for a parametric test.

Consider a situation, where four groups of students were randomly assigned to be taught with four different
techniques, and their achievement test scores were recorded (Table 8.7). Are the distributions of test scores the
same, or do they differ?

Table 8.7 Database of four students and their scores

Step 1:
Rank all of the scores, ignoring which group they belong to. The procedure for ranking is as follows: the lowest
score gets the lowest rank. If two or more scores are the same, then they are “tied”. Tied scores get the average
of the ranks that they would have obtained had they not been tied. Here are the scores again, now with their
ranks in brackets (Table 8.8).

Table 8.8 Database with rank

Find “Tc”, the total of the ranks for each group. Just add together all of the ranks for each group in turn.

Step 2:

Calculate the test statistic H using the formula

H = [12 / (N(N + 1))] × Σ (Ti² / ni) − 3(N + 1)

where N is the total number of participants, Ti is the rank total for each group and ni is the number of participants
in group i. Thus, in our problem,

Step 3:

The degrees of freedom are the number of groups minus one. In this problem, we have four groups, and so we
have three degrees of freedom. Assessing the significance of H depends on the number of participants and the
number of groups. We now use the table of chi-square values to find the significance of H. From the table, the
rejection region is described as: for a right-tailed chi-square test with α = 0.05 and df = 4 − 1 = 3, reject H0 if H ≥
7.81. Thus, we conclude that there is sufficient evidence to indicate that there is a difference in test
scores for the four teaching techniques.
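The H calculation of steps 1 and 2 can be sketched as below. The four groups of test scores are hypothetical, not those of Table 8.7.

```python
# Kruskal-Wallis H statistic, following steps 1-2 in the text.
def kruskal_wallis_h(groups):
    combined = sorted(x for g in groups for x in g)
    n = len(combined)
    def rank(v):  # average rank; tied values share the mean of their ranks
        first = combined.index(v) + 1
        return first + (combined.count(v) - 1) / 2
    rank_totals = [sum(rank(v) for v in g) for g in groups]  # Tc per group
    return (12 / (n * (n + 1))
            * sum(t ** 2 / len(g) for t, g in zip(rank_totals, groups))
            - 3 * (n + 1))

# Hypothetical scores for four teaching techniques:
groups = [[65, 87, 73, 79], [75, 69, 83, 81], [59, 78, 67, 62], [94, 89, 80, 88]]
h = kruskal_wallis_h(groups)
print(round(h, 3))  # 8.956
# df = 4 groups - 1 = 3; reject H0 at alpha = 0.05 if H >= 7.81
```

Since H exceeds 7.81 here, these hypothetical data would also lead us to reject the null hypothesis of identical distributions.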

8.4.2 Steps for Kruskal-Wallis Test in SPSS

1. Open the dataset in SPSS to be used for the Kruskal-Wallis test analysis.
2. Click Analyze, click (mouse over) Non-parametric Tests, Legacy Dialogues and then click K Independent-
Samples as shown in Fig. 8.5.
Fig. 8.5 Kruskal-Wallis in SPSS 1

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

You should now be in the Tests for Several Independent Samples dialogue box. Click on the test variable, and
click the arrow to move it to the Test Variable List: box. Click on the grouping variable, and click the arrow
to move it to the Grouping Variable: box. Click Define Range and then Continue. Click Options under Statistics,
select Descriptive, and click Continue (Fig. 8.6).

Fig. 8.6 Kruskal-Wallis in SPSS 2

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

3. Make sure that Kruskal-Wallis H is checked in the Test Type area. Click Ok, and now it is ready to
analyze the output.

A large amount of resources is required to compute exact probabilities for the Kruskal-Wallis test. Existing
software only provides exact probabilities for sample sizes less than about 30 participants. These software
programs rely on asymptotic approximation for larger sample sizes.

8.5 CHI-SQUARE TEST

A chi-squared test is a statistical hypothesis test in which the sampling distribution of the test statistic is a chi-
squared distribution when the null hypothesis is true. The chi-square is used to test hypotheses about the
distribution of observations into categories. The null hypothesis (H0) is that the observed frequencies are the
same (except for chance variation) as the expected frequencies. If the frequencies observed are different from
expected frequencies, the value of chi-square goes up. If the observed and expected frequencies are exactly the
same, then chi-square equals zero ( χ2 = 0).

In the chi-square test, we test whether a given χ² value is statistically significant by testing it against a table of
chi-square distributions, according to the number of degrees of freedom for our sample.

Conducting chi-square analysis

1. Make a hypothesis based on your basic question


2. Determine the expected frequencies
3. Create a table with observed frequencies, expected frequencies and chi-square values using the formula
χ² = Σ (O − E)² / E
4. Find the degrees of freedom


5. Find the chi-square statistic in the chi-square distribution table
6. If the calculated chi-square statistic ≥ the critical value from the table, reject the null hypothesis; otherwise, do not reject it

Assumptions

The chi-square test for independence, also called Pearson’s chi-square test or the chi-square test of association, is
used to discover if there is a relationship between two categorical variables. When we choose to analyze the data
using a chi-square test for independence, we need to make sure that the data we want to analyze “passes” two
assumptions.

1. The two variables should be measured at an ordinal or nominal level (i.e., categorical data).
2. The two variables should consist of two or more categorical, independent groups. Examples of
independent variables that meet this criterion include gender (2 groups: Males and Females), profession (5
groups: surgeon, doctor, nurse, dentist, therapist) and so on.

8.5.1 Test Procedure

The test procedure consists of four steps: (1) state the hypotheses, (2) formulate an analysis plan, (3) analysis of
sample data and (4) interpret results.

Stating the hypothesis

Suppose, variable A has r levels, and variable B has c levels. The null hypothesis states that knowing the level of
variable A does not help predict the level of variable B. That is, the variables are independent.

Analysis plan

The analysis plan describes how to use sample data to accept or reject the null hypothesis. The plan should
specify significance level and test method.
Analyze sample data

Using sample data, find the degrees of freedom, expected frequencies, test statistic and the P-value associated
with the test statistic.

Expected frequencies: The expected frequency counts are computed separately for each level of one categorical
variable at each level of the other categorical variable. Compute r × c expected frequencies according to the
following formula,

Er,c = (nr × nc) / n

where Er,c is the expected frequency count for level r of variable A and level c of variable B, nr is the total
number of sample observations at level r of variable A, nc is the total
number of sample observations at level c of variable B, and n is the total sample size.

Test statistic: The test statistic is a chi-square random variable (χ²) defined by the following equation,

χ² = Σ [(Or,c − Er,c)² / Er,c]

where Or,c is the observed frequency and Er,c is the expected frequency for each cell.

P-value: The P-value is the probability of observing a sample statistic as extreme as the test statistic. Since the
test statistic is a chi-square, use the Chi-Square Distribution Calculator to assess the probability associated with
the test statistic.

Interpret results

If the sample findings are unlikely, given the null hypothesis, we reject the null hypothesis. Typically, this
involves comparing the P-value to the significance level, and rejecting the null hypothesis when the P-value is
less than the significance level.

8.5.2 Chi-Square Test Procedure in SPSS Statistics

Step 1:

Click Analyze > Descriptive Statistics > Cross-tabs … on the top menu, as shown in Fig. 8.7.

Fig. 8.7 Chi-square in SPSS

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

Now a cross-tab dialogue box appears, transfer one of the variables into the Row(s): box and the other variable
into the Column(s): box (Fig. 8.8).
Fig. 8.8 Chi-square in SPSS

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

Step 2:

Now click on the Statistics button. You will be presented with the following Crosstabs: Statistics dialogue
box (Fig. 8.9):

Fig. 8.9 Chi-square in SPSS

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

Select the chi-square and click continue.

Step 3:

Click the Cells button. You will be presented with the Crosstabs: Cell Display dialogue box. Select
Observed from the Counts area, and Row, Column and Total from the Percentages area, as shown in Fig. 8.10,
and then click Continue. This generates the output (Fig. 8.10).
Fig. 8.10 Chi-square in SPSS

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

8.5.3 Example – Chi-Square Test

Consider the example of the type of job (full time, part time, no job) that males and females do. Table 8.9 shows
the initial record.

Table 8.9 Chi-square example database

Step 1:

Add the numbers across columns and rows and calculate the total number in the chart. Now calculate the expected
number for each individual cell. For example, using the first cell in Table 8.9, the expected number is,

Step 2:

Now modify the table by entering the expected and observed values (Table 8.10).

Step 3:

Now calculate chi-square using the following formula,


Table 8.10 Chi-square modified database

So, for cell 1, we have,

Continue doing this for the rest of the cells, and add the final numbers for all cells together to obtain the final chi-
square number (final number = 0.0952). Now calculate the degrees of freedom,

(Number of rows − 1) × (Number of columns − 1) = 2 df.

Step 4:

At the 0.05 significance level, with 2 df, the critical value in the chi-square lookup chart is 5.99. Therefore, in
order to reject the null hypothesis, the final chi-square answer must be greater than or equal to 5.99. The chi-
square value found was 0.0952. This number is less than 5.99, so we fail to reject the null hypothesis.
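The four steps above can be sketched in Python. The 2 × 3 gender-by-job table below is hypothetical (the counts are not those of Table 8.9), but the procedure is the same: expected frequencies from row and column totals, then the chi-square sum, then the degrees of freedom.

```python
# Chi-square test of independence on a hypothetical 2 x 3 contingency table.
observed = [[30, 10, 10],   # male:   full time, part time, no job
            [20, 20, 10]]   # female: full time, part time, no job

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for r, row in enumerate(observed):
    for c, o in enumerate(row):
        e = row_totals[r] * col_totals[c] / n   # expected frequency
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_square, 3), df)  # 5.333 2
# Critical value at alpha = 0.05 with df = 2 is 5.99; since 5.333 < 5.99,
# we fail to reject the null hypothesis of independence.
```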

8.6 MULTI-VARIATE ANALYSIS

Multivariate Analysis (MVA) is based on the statistical principle of multivariate statistics, which involves
observation and analysis of more than one statistical outcome variable at a time. In design and analysis, the
technique is used to perform trade studies across multiple dimensions while taking into account the effects of all
variables on the responses of interest. Uses for multivariate analysis include the following:

Design for capability (also known as capability-based design)


Inverse design, where any variable can be treated as an independent variable
Analysis of Alternatives (AoA), the selection of concepts to fulfill a customer’s need
Analysis of concepts with respect to changing scenarios
Identification of critical design drivers and correlations across hierarchical levels

The one-way multivariate analysis of variance (one-way MANOVA) is used to determine whether there are any
differences between independent groups on more than one continuous dependent variable. In this regard, it
differs from a one-way ANOVA, which only measures one dependent variable.

Assumptions while working with multivariate ANOVA

When you choose to analyze your data using a one-way MANOVA, part of the process involves checking to
make sure that the data you want to analyze can actually be analyzed using a one-way MANOVA. You need to
do this because it is only appropriate to use a one-way MANOVA if your data “passes” six assumptions that are
required for a one-way MANOVA to give you a valid result.

1. Two or more dependent variables should be measured at the interval or ratio level (i.e., they are
continuous).
2. Independent variable should consist of two or more categorical, independent groups.
3. It should have an independence of observations, which means that there is no relationship between the
observations in each group or between the groups themselves.
4. It should have an adequate sample size. The larger the sample size, the better for MANOVA; at a minimum,
MANOVA needs more cases in each group than the number of dependent variables.
5. There should be no univariate or multivariate outliers.
6. It is required to have a linear relationship between each pair of dependent variables for each group of the
independent variable. If the variables are not linearly related, the power of the test is reduced.

8.6.1 Test Procedure in SPSS-MANOVA

Step 1:

Click Analyze > General Linear Model > Multivariate … on the top menu as shown in Fig. 8.11.

Fig. 8.11 MANOVA in SPSS

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

You will be presented with the Multivariate dialogue box. Transfer the independent variable into the Fixed Factor(s)
box and the dependent variables into the Dependent Variables box. We can do this by dragging and dropping
the variables into their respective boxes or by using the SPSS Right Arrow button.
Fig. 8.12 MANOVA in SPSS

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

Click on the SPSS Plots button. This presents the Multivariate: Profile Plots dialogue box, which helps in adding
factors to the axes and plots. Clicking the Continue button returns you to the Multivariate dialogue box.

Step 2:

Click the SPSS Post-Hoc button. You will be presented with the Multivariate: Post Hoc Multiple Comparisons for
Observed dialogue box, as shown in Figs. 8.13 and 8.14.

Fig. 8.13 MANOVA in SPSS

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

Transfer the independent variables into the Post Hoc Tests for: box and select the Tukey checkbox in the Equal
Variances Assumed area. Click the SPSS Continue button and go to options. This will present you with the
Multivariate: Options dialogue box, as shown in Fig. 8.14.

Fig. 8.14 MANOVA in SPSS

Source: SPSS, Basev24, Statistical Package and Interface Package for SPSS, © 2014

Transfer the independent variables from the Factor(s) and Factor Interactions box into the Display Means for
box. Select the Descriptive statistics, Estimates of effect size and Observed power check boxes in the Display
area. Click the Continue button and select OK. This generates the output.

EXERCISES

1. The pupils at a high school come from three different primary schools. The head teacher wanted to know
whether there were academic differences between the pupils from the three different primary schools. As
such, she randomly selected 20 pupils from School A, 20 pupils from School B and 20 pupils from School
C, and measured their academic performance as assessed by the marks they received for their end-of-year
English and Maths exams. Therefore, the two dependent variables were “English score” and “Maths
score”, whilst the independent variable was “School”, which consisted of three categories: “School A”,
“School B” and “School C”. Which test procedure will be used and why? Also find the validity of the null
hypothesis.
2. Define and differentiate parametric and non-parametric tests.
3. Explain analysis of variance.
4. How does one-way ANOVA differ from two-way ANOVA?
5. What are the basic requirements for conducting the Mann-Whitney test?
6. In detail, explain the various assumptions that need to be taken care of while conducting the Kruskal-Wallis test.
7. Describe the step-by-step procedure for carrying out the Chi-Square test.
8. In detail, explain the various assumptions that need to be taken care of while conducting the Chi-Square test.
9. Explain multivariate analysis.
10. How is MANOVA different from ANOVA?
11. When should a researcher adopt parametric test method?
12. In detail, explain the various assumptions that need to be taken care of while conducting ANOVA.
13. Explain residual error.
14. Use an appropriate non-parametric procedure to test the null hypothesis that the following sample of size n
= 5 has been drawn from a normal distribution with mean 100 and standard deviation 10. Use α =
0.05.

93 97 102 103 105

15. For analyzing nominal data, which non-parametric statistic will you use? Why?
chapter 9
MODELS FOR SCIENCE AND BUSINESS
Objectives:

After completing this chapter, you will understand the following:

The definition of algorithmic research and analysis and design of algorithms
Various methods of scientific research
The description of modelling and steps involved in modelling
The definition of simulations and description of types and tools of simulation
The description of industrial research

Scientific research is a broad term which specifies data science and other related analytic research streams. It can
be defined as a systematic, controlled, empirical and critical investigation of hypothetical propositions about the
presumed relations among observed phenomena. Scientific research can be either pure or applied. Pure research
explains the world around us and tries to make us understand how the universe operates. It is about finding out
what is already there without any greater purpose of research than the explanation itself. Applied research might
look for answers to specific questions that help humanity, like medical research or environmental studies. Such
research generally takes a specific question and tries to find a definitive and comprehensive answer. A scientific
model is one which aims to make a particular part or feature of the world easier to understand, define or simulate
by referencing it to existing or commonly accepted knowledge. For this process, we need to select the attributes,
identify the relevance in a real-life situation and try to model it using a strong mathematical or theoretical
support. The limitations of scientific modelling are emphasized by the fact that models generally are not
complete representations. In the attempt to fully understand an object or system, multiple models, each
representing a part of the object or system, are needed. Collectively the models are able to provide more
complete representation, or at least a more complete understanding, of the real object or system. Business
models are abstract representations of an organization; they can be conceptual, textual and/or graphical. A
business model thus describes the rationale of how an organization creates, delivers and captures value, in
economic, social, cultural or other contexts. The process of business model construction is a part of business
strategy. This chapter discusses the basic methods/strategies involved in scientific research modelling and
business modelling.

9.1 ALGORITHMIC RESEARCH

An algorithm is a step-by-step procedure to solve a problem. Formally, it is defined as a set of rules that
precisely defines a sequence of operations. Starting from the initial state, the algorithm guides the user
to a solution by performing a finite number of well-defined successive steps. Representations of algorithms are classed into
three accepted levels: high-level description, implementation description and formal description.

A methodology of problem solving that builds on the basic concept of an algorithm is termed algorithmic research. It is
a very straightforward method in which a well-defined sequence of steps is provided to solve organizational
problems. This type of research is applicable to government, business and any corporate industry. A variety of
problems, whether polynomial or combinatorial, can be solved by algorithmic research. For problems in the polynomial category,
researchers develop a proper algorithm for the optimal solution, whereas a heuristic approach is chosen to solve the
problem in other cases.
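The exact-versus-heuristic distinction can be illustrated with a toy coin-change problem (the denominations and amount are invented for illustration): a dynamic-programming search guarantees the optimal answer, while a greedy heuristic is faster but can miss it.

```python
# Minimum number of coins needed to make `amount`:
# exact dynamic programming vs. a greedy heuristic.

def exact_min_coins(amount, denoms):
    """Dynamic programming: examines all sub-amounts, so the result is optimal."""
    best = [0] + [float("inf")] * amount
    for a in range(1, amount + 1):
        for d in denoms:
            if d <= a:
                best[a] = min(best[a], best[a - d] + 1)
    return best[amount]

def greedy_min_coins(amount, denoms):
    """Heuristic: always take the largest coin that fits; fast but not always optimal."""
    count = 0
    for d in sorted(denoms, reverse=True):
        count += amount // d
        amount %= d
    return count

denoms = [1, 3, 4]  # denominations chosen so the greedy answer is suboptimal
print(exact_min_coins(6, denoms))   # 2 (3 + 3)
print(greedy_min_coins(6, denoms))  # 3 (4 + 1 + 1)
```

For many combinatorial problems the exhaustive search becomes too expensive, which is exactly when the heuristic route described above is chosen.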

Algorithmic research is carried out when problems are precisely stated and are often generic rather than
application specific. Various computational problems such as searching, sorting, shortest path, branch and bound
techniques are solved using algorithmic methodology. Figure 9.1 shows the different types of algorithmic
research problems.
Fig. 9.1 Types of algorithmic research problem

Consider an example from algorithmic research: temperature, weight and time are usually well known and
defined, with only the exact scale needing definition. If a researcher measures abstract concepts, such as
intelligence, emotions and subjective responses, then a system of measuring them numerically needs to be established,
allowing statistical analysis and replication. So there should be some accurate method by which these abstract
terms are mapped to physical concepts, and a well-defined and structured process must be explained
and conceptualized. For the study of such well-specified problems, algorithmic research is the approach most
commonly used.

There are various advantages of algorithmic research. One big reason algorithmic research has become
so popular is the advantages it holds over manual decision making. The advantages fall in areas
related to speed, accuracy and reduced costs. Since algorithms are written beforehand and are executed
automatically, the main advantage is speed. The speed at which these trades are made is measured in fractions of
a second, faster than humans can perceive.

Trading with algorithms has the advantage of scanning and executing on multiple indicators at a speed that no
human could do. Since trades can be analyzed and executed faster, more opportunities are available at better
prices. Another advantage of algorithmic trading is accuracy. If a computer is automatically executing a trade,
you get to avoid the pitfalls of accidentally putting in the wrong trade associated with human trades. With
manual entries, it is much more likely to buy the wrong currency pair, or for the wrong amount, compared to a
computer algorithm that has been double checked to make sure that the correct order is entered.

Another advantage of algorithmic research is the ability to backtrack. As each and every step is explained and
well defined, the areas where an error occurred are easily found. The algorithms are reusable to design new
solutions to another set of problems. Stronger rules can be set after a critical analysis of those problems.

Another advantage of automated trading is the reduced transaction costs. With auto trading, traders do not have
to spend as much time for monitoring the markets, as trades can be executed without their continuous
supervision. The dramatic time reduction for trading lowers transaction costs because of the saved opportunity
cost of constantly monitoring the markets.

9.1.1 Analysis of Algorithm

Analysis of algorithms is the determination of the amount of resources (such as time and storage) necessary to
execute them. Most algorithms are designed to work with inputs of arbitrary length. Usually, the efficiency or
running time of an algorithm is stated as a function relating the input length to the number of steps (time
complexity) or storage locations (space complexity). But the real question is, How can these attributes be
calculated? In theoretical analysis of algorithms, complexities are determined in the asymptotic sense, i.e., to
estimate the complexity function for a large input. Big O notation, Big-omega notation and Big-theta notation
are used for this purpose.

Big O notation is a mathematical notation that describes the limiting behaviour of a function when the argument
tends towards a particular value or infinity.

Let f and g be two functions defined on some subset of numbers,

f(n) = O(g(n)) as n → ∞, if and only if there is a positive constant M such that for all sufficiently large values of
n, the absolute value of f(n) is at most M multiplied by the absolute value of g(n). That is, f(n) = O(g(n)) if and
only if there exist a positive real number M and a real number n0 such that

| f(n) | ≤ M | g(n) |, for all n ≥ n0

Figure 9.2 shows the Big O function graph.

What is the importance of analysis? If we want to compare two different algorithms that perform the same
task, there need to be some differentiators. The time and space complexities are the major differentiators that a
programmer uses to judge the quality of an algorithm. For example, if we are looking for an algorithm for
sorting n numbers, there are many algorithms available – selection sort, bubble sort, quick sort, heap sort, etc.
But which algorithm will we use? Here the programmer checks the best, average and worst time complexities of
these algorithms, compares them, and considers for which type of input a particular sorting algorithm works well.
Depending on that, he/she chooses the algorithm.

Fig. 9.2 Big O function
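The idea of counting steps can be made concrete empirically (a rough sketch only; formal analysis uses asymptotic bounds, not instruction counts). Selection sort always performs exactly n(n − 1)/2 comparisons, matching its O(n²) time complexity:

```python
import random

def selection_sort_comparisons(data):
    """Sort a copy of `data` with selection sort, counting element comparisons.

    The inner loop compares every remaining pair once, so the count is
    always n(n-1)/2 -- the concrete face of the O(n^2) time complexity."""
    a, comparisons = list(data), 0
    for i in range(len(a)):
        smallest = i
        for j in range(i + 1, len(a)):
            comparisons += 1
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return a, comparisons

random.seed(0)
data = [random.randint(0, 999) for _ in range(100)]
sorted_data, comps = selection_sort_comparisons(data)

print(sorted_data == sorted(data))  # True: agrees with the built-in sort
print(comps)                        # 4950 = 100 * 99 / 2 comparisons for n = 100
```

Doubling n to 200 would quadruple the count to 19,900, which is why, for large inputs, the programmer would prefer an O(n log n) algorithm such as heap sort or merge sort.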

9.1.2 Design of Algorithms

Algorithm design is a specific method to create a mathematical process in solving problems. It is identified and
incorporated into many solution theories of operation research, such as dynamic programming and divide-and-
conquer. In this section, let us look how to design an algorithm.

Designing the right algorithm for a given application is a difficult job. It requires a major creative act: taking a
problem and pulling a solution out of it. The key to algorithm design is to proceed by asking yourself a sequence
of questions to guide your thought process. What if we do this? What if we do that? While working in a group or
as a team, make sure you do enough brainstorming sessions to come up with the best answers to all your
questions. After this brainstorming, there will be enough directions to guide how to move forward. The strategy
and the tactics used for algorithm design need to be dealt with very carefully. Should we use a bottom-up or top-down
strategy? Should we incorporate dynamic programming? All these questions need to find answers during this
brainstorming.

An algorithm developed for any application needs to have three major qualities. The prime need is that the algorithm
should deliver accurate results. The next major feature the designer needs to focus on is the amount of
time the algorithm takes to finish the process. Memory consumption is also a major focus; the best
algorithm is the one that delivers accurate results in the least time while utilizing the least memory.

The basic questions that a designer has to answer before starting algorithm design are given as follows.

1. Have you understood the correct goal?
What are the inputs?
What are the desired outputs?
Is there any intermediate result?
Is it generic or specific?
What are the time constraints?
Where is it applied?
2. Is there a simpler or heuristic solution available?
Is there any other recursive method to make my solution simpler?
Can I make my logic modular?
3. Spot the special cases.
What are the deviations possible?
How well will the algorithm work on extreme inputs?
What is the maximum load possible?
4. Are there any standard methods available for the problem I am working with?
If I split the problem, are there methods already defined to solve the subproblems?
What are the general rules in solving generic algorithms?

If you start analyzing such a set of questions before designing an algorithm, the process will be smooth and an
efficient algorithm will be obtained.

9.2 METHODS OF SCIENTIFIC RESEARCH

The scientific method is a way to ask and answer scientific questions by making observations and doing
experiments. The following are the steps of the scientific method:

Ask a question
Do background research
Construct a hypothesis
Test the hypothesis by doing an experiment
Analyze the data and draw a conclusion
Communicate the results

It is important for the experiment to be a fair test. A “fair test” occurs when you change only one factor
(variable) and keep all other conditions the same. The scientific method is a process for experimentation that is
used to explore observations and answer questions. Scientists use the scientific method to search for cause and
effect relationships in nature. In other words, they design an experiment so that changes to one item cause
something else to vary in a predictable way. Just as it does for a professional scientist, the scientific method will
help to focus on a project question, construct a hypothesis, and design, execute and evaluate the experiment.

Figure 9.3 shows the steps of scientific research.

Ask a question
The scientific method starts when a question is asked about an observation: How, What, When, Who, Which, Why
or Where? In order for the scientific method to answer a question, the question must be about something that is
measurable, preferably with a number.

Do background research

Rather than starting from scratch in putting together a plan for answering a question, a savvy
scientist uses library and Internet research to find the best way to do things and to ensure that mistakes
from the past are not repeated.

Construct a hypothesis

A hypothesis is an educated guess about how things work:

Fig. 9.3 Steps of scientific research

“If …. [I do this] …., then …. [this] …. will happen.”

You must state the hypothesis in a way that can be easily measured, and of course, the hypothesis should be
constructed in a way that helps answer the original question.

Test the hypothesis by doing an experiment

The experiment tests whether the hypothesis is supported or not. It is important for the experiment to be a fair
test. Conduct a fair test by making sure that you change only one factor at a time while keeping all other
conditions the same. You should also repeat the experiments several times to make sure that the first results were
not just an accident.
Analyze the data and draw a conclusion

Once the experiment is complete, collect the measurements and analyze them to see if they support the
hypothesis or not. Scientists often find that their hypothesis was not supported, and in such cases they will
construct a new hypothesis based on the information they learned during their experiment. This starts the entire
process of the scientific method over again. Even if they find that their hypothesis was supported, they may want
to test it again in a new way.

Communicate the results

To complete the research process, it is essential to communicate the results to others in a final report and/or a
display board. Professional scientists do almost exactly the same thing by publishing their final report in a
scientific journal or by presenting their results on a poster in a scientific meeting.

The whole process is collaborative and is conducted in a clearly documented manner to help other scientists who
are doing research in the same field. Throughout history, there are instances where scientists have stopped their
research before completing all the steps of the scientific method, only to have the enquiry taken up and solved
by another scientist interested in answering the same question.

The basic correlation of a real-life data collection to scientific method is shown in Fig. 9.4.

Fig. 9.4 Hourglass model

9.3 MODELLING

Scientific modelling is a scientific activity, the aim of which is to make a particular part or feature of the world
easier to understand, define, quantify, visualize or simulate by referencing it to existing and usually commonly
accepted knowledge. It requires selecting and identifying relevant aspects of a situation in the real world and
then using different types of models for different aims, such as conceptual models to better understand,
operational models to operationalize, mathematical models to quantify and graphical models to visualize the
subject. Modelling is an essential and inseparable part of scientific activity, and many scientific disciplines have
their own ideas about specific types of modelling.

Models are typically used when it is either impossible or impractical to create experimental conditions in which
scientists can directly measure outcomes. Direct measurement of outcomes under controlled conditions will
always be more reliable than modelled estimates of outcomes.

1. Simulation: A simulation is the implementation of a model. A steady-state simulation provides
information about the system at a specific instant in time. A dynamic simulation provides information
over time. A simulation brings a model to life and shows how a particular object or phenomenon will
behave. Such a simulation can be useful for testing, analyzing or training in those cases where real-world
systems or concepts can be represented by models.
2. Structure: Structure is a fundamental and sometimes intangible notion covering the recognition,
observation, nature, and stability of patterns and relationships of entities. From a child’s verbal description
of a snowflake, to the detailed scientific analysis of the properties of magnetic fields, the concept of
structure is an essential foundation of nearly every mode of enquiry and discovery in science, philosophy
and art.
3. Systems: A system is a set of interacting or interdependent entities, real or abstract, forming an integrated
representation. In general, a system is a construct or collection of different elements that together can
produce results not obtainable by the elements alone. The concept of an “integrated whole” can also be
stated in terms of system embodying a set of relationships which are differentiated from relationships of
the set to other elements and from the relationships between an element of the set and elements not a part
of the relational regime. There are two types of system models: (1) discrete in which the variables change
instantaneously at separate points in time and (2) continuous where the state variables change
continuously with respect to time.

9.3.1 Steps in Modelling

The basic steps of the model-building process are as follows:

1. Model selection
2. Model fitting
3. Model validation

These three basic steps are used iteratively until an appropriate model for the data has been developed. In the
model selection step, plots of the data, process knowledge and assumptions about the process are used to
determine the form of the model to be fit to the data. Then, using the selected model and possibly information
about the data, an appropriate model-fitting method is used to estimate the unknown parameters in the model.
When the parameter estimates have been made, the model is then carefully assessed to see if the underlying
assumptions of the analysis appear plausible. If the assumptions seem valid, the model can be used to answer the
scientific or engineering questions that prompted the modelling effort. If the model validation identifies
problems with the current model, however, the modelling process is repeated using information from the model
validation step to select and/or fit an improved model.
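The select-fit-validate loop can be sketched for the simplest case, a straight-line model fitted by ordinary least squares. The dataset and the residual tolerance below are invented for illustration:

```python
# Model selection: assume the form y = a + b*x.
# Model fitting: estimate a and b by ordinary least squares.
# Model validation: check that the residuals are small.

def fit_line(xs, ys):
    """Estimate intercept a and slope b by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # invented data, roughly y = 2x

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Validation: if the residuals are small and patternless, accept the model;
# otherwise return to the selection step with a richer model form.
max_resid = max(abs(r) for r in residuals)
print(round(a, 2), round(b, 2), max_resid < 0.5)
```

Here the fitted slope is close to 2 and every residual is small, so the assumed linear form is accepted; had the residuals shown a systematic curve, the loop would repeat with, say, a quadratic model.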

The three basic steps of process modelling described above assume that the data have already been collected and
that the same dataset can be used to fit all of the candidate models. Although this is often the case in model-
building situations, one variation on the basic model-building sequence comes up when additional data are
needed to fit a newly hypothesized model based on a model fit to the initial data. In this case, two additional
steps, experimental design and data collection, can be added to the basic sequence between model selection and
model fitting. The flow chart shown in Fig. 9.5 gives basic model-fitting sequence with the integration of the
related data collection steps into the model-building process.
Fig. 9.5 Model-fitting sequence

9.3.2 Research Models

Various business research companies use different types of tools, techniques and methods for analysis and
building models. These research methods and types vastly depend on the particular requirements of their
research project. However, the following are the basic types of industry analysis that are commonly used by
various study firms across the globe.

1. Quantitative analysis: This method deals with collecting all the objective and numerical data from
various resources. Various statistical models and formulae help the experts collect market data about the
features. The questionnaire is the basic tool; it provides adequate information about customer behaviour and
their approach towards a particular product or a company. Compiling a complete statistical investigation is
the basic aim of quantitative analysis. Hence, the questions are also objective in nature, drawing yes or no
responses from the customers chosen for the tests.
2. Qualitative analysis: Qualitative analysis is exactly opposite to quantitative study. It is thoroughly
subjective and deals with the market data, which can be stored in the form of words and visual
presentations. In this study, experts observe customers’ record and analyze their responses in the form of
answers and queries. These responses include various answers to open-ended questions, their overall
behaviour and results of various tests performed on them. Qualitative examination mainly depends on case
studies, which help the experts to collect the required market data.
3. Observations: This is the basic weapon at the disposal of the experts undertaking the projects.
Observational research plays a vital role in collecting crucial market data, which all the other methods
cannot do. It is the method of collecting valuable data without any interference or inputs from the experts.
They plainly observe the customers and their behaviour and then carry out reports based on these
observations. They record the feedback and complaints of the customers based on their preferences and
suggest improvements.
4. Experiments: Business analysis based on experiments helps the researchers to change the set parameters
and observe the results according to these changes. These projects generally take place in dedicated
laboratories, but can also be performed at other places. This technique helps the experts understand
various aspects and conditions that affect the behaviour of target customers.
5. Basic research: Basic study concentrates on collecting all the basic things that are crucial yet unknown
for the business or product.
6. Applied research: This study helps in understanding of answers to all crucial issues and problems
troubling the business.
7. Developmental research: This study is very similar to applied analysis. However, it is mainly focused on
using known solutions for product improvements and new business ventures.

9.4 SIMULATIONS

Simulation is imitation of the operation of a real-world process or system over time. The act of simulating
something first requires a model development. This model represents the key characteristics or
behaviour/functions of the selected physical or abstract system or process. The model represents the system
itself, whereas the simulation represents the operation of the system over time.

Simulation is used in many contexts, such as simulation of technology for performance optimization, safety
engineering, testing, training, education and video games. Often, computer experiments are used to study
simulation models. Simulation is also used with scientific modelling of natural systems or human systems to
gain insight into their functioning. Simulation can be used to show the eventual real effects of alternative
conditions and courses of action. Simulation is also used when the real system cannot be engaged, because it
may not be accessible, or it may be dangerous or unacceptable to engage, or it is being designed but not yet
built, or it may simply not exist.

Key issues in simulation include acquisition of valid source information about the relevant selection of key
characteristics and behaviour, the use of simplifying approximations and assumptions within the simulation, and
fidelity and validity of the simulation outcomes. Procedures and protocols for model verification and validation
are an ongoing field of academic study, refinement, research and development in simulations technology or
practice, particularly in the field of computer simulation.

A computer simulation (or “sim”) is an attempt to model a real-life or hypothetical situation on a computer so
that it can be studied to see how the system works. By changing variables in the simulation, predictions may be
made about the behaviour of the system. It is a tool to virtually investigate the behaviour of the system under
study.

Traditionally, the formal modelling of systems has been via a mathematical model, which attempts to find
analytical solutions enabling the prediction of the behaviour of the system from a set of parameters and initial
conditions. Computer simulation is often used as an adjunct to, or substitution for, modelling systems in which
simple closed form analytic solutions are not possible. There are many different types of computer simulation;
the common feature they share is to generate a sample of representative scenarios for a model in which a
complete enumeration of all possible states would be prohibitive or impossible.
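A tiny computer simulation in exactly this spirit is the Monte Carlo estimation of π: enumerating every point in the unit square is impossible, so a representative random sample is generated instead (the sample size and seed below are arbitrary choices):

```python
import random

def estimate_pi(samples, seed=42):
    """Monte Carlo simulation: the fraction of random points in the unit
    square that fall inside the quarter circle approximates pi / 4."""
    rng = random.Random(seed)   # fixed seed makes the run reproducible
    inside = sum(
        1 for _ in range(samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4 * inside / samples

# Varying the sample size (a simulation variable) changes the precision
# of the prediction, just as changing variables in any simulation does.
print(estimate_pi(100_000))
```

With 100,000 samples the estimate typically lands within a few hundredths of π; increasing the sample count tightens it further, illustrating how predictions improve as the simulated sample becomes more representative.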

Medical simulators are increasingly being developed and deployed to teach therapeutic and diagnostic
procedures as well as medical concepts and decision making to personnel in the health professions. Simulators
have been developed for training procedures ranging from the basics such as blood draw, to laparoscopic surgery
and trauma care. They are also important to help on prototyping new devices for biomedical engineering
problems. Currently, simulators are applied to research and develop tools for new therapies, treatments and early
diagnosis in medicine.

9.4.1 Types of Simulation Models

Based on the application development research, there are many simulation models.
1. Active models: Active models that attempt to reproduce living anatomy or physiology are recent
developments. The famous “Harvey” mannequin was developed at the University of Miami and is able to
recreate many of the physical findings of the cardiology examination, including palpation, auscultation
and electrocardiography.
2. Interactive model: More recently, interactive models have been developed that respond to actions taken
by a student or physician. Until recently, these simulations were two-dimensional computer programs that
acted more like a textbook than a patient. Computer simulations have the advantage of allowing a student
to make judgements and also to make errors. The process of iterative learning through assessment,
evaluation, decision making and error correction creates a much stronger learning environment than
passive instruction.
3. Computer simulators: In one example, a 3DiTeams learner percusses a patient’s chest in a virtual field
hospital. Simulators have been proposed as an ideal tool for the assessment of students’ clinical skills. For
patients, “cyber therapy” is used for sessions simulating traumatic experiences, from fear of heights to
social anxiety.

Programmed patients and simulated clinical situations, including mock disaster drills, have been used
extensively for education and evaluation. These “lifelike” simulations are expensive and lack reproducibility. A
fully functional “3Di” simulator would be the most specific tool available for teaching and measurement of
clinical skills. Gaming platforms have been applied to build these virtual medical environments, creating an
interactive method for learning and applying information in a clinical context.

Immersive disease state simulations allow a doctor or HCP to experience what a disease actually feels like.
Using sensors and transducers, symptomatic effects can be delivered to a participant, allowing them to
experience the patient’s disease state.

Such a simulator meets the goals of an objective and standardized examination for clinical competence. This
system is superior to examinations that use “standard patients” because it permits the quantitative measurement
of competence, as well as reproducing the same objective findings.

9.4.2 Tools for Simulations

Because simulation is such a powerful tool to assist in understanding complex systems and to support decision
making, a wide variety of approaches and tools exist. Many special-purpose simulators exist to simulate very
specific types of systems. For example, tools exist for simulating the movement of water (and contaminants) in
an estuary, the evolution of a galaxy, or the exchange rates for a set of currencies. The key attribute of these tools
is that they are highly specialized to solve a particular type of problem. In many cases, these tools require great subject-matter
expertise to use. In other cases, the system being simulated may be so highly specified that using the tools is
quite simple. That is, the user is presented with a very limited number of options. Other tools are not specialized
to a particular type of problem. Rather, they are “tool kits” or general-purpose frameworks for simulating a wide
variety of systems. A variety of such tools exist, each built around its own modelling approach. What they all
have in common, is that they allow the user to model how a system might evolve or change over time. Such
frameworks can be thought of as high-level programming languages that allow the user to simulate many
different kinds of systems in a flexible way.

Perhaps the simplest and most broadly used general-purpose simulator is the spreadsheet. Although spreadsheets
are inherently limited by their structure in many ways (e.g., representing complex dynamic processes is difficult,
they cannot display the model structure graphically, and they require special add-ins to represent uncertainty),
because of their ubiquity they are very widely used for simple simulation projects, particularly in the business
world.
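To make the idea concrete, here is a minimal sketch (in Python rather than an actual spreadsheet, and with invented figures) of the kind of Monte Carlo cash-flow projection often built in a spreadsheet with an uncertainty add-in:

```python
import random
import statistics

def simulate_cash_flow(months=12, trials=5000, seed=42):
    """Monte Carlo projection of year-end cash, spreadsheet-style.

    Each month, revenue is uncertain (drawn from a normal distribution)
    while costs are fixed. Returns the year-end balance of every trial.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        cash = 0.0
        for _ in range(months):
            revenue = rng.gauss(10_000, 2_000)  # uncertain monthly revenue
            costs = 8_500                       # fixed monthly costs
            cash += revenue - costs
        outcomes.append(cash)
    return outcomes

balances = simulate_cash_flow()
print(f"mean year-end cash: {statistics.mean(balances):,.0f}")
print(f"chance of ending in the red: {sum(b < 0 for b in balances) / len(balances):.1%}")
```

Replicating the same row of formulas thousands of times with random inputs is exactly what uncertainty add-ins automate inside a spreadsheet; the loop above plays that role here.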

Other general-purpose tools exist that are able to represent complex dynamics, as well as provide a graphical
mechanism for viewing the model structure (e.g., an influence diagram or flow chart of some type). Although
these tools are generally harder to learn to use than spreadsheets (and are typically more expensive), their
greater expressive power allows them to realistically simulate larger and more complex systems than can be done
in a spreadsheet. The general-purpose tools can be broadly categorized as follows:

1. Discrete event simulators: These tools rely on a transaction-flow approach to modelling systems. Models
consist of entities (units of traffic), resources (elements that service entities) and control elements
(elements that determine the states of the entities and resources). Discrete simulators are generally
designed for simulating processes such as call centres, factory operations and shipping facilities in which
the material or information that is being simulated can be described as moving in discrete steps or packets.
They are not meant to model the movement of continuous material (e.g., water) or represent continuous
systems that are represented by differential equations.
2. Agent-based simulators: This is a special class of discrete event simulator in which the mobile entities
are known as agents. Whereas in a traditional discrete event model the entities have only attributes
(properties that may control how they interact with various resources or control elements), agents have
both attributes and methods (e.g., rules for interacting with other agents). For example, an agent-based
model could simulate the behaviour of a population of animals that interact with each other.
3. Continuous simulators: This class of tools solves differential equations that describe the evolution of a
system using continuous equations. These types of simulators are most appropriate if the material or
information that is being simulated can be described as evolving or moving smoothly and continuously,
rather than in infrequent discrete steps or packets. For example, simulation of the movement of water
through a series of reservoirs and pipes can most appropriately be represented by a continuous simulator.
Continuous simulators can also be used to simulate systems consisting of discrete entities if the number of
entities is large enough that the movement can be treated as a flow. A common class of continuous simulators
comprises system dynamics tools, based on the standard stock-and-flow approach developed by Professor Jay W.
Forrester at MIT in the early 1960s.
4. Hybrid simulators: These tools combine the features of continuous simulators and discrete simulators.
That is, they solve differential equations, but can superimpose discrete events on the continuously varying
system. GoldSim is an example of a hybrid simulator.
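The transaction-flow approach behind discrete event simulators (category 1 above) can be sketched in a few lines. The following Python example is a simplified illustration, not tied to any commercial tool: callers (entities) arrive at fixed intervals and are served by a single agent (resource), with simulated time jumping from event to event rather than advancing in fixed ticks.

```python
import heapq

def simulate_call_centre(n_calls=5, interarrival=3.0, service=4.0):
    """Single-server discrete event simulation.

    Entities are calls; the resource is one agent. The event list is a
    priority queue ordered by event time. Returns (call_id, wait_time)
    pairs for every call.
    """
    events = []  # priority queue of (time, order, kind, call_id)
    for i in range(n_calls):
        heapq.heappush(events, (i * interarrival, i, "arrival", i))

    agent_free_at = 0.0
    order = n_calls
    waits = []
    while events:
        time, _, kind, cid = heapq.heappop(events)
        if kind == "arrival":
            start = max(time, agent_free_at)   # queue if the agent is busy
            waits.append((cid, start - time))
            agent_free_at = start + service
            heapq.heappush(events, (agent_free_at, order, "departure", cid))
            order += 1
        # Departures need no action in this minimal model; a fuller one
        # would release the resource and pull the next entity from a queue.
    return waits

for cid, w in simulate_call_centre():
    print(f"call {cid} waited {w:.1f} minutes")
```

Because service takes longer than the gap between arrivals, each successive caller waits a little longer: this growing queue is precisely the behaviour a discrete event model of a call centre is built to expose.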

9.5 INDUSTRIAL RESEARCH

Industrial research means planned research or critical investigation aimed at the acquisition of new knowledge
and skills for developing new products, processes or services, or for bringing about a significant improvement in
existing products, processes or services. It comprises the creation of components of complex systems necessary
for industrial research, notably for generic technology validation, to the exclusion of prototypes. Industrial
research led to a semantic innovation, the addition of “development” to “research”, thereby coining the new term
R&D. Before the beginning of the present century, people spoke of science, investigation and enquiry. Research
became a generalized term after being used regularly by industries, where science was often a contested term;
applied to industry, industrial research made the boundary between pure and applied research almost invisible.

R&D is a component of innovation and is situated at the front end of the innovation lifecycle. Innovation builds
on R&D and includes commercialization phases. The activities that are classified as R&D differ from company
to company, but there are two primary models: an R&D department is either staffed by engineers and tasked
with directly developing new products, or staffed with industrial scientists and tasked with applied research in
scientific or technological fields which may facilitate future product development. In either case, R&D differs
from the vast majority of corporate activities in that it is not often intended to yield immediate profit, and it
generally carries greater risk and an uncertain return on investment. A system driven by marketing is one that
puts customer needs first and only produces goods that are known to sell; market research is carried out to
establish what is required. If the development is technology driven, then R&D is directed towards developing
products that market research indicates will meet an unmet need. In general, R&D activities are conducted by
specialized units or centres belonging to a company, or can be outsourced to a contract research organization,
universities or state agencies. In the context of commerce, “research and development” normally refers to
future-oriented, longer-term activities in science or technology, using techniques similar to scientific research
but directed towards desired outcomes and with broad forecasts of commercial yield.

Many times, industrial research and operational research are used side by side. This is because every new
industrial product, be it a model or a finished product, needs strong mathematical support before the public will
accept it. So operational research also has a major impact on industrial research activities.

9.5.1 Operational Research

According to the Operations Research Society of America, “Operations research is concerned with scientifically
deciding how to best design and operate man-machine systems, usually under conditions requiring the allocation
of scarce resources.” No matter how operations research is defined, the construction and use of models are at its
core. Models are representations of real systems. They can be iconic (made to look like the real system), abstract
or somewhere in between. Iconic models can be full-scale, scaled-down or scaled-up in size. Sawmill headrig
control simulators are full-scale models. A model of the solar system is a scaled-down model, and a teaching
model of a wood cell or a water molecule is a scaled-up model. Regardless of the type of model used, operations
research approach comprises the following seven sequential steps: (1) Orientation, (2) Problem definition, (3)
Data collection, (4) Model formulation, (5) Solution, (6) Model validation and Output analysis, and (7)
Implementation and monitoring. Figure 9.6 shows this schematically.

Fig. 9.6 Steps of operations research

1. Orientation: The first step in the operations research approach is referred to as problem orientation. The
primary objective of this step is to constitute the team that will address the problem at hand and ensure
that all its members have a clear picture of the relevant issues. Typically, the team will have a leader and
be constituted of members from various functional areas or departments that will be affected by or have an
effect upon the problem at hand. In the orientation phase, the team typically meets several times to discuss
all of the issues involved and to arrive at a focus on the critical ones. This phase also involves a study of
documents and literature relevant to the problem in order to determine if others have encountered the same
(or similar) problem in the past, and if so, to determine and evaluate what was done to address the
problem. The aim of the orientation phase is to obtain a clear understanding of the problem and its
relationship to different operational aspects of the system, and to arrive at a consensus on what should be
the primary focus of the project.
2. Problem definition: This is the second, and in a significant number of cases, the most difficult step of the
operations research process. The objective here is to further refine the deliberations from the orientation
phase to the point where there is a clear definition of the problem in terms of its scope and the results
desired. This phase should not be confused with the previous one since it is much more focussed and goal
oriented; however, a clear orientation aids immeasurably in obtaining this focus.
3. Data collection: In the third phase of the operations research process, data is collected with the objective
of translating the problem defined in the second phase into a model that can then be objectively analyzed.
Data typically comes from two sources – observation and standards. The first corresponds to the case
where data are actually collected by observing the system in operation and typically, this data tend to
derive from the technology of the system. Other data are obtained by using standards; a lot of cost-related
information tends to fall into this category. For instance, most companies have standard values for cost
items such as hourly wage rates, inventory holding charges, selling prices, etc. These standards must be
consolidated appropriately to compute costs of various activities.
4. Model formulation: This is the fourth phase of the operations research process. It is also a phase that
deserves a lot of attention since modelling is a defining characteristic of all operations research projects.
The term “model” is misunderstood by many, and is therefore explained in some detail here. A model is
defined formally as a selective abstraction of reality. This definition implies that modelling is the process
of capturing selected characteristics of a system or a process and then combining these into an abstract
representation of the original. The main idea is usually far easier to analyze a simplified model than it is to
analyze the original system, and as long as the model is a reasonably accurate representation, conclusions
drawn from such an analysis may be validly extrapolated back to the original system. Models may be
broadly classified into four categories: physical models, analogic models, computer simulation models and
mathematical models. Which of these is the best model – a simple one or a complex one? A simple
model is better than a complex one as long as it works as well. A model only needs to perform its intended
function to be valid. It should be easy to understand. It is important to use the most relevant operations
research tool when constructing a model. A modeller should not try to shape the problem to fit a particular
operations research method. For example, a linear programming (LP) expert may try to use LP on a
problem where there is no optimal solution. Instead, modellers should study the problem and choose the
most appropriate operations research tool. For complicated systems, users need to remember that models
are only simplified representations. If a user mistakenly considers a complicated model to be correct, he or
she may disregard further study of the real system. Modellers and users of models should never rely only
on a model’s output and ignore the real system being modelled. A good model should be easy to modify
and update. New information from the real system can be incorporated easily into a well-planned model.
A good model usually starts out simple and becomes more complex as the modeller attempts to expand it
enough to give meaningful answers.
5. Model solution: The fifth phase of the operations research process is the solution of the problem
represented by the model. This is the area on which a huge amount of research and development in
operations research has been focussed.
6. Validation and analysis: Once a solution has been obtained, two things need to be done before one even
considers developing a final policy or course of action for implementation. The first is to verify that the
solution itself makes sense (model validation); the second is to carry out a detailed output analysis to
check that the model’s results remain reasonable over the full scope of its intended use.
7. Implementation and monitoring: The last step in the operations research process is to implement the
final recommendation and establish control over it. Implementation entails the constitution of a team
whose leadership will consist of some of the members on the original operations research team. This team
is typically responsible for the development of operating procedures or manuals and a time table for
putting the plan into effect. Once implementation is complete, responsibility for monitoring the system is
usually turned over to an operating team. From an operations research perspective, the primary
responsibility of the latter is to recognize that the implemented results are valid only as long as the
operating environment is unchanged and the assumptions made by the study remain valid.
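To make the model formulation and solution phases concrete, here is a deliberately tiny product-mix model in Python (all numbers are invented for illustration): a workshop makes tables and chairs under wood and labour constraints, and because the model is so small the solution phase can use exhaustive search rather than a linear programming solver.

```python
def best_product_mix(wood=40, labour=30):
    """Exhaustive search over a tiny product-mix model.

    A table uses 4 units of wood and 2 of labour (profit 30);
    a chair uses 1 unit of wood and 2 of labour (profit 10).
    For a problem this small we can enumerate every feasible
    integer plan; realistic models would use linear programming.
    Returns (profit, tables, chairs) for the best plan found.
    """
    best = (0, 0, 0)  # (profit, tables, chairs)
    for tables in range(wood // 4 + 1):
        for chairs in range(wood + 1):
            feasible = (4 * tables + 1 * chairs <= wood
                        and 2 * tables + 2 * chairs <= labour)
            if feasible:
                profit = 30 * tables + 10 * chairs
                best = max(best, (profit, tables, chairs))
    return best

profit, tables, chairs = best_product_mix()
print(f"make {tables} tables and {chairs} chairs for profit {profit}")
```

Note how the seven steps map onto even this toy: the data (resource usage, profits) feed the model formulation (the constraints and objective), the loop is the solution phase, and checking the plan against the real workshop's capacity would be validation.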

EXERCISES
1. Explain the steps in business modelling.
2. How can we simulate a product which has a potential influence in society? Explain the various modes.
3. How is OR different from BR?
4. Explain any five advantages of simulation.
5. What are the various methods employed in scientific research?
6. Justify the statement, “research happens more in industries than in laboratories”.
7. How can the business head motivate his team?
8. What are the various simulation tools?
9. Explain the various research tools.
10. What are the various steps in designing an algorithm?
chapter 10
SOCIAL RESEARCH
Objectives:

After completing this chapter, you will be able to understand the following:

The definition and theory of social research, and the characteristics, scope and objectives of social research
The description of the perspectives of social research
Various methods of social research
The description of social science approaches and their types
The design of social research
The description of quantitative and qualitative social research, its methods and a comparison
A detailed explanation of ethics and politics in social research

Research is the core aspect of all social sciences. The social sciences are a very important area of interest to
human beings because of the multitude of impacts that result from each piece of social research. Social research
paves the way to create a more humane society in which human beings can interact in a more refined and tasteful
manner. According to Albert Einstein, “Politics is more difficult than physics and the world is more likely to die
from bad politics than from bad physics”.

Human culture is highly interdependent with human society. One thing in life can and will lead on to another; a
single mistake may cause many problems in the social environment. Today, social research is very vast and
complex. It can be divided into subfields such as anthropology, sociology, psychology, economics, political
science, geography, history, education, demography and so on (Fig. 10.1). Research activities are needed in the
social science subjects for creating new knowledge in these fields.

Some of the basic definitions from sociologists are as follows:

According to C.A. Moser: “Social research is a systematized investigation to gain new knowledge about social
phenomenon and problems.”

According to P.V. Young: “Social research is a scientific undertaking which by means of logical methods, aims to
discover new facts or old facts and to analyse their sequences, interrelationships, causal explanations and natural
laws which govern them.”

Fig. 10.1 Various fields in social research

10.1 THEORY OF SOCIAL RESEARCH

Social science research is a systematic method of exploring human life in order to extend knowledge of human
behaviour. Research clarifies doubts by seeking explanations in unexplained areas of interest. There are
scientific methods to understand social life in order to correct or verify the knowledge of the system. Human
behaviour is bound by certain laws and values; one of the major purposes of social research is to discover those
laws and give proper guidelines to humans.

10.1.1 Social Research Characteristics

The following are the characteristics of social research:

A direct solution to the problem is given, with the ultimate goal of discovering cause-and-effect relationships
between social problems.
To predict future occurrences, generalized principles or theories are emphasized.
Observable experience or empirical evidence is the basic core of the research.
Social research stresses precise observation and description.
Social researchers can choose quantitative or non-quantitative descriptions of the observations they have
prepared.
New data can be gathered from primary sources, or existing data can be used for a new purpose.
The activities in social research are not random and unsystematic; they follow a carefully designed procedure
that can be applied for analysis.
A variety of expertise is required in social research. Researchers must be aware of the problem, what is
already known and how others investigated the same problems.
Researchers must strive to apply every possible test to validate the procedure employed, the data collected and
the conclusions reached.
Researchers must strive to answer unsolved problems.
Research is a patient and unhurried activity. Researchers can expect dissatisfaction, disappointment and
discouragement while facing difficult questions.
Careful reporting and recording are essential. Definitions of terms, variables, theories and procedures are
important and should be documented in detail, which will be helpful in drawing conclusions. Presenting and
writing the report is easier when careful documentation has been done from the beginning of the research work.
Most of the time, social research is interdisciplinary, and researchers sometimes need courage in selecting
topics.

10.1.2 Scope of Social Research

A question always arises in our mind: “What is the scope of social research?” The basic scope of social research
is clear: it is a scientific tool to study and analyze social problems with certain values. The scope of social
research includes correct understanding of the nature of social events, processes and thoughts.

1. Knowledge formation

Generally, a corpus of knowledge is the output of any research. The social researchers also generate new
knowledge. The new knowledge helps to bridge the gap between ignorance and knowledge. One major
way by which knowledge can be obtained is through observation and experience. For example, we know
that the traffic will be on peak during the office hours, morning and evening. This knowledge comes
merely from experience and observation. Similarly, all sorts of research lead to a conclusion, which is
knowledge.

2. Study of social problem

Studying a social problem generates knowledge. When knowledge is created, it eradicates disbeliefs and
provides a screen of logical reasoning with facts. For example, there are various superstitious beliefs, such as
the belief that the number 13 is unlucky: many apartments and hotels omit the 13th floor, and some planes omit
the 13th row. But the logic behind this was unknown to many, and it required many historical and religious
social researchers to find an answer. The most accepted explanation is the one put forward by Christian
religious researchers: there were 13 guests at the Last Supper of Jesus Christ, and thus people believed 13
would bring bad luck. Acceptance of this explanation is truly up to the readers, but the facts were put forward as
a result of research.

Thus, the study on social problems removes the curtain of ignorance. The key to the solution of social
problems is their accurate and unbiased analysis and thereby understanding the causal factors responsible
for them.

3. Theory and policy making

The knowledge generated from the study of social problems can be extended to formulate theoretical and
practical theories and policies. There will be many elements obtained during a particular study; the
researchers may find some interesting link between them and will generate a theory. Not all theories
thus created will be acceptable; they need to be proved with some logic and a strong background. If
such a background exists, then it can be used to create valid policies for future reference.

As Karl Jaspers said, “It is only when using methodologically classified sciences that we know what we
know and what we do not know”. This way, theory constitutes a crucially important guide to designing fruitful
research.

10.1.3 Objectives of Social Research

Social research focusses on a variety of fields and subfields. The major objective of social research is human
understanding. We can broadly specify the objectives as follows:

Knowledge gathering about social phenomena, events, issues and problems.
Functional relationship identification in the social environment.
Natural law identification in the social phenomena that represent social behaviour.
Standardization of social concepts: culture, struggle, generation gap, social distance, social ethics, etc.
Identification of solutions to social problems.
Reduction of social tension, misconceptions, etc.
Revival plan making with respect to social problems.

10.2 PERSPECTIVES OF SOCIAL RESEARCH

Social phenomena are analyzed by sociologists at different levels and from different views. To obtain a
generalization of society and social behaviour, they study things at micro and macro levels. “A big picture” is
given by macro analysis, while a small social pattern will be the outcome of a micro-level analysis. Micro-level
analysis examines symbolic interactions, such as the use of symbols and face-to-face interactions. At the macro
level, the relationship between the parts of society and the functional aspects of the society is taken into account,
and conflict theory draws on a wide range of sociological patterns. Sociological perspectives are given in
Fig. 10.2.

Symbolic interaction

The symbolic interaction perspective directs a sociologist to consider a detailed study of social symbols: how
people interact, what change in society brings a considerable change in lifestyle, and so on. For example, in
music, the musical notes are written in symbols such as dark dots or dark lines of particular shapes. These may
be read as musical notes by people who have that particular musical knowledge; others may find it very difficult
to discern their meaning.
Fig. 10.2 Sociological perspectives

Case Study

How to apply symbolic interaction to Christian marriage functions? In this case, symbols may include wedding
bands, vows of life-long commitments, music and flowers, white bridal dress, wedding cake, Christian
ceremony, etc.

People get attached to these symbols, but individuals maintain their own perceptions. For example, the spouses
may think of the ring as a never-ending symbol of love, while others may think about the financial expense, and
so on.

Functionalism

In functionalism, each part of society is seen as interdependent and as contributing to society as a whole.
Education, for example, is a provision by the state government to children; from a functional perspective, each
family depends on schools to help their children learn, to make them socially fit and to later support their
family. Functionalism depends on these functional blocks: for example, during a financial recession, with higher
unemployment and inflation, families cut down their expenses.

Conflict theory

The conflict perspective originated in Karl Marx’s writings on class struggle, and it stands apart from the
functionalist and symbolic interaction perspectives. Where functionalism sees society as contributing to its own
stability, the conflict perspective sees social life as a competition and explains social change through conflict.
According to Karl Marx, societies reveal natural sources of conflict and tension.

Case Study

The Ajanta and Ellora Caves in the Aurangabad district of Maharashtra have about 60 rock-cut cave
monuments, which date from the 2nd century BCE to about 480 or 650 CE. The caves include paintings and
sculptures. The social researchers who visit the caves will have different perspectives. Someone who does
research on ancient paintings will be curious about the type of materials used for painting, how well the spread
was done, and so on. From the viewpoint of an archaeologist, he/she will be interested in examining the caves
and old objects to study the past culture. To an economist who visits the area, the financial benefit that the
government makes from the site will be of top priority. Thus, perspectives differ, but ultimately knowledge is
produced in one way or the other.

10.2.1 Complementary Perspectives


The theoretical paradigms used by sociologists offer different models to describe and understand
human behaviour. A popular and understandable set of paradigms focussed on social research is given in Fig. 10.3.

Fig. 10.3 Complementary perspectives

These perspectives are limited in their ability to describe society and its behaviour. Humans are part of the
animal kingdom, and the cross-species perspective considers similarities and differences between human
behaviour and that of other species. It gives a valuable insight into the nature of human society.

The cross-cultural perspective addresses cultural differences and issues within human society. Research and
investigations show that practices, beliefs and values vary considerably across cultures; consider, for example,
the cultural differences between Africans and Europeans. The comparative study of standard behaviour within a
system is a major concern of the cross-cultural perspective.

In the statistical perspective, the frequency of occurrence of an attribute or practice in society is examined, and a
statistical measurement can be fitted to these studies. For example, surveys are conducted in societies to
establish characteristics such as those of the average member.
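As a toy illustration of the statistical perspective, the following Python sketch (with invented survey responses) computes the frequency of a social attribute and an average member characteristic:

```python
from collections import Counter
import statistics

# Hypothetical survey responses: each respondent's age, and whether
# they report practising a given social custom.
ages = [23, 31, 45, 52, 28, 36, 41, 29]
practises_custom = [True, False, True, True, False, True, False, True]

# Frequency of the attribute across the sample
freq = Counter(practises_custom)
print(f"custom practised by {freq[True]}/{len(practises_custom)} respondents")

# An average member characteristic
print(f"average respondent age: {statistics.mean(ages):.1f}")
```

In a real survey the same two calculations (attribute frequency and sample mean) would be run over thousands of responses, often broken down by subgroup.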

The historical perspective deals with social issues from the aspect of historical values and contexts. For many
complex issues, earlier precedents can be examined and the roles they played in history can be analyzed.

The religious perspective considers the role of religion and spirituality for individuals and for society. Religions
are organized around the teaching of morals and set values based on the religious environment.

The feminist perspective is concerned with gender differences and with the limitations associated with them.
The major theory is associated with male-dominant society; feminists claim that their insight is equal to that of
males.

10.3 METHODS OF SOCIAL RESEARCH

Basic methods of scientific and engineering research are systematic and procedure oriented. In social research,
the procedures are more important than in the physical or natural sciences. Social scientists observe, classify
and analyze the facts, make some generalizations and then develop and test hypotheses to explain how these
generalizations are made. Compared to the physical sciences, the problems here are more difficult: they are
very difficult to interpret and classify, and the generalizations made or laws developed by a social
scientist are less definite than those of a physical scientist. There are many difficulties in discovering exact laws, some of
which are given as follows:

1. Important things in social life such as satisfaction, social progress, democracy and so on cannot be
measured.
2. The society we live in is very complex. So it is very difficult and almost impossible to evaluate.
3. A human element (leader) is involved in all social issues. So, the issues and impacts cannot be predicted.

For example, a physical scientist may discover a new source of electricity. He/she may develop a new invention
to generate electricity or new equipment that runs on electrical power. It is the social scientist who identifies the
impact of this invention on society, and who may conduct studies in different societies on the impact of this
newly developed equipment or procedure. Both kinds of research are different but equally important; compared
to the first, the second may prove more difficult.

Social scientists are far fewer in number than natural or physical scientists, partly because they are forced to
discard their likes, dislikes, sympathies and views in the name of work. Remaining objective is more difficult for
a social scientist than for a natural scientist. Also, there is no ideal structure for conducting research in the
area of social sciences. A reasonable approach to a social science problem is given in the following:

1. Observation
2. Problem definition
3. Literature review
4. More on observation
5. Theoretical framework definition and hypothesis formation
6. Research design selection
7. Data collection
8. Result analysis and discussion
9. Conclusion/suggestion

By following the above given methods, social research can extract the methodology contained in each field in
detail. Observation is the understanding of the real world; observations of socially committed problems can
identify areas where future research is needed. In the problem definition, the terms used in the research
work should be carefully defined; this will save much energy and time. Selection of the topic may raise some
fundamental issues that could yield values or tips for the research. Literature review is a major task for every
social researcher: a proper literature review provides a knowledgeable background and suggests what has
already been covered and what needs to be redesigned. In the next step, a framework can be formulated for
predicting the result. The framework may be a theory, based on methodology, which can help in formulating a
hypothesis. If the predicted result and the clarifying terms are within the framework of the research, there may
be scope for discussion. For example, suppose a hypothesis is “high price influences sales of fashionable
dresses”. How do you specify what “high” is? How do you compare specifications of price? Do we have
low-priced fashionable dresses available in the market? What do you mean by “fashionable”? Such questions
need to be answered, and these terms mean different things to different researchers.

Research design is an important step for a social researcher. Selecting the correct design for data collection is essential, as it influences the subsequent results. The researcher should observe carefully, because the conclusions drawn from the results depend on the collected data; extreme care is therefore required in preparing, selecting and using that data. The next step is result analysis; it must classify the facts, identify the trends and tabulate the information derived from the data. Interpreting the results is a key objective, because debates may arise if the researcher makes a wrong proposal. The hypothesis is then confirmed by the result analysis. After the result analysis, the selected hypothesis may be modified in the light of the discussions and debates that follow. Even an unanswered question during a debate can change the entire focus of the research.

In natural science research, the above steps are slightly different. Natural science primarily focuses on testing hypotheses by controlled experiments, normally conducted under regulated conditions. In social research, such controlled experiments are very difficult to construct.

10.4 SOCIAL SCIENCE APPROACHES


A social scientist uses many approaches and methods while studying problems, and alternative approaches and methods may also be used to solve a social science problem. In an alternative approach, the scientist analyzes the problem in a way that reflects his/her viewpoint. There are four theoretical approaches and three alternative approaches in social science research (Fig. 10.4).

Fig. 10.4 Social research approaches

1. Functional theory

This approach emphasizes the interconnection between social life and social policy. Social judgements are suggested by relating functional theory to prevailing social conditions.

2. Exchange theory

It is closely related to the functional approach but stresses the voluntary exchange of individual choices, reflecting personal desires in a society. Due to dysfunctional elements, society may be disturbed at times.

3. Conflict theory

It assumes less harmony than the exchange theory approach. It explains social behaviour in terms of conflict and tension among groups. The major difference between the exchange theory and conflict theory approaches is their focus on individuals versus groups.

4. Symbolic interaction theory

Individuals derive inferences from symbols and their forms. This approach reflects not only what people do but, more importantly, what they think and feel.

10.4.1 Alternative Approaches

In the alternative approaches, the social scientist can draw on several different methods, ranging from the historical method to the comparative and cross-cultural method.

1. Historical method

In order to understand the background of a particular subject of interest, historical methods are suitable. A complete understanding of historical situations may not be possible because of the complexity of historical knowledge. Tracing major past developments that are important to the current study also comes under the historical method. A historian traces past events and uses relevant methods to obtain knowledge.

2. Case method

A social scientist devotes much of his/her time to dealing with cases that vary with the situation. The case method involves the detailed analysis and examination of a particular issue or problem. Sociologists study key changes in situations and then compare them to infer new knowledge. A case study is intended to discover how to bring about desirable changes for a particular problem. For example, a researcher may wish to study the problems faced by migrant labourers from other states. Well-selected case studies will throw light on many similar situations that exist in society; but the selection of cases is important, as a wrong candidate can mislead the researcher.

3. Comparative and cross-cultural method

Comparison is a common human behaviour. It may uncover isolated facts, but the systematic comparison of different societies plays an important role in the social sciences. This is called the cross-cultural method. It can be used in studies of social patterns that compare different peoples in different ways. One of the major risks in comparing societies is that it can create personal grudges.

10.4.2 Interdisciplinary Approaches

Modern industries face many complex problems in their daily processes. These problems do not depend on a single subject alone, nor can they be solved with limited knowledge. In such circumstances, people from different areas work together towards a solution. For example, in a chemical factory, a chemist and a computer specialist may work together to solve a problem. Such areas are called interdisciplinary research areas. Today, research grows into interdisciplinary areas such as bioinformatics, cheminformatics, geoinformatics, mechatronics, robotics and so on. Since no researcher is a master of all subjects, growing emphasis is placed on the interdisciplinary approach to many social problems. An interdisciplinary subject in social science means that a group of social scientists with different specialties work together on a certain problem. In this case, none of these researchers may fully know the entire problem or its solutions. For example, an environmental problem may make it necessary to call in a physical scientist, a geologist and a civil engineer.

10.4.3 Role of Statistics

When a social scientist relies on quantitative data, or data that can be converted into numeric form, he/she comes across various quantitative methods such as interviews, questionnaires and so on. Any of these methods can be used for data collection in his/her research. In order to analyze the data, a social researcher uses statistical methods. For qualitative data, it is more difficult to draw conclusions using statistical methods, owing to differences in how researchers interpret the “facts” discovered.

When quantitative data are available, the social scientist derives information through statistical analysis. The data can be classified so as to identify social relationships and processes. Statistical relationships give social problems a simple interpretation. Testing theories and discovering relationships are typical functions that use statistical measures. For example, two datasets can be related through correlation: a high correlation means that an element in one set is strongly associated with an element in the other.

The data collection methods may be interviews or questionnaires. For example, a health survey giving public opinion polls regarding cancer is shown in Fig. 10.5. The use of statistics has been greatly facilitated by the use of computers for recording, arranging and processing voluminous information.
Fig. 10.5 Survey poll regarding cancer disease

10.5 SOCIAL RESEARCH DESIGN

Many methods have been adopted by sociologists to study the social behaviour of society. The models used in social research design fall into the following categories:

Cross-sectional: A group of individuals of different ages, with the same or different characteristics of interest, is selected. The scientist studies these groups at a single point in time.

Longitudinal: The scientist follows a selected group of individuals (as mentioned above) over a specified period of time.

Cross-sequential: The scientist tests individuals from the above cross-sectional population more than once over a period of time.

The social research design cycle starts by defining the problem. Figure 10.6 gives the design steps involved in social research. The problem can be well defined only if the researcher has good observational skills. Quick learning and interpretation skills are necessary for a good researcher to define a problem through observation. Defining the problem is a demanding task; the researcher needs to explain the background in which the problem is valid. He/She also needs to convince the audience of the importance of the problem. Only a clearly stated problem can move the research forward.

Once the problem is defined, the researcher needs to do an extensive literature review. He/She should learn the current areas of development in the field and the work that has been performed so far. Only a good literature study increases knowledge of the particular area; many a time it also opens up new space for the research.

Once you have finished a reasonable literature survey, you will have mastered some part of the field. Now you need to formulate a hypothesis that can be tested against valid theories. The hypothesis developed must be in accordance with the problem stated earlier. A generalized hypothesis may change over time as real datasets/events are included to make it specific.

The next step is the acting phase of the research. So far you have a well-defined problem, solid backing from the literature and a specific or generalized hypothesis. Now you need to select a research design process and analyze the data. For example, consider a social researcher studying health issues of children under the age of 14. First, he/she should identify the areas where this problem is most severe. Then extensive work is needed to establish the background of such children: family conditions, parents' jobs, surroundings, nearby medical care centres, vaccination details and so on. From all these data and observations, he/she should arrive at valid reasons and suggest ways to eradicate the problem. For a social researcher, this phase is the most important one, as the results go straight to society and are used in future work.
Fig. 10.6 Social research designs

Generating a conclusion from the work is a demanding task; conclusions should be apt, short, simple and convey the correct meaning. They should be derived from the observations in such a way that every item mentioned is self-explanatory, as the audience varies from place to place and from time to time. The work should also throw light on possible future extensions, which can add further value to your research problem.

10.6 QUANTITATIVE AND QUALITATIVE SOCIAL RESEARCH

Qualitative research is a broad term covering a variety of approaches, extending into historical, sociological, educational and other fields. There are different styles of qualitative research, and they are not closely tied to scientific logic. People's ideas, attitudes, motives and intentions are examined in this research; such perspectives of social research fall under the qualitative category.

In qualitative research, the primary goal is to understand social processes rather than to obtain representative samples, and this may take a lengthy period of time. In-depth interviews, open-ended observations, etc. are its characteristic methods. Quantitative methods, by contrast, according to Hammersley (1993)1, emphasize objective measurements and the statistical, mathematical, or numerical analysis of data collected through polls, questionnaires, and surveys, or by manipulating pre-existing statistical data using computational techniques.

10.6.1 Quantitative Research Methods

“The term quantitative method refers in large part to the adoption of the natural science experiment as the model for scientific research, its key features being quantitative measurement of the phenomena studied and systematic control of the theoretical variables influencing those phenomena.” The quantitative approach relies a great deal on quantitative (statistical) data in the form of numbers collected through empirical observation or from statistical digests. Chapter 3 details the basic quantitative data collection methods used in research. A typical example of quantitative research is a simple survey conducted by experimental methods, from which conclusions are drawn.

The quantitative social research may have the following characteristics:


Data collection is a major concern and is done using standard approaches.
Causal relationships between selected variables are examined.
A hypothesis is prepared and appropriate test statistics are used.
The degree of pre-conceptualization in the research is high.
Selected theories are adopted throughout the research.
The research is carried out in reliable, controlled settings.
Measurement of behaviour and attitudes is a major objective.
The focus is on testing theories rather than on generating them.
The research designs are large-scale compared to qualitative research.
The data collection instruments are standardized (e.g., structured questionnaires rather than one-to-one interviews).

Major strengths of the quantitative research methods are as follows:

Testing and validation of already constructed theories.


Hypotheses can be tested against data collected for the purpose.
Research findings can be generalized when the random sample is sufficiently large.
Findings can be generalized across different populations and sub-populations.
Useful quantitative predictions can be made from a dataset.
Cause-and-effect relationships can be established credibly, even under the influence of many variables.
Quantitative data collection methods such as questionnaires or structured interviews are used.
The research uses precise, quantitative, numerical data.
Data analysis is less time-consuming with the use of statistical software.
The obtained results are relatively independent of the researcher.
The research can study a large number of people.
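The kind of hypothesis testing with test statistics mentioned above can be run with nothing more than Python's standard library. The sketch below uses made-up poll numbers and tests whether an observed proportion differs from 50% with a two-sided z-test:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical poll: 540 of 1000 respondents favour a policy.
# H0: the true proportion is 0.5 (no majority either way).
n, favour = 1000, 540
p0 = 0.5
p_hat = favour / n

se = sqrt(p0 * (1 - p0) / n)                  # standard error under H0
z = (p_hat - p0) / se                          # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

print(round(z, 2), round(p_value, 3))
```

With these invented numbers, z is about 2.53 and the p-value is below 0.05, so H0 would be rejected at the usual 5% significance level.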

10.6.2 Qualitative Research Methods

The major qualitative research methods are shown in Fig. 10.7.

1. Case study

In qualitative research methods, case studies play an important role. A social scientist needs to understand
how case studies are important in research design. The case studies are not a data collection method but
research strategy. Case studies offer hypothesis for future research and helps to establish generalization
and general findings.

Fig. 10.7 Qualitative research methods

2. Content analysis

Language is the major communication medium of human that contains emotions, knowledge attitudes and
values. There are many communication channels, which convey these ideas such as televisions, radio,
movie etc. Contents analysis is a method in social research which is focused at the qualitative document
analysis. According to Berelson (1952) “Content analysis is a research technique for the objective,
systematic and quantitative description of the manifest content of communication”.
The major characteristics of contents analysis are

Objectivity
Systematic
Generality
Quantification

There are five major types of content analysis:

1. Word counting analysis
2. Conceptual analysis
3. Semantic analysis
4. Evaluative assertion analysis
5. Contextual analysis

Content analysis is a direct and responsive method. It is highly useful in historical research, and a variety of cultural studies can be done as part of social research. Hypothesis formulation, idea testing and theory building are all part of content analysis. Its key strength is that it can yield powerful social insights even when a research project has only small resources.
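The first type, word counting, is simple enough to sketch directly. The snippet below, using an invented passage, tallies how often chosen terms occur, which is the most basic quantitative step in content analysis:

```python
import re
from collections import Counter

# Invented, illustrative text standing in for a document under study.
text = ("The new policy improves welfare. Critics say the policy "
        "ignores welfare costs, but the policy remains popular.")

# Normalize to lowercase words, then count occurrences of each word.
words = re.findall(r"[a-z]+", text.lower())
counts = Counter(words)

print(counts["policy"], counts["welfare"])  # 3 2
```

In a real study the counted terms would come from a coding scheme agreed in advance, so that the tallies are objective and reproducible.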

3. Narrative method

One of the constructive methods used in social research to describe sequential non-fictional or fictional events is the narrative method. It is similar to the case study method in social science. Narration uncovers interesting and useful social theories and social policies. The social scientist presents narrative case studies expressed in terms of policies and facts. Narrative study is one of the movements in social science research that is qualitative in nature. The narrative method has become an analysis tool in cognitive science, knowledge theory, sociology, education, organizational studies, etc.

4. Focussed group interview

This is one of the qualitative research methods used to collect opinions, beliefs, attitudes and ideas from group members. Focus group interviews and discussions usually take one or two hours with 6–12 people. Open-ended questions are created and used to discover general reactions. These reactions are recorded for a wide range of information gathering. This qualitative method is helpful in conjunction with surveys.

Focussed interviews involve the organized discussion of a selected group of individuals in order to gain information on their viewpoints. On the same topic of interest, they give several perspectives along with a shared understanding. A moderator plays a key role in the focussed interview and sets the focus points. Feedback is obtained through insightful interaction among the participants involved in the interview. Compared to other methods, focussed interviews are relatively cheap.

10.6.3 Comparison of Quantitative and Qualitative Research

A comparative study between quantitative and qualitative research is given in Table 10.1.

Table 10.1 Difference between quantitative and qualitative research


10.6.4 Social Surveys

Social surveys are techniques used in sociology and related research areas. A social survey is a systematic information-gathering technique used in quantitative research. The analysis and the conclusions drawn are the key outputs of social surveys. For example, the national census is one of the biggest social surveys, covering a very large population, and enormous amounts of data are collected about it. Basically, social surveys connected with the formulation of social reforms produce significant change in society. According to Duncan Mitchell's dictionary, “the social survey is a systematic collection of facts about people living in a specific geographic, cultural or administrative area”.

There are scientific steps for conducting a social survey and formulating the final report. They are as follows:

1. General objective (problem statement)
2. Specific objectives
3. Sample selection (both the universe and the sample design)
4. Questionnaire preparation
5. Field work
6. Coding of data and its tabulation
7. Data analysis and report preparation
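Step 6 (coding of data and its tabulation) can be sketched in a few lines; the codebook and answers below are invented purely for illustration:

```python
from collections import Counter

# Hypothetical codebook mapping raw answers to numeric codes.
codebook = {"yes": 1, "no": 2, "undecided": 3}

# Raw responses collected in the field (invented data).
raw = ["yes", "no", "yes", "undecided", "yes", "no", "yes"]

# Coding: translate each answer to its code.
coded = [codebook[answer] for answer in raw]

# Tabulation: count how often each code occurs.
table = Counter(coded)
print(dict(sorted(table.items())))  # {1: 4, 2: 2, 3: 1}
```

The resulting frequency table is what the analysis and report-preparation steps then work from.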

There are many limitations to social surveys. First, sampling error may distort the overall results, and measuring this error is itself difficult. The questionnaire has its own limitations, such as its length and the topics it can cover. Population and sample design are further limitations: the population may be localized and needs adequate representation in the samples.
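The sampling error mentioned above can be quantified. A standard sketch is the 95% margin of error for a sample proportion, where the factor 1.96 comes from the normal distribution:

```python
from math import sqrt

def margin_of_error(p_hat, n, z=1.96):
    """95% margin of error for a sample proportion p_hat from n responses."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

# Worst case (p_hat = 0.5): quadrupling the sample only halves the error.
print(round(margin_of_error(0.5, 400), 3))   # about +/- 0.049 (5 points)
print(round(margin_of_error(0.5, 1600), 3))  # about +/- 0.025
```

This is why surveys with small samples carry wide uncertainty, and why shrinking the error gets progressively more expensive.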

10.7 ETHICS AND POLITICS IN SOCIAL RESEARCH

Students acquire their ethics and base their decisions on many sources: mentors, advisors, fellow students, family, friends, religious beliefs, faculty, seminars, professional organizations or formal courses. Research ethics may be divided into two parts: sharing scientific knowledge and laboratory practice. The objective of research is to extend human knowledge, and the generated knowledge should be shared with others through publications, presentations and theses. A scientific publication is an individual or team effort; the authors gain both accountability and credit by publishing. Policies of scientific journals state that authorship of a paper depends on a direct and substantial intellectual contribution from the author. The contribution may be in the design, interpretation or drafting of the research or paper; otherwise, an acknowledgement can be given to persons who have contributed indirectly to the work. The following are the author's responsibilities while preparing and submitting manuscripts:

1. The author should ensure that the work is new, original research.
2. All authors listed on the publication must be aware of the submission and agree with the content.
3. The author should provide copies of related work submitted or published elsewhere.
4. The article may be reviewed by anonymous reviewers; this must be agreed upon by all authors.
5. If figures or tables are reproduced rather than newly created, copyright permission must be obtained before submission.
6. Affiliations must be correct and included in the paper.

10.7.1 Principles in Research Ethics

Academicians voice caution about the ethical dilemmas they have faced in research work and advise their students and colleagues on issues of ethical requirements. These requirements cover meeting with professionals, conducting meetings with participants, supervising and teaching students, and assigning authorship.

1. Open discussion of intellectual property

Often, faculty or supervisors do not give due importance to students who have contributed relevant work. An open discussion helps students form a correct view of the intellectual property in their work and of the credit due from their supervisor. A common ethical practice is to assign publication credit after discussion with the students or whoever is associated with the research work. According to the American Psychological Association, “Minor contributions to the research or to the writing for publications are acknowledged appropriately, such as in footnotes or in an introductory statement.” Students who contribute substantively to the conceptualization, design, execution, analysis or interpretation of the proposed research should be listed as authors. A minor contribution does not warrant authorship; it should instead be acknowledged in the publication.

2. Consciousness of multiple roles

A careful mind is required when a researcher takes on multiple roles in collaboration with a person or research group. Such roles include participating in the research work, recruiting students or clients, investigating the effectiveness of a company's product, performing laboratory procedures, etc. For example, if a computer scientist designs a software tool for a molecular biology application in a biology lab, the conceptualization, design and coding are his/her part, and the biological procedure is the biologist's part. After completion of the work, it must be written up for publication, and again the computer scientist may have a role in the writing. If the collaboration is not managed properly, the publication may appear solo from the biological laboratory, and the computer scientist may feel cheated. Joint meetings, discussions and records are required for a successful combined work. This is also an ethical issue when organizing combined work across different entities.

3. Follow informed-consent rules

A consent process ensures that persons participate in the research willingly, with full knowledge of the applicable risks and benefits. A researcher who conducts research work should adhere to certain ethical codes:

Identify the purpose of the research and the expected duration and procedures.
State the participants' right to refuse to participate and to withdraw from the research once it has started, along with any anticipated consequences of doing so.
Consider reasonable factors that may influence willingness to participate, such as potential risks, discomfort or adverse effects.
Identify the prospective benefits of the research.
Identify the limits of confidentiality in data coding, data disposal, data sharing and archiving, and identify when confidentiality may be broken.
Explain any incentives for participation.

In research involving experimental treatments, specific mandates apply to the researchers. They must state how participants are assigned to the treatment and control groups, what alternatives or compensation are available, and any monetary costs of participation.

4. Privacy and confidentiality

Experts advise that researchers should ask participants whether they are willing to discuss sensitive topics, and design the interview questions so that participants can stop if they feel uncomfortable. Research participants are free to choose what information they reveal and under what circumstances, and researchers must be careful in selecting participants for their study.

Practical security measures: Confidential records should be stored in a secured area with privileged access, with identifying information kept separate. Such measures protect participants' confidentiality from the public.

Data sharing before research begins: Cleaning the data is one way to ensure clarity in research. A peer group can verify the collected data before it is processed and before any conclusions are drawn. This must be handled carefully for ethical reasons, to avoid duplication and to allow the required scrutiny. While sharing data, the researcher should use established techniques, where possible, to protect confidentiality, such as coding data to hide identities.

Limits of the internet: The continuous evolution of the web has produced a knowledge net for all researchers, who depend on the internet as a resource for their literature review. But can we fully trust internet resources in our research work? The internet has many limitations.

5. Record ethics resources

Researchers should avoid and resolve ethical dilemmas by knowing their available resources and how to use them. Researchers can help themselves with ethical issues by keeping records of how their work is maintained. Such records include a research diary and monthly meeting schedules. The research diary is recorded material that any researcher can rely on in later arguments related to his/her work. The peer group can also make internal presentations and keep them as records.
10.7.2 Politics in Research

“Social research” can be summed up as research in any field of the social sciences. Empirical evidence and analysis to understand and explain the nature of human behaviour are its major perspectives. It also provides analysis and understanding of social structures and cultures, as well as the social impacts of issues such as government policies. Political representatives, the media, academics and business bodies are typical audiences for social research among policy makers. Such research helps them understand and manage the risks associated with their choices, and it is carried out in public bodies, universities, colleges or special research organizations.

Research about politics itself is not termed “social research” but is referred to as “political research”, as in the study of a political party. It is more specific than, and distinct from, social research.

EXERCISES

1. What is social research? Discuss its differences with scientific research.
2. What are the principal sources of literature for social research?
3. Explain the role of internet in social research.
4. Explain various survey techniques used in social science research.
5. What are the roles of statistics in social research?
6. Differentiate between “politics” and “political research”.
7. Explain the various fields in social science.
8. List out characteristics of social research.
9. What are the various paradigms of social research? Explain.
10. Explain the various scopes of social research.
11. What are social surveys? Explain how they differ from questionnaires.
12. How is knowledge formed in social research?
13. What is historical research?
14. What is symbolic interaction?
15. What are complementary perspectives?
16. Discuss various social research approaches.
17. Explain the differences between quantitative and qualitative research methods.
18. What are the characteristics of quantitative social research?
19. What are the strengths and weaknesses of quantitative social research?
20. Explain various qualitative research methods.
21. Explain content analysis. How is it suitable for social research?
22. Explain five principles in social research ethics.
Chapter 11
PRESENTATION OF THE RESEARCH WORK
Objectives:

After completing this chapter, you can understand the following:

The definition of a business report and how to plan and structure it
The definition of a technical report and its various components
The detailed explanation of the research report
General tips for writing a report
The description of how to present the data
The definition of oral presentation, and advantages and disadvantages of oral communication
The detailed explanation of bibliography and references
The description of Intellectual Property Rights
The definition of open-access initiatives
The description of plagiarism

The goal of any research is to present a good report that tells others why and how you produced the results of the research work. A report is an organized text with a defined order of presentation. Reports are arranged as sections and subsections with headings and subheadings. They contain descriptive outputs such as graphs, charts, tables, etc. Reports differ from essays or monographs. Depending on the area of presentation, different types of reporting are available.

Types of reports

Different types of reports are prepared based on the nature, the course and the topic being covered. The choice of report determines how its text is organized: the contents of each type of report are different, and such changes are basically content based. The outlines of these reports also differ based on the requirements. The main types are the business report, the technical report and the research report.

11.1 BUSINESS REPORT

It is a standard report with sections that follow an outlined guideline. Business reporting, otherwise called enterprise reporting, is the public reporting of financial data. Business reports are assignments that analyze a situation, apply business theories and give suggestions for improvement. Business reports do the following:

1. Analyze available and potential solutions to a problem, situation or issue.
2. Apply business and management theory to a practical situation and try to resolve it.
3. Demonstrate analytical and evaluation skills.
4. Arrive at conclusions about a problem.
5. Provide recommendations for future action.
6. Demonstrate clear communication skills.

The audience of the business report is considered during its preparation. Reports usually address specific issues or problems. The steps involved in business report preparation are planning, structuring and presentation.

11.1.1 Planning a Business Report


Planning has a vital role in the preparation of a business report, and the following key considerations should be addressed.

Purpose of the report

The purpose is generally to assist people in making decisions among several options. It must be clear what decision is to be made and what role the report plays in that decision.

Readers of this report

There are two types of readers for any report: main readers and secondary readers. Main readers are decision makers; secondary readers are facilitators. Before preparing a report, try to understand what the readers already know, what they need to know and how they will use the report.

Main message of this report

A business report requires a main message for its users, and a clear vision is needed before communicating that message. Be clear about what information needs to be included and what additional information needs to be provided.

11.1.2 Structuring of Business Report

A general as well as special structure is required in business reports. The general structure contains chapters,
sections, subsections, etc. It may also contain special structures such as case studies, illustrations and so on. A
business report may contain the following subheadings and subsections:

Covering letter
Title page
Executive summary
Table of Contents
Introduction
Conclusions
Recommendations
Findings and Discussions
References
Appendices

The covering letter officially introduces the report to the recipient. The title page is a brief description of the
report. It must contain the date of the report, the authors' names and their association or organization.

The executive summary helps a reader grasp the focus of the report, its findings and its goals. It should be
within one page. Note that an executive summary and an abstract are different.

The table of contents orients readers with a list of the headings and subheadings in the report.
Lists of tables and figures may also form part of the table of contents.

The introduction sets the stage for the readers, placing the report in a context of general interest and indicating
clearly what a reader can expect. A business report is expected to contain both conclusions and recommendations;
the orientation in time is the major difference between the two. Conclusions connect past and present situations,
whereas recommendations are oriented towards the future and suggest specific actions.
The difference between conclusions and recommendations is summarized in Table 11.1.
Table 11.1 Difference between conclusions and recommendations

Discussion is one of the main components of a report. Conclusions follow from the discussion and support the
findings; they justify the recommendations, which depend on the scope and purpose of the report. References list
the information sources used in the report; the referencing style varies with the organization. The appendices are
supporting materials for the submitted report. For example, the discussion provided in the report may rest on
detailed data analysis: a summary of the analysis is presented in the discussion section, while the detailed data
analytics can be placed in an appendix.

In general, a business report must be well structured and use good language. The pages must be numbered,
and figures and tables must be cited and given proper footnotes. The white space in a business report should be
well balanced. Finally, grammar, spelling, punctuation, etc. must be checked.

Example – Template of a business report

The basic template of a business report is given below.


11.2 TECHNICAL REPORT

A technical report is the work product of technical people and engineers. It generally gives technical
specifications, technical know-how, etc. Its major purpose is communication between engineers or
technocrats about a process or product of technical or experimental work.

Technical writing is a term used in science, engineering and allied skilled areas, and it is the key to any technical
report. A technical report may be used for communication in day-to-day business, article writing, patent
preparation, or the preparation of manuals and instructions. Remember that the audiences of technical reports are
technical people or those with a technical knowledge base. Technical reports are used in the following key
areas:

Laboratory reports and communications
Technical papers, articles, etc.
IPRs and patents
Operational procedures, manuals, instructions

For example, an automobile engineer who has spent more than a year developing a new car prepares a technical
report on his work. The transmission design, its evaluation, the implementation of components, etc. fall under his
purview, and his major objectives include cost, design and management perspectives. His technical
report must address these criteria.

11.2.1 Components of Technical Report

The following sections are needed for a technical report:

1. Title Page

It is a brief and meaningful description of the work. The title page must indicate the purpose of the study,
the author’s name, affiliation, etc.

2. Executive Summary

Most engineers or technical readers expect an executive summary; otherwise, an abstract is
sufficient in a technical report. It must be a concise and clear overview of the entire experiment or topic to be
discussed, so that the reader obtains a clear picture from it. A good abstract or executive
summary states the overall purpose of the experiment or its principal objectives. The problem studied, the
experimental methods, materials, main results and main conclusions are included in the abstract.

3. Introduction

The work to be performed is defined in this section. The introduction states the scientific purpose or
objectives and the problems, with reasons, and provides sufficient background information about the work.
It should clearly note why the study is performed, that is, the purpose of the study. These are
important points when preparing the introduction.

4. Theory

Theory can be included within the introduction section. If the theory is important and extensive,
a separate section can be devoted to it. The theory section explains the background of the study;
equations, models, mathematical formulae, scientific relations, etc. are clearly presented. Standard
technical formats must be used when writing them.

5. Experimental Setup

A detailed discussion of the apparatus or instruments selected for the study is required, for
example, a circuit or a machine that leads to a product or a process for industry or college. The reader
must clearly understand what it is, how to use it, what its components are, etc.

6. Experimental Procedure

The procedure of the experiment is listed as chronologically ordered steps. In this section, the
analysis of the data used can be mentioned as a separate subsection. Sample calculations, which are
basically mathematical relations and equations, can also be listed as a subsection.

7. Results and Discussions

The results must be analyzed and stated with clear interpretation. Figures, tables, charts and
even photographs are explanatory components in this section; a well-organized presentation of data is required.
This section progressively describes what was identified or discovered, the significance of the results, their
focus, etc. If errors occurred, their cause, their rate, and whether they affected the
experiment or study need to be well explained.

8. Conclusions

A summary of the experiment and its significant results can come first. The
conclusion should answer the questions raised in the introduction and explain the significance of the
experiments, the implications of the study and the discussions made.

9. Appendices

Appendices hold useful information required for the work when the text is too long to include in the body. The
information should be ordered and remain useful for further reference.

10. References

A sufficient number of references should be provided to aid the readers. The
references should be apt and informative and serve as supporting documents.

Example – Template of a technical report

The basic template of a technical report is given below.


11.3 RESEARCH REPORT

A research report is a document produced from a research study after literature review, organized data
collection and analysis. The format of a research report varies with the type of study, the mode of
presentation and the requirements. A research report covers three major sections: the preliminary section, the body of
the report and the supplementary material. Each section has its own subtitles or headings. The body of the report
is the main part of any research report; introduction, literature review, methodology, data collection, data analysis,
results, discussions, recommendations, conclusions, etc. are its subcomponents.

11.3.1 Preliminary Section

A dissertation or research report contains the following preliminary sections:

Title page
Copyright page
Abstract
Certificate/Signature page
Acknowledgement
Table of contents
List of figures
List of tables
List of abbreviations/symbols/formulae

The above list may change with the type of submission. For example, a social research report may not have a list of
symbols or abbreviations, while a mathematical research report may. The pages of the preliminary sections are
numbered in small Roman numerals (e.g., i, ii, iii, …), usually centred at the bottom of the page, at least 2 cm
from the text. Use appropriate margins (e.g., 3 cm from the top, bottom, left and right of an A4 page). Footnotes
should be avoided in the preliminary material. The text should use an appropriate font size (e.g., 12 pt), line
spacing (e.g., 1.5) and character spacing.
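The lowercase Roman numbering described above (i, ii, iii, …) can be generated programmatically when assembling a document. The following is a minimal Python sketch; the function name is our own, not from any referenced tool.

```python
def to_roman_lower(n):
    """Return n as a lowercase Roman numeral (i, ii, iii, ...),
    the style used for the preliminary pages of a report."""
    vals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
            (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
            (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in vals:
        # Greedily subtract the largest value that still fits
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

# First five preliminary page numbers
print([to_roman_lower(i) for i in range(1, 6)])  # → ['i', 'ii', 'iii', 'iv', 'v']
```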

11.3.2 Body of the Report

1. Introduction

The introduction focusses on the subject under investigation, starting from the background information of the
topic. The introduction section must give a brief idea of the literature of the study, the objectives, the background
of the study and the statement of the problem. Logically, the introduction has two parts: an
introductory section and the background of the problem under investigation.

The background information on the introductory page must give a brief account of what is to be
investigated in the study, for example, an outline of the issues faced by the organization. This section must
provide the history/origin of the report, the date of request and the importance of the study.

The objectives or aims of the study are other key information provided to the readers; they
indicate what key questions the study tries to answer. The scope of the study is another key element
of the research report: it states what ideas are covered. Certain assumptions can be made
about the situation based on this.

Finally, as an introductory summary, the organization of the thesis can be given to the readers; this helps
readers decide how and what to read. The introductory section must contain a well-written
problem statement for the selected study. The study will compare, contrast, investigate, determine,
develop, examine, describe, clarify or evaluate the issues of the selected problem.

2. Literature review

The literature review explains the existing studies examined by the investigator. Limitations of the existing work can
be covered in this subsection. A comprehensive literature study yields review articles and knowledge. The
current trends, theories and models used in the existing work are discussed in the literature survey. For example,
in a study of school principals, the literature survey would include the history of the principalship, current
selection practices, recommendations and results. The conclusions made by other researchers are presented in this
section. Direct questions are avoided, but a flow of ideas is maintained. Personal ideas or theories are avoided in
the literature review. Several paragraphs and subheadings can be included in this section.

3. Methodology or Procedures

Generally, this section must state what methodologies are used in the study. Sample collection and sample
data can be provided as a separate subsection. The sources of data, the population, how the
samples are designed, how large the population is, etc. can be discussed.

The procedures are the steps involved in the research process, which may involve instruments, experiments or
modelling. For example, in biotechnology research, an experimental setup is built in the laboratory, and a
sequence of steps is followed to check the validity of the experiment; the procedures are thus created
from the setup of the instruments, with instructions. A revised setting of the instruments may be
required after checking the samples or after pilot testing. Such revision is also required in the modelling of the
experiments done by the researcher.
4. Data Collection and Analysis

Generally, this section follows from the procedure. The method of data analysis is determined by the
hypothesis to be tested or the research question to be answered.
For example, a researcher may calculate the mean and standard deviation of responses to the research questions,
and analysis of variance (ANOVA) may be used to determine whether there is a significant difference between the
primary and secondary principals in the study sample.

Statistical software (e.g., SPSS) can be used to analyze data; Matlab can also be used to
analyze scientific data. Inferential statistics such as the chi-square test, t-test, ANOVA, etc. are useful
for data analysis. Quantitative research methodologies may be employed in the study. Depending
on the objectives, a researcher collects data in many forms: unstructured data,
historical data, case studies, general test reports, etc. The researcher must give a comprehensive
description of how the data collection was developed, as well as any changes made to the instrument. A clear
framework should be provided for better understandability. The researcher should clearly
specify and describe the steps involved in the procedure and justify the data selected for the study.
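As a concrete illustration of the analysis described above, the mean, standard deviation and one-way ANOVA F statistic can be computed directly. The sketch below uses only Python's standard library and hypothetical survey scores (in practice a package such as SPSS, Matlab or SciPy would usually be used):

```python
from statistics import mean, stdev

def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square
    divided by within-group mean square."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total observations
    grand = mean(x for g in groups for x in g)
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)  # between-group SS
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)    # within-group SS
    return (ssb / (k - 1)) / (ssw / (n - k))

primary = [72, 75, 71, 78, 74]      # hypothetical scores, primary principals
secondary = [65, 68, 70, 66, 69]    # hypothetical scores, secondary principals

print(round(mean(primary), 2), round(stdev(primary), 2))  # → 74.0 2.74
print(round(one_way_anova_f([primary, secondary]), 2))    # → 17.36
```

A large F value, compared against the critical value of the F distribution at the chosen significance level, indicates a significant difference between the groups.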

5. Results and Finding

This section gives the outcome of the experiment or study. An introductory note is required for each part of the
results section. The findings are reported here, and the response rate must be mentioned if surveys are used.
Readers get a sense of the gravity of the research while reading these chapters.

The presentation of the data is very important because it is how the information is transmitted to the readers.
Tables, figures, charts and even photographs carry that information. The results are reported in
tabular or graphical format. Generally, tables are limited to rows and columns with column
headings. A detailed description must be given for each table and figure, with its citation. For example, a
sample table and a sample research figure with captions and labels are given in Figs. 11.1 and 11.2,
respectively.

A summary of the results at the end of the chapter is good practice: it indicates the findings,
observations, etc. and gives a comprehensive reading.

6. Discussions and Recommendations

The discussion of the findings can either be presented separately or included in the results section. Certain
recommendations or suggestions form the output of the research. Remember that findings are part of the
summary of results or the discussion and are not given as a separate chapter; in certain cases this can be omitted.
The discussion section can be divided into subsections. The work done by other researchers, discussed in
the literature review, can be compared, analyzed or evaluated here.

7. Conclusion

This section presents the findings and the analysis of the results in a narrative form, with the researcher's
observations. Several findings can be incorporated into one conclusion; equally, one finding may give rise to
several conclusions. Generally, conclusions are written in the past tense.
Fig. 11.1 Sample table with caption and table number

Fig. 11.2 Sample figure with caption and figure number

For example, a study of undergraduate admissions at a university in Kerala in 2014 yielded the following
findings:

1. 80% of the admitted students studied the state syllabus at the plus-two level.
2. 12% of the admitted students studied the CBSE syllabus at the plus-two level.
3. 8% of the admitted students studied the ICSE syllabus at the plus-two level.

From these findings, the following conclusions were drawn:

1. The marks obtained under the state syllabus are high compared to the other boards.
2. Students studying under the state syllabus are awarded higher marks in order to promote state
schools.
Recommendations can be included before or after the conclusion, or as part of it. They are actions
to be implemented based on the findings of the research. In the above example, a recommendation may be,
“Normalize the marks of the state, CBSE and ICSE syllabuses and then calculate the index marks before preparing
the rank list.”
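The normalization recommended above can be sketched as a z-score transformation, which puts marks from different boards on a common scale before index marks are computed. The marks and variable names below are hypothetical.

```python
from statistics import mean, stdev

def z_normalize(marks):
    """Rescale marks to mean 0 and standard deviation 1,
    making scores from different boards comparable."""
    m, s = mean(marks), stdev(marks)
    return [(x - m) / s for x in marks]

state_marks = [92, 95, 90]   # hypothetical plus-two marks, state board
cbse_marks = [78, 82, 80]    # hypothetical plus-two marks, CBSE

print([round(z, 2) for z in z_normalize(state_marks)])  # → [-0.13, 1.06, -0.93]
```

After normalization, the relative standing of a student within his or her own board is preserved, but board-wise inflation no longer distorts the combined rank list.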

In summary, body of the report contains the following sections:

Introduction
Literature Review
Methodology or Procedure
Data Collection and Analysis
Results or Finding
Discussions and Recommendations
Conclusions

11.3.3 Supplementary Materials

The additional materials provided in a research report are the references and appendices. The reference list
includes the full publication details of websites, books, articles, conference proceedings and other resources.

Appendices contain other relevant information provided in the research report, which may include the following:

Data collection forms
Maps and graphs
List of publications achieved
Articles or clippings
Tables/Charts
The data used
Specific diagrams or photographs
Pamphlets
Specifications
Additional derivations
Any resource that supports the thesis report

Example of a research report

The following shows the table of contents of a research thesis in an interdisciplinary science such as computational
biology.
11.4 GENERAL TIPS FOR WRITING REPORT

The following are some important points to be noted while writing a report.

1. Follow the guidelines provided by the university or organization.
2. Work out a plan for the discussion and focus carefully on expression.
3. Keep the writing flowing. Write appropriate headings and subheadings in a logical order.
4. Avoid too many lists or phrases separated by bullet points; they become disjointed from the main text and
lack sufficient content.
5. References should be cited correctly. Also check that the resources are used correctly and that the primary
and secondary references are accurate and suitable for the report.
6. Write in the third person and avoid personal pronouns such as I, My, You, Your, We or Our.
7. Use the present tense when describing people’s work. For example, Vinod (2008) points out that ….
8. Use gender-neutral language, for example, they or their rather than he or she.
9. Do not start paragraphs with quotations.
10. Use formal language and avoid abbreviations.
11. The numbering of pages differs between the preliminary pages and the body of the report. Preliminary
pages usually use small Roman numerals (i, ii, …); the body of the report uses Arabic numerals (1, 2, 3, …).
This should be applied consistently throughout the thesis.
12. A good report should offer original thinking and creative thought.
13. Consistent formatting is essential.
14. Use a good, official font and size, along with proper margins.
15. Use appropriate line spacing for the text.
16. References should be arranged in chronological or alphabetical order.
17. Provide citations in chronological order.
18. Use symbols and definitions in a common format.
19. Use clear figures and tables to show the results.

11.4.1 Technical Writing

Technical writing is sometimes defined as simplifying the complex. Oxford Dictionaries Online (ODO) provides
one of the best definitions of technical writing: a strict application or
interpretation of the law or rules incorporated while writing a document. Good technical writing results in
relevant, useful and accurate information geared to specifically targeted audiences, enabling a set of
actions on the part of the audience in pursuit of a defined goal. Figure 11.3 shows the various types of technical
writing.

The figure shows that technical writing is carried out for teaching purposes, for research, for day-to-day business and by
specialist writers. Writing for teaching purposes includes documents such as books,
articles, theses and scripts. From a researcher’s point of view, a patent is a technical document, which he or she
claims when a new theory or finding is put forward.

Another type of technical writing is that done by specialist groups. It includes manuals, instructions and
procedures; even a detailed write-up on the working of some machinery is included here. There are also
various technical documents used day to day: the email messages that are exchanged,
surveys, benchmark meetings, etc.
Fig 11.3 Technical writing

11.4.2 Goal of Technical Writing

Good technical writing delivers relevant, useful and accurate information to specifically targeted
audiences, enabling a set of actions on the part of the audience in pursuit of a defined goal. The goal
may be using a software application, operating industrial equipment, preventing accidents, safely consuming a
packaged food, assessing a medical condition, complying with a law, coaching a sports team, or any of an
infinite range of possible activities. If the activity requires expertise or skill to perform, then technical writing is
a necessary component.

Only a small proportion of technical writing is actually aimed at the general consumer audience. Businesses and
organizations deliver vast amounts of technical writing to explain internal procedures, design and produce
products, implement processes, sell products and services to other businesses, or define policies.

11.4.3 Foundations of Effective Technical Writing

1. Know your reader
2. Know your objective
3. Be simple, direct and concise
4. Know the context in which your communication will be received and used
5. Design your communication

These are the basic things to keep in mind when preparing a technical report. To begin with, always
know who will be reading the report. If you intend to write for a totally new audience, start
with the basic ideas and basic structure; if the audience knows the technology well, such a start will
be tiresome for them. So always give what your audience needs. Never beat around the bush: be specific about
your objectives in a simple, direct manner. Another important consideration is the context in
which your communication will be used. Always design it in the manner the audience welcomes.

11.4.4 Qualities of Good Technical Writing

1. The writing should summarize a set of conclusions to reveal the results obtained.
2. It should convey an impression of authority, thoroughness, soundness and honest work.
3. It should be a stand-alone document, understandable even by readers who were not part of the
initial audience.
4. It should be free from typographical errors, grammatical slips and misspelled words.

11.5 PRESENTATION OF DATA

Data can be presented as text, tables, charts, diagrams or graphs. These forms convey information to
readers more effectively than prose alone. Sets of numerical results should be presented as tables or pictures rather than
included in the text. A picture or a photograph is worth a thousand words: it summarizes information that is
difficult to describe in words alone.

When integer numbers are required in the text, numbers below ten are written in words, whereas ten and
above are written in numerals. Decimal numbers are written consistently, with justified digits. Numerical
data can be presented in tabular format. Tables are better than graphs for giving structured numeric information;
they make comparison of the data easy and are self-explanatory. Readers can understand tables and figures
without reading the full text in detail. Tables and figures should be clearly labelled, with key
points highlighted. A verbal summary can be given for tables and figures to illustrate important points.
The numbering of tables and figures should be as simple as possible. For example, Table 3.1 means the first
table cited in Chapter 3. Similarly, Figure 2.7 means the seventh figure in Chapter 2.
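As a small illustration of tabular presentation, numeric results can be aligned into a simple text table with column headings. The sample names and values below are made up for illustration.

```python
# Hypothetical measurement results to be presented as a table
rows = [("Sample A", 12.5, 0.8), ("Sample B", 14.1, 1.1)]

# Column headings, then one aligned row per measurement
header = f"{'Sample':<10}{'Mean':>8}{'SD':>8}"
body = [f"{name:<10}{m:>8.2f}{s:>8.2f}" for name, m, s in rows]
table = "\n".join([header] + body)
print(table)
```

The fixed-width alignment makes the comparison between rows immediate, which is exactly the advantage tables have over running text.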

Statistical information such as standard errors described in a scientific article should be presented in a table or
graph that carries the main message. Another notable point is that tables do not need to be boxed with
borders. Line graphs and bar charts are the two most common graphical presentations of data. Line graphs can carry
more information than bar charts: a continuous quantity such as time, mass and so on can be plotted on either
axis with varying values, and a line graph can display more than one relationship in the same diagram. Bar charts
are useful for displaying results clearly; the horizontal axis represents a discrete categorization, and the bars
can be given colours, which allows clustering. The variety of graphical representations of data is shown in
Fig. 11.4.

Fig. 11.4 Variety of graphs


Fig. 11.5 Pie chart

Another method of representation is the pie chart, a model of graph that shows a percentage distribution.
Figure 11.5 shows a pie chart indicating the world’s internet usage.
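A pie chart is built from the percentage share of each category. The sketch below computes those shares for hypothetical usage counts; the numbers are illustrative only and are not the actual internet-usage figures of Fig. 11.5.

```python
# Hypothetical counts per category (e.g. internet users per region, in millions)
counts = {"Asia": 2790, "Europe": 750, "Americas": 650, "Other": 810}

# Convert raw counts to percentage shares, one slice per category
total = sum(counts.values())
shares = {region: round(100 * n / total, 1) for region, n in counts.items()}
print(shares)  # → {'Asia': 55.8, 'Europe': 15.0, 'Americas': 13.0, 'Other': 16.2}
```

These shares are what a plotting tool would use as the slice sizes of the pie chart.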

Scientific data are represented in digit format with an appropriate number of significant digits. For example, pi can be
represented as 3.14. Some values are given along with the unit specified; for example, 12.83 ppm
indicates parts per million. The standard deviation can be quoted with a ± sign; for example, the tolerance is 80±5
kg.
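Rounding to a given number of significant digits, as in the pi example above, can be done with a short helper; the function below is our own sketch, not part of any standard API.

```python
import math

def to_sig_figs(x, n):
    """Round x to n significant digits."""
    if x == 0:
        return 0.0
    # Position of the leading digit decides how many decimals to keep
    return round(x, n - 1 - int(math.floor(math.log10(abs(x)))))

print(to_sig_figs(math.pi, 3))   # → 3.14
print(to_sig_figs(12834, 3))     # → 12800
print(to_sig_figs(0.004567, 2))  # → 0.0046
```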

11.6 ORAL PRESENTATION

Research activities take several months to obtain a promising result; the next step is to present the result before
a research community or audience. Typically, audio-visual aids are used to convey the research idea. A
presentation should be like a story narrated to the audience, so as to keep a flow of ideas. A good
presentation should take account of who the audience is, and the work should be discussed in a way that is acceptable to the
crowd. The following are some points regarding oral presentation.

Time sensitive: A specified time is allotted for the presentation, so keep to the timeline.
Fleeting: Avoid confusing the listeners when flipping through the slides.
Speech: The style of presentation is important, including speed and pronunciation.
Visual: The presenter should maintain good eye contact with the audience; body language and gestures
are very important.

Figure 11.6 shows the structure of an oral presentation. The presentation is divided into three major sections –
beginning, middle and end.

The title slide should contain the title of the report and the author’s name and affiliation. The presenter’s name and
responsibilities (if any) should also be on the title slide. The introduction slide should be created so that it
connects the presenter with the audience; it should state what the presentation is about, its purpose and its goal.

In the middle of the presentation, the final design concept, sequence of activities, methodology, evaluation, etc. are
presented. At the end, conclusions and recommendations can be presented. A good
presentation does not mean a lengthy presentation; it should be short and simple. Graphs, tables, pictures,
maps, charts, etc. can be used to convey the idea. For example, a comparison of two results
presented either in a table or in a graph will be very effective for the audience. The speaker must be
patient and should not show any anger or frustration. Keep interacting with the audience and always
maintain eye contact.
Fig. 11.6 Structure of oral presentation

11.6.1 General Tips

Some tips that help make an oral presentation of research more attractive are given below. A
general format of an oral presentation with slides is given in Fig. 11.7.

Fig. 11.7 Format of presentation

Timing
What is the duration of the presentation? This must be clear in the presenter's mind, because a 10-minute talk is
different from a 45-minute talk. For a 10-minute presentation, just convey the important points and the idea.
For an elaborate presentation, the entire mechanism of the process or the findings of the
research work needs to be explained.

Audience

Before preparing the presentation, identify who will be watching it. A general audience is
different from a specialist group. With a general audience, the technical terms must be explained and a more
generalized talk and viewpoint should be given, while the significance of the research and development may
be delivered to the specialist group.

Content

It is not true that students need to explain every single thing they know or perceive.
Prepare the presentation with examples and deliverables, and keep all content ready. Communicate the
content to the audience through the main points, adding details whenever necessary.

Organization

The presentation must be divided into three sections – beginning, middle and end.

1. Introduce yourself.
2. Present the research questions and objectives.
3. Describe how the research was conducted.
4. Describe the findings.
5. Conclude with the important points and a summary.

Depending on the topic of research, provide sufficient background information for the audience to
understand the importance of the study; then bring the audience's focus to the research topic.

Use audio-visual aids

PowerPoint or any other presentation software is useful for presenting the idea clearly. Incorporate images
and pictures in the presentation in order to avoid monotony.

Tone and practice

The best approach is to treat the talk as a formal occasion, respecting the audience and
addressing them with politeness. Do not say “you guys” or use other unorthodox expressions. Present clearly
and speak slowly, and repeat the important points twice. Practise the presentation several times: presenting
before family or friends, or even in front of a mirror, helps you attain the right pace and style. Practising also
helps keep the presentation within the allotted time.

Case Study: A 20-Minute Presentation

For a 20-minute presentation, prepare slides and give the talk within the allotted time.

Title/Author/Affiliation – 1 slide
Scope/Objective – 1 slide
Outline – 1 slide

The agenda of the presentation

Background: Motivation and problem statement – 2 slides
Related works – 1 slide
Methods – 2 slides

The motivation indicates why this work was selected. The related-work slide should briefly cover the
other relevant work in the specific area, including a quick recap of what has already been
mentioned.

Results and Discussion – 4 to 6 slides

Present the key results and key insights. This is the main body of the talk, and its internal structure varies
greatly as a function of the researcher's contribution. Here, a presenter cannot cover all results but should
not leave the important results undiscussed. Never show large figures and tables; provide only a
summary of the results.

Summary – 1 slide
Future Work – 1 slide

This is an optional slide that opens your research work to the outside world.

Backup Slides – 3 slides

This is also optional: a few slides for expected questions, important findings, methods, etc.
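The slide counts above can be turned into a rough time budget. This sketch assumes five slides for Results and Discussion (the midpoint of the suggested 4 to 6) and that backup slides are not presented within the 20 minutes:

```python
# Slide budget from the case study (backup slides excluded from the talk time)
slides = {
    "Title/Author/Affiliation": 1, "Scope/Objective": 1, "Outline": 1,
    "Background": 2, "Related works": 1, "Methods": 2,
    "Results and Discussion": 5, "Summary": 1, "Future Work": 1,
}

total_slides = sum(slides.values())
minutes_per_slide = 20 / total_slides
print(total_slides, round(minutes_per_slide, 2))  # → 15 1.33
```

Roughly 1.3 minutes per slide is a comfortable pace; a deck much larger than this will force the presenter to rush.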

11.6.2 Advantages of Oral Communication

The following are some of the advantages of oral presentation:

1. It saves money, being a relatively inexpensive mode of communication.
2. Spot feedback is received from the audience, confirming that the message has got across.
3. It avoids time delays and lags.
4. It creates personal warmth and friendliness and develops a healthy bond between the audience and the speaker.
5. Oral presentations directly motivate people into doing “something”.
6. The presenter can directly and easily identify mistakes and correct them.

11.6.3 Disadvantages of Oral Communication

The following are the disadvantages of oral communication:

1. Oral communication leaves no permanent record, which may lead to future issues.
2. Background noise can interfere with the communication, making the session ineffective.
3. If the speaker is not a veteran, effective information transfer hardly occurs; only a confident speaker can
convey ideas to a crowd through an oral presentation.
4. The audience may misunderstand the communicated information.
5. The feedback may mislead the speaker.
6. Lengthy presentations may be perceived differently by the audience.

11.6.4 Effective Oral Presentation

Oral presentation is the process of conveying information or ideas from a speaker to an interested audience.
Oral communication is the verbal transmission of information and ideas from an individual to others, and it may
be formal or informal. A face-to-face discussion is an example of informal communication; a thesis presentation
or a political discussion among a group of people is formal oral communication. For an effective oral
presentation, several key points need to be practised.
1. Audience Analysis

Understand the distance between the audience and the speaker. The success of a presentation depends on
careful management of this distance between the speaker and the listeners. Figure 11.8 maps the possible
combinations of speaker and audience distance.

Fig. 11.8 Presentation distance

Here, distance refers to the degree of formality shared with the audience. Quadrants A and D show good
matches between speaker and audience: both agree on an informal distance (Quadrant A) or both agree on a
formal distance (Quadrant D), which makes for comfortable oral communication between the presenter and the
audience. In Quadrant B, the speaker projects an informal distance while the audience expects a formal one; in
Quadrant C, the audience expects an informal distance that the speaker does not provide. In both of these cases
the talk is poorly delivered or received. Quadrants A and D are thus synchronized with the audience, while
Quadrants B and C are not.

A natural question is: what does synchronization with the audience mean? How closely do you know them? In
an informal presentation, you are aware of who your audience is. For example, in a research pre-submission
seminar, the guide and the researcher invite a group of researchers into the seminar hall; the presentation is
pre-planned, so the presenter knows exactly who the audience is.

2. Structure

To make the talk memorable and understandable, create a structure for the talk before presenting. The
audience can retain only a few main points, so a successful talk makes those points stand out clearly. For an
understandable talk, the following need to be in place:

A simple, patterned flow of points

A marked structure
Examples, supporting data, etc., as required

The structure of the presentation can follow patterns such as spatial, topical, solution based, general to
specific, chronological and familiar to unfamiliar. A talk organized around physical arrangement follows the
spatial strategy; it can be disorienting unless the speaker maintains a clear focus. For example, a mechanic
demonstrating "how to assemble a particular item in a line-assembly fashion" has to organize the
demonstration spatially. This gives the audience a clear understanding.

The topical pattern is a traditional organization of the talk following an uncontroversial order of topics, such
as most important to least important, or largest to smallest; an example is a presentation on industrial plant
practice in handling imported goods. In the question–answer pattern, the speaker describes the key problems,
then presents and defends a solution. For example, a web designer explaining improved online product
ordering may contrast the existing and the proposed websites in a problem–solution manner. In a competing-
solutions presentation, the speaker presents several candidate answers to a particular problem, describing the
pros and cons of the different approaches. For example, an engineer presents the various technologies
available for security enhancement in airports.

In the general-to-specific pattern, the speaker moves from general principles to the specific actions associated
with them. This pattern is difficult to organize well, because mismatches between the general and the specific
can confuse the audience. For example, a zoologist presents the habits of various kinds of insects.

In the chronological structure, the speaker gives a straightforward narration of a story, uncovering the research
in its proper time sequence. For example, a historian presents the milestones of the freedom fighters.

In the familiar-to-unfamiliar organization, the speaker starts from a process the audience already knows. For
example, a researcher presenting video-streaming software first discusses the current approaches and their
analysis, and then explains the non-linear approaches that replace the existing technology.

3. Supporting Materials

Use of supporting materials makes the audience more involved in the topic. For example, a chart or photograph
gives visual support that helps the audience understand the presentation. A comparison of recollection
tendency is shown in Table 11.2.

Table 11.2 Recollection tendency

The presenter needs to spend some time preparing flip-charts, drawings and handouts.

4. Delivery: Presentation skill improves with practice. Unexpected questions or events may occur during the
presentation, and such questions must be answered clearly and genuinely; consider, for example, a researcher
presenting his/her Ph.D. work in front of established researchers. The content of the presentation must be
simple, and appropriate audio-visual aids should support it. A "cold" presentation will not create interaction
between the speaker and the audience.

11.7 BIBLIOGRAPHY AND REFERENCES

A reference list appears at the end of the text, article or research thesis and contains the works cited within the
text; it also directs the reader to articles for further reading. A bibliography is any list of works at the end of the
text or thesis, whether cited or not; it covers additional background reading and other related materials. A
bibliography is usually listed in order of importance.
The format differs for books, journals, journals in press, conference proceedings, user manuals and Websites;
examples of each are given in the subsections that follow.

Two terms arise while dealing with bibliographies: "citation" and "reference list". A citation is a reference
made in the text to give the source of information; it can take the form of a number, an author name and year,
or any other indicator. A reference list is an organized list of the works cited in the text or thesis, placed at the
end of the document.

In academic research, the reference list is organized to support searching; future researchers can locate sources
by searching the bibliography contents. Two main styles provide templates for listing the references – the
Harvard style and the Numeric style.

11.7.1 Harvard Style

Citations

Harvard style citations consist of author or editor family names and the date of publication of an item. One of
two forms may be used:

Vinod (2010) considers how to run a …

One commentator (Anand 2012) has looked at …

Where a work has more than three authors or editors, cite the name of the first named author or editor only,
followed by, et al.:

e.g., A study of flora in Himalaya (Vinod C., et al. 2011) suggests …

If you refer to two or more sources by the same person from the same year, distinguish them by adding a lower-
case letter after the year, as follows:

Vinod (2001a), Vinod (2001b), Vinod (2001c), etc.
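The suffixing rule above is mechanical enough to sketch in code. The following Python fragment is purely illustrative (the function name and the (author, year) input format are our own, not part of any citation standard): given the cited works in citation order, it appends a, b, c, … whenever one author has several works from the same year.

```python
from collections import Counter, defaultdict

def assign_year_suffixes(works):
    """Label cited works Harvard-style, adding lower-case letters
    (2001a, 2001b, ...) when one author has several works in a year."""
    totals = Counter(works)      # how many works share each (author, year)
    seen = defaultdict(int)      # how many of them we have labelled so far
    labels = []
    for author, year in works:
        if totals[(author, year)] > 1:
            suffix = "abcdefghijklmnopqrstuvwxyz"[seen[(author, year)]]
            seen[(author, year)] += 1
            labels.append(f"{author} ({year}{suffix})")
        else:
            labels.append(f"{author} ({year})")
    return labels
```

For example, three 2001 works by Vinod and one 2012 work by Anand would be labelled Vinod (2001a), Vinod (2001b), Anand (2012), Vinod (2001c).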

Where quoting directly from a work, or referring to particular pages, provide the page number(s) after the date:

“How well you select your professional and business advisers will have a direct bearing on your business
success.” (Anand 2012, p.118)

References lists and bibliographies

It is customary to put the titles of books and journals in italics. An alternative is to underline them. Whichever
method you use, use it throughout. The following examples use italics.

1. Book
Author(s) – family name, initials. (Year). Title of book. Edition. Place of publication: Publisher.

Examples

Vinod, C. (2010). Object Oriented Programming. Edition 1, New Delhi, PHI.


Vinod, C. and Anand, H. (2014). Machine Learning. 1st ed. New Delhi, PHI.
2. Work in edited book

Where a work from an edited collection is cited, references to both the individual work and to the
collection as a whole should be given.

Author(s) – family name, initials. (Year). Title of chapter. In: Editor(s) – family name, initials, ed(s). Title
of book. Edition. Place of publication: Publisher, Chapter or page numbers.

Examples

Vinod, C. (2011). Working with Pointers, Edition 1, Object oriented programming with C++. New
Delhi: Ch.6.
Vinod, C. and Anand, H. (2014). Association Rule Mining Algorithms. Artificial Intelligence and
Machine Learning. PHI, New Delhi, pp. 244–262.
3. Edited book

Editor(s) – family name, initials, ed(s). (Year). Title of book. Edition. Place of publication: Publisher.

Examples

Vinod, C. and Anand, H. (2014). Association Rule Mining Algorithms. Artificial Intelligence and
Machine Learning. Edition 1, New Delhi: PHI.
4. Conference Paper

Where a paper from conference proceedings is cited, references to both the individual paper and the
proceedings as a whole should be given.

Author(s) – family name, initials. (Year). Title of paper. In: Editor(s) – family name, initials, ed(s). Title of
conference, location, date held. Place of publication: Publisher, Page number(s).

Example

Vinod Chandra, S. S., Anand, H. (2014). Horizontal and Vertical Rule Mining Algorithms. In:
Elsevier International Conference on Advances in Computing, Communication and Information
Security (ACCIS 14), Kerala, May 29–31 2014. Elsevier Book Series, pp. 26–30.
5. Conference Proceedings

Editor(s) – family name, initials, ed(s). (Year). Title of conference, location, date held. Place of
publication: Publisher.

Example

Vinod, C., Maya Devi. (2013). Expert System for Power Plant Operator Performance Evaluation:
proceedings of the second International Conference (ICACC 13), Kochi, Aug 22–26 2013.
6. Report

It is important to be able to identify the body on whose behalf research was carried out. For this reason, if
a research report is part of a series, the title for the series and the volume/number of the report should be
given at the end of the reference.
Author(s) – family name, initials. (Year). Title of report. Edition. Place of publication: Publisher. (Series
and vol./no.).

Example

Vinod, C., et al. (2008). A Technical Note on HMM. DCB, Thiruvananthapuram, Edition 4:
Document Services. (Department of Computer Research, report no. 136, vol. 3, no. 2).
7. Academic Thesis

Author – family name, initials. (Year). Title of thesis. Type of thesis. Institution.

Example

Vinod, C. (2009). Computation Algorithms of Micro RNAs. Ph.D. thesis. University of Kerala.
8. Journal Article

This format is used for print journals and for electronic reproductions of print journals.

Author(s) – family name, initials. (Year). Title of article. Journal title, volume(issue number), Page
number(s).

Examples

John Prakash, Reji Moan, and Abdul Salim. (2014). Harmonics study. Machine Intelligence Journal
on Power Electronics, 3(2), pp. 191–206.

Electronic Document

As yet, no precise standards have been developed for referencing electronic documents. However, the
Harvard style can be adapted to accommodate these materials, noting the electronic format in square
brackets.

9. Online Journal Article

Web-based journals only; for online versions of print journals, give a reference to the print format.

Author(s) – family name, initials. (Year). Title of article. [Online]. (URL) Title of online journal,
volume(issue). (Date accessed).

Example

Vinod, C., Saritha, R. and Anand, H. (2014). Nature Inspired Project: Generation of Metadata in an
Open Access Environment. [Online]. (URL http://www.mirworks.ac.in/issue36/metadata/). Anusha,
(36). (Accessed 12 February 2014).
10. Website (excluding online journals)

Include in the reference as much of the following detail that is available from the Web page and related
home page. Where a Website has no identifiable author and is not the work of an organization, leave out
the author details, beginning the reference with the title of the Web page.

Author(s) – family name, initials. (Year, month day). Title of document. [Online]. (URL). Place of
publication: Publisher. (Date accessed).

Example
Anand, H. (2013, June 21). Tertius Algorithm. [Online]. (URL
http://www.anandhs.com/tertius.html). (Accessed 25 February 2014).

Note that the Website for this document contains no publication details, so these are not included in the
reference.

11. CD-ROM

Example

Title of product. (Year). [CD-ROM]. Place of publication: Publisher.

World development indicators. (2013). [CD-ROM]. New Delhi: The Central Bank.

Citing foreign books and journals

If you are referencing a book in a foreign language, there are two ways to do it.

Either:

1. Give the title exactly as it appears in the book or article

e.g., Anand, H. (2014). Maschine Lerning. Berlin: Ullstein

Or:

2. Provide the English translation of the title, together with details of the language in which the book or article
was originally written.

e.g., Anand, H. (2014). Machine Learning (in German). Berlin: Ullstein

It does not matter which of these methods you choose – the important thing is to be consistent and use the
same one throughout your research.

Citing a translation

When referencing a foreign language item which has been translated, use the following format:

Vinod, C. (2010). Informatics Bioinformatics. Translated from the German by A. Salim: Vintage.

Quoting a foreign book or journal

When quoting from a foreign language work in the main body of the text, the quote should always be provided
in English. The item should then be referenced in the bibliography using the format above.

11.7.2 Numeric (Vancouver) Style

There are three main differences between the Numeric (sometimes called Vancouver) style and the Harvard
style:

1. The way material is cited in the text


2. The position of the publication date in a reference
3. The way the references list is ordered
Citations

Material cited in the Numeric style is identified by a number, beginning with 1 for the first citation and
continuing in sequence. One of three forms of noting the number may be used:

Vinod1 considers how to run a …

Vinod [1] considers how to run a …

Vinod (1) considers how to run a …

Where a work has more than three authors or editors, cite the name of the first named author or editor only,
followed by, et al.:

Anand, H., et al. [2] suggest in a study of flora …

Where quoting directly from, or referring to particular pages, in a work, the relevant page number(s) can be
stated “after the citation number”, in the following way:

“How well you select your professional and business advisers will have a direct bearing on your business
success.” [1, p.118]

Where you refer to the same work on more than one occasion, two options are recommended.

Either:

Reuse the same number as the first citation to the document.

Or:

Continue the numeric sequence, providing an abbreviated reference to the document in the references list for
the second and any subsequent citations (see "abbreviated reference" below).
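The numbering scheme described above (the first citation gets 1, and later citations of the same source reuse its number under the first option) can be sketched as follows. The function name and the input representation are illustrative, not part of the Vancouver style itself.

```python
def numeric_citations(sources_in_text):
    """Assign Numeric (Vancouver) citation numbers in order of first
    appearance, reusing the same number when a source is cited again."""
    numbers = {}   # source -> assigned citation number
    out = []
    for src in sources_in_text:
        if src not in numbers:
            numbers[src] = len(numbers) + 1
        out.append(numbers[src])
    return out
```

For example, citing sources A, B, A, C in that order produces the citation numbers 1, 2, 1, 3.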

11.7.3 References Organization

General points

The general points made in relation to the Harvard scheme about items with more than three authors or editors,
second or subsequent editions, and putting titles in italics, apply to references in the Numeric scheme as well.
The key difference between references in the two schemes is the treatment of the date of publication. In the
Numeric scheme, this usually follows the place of publication. If no reliable information is provided about the
date, use the symbol “?” to show the fact, such as (2000?) or (1986?). Instructions on composing Numeric
references for specific formats of item are given below, using the examples referenced under the Harvard
scheme.

References lists and bibliographies

A references list should be provided at the end of the text, listed in numerical order, 1 onwards, to match the
numerical citations in the text. Any source material you wish or are required to refer to, but which is not cited in
the text, should be contained in a separate bibliography after the references list. For a bibliography – as opposed
to references list – the references should follow the alphabetical order of the author’s family names.
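The alphabetical ordering required for a separate bibliography can likewise be sketched. This snippet assumes, purely for illustration, that each entry begins with the author's family name followed by a comma.

```python
def order_bibliography(entries):
    """Order bibliography entries alphabetically by the author's family
    name (taken as the text before the first comma in each entry)."""
    return sorted(entries, key=lambda e: e.split(",")[0].lower())
```

So an entry by "Anand, H." sorts before one by "Vinod, C." regardless of the order in which they were cited.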

1. Book
When referring to a particular part of a book, the relevant page or chapter number(s) can be given at the
end of the reference, as an alternative to recording it in the citation. This also applies to references to
reports and theses.

Author(s) – family name, initials. Title of book. Edition. Place of publication: Publisher, Year, Page or
chapter number(s).

Examples

[1] Vinod, C., Artificial Intelligence. PHI: New Delhi, 2014.

[2] Vinod, C., Object Oriented Programming. 2nd ed. New Delhi: Oxford, pp. 101–165.

[3] Vinod C., (ref. 1, p.25, ref.2, chapter.4)

Abbreviated reference – reference [3] is an example of an abbreviated reference. This method should only
be used where one source is quoted twice and you have cited the source in the text using different
numbers.

2. Work in edited book

Where a work from an edited collection is cited, references to both the individual work and to the
collection as a whole should be given.

Author(s) – family name, initials. Title of chapter. In: Editor(s) – family name, initials, ed(s). Title of book.
Edition. Place of publication: Publisher, Year, Chapter or page number(s).

Examples

[4] Vinod, C. Artificial Intelligence and Machine Learning. In: Redclift, N. and Sinclair, M.T., eds.
Association Rule Mining. New Delhi, India: PHI, 2014, Ch.7.

[5] Anand, H. The Machine Learning Algorithms. In: Vinod, C., ed.1 Artificial Intelligence and Machine
Learning. New Delhi, India: PHI, 2014, pp. 244–262.

3. Edited book

Editor(s) – family name, initials, ed(s). Title of book. Edition. Place of publication: Publisher, Year.

Examples

[6] Vinod, C. and Anand, H., eds.1 Object Oriented Programming in C++. New Delhi, India, 2010.

[7] Vinod, C., ed.1 Informatics. New Delhi: PHI, 2000.

4. Conference paper

Where a paper from conference proceedings is cited, references to both the individual paper and the
proceedings as a whole should be given.

Author(s) – family name, initials. Title of paper. In: Editor(s) – family name, initials, ed(s). Title of
conference, location, date held. Place of publication: Publisher, Year, Page number(s).

Example
[8] Anand, H., Rejimoan and John, P. Performance of Association Algorithms. In: Chandra, V. and
Hareendran A., eds.1 Vertical and Horizontal Rule Mining: proceedings of the second International
Conference (AICCS 14), India, May 29–31 2014. Elsevier Book Series, 2014, pp. 87–98.

5. Conference proceedings

Editor(s) – family name, initials, ed(s). Title of conference, location, date held. Place of publication:
Publisher, Year.

Example

[9] Vinod, C. and Salim, A., eds. Association Rule Based Frequent Pattern Mining in Biological
Sequences: proceedings of the second International Conference (ICI 2013), India, Dec 26–28 2013. IEEE-
ICI, 2013.

6. Report

It is important to be able to identify the body on whose behalf research was carried out. For this reason, if
a research report is part of a series, the title for the series and the volume/number of the report should be
given at the end of the reference.

Author(s) – family name, initials. Title of report. Edition. Place of publication: Publisher, Year, Page
number(s). (Series and vol./no.).

Example

[10] Vinod, C., et al. (2014) Document Generation Using LaTeX. ed.1., Corporate Document Services,
Kerala. (Department of Cyber Research, report no. 136).

7. Academic thesis

Author – family name, initials. Title of thesis. Type of thesis. Institution, Year.

Example

[11] Nadeera Bevi. An Investigation of Cross Compilation. Ph.D. thesis. University of Kerala, 2014.

8. Journal article

This format is used for print journals and for electronic reproductions of print journals.

Author(s) – family name, initials. Title of article. Journal title, volume(issue number), Year, Page
number(s).

Example

[12] Vinod, C. and John, P. Rules for referencing, copyright and fair use. Portal: Libraries and the
Academy, 3(2), 2003, pp. 191–206.

[13] Vinod C., et al. Micro RNA – Study and Review. Computational Biology, 23(2), 2010, pp. 169–176.

As yet, no precise standards have been developed for referencing electronic documents. However, the
Numeric style can be adapted to accommodate these materials, noting the electronic format in square
brackets.

9. Online journal article

Web-based journals only; for online versions of print journals, give a reference to the print format.
Author(s) – family name, initials. Title of article. [Online]. (URL) Title of online journal, volume(issue),
Year. (Date accessed).

Example

[14] Anand, H., Vinod, C. and John, P. Open Source Software for Association Mining: protecting
metadata. [Online]. (URL http://www.mirworks.in./issue36/software/). Saritha, (36), 2013. (Last
Accessed 22 February 2014).

10. Website (excluding online journals)

Include in the reference as much of the following detail that is available from the Web page and related
home page. Where a Website has no identifiable author, and is not the work of an organization, leave out
the author details, beginning the reference with the title of the Web page.

Author(s) – family name, initials. Title of document. [Online]. (URL). Place of publication: Publisher,
Year, month day. (Date accessed).

Examples

[15] Vinod, C. HMM – A Detailed Review. [Online]. (URL http://www.mirworks.in/vinod.html). 2009,


June 21. (Last Accessed 12 February 2014).

Note that the Website for this document contains no publication details, so these are not included in the
reference.

11. CD-ROM

Example

Title of product. [CD-ROM]. City of publication: Publisher, Year.

[16] World development indicators. [CD-ROM]. India: The Central Bank, 2003.

11.8 INTELLECTUAL PROPERTY RIGHTS

The term property is often found associated with physical objects only, such as household goods or land, for
which ownership and associated rights are guaranteed and protected by law prevalent in a country. The most
important feature of property is that the proprietor or owner may use his property as he wishes and that nobody else
can lawfully use his property without his authorization. This property is described as tangible (i.e., perceptible
by touch). Intellectual property (IP), on the other hand, is intangible and includes such properties as “patents”,
“trade secrets”, “copyrights”, “trademarks” and so on. Thus, the intellectual property includes all rights resulting
from intellectual activity in the industrial, scientific, literary and artistic fields. The object of intellectual
property is the creation of human mind or human intellect and hence the term “Intellectual Property”. It is
simply the property created by the application of human mind. It is non-physical (incorporeal) and it derives its
value from idea(s). There is no uniform definition of IP. The right to protect this property prohibits others from
making, copying, using or selling the proprietary subject matter.

Common types of intellectual property rights are copyrights, trademarks, patents, industrial design rights and
trade secrets in some jurisdictions. Two factors significantly influence the value of an object as property. The
first is scarcity, which refers to availability in relation to need: the scarcer a thing is in relation to the demand
for it, the higher its value. The second important factor influencing the value of an object is the knowledge of
its use or uses. The higher the value of an object, the more zealously it is guarded as property.
IP law typically grants the creator of an intellectual work exclusive rights to exploit and benefit from the
creation. These rights, also called monopoly rights of exploitation, are limited in scope, duration and
geographical extent.

General Agreement on Tariffs and Trade (GATT) came into force in 1948 as a global contract between nations.
It was meant to be a temporary arrangement to settle amicably, among countries, disputes regarding the sharing
of world trade. Later, a treaty was adopted by the diplomatic conference under the banner World Intellectual
Property Organization (WIPO) at Geneva on December 20, 1996. WIPO approved the following items under the
intellectual property:

Copyright and Related Rights;


Trademarks, Trade Names and Service Marks;
Trade Secrets;
Geographical Indications;
Industrial Designs;
Patents;
Layout Designs of Integrated Circuits;
Undisclosed Information.

Intellectual property is mainly divided into two categories. Industrial property includes patents for
inventions, trademarks, industrial designs and geographical indications. Copyright covers literary works
(such as novels, poems and plays), films, music, artistic works (e.g., drawings, paintings, photographs and
sculptures) and architectural design. Rights related to copyright include those of performing artists in their
performances, producers of phonograms in their recordings, and broadcasters in their radio and television
programs. Also, protection is granted to related or adjacent rights like the rights of performers (e.g., actors,
singers and musicians), producers of phonograms (sound recordings) and broadcasting organizations (All India
Radio).

Industrial property can be divided into two main areas. The first is the protection of distinctive signs, in
particular trademarks, which distinguish the goods or services of one undertaking from those of other
undertakings, and geographical indications, which identify a good as originating from a place where a given
feature of the good is essentially attributable to its geographical origin. The second area covers industrial
property protected primarily to stimulate innovation, design and the creation of technology; it includes
inventions protected by patents, industrial designs and trade secrets.

11.8.1 Copyright (©)

The best example of copyright is the authored and edited books, or audio and video-cassettes or web content,
etc., which cannot be reproduced without the permission of the person (author, editor or publisher), who holds
the copyright. While patents and trade secrets get the protection for the basic idea, expressed or not expressed in
writing, the copyright is possible only on the expressed material (printed, painted, tape recorded, video recorded
or expressed in any other form). For detailed information on the copyright system in India, visit the Copyright
Office website at http://copyright.gov.in/.

11.8.2 Patents

A patent is a statutory right granted for a limited period to an inventor in respect of an invention to exclude any
other person from manufacturing, using or selling the patented product or from using the patented process,
without due permission. The word patent is derived from the Latin "patere", which means "to lie open". It
denotes the grant of an exclusive privilege of making or selling new inventions. As per the TRIPS Agreement of
the WTO, inventions in all fields of technology are patentable if they meet the criteria of novelty, involve an
inventive step and are capable of industrial application. Patents are one of the oldest forms of intellectual
property protection. The patent system started in the 1700s. The patent system aims at encouraging economic
and technological development by rewarding intellectual creativity. In India, the legislation governing the
patents is governed by The Patent Act, 1970 and The Patents (Second Amendment) Act, 2002.

Patents are of two kinds – product patents and process patents. If the outcome of a new process is a new article
or a better article or a cheap article than that produced by an old method, that process is patentable and is called
the process patent. In other words, a new and alternative method of arriving at the same result is patentable. A
product patent means the grant of a monopoly right to produce that product, which necessarily means preventing
any other person from producing the same product, even by adopting a different or new process, for the period
of patent.

Some of the intellectual property rights are referred to as patents. Industrial design rights are called design
patents; they protect the visual design of objects that are not purely utilitarian. Another type of patent is letters
patent. This is a type of legal instrument in the form of an open letter issued by a government or authority,
granting an office, right, monopoly, title or status to a person or to some entity such as a corporation. Letters
patent can be used for the creation of corporations or government offices, or for the granting of city status or a
coat of arms. General principles governing the Patent System in India and further details can be viewed at
DIP&P website at http://ipindia.nic.in/ipr/patent/patents.htm.

Patent laws

Patent laws refer to the grant of exclusive privilege of making or selling of new inventions. The concept of
patent system emerged in Britain as one minor form of state patronage for promoting discoveries and inventions.
The Industrial revolution necessitated significant changes in the law relating to patents. The history of patent
regime in British India is a history of legislative enactments. The main aim of these enactments was to enable
the English Patent holders to acquire control over Indian markets.

Patent Act, 1970

After independence, the need for a comprehensive law that protects the interests of citizens was felt. So in
1948 the Government of India appointed the Patents Enquiry Committee to review the working of the patent
laws in India. Keeping in view the above objective, the Patent Act, 1970 was enacted, to amend and consolidate
the existing laws.

The Patents (Second Amendment) Act, 2002

This law aims at opening vistas for an open trade with the rest of the world. However, in India, scientists,
professionals and political activists argue that the conditions laid down under GATT and Trade Related
Intellectual Property Rights (TRIPs) would ultimately be detrimental to the interests of India.

Patenting of biological material

Until recently, whole organisms were not patentable. But, for the first time, an oil-eating bacterium
(Pseudomonas) developed by a non-resident Indian scientist (Chakrabarty) was patented in the USA in 1980. The
patent issued for oncomouse was another milestone in patenting of life forms. Since then a number of natural
and genetically modified organisms have been patented. However, patenting life forms have been a subject of
heated debate throughout the world. Still there is controversy on the propriety of awarding patents on biological
materials. The medicinal and pesticidal potentials of neem are well known and the applications of these have
been a part of Indian tradition from time immemorial. No patent law of any country permits patenting of the tree
or any of its parts. Similarly, products from neem (such as Azadiractin) are also not patentable. Only chemically
modified new derivatives from constituents of neem are patentable. However, the patent granted to W.R. Grace
has invited strong protest from India, which led, finally, to the cancellation of the patent.

11.8.3 Layout Design


Layout design (topography) of integrated circuits is a relatively new area in IP, which has appeared with
computer technology and has acquired importance as the technology makes rapid advances. The programming
instructions on a computer chip are implemented through a circuitry printed on semiconductor materials. The
design of circuitry on the chip requires great investment of knowledge, skills and capital and it needs to be
protected as IP.

Undisclosed information gets recognition as a kind of IP that needs to be protected under the Trade Related
Intellectual Property Rights (TRIPS) Agreement. Prior to it, the WIPO treaty (1967) and the Paris Convention
recognized unfair competition as a part of IP. The shift in TRIPS from unfair competition to undisclosed
information has important implications. Unfair competition includes all acts contrary to honest practices in
industrial or commercial matters.

The legal protection afforded to intangible property created by the human mind and protected under
trademarks, patents, copyright, etc. is called an intellectual property right. The principles that govern the Industrial
Design System in India and further details can be viewed at
http://ipindia.nic.in/ipr/design/designs.htm.

11.8.4 Trademark (TM)

Trademarks and service marks are distinctive symbols that help consumers distinguish between competing
goods or services, and they are a major part of the goodwill a company enjoys in the trade. A trade name is the name
of an enterprise, which also individualizes the enterprise in the minds of customers. More information related
to the Trademarks System in India can be viewed at the DIP&P website at
http://ipindia.nic.in/tmr_new/default.htm.

11.8.5 Geographical Indications (GI)

GI is a sign used on goods that have a specific geographical origin and possess qualities or a reputation that is
solely due to the place of origin. Agricultural products, for example, have qualities that derive from the place of
production, influenced by climate and local factors such as soil and water. Such a geographical indication
points to a specific place or region of production that characterizes the product's origin. This place of origin may be a
village, town or country. Further information related to the Geographical Indication System in India can be viewed
at the website of the Geographical Indication Registry at http://ipindia.nic.in/girindia/.

11.9 OPEN-ACCESS INITIATIVES

The Open-Access (OA) movement is a social movement in the academic world, dedicated to the principle of open
access (information-sharing for the common good). Open access refers to published information
that is electronically accessible without any financial constraints. The open-access movement traces its history
back to the 1960s, but became much more prominent in the 1990s with the advent of the digital age. With the
spread of the Internet and the ability to copy and distribute electronic data at no cost, the arguments for open
access gained new importance. Open access has since become the subject of much discussion among
researchers, academics, librarians, university administrators, funding agencies, government officials, commercial
publishers and learned-society publishers. Open access helps scientists establish collaborations with
fellow scientists and share scientific information rapidly. It increases the usage and citation
of research publications. Rapid dissemination of research outcomes and their worldwide recognition
are very significant for a scientist, and the concept of open access largely satisfies these
requirements.

Open-access publishing has made it possible for everyone in the world to share knowledge freely and openly.
Public access to the World Wide Web became widespread in the late 1990s and early 2000s. The low-cost
distribution technology has fuelled the OA movement.
There are two main methods for providing open access:

1. In OA self-archiving (also known as the “green” road to OA), authors publish in a subscription journal,
but in addition make their articles freely accessible online, usually by depositing them in either an
institutional repository (such as the Okayama University Digital Information Repository) or a central
repository (such as PubMed Central).
2. In OA publishing (also known as the “gold” road to OA), authors publish in open-access journals that
make their articles freely accessible online immediately upon publication. Examples of OA publishers are
BioMed Central and the Public Library of Science.

The concept of open access has gained wide acceptance in academia. But there is considerable debate
about the economics of funding peer review in OA publishing, and about the reliability and economic effects of
self-archiving.

11.9.1 Open-access Publishing

An open-access publication is one that provides immediate, free online access to all users worldwide.
Open-access publishing has made it possible for everyone in the world to share knowledge freely and openly.
The need for this type of access has been justified by the high cost of other scholarly publications.

Open-access journals

Open-access journals are scholarly journals that are available online to the reader “without financial, legal or
technical barriers other than those inseparable from gaining access to the Internet itself”. At present, over 4000
journal publications fit this definition. Some are subsidized, and some require payment on behalf of the author.
Subsidized journals are financed by an academic institution or a government information centre; those requiring
payment are typically financed by money made available to researchers for the purpose from a public or private
funding agency, as part of a research grant.

Advantages of open-access publication:

Can achieve increased dissemination of information
Articles can be cited sooner and more frequently
Institutional costs for scholarly publishing are decreased

11.10 PLAGIARISM

Plagiarism is the use of someone else’s ideas, results, equipment design, visuals, wording or even sentence
structure as if they were your own. The word is derived from the Latin plagiarius (“kidnapper”). To plagiarize
means “to commit literary theft” and to present as new and original an idea or product derived from an existing
source. Plagiarism can be intentional when:

1. You use someone’s ideas or results without citing the source;
2. You copy something word for word without using quotation marks, even though you cite the source;
3. You use all or part of a visual without crediting the source.

Or, it can be accidental when:

4. You don’t realize what is considered plagiarism in a particular country;
5. You cannot think of a better way to say it and so copy sentences, phrases or even sentence structure from
the original without using quotation marks;
6. You unknowingly reproduce the words of others without enclosing them in quotation marks.
Plagiarism is sometimes a moral and ethical offence rather than a legal one, since some instances of
plagiarism fall outside the scope of copyright infringement, which is a legal offence. The most obvious form of
plagiarism is to obtain and submit as your own a paper written by someone else. Less conspicuous forms of
plagiarism include the failure to give appropriate acknowledgement when repeating or paraphrasing another’s
wording. Appropriate crediting of the original sources of information can, to a certain extent, avoid inviting
charges of plagiarism.

Copyright violation is punishable by law. All original work, such as textual matter, diagrams and designs,
multimedia, video clips, etc., is copyrighted. The use of somebody else’s material for your own economic
benefit is copyright violation. Most publications are copyrighted, whether or not they display the copyright
symbol (©). However, many documents and publications from governments and international agencies such as
the United Nations are not copyrighted.

There are various software tools for finding plagiarism in documents. One of the major free tools used for
plagiarism checking is Viper (http://www.scanmyessay.com/). Let us look at how it works. First, we select the
category of the input that we are going to test; the category can vary from English literature to
applied engineering to cooking. After selecting the category, we provide the corresponding text as
input. The software takes five consecutive words at a time and checks them across the various domains linked with the particular
category. If it finds a match, it reports it.

The result can include a 10–13% buffer, since many common words appear in all streams. The
results returned include the correct source page and the matching lines. One basic disadvantage is that it may
find a match with text that comes from an entirely different context. Still, a primary plagiarism
check is very necessary when producing technical writing. There are also commercial tools, priced by the
number of words/documents to be scanned, such as Copyscape (http://www.copyscape.com/),
iThenticate (http://www.ithenticate.com/), Turnitin (http://turnitin.com/), PlagTracker
(http://www.plagtracker.com/), Ephorus (https://www.ephorus.com/), etc. Some freely available plagiarism
checkers are Chimpsky (http://chimpsky.uwaterloo.ca/), CopyTracker
(http://copytracker.software.informer.com/), eTBLAST (http://etest.vbi.vt.edu/etblast3/), Plagium
(http://www.plagium.com/), SeeSources (http://www.plagscan.com/seesources/), etc. These tools have
some limitations on the input document size.

Fig. 11.9 Viper software


Figure 11.9 shows a basic screen shot of the software Viper.

EXERCISES

1. What are the various types of reports? Why are many types of reports required?
2. How is a business report prepared?
3. What is the purpose of a business report?
4. Prepare a business report for a chemical industry that mainly focuses on preparing wine glasses.
5. What are the differences between conclusions and recommendations?
6. What is a technical report? Who are the main stakeholders of a technical report?
7. Write a technical report for an examination system for PG courses in an autonomous college.
8. What is the role of supplementary materials in a research report?
9. What is technical writing? Why is it important in research?
10. What are the goals of technical writing?
11. List out the qualities of good technical writing.
12. What are the various types of graphs? Explain how each one differs in the description of data.
13. Discuss the structure of an oral presentation.
14. What is the importance of oral presentation?
15. How do audio-video aids help in an oral presentation?
16. Discuss the format of an oral presentation.
17. What are the advantages of oral communication?
18. How can we make an effective oral presentation?
19. Discuss the differences between a bibliography and references.
20. What are the various styles adopted for the bibliography?
21. What do you mean by citation?
22. What is the Vancouver style of referencing?
23. How will you organize the references?
24. What do you mean by copyright?
25. What is a trademark?
26. What are IPRs? How can we file an IPR?
27. How will you define a patent? Discuss the issues in patenting.
28. What are the different types of patenting?
29. How will you patent your work/invention?
30. Can you patent a discovery? Is it possible to patent a biological material?
31. Discuss the patent laws in India.
32. What are geographical indications? Do they come under IPR?
33. List out some open-access journals in science.
34. What is plagiarism? How does it affect a researcher?
35. How can we resolve plagiarism in our article?
36. List out some free plagiarism-checking software.
chapter 12
LATEX-DOCUMENT GENERATION TOOL
Objectives:

After completing this chapter, you can understand the following:

The definition of document generation tools
The explanation of getting started with LaTeX
The description of how LaTeX works
The description of document creation
The definition of tables and figures
The descriptions of math mode, algorithm mode, bibliographic references and preparation of presentations
The definition of templates

The success of a research effort depends greatly on the manner in which it is presented. The presentation is as
important as the content in getting your message to the audience. The paper or report that we publish needs to be
tidy and orderly and should also have a consistent format. Format inconsistency or poor text
orientation will make the reader unenthusiastic. In this chapter, we discuss a tool, LaTeX, which helps in
creating a well-formed and structured document.

LaTeX is a typesetting tool that produces professionally typeset pages; in short, it is a document preparation
system for producing professional-looking documents. It is built on TeX, the typesetting system designed by
Donald Knuth in 1978; LaTeX itself was originally written by Leslie Lamport in the early 1980s. LaTeX is
particularly suited to producing long, structured documents with equations and symbols.

12.1 DOCUMENT GENERATION TOOLS

There are various other document generation tools, such as Microsoft Word, EMC2 expression and Adobe InDesign,
of which Word remains the most widely used basic document generation tool. To replace such a
widely accepted tool, LaTeX must have some advantages over it.

Table 12.1 clearly shows the advantages of LaTeX over Microsoft Word. In LaTeX, we simply code our document
and compile it, as we would a program in any programming language, and finally run it to generate our document as output.
Figure 12.1 shows a complexity-effort graph comparing Word and LaTeX.

Table 12.1 Comparisons between LaTeX and Word


Fig. 12.1 Complexity comparison graph

A LaTeX document is a plain text file with a .tex file extension. It can be typed in a simple text editor or a
dedicated LaTeX editor such as Texmaker or TeXworks. When we have finished typing the document, we compile it and convert it into
another format. Several different output formats are available, but the most commonly used is Portable
Document Format (PDF), which can then be printed and transferred between computers.

12.2 GETTING STARTED

We have already mentioned that LaTeX is available for free. So let us begin with how to get a LaTeX editor and
platform. The basic element needed to begin document generation is a base platform. A
commonly used platform is MiKTeX, which can be obtained for free from http://www.miktex.org/.
After installing the platform, we need an editor; one of the commonly used editors is TeXnicCenter, which
is available for free at http://www.texniccenter.org/. After installing TeXnicCenter and MiKTeX, you
are ready to generate a document. For more advanced use, you can also install a PostScript converter and a
previewer for advanced viewing of the output. Figure 12.2 shows the basic installation steps for LaTeX.
Fig. 12.2 LaTeX installation steps

The TeXnicCenter editor has the following layout (Fig. 12.3). It basically has four parts: a menu ribbon, a
navigator, an editor and a log section. The menu ribbon has options to open a new file, search for text
and insert mathematical equations. It also has the options to compile and run the document. The second
part is the navigator. It shows where a particular file is stored, the history of recently opened
documents, the places where images are stored, etc. The editor is the space where we do all typing; it can also be
called the document coding area. All text, images, tables, etc. are coded in this area. Finally, the log section
shows all the warnings and errors raised during compilation. It displays each result with a line number and suggests
why the particular error has occurred. This is similar to the way errors are displayed when you
compile Java or C++ source code.
Fig. 12.3 Texnic center editor

12.3 HOW LATEX WORKS

After writing the required code, you need to save the document. The extension of the file will be .tex, which
identifies a LaTeX source file. For creating a bibliography, we have the .bib file. We can also include various
class files, which have the extension .cls. Other types of files are created automatically when the file is
compiled and run. Figure 12.4 shows the entire mechanism of TeXing.

Fig. 12.4 Texing mechanism

The traditional way is to run the LaTeX program, which creates a DVI (Device Independent) file. This file is in
binary format and cannot be viewed directly. You then run a previewing program for viewing it on screen and/or the dvips
program to create a PostScript file for viewing or for printing via GSview. Alternatively, you can run the
more recent pdfLaTeX program to create a PDF file for viewing or printing. Figure 12.5 shows the different
ways in which output is generated in LaTeX.
Fig. 12.5 Output generation

12.4 DOCUMENT CREATION

Document creation starts with the \documentclass command. The general format of \documentclass is given as
follows:

\documentclass[options]{class}

The text in the curly brackets specifies the document class. The document class can be article, which is
suitable for shorter documents such as journal articles and short reports; report, for longer documents with
chapters; or classes for conference proceedings and slides. We can also use the document class book when preparing a
textbook. The text in the square brackets specifies options such as font size, paper size and so on.
Figure 12.6 shows the fragments of a basic document in LaTeX. A sample document class declaration is shown as
follows.

\documentclass[a4paper, 12pt]{article}

This says that we are going to create an article on A4-size paper with a 12-point text font. Other options can
also be provided in the square brackets. To conclude, the \documentclass command appears at the start of every
LaTeX document, and its arguments depend on the type of document we are going to generate.

The entire text of our document lies between the \begin{document} and \end{document} commands. Anything typed
before \begin{document} is known as the preamble and affects the whole document; the \documentclass[]{}
command is part of the preamble. Anything typed after \end{document} is ignored.

Consider the following code:

\documentclass[a4paper,12pt]{report}

\begin{document}

This is the first line

\end{document}

This will generate a report on A4 paper with a 12-point font, containing the text “This is the first line.” (The
standard classes accept only 10pt, 11pt and 12pt as font-size options, which is why 12pt is used here.)
Fig. 12.6 Writing a document in LaTeX

12.4.1 Page Setup, Page Numbering and Headings

LaTeX automatically sets reasonable values for the page dimensions, orientation, etc. However, in some cases
customization may be required. There are two ways to do this: the easy way, using packages that do all
the work for you, and the hard way, setting each length yourself. For example, to set the margins the easy way,
using the geometry package, use

\usepackage[margin = 2.5cm]{geometry}

The default paper setting of LaTeX is shown in Fig. 12.7.

Fig. 12.7 Default paper setting of LaTeX

We can also set the values manually, but it is a hard task. We may need to set topmargin, bottom margin,
paragraph skip, etc. It can be done like the following:

\setlength{\topmargin}{0in}

\setlength{\headheight}{0in}

\setlength{\headsep}{0in}
\setlength{\textheight}{7.7in}

\setlength{\textwidth}{6.5in}

\setlength{\oddsidemargin}{0in}

\setlength{\evensidemargin}{0in}

\setlength{\parindent}{0.25in}

\setlength{\parskip}{0.25in}

The command \pagestyle controls page numbering and headings. It should always go between the
\documentclass{article} and the \begin{document} commands. It can take the following forms:

1. \pagestyle{plain} is the default, which puts the page number at the centre of the bottom of the page and
provides no headings.
2. \pagestyle{empty} provides neither page numbers nor headings.
3. \pagestyle{headings} will provide page numbers and headings from any \section’s that you are using.
4. \pagestyle{myheadings} will provide page numbers and custom headings.

These commands can also be applied to a single page using \thispagestyle instead of \pagestyle.
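As a minimal sketch, the page-style commands above can be combined in one document; the section title and body text here are placeholders:

```latex
\documentclass{article}
\pagestyle{headings}       % page numbers plus running section headings
\begin{document}
\section{Introduction}     % this title feeds the running heading
Some placeholder text.
\thispagestyle{empty}      % suppress number and heading on this page only
\end{document}
```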

12.4.2 Creating a Title Page

The title, author and date of your document are pieces of information that various LaTeX commands can make use of, if
you provide them. It is a good habit to provide this information in the preamble of your document. The
commands are as follows:

1. \title{yourtitlehere}
2. \author{yournamehere}
3. \date{currentdate}

To create the title, place a \maketitle command immediately after the \begin{document} command.
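Putting these commands together, a minimal title-page sketch (with placeholder title and author) looks like this:

```latex
\documentclass{article}

\title{A Sample Research Report}  % placeholder title
\author{A. N. Author}             % placeholder author
\date{\today}                     % prints the current date

\begin{document}
\maketitle                        % typesets the title block here
\end{document}
```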

12.4.3 Sections

LaTeX is a language for creating structured documents. One of the most important ways of creating structure in
a document is to split it into logical sections. If your document deals with more than one concept or theme, then
each concept should go into its own section. There are two related commands for creating sections:
\section{sectiontitle} and \section*{sectiontitle}. The first one numbers the sections, while the starred form does
not. Both create separate sections with titles in a larger font size. Its general format is as follows:

\section{In This First Section}

This is the first section.

\subsection{We Have This First Subsection}

This is the first subsection.

\subsubsection{And This Subsubsection}

A subsubsection.
Example:

\section{The Main section}

this is the main section of this chapter

\subsection{Subsection}

the first subsection begins here

\subsubsection{Subsubsection}

this is a section linked to the subsection

Coding similar to this will produce an output as in Fig. 12.8.

Fig. 12.8 Section output

To start a new paragraph, leave a blank line in the source or use the \par command.

12.4.4 Font Size and Formatting

There are LaTeX commands for a variety of font effects:

\textit{words in italics} – words in italics

\textsl{words slanted} – words slanted

\textsc{words in smallcaps} – words in smallcaps

\textbf{words in bold} – words in bold

\texttt{words in teletype} – words in teletype

\textsf{sans serif words} – sans serif words

\textrm{roman words} – roman words

\underline{underlined words} – underlined words
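As a quick sketch, these font commands can also be nested to combine effects:

```latex
\textbf{\textit{bold italic words}}      % bold and italic combined
\underline{\texttt{underlined teletype}} % underline around teletype text
```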

Different font size can be represented as shown in Fig. 12.9.


Fig. 12.9 Font size

We have the flushleft and flushright environments to align text to the left and right. They can be used as follows.

\begin{flushleft}

\textbf{Left alignment}

\end{flushleft}

\begin{flushright}

\textbf{Right alignment}

\end{flushright}

In between our text, there may be situations where we need to give quotes. We can produce quotes using the
quotation environment,
\begin{quotation}

Text here

\end{quotation}

An example of quotation may look like the following:

\begin{quotation}

\textit{A central task of developmental biology is to discover}

\end{quotation}

The corresponding output is shown in Fig. 12.10.

Fig. 12.10 Quotation output

12.4.5 List – Enumerate

LaTeX distinguishes between three different enumeration/itemization environments. Each of them provides four
levels, which means you can have nested lists of up to four levels. We use the following command for
enumeration.

\begin{enumerate}

\item This is the first item

\end{enumerate}

The enumerate environment is used to create numbered lists. If you would like to change the appearance of the
enumerator, the simplest way is to use the enumerate package, which gives you the possibility of optionally
choosing an enumerator.

\usepackage{enumerate}

\begin{enumerate}[I]

\item First line

\end{enumerate}

This provides the Roman numbering scheme, I, II, III, ….

\begin{enumerate}[(a)]

\item First line

\end{enumerate}

This provides the alphabetic numbering a, b, c, ….

An example of enumerate is shown below:

\begin{enumerate}

\item Identification of pre-miRNAs from a human genome

\item Identification of mature microRNAs from pre-miRNAs

\item Identification of microRNA targets from a miRNA:mRNA pair.

\end{enumerate}

This generates the output as shown in Fig. 12.11.

Fig. 12.11 Output

12.4.6 List – Itemize

\begin{itemize}
\item

\end{itemize}

Itemization is probably the most commonly used list in LaTeX. It also provides four levels. The bullets can be
changed for each level, in the same way as explained for enumeration.

An example of combining both enumeration and itemize is given below.

\begin{itemize}

\item First level, itemize, first item

\begin{itemize}

\item Second level, itemize, first item

\item Second level, itemize, second item

\begin{enumerate}

\item Third level, enumerate, first item

\item Third level, enumerate, second item

\end{enumerate}

\end{itemize}

\item First level, itemize, second item

\end{itemize}

This generates the output as shown in Fig. 12.12.

Fig. 12.12 Output

12.4.7 Comments

Comments are created using %. When LaTeX encounters a % character while processing a .tex file, it ignores
the rest of the line. This can be used to write notes in the input file, which will not show up in the printed
version.

The following code:

It is a truth universally acknowledged, % Note comic irony in the very first sentence
that a single man in possession of a good fortune, must be in want of a wife.
Produces:

It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a
wife

Multiple consecutive spaces in LaTeX are treated as a single space. Several empty lines are treated as one empty
line. The main function of an empty line in LaTeX is to start a new paragraph. In general, LaTeX ignores blank
lines and other empty space in the .tex file. Two backslashes (\\) can be used to start a new line. We can use the
command \newpage to print the text in a fresh page. Also \linebreak is a command used to break printing in the
current line.
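The spacing and line-break rules above can be sketched in a single fragment (the sentences are placeholders):

```latex
% This whole line is a comment and produces no output.
First paragraph.    These     extra    spaces collapse to single spaces.

The blank line above starts this second paragraph.\\
This text begins on a new line of the same paragraph.
\newpage
This text begins on a fresh page.
```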

12.4.8 Special Characters

The following symbols are reserved characters, which have a special meaning in LaTeX:

$%^&_{}~\

All of these apart from the backslash \ can be inserted as characters in your document by adding a prefix
backslash:

\$ \% \^{} \& \_ \{ \} \~{}

Note that you need to type a pair of curly brackets {} after the hat \^ and the tilde \~, otherwise these will appear as
accents over the following character.

For example, “\^e” produces “ê”

The above code will produce:

$%^&_{}~

The backslash character \ cannot be entered by adding a prefix backslash, \\, as this is used for line breaking. We
can use the \textbackslash command instead.
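A short sketch mixing escaped special characters with ordinary text (the folder name is a placeholder):

```latex
Profit rose by 10\% \& costs fell below \$500.   % escaped %, & and $
The folder is C:\textbackslash{}temp.            % backslash via \textbackslash
Write x\^{}2 to print a circumflex in text mode. % \^{} gives the accent glyph
```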

12.5 TABLES

The tabular environment is used to typeset tables. By default, LaTeX tables are drawn without horizontal and
vertical lines; you need to specify if you want lines drawn. LaTeX determines the width of the columns
automatically.

The following code starts a table:

\begin{tabular}{…},

where the dots between the curly brackets are replaced by code defining the columns:

l for a column of left-aligned text (letter el, not number one).


r for a column of right-aligned text.
c for a column of centre-aligned text.
| for a vertical line.

For example, lll (i.e., left left left) will produce 3 columns of left-aligned text with no vertical lines, while
|l|l|r| (i.e., |left|left|right|) will produce 3 columns where the first 2 are left-aligned, the third is right-aligned, and
there are vertical lines around each column. The table data follows the \begin command. & is placed between
columns and \\ is placed at the end of a row (to start a new one). We use \hline to insert a horizontal line and
\cline{1-2} to insert a partial horizontal line spanning columns 1 and 2. The command \end{tabular}
finishes the table.

Example code,

\begin{tabular}{|r|c|l|}

\hline

Right & Center & Left\\

\hline

alpha&beta&gamma\\

delta&epsilon&zeta\\

eta&theta&iota\\

\hline

\end{tabular}

This generates the output as shown in Fig. 12.13.

Fig. 12.13 Table output

Examples of various tabular codes and its resulting outputs:

\begin{tabular}{|l|l|}

Apples & Green \\

Strawberries & Red \\

Oranges & Orange \\

\end{tabular}

This generates the output as shown in Fig. 12.14.

Fig. 12.14 Table output

\begin{tabular}{rc}
Apples & Green \\

\hline

Strawberries & Red \\

\cline{1-1}

Oranges & Orange \\

\end{tabular}

This generates the output as shown in Fig. 12.15.

Fig. 12.15 Table output

\begin{tabular}{|r|l|}

\hline

8 & here’s \\

\cline{2-2}

86 & stuff \\

\hline \hline

2008 & now \\

\hline

\end{tabular}

This generates the output similar to Fig. 12.16.

Fig. 12.16 Output

12.6 FIGURES

LaTeX also allows embedding of images to the text. To insert an image, we require the graphicx package.
Images should be PDF, PNG, JPEG or GIF files. The following code will insert an image called myimage:

\begin{figure}
\centering

\includegraphics[width = 1\textwidth]{myimage}

\caption{Here is my image}

\label{image-myimage}

\end{figure}

Captioning and labelling work the same way for tables. \centering centres the image on the page; if it is not used,
images are left-aligned by default. It is a good idea to use it, as the figure captions are centred. \includegraphics is
the command that actually puts the image in your document. The image file should be saved in the same folder
as the .tex file. [width = 1\textwidth] is an optional argument that specifies the width of the picture – in this case,
the same width as the text. The width could also be given in centimetres (cm). You could also use [scale = 0.5],
which scales the image by the desired factor – in this case reducing it by half. You can also specify the size with
respect to the width of a line in the local environment (\linewidth), the width of the text on a page (\textwidth) or
the height of the text on a page (\textheight) using the following code.

\includegraphics[width = \linewidth]{ myimage }

\includegraphics[width = \textwidth]{ myimage }

\includegraphics[height = \textheight]{ myimage }

To rotate the image, we can use the keyword angle and can specify the angle to which it needs to be rotated.

\includegraphics[scale = 0.5, angle = 180]{myimage}

To crop the image, we have the keyword trim. Its option parameter is in the order: left, bottom, right and top.
The sample code will look like,

\includegraphics[trim = 10mm 80mm 20mm 5mm, width = 3cm]{myimage}

It is also possible to have LaTeX create a border around your image by using \fbox. \fbox can be used in similar
way to have border around text also.

\setlength\fboxsep{0pt}

\setlength\fboxrule{0.5pt}

\fbox{\includegraphics{myimage}}

For both figure and table, we can have some auxiliary options. The details of the option are given in Table 12.2.

Table 12.2 Options used in figures and tables


This is how the options are used,

\begin{table}[ht] %table is placed on top of the current page

\caption{Nonlinear model results} % title of Table

\centering % used for centring the table

\begin{tabular}{c c c c} % 4 centred columns

\hline\hline %inserts double horizontal lines

Case & Method\#1 & Method\#2 & Method\#3 \\ [0.5ex] % inserts table

%heading

\hline % inserts single horizontal line

1 & 50 & 837 & 970 \\ % inserting body of the table

2 & 47 & 877 & 230 \\

3 & 31 & 25 & 415 \\

4 & 35 & 144 & 2356 \\

5 & 45 & 300 & 556 \\ [1ex] % [1ex] adds vertical space

\hline %inserts single line

\end{tabular}

\label{table:nonlin} % is used to refer this table in the text

\end{table}

The corresponding output produced is shown in Fig. 12.17.


Fig. 12.17 Table output

While creating a report, we need to have an index of tables and figures. In normal word-processing software,
generating these lists is a tedious task. But LaTeX does it all on the fly with the single commands \listoffigures and
\listoftables, which automatically build the index pages of figures and tables. To generate the entire
table of contents, we have a similar command called \tableofcontents.
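A report skeleton using these commands might be sketched as follows (chapter title and body are placeholders):

```latex
\documentclass{report}
\begin{document}
\tableofcontents  % builds the contents page automatically
\listoffigures    % index of all captioned figures
\listoftables     % index of all captioned tables
\chapter{Introduction}
Body text of the first chapter.
\end{document}
```

Note that LaTeX typically needs two compilation runs before the contents lists appear, since the entries are collected in an auxiliary file during the first run.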

12.7 MATH MODE

The LaTeX editor has a lot of inbuilt mathematics functions. The basic layout of the maths environment is
shown in Fig. 12.18.

Fig. 12.18 Mathematics environment in LaTeX editor

One of the main reasons for writing documents in LaTeX is that it is really good at typesetting equations. Equations
are written in “math mode”. You can enter math mode with an opening and closing dollar sign $. This can be
used to write mathematical symbols within a sentence; for example, typing $1 + 2 = 3$ produces 1 + 2 = 3.

If you want a displayed equation on its own line, then use $$…$$. For example, $$1 + 2 = 3$$ produces 1 + 2 = 3 displayed on its own line.

For a numbered displayed equation, use \begin{equation}…\end{equation}.

For example,

\begin{equation}1+2 = 3\end{equation}

produces:

Although some basic mathematical symbols (+ – = /( )) can be accessed directly from the keyboard, most of
them must be inserted using a command.

Powers are inserted using the hat ^ symbol. For example, $n^2$ produces n².

Indices are inserted using an underscore _. For example, $2_a$ produces 2 with the subscript a. If the power or index includes more
than one character, group the characters using curly brackets {…}; e.g., $b_{a-2}$ produces b with the subscript a−2.

Fractions are inserted using \frac{numerator}{denominator}.


$$\frac{a}{3}$$

produces:

We can also produce nested fractions like,

$$\frac{y}{\frac{3}{x}+b}$$

produces the output as,

Square root symbols are inserted using \sqrt{…} where … is replaced by the square root content.

$$\sqrt{y^2}$$

produces:

The command \sum inserts a sum symbol; \int inserts an integral. For both functions, the upper limit is specified
by a hat ^ and the lower by an underscore _.

$$\sum_{x = 1}^5 y$$

produces:

$$\int_a^b f(x)$$

produces:

Greek letters can be typed in math mode using the name of the letter preceded by a backslash.

For example,

$\alpha$ = α $\beta$ = β
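Putting these pieces together, a sketch of a single displayed formula combining a power, a subscript, a fraction, a square root, a sum, and Greek letters (the variance and standard deviation formula is used here purely as an illustration) could be:

```latex
$$\sigma^2 = \frac{1}{n}\sum_{i = 1}^{n}(x_i - \mu)^2,
  \qquad \sigma = \sqrt{\sigma^2}$$
```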

Some examples of mathematical mode are shown in Figs. 12.19 and 12.20.
Fig. 12.19 Maths mode 1

Fig. 12.20 Maths mode 2

If you want to have a series of equations or inequalities aligned together, you can surround the equations by
\begin{eqnarray*} and \end{eqnarray*}

\begin{eqnarray*}

1+2+\ldots+n & = & \frac{1}{2}((1+2+\ldots+n)+(n+\ldots+2+1))\\

& = & \frac{1}{2}\underbrace{(n+1)+(n+1)+\ldots+(n+1)}_{\mbox{$n$ copies}}\\

& = & \frac{n(n+1)}{2}\\

\end{eqnarray*}

produces the output shown in Fig. 12.21.

Fig. 12.21 Output


\[f(x) = \left\{\begin{array}{rll}

-1 & \mbox{if} & x < 0; \\

0 & \mbox{if} & x = 0; \\

1 & \mbox{if} & x > 0; \\

\end{array}\right.

\]

Produces (Fig. 12.22)

Fig. 12.22 Output

The continuation dots … are known as an ellipsis. They occur frequently enough in mathematics for LaTeX to
have four commands to typeset them with the right spacing. They are as follows.

\dots for 3 continuous dots.


\cdots for centre height dots.
\ddots for diagonal dots, which occur in matrices.
\ldots for lower height dots.
\vdots for vertical dots.
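As an illustrative sketch, the four ellipsis commands are typically used as follows; the matrix example uses the array environment introduced above:

```latex
$$x_1, x_2, \ldots, x_n \qquad x_1 + x_2 + \cdots + x_n$$

$$A = \left(\begin{array}{ccc}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{array}\right)$$
```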

We have provided almost all special characters used and Greek letters used in LaTeX at the end of this chapter.

12.8 ALGORITHM MODE

Algorithm mode is also very important, particularly for computer science students: many research
papers and technical reports include algorithms. Now let us see how the algorithm
environment works. To use it, we should include the following packages:

\usepackage{algorithm}

\usepackage{algorithmic}

Like all other constructs, an algorithm starts with a \begin and ends with an \end. Consider the following
code:

\begin{algorithm}

\caption{Calculate $y = x^n$}

\label{alg1}

\begin{algorithmic}

\REQUIRE $n \geq 0 \vee x \neq 0$

\ENSURE $y = x^n$
\STATE $y \Leftarrow 1$

\IF{$n < 0$}

\STATE $X \Leftarrow 1 / x$

\STATE $N \Leftarrow -n$

\ELSE

\STATE $X \Leftarrow x$

\STATE $N \Leftarrow n$

\ENDIF

\WHILE{$N \neq 0$}

\IF{$N$ is even}

\STATE $X \Leftarrow X \times X$

\STATE $N \Leftarrow N / 2$

\ELSE[$N$ is odd]

\STATE $y \Leftarrow y \times X$

\STATE $N \Leftarrow N - 1$

\ENDIF

\ENDWHILE\end{algorithmic}

\end{algorithm}

This will produce the output as shown in Fig. 12.23.


Fig. 12.23 Algorithm output 1

Students interested in inserting procedures in the text can work in a similar manner. An
example of a slowsort procedure is given below.

\usepackage{algorithmic}

{\bf procedure} $SlowSort(A,i,j)$

\begin{algorithmic}[1]

\IF{$i\geq j$}

\STATE Return

\ENDIF

\STATE $m\gets \lfloor (i+j)/2 \rfloor$

\STATE $SlowSort(A,i,m)$

\STATE $SlowSort(A,m+1,j)$

\IF{$A[m]>A[j]$}

\STATE exchange $A[j],A[m]$

\ENDIF

\STATE $SlowSort(A,i,j-1)$

\end{algorithmic}

This generates the output as shown in Fig. 12.24.

Fig. 12.24 Algorithm output 2

12.9 BIBLIOGRAPHIC REFERENCES

The easiest way to cite references in your document is with the author’s name and year of publication in
parentheses (Lamport, 1994). This is actually the preferred method in many technical publications. You can
make a numbered list of references with the enumerate environment. If you choose this method, you should use
the Harvard system for formatting your list of references:

1. Last1, First1 Middle1, and First2 Middle2 Last2 (year). Book Title. ed. Publisher, City.
2. Last1, First1 Middle1, and First2 Middle2 Last2 (year). Article Title. Journal Name, vol. X, no. Y,
page–page.

It is convenient to put all your references together in a separate file. Here eight references are placed in a file
called tref.tex. The bibliography file begins with the line:

\begin{thebibliography}{77}

where the 77 has the same width as the longest number in your reference list. Each item in the bibliography begins
with \bibitem{label}, where the label is used to cite the reference in your text. The text following the
\bibitem line is the text of your reference. A suggested reference style is shown in this section. The line
\end{thebibliography} ends the bibliography. To actually include the references in your document, put the
command \input{tref} where you want your references to appear (usually at the end of the report). References
are automatically numbered.
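A minimal sketch of such a tref.tex file is given below; the entries reuse the Lamport citation from above and the article data from the BibTeX example in the next section, and the labels lamport94 and chandra14 are illustrative:

```latex
\begin{thebibliography}{77}

\bibitem{lamport94}
Lamport, Leslie (1994). \emph{LaTeX: A Document Preparation System}.
2nd ed. Addison-Wesley, Reading.

\bibitem{chandra14}
Vinodchandra, S. S., and H. S. Anand (2014). Vertical and Horizontal
Rule Mining Algorithm. \emph{Elsevier Book Series}, vol. 1, pp. 123--128.

\end{thebibliography}
```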

We can also create a separate bibliography using a BibTeX file to store all our references. This is actually
easier than manual bibliography creation (Fig. 12.25).

Fig. 12.25 BibTeX file inclusion

12.9.1 BibTeX File

The BibTeX file contains all the references you want to cite in your document. It has the file extension .bib. It should
be given the same name as, and kept in the same folder as, your .tex file. The .bib file is plain text – it can be
edited using Notepad or your LaTeX editor (e.g., TeXworks, Texmaker, TeXnicCenter). You should enter each of
your references in the BibTeX file in the following format:

@article{

AISSCE,

Author = {Vinodchandra, S. S. and Anand, H. S.},

Title = {Vertical and Horizontal Rule Mining Algorithm},

Journal = {Elsevier Book Series},

Volume = {1},

Pages = {123-128},

Year = {2014} }

Each reference starts with the reference type, introduced by @. The reference type declaration is followed by a
curly bracket and then the citation key. Each reference’s citation key must be unique. Likewise, we can add any
number of references. Then we add the two lines of code below at the place where we want our bibliography to appear.

\bibliographystyle{plain}

\bibliography{Doc1}

where Doc1 is the name of the bib file that we created.

Type \cite{citationkey} at the place where you want to cite your reference in your .tex document. If you do not
want an in-text citation, but still want the reference to appear in the bibliography, use \nocite{citationkey}. To
include a page number in your in-text citation, put it in square brackets before the citation key: \cite[p. 215]
{citationkey}. To cite multiple references, include all the citation keys within the curly brackets separated by
commas: \cite{citation01, citation02, citation03}. In our example, the citation key is AISSCE.
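For example, the citation commands in the body of a .tex document look as follows; AISSCE is the citation key from the BibTeX example above, and the other keys are illustrative:

```latex
Association rule mining has been studied extensively \cite{AISSCE}.
A detailed proof appears elsewhere \cite[p. 215]{AISSCE}.
Several surveys cover this area \cite{citation01, citation02}.
\nocite{citation03} % listed in the bibliography, but not cited in text

\bibliographystyle{plain}
\bibliography{Doc1}
```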

In the above example, we have used the code \bibliographystyle{plain}. The plain style produces numeric
in-text citations. There are several other styles as well.

Plain:

The citation is a number in square brackets (e.g., [1]). The bibliography is ordered alphabetically by first
author surname. All of the authors’ names are written in full.

Abbrv:

The same as plain except the authors’ first names are abbreviated to an initial.

Unsrt:

The same as plain except the references in the bibliography appear in the order that the citations appear in
the document.

Alpha:

The same as plain except the citation is an alphanumeric abbreviation based on the author(s) surname(s)
and year of publication, surrounded by square brackets (e.g., [Cha14]).

If you still want to use a different style, such as one provided by the journal you are submitting an article to,
you should save the style file (.bst file) in the same folder as your .tex and .bib files, and include the name of the
.bst file in the \bibliographystyle{…} command.

12.10 PREPARATION OF PRESENTATION

LaTeX is not just used for making reports and articles; it can also be used for creating slides for presentations.
There is a \documentclass named slides, using which we can create a presentation. Its usage is
shown below.

\documentclass{slides}

Each slide starts with \begin{slide} and ends with \end{slide}. Anything written in between appears as the first
slide. For the next slide, again use \begin{slide} … \end{slide}. All of these keywords are declared
inside \begin{document} and \end{document}.

Beamer is another LaTeX document class for creating presentation slides. It supports both pdfLaTeX and
LaTeX + dvips. The name is taken from the German word Beamer, a pseudo-anglicism for video projector.
Source code for a beamer presentation is ordinary LaTeX. Beamer ships with a variety of inbuilt templates
and serves the same purpose as the slides document class.

Various templates are available for various purposes. For example, if you are planning a very formal
presentation, a beamer theme called Amsterdam is available. These themes can be readily
downloaded from the internet. A wide variety of themes, both formal and casual, are available, which
makes beamer a good choice for preparing slides.
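A minimal beamer document is sketched below; Madrid is one of the standard built-in themes (any theme name can be substituted), and in beamer each slide is a frame rather than a slide environment:

```latex
\documentclass{beamer}
\usetheme{Madrid} % any built-in beamer theme can be used here

\title{Title of Presentation}
\author{Name of Presenter}
\institute{Affiliation}
\date{\today}

\begin{document}

\begin{frame}
  \titlepage % first slide: title, presenter, affiliation, date
\end{frame}

\begin{frame}{Outline}
  \tableofcontents % outline built from \section commands
\end{frame}

\end{document}
```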

A sample code,

\begin{document}

\begin{slide}

\begin{center}

{\Large Title of Presentation}\\

Name of Presenter\\

Affiliation\\

Date\\

\end{center}

\begin{center}

Outline of presentation

\end{center}

\begin{center}

1.\ Section One \\

2.\ Section Two\\

3.\ Section Three\\

\end{center}

\end{slide}

\end{document}

This will produce an output as shown in Fig. 12.26.


Fig. 12.26 Slides in LaTeX

12.11 TEMPLATES

We can either start a document from scratch or from an existing template. As already mentioned, LaTeX
has various document classes like article, report, book, etc. Every class has a
completely different page setup and layout. You can even have templates specific to bibliography creation:
some templates number the items, others arrange them alphabetically, and some produce an unnumbered
list with a very formal layout.

Here we describe in detail the various document class templates used in preparing documents.

12.11.1 Articles

An article is a piece of writing designed to clearly and concisely convey information to a reader. Articles are
generally non-fiction and are used to propagate information such as news and scientific research. The layout of
articles is generally multi-column to maximize the information per page and make text easier to read.

Article templates are normally provided by the publisher. If you intend to
publish an article in a journal with a high impact factor, you need to check for the latest available LaTeX
template and work accordingly. Templates differ from journal to journal. Almost all
journals provide a sample .cls file, from which you can create a corresponding .tex file. This .tex file
will then be in accordance with the template specified by the journal.

A sample article is shown here.


12.11.2 Thesis and Conference Proceedings

One of the documents for which formatting is of prime importance is the thesis. We need to set
the text height, width, page margins, etc. We use the document class report for both theses and
conference proceedings. For controlling line spacing, we include a special package called setspace.
For including figures, we use the package graphicx.

The preamble for a thesis or conference proceeding will be similar to this:

\documentclass[a4paper,12pt]{report}

\usepackage{cite}

\usepackage{graphicx}

\usepackage{subfigure}

\usepackage{float}
\usepackage{setspace}

\setlength{\textheight}{8.5in}

\setlength{\textwidth}{5.5in}

\addtolength{\topmargin}{-1cm}

For the thesis, we need a title page, which mentions the author, affiliation, etc. To make one, we
have the keywords \begin{titlepage} … \end{titlepage}. We provide all credentials inside this code section
and give the command \maketitle to generate the title page.

\begin{document}

\begin{titlepage}

\title{Your title of the Project}

\author{Name of Author}

\maketitle

\end{titlepage}

The abstract of the study is also very important; the abstract stands alone on a page with special
formatting. In LaTeX, we have the abstract environment for easy formatting of the same. The section in a
thesis will be as follows:

\begin{abstract}

This section provides a brief study of the work done.

This is the abstract.

\end{abstract}

12.11.3 Books

A book is a long document used for many purposes, such as novels, non-fiction works, or textbooks. This
variety of applications makes books one of the most complicated document types to write and typeset. Books
need to support all document constituents and often contain many cross-references.

In such situations, other document creation tools find it difficult to provide easy formatting. LaTeX, on the other
hand, is the premier tool for simplifying the inherent complexity of a book, allowing the author to focus
on writing rather than formatting.

\documentclass[12pt,oneside]{book}

\usepackage[width = 4.375in, height = 7.0in, top = 1.0in, papersize = {5.5in, 8.5in}]{geometry}

\usepackage[pdftex]{graphicx}

\usepackage{amsmath}
\usepackage{amssymb}

\begin{document}

\chapter*{\Huge \center Name of Book }

\section*{\huge \center Author 1}

\newpage

\subsection*{\center \normalsize Copyright \copyright 2014 by Authors}

\subsection*{\center \normalsize All rights reserved.}

\subsection*{\center \normalsize ISBN \dots}

\subsection*{\center \normalsize \dots Publications}

\chapter*{\center \normalsize To my Parents}

\tableofcontents

\chapter{Introduction}

In mathematical logic, predicate logic is the generic term for symbolic formal systems like first-order logic,
second-order logic, many-sorted logic, or infinitary logic. This formal system is distinguished from other systems in
that its formulae contain variables which can be quantified. Two common quantifiers are the existential and universal
quantifiers. The variables could be elements in the universe under discussion, or perhaps relations or functions over that
universe.

\chapter{First Order Logic}

First-order logic is a formal system used in mathematics, philosophy, and computer science. It is also known as
first-order predicate calculus, lower predicate calculus, quantification theory, and predicate logic.

Figure \ref{15} forms the basis of the entire discussion.

\begin{figure}[ht]

\centering

\includegraphics[width

= 0.4\textwidth,natwidth = 0.1,natheight = 0.1]{15.jpg}\\

\caption{Slides in LaTeX}

\label{15}

\end{figure}

\chapter{Second Order Logic}

In logic and mathematics, second-order logic is an extension of first-order logic, which itself is an extension of
propositional logic. Second-order logic is in turn extended by higher-order logic and type theory.

A sample table is shown below.

\begin{tabular}{|r|l|}

\hline

7C0 & hexadecimal \\

3700 & octal \\ \cline{2-2}

11111000000 & binary \\

\hline \hline

1984 & decimal \\

\hline

\end{tabular}

\begin{thebibliography}{99}

\bibitem{maths_books}

Books of Mathematics (Author Name): \\

http://shop.mathsbooks.org/

\bibitem{logicbooks}

Author names., \emph{Book Title}, Name of Publications (Year) \\

ISBN: Number

\end{thebibliography}

\end{document}

This sample code generates the book pages as follows:


12.12 NUTS AND BOLTS

To end with, let us look at some auxiliary options available in LaTeX, one of which is the footnote. It facilitates
typesetting of inserted text. Footnotes are generated with the command \footnote{footnote text}, which comes
immediately after the word requiring an explanation in a footnote. The text of the footnote appears
in a smaller typeface at the bottom of the page. A sample footnote code looks like the one below.
\begin{minipage}{.5\linewidth}

\renewcommand{\thefootnote}{\thempfootnote}

\begin{tabular}{ll}

\multicolumn{2}{c}{\bfseries PostScript type 1 fonts} \\

Courier\footnote{Donated by IBM.} & cour,courb,courbi,couri \\

Charter\footnote{Donated by Bitstream.} & bchb,bchbi,bchr,bchri\\

Nimbus\footnote{Donated by URW GmbH.} & unmr, unmrs \\

URW Antiqua\footnotemark[\value{mpfootnote}] & uaqrrc\\

URW Grotesk\footnotemark[\value{mpfootnote}] & ugqp\\

Utopia\footnote{Donated by Adobe.} & putb, putbi, putr, putri

\end{tabular}

\end{minipage}

This generates the output as shown in Fig. 12.27.

Fig. 12.27 Output footnote

In the same way, we can add text in the margins; these are termed marginal notes. They are used in a way
similar to footnotes: the \marginpar command generates the marginal note.
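A short sketch of its use (the note text is illustrative):

```latex
Predicate logic allows quantified
variables.\marginpar{See Chapter 2.} % note is typeset in the margin
```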

Another text-appending method is endnotes; they appear at the end of each chapter or at the end of the
document. Endnotes are not supported in standard LaTeX, but by using the package endnotes we can add
endnotes in a way similar to footnotes. The command used is,

\renewcommand{\footnote}{\endnote}
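A minimal sketch of using the endnotes package is given below; the note text is illustrative:

```latex
% preamble
\usepackage{endnotes}
\renewcommand{\footnote}{\endnote} % every \footnote now becomes an endnote

% body: used exactly like a footnote
% This claim needs a source.\footnote{See the 2014 survey.}

% where the accumulated notes should be printed
% (end of chapter or end of document):
% \theendnotes
```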

EXERCISES

1. What is TeX?
2. What is “writing in TeX”?
3. How should I pronounce “TeX”?
4. What is Metafont?
5. What is MetaPost?
6. Things with “TeX” in the name
7. What is CTAN? Discuss the (CTAN) catalogue.
8. How can I be sure it is really TeX?
9. What is e-TeX?
10. What is PDFTeX?
11. What is LaTeX? What is LaTeX2e?
12. How should I pronounce “LaTeX(2e)”?
13. Should I use Plain TeX or LaTeX?
14. How does LaTeX relate to Plain TeX?
15. What is ConTeXt?
16. What are the AMS packages (AMSTeX, etc.)?
17. What is Eplain?
18. What is Texinfo?
19. If TeX is so good, how come it is free?
20. What is the future of TeX?
21. How will you read a (La)TeX file?
22. Why is TeX not a WYSIWYG system?
23. What is a DVI?
