Harini Nekkanti
Sri Sai Vijay Raj Reddy
Faculty of Computing
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden
This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology
in partial fulfillment of the requirements for the degree of Master of Science in Software
Engineering.
The thesis is equivalent to 20 weeks of full time studies.
Contact Information:
Author(s):
Harini Nekkanti
E-mail: hana15@student.bth.se
Sri Sai Vijay Raj Reddy
E-mail: srre15@student.bth.se
University advisor:
Tekn. Lic. Ahmad Nauman Ghazi
Department of Software Engineering (DIPT)
Contents
Abstract i
Acknowledgments ii
1 Introduction 1
1.1 Research Gap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Research Aim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Research Questions and Instrument . . . . . . . . . . . . . . . . . 6
1.5 Structure of Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Research Method 18
3.1 Empirical methods . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.1 Research Overview: . . . . . . . . . . . . . . . . . . . . . . 19
3.1.1.1 Systematic Literature Review: . . . . . . . . . . . 20
3.1.1.2 Interview . . . . . . . . . . . . . . . . . . . . . . 22
3.1.1.3 Data analysis: . . . . . . . . . . . . . . . . . . . . 23
3.2 Systematic Literature Review . . . . . . . . . . . . . . . . . . . . 23
3.2.1 Planning The Review . . . . . . . . . . . . . . . . . . . . . 24
3.2.2 Specifying the research questions . . . . . . . . . . . . . . 24
3.2.3 Developing the review protocol . . . . . . . . . . . . . . . 24
3.2.3.1 Search strategy . . . . . . . . . . . . . . . . . . . 25
3.2.3.2 Study selection criteria . . . . . . . . . . . . . . . 26
3.2.3.3 Quality criteria . . . . . . . . . . . . . . . . . . . 27
3.2.3.4 Data extraction strategy . . . . . . . . . . . . . . 28
3.2.3.5 Data synthesis . . . . . . . . . . . . . . . . . . . 29
3.2.3.6 Evaluating the review protocol . . . . . . . . . . 29
3.2.3.7 Pilot study . . . . . . . . . . . . . . . . . . . . . 29
3.2.4 Conducting the Research . . . . . . . . . . . . . . . . . . . 29
3.2.5 Identification of research . . . . . . . . . . . . . . . . . . . 30
3.2.6 Primary studies . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.7 List of selected studies . . . . . . . . . . . . . . . . . . . . 30
3.2.8 Quality assessment criteria . . . . . . . . . . . . . . . . . . 31
3.2.9 Data Extraction Strategy . . . . . . . . . . . . . . . . . . 32
3.2.10 Data synthesis . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3 Interview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3.1 Selection of Interview Subjects: . . . . . . . . . . . . . . . 33
3.3.2 Interview design: . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.2.1 Interview setup: . . . . . . . . . . . . . . . . . . . 36
4.4.2 Analysis for RQ2 from Interviews: . . . . . . . . . . . . . 64
Appendices 92
G Themes 111
G.1 Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
G.1.1 Convenience sampling . . . . . . . . . . . . . . . . . . . . 111
G.1.2 Random Sampling . . . . . . . . . . . . . . . . . . . . . . 111
G.1.3 Stratified sampling . . . . . . . . . . . . . . . . . . . . . . 112
G.2 Questionnaire Problems . . . . . . . . . . . . . . . . . . . . . . . 112
G.2.1 Questionnaire Problems . . . . . . . . . . . . . . . . . . . 112
G.2.2 Questionnaire length . . . . . . . . . . . . . . . . . . . . . 112
G.2.3 Open-ended & closed-ended questions . . . . . . . . . . . . 113
G.2.4 Question order . . . . . . . . . . . . . . . . . . . . . . . . 113
G.2.5 Likert Scales . . . . . . . . . . . . . . . . . . . . . . . . . . 114
G.2.6 Sensitive Questions . . . . . . . . . . . . . . . . . . . . . . 114
G.2.7 Time limitation . . . . . . . . . . . . . . . . . . . . . . . . 114
G.2.8 Domain Knowledge . . . . . . . . . . . . . . . . . . . . . . 115
G.2.9 Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
G.2.10 Hypothesis Guessing . . . . . . . . . . . . . . . . . . . . . 116
G.2.11 Language Issues . . . . . . . . . . . . . . . . . . . . . . . . 116
G.2.12 Cultural Issues . . . . . . . . . . . . . . . . . . . . . . . . 117
G.2.13 Generalizability . . . . . . . . . . . . . . . . . . . . . . . . 117
G.2.13.1 Bias . . . . . . . . . . . . . . . . . . . . . . . . . 117
G.2.13.2 Reliability . . . . . . . . . . . . . . . . . . . . . . 118
G.3 Response Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 118
G.3.1 Low Response Rate . . . . . . . . . . . . . . . . . . . . . . 118
G.3.2 Inconsistency . . . . . . . . . . . . . . . . . . . . . . . . . 119
G.3.3 Response Duplication . . . . . . . . . . . . . . . . . . . . . 119
G.3.4 Rewards . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
G.3.5 Sample Size . . . . . . . . . . . . . . . . . . . . . . . . . . 120
G.3.6 Analysis Techniques . . . . . . . . . . . . . . . . . . . . . 121
List of Figures
List of Tables
Chapter 1
Introduction
type of phenomena, but the survey interventions in software engineering are about
tools and techniques and are more IT-centric, while surveys in the social sciences
address more personal information. The reason for this could be the subjective
nature of surveys in the social sciences, where psychological opinions of people
are collected. For example, questions in such surveys would be “Which one do
you think has more features, Android or iOS?” or “How do you feel about iOS 9?”.
These questions are expected to be answered from the respondent’s perspective
and are mainly opinion-based; respondents need not give evidence for their
answers and have the freedom to express their views. A similar question posed in
a software engineering context could be “What are the major flaws that make you
choose iOS over Android, or vice versa?”. Here the respondents are not asked for
their opinions; instead they are expected to provide evidence to support their
answers, which are otherwise treated as invalid. In this way, software engineering
surveys test the perceptions of the respondents, such as perceptions about a
software product, tool, or process. When designing social science surveys,
questions starting with what, how, and when are asked, and why questions are
mainly avoided for fear of getting low response rates. Software engineering
surveys, in contrast, mainly collect evidence through why questions [39], [37].
The sample population in social science surveys is more generic in nature, while
software engineering surveys have a specific sample. A customer survey can be
distributed to any type of population, but in software engineering the population
must be specific, such as developers, testers, or the quality team. Surveys in
the social sciences focus on more general information, whereas software
engineering surveys address more technical information. In software engineering
surveys it is difficult to identify a representative sample of the population,
while the target population is easily identifiable in a social sciences context.
Researchers in software engineering carefully verify responses before including
them, since flawed responses affect the overall survey outcomes. Summarizing the
above, the main motivation for conducting surveys, whether in software
engineering or the social sciences, is to obtain data from a large population
with the aim of generalizing the findings [39], [37].
Surveys at Present: In the past, surveys were conducted in traditional
ways: the sample population’s details were collected and respondents were
contacted in person. The survey responses were then collected either by handing
out questionnaires or by conducting interviews. Responses obtained in this way
were more sample-specific, as respondent selection was done by the researchers
themselves. This approach had several drawbacks: there is no guarantee that the
whole population answers the survey, and there are sometimes problems of biased
results. At present, many researchers use online tools for conducting surveys;
SurveyMonkey [1] and SurveyGizmo [84] are two commonly used tools. The process
of gathering responses using such tools is simple and user-friendly. The
researcher creates an account and posts the survey, specifying the population
type and the number of responses. Researchers are expected to wait for a
certain period of time before getting the responses. In the past, if researchers
had constraints such as time limits, sample specifications, or a large population
size, they either compromised on the results or prolonged their research. Online
tools address such issues: the user pays a certain amount based on his or her
preferences, and the websites provide several options to customize how responses
are obtained. The advantages of using these tools are that the responses obtained
are demographically distributed; there is no interaction between researcher and
respondent, which reduces bias to some extent; the researcher gets sample-specific
responses; and results are obtained within a specified time, reducing delays. In
this way, present-day technologies are helping software engineering researchers
handle traditional survey problems [39].
Online surveys can reach a diversified sample thanks to the Internet, and they
have a higher response rate compared to paper-based surveys. They do have
negatives, such as impersonal survey requests, self-selected samples, lack of
Internet access, inappropriate respondents answering the survey, reporting
problems, questionable correctness of responses, the same respondent answering
the survey more than once, and biased samples [92], [39]. Technical problems in
online surveys, such as multiple operating systems, unsupported email readers,
incompatible browsers, poorly functioning software, server crashes, and data
accumulation, were discussed by Stephen et al. [106]. All these factors show
that online surveys have their own risks.
Problem Domain
Carter Steel et al. [15] criticized the poor standards of ongoing research in
empirical software engineering and stressed the need to aggregate empirical
results to overcome interpretation difficulties. Two major pitfalls forming the
basis of the criticism were identified. The former was researchers being
unfamiliar with practitioners’ approaches (industrial practices); the latter was
the lack of applicability of the practices, methods, techniques, and approaches
prescribed by researchers to the practical (industrial) context. Surveys were
claimed to be useful in dealing with such problems [15]. Paulk et al. [89]
support this argument by stating that “surveys help to obtain a good feel for
the breadth in deploying specific analysis techniques in industrial perspective”.
Punter et al. [95] statistically documented the increased usage of surveys over
case studies and experiments. They discussed the ESERNET study, which showed
that 50% of respondents used surveys, followed by case studies and experiments.
Compared with other empirical investigations, surveys can capture a larger
population. Less variable control, less scope for variable manipulation, low
internal validity, and high external validity are the reasons surveys have
gained immense popularity among software engineering researchers [95].
To date, software engineering researchers still face problems such as
generalizability, low response rates, and reliability [25], [46], [125], [40], [88], [123].
Problems like improper participant selection and questionnaire flaws could be
mitigated if proper measures were taken by researchers while designing the survey
itself. Some problems, like hypothesis guessing [46] and reliability, cannot be
completely eliminated, but their impact on survey outcomes can be reduced to
some extent.
The reason researchers face problems could be that they are either unaware of a
problem or have overlooked it in the survey process. In both cases the outcome
of the survey is a “lemon” (a bad survey) [92]. We think there are sufficient
guidelines discussing how to design and administer a survey; the main issue is
the lack of documentation addressing common problems in the survey process. For
instance, in published articles, survey problems are addressed only in the
threats-to-validity or limitations sections. Instead of taking the same road and
designing guidelines, we thought it would be a better idea to compile a checklist
of problems with mitigation strategies. A checklist of this kind would serve as
a reference, as it helps to predict survey problems so that researchers can deal
with them in advance.
We are researchers from the software engineering domain and want to support
fellow software engineering researchers. It is important to differentiate
between surveys in software engineering and those in the social sciences. If
surveys from every domain were considered along with those in software
engineering, the research would span several years, and we could not achieve the
required results in the time allotted for a Master’s thesis. For this reason we
narrowed down the area of our research: it focuses only on surveys published in
the software engineering domain.
1.3 Objectives
The main aim of this research is to document the lessons learned by researchers
in designing and applying the survey as a methodology. The objectives below will
help us achieve the required outcome.
• To analyze surveys with respect to variables like sample size, response
rate, and analysis techniques.
• RQ1: What are the common problems faced by software engineering
researchers while conducting a survey?
Motivation: The motivation behind including this research question is
twofold. First, it helps to identify the rough situations in surveys that
researchers try to avoid, which in turn helps us identify the different problems
researchers face due to such situations. Second, we can report our findings on
the problems and the suggested mitigation strategies. There is evidence in the
literature that different problems arise during the survey process, yet no
proper documentation has been published listing all possible problems; the
problems faced are mentioned only in threats-to-validity or limitations sections
[47][33][46][31]. Hence we focus our research on this area. Answering this
research question has two benefits: the problem–mitigation checklist serves as a
handy reference for researchers, and it contributes to the body of knowledge in
the empirical software engineering domain. Additionally, the checklist helps to
identify the impact of variables on survey outcomes.
• RQ2: How do variables like sample size, response rate, and analysis
methods affect the survey outcomes?
Motivation: In every domain, be it software engineering, the social sciences,
political science, health care, or automation, researchers opt for surveys to
obtain information from a larger population. In this regard, sample size
selection is important in order to obtain more data points and to achieve
generalizability. Response rate is important since it depends on communication
with peers, the research topic, willingness to contribute, etc. In any survey,
the selection of the analysis technique is crucial, since it decides how the
final results are presented. Whether the outcome is positive or negative, it is
essential for any researcher to report the results openly, and a proper analysis
technique helps researchers portray their results; survey results can only be
critically examined with a proper analysis technique. These factors motivated us
to select sample size, response rate, and analysis techniques, apart from other
variables, for our research. A properly defined sample of a specific size gives
better statistical evidence for any survey; with a proper sample size, different
data patterns can be identified and generalizability of outcomes becomes
possible [67]. An increased response rate means more data points for any survey,
which helps researchers generalize the research outcomes to a larger target
population [107]. Wohlin et al. [121] state that “for drawing valid conclusions
from survey data, researchers need to interpret the data properly”, which
explains the need for proper analysis methods at the end of a survey. Improper
interpretation of quantitative data produces misleading patterns that defeat the
purpose of conducting the survey. It is also evident from the existing
literature that an improper sample size, a poor response rate, and a wrong
selection of analysis method badly affect survey outcomes [67][107][68].
Identifying and analyzing how these variables impact survey outcomes can be used
to assess the survey approach that researchers are adopting. Hence this question
helps us fulfill the objectives of our research.
Motivation: This question was framed with the motive of getting deeper
insights into our research domain through direct interviews with software
engineering researchers. Only partial fulfillment of the objectives would be
achieved by doing a systematic literature review, where only the state of the
art is studied; for more concrete validation we require the state of practice as
well. This inclined us towards selecting interviews for validating our results.
In order to answer this research question, two additional sub-questions were
written. The reason for choosing this question was to record the mitigation
strategies that researchers adopt and compare them with those obtained from the
literature. Similarly, we wished to obtain the researchers’ outlook on selecting
sample size, response rate, and analysis methods. In this way we validate our
findings and try to extrapolate the researchers’ views, fulfilling our research
aim.
2.1 Background
Colin Robson and Kieran McCartan [101] define survey methodology as “a fixed
design which is first planned and then executed”, a statement validated by many
authors through their research. It is clear from this statement that survey
methodology is a step-by-step process. Based on the existing literature, survey
methodology can be broadly classified into a series of eight stages [61]. Each
stage of this sequential process has many sub-processes and is unique in its own
way; when properly followed, it gives the expected outcomes. Figure 2.1 gives a
brief explanation of all the stages:
• What are the possible areas close to the research objectives that were
left uninvestigated?
• How will the data obtained from the survey be used? [61] [69] [18]
While defining the research objectives for a survey, the related work pertaining
to that particular field must be considered; knowledge about similar practices
helps researchers to narrow down the objectives. All the stakeholders (like the
Chapter 2. Background and Related Work 10
researchers who are conducting the survey, and the respondents) must know the
goals of the survey and need to have an idea of what to expect at the end.
Failure to do so results in unnecessary iterations causing negative outcomes
[61] [18].
Research objectives must capture the goal of the survey. They are formulated
either as research questions (what, why, how) or as final outcomes [69]. A
top-down or a bottom-up approach can be followed when formulating research
objectives. Ciolkowski et al. [18] propose adapting the Goal Question Metric
(GQM) method for the top-down approach. In the GQM method, the goals are defined
first, then the questions are framed accordingly, after which metrics are
selected to measure the desired goals. Viewed from a survey-methodology
perspective, the goals correspond to research objectives, the questions map to
research questions, and the metrics represent the questionnaire. A similar
methodology can be applied in reverse for the bottom-up approach, where the
research questions are defined first and help to narrow down and define the
objectives [61].
While defining the research objectives based on GQM, a researcher needs to keep
in mind what is to be measured and how it is reflected in the research
objectives. To clarify this, Kitchenham and Pfleeger [69] have defined three
types of objectives to be considered when investigating a specific population:
• Discovering the factors which affect the characteristics among the popula-
tion [69].
Wohlin et al. [121] clearly define the purpose (objective or motive) for
conducting a survey. Based on its objective, any survey falls into one of the
three categories below:
• Explanatory surveys try to explain claims about a given population; for a
given population, this type of survey tries to find patterns and problems
observed in the population.
• Random Sampling
• Systematic Sampling
• Stratified Sampling
• Convenience Sampling
• Judgment Sampling
• Quota Sampling
• For open-ended questions, content analysis is used; its two main
classifications are qualitative content analysis and quantitative content
analysis. Other methods, like phenomenology, discourse analysis, and
grounded theory, can also be applied for analyzing open-ended questions
[32], [52], [11], [44], [105].
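As a minimal illustration of the quantitative flavor of content analysis, open-ended answers that have already been coded into categories can simply be tallied. The codes and responses below are invented for illustration; they are not taken from the thesis:

```python
from collections import Counter

# Each open-ended response has already been coded with one or more
# categories (hypothetical codes, purely for illustration).
coded_responses = [
    ["usability", "performance"],
    ["performance"],
    ["documentation", "usability"],
    ["usability"],
]

# Tally how often each code occurs across all responses.
counts = Counter(code for codes in coded_responses for code in codes)
print(counts.most_common())
# [('usability', 3), ('performance', 2), ('documentation', 1)]
```

Qualitative content analysis would instead interpret the coded passages in context rather than reducing them to frequencies.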
Punter et al. [95] have focused mainly on the present survey trend, i.e., online
surveys. They drafted a set of guidelines for performing online surveys, based
on their experience from five such surveys. They describe data obtained from
online surveys as easy to analyze, since it arrives in the expected format,
whereas paper-based forms are error-prone. Online surveys also track the invited
respondents and log the details of those who actually answered, which is not the
case with paper-based surveys; this helps researchers increase the response
rate. In this way, they argued, online surveys mitigate two main survey
problems. The first is the problem of infrequent surveys: web survey tool
management systems give researchers the scope to conduct surveys repeatedly. The
second, poor disclosure of results, is handled by visualizing and distributing
the survey results.
Pfleeger and Kitchenham [92] published a series of six articles which discussed
the same survey methodology as described by Kasunic [61]. Even though these are
short papers, they are held in high regard; the series was the first to publish
concise documentation about survey methodology.
A low participation rate is a common problem for any survey, as identified by
Smith et al. [107] in their research. Based on their expertise and the existing
literature, they performed a post-hoc analysis of previously conducted surveys
and came up with factors that improve the participation rate. They also
specified the limitations of the obtained results, in that “an increase in
participation doesn’t mean the results become generalizable” [107].
Regarding survey sampling, Travassos et al. [27] argue that there are no
specialized and adequate sources of sampling; to support their argument they
propose a framework consisting of the target population, sampling frame, unit of
observation, unit of attribute, and an instrument for measurement. Ji et al.
[56] conducted surveys in China and addressed issues relating to sampling,
contacts with respondents, data collection, and validation. Conradi et al. [21]
highlighted the problems of method biases, an expensive contact process,
problems with census-type data, and national variations by performing an
industrial survey in three countries: Norway, Italy, and Germany. This was the
first study in software engineering to use census-type data. The problem of
replicating surveys was highlighted by Rout et al. [15], who replicated a
European survey in an Australian software development organization.
To make our work more applicable, we randomly considered three surveys from
domains other than software engineering. The first article focuses on measuring
patient safety; it compares factors like general characteristics, dimensions
covered, and the various study uses of patient safety climate surveys [20]. The
second article discusses the mental disorders faced by many prisoners [34]. The
third article discusses the use of Chinese medicine by cancer patients [14]. All
three articles share the common point of being reviews of surveys. A brief
discussion of the obtained results along
This section details the research method used to achieve the aims and
objectives. A systematic literature review and interviews were the two
methodologies used to answer the research questions. The motivation for the
selected research methods is also presented.
• Surveys: “The survey is defined as research in large” [120], which means
covering a large sample or target population to collect the necessary
information through questionnaires and interviews. Interestingly, our research
is about investigating the problems faced by software engineering researchers
while
In the first phase, the systematic literature review process provided by
Kitchenham [63] was applied for RQ1 and RQ2 in order to collect the relevant
information for our study, i.e., to identify the problems faced by researchers
while conducting surveys in software engineering and how factors like sample
size, response rate, and analysis methods affect the survey outcomes. Narrative
synthesis was used to outline the gathered qualitative data.
In the second phase, i.e., the state of practice, interviews were conducted to
validate the data obtained from the systematic literature review. This was done
by asking the interviewees (researchers) questions about their experiences while
conducting surveys, and for suggestions, in order to finally draft a checklist
of problems along with mitigation strategies for conducting surveys. Thematic
analysis was used to analyze the interview results.
3.1.1.2 Interview
“Interview is a data collection method of eliciting a vivid picture of an
individual’s perspective on the research topic. It involves a meeting in which
the researcher asks a participant a series of questions. This method is useful
as the researcher can ask in-depth questions about the topic leading to a
fruitful discussion, a follow-up is also possible with the interview
participants” [119]. Many other data collection techniques used by researchers,
such as survey questionnaires, can be compared with interviews. With survey
questionnaires, data is collected from a large sample, but the results are
biased if the questionnaires are not administered properly. While the sample in
interviews may be small, the interviewees are selected according to the
requirements, which gives more promising results. In contrast to the design and
distribution of such questionnaires, the interview process requires less
knowledge and background work. Based on the level of “structure”, interviews are
classified into three categories. Structured interviews are the most structured
of all; they are similar to questionnaires, but the interviewer expects short
and immediate answers, leaving no room for further discussion of other issues.
Unstructured interviews are at the bottom of the structure; they consider only a
limited set of topics, and because they involve open discussion rather than
answers to given questions, it sometimes becomes difficult for the interviewer
to transcribe them, and difficulty also arises when generalizing the results for
analysis. Semi-structured interviews are widely used, as they combine the
benefits of unstructured and structured interviews: by means of open-ended and
closed-ended questions, the interviewer gets additional information along with
the required information [103]. In our research, semi-structured interviews were
used to learn about researchers’ experiences in conducting surveys and to
validate the findings of our SLR execution.
Motivation: Interviews are the best source for in-depth discussions on specific
topics. They help to collect qualitative information that can validate the
findings from other research studies (like a survey, case study, or experiment)
[121][101]. The quality of the qualitative data obtained depends on the selected
subjects, the questions asked, and the analysis of the interview results.
Interviews also help us to collect personal opinions and additional information
related to our research that can be compared with and added to our results.
Since the outcome of our research will be useful to software engineering
researchers, interviewing them will help us present it in a better way, based on
their recommendations. These reasons motivated us to
The first phase, planning, involves establishing the need for the review,
specifying the research questions, and developing and evaluating the review
protocol, which details the SLR process. The second phase (conducting the
review) involves identification of research, selection of primary studies,
quality assessment, and data extraction and synthesis. The third phase covers
the complete review report [63].
The guidelines provided by Kitchenham et al. [63] suggest the PICO method as an
efficient way to formulate a search string; hence we opted to use the PICO
method for developing ours. PICO is a combination of four facets, namely
Population, Intervention, Context, and Outcomes, as shown in Table 3.2. We
applied this methodology to our research area and obtained the following
results.
• Data Sources: The articles that are used within the research are from the
advisory databases within the BTH student portal. The identified databases
that are used are as follows:
• INSPEC
• SCOPUS
The reasons for selecting these two databases were their relevance to the
software engineering domain and their associations with a wide range of
publications. Another important reason was to avoid duplicate articles;
databases like Google Scholar and IEEE might return duplicates. INSPEC and
SCOPUS have the most comprehensive and broadest coverage [71], and they gather
publications from many different databases. Full-text articles that were not
accessible through Inspec and Scopus were downloaded from Google Scholar; its
high degree of article accessibility and simple text-box interface were an
additional help [71]. Suggestions from the BTH librarians were also considered
in this case and discussed with the supervisor.
We used the above search string in the databases along with Boolean operators
such as AND and OR. To obtain the results most relevant to our research, we also
had to use a few search operators, such as DOCTYPE and quotation marks, in our
search string, as shown in Table 3.4. Suggestions from the BTH librarians were
also considered before finalizing our search string.
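To make the construction concrete, a PICO-style search string ORs the synonyms within each facet and ANDs the facets together. The sketch below is a hypothetical illustration: the facet terms are placeholders, not the thesis’s actual search string (which appears in Table 3.4):

```python
# Hypothetical PICO facets; the terms below are placeholders chosen for
# illustration, not the actual terms used in the thesis.
facets = {
    "Population": ["software engineering"],
    "Intervention": ["survey", "questionnaire"],
    "Outcomes": ["problem", "challenge", "limitation"],
}

def build_search_string(facets):
    """OR the synonyms within each facet, then AND the facets together."""
    groups = [" OR ".join(f'"{term}"' for term in terms)
              for terms in facets.values()]
    return " AND ".join(f"({group})" for group in groups)

print(build_search_string(facets))
# ("software engineering") AND ("survey" OR "questionnaire")
#     AND ("problem" OR "challenge" OR "limitation")
```

The same pattern extends to database-specific operators (field codes, DOCTYPE restrictions) by wrapping the generated string accordingly.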
• All relevant articles related to the software engineering domain were in-
cluded.
• Only published articles obtained from the Inspec and Scopus databases were
included.
Exclusion criteria: All articles that did not fulfill the inclusion criteria were
excluded.
The above blueprint lists all the relevant columns that we considered as
parameters, together with the corresponding activity performed. We formulated
these parameters with utmost care, so that the results would be greatly
refined and effectively useful for our research.
from both the databases. Of the 745 selected studies, 679 did not fulfill the
inclusion criteria, so only 66 primary studies were considered.
Results

S. No | Quality criteria                                                              | Yes (1) | Partial (0.5) | No (0)
QC1   | Does the paper identify required objectives fulfilling the aim of research?   | 34      | 32            | 0
Chapter 3. Research Method 32
Articles were scored 1 (yes), 0.5 (partial) or 0 (no). The total score of an
article was used to assess its quality. If a study scored 1 on every quality
criterion, it was clustered as high quality; if its total score was greater than
or equal to 2 it was considered medium quality; and if its total score was less
than 2 it was grouped as low quality. The following table illustrates the range
of quality for the individual studies: quality range Table 3.8 shows the range
of studies with high, medium and low quality. Table E.1 in the Appendix lists
the quality criteria scores for all the primary studies.
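As an illustration, the clustering rule described above can be sketched in Python; the scores and the number of criteria in the example are hypothetical, not taken from the thesis data:

```python
def quality_cluster(scores):
    """Cluster a primary study by its per-criterion scores
    (1 = yes, 0.5 = partial, 0 = no).
    All 1s -> high quality; total >= 2 -> medium; total < 2 -> low."""
    if all(s == 1 for s in scores):
        return "high"
    return "medium" if sum(scores) >= 2 else "low"

# Hypothetical studies scored on three quality criteria:
print(quality_cluster([1, 1, 1]))      # high
print(quality_cluster([1, 0.5, 1]))    # medium (total 2.5)
print(quality_cluster([0.5, 0.5, 0]))  # low (total 1.0)
```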
information in the Excel sheets was utilized to perform this phase. The findings
from the SLR were summarized in order to answer the research questions, i.e. the
problems identified when conducting empirical surveys and how variables such as
sample size, response rate and analysis techniques affect survey outcomes.
The continuation of the systematic literature review protocol is discussed
in Section 4.1.
3.3 Interview
Interviews were chosen as the qualitative research method for research question 3.
An interview is a data collection method in which a verbal exchange takes
place between two persons, with the interviewer retrieving the required information
from the other person's experience. The interviewee presents their own experiences,
beliefs and behaviors, either as a consumer or as an employee, and it is the job
of the interviewer to retrieve the information best suited to the research. Other
data collection techniques such as questionnaires were not used because they require
a large sample base and yield more generalized results [103], whereas interviews
use small samples selected to match the requirements and tend to produce better
results.
as name and department. As the next step, the goal and objectives of the
interview are presented to the interviewee so that the discussion does not
deviate. The participation of the interviewee is less likely without knowing the
goals and objectives of the study [104]. After the introduction, general
information is gathered from the interviewee, such as his/her experience in their
field of research. The next crucial step is to focus on the interview questions,
from which the required data from the interviewee are gathered.
iii. Transcribing: At this stage the discussions between the interviewer
and the interviewee, recorded as audio files during the interview, are converted
into text files. The interviews were transcribed manually by listening to the
audio tapes.
iv. Analyzing: The analyzing stage is the most crucial stage, where the data
required for the research are extracted from the transcripts and recordings.
The questions are organized systematically in accordance with the research
questions so that ambiguity can be avoided.
Questions for Interviews: Generally, the questions asked in interviews are
framed with the intention of collecting the required information from the
interviewees. Factors such as the time allotted for interviews, the number of
interviewees, the research theory and the expertise of the interviewees all
impact the framing of interview questions.
Among these, the research theory plays an important role. It is majorly divided
into two types:
1. Inductive Theory: Research formulated from the existing literature
and theories. The interview questions generated under this theory are
influenced by the questions in existing published papers.
The interview questions are influenced by either of the two above-mentioned
methods. While designing the questionnaire we have to check for the following
conditions [103]:
i. The questions have no implicit assumptions.
ii. Avoid framing questions that lead to single-word (yes/no) answers.
iii. Prevent overly generalized questions.
The ordering of the questions must be done carefully, so that each question
leads naturally to the next and to the conclusion of the interview. Finally,
a pilot study was done with the supervisor before the interviews to determine
whether any necessary changes had to be made. The interview questionnaire can
be found in Appendix C.
Chapter 4. Results and Analysis 38
As the graph in Figure 4.1 shows, surveys were conducted in the late 90s, but
in small numbers. In the last ten years there has been a considerable increase
in the use of surveys for data collection. Common problems such as low response
rate, quality of outcomes and duplication existed almost every time. In the past,
primary studies discussing such problems existed but were limited; in recent years
there has been a substantial increase in studies in which the survey is used as
the research methodology. Issues such as bias, reliability and cultural issues
could be the problems worrying researchers in modern days.
4.1.3 Domain
Primary studies that discussed problems in survey methodology were considered
from every domain. Figure 4.2 below shows the number of studies included from
each domain. The fewest studies discussing survey problems came from the
Software Security, Metrics and Object-Oriented Techniques domains; the reason
may be the use of experiments and case studies for data collection there. In
domains such as software project management, requirements engineering and
software testing it is evident that surveys, interviews, case studies or
experiments were used as validation methods. In the domains of software
development and software engineering, however, researchers published many
articles discussing survey problems and their mitigation strategies.
• Context (C): Gives a description of the problem and its relation to previous
research.
• Study Design (S): Information about the products, services and resources used
for the study evaluation.
problems faced.
Relevance (Re): The authors specified four aspects needed for calculating the
Relevance of a study, namely:
• Context (C): Gives a description of the problem and its relation to previous
research.
• Scale (Sc): Scales used for evaluation.
• Users or Subjects (U): Description of the subjects or users involved.
• Research Methodology (RM): Description of the research methodology used.
Relevance (Re) = C + Sc + U + RM. The rubric for each aspect is 1 if a
contribution exists and 0 if there is no contribution. Three aspects were
mapped to our research, and Rigor was calculated for every primary study as
shown in Table 4.3.
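A minimal sketch of this scoring rubric (aspect keys abbreviated as in the text; the example study is hypothetical, not one of the thesis's primary studies):

```python
def relevance_score(aspects):
    """Relevance (Re) = C + Sc + U + RM, where each aspect is scored
    1 if the study contributes to it and 0 otherwise."""
    return sum(aspects[k] for k in ("C", "Sc", "U", "RM"))

# Hypothetical primary study: context, subjects and research method are
# described, but no evaluation scale is reported.
print(relevance_score({"C": 1, "Sc": 0, "U": 1, "RM": 1}))  # 3
```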
The Rigor and Relevance values for every primary study considered were
calculated and listed under F.1 in the Appendix. The bubble plot of Rigor
and Relevance values is shown in Figure 4.3.
viding details for the next question, leading the respondents to write a specific
answer. Authors can mitigate this issue by randomizing the questions of the
questionnaire. Margaret et al. [47] faced the same issue in their survey;
instead of randomizing, they designed the questionnaire based on a natural
action sequence, helping the respondents recall and understand the
questionnaire. Garousi et al. [42] prevented this problem by designing the
questionnaire after looking at similar surveys conducted in the past. Once the
questionnaire had been designed, the authors contacted several industrial
practitioners for peer review. In this way the authors conveyed useful
information and also handled the problem of the question-order effect.
P 6 Survey Instrument Flaws: Survey instrument flaws are one of the con-
clusion validity threats. Martini et al. [47] were very cautious in their method-
ology to avoid such flaws in their instrument design. They iteratively pretested
the whole process, first carrying it out on known subjects and then on the actual
subjects. Discussions with colleagues and domain experts were also part of the
pre-test process. Gorschek et al. [46] also performed a redundancy check, in
addition to pre-tests and expert discussions, to handle survey instrumentation
problems. Travassos et al. [111] used external researchers not involved in
the research and reformulated the questionnaire based on their reviews.
P 7 Likert Scale Problems: The Likert scale is one-dimensional in nature; re-
searchers mostly use it in surveys with the assumption that a respondent's opin-
ion maps onto exactly one of the points. In realistic scenarios this is not
true: some respondents get confused about which response to pick and settle for
the middle option. Results obtained with higher-order Likert scales are also
tiresome to analyze, posing a threat of misinterpretation or data loss [33].
P 8 Survey Questionnaire Design: In practice, not every question in a survey
questionnaire can be mutually exclusive and exhaustive. This is a common
problem arising from poor design. Sometimes questions are ambiguous, confusing
respondents, or leading, and thus fail to capture the whole idea of the survey.
Even after reformulating questions, a survey design is never guaranteed to
obtain the required responses. Pilot surveys can handle these questionnaire
issues [33], [48], [46].
P 9 Randomness of Participants: When a survey is conducted at a large
scale involving many respondents, sample selection must be done cautiously to
avoid bias. Randomness of participants can handle this issue; Garousi
et al. [41] used different publicity tools to achieve a random set of samples,
thereby mitigating it.
P 10 Insufficient Sample Size: Insufficient sample size is a major threat
for any software engineering survey. Meaningful statistical evidence cannot be
obtained, even when parametric tests are applied to a sample, if its size is
insufficient [83], [87].
P 11 Improper Participant Selection: Improper participant selection
happens when the selected respondent sample is not representative of the whole
population [122]. It is the main cause of issues such as lack of generalizability
and bias. An automated way of selecting respondents can reduce the impact of
this issue on the overall survey outcome [8]. Selecting respondents based on a
set of criteria defined at the survey instrumentation stage can also reduce the
chances of improper selection [111].
P 12 Low Participation Rate: Low participation or response rates are among
the main problems faced in any kind of survey. Low participation was mainly
due to the busy schedules of respondents, poorly designed survey layouts, lack
of awareness about the survey and overly long surveys. It can be improved in
several ways, for example "emails for the survey were sent from the personal
email address to their own contacts because invitations from known people are
less likely to be tagged as spam", which is discussed in detail in the
following sections [40], [118], [88], [25], [3].
P 13 Sampling Method: Garousi et al. [42] discussed the motivation for sub-
ject selection based on sampling methods. The authors describe why researchers
select convenience sampling over other techniques, the reasons being that it is
the least expensive and least troublesome. A proper sampling technique is thus
an essential constituent of any good research. The problem of selecting the
wrong sampling method can be handled by considering trade-offs between factors
such as anonymity and bias reduction.
P 14 Lack of Motivation for Population Selection: Many researchers
fail to report their motivation for sample selection. Surveys of that kind are
difficult to replicate due to this lack of openness. Wohlin et al. [121] showed
that if research cannot be replicated, the purpose of doing it is not met. This
shows the need to report the motivation for selecting the population [56].
P 15 People's Perceptions: The perceptions of the people answering a survey
adversely impact the survey outcome. In software engineering a survey is done
to collect the attitudes, facts and behaviors of respondents. Perceptions vary
from person to person, which is the main reason they matter more here than in
any other assessment method or tool. This issue cannot be mitigated completely
but can be handled to some extent [117].
P 16 Lack of Domain Knowledge: A posted survey could be answered
by respondents without proper domain knowledge. This leads to misinter-
pretation of the questionnaire, resulting in non-participation or wrong
answers. A lack of common understanding also leads to these kinds of problems.
The inconsistent responses obtained through such misinterpretation have a
negative impact on survey outcomes [123], [15], [83]. Ji et al. [56] and
Gorschek et al. [46] stressed the need to consider the impact of the subjects'
background on survey results while surveying.
P 17 Boredom: Sometimes respondents start answering a survey but lose
interest as it progresses; boredom leads to a low response rate. Lengthy
surveys might be one reason for respondents to feel bored [40]. Martini et
al. [83] introduced interruptions in the middle so that respondent boredom was
avoided.
P 18 Time Limitation: Time limitation restricts the response rate in many
surveys. This factor influences every research effort to a great extent. Darje
et al. [88] showed that time limitation is the main factor behind respondents
not answering a questionnaire or taking phone interviews, as can be seen from
these lines: "all the 13 respondents were asked to take part due to time
limitation we obtained only 9 responses." Sometimes researchers neglect the
responses obtained from the actual subjects due to time limitation; the
following lines discuss this issue: "due to rather low response rate and time
limits, we have stopped on 33 responses which covers 13.58% of the Turin ICT
sector" [29].
P 19 Busy Schedules: Busy schedules are the main reason for low re-
sponse rates in any survey involving industrial respondents [40], [3]. Ji et
al. [56] commented that "busy executives likely ignore the questionnaires,
sometimes their secretaries finish the survey. In some cases the responses
obtained are filled out by respondents without domain knowledge", which
explains the low quality of the responses obtained.
P 20 Inconsistency in Responses: Inconsistency in responses mainly arises
when respondents lack a common understanding, giving two contrary answers to
the same question. For example, a respondent asked about his familiarity with
non-functional requirements marks the YES option, but when asked to explain it
in the next question, he either skips that question or gives a wrong answer
just for the sake of answering the survey. This problem can be handled by
posing the same question in different ways [123].
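One way such a cross-check could be operationalized during analysis is sketched below; the questionnaire field names are hypothetical and only illustrate the idea of flagging a claim that lacks its follow-up answer:

```python
def flag_inconsistent(response):
    """Flag a respondent who claims familiarity with non-functional
    requirements but skips or leaves empty the follow-up explanation.
    (Field names are hypothetical.)"""
    claims = response.get("familiar_with_nfr", "").lower() == "yes"
    explanation = (response.get("nfr_explanation") or "").strip()
    return claims and not explanation

# A claim with no explanation is flagged; a substantiated claim is not.
print(flag_inconsistent({"familiar_with_nfr": "yes"}))  # True
print(flag_inconsistent({"familiar_with_nfr": "yes",
                         "nfr_explanation": "performance, security"}))  # False
```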
P 21 Correctness of Obtained Responses: In large-scale surveys this
problem arises when respondents answer without thinking about the final
outcome. During analysis it creates a lot of work for the researchers, who
need to eliminate all the incorrect responses. Since a survey is about getting
the bigger picture of a particular issue, with incorrect responses researchers
fail to achieve it. Incorrect responses can be eliminated by making the survey
strictly voluntary and collecting responses only from respondents who are
willing to contribute [123].
P 22 Response Duplication: A major problem faced in open-web surveys is
response duplication, where the same respondent answers the questionnaire more
than once [81], [33], [48].
P 23 Generalizability: A survey's main aim is to generalize findings to a
larger population; generalizability increases confidence in the survey. A small
sample size is cited as the main cause for lack of generalizability. If
generalization is not possible, the whole aim of the survey is not achieved
[25], [122], [46], [125], [17], [41].
P 24 Evaluation Apprehension: Some people are not comfortable being
evaluated, which affects the outcome of any conducted study [121]. It is the
same with surveys: sometimes respondents are not in a position to answer all
the questions and instead shelter themselves by selecting safer options, which
affects the survey outcomes. Anonymity of subjects reduces this problem of
evaluation apprehension [46].
P 25 No Practical Usefulness: If a survey does not prove useful to the
respondents, they are likely to skip it. The authors of [117] show this clearly
in the following lines: "by far the study is interesting but to whom are the
results useful for?". This issue can be handled by motivating respondents with
a description of the survey outcomes and the need for answering the survey.
P 26 Respondent Reactivity: This problem is generally overlooked and can
only be identified if the researcher also thinks as a respondent. While
answering the questionnaire, respondents settle into an idea and answer
believing that their answers are right. This problem generally arises from the
question-order effect. It is a behavioral attribute of respondents and cannot
be fully mitigated, yet if unaddressed it drastically impacts the results.
Researchers can only ensure that it does not recur by taking care of the order
of the questions [48].
P 27 Obvious Conclusions: When the survey questionnaire is not clearly
understood, respondents arrive at wrong conclusions about the questions and,
as a result, answer incorrectly [117].
P 28 Reliability: Reliability issues mainly occur due to wrong or misleading
answers. For example, many respondents are not willing to admit what they truly
work on in their job positions, and some organizations face the threat of
evaluation apprehension [15]. Pilot surveys can reduce the reliability issues
in a survey [87].
P 29 Credibility: For the survey methodology to be accepted by everyone,
the results need to be clearly presented; if they are not, there is a chance
the study will not be considered. This internal validity threat can be
eliminated by using coding to categorize the responses obtained. Using more
codes for a single answer increases its categorization capacity [47].
P 30 Confidentiality Issues: In some cases software engineering researchers
would like to observe on-going trends in industry or study specific industrial
issues, but the software companies do not allow the respondents to take the
survey due to confidentiality. This problem was faced by one group of
researchers in their survey: "their companies wouldn't allow employees to take
this survey due to concerns about confidentiality" [56]. The threat can be
mitigated by sending personal emails rather than system-generated emails and
by following up with the respondents until the survey ends. If even this does
not handle the issue, it is better to arrange a personal meeting to discuss
the survey.
P 31 Bias: Bias, or one-sidedness, is a common problem during the survey
process. There are different types of bias that damage the survey outcomes.
When a study under-represents the theory involved, this is called
mono-operation bias [121]. It can be avoided by collecting data from multiple
sources, asking questions clearly and framing different questions to address
the same topic [46], [83].
process where one researcher works on the data extraction and the other
researcher reports the results [111].
Convenient Clustering: Due to the high number of problems obtained and the
inability to validate every problem using interviews, the authors conveniently
clustered all 36 problems. Finally, a checklist of 15 problems was obtained
for validation through interviews. Problem dependencies were first checked for
convenient clustering; the table showing the dependent and independent issues
is given in Appendix B.
can be recruited for a survey using GitHub, which helps to gather respondents
who are actively working in their respective fields. In this way the target
population can be identified and the expected number of responses obtained,
but the researcher must take care of the bias introduced by convenience
sampling [19], [48].
• Researchers can attract respondents by giving rewards such as Amazon points,
vouchers or gifts. They have to be careful about the responses obtained, since
respondents might answer the survey only for the sake of the rewards, or answer
it twice [19], [23].
• It is always better to know the preference of respondents before sharing the
questionnaire; some might prefer answering an online questionnaire while others
are reluctant to be interviewed face to face [56].
• Pretesting the questionnaire helps to identify its flaws [31]. Through
pre-trials it was identified that open-ended questions had to be replaced with
close-ended questions due to time limitation [56]; this shows the need for
piloting survey questionnaires.
• Web servers can be used in global surveys for data collection, as is evident
from "usage of MIT secure web server reduced the privacy concerns of
respondents during data collection" [56].
• A pre-qualification question in a survey helps to identify inconsistent
responses [23].
• Researchers should avoid two-point Likert scales (yes/no) unless it is a
critical situation; instead they are advised to use other multi-point
scales [15].
• Over-representation (the presence of respondents with a higher education
qualification than the average survey respondent) increases the validity of
the obtained responses [15].
• By default, data obtained from a survey in software engineering is biased.
The reason is that a survey just provides a snapshot of the existing tools,
techniques etc. that provides evidence for creating a grounded research plan.
Since only a snapshot can be obtained, survey results cannot be generalized to
any context; this could be one reason why surveys are believed to be
impractical [117].
• Web advertising can be used to improve survey responses, the reasons being
the existence of broadly distributed respondents and the potential to reach
them. It helps to increase the generalizability of survey outcomes [40].
• Researchers must be open-minded and willing to collaborate with other
researchers to discuss their prospects; this can largely prevent replication
of studies [15].
the need for the selection of a representative sample of the target population.
One must specify the objectives of selecting the target population clearly [26].
• Are the study objectives being addressed by the data analysis results?
2. Confidence Level: Even after careful selection there is a chance that a
bad sample is selected that does not represent the actual population.
Researchers are not always confident about their choices; the confidence
level tells them how confident they can be and whether the error tolerance
exceeds the precision specification. The confidence level can be obtained
through probability models such as the Standard Normal Distribution and
the Central Limit Theorem.
3. Population Size: The population size also has an impact on the outcomes,
particularly when the population is very small. Assuming that simple random
sampling is used for obtaining a sample, and that the sample size counts
only the number of obtained responses (not the number of requests for
questionnaire responses), there are two formulas for calculating the sample
size, depending on the population size.
• When the population is large:
n0 = (z^2 * p * q) / e^2
where n0 is the sample size, z is a point on the abscissa of the standard
normal curve that specifies the confidence level, p is an estimated
proportion of the attribute present in the population, q = 1 - p, and e is
the desired level of precision (e = 1 - precision).
• When the population is small:
When the population is small, the finite population correction factor (fpc)
can be employed to calculate the sample size. The fpc measures the extra
precision achieved when the sample size becomes closer to the population
size. Using the fpc, a revised sample size can be calculated:
nR = n0 / (1 + X), where X = (n0 - 1) / N
where nR is the revised sample size based on the fpc, N is the population
size, and n0 is the sample size calculated in the previous step for a large
population.
• When the population is very small (less than 200 individuals):
If the population size is very small, a researcher must conduct a census.
A census sample includes all the members of the population in the sample.
When the population is 200 or less, the whole population should be included
to achieve a considerable level of precision.
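The sample-size calculation discussed above (Cochran's formula for large populations, with the finite population correction for smaller ones) can be sketched as follows; the example parameter values (z, p, e, N) are illustrative, not taken from the thesis:

```python
def cochran_n0(z, p, e):
    """Sample size for a large population: n0 = z^2 * p * q / e^2, q = 1 - p."""
    return (z ** 2 * p * (1 - p)) / (e ** 2)

def revised_n(n0, N):
    """Revised sample size using the finite population correction:
    nR = n0 / (1 + X), where X = (n0 - 1) / N.
    For very small populations (N <= 200) a census is conducted instead."""
    if N <= 200:
        return N  # census: include the whole population
    return n0 / (1 + (n0 - 1) / N)

# 95% confidence (z = 1.96), maximum variability p = 0.5, precision e = 0.05:
n0 = cochran_n0(z=1.96, p=0.5, e=0.05)
print(round(n0))                      # 384
print(round(revised_n(n0, N=1000)))   # 278
```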
4.3.2 Impact of Response Rate on Survey Responses
Out of all 66 primary studies, only 38 reported their response rates, either by
explicitly specifying a response rate or by giving the number of invitations
sent and responses obtained. 14 studies discussed the responses obtained in
their particular study, but not the email invitations sent or the sample
selected. In the remaining 14 studies, researchers did not report the response
rates. The histogram in Figure 4.5 shows the distribution of response rates
across the primary studies. The available data cannot be generalized to a
whole population; we can only infer that the reporting of results in software
engineering needs to be improved. We cannot estimate an overall response rate,
but we observe that the way primary studies are reported needs to be improved
by software engineering researchers. Through the SLR we found that not all the
primary studies discussed the response rates they obtained. Every survey may
not have the same context, every survey might not lead to the expected
outcomes, and the population answering differs from survey to survey, so the
results obtained cannot be generalized to every survey. It must be clearly
understood that an increase in response rate does not mean that the
generalizability of results increases. However, a few recommendations given by
one primary study provide a starting point for calculating the impact of
response rates in a survey.
Smith et al. [107] addressed the issue of improving the participation of
developers in a survey. The authors studied the existing literature and
formulated a set of factors for improving the response rate of surveys. They
divided the factors into two subsections for clarity: the first concerns
persuasion research, where the authors drafted factors for improving
compliance, and the second category of factors is based on the authors'
experience of conducting surveys. Listed below are the factors that help to
improve the response rate:
Based on Persuasion Research
• Reciprocity: The situation where respondents answer a survey more than
once, which helps to increase the survey responses. Researchers can induce
reciprocity by giving rewards. Smith et al. [107] were not sure whether this
practice is actually useful in the software engineering domain, as researchers
may thereby bias their own results.
• Authority and Credibility: Compliance with any kind of survey can be
increased by the authority and credibility of the person administering it.
Researchers can exploit this by including official designations such as
Professor, Dr. or Asst. Professor in the signature of the survey request
mail. In this way the response rate of a survey can be increased.
• Liking: Respondents tend to answer surveys from people they know. The
responsibility of gaining the trust and liking of respondents lies with the
researchers. Liking can lead to an increased response rate.
• Social Benefit: The authors describe that more respondents finish a survey
if it benefits a large group instead of a particular community. Researchers
must convince the respondents that their survey benefits the larger population
by explaining the impact of the survey outcome, and thereby obtain a high
response rate.
• Timing: The time at which an email survey is sent also affects its response
rate. Sometimes respondents tend to clear their in-boxes and answer the survey
just for the sake of completion. Researchers should avoid sending survey
invitations during office hours on weekdays. They have to be very careful when
selecting the time to send the mail; one study shows that respondents tend to
answer emails right after their lunch [107].
When proper conditions are present, the number of responses obtained using
e-mail surveys is higher than with fax or postal surveys, and the quality of
responses obtained by mail is also higher compared to other sources [107].
In the operation phase after the survey is done, Kitchenham and Pfleeger
[67] describe that a researcher must follow the given sequence of techniques
before doing the standard analysis on any survey outcome.
From the coded data, data statistics and population statistics are possible;
for a sample obtained through non-probabilistic sampling, population statistics
cannot be generated [67]. Wohlin et al. [121] describe standard data analysis
as the first step of the quantitative interpretation phase, where the data are
represented using descriptive statistics visualizing central tendency,
dispersion, dependency etc. The next step is data set reduction, where invalid
data points are identified and excluded. Hypothesis testing is the third step,
where statistical evaluation of the data is done at a given level of
significance.
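The first two steps described by Wohlin et al. can be sketched as follows; the data set is a hypothetical 5-point Likert sample, invented purely for illustration:

```python
import statistics as st

# Hypothetical 5-point Likert responses; 30 is an invalid data point.
responses = [3, 4, 4, 5, 2, 4, 30, 3]

# Step 1: descriptive statistics -- central tendency and dispersion.
print(st.mean(responses), st.median(responses), round(st.stdev(responses), 2))

# Step 2: data set reduction -- identify and exclude invalid data points
# (values outside the 1..5 scale).
valid = [r for r in responses if 1 <= r <= 5]

# Step 3 (hypothesis testing) would follow, e.g. testing the reduced data
# against a reference value at a chosen significance level.
print(round(st.mean(valid), 2))  # 3.57
```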
Final Result: From the literature study we infer that response rate and
sample size do impact the survey outcomes[67], [107]. They go hand-in-hand in
the whole survey process. More responses can be obtained if the sample size is
increased but the inverse is not possible. Even if more responses are obtained,the
sample size wouldn’t be the only deciding factor for obtaining responses.Literature
shows response rate depends on persuasion research and personal experience
as well and sample size depends on precision, confidence level and population
size.Researchers look for both quality and quantity of responses. Even though
sampling is done by the researchers the response rate is totally out of researcher’s
control. Pertaining to analysis techniques analysis techniques is selected based
on way in which survey is designed and type of scale used.This statement given
by Kitchenham and Pfleeger [67] validates our research outcomes “specific data
analysis a researcher needs depends on the survey design and scale type (nominal,
ordinal, interval, ratio, etc.)”. From the above discussion it can be seen that all
the three variables depends on derivable entities, opinions and design of survey;
Chapter 4. Results and Analysis 57
these are specific to a particular survey and cannot be generalized to all surveys.
Thus, the impact of the variables can only be computed for a particular survey.
This sums up our result that the impact of variables like sample size, response rate
and analysis techniques varies from one survey to another and cannot
be computed and generalized.
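For illustration, the dependence of sample size on precision, confidence level and population size can be made concrete with Cochran's formula plus a finite population correction. The sketch below is our own addition, not a formula taken from the reviewed primary studies:

```python
import math

def required_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's sample size with finite population correction.

    z      - z-score for the confidence level (1.96 ~ 95%)
    margin - desired precision (margin of error)
    p      - expected proportion (0.5 is the most conservative choice)
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population estimate
    n = n0 / (1 + (n0 - 1) / population)        # correct for finite population
    return math.ceil(n)

# e.g. a population of 1000 practitioners at 95% confidence, 5% margin
print(required_sample_size(1000))  # -> 278
```

The formula makes the trade-offs in the text visible: tightening the margin or raising the confidence level grows the required sample, while a small population shrinks it through the correction term.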
the transcripts prepared from all interviews. As explained above, our transcripts
were prepared immediately after the interviews. We made field
notes during each and every interview to make sure that the interviewees'
exact viewpoints and their suggestions about our research were penned
down during the interview itself. We collected all this information
and documented it as part of the data extraction process. We went
through all the interview transcripts several times in order to familiarize ourselves
with the information we had extracted from our interviews,
both verbally and non-verbally. We made sure that we had a clear idea of
all the information which we had extracted [22].
• Translation of Codes into themes: After all the data was coded, we
generated several codes. All the codes were translated into themes
according to the information they contained. Our main aim in translating the coded
information into themes was to gather all similar information under one theme. This
also helped us in analyzing the information which we collected.
All the themes shown in Figure 4.7 above have been listed along
with their codes in Appendix G.
P1. Sampling Problems: All the interviewees have one thing in common:
they strongly believe that everyone who claims to use random or stratified
sampling has actually done convenience sampling. The reason for this is the infeasibility
of getting a proper representative sample of the population. Using random
sampling, everyone from the same domain cannot be included for analysis. Also,
the respondents selected using a random sample lack motivation, as they might
not know what the survey is being done for, or they might misinterpret the
survey; this way, again, noise is introduced into the survey results. Some
said that random convenience is a better option. Stratified sampling is believed to
be the hardest, most expensive and most time consuming, the reason being the difficulty
of getting a proper "strata" from the given population. Due to the self-selection process
it follows, all of them recommended the usage of convenience snowballing.
In convenience snowballing the population characteristics are known beforehand,
and researchers select respondents based on their choice. The questionnaire is then filled in
and the respondents are asked to forward it to their peers. This way quality
responses are obtained. Sub-grouping the population for the sake of sampling
would then reduce the chances of getting more responses, and might also induce bias.
P2. Questionnaire Problems: Properly designing a questionnaire requires
considerable background work. The problem of poor questionnaire design can be
mitigated but cannot be completely eliminated. Question formulation must be
done with great care. A question must be understandable. Direct questions, consistency
questions, demographic, non-contradictory, non-overlapping, non-repeated, independent
questions must be asked to obtain vital information. Cross analysis must
be done to check respondents' commitment. A question must have valid options.
With pilot sessions the questions can be reformulated to align with the objectives.
P3. Questionnaire Length: A questionnaire should be short and precise.
It must strike a balance between time and number of questions. Interruptions
might occur while answering the questionnaire; researchers should expect this
while designing a survey.
P4. Open-ended & close-ended questions: A survey should have both
open-ended and close-ended questions. Close-ended questions save time and are easy to
analyze, but open-ended questions give deeper insights into the study. Only committed
respondents answer open-ended questions, which is why they are said to increase the confidence
of a researcher. By using proper constructs the respondents can even
be categorized. The number of such questions depends on the information need and on the
research questions. Using free text boxes along with close-ended questions was
recommended by many researchers. When formulated with a common understanding,
the results obtained from both kinds of questions help to achieve efficient analysis
data.
P5. Question-order effect: This effect should be addressed in a survey. Randomizing
the questions won't work in software engineering because logical adherence
might be lost. If the questions are self-contained then it can be done,
whereas otherwise a respondent might lose the context of the question. There is a logical
order for asking questions, which also helps us to categorize the data. Branching can
be done for the questionnaire, after which randomization can be applied. A survey
tool helps a lot in this regard. Randomization is possible if there are sub-groups of
questions, and can also be done for a few background questions.
P6. Likert Scale: It depends on the researcher whether to use them or not. They
can be used to visualize the results. Improper usage of a Likert scale confuses the
respondents, who might go for the neutral point on an odd scale. When an even
scale is used, the researcher is forcing the respondent onto one side. Check-boxes
should be used after every question. A pre-survey should be done and the results of
at least 5 survey responses must be visualized during analysis. Options must be mutually exclusive.
Odd scales can be used over even ones, so that there is a neutral point every time. A 5-point
scale can be used as it is more established in ICT. Researchers must have an idea
of the potential weaknesses of using scales. Culture and country must also be
considered during scale selection.
P7. Sensitive questions: These kinds of questions should generally be avoided;
if asked, they should be placed at the end. Respondents give answers based on the
level of abstraction in the question context. Respondents expect to be anonymous
when answering such surveys. While doing a global survey the Likert scale must
be used consciously (due to cultural barriers). Respondents check the credibility of the
source while answering these questions. No personal questions should be asked of
the respondents.
P8. Time Limitation: Researchers claim that having deadlines helps them to
get more responses. The average completion time should be mentioned. Don't use countdown
timers for each question, and don't close a survey just because you're done with the process.
Many researchers claim that a small survey of 5 minutes is not a big issue;
at 10 minutes respondents would still answer even though it is pushing
them; at 15 minutes it has to be a genuinely good survey. 20-minute surveys mainly
fail due to lack of responses.
P9. Domain Knowledge: This problem cannot be eliminated completely.
It is not a problem with mail-based surveys, where the respondent's professional
background is known; the main problem is with open web surveys, where the
survey is answered by many unknown individuals. A researcher must not
have any preconceptions or expectations at first. This can be mitigated by
targeting a specific population, motivating the respondents to be truthful about
their expertise, using demographic questions, and doing cross questioning. There must
not be any loaded questions, because respondents might become clueless. Remove
existing outliers in the survey. Simply putting options like "I don't know" or "I
don't want to answer" will encourage respondents to be truthful, and it also helps
to rule out inconsistent responses. One of the interviewees suggested performing
face-to-face interviews for further validation.
P10. Context: There are situations where respondents misunderstand the
question context; this is a common problem in every survey and cannot be
neglected. Simple things like maintaining questionnaire clarity, using an appropriate
data collection method, inculcating trust in respondents, and sticking to a common
methodology can help the researcher deal with this problem. The researcher must
be approachable if there are any doubts to be cleared. Piloting with 5 to 10
people helps to design the questionnaire clearly. Trust plays an important role in this
context. Internal research validation should be employed first, followed
by piloting.
P11. Hypothesis Guessing: This is not a problem in the case of exploratory
surveys. Hypothesis guessing can be eliminated by not asking loaded questions,
not prompting the respondents, having more options for a question rather
than one, and selecting potential candidates. The use of indirect and
direct questions plays a major role in testing whether the respondent is on the
right track or not. Respondents should not be influenced; instead they should be
motivated to be truthful on their part. The researchers must not reveal their stake to the
respondents, to avoid hypothesis guessing.
P12. Language Issues: This is one of the major problems when conducting
global surveys. It cannot be completely eliminated but can be mitigated: keeping
the language as plain as possible, using local translations, checking understandability, consulting
senior researchers, and contacting researchers in the same domain from the same origin.
If a survey has to be rolled out, it must be done through a proper channel so that
it reaches the target population. A survey questionnaire in two different languages
would be of great help for interpretation. In such cases the researcher might get radical
answers. Google Translate must not be used for language translations.
P13. Cultural Issues: This happens in the case of global surveys. It is sometimes
over-hyped or used as an excuse. In software engineering this is a problem
of varied cultures. Pre-surveys can give information about what to expect
from a survey, as well as information about any overlooked flaws. Using proper
nomenclature, sticking to a common base, avoiding sensitive questions, being
clear and concise with the questionnaire, doing meta-analysis and identifying causes
will help researchers handle cultural issues. Whenever possible,
researchers should use face-to-face interviews to gain the trust of the respondents and
get better insights. Researchers must also keep individual differences in mind.
P14. Generalizability: Ideally, generalizability isn't achievable. The main
reason is that researchers cannot explicitly define and target the population.
To make generalization happen, they should instead anonymize the results.
In one way researchers try to generalize the results; in another way they are unknowingly
introducing bias as well. Demographic questions help to easily categorize the
obtained data. A proper analysis method and reporting help the researcher generalize
the results under some constraints. Applicability and survey value diminish
over time, since a survey is just a snapshot of a particular situation.
P15. Reliability: In order to ensure reliability, researchers must check
whether the respondents are really committed to the survey or not. Demographic,
consistency and redundant questions can help to find the reliable responses
in a survey. It is important to rule out people with a hidden agenda, or else
they will spoil the survey outcomes. Before removing any outcome, there must
be proper evidence explaining the impact of its presence. The reliability
of results might vary from one situation to another.
P16. Bias: It is very difficult to prevent. It happens due to process repetition.
No survey process is complete without addressing bias. The researcher needs to
identify the bias and report it correctly in the study. The researcher must know how
to balance respondents and reliability. Bias mainly happens due to irrelevant
answers; these must be removed. Researchers must be open to questions and
shouldn't mask their mistakes. Repeating the process and investigating the same
thing again will reduce bias and reveal more data patterns. Researchers must know
their respondents. Any kind of bias must be admitted. Bias varies from one
network to another.
P17. Low Response Rate: The response rate can be increased if the questionnaire
has no contradictory questions and no overlapping questions. There must be personal
involvement by the researcher. Respondents mustn't be baited;
instead they have to be motivated on why they should take the survey and the
benefits they derive from it. Anonymity must be preserved. Convenience snowballing
can fetch an additional number of responses if extended to LinkedIn and the most
visited blogs and forums. Posting and re-posting the survey link in such social
networks will keep it on top and help obtain diversified responses,
though there is a major risk to reliability and generalizability. Attending conferences
related to the survey domain can also increase the response rate. Respondents
must be selected with great care. Survey time and questionnaire length must be
specified beforehand. Questions must be framed in such a way that the feeling of
being assessed is masked for the respondents. Since incorrect responses are obtained
in every survey, researchers need to properly identify them and motivate
the reason behind the exclusion of such responses.
P18. Inconsistency: Inconsistency can be identified by including mandatory
questions, redundant questions and optional questions. Many researchers take this as an
excuse for exclusion. Barriers between questions must be included and triangulation
between questions must be done. Cookies can be tracked for each question.
The researcher must honestly report the inconsistencies in the document.
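One way redundant questions expose inconsistency is by pairing a question with a reverse-coded copy and checking that the two answers agree. The sketch below is our own illustration of that idea; the question ids, the 5-point scale and the one-point tolerance are all assumptions made for the example.

```python
def find_inconsistent(responses, pair=("q3", "q7"), scale_max=5, tolerance=1):
    """Flag respondents whose answer to q7 (assumed to be a reverse-coded
    redundant copy of q3) disagrees with their answer to q3.

    On a 1..scale_max scale a consistent pair satisfies
    answer(q3) + answer(q7) == scale_max + 1, within a tolerance.
    """
    expected = scale_max + 1
    flagged = []
    for respondent, answers in responses.items():
        a, b = answers[pair[0]], answers[pair[1]]
        if abs((a + b) - expected) > tolerance:
            flagged.append(respondent)
    return flagged

# respondent "r2" agreed strongly with both a statement and its reversal
data = {"r1": {"q3": 4, "q7": 2}, "r2": {"q3": 5, "q7": 5}}
print(find_inconsistent(data))  # -> ['r2']
```

Flagged respondents would then be examined (not automatically excluded), in line with the recommendation above to report inconsistencies honestly rather than use them as an excuse for exclusion.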
P19. Response Duplication: It is not feasible to know for certain. It is not a problem in
paper-based surveys. It cannot be eliminated, but it can be identified and handled:
IP addresses can be crosschecked, a consistency check can be made throughout the
survey, one-time links can be sent directly to respondents' mail, and survey tools can monitor
duplication. The researcher should specify the requirements and the
need for genuine support upfront. Tracking session cookies while respondents answer the
survey gives information about how many times the respondent paused and
resumed while answering. Course analysis can be done for additional help.
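The IP crosschecking and answer-fingerprint consistency checks mentioned above can be sketched as follows. The field names and the fingerprinting choice are our assumptions for illustration; real survey tools implement such checks internally.

```python
from collections import Counter

def flag_possible_duplicates(responses):
    """Return responses that share an IP address or an identical
    answer fingerprint with another response."""
    ip_counts = Counter(r["ip"] for r in responses)
    fingerprint_counts = Counter(tuple(sorted(r["answers"].items()))
                                 for r in responses)
    return [r for r in responses
            if ip_counts[r["ip"]] > 1
            or fingerprint_counts[tuple(sorted(r["answers"].items()))] > 1]

data = [
    {"id": 1, "ip": "10.0.0.1", "answers": {"q1": 3, "q2": 5}},
    {"id": 2, "ip": "10.0.0.1", "answers": {"q1": 2, "q2": 4}},  # same IP as 1
    {"id": 3, "ip": "10.0.0.9", "answers": {"q1": 1, "q2": 1}},
]
print([r["id"] for r in flag_possible_duplicates(data)])  # -> [1, 2]
```

A shared IP only indicates *possible* duplication (e.g. colleagues behind one corporate gateway), which is why the interviewees treat this as something to handle, not eliminate.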
P20. Rewards: Never give rewards; doing so itself shows a researcher is biasing
his/her research. If rewards are given, they should be given at the end, not
upfront; that way only committed responses are taken. Motivate the
respondents by telling them the outcome of your research and encourage them to take
up your survey. If rewards are to be given, then hand them over to each respondent
personally; this might reduce the number of people taking surveys just for the sake of
rewards.
• Reproduce knowledge.
• Pre-qualification question.
Chapter 5
Discussions and Limitations
5.1 Discussions
This section answers the research questions by comparing the results
obtained from the SLR and the interviews. We include our critical assessments and reflections
on our research in this section. These discussions provide the basis for the
results reported in the Results chapter. First, the comparison of results from
both studies is described for each research question, followed by the research
findings.
By conducting the systematic literature review we obtained the list of common
problems faced by researchers in the survey process, including the effect of variables
on survey outcomes. We listed all the problems with their
descriptions in the SLR results section. We then validated a few problems by conducting
face-to-face interviews with software engineering researchers. Constraints
like the 1-hour interview time and the busy schedules of researchers limited us to including
only a few problems in the interview process. The problems validated using
interviews were conveniently clustered, and the clustering table is given in
Appendix B. The problems validated using interviews were finalized after
brainstorming sessions between the two researchers and discussions with the supervisor.
RQ1: What are the common problems faced by software engineering researchers
while conducting a survey?
P 1 Sampling Method: In the literature study many authors described
the usage of convenience sampling methods; lower cost and less troublesome implementation
were the two reasons that motivated them to do so. This was validated by the
interviewed researchers, who confirmed their usage of convenience sampling in their research.
Some also discussed their usage of convenience snowballing and random convenience
sampling in their research. The researchers argued that in many published
studies the authors write that they used random sampling when in an actual sense they
used convenience sampling; the reason is that the authors contact peers to answer
their surveys but report the usage of random sampling. Further, they
cautioned about biasing the results due to convenience sampling.
P 2 Survey Questionnaire Design: In the literature study authors reported
tions was another recommendation. Including the options “I don’t know” and “I
don’t want to answer” in questionnaire was another recommendation.
P 7 Hypothesis Guessing: In the literature, authors tried to mitigate this
issue by stressing respondents' honesty in the survey's introduction and posting a
video for clarity. Apart from these two, the interviewees
mentioned using proper sample selection and not asking loaded questions. In both the SLR
and the interviews, hypothesis guessing was categorized as a major problem.
P 8 Translation Issues: Both in theory and practice, researchers used
translators of the same origin for translating their surveys. Additionally, interviewees
recommended drafting the survey in different languages by involving researchers of
the particular country working in the same domain. They cautioned against the use of Google
Translate, as the whole meaning of the questionnaire might change.
P 9 Culture Issues: In the literature this was defined as a problem and only one
mitigation strategy was discussed. In the interviews some researchers
didn't consider it a major issue, while others suggested the usage of a common
base; avoiding sensitive and contradictory questions; and sticking to a basic
language. Meta-analysis and identifying the cause of the cultural issue was done by
two researchers in their research when facing this issue. Doing face-to-face
interviews when facing cultural issues was one of the recommendations given.
P 10 Response Duplication: This is viewed as a major problem in every
software engineering survey. Tracking IP addresses was the mitigation strategy published
in the literature. Interviewees described response duplication as a
problem of open web surveys; they suggested that tracking IP addresses, consistency
checks, one-time links, and using survey tools minimize its effect. They said it
cannot be eliminated, only handled to some extent.
P 11 Inconsistency of Responses: Cross questioning was the strategy
used by the authors in the literature to handle inconsistency of responses. Interviewees
argued that inconsistency shouldn't be used as an excuse to exclude
respondents. Triangulation in questioning, identification and honest reporting
were suggested as recommendations.
P 12 Bias: Mono-operation bias, over-estimation bias and social desirability
bias were the three types of bias identified from the literature study; data collection
from multiple sources was the only mitigation strategy suggested, for over-estimation
bias. We could not find a mitigation strategy for mono-operation
bias in our SLR study. When interviewees were asked about the biasing problem in
surveys, they said it cannot be completely eliminated but only handled to
some extent. They stated that bias must be identified and reported clearly.
In their opinion, giving rewards biases the whole research process, and a piloting process
can also reduce bias in survey results.
P 13 Reliability: Every survey has the problem of reliable responses, which
might be due to wrong answers given in the survey. The literature showed us that the
reason for this might be respondents trying to be positive. Pilot studies were
used by authors to handle the issue of reliability. Interviewees highlighted the
• No sub-grouping.

2. Poor Questionnaire Design:
• Use consistent, demographic, non-contradictory, non-overlapping, non-repeated, combinatorial questions.
• Do a pre-survey.

5. Time Limitation:
• Specify average time in the survey introduction.
• No loaded questions.

8. Language Issues:
• Native language translators.
• Do meta-analysis.
• Piloting.
• Admit bias.

13. Inconsistency of Responses:
• Do cross questioning.
• Barriers between questions.
• Motivate respondents to be truthful.
• Do consistency checks.
• Preserve anonymity.
RQ2: How do the variables like sample size, response rate and analysis meth-
ods affect the survey outcomes?
motivated us to openly report that we could not achieve the required
outcomes for RQ2. Based on the selected primary studies we reported how the
sample size is selected, the number of studies properly reporting response rates,
and the list of all analysis techniques chosen by primary studies along with the
most frequently chosen analysis technique. Poor reporting, which was found to be a major
issue in software engineering, was also validated from the findings of Cater-Steel et
al. [15]. In this way we addressed the second research question in our research.
RQ3. What are the experiences gained by software engineering researchers
by conducting surveys?
This research question was framed with the motivation of validating the two
previous research questions. Through the interview process we validated those
two research questions. We also collected the experiences of software engineering
researchers. During the interviews we asked them about the need for a checklist
of possible problems and mitigation strategies. The ratings below show the need for a
problems checklist for conducting surveys in software engineering.
Interviewee No   1  2  3  4  5  6  7  8
Rating given     5  3  4  5  5  3  6  5
In the above table, rating values range from 1 (not needed) to 5 (very important).
The first article is a study of nine surveys that were classified according to their
use. Almost all the surveys included demographics. Five-point
Likert scales were used by the authors. Leadership, policies and procedures,
staffing, communication and reporting were the common dimensions which the
authors used to review the surveys. The surveys used different analysis techniques:
item analysis, exploratory factor analysis, confirmatory factor analysis, Cronbach's
alpha analysis, test/retest reliability, composite score combination and variance
analysis were a few of the analysis techniques highlighted in this
article [20].
Our research can be made applicable to the above context, since the authors
specified the usage of Likert scales, analysis techniques, reporting issues, etc. We
tried to obtain a suitable way of addressing Likert scale problems, highlighted the most used
analysis technique, and also found that researchers are lacking in
reporting their results.
The second article is a review of 62 surveys from 12 countries that included
almost 23,000 prisoners. Prevalence was calculated to find a pattern and link
all the data points in the selected population. Chi-squared tests were used by the
authors to compute prevalence. A weighted mean was considered during calculation
of the baseline for prevalence. Demographics were given higher priority while
categorizing the results. A large survey was carried out which consisted of such
demographic questions. Simple random sampling was the sampling strategy chosen
by the authors for carrying out their research [34].
Our research would help the authors of the above article by giving suggestions for
asking demographic questions and an idea of the possible problems due to improper
sampling strategy selection. The authors would benefit from using our research to get
better outcomes.
The third article examines the prevalence and patterns of using Chinese medicine
in treating cancer patients. The authors studied nearly 99 studies after going through
411 studies. The authors considered demographics like year, country, group setting,
data collection methods, questionnaire types, sampling methods, etc. as the basic
characteristics to categorize the studies [14].
As with the first article considered, our study would help the authors of the third
article to better understand sampling techniques, questionnaire design
patterns, selection of analysis techniques, etc.
This was just an attempt to show how our study could be applied to
surveys in other domains. Studies focusing on surveys from the software engineering
domain and the social sciences domain must be conducted to fill the gap that
differentiates the various fields. Such work in turn helps researchers
who are publishing work on surveys to generalize their findings. Since this kind
of research is out of our scope, it is mentioned in the future work section of this
document.
Neglecting paid articles and only considering freely accessible articles is a
threat, since all the literature might not be covered. This threat couldn't be fully mitigated,
but we tried to check other databases, search on Google and check the authors'
websites to get access to paid articles that seemed highly relevant to our
study.
Another internal validity threat, due to poor instrumentation, was handled
by maintaining proper data extraction forms. One data extraction form was
maintained by each researcher: initially data was extracted from a set of articles
by one researcher, then data extraction was done by the second researcher on the same
set. Both forms were then compared, and further discussions were held on the
inclusion and exclusion criteria. Inspections by the supervisor also ensured the
quality of our instrumentation.
Improper clustering could be another possible threat, since both authors were
novice researchers. This threat was mitigated by consulting the supervisor. Including
the clustering table in Appendix B helps future readers to understand our
grouping and analyze the document accordingly.
The threat to instrumentation in interviews concerns the questionnaire
design. Before designing the questionnaire, the objectives of conducting an interview
were clearly defined. After brainstorming and discussions with the supervisor,
the questionnaire was designed. It was made sure that the questionnaire was
understood by the interviewees; if not, they could ask, and the context of
the question was clearly explained. Sound recordings were used to collect data
during interviews, and software was used to transcribe the data to avoid losses.
The researchers have considered and validated only the problems found
in the obtained primary studies. Due to the exclusion criteria there is a
chance that we might have excluded a few studies with problems. The mitigation
strategy employed was listing the titles of all 66 primary studies considered.
In this way a researcher will have an idea of what kinds of problems were addressed
during this research.
External validity: The conditions that limit a researcher's ability to generalize
the experimental findings are called external validity [121]. In our case we
conducted interviews to validate our SLR results, so there could be an issue
of generalizability: through interviews we can only get experiences from a selective
sample population. Rather than full generalizability of results, we tried to
make them more applicable. To make this happen we selected researchers
from a broad experience spectrum; our sample included novice researchers (PhD
students with 3-4 years of experience), competent researchers (8-10 years of
experience) and well-experienced researchers (30 years of experience).
However, we managed to interview 8 software engineering researchers, which would
be a sufficient number for generalizing our results.
Construct validity: This threat deals with the relationship between
observation and theory; in other words, it is the extent to which the
experimental setting is actually reflected in a study [121]. This threat can be of
relevance to the research design. In the SLR process, the search string used might
not cover all the research articles published in the domain. We minimized this
threat by designing the search string in discussion with the librarian and supervisor.
A thesaurus was used to search for synonyms. After the search string was designed, it
was used only after our supervisor had tested it.
Assigning codes: While coding the interview data, chances are that we might
have wrongly interpreted and coded it. To mitigate this threat, the data after
coding was crosschecked against the actual descriptions from the interviews. The initial SLR
study helped us gain deeper insights into surveys and helped us mitigate this
issue.
Conclusion validity: Sometimes a researcher cannot draw conclusions from
the existing data; these kinds of issues come under conclusion validity [121]. Conclusion
validity threats can arise when synthesizing the data obtained from the SLR and
interviews. Research objectives were drafted initially,
both researchers had a clear idea of the expected outcomes, and constant revision
of the obtained data and meetings with the supervisor helped us mitigate
this threat and finally achieve the expected outcomes.
Scope: Owing to the complexity of the initially selected problem domain, there was
a risk that the thesis work might be misunderstood. To mitigate this
risk, the scope of our thesis project was adjusted with the assistance of the thesis
supervisor, also considering the research contribution. Adjusting
the scope was based on consent between the supervisor and the authors. The
final scope in which the research was carried out has been specified in Chapter 1 of
this report to avoid misunderstanding.
Chapter 6
Conclusions and Future Work
6.1 Conclusions
In this research the authors identified the common problems faced by software
engineering researchers while conducting surveys. Initially a systematic literature
review was conducted to identify common problems faced by software engineering
researchers in the survey process. Then face-to-face interviews were conducted to
gather the opinions of software engineering researchers. The research findings
from both the SLR and the interviews were then compared, and a checklist of problems
along with mitigation strategies was drafted.
Some problems identified by the systematic literature review had mitigation strategies,
while others were only specified as problems in the primary studies. A few problems
were shortlisted, and the interviewees (software engineering researchers) were
asked about their ways of mitigating such problems. The interviewees' opinions,
upon analysis, showed that along with the mitigations found in the SLR, other
strategies were also being used. Their opinions were color-coded and presented
as analysis.
Along with the problems, the aim of our study was to investigate the impact
of sample size, response rate and analysis techniques on survey outcomes.
Through the literature review it was found that the selection of such variables varies
from survey to survey. These findings were supported by the interviewees' arguments
about the variables' dependence on the type of research, researcher
choice, research questions, scales used, etc. This variable dependency
was the major limitation of our study and prevented us from continuing
our research further.
Research Contribution:
Bibliography
[12] Virginia Braun and Victoria Clarke. Using thematic analysis in psychology.
Qualitative research in psychology, 3(2):77–101, 2006.
[13] Ricardo Britto, Emilia Mendes, and Jürgen Börstler. An empirical inves-
tigation on effort estimation in agile global software development. In 2015
IEEE 10th International Conference on Global Software Engineering, pages
38–45. IEEE, 2015.
[14] Bridget Carmady and Caroline A. Smith. Use of Chinese medicine by cancer
patients: a review of surveys. Chinese medicine, 6(1):1, 2011.
[15] Aileen Cater-Steel, Mark Toleman, and Terry Rout. Addressing the chal-
lenges of replications of surveys in software engineering research. In 2005
International Symposium on Empirical Software Engineering, 2005., pages
10–pp. IEEE, 2005.
[17] Yguaratã Cerqueira Cavalcanti, Paulo Anselmo da Mota Silveira Neto, Ivan
do Carmo Machado, Eduardo Santana de Almeida, and Silvio Romero
de Lemos Meira. Towards understanding software change request assign-
ment: a survey with practitioners. In Proceedings of the 17th International
Conference on Evaluation and Assessment in Software Engineering, pages
195–206. ACM, 2013.
[18] Marcus Ciolkowski, Oliver Laitenberger, Sira Vegas, and Stefan Biffl. Prac-
tical experiences in the design and conduct of surveys in empirical soft-
ware engineering. In Empirical methods and studies in software engineering,
pages 104–128. Springer, 2003.
[19] Jürgen Cito, Philipp Leitner, Thomas Fritz, and Harald C. Gall. The mak-
ing of cloud applications: An empirical study on software development for
the cloud. In Proceedings of the 2015 10th Joint Meeting on Foundations
of Software Engineering, pages 393–403. ACM, 2015.
[21] Reidar Conradi, Jingyue Li, Odd Petter N. Slyngstad, Vigdis By Kamp-
enes, Christian Bunse, Maurizio Morisio, and Marco Torchiano. Reflections
on conducting an international survey of Software Engineering. In 2005
International Symposium on Empirical Software Engineering, 2005., pages
10–pp. IEEE, 2005.
[22] Daniela S. Cruzes and Tore Dyba. Recommended steps for thematic synthe-
sis in software engineering. In 2011 International Symposium on Empirical
Software Engineering and Measurement, pages 275–284. IEEE, 2011.
[23] Ermira Daka and Gordon Fraser. A survey on unit testing practices and
problems. In 2014 IEEE 25th International Symposium on Software Relia-
bility Engineering, pages 201–211. IEEE, 2014.
[26] Rafael Maiani de Mello, Pedro Correa da Silva, and Guilherme Horta
Travassos. Sampling improvement in software engineering surveys. In Pro-
ceedings of the 8th ACM/IEEE International Symposium on Empirical Soft-
ware Engineering and Measurement, page 13. ACM, 2014.
[27] Rafael Maiani de Mello, Pedro Corrêa Da Silva, and Guilherme Horta
Travassos. Investigating probabilistic sampling approaches for large-scale
surveys in software engineering. Journal of Software Engineering Research
and Development, 3(1):1, 2015.
[28] Tore Dyba. An empirical investigation of the key factors for success in soft-
ware process improvement. IEEE Transactions on Software Engineering,
31(5):410–424, 2005.
[29] Evgenia Egorova, Marco Torchiano, and Maurizio Morisio. Evaluating the
perceived effect of software engineering practices in the Italian industry.
In International Conference on Software Process, pages 100–111. Springer,
2009.
[31] Asim El Sheikh and Haroon Tarawneh. A survey of web engineering prac-
tice in small Jordanian web development firms. In The 6th Joint Meeting on
BIBLIOGRAPHY 83
[43] Kiely Gaye, Finnegan Patrick, and Butler Tom. MANAGING GLOBAL
VIRTUAL TEAMS: AN EXPLORATION OF OPERATION AND PER-
FORMANCE. 2014.
[44] Yaser Ghanam, Frank Maurer, and Pekka Abrahamsson. Making the leap
to a software platform strategy: Issues and challenges. Information and
Software Technology, 54(9):968–984, 2012.
[45] Lisa M. Given. The Sage encyclopedia of qualitative research methods. Sage
Publications, 2008.
[46] Tony Gorschek, Ewan Tempero, and Lefteris Angelis. A large-scale em-
pirical study of practitioners’ use of object-oriented concepts. In 2010
ACM/IEEE 32nd International Conference on Software Engineering, vol-
ume 1, pages 115–124. IEEE, 2010.
[49] Michaela Greiler, Arie van Deursen, and Margaret-Anne Storey. Test con-
fessions: a study of testing practices for plug-in systems. In 2012 34th
International Conference on Software Engineering (ICSE), pages 244–254.
IEEE, 2012.
[50] Heather Haeger, Amber D. Lambert, Jillian Kinzie, and James Gieser. Us-
ing cognitive interviews to improve survey instruments. Association for
Institutional Research, New Orleans, 2012.
[53] Irum Inayat and Siti Salwah Salim. A framework to study requirements-
driven collaboration among agile teams: Findings from two case studies.
Computers in Human Behavior, 51:1367–1379, 2015.
BIBLIOGRAPHY 85
[54] Martin Ivarsson and Tony Gorschek. A method for evaluating rigor and
industrial relevance of technology evaluations. Empirical Software Engi-
neering, 16(3):365–395, 2011.
[55] Samireh Jalali and Claes Wohlin. Systematic literature studies: database
searches vs. backward snowballing. In Proceedings of the ACM-IEEE inter-
national symposium on Empirical software engineering and measurement,
pages 29–38. ACM, 2012.
[56] Junzhong Ji, Jingyue Li, Reidar Conradi, Chunnian Liu, Jianqiang Ma, and
Weibing Chen. Some lessons learned in conducting software engineering
surveys in china. In Proceedings of the Second ACM-IEEE international
symposium on Empirical software engineering and measurement, pages 168–
177. ACM, 2008.
[57] James Jiang and Gary Klein. Software development risks to project effec-
tiveness. Journal of Systems and Software, 52(1):3–10, 2000.
[58] F. Joseph Jr. Hair Jr, William C. Black, Barry J. Babin, and Rolph E.
Anderson, Multivariate data analysis. Prentice Hall, 2009.
[60] Katja Karhu, Ossi Taipale, and Kari Smolander. Investigating the relation-
ship between schedules and knowledge transfer in software testing. Infor-
mation and Software Technology, 51(3):663–677, 2009.
[61] Mark Kasunic. Designing an effective survey. Technical report, DTIC Doc-
ument, 2005.
[62] Daljit Kaur and Parminder Kaur. Software Development Life Cycle Secu-
rity Issues. In 2ND INTERNATIONAL CONFERENCE ON METHODS
AND MODELS IN SCIENCE AND TECHNOLOGY (ICM2ST-11), vol-
ume 1414, pages 237–239. AIP Publishing, 2011.
[66] Barbara Kitchenham, O. Pearl Brereton, David Budgen, Mark Turner, John
Bailey, and Stephen Linkman. Systematic literature reviews in software
engineering–a systematic literature review. Information and software tech-
nology, 51(1):7–15, 2009.
[67] Barbara Kitchenham and Shari Lawrence Pfleeger. Principles of survey
research: part 5: populations and samples. ACM SIGSOFT Software En-
gineering Notes, 27(5):17–20, 2002.
[68] Barbara Kitchenham and Shari Lawrence Pfleeger. Principles of survey
research part 6: data analysis. ACM SIGSOFT Software Engineering Notes,
28(2):24–27, 2003.
[69] Barbara A. Kitchenham and Shari L. Pfleeger. Personal opinion surveys. In
Guide to Advanced Empirical Software Engineering, pages 63–92. Springer,
2008.
[70] Barbara A. Kitchenham, Shari Lawrence Pfleeger, Lesley M. Pickard, Pe-
ter W. Jones, David C. Hoaglin, Khaled El Emam, and Jarrett Rosenberg.
Preliminary guidelines for empirical research in software engineering. IEEE
Transactions on software engineering, 28(8):721–734, 2002.
[71] Charles W. Knisely and Karin I. Knisely. Engineering Communication.
Cengage Learning, 2014.
[72] S. Arun Kumar and Arun Kumar Thangavelu. Factors affecting the out-
come of Global Software Development projects: An empirical study. In
Computer Communication and Informatics (ICCCI), 2013 International
Conference on, pages 1–10. IEEE, 2013.
[73] Anne Lacey and Donna Luff. Qualitative data analysis. Trent focus
Sheffield, 2001.
[74] Effie Lai-Chong Law and Marta Kristín Lárusdóttir. Whose experience do
we care about? Analysis of the fitness of scrum and kanban to user expe-
rience. International Journal of Human-Computer Interaction, 31(9):584–
602, 2015.
[75] Lucas Layman, Forrest Shull, Paul Componation, Sue O’Brien, Dawn Saba-
dos, Anne Carrigy, and Richard Turner. A methodology for mapping system
engineering challenges to recommended approaches. In Systems Conference,
2010 4th Annual IEEE, pages 294–299. IEEE, 2010.
[76] Jingyue Li, Finn Olav Bjørnson, Reidar Conradi, and Vigdis B. Kamp-
enes. An empirical study of variations in COTS-based software development
processes in the Norwegian IT industry. Empirical Software Engineering,
11(3):433–461, 2006.
BIBLIOGRAPHY 87
[78] Johan Linaker, Sardar Muhammad Sulaman, Martin Höst, and Rafael Ma-
iani de Mello. Guidelines for Conducting Surveys in Software Engineering
v. 1.1. 2015.
[80] Mark S. Litwin. How to measure survey reliability and validity, volume 7.
Sage Publications, 1995.
[81] Garm Lucassen, Fabiano Dalpiaz, Jan Martijn EM van der Werf, and Sjaak
Brinkkemper. The use and effectiveness of user stories in practice. In In-
ternational Working Conference on Requirements Engineering: Foundation
for Software Quality, pages 205–222. Springer, 2016.
[83] Antonio Martini, Lars Pareto, and Jan Bosch. A multiple case study on
the inter-group interaction speed in large, embedded software companies
employing agile. Journal of Software: Evolution and Process, 28(1):4–26,
2016.
[84] Liz Millikin. SurveyGizmo | Professional Online Survey Software & Tools.
[85] Jeffrey Moore, Joanne Pascale, Pat Doyle, Anna Chan, and Julia Klein
Griffiths. Using field experiments to improve instrument design: The SIPP
methods panel project. Methods for testing and evaluating survey question-
naires, pages 189–207, 2004.
[86] Daniel Méndez Fernández and Stefan Wagner. Naming the pain in require-
ments engineering: A design for a global family of surveys and first results
from Germany. Information and Software Technology, 57:616–643, January
2015.
[87] Paulo Anselmo da Mota Silveira Neto, Joás Sousa Gomes, Eduardo Santana
De Almeida, Jair Cavalcanti Leite, Thais Vasconcelos Batista, and Larissa
Leite. 25 years of software engineering in Brazil: Beyond an insider’s view.
Journal of Systems and Software, 86(4):872–889, 2013.
BIBLIOGRAPHY 88
[88] Indira Nurdiani, Ronald Jabangwe, Darja Šmite, and Daniela Damian. Risk
identification and risk mitigation instruments for global software develop-
ment: Systematic review and survey results. In 2011 IEEE Sixth Interna-
tional Conference on Global Software Engineering Workshop, pages 36–41.
IEEE, 2011.
[89] Mark C. Paulk, Dennis Goldenson, and David M. White. The 1999 survey
of high maturity organizations. 2000.
[90] Javier Pereira, Narciso Cerpa, June Verner, Mario Rivas, and J. Drew Pro-
caccino. What do software practitioners really think about project success:
A cross-cultural comparison. Journal of Systems and Software, 81(6):897–
907, 2008.
[91] Shari Lawrence Pfleeger. Experimental design and analysis in software
engineering. Annals of Software Engineering, 1(1):219–253, 1995.
[92] Shari Lawrence Pfleeger and Barbara A. Kitchenham. Principles of survey
research: part 1: turning lemons into lemonade. ACM SIGSOFT Software
Engineering Notes, 26(6):16–18, 2001.
[93] Robert T. Plant and Panagiotis Tsoumpas. A survey of current practice
in aerospace software development. Information and Software Technology,
37(11):623–636, 1995.
[94] Danny CC Poo and Mui Ken Chung. Software engineering practices in
Singapore. Journal of Systems and Software, 41(1):3–15, 1998.
[95] Teade Punter, Marcus Ciolkowski, Bernd Freimut, and Isabel John. Con-
ducting on-line surveys in software engineering. In Empirical Software En-
gineering, 2003. ISESE 2003. Proceedings. 2003 International Symposium
on, pages 80–88. IEEE, 2003.
[96] Jane Radatz, Anne Geraci, and Freny Katki. IEEE standard glossary of
software engineering terminology. IEEE Std, 610121990(121990):3, 1990.
[97] Austen Rainer and Tracy Hall. A quantitative and qualitative analysis
of factors affecting software processes. Journal of Systems and Software,
66(1):7–21, 2003.
[98] Ayushi Rastogi, Arpit Gupta, and Ashish Sureka. Samiksha: mining issue
tracking system for contribution and performance assessment. In Proceed-
ings of the 6th India Software Engineering Conference, pages 13–22. ACM,
2013.
[99] Jörg Rech, Eric Ras, and Björn Decker. Intelligent assistance in german
software development: A survey. IEEE software, 24(4):72–79, 2007.
BIBLIOGRAPHY 89
[100] C. Robson. Real World Research: A Resource for Social Scientists and
Practitioner Researchers Blackwell. Oxford, 1993.
[101] Colin Robson and Kieran McCartan. Real world research. John Wiley &
Sons, 2016.
[102] Mark Rodgers, Amanda Sowden, Mark Petticrew, Lisa Arai, Helen Roberts,
Nicky Britten, and Jennie Popay. Testing methodological guidance on the
conduct of narrative synthesis in systematic reviews effectiveness of in-
terventions to promote smoke alarm ownership and function. Evaluation,
15(1):49–73, 2009.
[105] Helen Sharp, Mark Woodman, and Fiona Hovenden. Tensions around the
adoption and evolution of software quality management systems: a dis-
course analytic approach. International journal of human-computer studies,
61(2):219–236, 2004.
[107] Edward Smith, Robert Loftin, Emerson Murphy-Hill, Christian Bird, and
Thomas Zimmermann. Improving developer participation rates in surveys.
In Cooperative and Human Aspects of Software Engineering (CHASE), 2013
6th International Workshop on, pages 89–92. IEEE, 2013.
[108] Colin Snook and Rachel Harrison. Practitioners’ views on the use of formal
methods: an industrial survey by structured interview. Information and
Software Technology, 43(4):275–283, 2001.
[109] Prashanth Harish Southekal and Ginger Levin. Validation of a generic GQM
based measurement framework for software projects from industry practi-
tioners. In Cognitive Informatics & Cognitive Computing (ICCI* CC), 2011
10th IEEE International Conference on, pages 367–372. IEEE, 2011.
[111] Rodrigo Oliveira Spínola and Guilherme Horta Travassos. Towards a frame-
work to characterize ubiquitous software projects. Information and Software
Technology, 54(7):759–785, 2012.
[113] Shahida Sulaiman, Norbik Bashah Idris, and Shamsul Sahibuddin. Pro-
duction and maintenance of system documentation: what, why, when and
how tools should support the practice. In Software Engineering Conference,
2002. Ninth Asia-Pacific, pages 558–567. IEEE, 2002.
[114] Richard H. Thayer, Arthur Pyster, and Roger C. Wood. Validating solutions
to major problems in software engineering project management. Computer,
15(8):65–77, 1982.
[115] Richard H. Thayer, Arthur B. Pyster, and Roger C. Wood. Major issues in
software engineering project management. IEEE Transactions on Software
Engineering, (4):333–342, 1981.
[116] Steven K. Thompson. Sampling. Wiley, New York, page 12, 2002.
[117] Marco Torchiano and Filippo Ricca. Six reasons for rejecting an industrial
survey paper. In Conducting Empirical Studies in Industry (CESI), 2013
1st International Workshop on, pages 21–26. IEEE, 2013.
[118] Ljiljana Vukelja, Lothar Müller, and Klaus Opwis. Are engineers con-
demned to design? a survey on software engineering and UI design in
Switzerland. In IFIP Conference on Human-Computer Interaction, pages
555–568. Springer, 2007.
[120] Claes Wohlin, Martin Höst, and Kennet Henningsson. Empirical research
methods in software engineering. In Empirical methods and studies in soft-
ware engineering, pages 7–23. Springer, 2003.
[121] Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn
Regnell, and Anders Wesslén. Experimentation in software engineering.
Springer Science & Business Media, 2012.
BIBLIOGRAPHY 91
[122] Aiko Yamashita and Leon Moonen. Surveying developer knowledge and
interest in code smells through online freelance marketplaces. In User Eval-
uations for Software Engineering Researchers (USER), 2013 2nd Interna-
tional Workshop on, pages 5–8. IEEE, 2013.
[123] Chen Yang, Peng Liang, and Paris Avgeriou. A survey on software archi-
tectural assumptions. Journal of Systems and Software, 113:362–380, 2016.
[124] Zhuojun Yi, Dongming Xu, and Jon Heales. The Moderating Effect of
Social Influence on Ethical Decision Making in Software Piracy. In PACIS,
page 236, 2013.
[125] He Zhang and Muhammad Ali Babar. Systematic reviews in software engi-
neering: An empirical investigation. Information and Software Technology,
55(7):1341–1354, 2013.
Appendices
Appendix A
Selected Primary Studies
1. Sampling Method: Randomness of Participants; Insufficient Sample Size; Improper Participant Selection
2. Survey Instrument Flaws: Poor Questionnaire Design
3. Question-Order Effect
4. Likert Scale Problems
5. Time Limitation: Boredom; Busy Schedules
6. Domain Knowledge: People's Perceptions
7. Hypothesis Guessing
8. Language Issues
9. Cultural Issues: Geographical Issues; Country-specific Issues
10. Generalizability: Confidentiality Issues
11. Reliability: Correctness of Obtained Responses; Confidentiality Issues
12. Bias
13. Inconsistency of Responses: Lack of Motivation for Respondents
14. Response Duplication
15. Low Participation Rate: Lack of Motivation for Participation Selection
Appendix C
Interview Questionnaire
2. Surveys are used in the social sciences and other disciplines, including software
engineering. Do you think there are special guidelines to be followed while
conducting surveys? How is it different in software engineering? What factors
should one consider while designing surveys in software engineering research?
3. When designing a survey, what type of questions do you prefer asking (open-ended
or close-ended), and why? (Is the evaluation method your primary motivating
factor for choosing it? Are evaluation methods one of the reasons for choosing
the type of questions? What other factors lead you to include both types of
questions in your survey?)
4. In a survey, one question may provide context for the next one, which may
drive respondents to specific answers; randomizing the questions may reduce this
question-order effect to some extent. Can you suggest some other techniques to
deal with the question-order effect?
5. How do you make sure that respondents understand the right context of your
questions? What measures do you adopt to make the questionnaire understandable?
6. Our SLR analysis showed that 31.4% of the primary studies used the stratified
sampling technique, while only 15.7% reported the use of snowball sampling.
(The literature describes snowball sampling as leading to better sample selection,
since the researcher has the freedom to choose a sample that suits his/her
requirements.) Have you faced a situation where other sampling techniques were
chosen over snowballing? What factors did you consider while making the
selection?
7. Low response rates are a common problem for any survey. How can the response
rate be improved?
8. When a survey is posted, there may be respondents without proper domain
knowledge answering it. They might misinterpret the questions and give incorrect
answers, which affects the overall analysis. In your research, how are such
responses identified and ruled out?
9. Our analysis showed that hypothesis guessing is an issue that can only be
reduced to some extent rather than avoided completely. Explain how this problem
is addressed in your work.
10. What measures do you take to avoid the duplication of responses in your
surveys?
11. How do you overcome each of these common problems (bias, generalizability,
and reliability) in the following cases?
(a) Case A: Respondents answering the survey just for the sake of rewards.
(b) Case B: Respondents answering surveys posted on social networks like
LinkedIn and Facebook.
12. How do you mitigate the issue of inconsistency in responses? (Case: when a
respondent is asked about his familiarity with non-functional requirements he
chooses the "Yes" option, but when asked to elaborate his opinion he just writes
"No idea"; this is the problem of inconsistency.)
13. You have conducted a global survey in Sweden, Norway, China, and Italy,
collecting information from diverse respondents. How do you address the
following issues in your research?
(a) Issue: The questionnaire gets translated into Chinese to make it
understandable to respondents there, but poor translation may lead to data
loss. How do you handle this language issue?
(b) Issue: There might be cultural issues where people of one country are
more comfortable answering an online questionnaire, while people of another
country are more responsive to face-to-face interviews. In your opinion, how
can this kind of cultural issue be mitigated?
14. In the literature review we found that researchers used Likert scales to
gather a wide range of experiences. Even though they are commonly used, we
identified problems such as central tendency bias when a 4-point or 7-point
Likert scale is used, and respondent fatigue and interpretation problems when
9- or 10-point scales are used. How do you address these kinds of issues in
your research?
15. How do you decide upon a particular sample size for your survey?
16. What motivates you to select a specific analysis technique for your research?
Appendix D
Codes and Themes from Interviews
Appendix E
Quality Criteria
Appendix F
Rigor and Relevance
Study  Rigor  Relevance
S29    1.5    4
S30 2.5 2
S31 2 2
S32 3 4
S33 2.5 4
S34 2 4
S35 2 4
S36 2.5 3
S37 2.5 3
S38 2.5 3
S39 1.5 4
S40 2.5 2
S41 2.5 3
S42 2.5 4
S43 2.5 4
S44 1.5 4
S45 2 2
S46 2.5 3
S47 2 2
S48 1 4
S49 3 4
S50 2 4
S51 1.5 4
S52 3 4
S53 1.5 2
S54 3 4
S55 2 2
S56 1.5 3
S57 2 4
S58 3 3
S59 2.5 4
S60 2.5 4
S61 2.5 3
S62 3 3
S63 2 2
S64 2.5 4
S65 3 4
S66 3 3
Appendix G
Themes
G.1 Sampling
G.1.1 Convenience sampling
Table G.5:
Table G.6:
Table G.7:
Table G.8:
Table G.9:
Table G.10:
Table G.11:
G.2.9 Context
Table G.12:
Table G.13:
Table G.14:
Table G.15:
G.2.13 Generalizability
Table G.16:
G.2.13.1 Bias
Table G.17:
G.2.13.2 Reliability
Table G.18:
Table G.19:
G.3.2 Inconsistency
Table G.20:
Table G.21:
G.3.4 Rewards
Table G.22:
Table G.23:
Table G.24: