Harini Nekkanti
Sri Sai Vijay Raj Reddy
Faculty of Computing
Blekinge Institute of Technology
SE–371 79 Karlskrona, Sweden
This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology
in partial fulfillment of the requirements for the degree of Master of Science in Software
Engineering.
The thesis is equivalent to 20 weeks of full time studies.
Contact Information:
Author(s):
Harini Nekkanti
E-mail: hana15@student.bth.se
Sri Sai Vijay Raj Reddy
E-mail: srre15@student.bth.se
University advisor:
Tekn. Lic. Ahmad Nauman Ghazi
Department of Software Engineering (DIPT)
Contents
Abstract i
Acknowledgments ii
1 Introduction 1
1.1 Research Gap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Research Aim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Research Questions and Instrument . . . . . . . . . . . . . . . . . 6
1.5 Structure of Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Research Method 18
3.1 Empirical methods . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.1 Research Overview: . . . . . . . . . . . . . . . . . . . . . . 19
3.1.1.1 Systematic Literature Review: . . . . . . . . . . . 20
3.1.1.2 Interview . . . . . . . . . . . . . . . . . . . . . . 22
3.1.1.3 Data analysis: . . . . . . . . . . . . . . . . . . . . 23
3.2 Systematic Literature Review . . . . . . . . . . . . . . . . . . . . 23
3.2.1 Planning The Review . . . . . . . . . . . . . . . . . . . . . 24
3.2.2 Specifying the research questions . . . . . . . . . . . . . . 24
3.2.3 Developing the review protocol . . . . . . . . . . . . . . . 24
3.2.3.1 Search strategy . . . . . . . . . . . . . . . . . . . 25
3.2.3.2 Study selection criteria . . . . . . . . . . . . . . . 26
3.2.3.3 Quality criteria . . . . . . . . . . . . . . . . . . . 27
3.2.3.4 Data extraction strategy . . . . . . . . . . . . . . 28
3.2.3.5 Data synthesis . . . . . . . . . . . . . . . . . . . 29
3.2.3.6 Evaluating the review protocol . . . . . . . . . . 29
3.2.3.7 Pilot study . . . . . . . . . . . . . . . . . . . . . 29
3.2.4 Conducting the Research . . . . . . . . . . . . . . . . . . . 29
3.2.5 Identification of research . . . . . . . . . . . . . . . . . . . 30
3.2.6 Primary studies . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.7 List of selected studies . . . . . . . . . . . . . . . . . . . . 30
3.2.8 Quality assessment criteria . . . . . . . . . . . . . . . . . . 31
3.2.9 Data Extraction Strategy . . . . . . . . . . . . . . . . . . 32
3.2.10 Data synthesis . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.3 Interview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3.1 Selection of Interview Subjects: . . . . . . . . . . . . . . . 33
3.3.2 Interview design: . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.2.1 Interview setup: . . . . . . . . . . . . . . . . . . . 36
4.4.2 Analysis for RQ2 from Interviews: . . . . . . . . . . . . . 64
Appendices 92
G Themes 111
G.1 Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
G.1.1 Convenience sampling . . . . . . . . . . . . . . . . . . . . 111
G.1.2 Random Sampling . . . . . . . . . . . . . . . . . . . . . . 111
G.1.3 Stratified sampling . . . . . . . . . . . . . . . . . . . . . . 112
G.2 Questionnaire Problems . . . . . . . . . . . . . . . . . . . . . . . 112
G.2.1 Questionnaire Problems . . . . . . . . . . . . . . . . . . . 112
G.2.2 Questionnaire length . . . . . . . . . . . . . . . . . . . . . 112
G.2.3 Open-ended & closed-ended questions . . . . . . . . . . . . 113
G.2.4 Question order . . . . . . . . . . . . . . . . . . . . . . . . 113
G.2.5 Likert Scales . . . . . . . . . . . . . . . . . . . . . . . . . . 114
G.2.6 Sensitive Questions . . . . . . . . . . . . . . . . . . . . . . 114
G.2.7 Time limitation . . . . . . . . . . . . . . . . . . . . . . . . 114
G.2.8 Domain Knowledge . . . . . . . . . . . . . . . . . . . . . . 115
G.2.9 Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
G.2.10 Hypothesis Guessing . . . . . . . . . . . . . . . . . . . . . 116
G.2.11 Language Issues . . . . . . . . . . . . . . . . . . . . . . . . 116
G.2.12 Cultural Issues . . . . . . . . . . . . . . . . . . . . . . . . 117
G.2.13 Generalizability . . . . . . . . . . . . . . . . . . . . . . . . 117
G.2.13.1 Bias . . . . . . . . . . . . . . . . . . . . . . . . . 117
G.2.13.2 Reliability . . . . . . . . . . . . . . . . . . . . . . 118
G.3 Response Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 118
G.3.1 Low Response Rate . . . . . . . . . . . . . . . . . . . . . . 118
G.3.2 Inconsistency . . . . . . . . . . . . . . . . . . . . . . . . . 119
G.3.3 Response Duplication . . . . . . . . . . . . . . . . . . . . . 119
G.3.4 Rewards . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
G.3.5 Sample Size . . . . . . . . . . . . . . . . . . . . . . . . . . 120
G.3.6 Analysis Techniques . . . . . . . . . . . . . . . . . . . . . 121
List of Figures
List of Tables
Chapter 1
Introduction
type of phenomena, but the survey interventions in software engineering are about
tools and techniques and are more IT-centric, while surveys in the social sciences
address more personal information. The reason for this could be the subjective
nature of surveys in the social sciences, where psychological opinions of people
are collected. For example, questions in such surveys would be “Which one do
you think has more features, Android or iOS?” or “How do you feel about iOS 9?”.
These questions are expected to be answered from the respondent’s perspective
and are mainly opinion-based; respondents need not give evidence for their
answers and have the freedom to express their views. A similar question posed in
a software engineering context could be “What are the major flaws that make you
choose iOS over Android, or vice versa?”. Here the respondents are not asked for
their opinions; instead they are expected to provide evidence to support their
answers, which are otherwise treated as invalid. In this way, software engineering
surveys test the perceptions of the respondents, such as perceptions about a
software product, tool, or process. When designing social science surveys,
questions starting with what, how, and when are asked, and why questions are
mainly avoided for fear of getting low response rates. Software engineering
surveys, in contrast, mainly collect evidence through why questions [39], [37].
The sample population in social science surveys is more generic in nature, while
software engineering surveys have a specific sample. A customer survey can be
distributed to any type of population, but in software engineering the population
must be specific, such as developers, testers, or the quality team. Surveys in
the social sciences focus on more general information, whereas software
engineering surveys address more technical information. In software engineering
surveys it is difficult to identify a representative sample of the population,
while the target population is easily identifiable in a social sciences context.
Researchers in software engineering carefully verify responses before including
them, since flawed responses affect the overall survey outcomes. Summarizing the
above, the main motivation for conducting surveys, whether in software
engineering or the social sciences, is to obtain data from a large population
with the aim of generalizing the findings [39], [37].
Surveys at Present: In the past, surveys were conducted in traditional
ways: the sample population’s details were collected and respondents were
contacted in person. The survey responses were then collected either by handing
out questionnaires or by conducting interviews. Responses obtained in this way
were more sample-specific, as respondent selection was done by the researchers
themselves. This approach had several drawbacks: there is no guarantee that the
whole population answers the survey, and there are sometimes problems of biased
results. At present, many researchers use online tools for conducting surveys;
SurveyMonkey [1] and SurveyGizmo [84] are two commonly used tools. The process
of gathering responses using such tools is simple and user-friendly. The
researcher creates an account and posts the survey, specifying the population
type and the number of responses. Researchers are expected to wait for a
certain period of time before getting the responses. In the past, if researchers
had constraints such as time limits, sample specifications, or a large population
size, they either compromised on the results or prolonged their research. Online
tools address such issues: the user pays a certain amount based on his or her
preferences, and the websites provide several options to customize how responses
are obtained. The advantages of using these tools are that the responses obtained
are demographically distributed; there is no interaction between researcher and
respondent, which reduces bias to some extent; the researcher gets sample-specific
responses; and results are obtained within a specified time, reducing delays. In
this way, present-day technologies are helping software engineering researchers
handle traditional survey problems [39].
Online surveys can reach a diversified sample thanks to the Internet, and they
have a higher response rate compared to paper-based surveys. They do have
negatives, such as impersonal survey requests, self-selected samples, lack of
Internet access, inappropriate respondents answering the survey, reporting
problems, questionable correctness of responses, the same respondent answering
the survey more than once, and biased samples [92], [39]. Technical problems in
online surveys, such as multiple operating systems, unsupported email readers,
incompatible browsers, poorly functioning software, server crashes, and data
accumulation, were discussed by Stephen et al. [106]. All these factors show
that online surveys have their own risks.
Problem Domain
Carter Steel et al. [15] criticized the poor standards of ongoing research in
empirical software engineering and stressed the need to aggregate empirical
results to overcome interpretation difficulties. Two major pitfalls forming the
basis of the criticism were identified. The former was researchers being
unfamiliar with practitioners’ approaches (industrial practices); the latter was
the lack of applicability of the practices, methods, techniques, and approaches
prescribed by researchers to the practical (industrial) context. Surveys were
claimed to be useful in dealing with such problems [15]. Paulk et al. [89]
support this argument by stating that “surveys help to obtain a good feel for
the breadth in deploying specific analysis techniques in industrial perspective”.
Punter et al. [95] statistically documented the increased usage of surveys over
case studies and experiments. They discussed the ESERNET study, which showed
that 50% of respondents used surveys, followed by case studies and experiments.
Compared with other empirical investigations, surveys can capture a larger
population. Less variable control, less scope for variable manipulation, low
internal validity, and high external validity are the reasons surveys have
gained immense popularity among software engineering researchers [95].
To date, software engineering researchers still face problems such as
generalizability, low response rates, and reliability [25], [46], [125], [40], [88], [123].
Problems like improper participant selection and questionnaire flaws could be
mitigated if proper measures were taken by researchers while designing the survey
itself. Some problems, like hypothesis guessing [46] and reliability, cannot be
completely eliminated, but their impact on survey outcomes can be reduced to
some extent.
The reason researchers face problems could be that they are either unaware of a
problem or have overlooked it in the survey process. In both cases the outcome
of the survey is a “lemon” (a bad survey) [92]. We think there are sufficient
guidelines discussing how to design and administer a survey; the main issue is
the lack of documentation addressing common problems in the survey process. For
instance, in published articles, survey problems are addressed only in the
threats-to-validity or limitations sections. Instead of taking the same road and
designing guidelines, we thought it would be a better idea to compile a checklist
of problems with mitigation strategies. A checklist of this kind would serve as
a reference, as it helps to predict survey problems so that researchers can deal
with them in advance.
We are researchers from the software engineering domain and want to support
fellow software engineering researchers. It is important to differentiate
between surveys in software engineering and those in the social sciences. If
surveys from every domain were considered along with those in software
engineering, the research would span several years, and we could not achieve the
required results in the time allotted for a Master’s thesis. For this reason we
narrowed down the area of our research: it focuses only on surveys published in
the software engineering domain.
1.3 Objectives
The main aim of this research is to document the lessons learned by researchers
in designing and applying the survey as a methodology. The objectives below will
help us achieve the required outcome.
• To analyze surveys with respect to variables like sample size, response
rate, and analysis techniques.
• RQ1: What are the common problems faced by software engineering
researchers while conducting a survey?
Motivation: The motivation behind including this research question is
twofold. First, it helps to identify the rough situations in surveys that
researchers try to avoid, which in turn helps us identify the different problems
researchers face due to such situations. Second, we can report our findings on
the problems and the suggested mitigation strategies. There is evidence in the
literature that different problems arise during the survey process, yet no
proper documentation has been published listing all possible problems; the
problems faced are mentioned only in threats-to-validity or limitations sections
[47][33][46][31]. Hence we focus our research on this area. Answering this
research question has two benefits: the problem–mitigation checklist serves as a
handy reference for researchers, and it contributes to the body of knowledge in
the empirical software engineering domain. Additionally, the checklist helps to
identify the impact of variables on survey outcomes.
• RQ2: How do variables like sample size, response rate, and analysis
methods affect the survey outcomes?
Motivation: In every domain, be it software engineering, the social sciences,
political science, health care, or automation, researchers opt for surveys to
obtain information from a larger population. In this regard, sample size
selection is important in order to obtain more data points and to achieve
generalizability. Response rate is important since it depends on communication
with peers, the research topic, willingness to contribute, etc. In any survey,
the selection of the analysis technique is crucial, since it decides how the
final results are presented. Whether the outcome is positive or negative, it is
essential for any researcher to report the results openly, and a proper analysis
technique helps researchers portray their results; survey results can only be
critically examined with a proper analysis technique. These factors motivated us
to select sample size, response rate, and analysis techniques, apart from other
variables, for our research. A properly defined sample of a specific size gives
better statistical evidence for any survey; with a proper sample size, different
data patterns can be identified and generalizability of outcomes becomes
possible [67]. An increased response rate means more data points for any survey,
which helps researchers generalize the research outcomes to a larger target
population [107]. Wohlin et al. [121] state that “for drawing valid conclusions
from survey data, researchers need to interpret the data properly”, which
explains the need for proper analysis methods at the end of a survey. Improper
interpretation of quantitative data produces misleading patterns that defeat the
purpose of conducting the survey. It is also evident from the existing
literature that an improper sample size, a poor response rate, and a wrong
selection of analysis method badly affect survey outcomes [67][107][68].
Identifying and analyzing how these variables impact survey outcomes can be used
to assess the survey approach that researchers are adopting. Hence this question
helps us fulfill the objectives of our research.
Motivation: This question was framed with the motive of getting deeper
insights into our research domain through direct interviews with software
engineering researchers. Only partial fulfillment of the objectives would be
achieved by doing a systematic literature review, where only the state of the
art is studied; for more concrete validation we require the state of practice as
well. This inclined us towards selecting interviews for validating our results.
In order to answer this research question, two additional sub-questions were
written. The reason for choosing this question was to record the mitigation
strategies that researchers adopt and compare them with those obtained from the
literature. Similarly, we wished to obtain the researchers’ outlook on selecting
sample size, response rate, and analysis methods. In this way we validate our
findings and try to extrapolate the researchers’ views, fulfilling our research
aim.
2.1 Background
Colin Robson and Kieran McCartan [101] define survey methodology as “a fixed
design which is first planned and then executed”, a statement validated by many
authors through their research. It is clear from this statement that survey
methodology is a step-by-step process. Based on the existing literature, survey
methodology can be broadly classified into a series of eight stages [61]. Each
stage of this sequential process has many sub-processes and is unique in its own
way; when properly followed, it gives the expected outcomes. Figure 2.1 gives a
brief explanation of all the stages:
• What are the possible areas close to the research objectives that were
left uninvestigated?
• How will the data obtained from the survey be used? [61] [69] [18]
While defining the research objectives for a survey, the related work pertaining
to that particular field must be considered; knowledge about similar practices
helps researchers to narrow down the objectives. All the stakeholders (like the
Chapter 2. Background and Related Work 10
researchers who are conducting the survey, and the respondents) must know the
goals of the survey and need to have an idea of what to expect at the end.
Failure to do so results in unnecessary iterations causing negative outcomes
[61] [18].
Research objectives must capture the goal of the survey. They are formulated
either as research questions (what, why, how) or as final outcomes [69]. A
top-down or a bottom-up approach can be followed when formulating research
objectives. Ciolkowski et al. [18] propose adapting the Goal Question Metric
(GQM) method for the top-down approach. In the GQM method, the goals are defined
first, then the questions are framed accordingly, after which metrics are
selected to measure the desired goals. Viewed from a survey-methodology
perspective, the goals correspond to research objectives, the questions map to
research questions, and the metrics represent the questionnaire. A similar
methodology can be applied in reverse for the bottom-up approach, where the
research questions are defined first and help to narrow down and define the
objectives [61].
While defining the research objectives based on GQM, a researcher needs to keep
in mind what is to be measured and how it is reflected in the research
objectives. To clarify this, Kitchenham and Pfleeger [69] have defined three
types of objectives to be considered when investigating a specific population:
• Discovering the factors which affect the characteristics among the popula-
tion [69].
Wohlin et al. [121] clearly define the purpose (objective or motive) for
conducting a survey. Based on its objective, any survey falls into one of the
three categories below:
• Explanatory surveys try to explain claims about a given population; for a
given population, this type of survey tries to find patterns and problems
observed in the population.
• Random Sampling
• Systematic Sampling
• Stratified Sampling
• Convenience Sampling
• Judgment Sampling
• Quota Sampling
• For open-ended questions, content analysis is used; its two main
classifications are qualitative content analysis and quantitative content
analysis. Other methods, like phenomenology, discourse analysis, and
grounded theory, can also be applied for analyzing open-ended questions
[32], [52], [11], [44], [105].
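As a minimal illustration of the quantitative flavor of content analysis, open-ended answers that have already been coded into categories can simply be tallied. The codes and responses below are invented for illustration; they are not taken from the thesis:

```python
from collections import Counter

# Each open-ended response has already been coded with one or more
# categories (hypothetical codes, purely for illustration).
coded_responses = [
    ["usability", "performance"],
    ["performance"],
    ["documentation", "usability"],
    ["usability"],
]

# Tally how often each code occurs across all responses.
counts = Counter(code for codes in coded_responses for code in codes)
print(counts.most_common())
# [('usability', 3), ('performance', 2), ('documentation', 1)]
```

Qualitative content analysis would instead interpret the coded passages in context rather than reducing them to frequencies.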
Punter et al. [95] have focused mainly on the present survey trend, i.e., online
surveys. They drafted a set of guidelines for performing online surveys, based
on their experience from five such surveys. They describe data obtained from
online surveys as easy to analyze, since it arrives in the expected format,
whereas paper-based forms are error-prone. Online surveys also track the invited
respondents and log the details of those who actually answered, which is not the
case with paper-based surveys; this helps researchers increase the response
rate. In this way, they argued, online surveys mitigate two main survey
problems. The first is the problem of infrequent surveys: web survey tool
management systems give researchers the scope to conduct surveys repeatedly. The
second, poor disclosure of results, is handled by visualizing and distributing
the survey results.
Pfleeger and Kitchenham [92] published a series of six articles which discussed
the same survey methodology as described by Kasunic [61]. Even though these are
short papers, they are held in high regard; the series was the first to publish
concise documentation about survey methodology.
A low participation rate is a common problem for any survey, as identified by
Smith et al. [107] in their research. Based on their expertise and the existing
literature, they performed a post-hoc analysis of previously conducted surveys
and came up with factors that improve the participation rate. They also
specified the limitations of the obtained results, in that “an increase in
participation doesn’t mean the results become generalizable” [107].
Regarding survey sampling, Travassos et al. [27] argue that there are no
specialized and adequate sources of sampling; to support their argument they
propose a framework consisting of the target population, sampling frame, unit of
observation, unit of attribute, and an instrument for measurement. Ji et al.
[56] conducted surveys in China and addressed issues relating to sampling,
contacts with respondents, data collection, and validation. Conradi et al. [21]
highlighted the problems of method biases, an expensive contact process,
problems with census-type data, and national variations by performing an
industrial survey in three countries: Norway, Italy, and Germany. This was the
first study in software engineering to use census-type data. The problem of
replicating surveys was highlighted by Rout et al. [15], who replicated a
European survey in an Australian software development organization.
To make our work more applicable, we randomly considered three surveys from
domains other than software engineering. The first article focuses on measuring
patient safety; it compares factors like general characteristics, dimensions
covered, and the various study uses of patient safety climate surveys [20]. The
second article discusses the mental disorders faced by many prisoners [34]. The
third article discusses the use of Chinese medicine by cancer patients [14]. All
three articles share the common point of being reviews of surveys. A brief
discussion of the obtained results along
This section details the research method used to achieve the aims and
objectives. A systematic literature review and interviews were the two
methodologies used to answer the research questions. The motivation for the
selected research methods is also presented.
• Surveys: “The survey is defined as research in large” [120], which means
covering a large sample or target population to collect the necessary
information through questionnaires and interviews. Interestingly, our research
is about investigating the problems faced by software engineering researchers
while
In the first phase, the systematic literature review process provided by
Kitchenham [63] was applied for RQ1 and RQ2 in order to collect the relevant
information for our study, i.e., to identify the problems faced by researchers
while conducting surveys in software engineering and how factors like sample
size, response rate, and analysis methods affect the survey outcomes. Narrative
synthesis was used to outline the gathered qualitative data.
In the second phase, i.e., the state of practice, interviews were conducted to
validate the data obtained from the systematic literature review. This was done
by asking the interviewees (researchers) questions about their experiences while
conducting surveys, and for suggestions, in order to finally draft a checklist
of problems along with mitigation strategies for conducting surveys. Thematic
analysis was used to analyze the interview results.
3.1.1.2 Interview
“Interview is a data collection method of eliciting a vivid picture of an
individual’s perspective on the research topic. It involves a meeting in which
the researcher asks a participant a series of questions. This method is useful
as the researcher can ask in-depth questions about the topic leading to a
fruitful discussion, a follow-up is also possible with the interview
participants” [119]. Many other data collection techniques used by researchers,
such as survey questionnaires, can be compared with interviews. With survey
questionnaires, data is collected from a large sample, but the results are
biased if the questionnaires are not administered properly. While the sample in
interviews may be small, the interviewees are selected according to the
requirements, which gives more promising results. In contrast to the design and
distribution of such questionnaires, the interview process requires less
knowledge and background work. Based on the level of “structure”, interviews are
classified into three categories. Structured interviews are the most structured
of all; they are similar to questionnaires, but the interviewer expects short
and immediate answers, leaving no room for further discussion of other issues.
Unstructured interviews are at the bottom of the structure; they consider only a
limited set of topics, and because they involve open discussion rather than
answers to given questions, it sometimes becomes difficult for the interviewer
to transcribe them, and difficulty also arises when generalizing the results for
analysis. Semi-structured interviews are widely used, as they combine the
benefits of unstructured and structured interviews: by means of open-ended and
closed-ended questions, the interviewer gets additional information along with
the required information [103]. In our research, semi-structured interviews were
used to learn about researchers’ experiences in conducting surveys and to
validate the findings of our SLR execution.
Motivation: Interviews are the best source for in-depth discussions on specific
topics. They help to collect qualitative information that can validate the
findings from other research studies (like a survey, case study, or experiment)
[121][101]. The quality of the qualitative data obtained depends on the selected
subjects, the questions asked, and the analysis of the interview results.
Interviews also help us to collect personal opinions and additional information
related to our research that can be compared with and added to our results.
Since the outcome of our research will be useful to software engineering
researchers, interviewing them will help us present it in a better way, based on
their recommendations. These reasons motivated us to
The first phase, planning, involves establishing the need for the review,
specifying the research questions, and developing and evaluating the review
protocol, which details the SLR process. The second phase (conducting the
review) involves identification of research, selection of primary studies,
quality assessment, and data extraction and synthesis. The third phase covers
the complete review report [63].
The guidelines provided by Kitchenham et al. [63] suggest the PICO method as an
efficient way to formulate a search string; hence we opted to use the PICO
method for developing ours. PICO is a combination of four facets, namely
Population, Intervention, Context, and Outcomes, as shown in Table 3.2. We
applied this methodology to our research area and obtained the following
results.
• Data Sources: The articles that are used within the research are from the
advisory databases within the BTH student portal. The identified databases
that are used are as follows:
• INSPEC
• SCOPUS
The reasons for selecting these two databases were their relevance to the
software engineering domain and their associations with a wide range of
publications. Another important reason was to avoid duplicate articles;
databases like Google Scholar and IEEE might return duplicates. INSPEC and
SCOPUS have the most comprehensive and broadest coverage [71], and they gather
publications from many different databases. Full-text articles that were not
accessible through Inspec and Scopus were downloaded from Google Scholar; its
high degree of article accessibility and simple text-box interface were an
additional help [71]. Suggestions from the BTH librarians were also considered
in this case and discussed with the supervisor.
We used the above search string in the databases along with Boolean operators
such as AND and OR. To obtain the results most relevant to our research, we also
had to use a few search operators, such as DOCTYPE and quotation marks, in our
search string, as shown in Table 3.4. Suggestions from the BTH librarians were
also considered before finalizing our search string.
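To make the construction concrete, a PICO-style search string ORs the synonyms within each facet and ANDs the facets together. The sketch below is a hypothetical illustration: the facet terms are placeholders, not the thesis’s actual search string (which appears in Table 3.4):

```python
# Hypothetical PICO facets; the terms below are placeholders chosen for
# illustration, not the actual terms used in the thesis.
facets = {
    "Population": ["software engineering"],
    "Intervention": ["survey", "questionnaire"],
    "Outcomes": ["problem", "challenge", "limitation"],
}

def build_search_string(facets):
    """OR the synonyms within each facet, then AND the facets together."""
    groups = [" OR ".join(f'"{term}"' for term in terms)
              for terms in facets.values()]
    return " AND ".join(f"({group})" for group in groups)

print(build_search_string(facets))
# ("software engineering") AND ("survey" OR "questionnaire")
#     AND ("problem" OR "challenge" OR "limitation")
```

The same pattern extends to database-specific operators (field codes, DOCTYPE restrictions) by wrapping the generated string accordingly.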
• All relevant articles related to the software engineering domain were in-
cluded.
• Only published articles obtained from the Inspec and Scopus databases were
included.
Exclusion criteria: All articles that did not fulfill the inclusion criteria were
excluded.
The above blueprint lists all the relevant columns that we considered as
parameters, together with the corresponding activity performed. We formulated
these parameters with utmost care, so that the results would be greatly
refined and effectively useful for our research.
from both the databases. Of the 745 selected studies, 679 did not fulfill the
inclusion criteria, so only 66 primary studies were considered.
Results

S. No | Quality criteria                                                              | Yes (1) | Partial (0.5) | No (0)
QC1   | Does the paper identify required objectives fulfilling the aim of research?   | 34      | 32            | 0
Chapter 3. Research Method 32
Articles were scored 1 (yes), 0.5 (partial) or 0 (no). The total score of an
article was used to assess its quality. If a study scored 1 on every quality
criterion, it was clustered as high quality; if its total score was greater than
or equal to 2 it was considered medium quality; and if its total score was less
than 2 it was grouped as low quality. The following table illustrates the range
of quality for the individual studies: quality range Table 3.8 shows the range
of studies with high, medium and low quality. Table E.1 in the Appendix lists
the quality criteria scores for all the primary studies.
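As an illustration, the clustering rule described above can be sketched in Python; the scores and the number of criteria in the example are hypothetical, not taken from the thesis data:

```python
def quality_cluster(scores):
    """Cluster a primary study by its per-criterion scores
    (1 = yes, 0.5 = partial, 0 = no).
    All 1s -> high quality; total >= 2 -> medium; total < 2 -> low."""
    if all(s == 1 for s in scores):
        return "high"
    return "medium" if sum(scores) >= 2 else "low"

# Hypothetical studies scored on three quality criteria:
print(quality_cluster([1, 1, 1]))      # high
print(quality_cluster([1, 0.5, 1]))    # medium (total 2.5)
print(quality_cluster([0.5, 0.5, 0]))  # low (total 1.0)
```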
information in the Excel sheets was utilized to perform this phase. The findings
from the SLR were summarized in order to answer the research questions, i.e. the
problems identified when conducting empirical surveys and how variables such as
sample size, response rate and analysis techniques affect survey outcomes.
The continuation of the systematic literature review protocol is discussed
in Section 4.1.
3.3 Interview
Interviews were chosen as the qualitative research method for research question 3.
An interview is a data collection method in which a verbal exchange takes
place between two persons, with the interviewer retrieving the required information
from the other person's experience. The interviewee presents their own experiences,
beliefs and behaviors, either as a consumer or as an employee, and it is the job
of the interviewer to retrieve the information best suited to the research. Other
data collection techniques such as questionnaires were not used because they require
a large sample base and yield more generalized results [103], whereas interviews
use small samples selected to match the requirements and tend to produce better
results.
as name and department. As the next step, the goal and objectives of the
interview are presented to the interviewee so that the discussion does not
deviate. The participation of the interviewee is less likely without knowing the
goals and objectives of the study [104]. After the introduction, general
information is gathered from the interviewee, such as his/her experience in their
field of research. The next crucial step is to focus on the interview questions,
from which the required data from the interviewee are gathered.
iii. Transcribing: At this stage the discussions between the interviewer
and the interviewee, recorded as audio files during the interview, are converted
into text files. The interviews were transcribed manually by listening to the
audio tapes.
iv. Analyzing: The analyzing stage is the most crucial stage, where the data
required for the research are extracted from the transcripts and recordings.
The questions are organized systematically in accordance with the research
questions so that ambiguity can be avoided.
Questions for Interviews: Generally, the questions asked in interviews are
framed with the intention of collecting the required information from the
interviewees. Factors such as the time allotted for interviews, the number of
interviewees, the research theory and the expertise of the interviewees all
impact the framing of interview questions.
Among these, the research theory plays an important role. It is majorly divided
into two types:
1. Inductive Theory: Research formulated from the existing literature
and theories. The interview questions generated under this theory are
influenced by the questions in existing published papers.
The interview questions are influenced by either of the two above-mentioned
methods. While designing the questionnaire we have to check for the following
conditions [103]:
i. The questions have no implicit assumptions.
ii. Avoid framing questions that lead to single-word (yes/no) answers.
iii. Prevent overly generalized questions.
The ordering of the questions must be done carefully, so that each question
leads naturally to the next and to the conclusion of the interview. Finally,
a pilot study was done with the supervisor before the interviews to determine
whether any necessary changes had to be made. The interview questionnaire can
be found in Appendix C.
Chapter 4. Results and Analysis 38
As the graph in Figure 4.1 shows, surveys were conducted in the late 90s, but
in small numbers. In the last ten years there has been a considerable increase
in the use of surveys for data collection. Common problems such as low response
rate, quality of outcomes and duplication existed almost every time. In the past,
primary studies discussing such problems existed but were limited; in recent years
there has been a substantial increase in studies in which the survey is used as
the research methodology. Issues such as bias, reliability and cultural issues
could be the problems worrying researchers in modern days.
4.1.3 Domain
Primary studies that discussed problems in survey methodology were considered
from every domain. Figure 4.2 below shows the number of studies included from
each domain. The fewest studies discussing survey problems came from the
Software Security, Metrics and Object-Oriented Techniques domains; the reason
may be the use of experiments and case studies for data collection there. In
domains such as software project management, requirements engineering and
software testing it is evident that surveys, interviews, case studies or
experiments were used as validation methods. In the domains of software
development and software engineering, however, researchers published many
articles discussing survey problems and their mitigation strategies.
• Context (C): Gives a description of the problem and its relation to previous
research.
• Study Design (S): Information about the products, services and resources used
for the study evaluation.
problems faced.
Relevance (Re): The authors specified four aspects needed for calculating the
Relevance of a study, namely:
• Context (C): Gives a description of the problem and its relation to previous
research.
• Scale (Sc): Scales used for evaluation.
• Users or Subjects (U): Description of the subjects or users involved.
• Research Methodology (RM): Description of the research methodology used.
Relevance (Re) = C + Sc + U + RM. The rubric for each aspect is 1 if a
contribution exists and 0 if there is no contribution. Three aspects were
mapped to our research, and Rigor was calculated for every primary study as
shown in Table 4.3.
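A minimal sketch of this scoring rubric (aspect keys abbreviated as in the text; the example study is hypothetical, not one of the thesis's primary studies):

```python
def relevance_score(aspects):
    """Relevance (Re) = C + Sc + U + RM, where each aspect is scored
    1 if the study contributes to it and 0 otherwise."""
    return sum(aspects[k] for k in ("C", "Sc", "U", "RM"))

# Hypothetical primary study: context, subjects and research method are
# described, but no evaluation scale is reported.
print(relevance_score({"C": 1, "Sc": 0, "U": 1, "RM": 1}))  # 3
```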
The Rigor and Relevance values for every primary study considered were
calculated and listed under F.1 in the Appendix. The bubble plot of Rigor
and Relevance values is shown in Figure 4.3.
viding details for the next question, leading the respondents to write a specific
answer. Authors can mitigate this issue by randomizing the questions of the
questionnaire. Margaret et al. [47] faced the same issue in their survey;
instead of randomizing, they designed the questionnaire based on a natural
action sequence, helping the respondents recall and understand the
questionnaire. Garousi et al. [42] prevented this problem by designing the
questionnaire after looking at similar surveys conducted in the past. Once the
questionnaire had been designed, the authors contacted several industrial
practitioners for peer review. In this way the authors conveyed useful
information and also handled the problem of the question-order effect.
P 6 Survey Instrument Flaws: Survey instrument flaws are one of the con-
clusion validity threats. Martini et al. [47] were very cautious in their method-
ology to avoid such flaws in their instrument design. They iteratively pretested
the whole process, first carrying it out on known subjects and then on the actual
subjects. Discussions with colleagues and domain experts were also part of the
pre-test process. Gorschek et al. [46] also performed a redundancy check, in
addition to pre-tests and expert discussions, to handle survey instrumentation
problems. Travassos et al. [111] used external researchers not involved in
the research and reformulated the questionnaire based on their reviews.
P 7 Likert Scale Problems: The Likert scale is one-dimensional in nature; re-
searchers mostly use it in surveys with the assumption that a respondent's opin-
ion maps onto exactly one of the points. In realistic scenarios this is not
true: some respondents get confused about which response to pick and settle for
the middle option. Results obtained with higher-order Likert scales are also
tiresome to analyze, posing a threat of misinterpretation or data loss [33].
P 8 Survey Questionnaire Design: In practice, not every question in a survey
questionnaire can be mutually exclusive and exhaustive. This is a common
problem arising from poor design. Sometimes questions are ambiguous, confusing
respondents, or leading, and thus fail to capture the whole idea of the survey.
Even after reformulating questions, a survey design is never guaranteed to
obtain the required responses. Pilot surveys can handle these questionnaire
issues [33], [48], [46].
P 9 Randomness of Participants: When a survey is conducted at a large
scale involving many respondents, sample selection must be done cautiously to
avoid bias. Randomness of participants can handle this issue; Garousi
et al. [41] used different publicity tools to achieve a random set of samples,
thereby mitigating it.
P 10 Insufficient Sample Size: Insufficient sample size is a major threat
for any software engineering survey. Meaningful statistical evidence cannot be
obtained, even when parametric tests are applied to a sample, if its size is
insufficient [83], [87].
P 11 Improper Participant Selection: Improper participant selection
happens when the selected respondent sample is not representative of the whole
population [122]. It is the main cause of issues such as lack of generalizability
and bias. An automated way of selecting respondents can reduce the impact of
this issue on the overall survey outcome [8]. Selecting respondents based on a
set of criteria defined at the survey instrumentation stage can also reduce the
chances of improper selection [111].
P 12 Low Participation Rate: Low participation or response rates are among
the main problems faced in any kind of survey. Low participation was mainly
due to the busy schedules of respondents, poorly designed survey layouts, lack
of awareness about the survey and overly long surveys. It can be improved in
several ways, for example "emails for the survey were sent from the personal
email address to their own contacts because invitations from known people are
less likely to be tagged as spam", which is discussed in detail in the
following sections [40], [118], [88], [25], [3].
P 13 Sampling Method: Garousi et al. [42] discussed the motivation for sub-
ject selection based on sampling methods. The authors describe why researchers
select convenience sampling over other techniques, the reasons being that it is
the least expensive and least troublesome. A proper sampling technique is thus
an essential constituent of any good research. The problem of selecting the
wrong sampling method can be handled by considering trade-offs between factors
such as anonymity and bias reduction.
P 14 Lack of Motivation for Population Selection: Many researchers
fail to report their motivation for sample selection. Surveys of that kind are
difficult to replicate due to this lack of openness. Wohlin et al. [121] showed
that if research cannot be replicated, the purpose of doing it is not met. This
shows the need to report the motivation for selecting the population [56].
P 15 People's Perceptions: The perceptions of the people answering a survey
adversely impact the survey outcome. In software engineering a survey is done
to collect the attitudes, facts and behaviors of respondents. Perceptions vary
from person to person, which is the main reason they matter more here than in
any other assessment method or tool. This issue cannot be mitigated completely
but can be handled to some extent [117].
P 16 Lack of Domain Knowledge: A posted survey could be answered
by respondents without proper domain knowledge. This leads to misinter-
pretation of the questionnaire, resulting in non-participation or wrong
answers. A lack of common understanding also leads to these kinds of problems.
The inconsistent responses obtained through such misinterpretation have a
negative impact on survey outcomes [123], [15], [83]. Ji et al. [56] and
Gorschek et al. [46] stressed the need to consider the impact of the subjects'
background on survey results while surveying.
P 17 Boredom: Sometimes respondents start answering a survey but lose
interest as it progresses; boredom leads to a low response rate. Lengthy
surveys might be one reason for respondents to feel bored [40]. Martini et
al. [83] introduced interruptions in the middle so that respondent boredom was
avoided.
P 18 Time Limitation: Time limitation restricts the response rate in many
surveys. This factor influences every research effort to a great extent. Darje
et al. [88] showed that time limitation is the main factor behind respondents
not answering a questionnaire or taking phone interviews, as can be seen from
these lines: "all the 13 respondents were asked to take part due to time
limitation we obtained only 9 responses." Sometimes researchers neglect the
responses obtained from the actual subjects due to time limitation; the
following lines discuss this issue: "due to rather low response rate and time
limits, we have stopped on 33 responses which covers 13.58% of the Turin ICT
sector" [29].
P 19 Busy Schedules: Busy schedules are the main reason for low re-
sponse rates in any survey involving industrial respondents [40], [3]. Ji et
al. [56] commented that "busy executives likely ignore the questionnaires,
sometimes their secretaries finish the survey. In some cases the responses
obtained are filled out by respondents without domain knowledge", which
explains the low quality of the responses obtained.
P 20 Inconsistency in Responses: Inconsistency in responses mainly arises
when respondents lack a common understanding, giving two contrary answers to
the same question. For example, a respondent asked about his familiarity with
non-functional requirements marks the YES option, but when asked to explain it
in the next question, he either skips that question or gives a wrong answer
just for the sake of answering the survey. This problem can be handled by
posing the same question in different ways [123].
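One way such a cross-check could be operationalized during analysis is sketched below; the questionnaire field names are hypothetical and only illustrate the idea of flagging a claim that lacks its follow-up answer:

```python
def flag_inconsistent(response):
    """Flag a respondent who claims familiarity with non-functional
    requirements but skips or leaves empty the follow-up explanation.
    (Field names are hypothetical.)"""
    claims = response.get("familiar_with_nfr", "").lower() == "yes"
    explanation = (response.get("nfr_explanation") or "").strip()
    return claims and not explanation

# A claim with no explanation is flagged; a substantiated claim is not.
print(flag_inconsistent({"familiar_with_nfr": "yes"}))  # True
print(flag_inconsistent({"familiar_with_nfr": "yes",
                         "nfr_explanation": "performance, security"}))  # False
```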
P 21 Correctness of Obtained Responses: In large-scale surveys this
problem arises when respondents answer without thinking about the final
outcome. During analysis it creates a lot of work for the researchers, who
need to eliminate all the incorrect responses. Since a survey is about getting
the bigger picture of a particular issue, with incorrect responses researchers
fail to achieve it. Incorrect responses can be eliminated by making the survey
strictly voluntary and collecting responses only from respondents who are
willing to contribute [123].
P 22 Response Duplication: A major problem faced in open-web surveys is
response duplication, where the same respondent answers the questionnaire more
than once [81], [33], [48].
P 23 Generalizability: A survey's main aim is to generalize findings to a
larger population; generalizability increases confidence in the survey. A small
sample size is cited as the main cause for lack of generalizability. If
generalization is not possible, the whole aim of the survey is not achieved
[25], [122], [46], [125], [17], [41].
P 24 Evaluation Apprehension: Some people are not comfortable being
evaluated, which affects the outcome of any conducted study [121]. It is the
same with surveys: sometimes respondents are not in a position to answer all
the questions and instead shelter themselves by selecting safer options, which
affects the survey outcomes. Anonymity of subjects reduces this problem of
evaluation apprehension [46].
P 25 No Practical Usefulness: If a survey does not prove useful to the
respondents, they are likely to skip it. The authors of [117] show this clearly
in the following lines: "by far the study is interesting but to whom are the
results useful for?". This issue can be handled by motivating respondents with
a description of the survey outcomes and the need for answering the survey.
P 26 Respondent Reactivity: This problem is generally overlooked and can
only be identified if the researcher also thinks as a respondent. While
answering the questionnaire, respondents settle into an idea and answer
believing that their answers are right. This problem generally arises from the
question-order effect. It is a behavioral attribute of respondents and cannot
be fully mitigated, yet if unaddressed it drastically impacts the results.
Researchers can only ensure that it does not recur by taking care of the order
of the questions [48].
P 27 Obvious Conclusions: When the survey questionnaire is not clearly
understood, respondents arrive at wrong conclusions about the questions and,
as a result, answer incorrectly [117].
P 28 Reliability: Reliability issues mainly occur due to wrong or misleading
answers. For example, many respondents are not willing to admit what they truly
work on in their job positions, and some organizations face the threat of
evaluation apprehension [15]. Pilot surveys can reduce the reliability issues
in a survey [87].
P 29 Credibility: For the survey methodology to be accepted by everyone,
the results need to be clearly presented; if they are not, there is a chance
the study will not be considered. This internal validity threat can be
eliminated by using coding to categorize the responses obtained. Using more
codes for a single answer increases its categorization capacity [47].
P 30 Confidentiality Issues: In some cases software engineering researchers
would like to observe on-going trends in industry or study specific industrial
issues, but the software companies do not allow the respondents to take the
survey due to confidentiality. This problem was faced by one group of
researchers in their survey: "their companies wouldn't allow employees to take
this survey due to concerns about confidentiality" [56]. The threat can be
mitigated by sending personal emails rather than system-generated emails and
by following up with the respondents until the survey ends. If even this does
not handle the issue, it is better to arrange a personal meeting to discuss
the survey.
P 31 Bias: Bias, or one-sidedness, is a common problem during the survey
process. There are different types of bias that damage the survey outcomes.
When a study under-represents the theory involved, this is called
mono-operation bias [121]. It can be avoided by collecting data from multiple
sources, asking questions clearly and framing different questions to address
the same topic [46], [83].
process where one researcher works on the data extraction and the other
researcher reports the results [111].
Convenient Clustering: Due to the high number of problems obtained and the
inability to validate every problem using interviews, the authors conveniently
clustered all 36 problems. Finally, a checklist of 15 problems was obtained
for validation through interviews. Problem dependencies were first checked for
convenient clustering; the table showing the dependent and independent issues
is given in Appendix B.
can be recruited for a survey using GitHub, which helps to gather respondents
who are actively working in their respective fields. In this way the target
population can be identified and the expected number of responses obtained,
but the researcher must take care of the bias introduced by convenience
sampling [19], [48].
• Researchers can attract respondents by giving rewards such as Amazon points,
vouchers or gifts. They have to be careful about the responses obtained, since
respondents might answer the survey only for the sake of the rewards, or answer
it twice [19], [23].
• It is always better to know the preference of respondents before sharing the
questionnaire; some might prefer answering an online questionnaire while others
are reluctant to be interviewed face to face [56].
• Pretesting the questionnaire helps to identify its flaws [31]. Through
pre-trials it was identified that open-ended questions had to be replaced with
close-ended questions due to time limitation [56]; this shows the need for
piloting survey questionnaires.
• Web servers can be used in global surveys for data collection, as is evident
from "usage of MIT secure web server reduced the privacy concerns of
respondents during data collection" [56].
• A pre-qualification question in a survey helps to identify inconsistent
responses [23].
• Researchers should avoid two-point Likert scales (yes/no) unless it is a
critical situation; instead they are advised to use other multi-point
scales [15].
• Over-representation (the presence of respondents with a higher education
qualification than the average survey respondent) increases the validity of
the obtained responses [15].
• By default, data obtained from a survey in software engineering is biased.
The reason is that a survey just provides a snapshot of the existing tools,
techniques etc. that provides evidence for creating a grounded research plan.
Since only a snapshot can be obtained, survey results cannot be generalized to
any context; this could be one reason why surveys are believed to be
impractical [117].
• Web advertising can be used to improve survey responses, the reasons being
the existence of broadly distributed respondents and the potential to reach
them. It helps to increase the generalizability of survey outcomes [40].
• Researchers must be open-minded and willing to collaborate with other
researchers to discuss their prospects; this can largely prevent replication
of studies [15].
the need for the selection of a representative sample of the target population.
One must specify the objectives of selecting the target population clearly [26].
• Are the study objectives being addressed by the data analysis results?
2. Confidence Level: Even after careful selection there is a chance that a
bad sample is selected that does not represent the actual population.
Researchers are not always confident about their choices; the confidence
level tells them how confident they can be and whether the error tolerance
exceeds the precision specification. The confidence level can be obtained
through probability models such as the Standard Normal Distribution and
the Central Limit Theorem.
3. Population Size: The population size also has an impact on the outcomes,
particularly when the population is very small. Assuming that simple random
sampling is used for obtaining a sample, and that the sample size counts
only the number of obtained responses (not the number of requests for
questionnaire responses), there are two formulas for calculating the sample
size, depending on the population size.
• When the population is large:
n0 = (z^2 * p * q) / e^2
where n0 is the sample size, z is a point on the abscissa of the standard
normal curve that specifies the confidence level, p is an estimated
proportion of the attribute present in the population, q = 1 - p, and e is
the desired level of precision (e = 1 - precision).
• When the population is small:
When the population is small, the finite population correction factor (fpc)
can be employed to calculate the sample size. The fpc measures the extra
precision achieved when the sample size becomes closer to the population
size. Using the fpc, a revised sample size can be calculated:
nR = n0 / (1 + X), where X = (n0 - 1) / N
where nR is the revised sample size based on the fpc, N is the population
size, and n0 is the sample size calculated in the previous step for a large
population.
• When the population is very small (less than 200 individuals):
If the population size is very small, a researcher must conduct a census.
A census sample includes all the members of the population in the sample.
When the population is 200 or less, the whole population should be included
to achieve a considerable level of precision.
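The sample-size calculation discussed above (Cochran's formula for large populations, with the finite population correction for smaller ones) can be sketched as follows; the example parameter values (z, p, e, N) are illustrative, not taken from the thesis:

```python
def cochran_n0(z, p, e):
    """Sample size for a large population: n0 = z^2 * p * q / e^2, q = 1 - p."""
    return (z ** 2 * p * (1 - p)) / (e ** 2)

def revised_n(n0, N):
    """Revised sample size using the finite population correction:
    nR = n0 / (1 + X), where X = (n0 - 1) / N.
    For very small populations (N <= 200) a census is conducted instead."""
    if N <= 200:
        return N  # census: include the whole population
    return n0 / (1 + (n0 - 1) / N)

# 95% confidence (z = 1.96), maximum variability p = 0.5, precision e = 0.05:
n0 = cochran_n0(z=1.96, p=0.5, e=0.05)
print(round(n0))                      # 384
print(round(revised_n(n0, N=1000)))   # 278
```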
4.3.2 Impact of Response Rate on Survey Responses
Out of all 66 primary studies, only 38 reported their response rates, either by
explicitly specifying a response rate or by giving the number of invitations
sent and responses obtained. 14 studies discussed the responses obtained in
their particular study, but not the email invitations sent or the sample
selected. In the remaining 14 studies, researchers did not report the response
rates. The histogram in Figure 4.5 shows the distribution of response rates
across the primary studies. The available data cannot be generalized to a
whole population; we can only infer that the reporting of results in software
engineering needs to be improved. We cannot estimate an overall response rate,
but we observe that the way primary studies are reported needs to be improved
by software engineering researchers. Through the SLR we found that not all the
primary studies discussed the response rates they obtained. Every survey may
not have the same context, every survey might not lead to the expected
outcomes, and the population answering differs from survey to survey, so the
results obtained cannot be generalized to every survey. It must be clearly
understood that an increase in response rate does not mean that the
generalizability of results increases. However, a few recommendations given by
one primary study provide a starting point for calculating the impact of
response rates in a survey.
Smith et al. [107] addressed the issue of improving the participation of
developers in a survey. The authors studied the existing literature and
formulated a set of factors for improving the response rate of surveys. They
divided the factors into two subsections for clarity: the first concerns
persuasion research, where the authors drafted factors for improving
compliance, and the second category of factors is based on the authors'
experience of conducting surveys. Listed below are the factors that help to
improve the response rate:
Based on Persuasion Research
• Reciprocity: The situation where respondents answer a survey more than
once, which helps to increase the survey responses. Researchers can induce
reciprocity by giving rewards. Smith et al. [107] were not sure whether this
practice is actually useful in the software engineering domain, as researchers
may thereby bias their own results.
• Authority and Credibility: Compliance with any kind of survey can be
increased by the authority and credibility of the person administering it.
Researchers can exploit this by including official designations such as
Professor, Dr. or Asst. Professor in the signature of the survey request
mail. In this way the response rate of a survey can be increased.
• Liking: Respondents tend to answer surveys from people they know. The
responsibility of gaining the trust and liking of respondents lies with the
researchers. Liking can lead to an increased response rate.
• Social Benefit: The authors describe that more respondents finish a survey
if it benefits a large group instead of a particular community. Researchers
must convince the respondents that their survey benefits the larger population
by explaining the impact of the survey outcome, and thereby obtain a high
response rate.
• Timing: The time at which an email survey is sent also affects its response
rate. Sometimes respondents tend to clear their in-boxes and answer the survey
just for the sake of completion. Researchers should avoid sending survey
invitations during office hours on weekdays. They have to be very careful when
selecting the time to send the mail; one study shows that respondents tend to
answer emails right after their lunch [107].
When proper conditions are present, the number of responses obtained using
e-mail surveys is higher than with fax or postal surveys, and the quality of
responses obtained by mail is also higher compared to other sources [107].
In the operation phase after the survey is done, Kitchenham and Pfleeger
[67] describe that a researcher must follow the given sequence of techniques
before doing the standard analysis on any survey outcome.
From the coded data, data statistics and population statistics are possible;
for a sample obtained through non-probabilistic sampling, population statistics
cannot be generated [67]. Wohlin et al. [121] describe standard data analysis
as the first step of the quantitative interpretation phase, where the data are
represented using descriptive statistics visualizing central tendency,
dispersion, dependency etc. The next step is data set reduction, where invalid
data points are identified and excluded. Hypothesis testing is the third step,
where statistical evaluation of the data is done at a given level of
significance.
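The first two steps described by Wohlin et al. can be sketched as follows; the data set is a hypothetical 5-point Likert sample, invented purely for illustration:

```python
import statistics as st

# Hypothetical 5-point Likert responses; 30 is an invalid data point.
responses = [3, 4, 4, 5, 2, 4, 30, 3]

# Step 1: descriptive statistics -- central tendency and dispersion.
print(st.mean(responses), st.median(responses), round(st.stdev(responses), 2))

# Step 2: data set reduction -- identify and exclude invalid data points
# (values outside the 1..5 scale).
valid = [r for r in responses if 1 <= r <= 5]

# Step 3 (hypothesis testing) would follow, e.g. testing the reduced data
# against a reference value at a chosen significance level.
print(round(st.mean(valid), 2))  # 3.57
```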
Final Result: From the literature study we infer that response rate and
sample size do impact the survey outcomes[67], [107]. They go hand-in-hand in
the whole survey process. More responses can be obtained if the sample size is
increased but the inverse is not possible. Even if more responses are obtained,the
sample size wouldn’t be the only deciding factor for obtaining responses.Literature
shows response rate depends on persuasion research and personal experience
as well and sample size depends on precision, confidence level and population
size.Researchers look for both quality and quantity of responses. Even though
sampling is done by the researchers the response rate is totally out of researcher’s
control. Pertaining to analysis techniques analysis techniques is selected based
on way in which survey is designed and type of scale used.This statement given
by Kitchenham and Pfleeger [67] validates our research outcomes “specific data
analysis a researcher needs depends on the survey design and scale type (nominal,
ordinal, interval, ratio, etc.)”. From the above discussion it can be seen that all
the three variables depends on derivable entities, opinions and design of survey;
Chapter 4. Results and Analysis 57
these are specific to a particular survey and cannot be generalized to all surveys.
Thus, the impact of the variables can only be computed for a particular survey.
This sums up our result that the impact of variables like sample size, response rate
and analysis techniques varies from one survey to another and cannot
be computed and generalized.
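For illustration, the dependence of sample size on precision, confidence level and population size can be made concrete with Cochran's formula plus a finite population correction. The sketch below is our own addition, not a formula taken from the reviewed primary studies:

```python
import math

def required_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's sample size with finite population correction.

    z      - z-score for the confidence level (1.96 ~ 95%)
    margin - desired precision (margin of error)
    p      - expected proportion (0.5 is the most conservative choice)
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population estimate
    n = n0 / (1 + (n0 - 1) / population)        # correct for finite population
    return math.ceil(n)

# e.g. a population of 1000 practitioners at 95% confidence, 5% margin
print(required_sample_size(1000))  # -> 278
```

The formula makes the trade-offs in the text visible: tightening the margin or raising the confidence level grows the required sample, while a small population shrinks it through the correction term.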
the transcripts prepared from all interviews. As explained above, our transcripts
were prepared immediately after the interviews. We made field
notes during each and every interview to make sure that the interviewees'
exact viewpoints and their suggestions about our research were penned
down during the interview itself. We collected all this information
and documented it as part of the data extraction process. We went
through all the interview transcripts several times in order to familiarize ourselves
with the information we had extracted from our interviews,
both verbally and non-verbally. We made sure that we had a clear idea of
all the information which we had extracted [22].
• Translation of Codes into themes: After all the data was coded, we
generated several codes. All the codes were translated into themes
according to the information they contained. Our main aim in translating the coded
information into themes was to gather all similar information under one theme. This
also helped us in analyzing the information which we collected.
All the themes shown in Figure 4.7 above have been listed along
with their codes in Appendix G.
P1. Sampling Problems: All the interviewees have one thing in common:
they strongly believe that everyone who claims to use random or stratified
sampling has actually done convenience sampling. The reason for this is the infeasibility
of getting a proper representative sample of the population. Using random
sampling, everyone from the same domain cannot be included for analysis. Also,
the respondents selected using a random sample lack motivation, as they might
not know what the survey is being done for, or they might misinterpret the
survey; this way, again, noise is introduced into the survey results. Some
said that random convenience is a better option. Stratified sampling is believed to
be the hardest, most expensive and most time consuming, the reason being the difficulty
of getting a proper "strata" from the given population. Due to the self-selection process
it follows, all of them recommended the usage of convenience snowballing.
In convenience snowballing the population characteristics are known beforehand,
and researchers select respondents based on their choice. The questionnaire is then filled in
and the respondents are asked to forward it to their peers. This way quality
responses are obtained. Sub-grouping the population for the sake of sampling
would then reduce the chances of getting more responses, and might also induce bias.
P2. Questionnaire Problems: Properly designing a questionnaire requires
considerable background work. The problem of poor questionnaire design can be
mitigated but cannot be completely eliminated. Question formulation must be
done with great care. A question must be understandable. Direct questions, consistency
questions, demographic, non-contradictory, non-overlapping, non-repeated, independent
questions must be asked to obtain vital information. Cross analysis must
be done to check respondents' commitment. A question must have valid options.
With pilot sessions the questions can be reformulated to align with the objectives.
P3. Questionnaire Length: A questionnaire should be short and precise.
It must strike a balance between time and number of questions. Interruptions
might occur while answering the questionnaire; researchers should expect this
while designing a survey.
P4. Open-ended & close-ended questions: A survey should have both
open-ended and close-ended questions. Close-ended questions save time and are easy to
analyze, but open-ended questions give deeper insights into the study. Only committed
respondents answer open-ended questions, which is why they are said to increase the confidence
of a researcher. By using proper constructs the respondents can even
be categorized. The number of such questions depends on the information need and on the
research questions. Using free text boxes along with close-ended questions was
recommended by many researchers. When formulated with a common understanding,
the results obtained from both kinds of questions help to achieve efficient analysis
data.
P5. Question-order effect: This effect should be addressed in a survey. Randomizing
the questions won't work in software engineering because logical adherence
might be lost. If the questions are self-contained then it can be done,
whereas otherwise a respondent might lose the context of the question. There is a logical
order for asking questions, which also helps us to categorize the data. Branching can
be done for the questionnaire, after which randomization can be applied. A survey
tool helps a lot in this regard. Randomization is possible if there are sub-groups of
questions, and can also be done for a few background questions.
P6. Likert Scale: It depends on the researcher whether to use them or not. They
can be used to visualize the results. Improper usage of a Likert scale confuses the
respondents, who might go for the neutral point on an odd scale. When an even
scale is used, the researcher is forcing the respondent onto one side. Check-boxes
should be used after every question. A pre-survey should be done and the results of
at least 5 survey responses must be visualized during analysis. Options must be mutually exclusive.
Odd scales can be used over even ones, so that there is a neutral point every time. A 5-point
scale can be used as it is more established in ICT. Researchers must have an idea
of the potential weaknesses of using scales. Culture and country must also be
considered during scale selection.
P7. Sensitive questions: These kinds of questions should generally be avoided;
if asked, they should be placed at the end. Respondents give answers based on the
level of abstraction in the question context. Respondents expect to be anonymous
when answering such surveys. While doing a global survey the Likert scale must
be used consciously (due to cultural barriers). Respondents check the credibility of the
source while answering these questions. No personal questions should be asked of
the respondents.
P8. Time Limitation: Researchers claim that having deadlines helps them to
get more responses. The average completion time should be mentioned. Don't use countdown
timers for each question, and don't close a survey just because you're done with the process.
Many researchers claim that a small survey of 5 minutes is not a big issue;
at 10 minutes respondents would still answer even though it is pushing
them; at 15 minutes it has to be a genuinely good survey. 20-minute surveys mainly
fail due to lack of responses.
P9. Domain Knowledge: This problem cannot be eliminated completely.
It is not a problem with mail-based surveys, where the respondent's professional
background is known; the main problem is with open web surveys, where the
survey is answered by many unknown individuals. A researcher must not
have any preconceptions or expectations at first. This can be mitigated by
targeting a specific population, motivating the respondents to be truthful about
their expertise, using demographic questions, and doing cross questioning. There must
not be any loaded questions, because respondents might become clueless. Remove
existing outliers in the survey. Simply putting options like "I don't know" or "I
don't want to answer" will encourage respondents to be truthful, and it also helps
to rule out inconsistent responses. One of the interviewees suggested performing
face-to-face interviews for further validation.
P10. Context: There are situations where respondents misunderstand the
question context; this is a common problem in every survey and cannot be
neglected. Simple things like maintaining questionnaire clarity, using an appropriate
data collection method, inculcating trust in respondents, and sticking to a common
methodology can help the researcher deal with this problem. The researcher must
be approachable if there are any doubts to be cleared. Piloting with 5 to 10
people helps to design the questionnaire clearly. Trust plays an important role in this
context. Internal research validation should be employed first, followed
by piloting.
P11. Hypothesis Guessing: This is not a problem in the case of exploratory
surveys. Hypothesis guessing can be eliminated by not asking loaded questions,
not prompting the respondents, having more options for a question rather
than one, and selecting potential candidates. The use of indirect and
direct questions plays a major role in testing whether the respondent is on the
right track or not. Respondents should not be influenced; instead they should be
motivated to be truthful on their part. The researchers must not reveal their stake to the
respondents, to avoid hypothesis guessing.
P12. Language Issues: This is one of the major problems when conducting
global surveys. It cannot be completely eliminated but can be mitigated: keeping
the language as plain as possible, using local translations, checking understandability, consulting
senior researchers, and contacting researchers in the same domain from the same origin.
If a survey has to be rolled out, it must be done through a proper channel so that
it reaches the target population. A survey questionnaire in two different languages
would be of great help for interpretation. In such cases the researcher might get radical
answers. Google Translate must not be used for language translations.
P13. Cultural Issues: This happens in the case of global surveys. It is sometimes
over-hyped or used as an excuse. In software engineering this is a problem
of varied cultures. Pre-surveys can give information about what to expect
from a survey, as well as information about any overlooked flaws. Using proper
nomenclature, sticking to a common base, avoiding sensitive questions, being
clear and concise with the questionnaire, doing meta-analysis and identifying causes
will help researchers handle cultural issues. Whenever possible,
researchers should use face-to-face interviews to gain the trust of the respondents and
get better insights. Researchers must also keep individual differences in mind.
P14. Generalizability: Ideally, generalizability isn't achievable. The main
reason is that researchers cannot explicitly define and target the population.
To make generalization happen, they should instead anonymize the results.
In one way researchers try to generalize the results; in another way they are unknowingly
introducing bias as well. Demographic questions help to easily categorize the
obtained data. A proper analysis method and reporting help the researcher generalize
the results under some constraints. Applicability and survey value diminish
over time, since a survey is just a snapshot of a particular situation.
P15. Reliability: In order to ensure reliability, researchers must check
whether the respondents are really committed to the survey or not. Demographic,
consistency and redundant questions can help to find the reliable responses
in a survey. It is important to rule out people with a hidden agenda, or else
they will spoil the survey outcomes. Before removing any outcome, there must
be proper evidence explaining the impact of its presence. The reliability
of results might vary from one situation to another.
P16. Bias: It is very difficult to prevent. It happens due to process repetition.
No survey process is complete without addressing bias. The researcher needs to
identify the bias and report it correctly in the study. The researcher must know how
to balance respondents and reliability. Bias mainly happens due to irrelevant
answers; these must be removed. Researchers must be open to questions and
shouldn't mask their mistakes. Repeating the process and investigating the same
thing again will reduce bias and reveal more data patterns. Researchers must know
their respondents. Any kind of bias must be admitted. Bias varies from one
network to another.
P17. Low Response Rate: The response rate can be increased if the questionnaire
has no contradictory questions and no overlapping questions. There must be personal
involvement by the researcher. Respondents mustn't be baited;
instead they have to be motivated on why they should take the survey and the
benefits they derive from it. Anonymity must be preserved. Convenience snowballing
can fetch an additional number of responses if extended to LinkedIn and the most
visited blogs and forums. Posting and re-posting the survey link in such social
networks will keep it on top and help obtain diversified responses,
though there is a major risk to reliability and generalizability. Attending conferences
related to the survey domain can also increase the response rate. Respondents
must be selected with great care. Survey time and questionnaire length must be
specified beforehand. Questions must be framed in such a way that the feeling of
being assessed is masked for the respondents. Since incorrect responses are obtained
in every survey, researchers need to properly identify them and motivate
the reason behind the exclusion of such responses.
P18. Inconsistency: Inconsistency can be identified by including mandatory
questions, redundant questions and optional questions. Many researchers take this as an
excuse for exclusion. Barriers between questions must be included and triangulation
between questions must be done. Cookies can be tracked for each question.
The researcher must honestly report the inconsistencies in the document.
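One way redundant questions expose inconsistency is by pairing a question with a reverse-coded copy and checking that the two answers agree. The sketch below is our own illustration of that idea; the question ids, the 5-point scale and the one-point tolerance are all assumptions made for the example.

```python
def find_inconsistent(responses, pair=("q3", "q7"), scale_max=5, tolerance=1):
    """Flag respondents whose answer to q7 (assumed to be a reverse-coded
    redundant copy of q3) disagrees with their answer to q3.

    On a 1..scale_max scale a consistent pair satisfies
    answer(q3) + answer(q7) == scale_max + 1, within a tolerance.
    """
    expected = scale_max + 1
    flagged = []
    for respondent, answers in responses.items():
        a, b = answers[pair[0]], answers[pair[1]]
        if abs((a + b) - expected) > tolerance:
            flagged.append(respondent)
    return flagged

# respondent "r2" agreed strongly with both a statement and its reversal
data = {"r1": {"q3": 4, "q7": 2}, "r2": {"q3": 5, "q7": 5}}
print(find_inconsistent(data))  # -> ['r2']
```

Flagged respondents would then be examined (not automatically excluded), in line with the recommendation above to report inconsistencies honestly rather than use them as an excuse for exclusion.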
P19. Response Duplication: It is not feasible to know for certain. It is not a problem in
paper-based surveys. It cannot be eliminated, but it can be identified and handled:
IP addresses can be crosschecked, a consistency check can be made throughout the
survey, one-time links can be sent directly to respondents' mail, and survey tools can monitor
duplication. The researcher should specify the requirements and the
need for genuine support upfront. Tracking session cookies while respondents answer the
survey gives information about how many times the respondent paused and
resumed while answering. Course analysis can be done for additional help.
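The IP crosschecking and answer-fingerprint consistency checks mentioned above can be sketched as follows. The field names and the fingerprinting choice are our assumptions for illustration; real survey tools implement such checks internally.

```python
from collections import Counter

def flag_possible_duplicates(responses):
    """Return responses that share an IP address or an identical
    answer fingerprint with another response."""
    ip_counts = Counter(r["ip"] for r in responses)
    fingerprint_counts = Counter(tuple(sorted(r["answers"].items()))
                                 for r in responses)
    return [r for r in responses
            if ip_counts[r["ip"]] > 1
            or fingerprint_counts[tuple(sorted(r["answers"].items()))] > 1]

data = [
    {"id": 1, "ip": "10.0.0.1", "answers": {"q1": 3, "q2": 5}},
    {"id": 2, "ip": "10.0.0.1", "answers": {"q1": 2, "q2": 4}},  # same IP as 1
    {"id": 3, "ip": "10.0.0.9", "answers": {"q1": 1, "q2": 1}},
]
print([r["id"] for r in flag_possible_duplicates(data)])  # -> [1, 2]
```

A shared IP only indicates *possible* duplication (e.g. colleagues behind one corporate gateway), which is why the interviewees treat this as something to handle, not eliminate.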
P20. Rewards: Never give rewards; doing so itself shows a researcher is biasing
his/her research. If rewards are given, they should be given at the end, not
upfront; that way only committed responses are taken. Motivate the
respondents by telling them the outcome of your research and encourage them to take
up your survey. If rewards are to be given, then hand them over to each respondent
personally; this might reduce the number of people taking surveys just for the sake of
rewards.
• Reproduce knowledge.
• Pre-qualification question.
Chapter 5
Discussions and Limitations
5.1 Discussions
This section answers the research questions by comparing the results
obtained from the SLR and the interviews. We include our critical assessments and reflections
on our research in this section. These discussions provide the basis for the
results reported in the Results chapter. First, the comparison of results from
both studies is described for each research question, followed by the research
findings.
By conducting the systematic literature review we obtained the list of common
problems faced by researchers in the survey process, including the effect of variables
on survey outcomes. We listed all the problems with their
descriptions in the SLR results section. We then validated a few problems by conducting
face-to-face interviews with software engineering researchers. Constraints
like the 1-hour interview time and the busy schedules of researchers limited us to including
only a few problems in the interview process. The problems validated using
interviews were conveniently clustered, and the clustering table is given in
Appendix B. The problems validated using interviews were finalized after
brainstorming sessions between the two researchers and discussions with the supervisor.
RQ1: What are the common problems faced by software engineering researchers
while conducting a survey?
P 1 Sampling Method: In the literature study many authors described
the usage of convenience sampling methods; lower cost and less troublesome implementation
were the two reasons that motivated them to do so. This was validated by the
interviewed researchers, who confirmed their usage of convenience sampling in their research.
Some also discussed their usage of convenience snowballing and random convenience
sampling in their research. The researchers argued that in many published
studies the authors write that they used random sampling when in an actual sense they
used convenience sampling; the reason is that the authors contact peers to answer
their surveys but report the usage of random sampling. Further, they
cautioned about biasing the results due to convenience sampling.
P 2 Survey Questionnaire Design: In the literature study authors reported
tions was another recommendation. Including the options “I don’t know” and “I
don’t want to answer” in questionnaire was another recommendation.
P 7 Hypothesis Guessing: In the literature, authors tried to mitigate this
issue by stressing respondents' honesty in the survey's introduction and posting a
video for clarity. Apart from these two, the interviewees
mentioned using proper sample selection and not asking loaded questions. In both the SLR
and the interviews, hypothesis guessing was categorized as a major problem.
P 8 Translation Issues: Both in theory and practice, researchers used
translators of the same origin for translating their surveys. Additionally, interviewees
recommended drafting the survey in different languages by involving researchers of
the particular country working in the same domain. They cautioned against the use of Google
Translate, as the whole meaning of the questionnaire might change.
P 9 Culture Issues: In the literature this was defined as a problem and only one
mitigation strategy was discussed. In the interviews some researchers
didn't consider it a major issue, while others suggested the usage of a common
base; avoiding sensitive and contradictory questions; and sticking to a basic
language. Meta-analysis and identifying the cause of the cultural issue was done by
two researchers in their research when facing this issue. Doing face-to-face
interviews when facing cultural issues was one of the recommendations given.
P 10 Response Duplication: This is viewed as a major problem in every
software engineering survey. Tracking IP addresses was the mitigation strategy published
in the literature. Interviewees described response duplication as a
problem of open web surveys; they suggested that tracking IP addresses, consistency
checks, one-time links, and using survey tools minimize its effect. They said it
cannot be eliminated, only handled to some extent.
P 11 Inconsistency of Responses: Cross questioning was the strategy
used by the authors in the literature to handle inconsistency of responses. Interviewees
argued that inconsistency shouldn't be used as an excuse to exclude
respondents. Triangulation in questioning, identification and honest reporting
were suggested as recommendations.
P 12 Bias: Mono-operation bias, over-estimation bias and social desirability
bias were the three types of bias identified from the literature study; data collection
from multiple sources was the only mitigation strategy suggested, for over-estimation
bias. We could not find a mitigation strategy for mono-operation
bias in our SLR study. When interviewees were asked about the biasing problem in
surveys, they said it cannot be completely eliminated but only handled to
some extent. They stated that bias must be identified and reported clearly.
In their opinion, giving rewards biases the whole research process, and a piloting process
can also reduce bias in survey results.
P 13 Reliability: Every survey has the problem of reliable responses, which
might be due to wrong answers given in the survey. The literature showed us that the
reason for this might be respondents trying to be positive. Pilot studies were
used by authors to handle the issue of reliability. Interviewees highlighted the
• No sub-grouping.

2. Poor Questionnaire Design:
• Use consistent, demographic, non-contradictory, non-overlapping, non-repeated, combinatorial questions.
• Do a pre-survey.

5. Time Limitation:
• Specify average time in the survey introduction.
• No loaded questions.

8. Language Issues:
• Native language translators.
• Do meta-analysis.
• Piloting.
• Admit bias.

13. Inconsistency of Responses:
• Do cross questioning.
• Barriers between questions.
• Motivate respondents to be truthful.
• Do consistency checks.
• Preserve anonymity.
RQ2: How do the variables like sample size, response rate and analysis meth-
ods affect the survey outcomes?
motivated us to openly report that we could not achieve the required
outcomes for RQ2. Based on the selected primary studies we reported how the
sample size is selected, the number of studies properly reporting response rates,
and the list of all analysis techniques chosen by primary studies along with the
most frequently chosen analysis technique. Poor reporting, which was found to be a major
issue in software engineering, was also validated from the findings of Cater-Steel et
al. [15]. In this way we addressed the second research question in our research.
RQ3. What are the experiences gained by software engineering researchers
by conducting surveys?
This research question was framed with the motivation of validating the two
previous research questions. Through the interview process we validated those
two research questions. We also collected the experiences of software engineering
researchers. During the interviews we asked them about the need for a checklist
of possible problems and mitigation strategies. The ratings below show the need for a
problems checklist for conducting surveys in software engineering.
Interviewee No   1  2  3  4  5  6  7  8
Rating given     5  3  4  5  5  3  6  5
In the above table, rating values range from 1 (not needed) to 5 (very important).
The first article is a study of nine surveys that were classified according to their
use. Almost all the surveys included demographics. Five-point
Likert scales were used by the authors. Leadership, policies and procedures,
staffing, communication and reporting were the common dimensions which the
authors used to review the surveys. The surveys used different analysis techniques:
item analysis, exploratory factor analysis, confirmatory factor analysis, Cronbach's
alpha analysis, test/retest reliability, composite score combination and variance
analysis were a few of the analysis techniques highlighted in this
article [20].
Our research can be made applicable to the above context, since the authors
specified the usage of Likert scales, analysis techniques, reporting issues, etc. We
tried to obtain a suitable way of addressing Likert scale problems, highlighted the most used
analysis technique, and also found that researchers are lacking in
reporting their results.
The second article is a review of 62 surveys from 12 countries that included
almost 23,000 prisoners. Prevalence was calculated to find a pattern and link
all the data points in the selected population. Chi-squared tests were used by the
authors to compute prevalence. A weighted mean was considered during calculation
of the baseline for prevalence. Demographics were given higher priority while
categorizing the results. A large survey was carried out which consisted of such
demographic questions. Simple random sampling was the sampling strategy chosen
by the authors for carrying out their research [34].
Our research would help the authors of the above article by giving suggestions for
asking demographic questions and an idea of the possible problems due to improper
sampling strategy selection. The authors would benefit from using our research to get
better outcomes.
The third article examines the prevalence and patterns of using Chinese medicine
in treating cancer patients. The authors studied nearly 99 studies after going through
411 studies. The authors considered demographics like year, country, group setting,
data collection methods, questionnaire types, sampling methods, etc. as the basic
characteristics to categorize the studies [14].
As with the first article considered, our study would help the authors of the third
article to better understand sampling techniques, questionnaire design
patterns, selection of analysis techniques, etc.
This was just an attempt to show how our study could be applied to
surveys in other domains. Studies focusing on surveys from the software engineering
domain and the social sciences domain must be conducted to fill the gap that
differentiates the various fields. Such work in turn helps researchers
who are publishing work on surveys to generalize their findings. Since this kind
of research is out of our scope, it is mentioned in the future work section of this
document.
Neglecting paid articles and only considering freely accessible articles is a
threat, since all the literature might not be covered. This threat couldn't be fully mitigated,
but we tried to check other databases, search on Google and check the authors'
websites to get access to paid articles that seemed highly relevant to our
study.
Another internal validity threat, due to poor instrumentation, was handled
by maintaining proper data extraction forms. One data extraction form was
maintained by each researcher: initially data was extracted from a set of articles
by one researcher, then data extraction was done by the second researcher on the same
set. Both forms were then compared, and further discussions were held on the
inclusion and exclusion criteria. Inspections by the supervisor also ensured the
quality of our instrumentation.
Improper clustering could be another possible threat, since both authors were
novice researchers. This threat was mitigated by consulting the supervisor. Including
the clustering table in Appendix B helps future readers to understand our
grouping and analyze the document accordingly.
The threat to instrumentation in interviews concerns the questionnaire
design. Before designing the questionnaire, the objectives of conducting an interview
were clearly defined. After brainstorming and discussions with the supervisor,
the questionnaire was designed. It was made sure that the questionnaire was
understood by the interviewees; if not, they could ask, and the context of
the question was clearly explained. Sound recordings were used to collect data
during interviews, and software was used to transcribe the data to avoid losses.
The researchers have considered and validated only the problems found
in the obtained primary studies. Due to the exclusion criteria there is a
chance that we might have excluded a few studies with problems. The mitigation
strategy employed was listing the titles of all 66 primary studies considered.
In this way a researcher will have an idea of what kinds of problems were addressed
during this research.
External validity: The conditions that limit a researcher's ability to generalize
the experimental findings are called external validity [121]. In our case we
conducted interviews to validate our SLR results, so there could be an issue
of generalizability: through interviews we can only get experiences from a selective
sample population. Rather than full generalizability of results, we tried to
make them more applicable. To make this happen we selected researchers
from a broad experience spectrum; our sample included novice researchers (PhD
students with 3-4 years of experience), competent researchers (8-10 years of
experience) and well-experienced researchers (30 years of experience).
However, we managed to interview 8 software engineering researchers, which would
be a sufficient number for generalizing our results.
Construct validity: This threat deals with the relationship between
observation and theory; in other words, it is the extent to which the
experimental setting is actually reflected in a study [121]. This threat can be of
relevance to the research design. In the SLR process, the search string used might
not cover all the research articles published in the domain. We minimized this
threat by designing the search string in discussion with the librarian and supervisor.
A thesaurus was used to search for synonyms. After the search string was designed, it
was used only after our supervisor had tested it.
Assigning codes: While coding the interview data, chances are that we might
have wrongly interpreted and coded it. To mitigate this threat, the data after
coding was crosschecked against the actual descriptions from the interviews. The initial SLR
study helped us gain deeper insights into surveys and helped us mitigate this
issue.
Conclusion validity: Sometimes a researcher cannot draw conclusions from
the existing data; these kinds of issues come under conclusion validity [121]. Conclusion
validity threats can arise when synthesizing the data obtained from the SLR and
interviews. Research objectives were drafted initially,
both researchers had a clear idea of the expected outcomes, and constant revision
of the obtained data and meetings with the supervisor helped us mitigate
this threat and finally achieve the expected outcomes.
Scope: Owing to the complexity of the initially selected problem domain, there was
a risk that the thesis work might be misunderstood. To mitigate this
risk, the scope of our thesis project was adjusted with the assistance of the thesis
supervisor, also considering the research contribution. Adjusting
the scope was based on consent between the supervisor and the authors. The
final scope in which the research was carried out has been specified in Chapter 1 of
this report to avoid misunderstanding.
Chapter 6
Conclusions and Future Work
6.1 Conclusions
In this research the authors identified the common problems faced by software
engineering researchers while conducting surveys. Initially a systematic literature
review was conducted to identify common problems faced by software engineering
researchers in the survey process. Then face-to-face interviews were conducted to
gather the opinions of software engineering researchers. The research findings
from both the SLR and the interviews were then compared, and a checklist of problems
along with mitigation strategies was drafted.
Some problems identified by the systematic literature review had mitigation strategies,
while others were only specified as problems in the primary studies. A few problems
were shortlisted, and the interviewees (software engineering researchers) were
asked about their ways of mitigating such problems. The interviewees' opinions,
upon analysis, showed that along with the mitigations found in the SLR, other
strategies were also being used. Their opinions were color-coded and presented
as analysis.
Along with the problems, the aim of our study was to investigate the impact
of sample size, response rate and analysis techniques on survey outcomes.
Through the literature review it was found that the selection of such variables varies
from survey to survey. These findings were supported by the interviewees' arguments
about the variables' dependence on the type of research, researcher
choice, research questions, scales used, etc. This variable dependency
was the major limitation of our study and prevented us from continuing
our research further.
Research Contribution:
Bibliography
[12] Virginia Braun and Victoria Clarke. Using thematic analysis in psychology.
Qualitative research in psychology, 3(2):77–101, 2006.
[13] Ricardo Britto, Emilia Mendes, and Jürgen Börstler. An empirical inves-
tigation on effort estimation in agile global software development. In 2015
IEEE 10th International Conference on Global Software Engineering, pages
38–45. IEEE, 2015.
[14] Bridget Carmady and Caroline A. Smith. Use of Chinese medicine by cancer
patients: a review of surveys. Chinese medicine, 6(1):1, 2011.
[15] Aileen Cater-Steel, Mark Toleman, and Terry Rout. Addressing the chal-
lenges of replications of surveys in software engineering research. In 2005
International Symposium on Empirical Software Engineering, 2005., pages
10–pp. IEEE, 2005.
[17] Yguaratã Cerqueira Cavalcanti, Paulo Anselmo da Mota Silveira Neto, Ivan
do Carmo Machado, Eduardo Santana de Almeida, and Silvio Romero
de Lemos Meira. Towards understanding software change request assign-
ment: a survey with practitioners. In Proceedings of the 17th International
Conference on Evaluation and Assessment in Software Engineering, pages
195–206. ACM, 2013.
[18] Marcus Ciolkowski, Oliver Laitenberger, Sira Vegas, and Stefan Biffl. Prac-
tical experiences in the design and conduct of surveys in empirical soft-
ware engineering. In Empirical methods and studies in software engineering,
pages 104–128. Springer, 2003.
[19] Jürgen Cito, Philipp Leitner, Thomas Fritz, and Harald C. Gall. The mak-
ing of cloud applications: An empirical study on software development for
the cloud. In Proceedings of the 2015 10th Joint Meeting on Foundations
of Software Engineering, pages 393–403. ACM, 2015.
[21] Reidar Conradi, Jingyue Li, Odd Petter N. Slyngstad, Vigdis By Kamp-
enes, Christian Bunse, Maurizio Morisio, and Marco Torchiano. Reflections
on conducting an international survey of Software Engineering. In 2005
International Symposium on Empirical Software Engineering, 2005., pages
10–pp. IEEE, 2005.
[22] Daniela S. Cruzes and Tore Dyba. Recommended steps for thematic synthe-
sis in software engineering. In 2011 International Symposium on Empirical
Software Engineering and Measurement, pages 275–284. IEEE, 2011.
[23] Ermira Daka and Gordon Fraser. A survey on unit testing practices and
problems. In 2014 IEEE 25th International Symposium on Software Relia-
bility Engineering, pages 201–211. IEEE, 2014.
[26] Rafael Maiani de Mello, Pedro Correa da Silva, and Guilherme Horta
Travassos. Sampling improvement in software engineering surveys. In Pro-
ceedings of the 8th ACM/IEEE International Symposium on Empirical Soft-
ware Engineering and Measurement, page 13. ACM, 2014.
[27] Rafael Maiani de Mello, Pedro Corrêa Da Silva, and Guilherme Horta
Travassos. Investigating probabilistic sampling approaches for large-scale
surveys in software engineering. Journal of Software Engineering Research
and Development, 3(1):1, 2015.
[28] Tore Dyba. An empirical investigation of the key factors for success in soft-
ware process improvement. IEEE Transactions on Software Engineering,
31(5):410–424, 2005.
[29] Evgenia Egorova, Marco Torchiano, and Maurizio Morisio. Evaluating the
perceived effect of software engineering practices in the Italian industry.
In International Conference on Software Process, pages 100–111. Springer,
2009.
[31] Asim El Sheikh and Haroon Tarawneh. A survey of web engineering prac-
tice in small Jordanian web development firms. In The 6th Joint Meeting on
BIBLIOGRAPHY 83
[43] Kiely Gaye, Finnegan Patrick, and Butler Tom. MANAGING GLOBAL
VIRTUAL TEAMS: AN EXPLORATION OF OPERATION AND PER-
FORMANCE. 2014.
[44] Yaser Ghanam, Frank Maurer, and Pekka Abrahamsson. Making the leap
to a software platform strategy: Issues and challenges. Information and
Software Technology, 54(9):968–984, 2012.
[45] Lisa M. Given. The Sage encyclopedia of qualitative research methods. Sage
Publications, 2008.
[46] Tony Gorschek, Ewan Tempero, and Lefteris Angelis. A large-scale em-
pirical study of practitioners’ use of object-oriented concepts. In 2010
ACM/IEEE 32nd International Conference on Software Engineering, vol-
ume 1, pages 115–124. IEEE, 2010.
[49] Michaela Greiler, Arie van Deursen, and Margaret-Anne Storey. Test con-
fessions: a study of testing practices for plug-in systems. In 2012 34th
International Conference on Software Engineering (ICSE), pages 244–254.
IEEE, 2012.
[50] Heather Haeger, Amber D. Lambert, Jillian Kinzie, and James Gieser. Us-
ing cognitive interviews to improve survey instruments. Association for
Institutional Research, New Orleans, 2012.
[53] Irum Inayat and Siti Salwah Salim. A framework to study requirements-
driven collaboration among agile teams: Findings from two case studies.
Computers in Human Behavior, 51:1367–1379, 2015.
BIBLIOGRAPHY 85
[54] Martin Ivarsson and Tony Gorschek. A method for evaluating rigor and
industrial relevance of technology evaluations. Empirical Software Engi-
neering, 16(3):365–395, 2011.
[55] Samireh Jalali and Claes Wohlin. Systematic literature studies: database
searches vs. backward snowballing. In Proceedings of the ACM-IEEE inter-
national symposium on Empirical software engineering and measurement,
pages 29–38. ACM, 2012.
[56] Junzhong Ji, Jingyue Li, Reidar Conradi, Chunnian Liu, Jianqiang Ma, and
Weibing Chen. Some lessons learned in conducting software engineering
surveys in china. In Proceedings of the Second ACM-IEEE international
symposium on Empirical software engineering and measurement, pages 168–
177. ACM, 2008.
[57] James Jiang and Gary Klein. Software development risks to project effec-
tiveness. Journal of Systems and Software, 52(1):3–10, 2000.
[58] F. Joseph Jr. Hair Jr, William C. Black, Barry J. Babin, and Rolph E.
Anderson, Multivariate data analysis. Prentice Hall, 2009.
[60] Katja Karhu, Ossi Taipale, and Kari Smolander. Investigating the relation-
ship between schedules and knowledge transfer in software testing. Infor-
mation and Software Technology, 51(3):663–677, 2009.
[61] Mark Kasunic. Designing an effective survey. Technical report, DTIC Doc-
ument, 2005.
[62] Daljit Kaur and Parminder Kaur. Software Development Life Cycle Secu-
rity Issues. In 2ND INTERNATIONAL CONFERENCE ON METHODS
AND MODELS IN SCIENCE AND TECHNOLOGY (ICM2ST-11), vol-
ume 1414, pages 237–239. AIP Publishing, 2011.
[66] Barbara Kitchenham, O. Pearl Brereton, David Budgen, Mark Turner, John
Bailey, and Stephen Linkman. Systematic literature reviews in software
engineering–a systematic literature review. Information and software tech-
nology, 51(1):7–15, 2009.
[67] Barbara Kitchenham and Shari Lawrence Pfleeger. Principles of survey
research: part 5: populations and samples. ACM SIGSOFT Software En-
gineering Notes, 27(5):17–20, 2002.
[68] Barbara Kitchenham and Shari Lawrence Pfleeger. Principles of survey
research part 6: data analysis. ACM SIGSOFT Software Engineering Notes,
28(2):24–27, 2003.
[69] Barbara A. Kitchenham and Shari L. Pfleeger. Personal opinion surveys. In
Guide to Advanced Empirical Software Engineering, pages 63–92. Springer,
2008.
[70] Barbara A. Kitchenham, Shari Lawrence Pfleeger, Lesley M. Pickard, Pe-
ter W. Jones, David C. Hoaglin, Khaled El Emam, and Jarrett Rosenberg.
Preliminary guidelines for empirical research in software engineering. IEEE
Transactions on software engineering, 28(8):721–734, 2002.
[71] Charles W. Knisely and Karin I. Knisely. Engineering Communication.
Cengage Learning, 2014.
[72] S. Arun Kumar and Arun Kumar Thangavelu. Factors affecting the out-
come of Global Software Development projects: An empirical study. In
Computer Communication and Informatics (ICCCI), 2013 International
Conference on, pages 1–10. IEEE, 2013.
[73] Anne Lacey and Donna Luff. Qualitative data analysis. Trent focus
Sheffield, 2001.
[74] Effie Lai-Chong Law and Marta Kristín Lárusdóttir. Whose experience do
we care about? Analysis of the fitness of scrum and kanban to user expe-
rience. International Journal of Human-Computer Interaction, 31(9):584–
602, 2015.
[75] Lucas Layman, Forrest Shull, Paul Componation, Sue O’Brien, Dawn Saba-
dos, Anne Carrigy, and Richard Turner. A methodology for mapping system
engineering challenges to recommended approaches. In Systems Conference,
2010 4th Annual IEEE, pages 294–299. IEEE, 2010.
[76] Jingyue Li, Finn Olav Bjørnson, Reidar Conradi, and Vigdis B. Kamp-
enes. An empirical study of variations in COTS-based software development
processes in the Norwegian IT industry. Empirical Software Engineering,
11(3):433–461, 2006.
BIBLIOGRAPHY 87
[78] Johan Linaker, Sardar Muhammad Sulaman, Martin Höst, and Rafael Ma-
iani de Mello. Guidelines for Conducting Surveys in Software Engineering
v. 1.1. 2015.
[80] Mark S. Litwin. How to measure survey reliability and validity, volume 7.
Sage Publications, 1995.
[81] Garm Lucassen, Fabiano Dalpiaz, Jan Martijn EM van der Werf, and Sjaak
Brinkkemper. The use and effectiveness of user stories in practice. In In-
ternational Working Conference on Requirements Engineering: Foundation
for Software Quality, pages 205–222. Springer, 2016.
[83] Antonio Martini, Lars Pareto, and Jan Bosch. A multiple case study on
the inter-group interaction speed in large, embedded software companies
employing agile. Journal of Software: Evolution and Process, 28(1):4–26,
2016.
[84] Liz Millikin. SurveyGizmo | Professional Online Survey Software & Tools.
[85] Jeffrey Moore, Joanne Pascale, Pat Doyle, Anna Chan, and Julia Klein
Griffiths. Using field experiments to improve instrument design: The SIPP
methods panel project. Methods for testing and evaluating survey question-
naires, pages 189–207, 2004.
[86] Daniel Méndez Fernández and Stefan Wagner. Naming the pain in require-
ments engineering: A design for a global family of surveys and first results
from Germany. Information and Software Technology, 57:616–643, January
2015.
[87] Paulo Anselmo da Mota Silveira Neto, Joás Sousa Gomes, Eduardo Santana
De Almeida, Jair Cavalcanti Leite, Thais Vasconcelos Batista, and Larissa
Leite. 25 years of software engineering in Brazil: Beyond an insider’s view.
Journal of Systems and Software, 86(4):872–889, 2013.
BIBLIOGRAPHY 88
[88] Indira Nurdiani, Ronald Jabangwe, Darja Šmite, and Daniela Damian. Risk
identification and risk mitigation instruments for global software develop-
ment: Systematic review and survey results. In 2011 IEEE Sixth Interna-
tional Conference on Global Software Engineering Workshop, pages 36–41.
IEEE, 2011.
[89] Mark C. Paulk, Dennis Goldenson, and David M. White. The 1999 survey
of high maturity organizations. 2000.
[90] Javier Pereira, Narciso Cerpa, June Verner, Mario Rivas, and J. Drew Pro-
caccino. What do software practitioners really think about project success:
A cross-cultural comparison. Journal of Systems and Software, 81(6):897–
907, 2008.
[91] Shari Lawrence Pfleeger. Experimental design and analysis in software
engineering. Annals of Software Engineering, 1(1):219–253, 1995.
[92] Shari Lawrence Pfleeger and Barbara A. Kitchenham. Principles of survey
research: part 1: turning lemons into lemonade. ACM SIGSOFT Software
Engineering Notes, 26(6):16–18, 2001.
[93] Robert T. Plant and Panagiotis Tsoumpas. A survey of current practice
in aerospace software development. Information and Software Technology,
37(11):623–636, 1995.
[94] Danny CC Poo and Mui Ken Chung. Software engineering practices in
Singapore. Journal of Systems and Software, 41(1):3–15, 1998.
[95] Teade Punter, Marcus Ciolkowski, Bernd Freimut, and Isabel John. Con-
ducting on-line surveys in software engineering. In Empirical Software En-
gineering, 2003. ISESE 2003. Proceedings. 2003 International Symposium
on, pages 80–88. IEEE, 2003.
[96] Jane Radatz, Anne Geraci, and Freny Katki. IEEE standard glossary of
software engineering terminology. IEEE Std, 610121990(121990):3, 1990.
[97] Austen Rainer and Tracy Hall. A quantitative and qualitative analysis
of factors affecting software processes. Journal of Systems and Software,
66(1):7–21, 2003.
[98] Ayushi Rastogi, Arpit Gupta, and Ashish Sureka. Samiksha: mining issue
tracking system for contribution and performance assessment. In Proceed-
ings of the 6th India Software Engineering Conference, pages 13–22. ACM,
2013.
[99] Jörg Rech, Eric Ras, and Björn Decker. Intelligent assistance in german
software development: A survey. IEEE software, 24(4):72–79, 2007.
BIBLIOGRAPHY 89
[100] C. Robson. Real World Research: A Resource for Social Scientists and
Practitioner Researchers Blackwell. Oxford, 1993.
[101] Colin Robson and Kieran McCartan. Real world research. John Wiley &
Sons, 2016.
[102] Mark Rodgers, Amanda Sowden, Mark Petticrew, Lisa Arai, Helen Roberts,
Nicky Britten, and Jennie Popay. Testing methodological guidance on the
conduct of narrative synthesis in systematic reviews effectiveness of in-
terventions to promote smoke alarm ownership and function. Evaluation,
15(1):49–73, 2009.
[105] Helen Sharp, Mark Woodman, and Fiona Hovenden. Tensions around the
adoption and evolution of software quality management systems: a dis-
course analytic approach. International journal of human-computer studies,
61(2):219–236, 2004.
[107] Edward Smith, Robert Loftin, Emerson Murphy-Hill, Christian Bird, and
Thomas Zimmermann. Improving developer participation rates in surveys.
In Cooperative and Human Aspects of Software Engineering (CHASE), 2013
6th International Workshop on, pages 89–92. IEEE, 2013.
[108] Colin Snook and Rachel Harrison. Practitioners’ views on the use of formal
methods: an industrial survey by structured interview. Information and
Software Technology, 43(4):275–283, 2001.
[109] Prashanth Harish Southekal and Ginger Levin. Validation of a generic GQM
based measurement framework for software projects from industry practi-
tioners. In Cognitive Informatics & Cognitive Computing (ICCI* CC), 2011
10th IEEE International Conference on, pages 367–372. IEEE, 2011.
[111] Rodrigo Oliveira Spínola and Guilherme Horta Travassos. Towards a frame-
work to characterize ubiquitous software projects. Information and Software
Technology, 54(7):759–785, 2012.
[113] Shahida Sulaiman, Norbik Bashah Idris, and Shamsul Sahibuddin. Pro-
duction and maintenance of system documentation: what, why, when and
how tools should support the practice. In Software Engineering Conference,
2002. Ninth Asia-Pacific, pages 558–567. IEEE, 2002.
[114] Richard H. Thayer, Arthur Pyster, and Roger C. Wood. Validating solutions
to major problems in software engineering project management. Computer,
15(8):65–77, 1982.
[115] Richard H. Thayer, Arthur B. Pyster, and Roger C. Wood. Major issues in
software engineering project management. IEEE Transactions on Software
Engineering, (4):333–342, 1981.
[116] Steven K. Thompson. Sampling. Wiley, New York, page 12, 2002.
[117] Marco Torchiano and Filippo Ricca. Six reasons for rejecting an industrial
survey paper. In Conducting Empirical Studies in Industry (CESI), 2013
1st International Workshop on, pages 21–26. IEEE, 2013.
[118] Ljiljana Vukelja, Lothar Müller, and Klaus Opwis. Are engineers con-
demned to design? a survey on software engineering and UI design in
Switzerland. In IFIP Conference on Human-Computer Interaction, pages
555–568. Springer, 2007.
[120] Claes Wohlin, Martin Höst, and Kennet Henningsson. Empirical research
methods in software engineering. In Empirical methods and studies in soft-
ware engineering, pages 7–23. Springer, 2003.
[121] Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn
Regnell, and Anders Wesslén. Experimentation in software engineering.
Springer Science & Business Media, 2012.
BIBLIOGRAPHY 91
[122] Aiko Yamashita and Leon Moonen. Surveying developer knowledge and
interest in code smells through online freelance marketplaces. In User Eval-
uations for Software Engineering Researchers (USER), 2013 2nd Interna-
tional Workshop on, pages 5–8. IEEE, 2013.
[123] Chen Yang, Peng Liang, and Paris Avgeriou. A survey on software archi-
tectural assumptions. Journal of Systems and Software, 113:362–380, 2016.
[124] Zhuojun Yi, Dongming Xu, and Jon Heales. The Moderating Effect of
Social Influence on Ethical Decision Making in Software Piracy. In PACIS,
page 236, 2013.
[125] He Zhang and Muhammad Ali Babar. Systematic reviews in software engi-
neering: An empirical investigation. Information and Software Technology,
55(7):1341–1354, 2013.
Appendices
Appendix A
Selected Primary Studies
1. Sampling Method: Randomness of Participants; Insufficient Sample Size; Improper Participant Selection
2. Survey Instrument Flaws: Poor Questionnaire Design
3. Question-Order Effect
4. Likert Scale Problems
5. Time Limitation: Boredom; Busy Schedules
6. Domain Knowledge: People's Perceptions
7. Hypothesis Guessing
8. Language Issues
9. Cultural Issues: Geographical Issues; Country-specific Issues
10. Generalizability: Confidentiality Issues
11. Reliability: Correctness of Obtained Responses; Confidentiality Issues
12. Bias
13. Inconsistency of Responses: Lack of Motivation for Respondents
14. Response Duplication
15. Low Participation Rate: Lack of Motivation for Participation Selection
Appendix C
Interview Questionnaire
2. Surveys are used in the social sciences and other disciplines, including software
engineering. Do you think there are special guidelines to be followed while
conducting surveys? How is it different in software engineering? What factors
should one consider while designing surveys in software engineering research?
3. When designing a survey, what type of questions do you prefer asking (open-ended
or close-ended), and why? (Is the evaluation method your primary motivating
factor for choosing it? Are evaluation methods one of the reasons for choosing
the type of questions? What other factors lead you to include both types of
questions in your survey?)
4. In a survey, one question may provide context for the next one, which may
drive respondents to specific answers; randomizing the questions may reduce this
question-order effect to some extent. Can you suggest some other techniques to
deal with the question-order effect?
5. How do you make sure that respondents understand the right context of your
questions? What measures do you adopt to make the questionnaire understandable?
6. Our SLR analysis showed that 31.4% of the primary studies used the stratified
sampling technique, while only 15.7% reported the use of snowball sampling.
(The literature describes snowball sampling as leading to better sample selection,
since the researcher has the freedom to choose a sample that suits his/her
requirements.) Have you faced a situation where other sampling techniques were
chosen over snowballing? What factors did you consider while making the
selection?
7. Low response rates are a common problem for any survey. How can the response
rate be improved?
8. When a survey is posted, there may be respondents without proper domain
knowledge answering it. They might misinterpret the questions and give incorrect
answers, which affects the overall analysis. In your research, how are such
responses identified and ruled out?
9. Our analysis showed that hypothesis guessing is an issue that can only be
reduced to some extent rather than avoided completely. Explain how this problem
is addressed in your work.
10. What measures do you take to avoid the duplication of responses in your
surveys?
11. How do you overcome each of these common problems (bias, generalizability,
and reliability) in the following cases?
(a) Case A: Respondents answering the survey just for the sake of rewards.
(b) Case B: Respondents answering surveys posted on social networks like
LinkedIn and Facebook.
12. How do you mitigate the issue of inconsistency in responses? (Case: when a
respondent is asked about his familiarity with non-functional requirements he
chooses the "Yes" option, but when asked to elaborate his opinion he just writes
"No idea"; this is the problem of inconsistency.)
13. You have conducted a global survey in Sweden, Norway, China, and Italy,
collecting information from diverse respondents. How do you address the
following issues in your research?
(a) Issue: The questionnaire gets translated into Chinese to make it
understandable to respondents there, but poor translation may lead to data
loss. How do you handle this language issue?
(b) Issue: There might be cultural issues where people of one country are
more comfortable answering an online questionnaire, while people of another
country are more responsive to face-to-face interviews. In your opinion, how
can this kind of cultural issue be mitigated?
14. In the literature review we found that researchers used Likert scales to
gather a wide range of experiences. Even though they are commonly used, we
identified problems such as central tendency bias when a 4-point or 7-point
Likert scale is used, and respondent fatigue and interpretation problems when
9- or 10-point scales are used. How do you address these kinds of issues in
your research?
15. How do you decide upon a particular sample size for your survey?
16. What motivates you to select a specific analysis technique for your research?
Appendix D
Codes and Themes from Interviews
Appendix E
Quality Criteria
Appendix F
Rigor and Relevance
Study  Rigor  Relevance
S29    1.5    4
S30 2.5 2
S31 2 2
S32 3 4
S33 2.5 4
S34 2 4
S35 2 4
S36 2.5 3
S37 2.5 3
S38 2.5 3
S39 1.5 4
S40 2.5 2
S41 2.5 3
S42 2.5 4
S43 2.5 4
S44 1.5 4
S45 2 2
S46 2.5 3
S47 2 2
S48 1 4
S49 3 4
S50 2 4
S51 1.5 4
S52 3 4
S53 1.5 2
S54 3 4
S55 2 2
S56 1.5 3
S57 2 4
S58 3 3
S59 2.5 4
S60 2.5 4
S61 2.5 3
S62 3 3
S63 2 2
S64 2.5 4
S65 3 4
S66 3 3
Appendix G
Themes
G.1 Sampling
G.1.1 Convenience sampling
Table G.5:
Table G.6:
Table G.7:
Table G.8:
Table G.9:
Table G.10:
Table G.11:
G.2.9 Context
Table G.12:
Table G.13:
Table G.14:
Table G.15:
G.2.13 Generalizability
Table G.16:
G.2.13.1 Bias
Table G.17:
G.2.13.2 Reliability
Table G.18:
Table G.19:
G.3.2 Inconsistency
Table G.20:
Table G.21:
G.3.4 Rewards
Table G.22:
Table G.23:
Table G.24: