
PAID VS. VOLUNTEER WORK IN OPEN SOURCE


Dirk Riehle
Computer Science Department
Friedrich-Alexander University Erlangen-Nürnberg
Martensstr. 3, 91058 Erlangen, Germany
dirk@riehle.org
Philipp Riemer
Computer Science Department
Friedrich-Alexander University Erlangen-Nürnberg
Martensstr. 3, 91058 Erlangen, Germany
contact@philippriemer.de
Carsten Kolassa
Software Engineering
RWTH Aachen University
Ahornstr. 55, 52074 Aachen, Germany
carsten@kolassa.de
Michael Schmidt
Mathematics Department
Friedrich-Alexander University Erlangen-Nürnberg
Martensstr. 3, 91058 Erlangen, Germany
michael.schmidt.nbg@gmail.com
ABSTRACT
Many open source projects have long since become commercial. This paper shows how much of open source software development is paid work and how much has remained volunteer work. Using a conservative approach, we find that about 50% of all open source software development has been paid work for many years now and that many small projects are fully paid for by companies. However, we also find that any non-trivial project balances paid developer work with volunteer work, and we suggest that the ratio of volunteer to paid work can serve as an indicator for the health of open source projects and aid the management of the respective communities.
Index Terms: Open source software, empirical software engineering, volunteer open source, paid open source.
1. INTRODUCTION
Open source software development has long become an important commercial activity. Corbet et al.'s analyses of recent Linux kernel releases show that a large part of this work is being carried out by developers using their companies' email addresses when submitting code [6], implying that they are paid for the work by their employers.
However, while commercial contributions to the Linux kernel have been widely acknowledged, little is known about the overall commercial contribution to open source projects in the form of paid rather than volunteer development work.
In this paper we show that open source has both strong and broad commercial support by companies paying developers to perform open source software development. Also, we suggest that understanding the relationship between paid and volunteer work in open source projects will aid project leaders in steering their community.
This work makes the following contributions:
• It shows empirically that open source has strong commercial support across a broad range of projects,
• it shows the possible range of healthy paid-to-volunteer work ratios to help project steering,
• it presents measurements of
  - how much open source work is being performed during working time,
  - how open source working time work has changed over the years,
  - the percentage of open source programmers that are paid programmers, and,
  - the distribution of volunteer vs. paid work across open source projects,
• using the Linux kernel specifically and a large sample of active open source projects (>5,000 projects).
The paper is organized as follows: In Section 2, we describe our research approach and define key terms. In Section 3 we present the main empirical results. In Section 4 we discuss our findings as well as their limitations. In Section 5 we review related work and in Section 6 we present final conclusions.
To appear in the Proceedings of the 47th Hawaii International Conference on System Science (HICSS 2014). IEEE Press, 2014.
2. RESEARCH APPROACH
2.1 Definitions
We use the following definitions:
• An author of some piece of code is the creator of the code, i.e., the original developer.
• A commit is the process of putting some piece of code into a code repository.
• Code repository is used as a synonym for configuration management system.
• A committer is a software developer who has the necessary rights to commit to a code repository.
• Maintainer is a synonym for a committer (as used in Linux kernel development).
• A patch is a code contribution submitted to a committer for inclusion in the project.
A code contribution by an author indicates when the code was written and a commit by a committer indicates when the committer integrated the code into the code base. Author and committer are roles. Typically, in a two-step process, an author submits a patch and a committer integrates the patch into the main code base. An author who is also a committer can do both of these steps as one. The common case is that an author is not a committer, hence we separate both roles in our analysis of the Linux kernel.
Moreover, we define the following time-related terms using common governmental regulations in Western countries:
• Working time is the time from 9am to 5pm local time, Mondays to Fridays.
• Spare time is all the time that is not working time.
Consequently, working time and spare time depend on the time zone of the developer.
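The working time definition above can be made concrete with a minimal Python sketch. It is an illustration only, not the tooling used for this paper: the function name is_working_time, the representation of a developer's time zone as a fixed UTC offset in hours, and the choice to treat 5pm as an exclusive upper bound are all assumptions.

```python
from datetime import datetime, timedelta, timezone

WORK_START_HOUR = 9   # 9am local time (assumed inclusive)
WORK_END_HOUR = 17    # 5pm local time (assumed exclusive)


def is_working_time(commit_utc: datetime, utc_offset_hours: float) -> bool:
    """Return True if a commit falls on Mon-Fri, 9am-5pm in local time.

    commit_utc must be timezone-aware. utc_offset_hours is a hypothetical
    stand-in for the developer's time zone (for example -8 or +1).
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = commit_utc.astimezone(local_zone)
    is_weekday = local.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and WORK_START_HOUR <= local.hour < WORK_END_HOUR


# The same UTC timestamp is working time in one zone and spare time in another.
commit = datetime(2007, 6, 1, 23, 30, tzinfo=timezone.utc)  # a Friday in UTC
print(is_working_time(commit, -8))  # Friday 15:30 local time
print(is_working_time(commit, +1))  # Saturday 00:30 local time
```

The example shows why a UTC timestamp alone is not sufficient: the classification changes with the offset that is applied.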
2.2 Data Sources
We use two data sources: The Linux kernel and the Ohloh projects. Our analysis of the Linux kernel development work is based on its public configuration management data found at Kernel.org [5]. Since 2005, it has been managed using Git, which in contrast to older configuration management systems lets us distinguish the authors of some code, i.e., the original developer, from the committer of the code, who integrated it into the kernel code base. For this work, we downloaded the whole configuration management history from 2005 to 2011.
Our analysis of open source projects is based on a 2008 snapshot of the Ohloh open source project database [20]. Using Daffara's definition of "active projects" [7], we find that our database snapshot contains 5,117 active open source projects. Daffara estimates that there were about 18,000 active open source projects in the world by August 2007, so our sample represents about 30% of the total active project population at that time. While not wholly representative for open source at its time, it is close nevertheless.
2.3 Data Quality
Since 2005, the Linux kernel configuration management data (using Git) has been providing more precise information than traditional systems (CVS, svn). We can distinguish between authors and committers, and we can assess the exact time of a commit, whether a code contribution or code integration.
The 2008 Ohloh database snapshot is not quite as detailed. Collecting 8,705,118 commits from more than 9,192 projects, it does not directly provide all relevant data. Due to the diversity of configuration management systems used in open source, Ohloh cannot distinguish between an author and a committer; thus, we only have committer data at hand.
Another consequence of the variety of configuration management systems is that Ohloh stores all commit timestamps using UTC, ignoring the original time zone of a developer. However, for this work, we need the local time of a commit and hence the time zone. We address this problem by using location data that Ohloh provides to determine the time zone of individual developers. By hand, we identified 580 committers (out of 45,870 distinct committer ids), constituting 1.3% of the committer population. Those identified committers performed 646,705 of the 8 million commits, totaling about 8% of the work being performed. We call the set of identified committers the known committer set or the known committers, in short.
While we can argue that the original Ohloh data set is close to being representative of open source, the reduced number of known committers may not be. For one, we identified mostly committers of above average activity (1.3% of the population performing 8% of the work), so there is some bias. Thus, we need to understand whether this bias is relevant for the analysis presented in this paper.
For this, we ranked committers by number of commits and then binned the resulting committer sequence into 26 different bins. The 26 bins of known committers all have close-to-equal total numbers of commits and were suggested by R, the statistical analysis tool and environment we are using. Thus, the amount of work in each bin is about the same, but was performed by very different numbers of people.
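The binning step can be illustrated with a small Python sketch. It is not the procedure R applied for this paper; it is one simple way, assumed here for illustration, to split a ranked committer sequence into bins of roughly equal total commits. The function name bin_by_equal_commits and the midpoint rule for assigning a committer to a bin are hypothetical.

```python
def bin_by_equal_commits(commit_counts: dict[str, int], n_bins: int) -> list[list[str]]:
    """Rank committers by commit count and split them into n_bins bins
    whose total commit numbers are as close to equal as the data allows."""
    total = sum(commit_counts.values())
    if total <= 0 or n_bins <= 0:
        raise ValueError("need at least one commit and one bin")
    # Most active committers first, so low bin numbers hold few people.
    ranked = sorted(commit_counts, key=commit_counts.get, reverse=True)
    bins: list[list[str]] = [[] for _ in range(n_bins)]
    cumulative = 0
    for committer in ranked:
        # Assign by the midpoint of this committer's slice of all commits.
        midpoint = cumulative + commit_counts[committer] / 2
        index = min(int(midpoint * n_bins / total), n_bins - 1)
        bins[index].append(committer)
        cumulative += commit_counts[committer]
    return bins


# Toy data: 120 commits in total, split into 3 bins of 40 commits each.
example_counts = {"a": 40, "b": 20, "c": 20, "d": 10, "e": 10, "f": 10, "g": 10}
for number, members in enumerate(bin_by_equal_commits(example_counts, 3), start=1):
    print(number, members, sum(example_counts[m] for m in members))
```

With real data the bin totals will only be approximately equal, because a single committer cannot be split across two bins.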
Assuming paid work to be work performed Mon-Fri from 9am-5pm (see below for definition and discussion), we can calculate the percentage of total work performed that is paid work. Figure 1 shows this paid-work-percentage (of total work) by committer bin.
With a null hypothesis that no trend (bias) is apparent (and the alternative hypothesis that there is a bias), using a t-test and assuming a normal distribution, we cannot reject the null hypothesis at the 95% confidence level. We therefore have no reason to assume that reducing the overall committer set to the known committer set introduces a bias that impacts our analysis (but we also cannot exclude it).
In addition to the known committers set, we define an extended committers set or extended committers, in short. The extended committers set comprises all committers in the original Ohloh database, where the committer time zone is either known or assumed. The assumed time zone is determined using the following heuristic: We first condense all commits of a committer in the database into a single week. For all committers where we do not know the time zone, we match their week on an hourly basis with the weeks of the known committers set. Using a least-squares approach, we identify the time zone that has a minimum difference to the established data. This provides us with the most probable time zone for the not-known committers so that we can determine the local time for each commit (ignoring work while traveling).
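The time zone heuristic can be sketched in Python as follows. This is a simplified reading of the description above, not the original implementation: the function names, the restriction to whole-hour UTC offsets between -12 and +14, and the normalization of each weekly profile to commit shares are assumptions made for illustration.

```python
HOURS_PER_WEEK = 7 * 24  # hour 0 is Monday 00:00


def weekly_profile(hours_of_week: list[int]) -> list[float]:
    """Condense commits into one week: share of commits per hour of the week."""
    if not hours_of_week:
        raise ValueError("at least one commit is required")
    counts = [0] * HOURS_PER_WEEK
    for hour in hours_of_week:
        counts[hour % HOURS_PER_WEEK] += 1
    return [count / len(hours_of_week) for count in counts]


def infer_utc_offset(profile_utc: list[float], reference_local: list[float],
                     candidate_offsets=range(-12, 15)) -> int:
    """Return the UTC offset whose shifted profile has the smallest
    sum of squared differences to the reference (local time) profile."""
    def squared_error(offset: int) -> float:
        # Local hour h corresponds to UTC hour h - offset.
        return sum(
            (profile_utc[(h - offset) % HOURS_PER_WEEK] - reference_local[h]) ** 2
            for h in range(HOURS_PER_WEEK)
        )
    return min(candidate_offsets, key=squared_error)


# Toy reference: known committers working Mon-Fri, 9am-5pm local time.
office_hours = [day * 24 + hour for day in range(5) for hour in range(9, 17)]
reference = weekly_profile(office_hours)
# A committer at UTC-5 with the same habits, seen through UTC timestamps.
unknown_utc = weekly_profile([hour + 5 for hour in office_hours])
print(infer_utc_offset(unknown_utc, reference))
```

On real data the minimum is rarely this clear-cut, which is one reason why the extended committers set gives a fuzzier picture than the known committers set.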
The known committers set provides a sharp picture of time-zone-based weekly work activities, including paid and volunteer work, while the extended committers set provides a richer (more data), but more fuzzy picture of the weekly work activities of committers.
In the following, we present both the known and extended committer data side-by-side.
2.4 Data Interpretation
We would like to understand how much work in open source is paid work and how it is distributed across a wide range of open source projects.
As introduced above, we assume that work performed during regular working hours (weekdays, 9am-5pm) is work paid for by companies or paid for through self-sponsorship of the developer.
One possible objection to this is that not all cultures have working weeks of Mon-Fri, 9am-5pm. For example, some cultures work on Saturdays.
Figure 2 shows the distribution of commits over the different time zones of this planet.
All countries, mostly Islamic countries, that work on Saturdays, show very little open source activity (an interesting fact in itself). Thus, we feel safe to proceed with our definition of weekend and weekdays.
Also, one may argue that many people have working hours outside of 9am-5pm and that we are too conservative in our estimate then. For one, we'd rather estimate paid work conservatively, but we also shifted working hour definitions around, to 8am-4pm, 10am-6pm, 8am-8pm, etc. with no significant change in the results. Thus, we decided to stick to the most common workday definition.
3. PAID WORK IN OPEN SOURCE
3.1 Total Work during Working Time
First, we investigate how much time is being spent on open source during regular working hours.
Figures 3-6 show the workweek on an hourly basis for the years 2005-2011 for the Linux kernel and for the years 2000-2007 for the Ohloh data.
Figure 1. Paid-work-percentage of total work for 26 equal-commit-numbers committer bins (higher bin number means more committers in bin)
Figure 2. Distribution of percentage of total commits over time zones (light gray = known committers, dark gray = extended committers)
From perusing Figures 3-6, we can gain a number of insights already. The most obvious insights are that
• there is a clear difference between work days and weekend days: about twice as much work is being done on a work day as is on a weekend day; and that
• most work is being performed during regular waking and working hours, i.e., from 9am to 5pm, even on the weekends and that
• developers take lunch and dinner breaks with work picking up again for a few hours after dinner before quieting down for the night.
With a working time definition of Mon-Fri, 9am-5pm, Table 1 shows the percentages of all code contributions respectively all commits made during working time for the Linux kernel and the Ohloh projects:
About 50% of all work contributed to open source software projects has been provided Monday to Friday, between 9am and 5pm.
3.2 Trends in Work during Working Time
Next, we look at how the working time work spent on open source has changed over the years.
Figures 7-10 show how the percentage of commits made during working time changed over the years. In Figures 7-10, each data point is the percentage of working time work for the given week. The moving average is a LOESS curve, and the grayed-out space around it indicates the 95% confidence interval for a data point. The widening of that space at the boundaries of the graph is an artifact of not using additional data beyond those boundaries.
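As a rough illustration of how the weekly data points behind such a trend plot could be derived, consider the following Python sketch. It only computes the per-week percentages; the LOESS smoothing and the confidence band are not reproduced here. The function name, the use of ISO calendar weeks, and the assumption that timestamps have already been converted to local time are all choices made for this example.

```python
from collections import defaultdict
from datetime import datetime


def weekly_working_time_percentage(local_commits: list[datetime]) -> dict[tuple[int, int], float]:
    """Map each (ISO year, ISO week) to the percentage of that week's
    commits made Mon-Fri, 9am-5pm. Timestamps are assumed to be local time."""
    totals: dict[tuple[int, int], int] = defaultdict(int)
    working: dict[tuple[int, int], int] = defaultdict(int)
    for stamp in local_commits:
        iso = stamp.isocalendar()
        week = (iso[0], iso[1])
        totals[week] += 1
        if stamp.weekday() < 5 and 9 <= stamp.hour < 17:
            working[week] += 1
    return {week: 100.0 * working[week] / totals[week] for week in sorted(totals)}


# Toy data: two commits in one week, one commit in the next.
sample = [
    datetime(2007, 6, 4, 10, 0),   # Monday morning, working time
    datetime(2007, 6, 9, 10, 0),   # Saturday, spare time
    datetime(2007, 6, 11, 20, 0),  # Monday evening, spare time
]
print(weekly_working_time_percentage(sample))
```

Each value in the returned mapping corresponds to one data point in a plot of this kind; a smoother would then be fitted to the resulting series.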
Figure 3. Number of commits by authors (when code is developed) per hour counted over all weeks 2005-2011 for the Linux Kernel
Figure 4. Number of commits by committers (when code is integrated) per hour counted over all weeks 2005-2011 for the Linux Kernel
Figure 5. Number of commits by known committers per hour counted over all weeks 2000-2007 for the Ohloh projects
Figure 6. Number of commits by extended committers per hour counted over all weeks 2000-2007 for the Ohloh projects
The Linux kernel data in Figures 7-8 shows significant fluctuations, which reflect the rapid release cycle of the project. A release is performed about every 80 days [5]. The process is highly regulated with defined time periods of increasing or decreasing activity. The activity is highest during the two-week merge window, after which stabilization kicks in and activity decreases rapidly. Thus, there is no apparent annual schedule.
The Ohloh data, a more diverse data set, also shows some fluctuations, but much less so. The main annual dips from 2002 onwards occur during Christmas week, where it seems natural to have a drop in working time work relative to spare time work (cf. Section 2 for the dominance in open source activity by Western cultures).
Looking at the Linux kernel data in Fig. 7-8 again, we can see a clear upward trend for both authors and committers. Thus, from around 2007 through to 2010 increasing amounts of working time work in relation to spare time work were being spent on the Linux kernel. Starting 2010, this growth largely plateaued.
In contrast to the Linux kernel data, the Ohloh data set shows a straight line. Using a likelihood ratio test (F-Test), with a straight line as the null hypothesis, we have to reject any other hypothesis (at a confidence level of 98%), and conclude that no growth occurred. The percentage of working time work spent on the Ohloh projects has remained flat.
However, during these time frames, the total underlying data sets have grown substantially. The Linux kernel is growing at a polynomial rate [13] [23] while the combined Ohloh project data, and presumably all of open source, is growing at a near-exponential rate [8]. Thus, for the Ohloh project data, the year 2000 data is much more sparse than the year 2007 data. Still, as we just showed, no working time growth occurred in our open source project data, and the working time percentage of total work performed stayed flat.
With growing market share [25] the economic significance of the Linux kernel has only been increasing, so it is not surprising to see growth in working time work being spent on it. What is surprising is that open source in total (using the Ohloh data as a proxy), which has been growing near-exponentially, has maintained a constant ratio of working time to spare time work. Thus, for every project with increasing economic significance that received more paid development work, new projects have been started with less working time engagement, but possibly growing into it. It is too early to speculate about a stable state of open source in terms of a stable ratio of paid working time work to volunteer spare time work contributions, but open source appears to have reached at least an intermediate stable state. Thus, even with underlying near-exponential growth, we expect this ratio to remain stable for now.
3.3 Developer Classification
While overall weekly working times and working time trends are interesting, we also would like to know how many developers are earning their living by performing open source software development. Thus, we now look at individual developers and how much of their code is written during working time, i.e., to what extent they are being paid for their work.
Fig. 11-12 show the distribution of developers over the percentage of work that is paid work (working time work) for the Linux kernel and Ohloh projects, respectively. It has been counted over all years. Please note that the y-axis is log-scale and that we are talking about contributors now, not just contributions. Only the extended committer data is shown, because the known set was too small to provide meaningful data for this particular discussion.
Both the Linux kernel and Ohloh projects are dominated by the extremes: Developers doing all their work during spare time and developers doing all their work during working time. Table 2 shows the dominance of these extremes. Here, we define paid developers to be those who performed 95% or more of their commits during working time, and volunteer developers to be those who performed 95% or more of their commits during spare time, outside the weekdays 9am-5pm time frame.
Thus, at least 23.15% of all authors working on the Linux kernel, totaling 1,807 developers, are paid for their work. 11.28% of all committers working on the Linux kernel, totaling 37 developers, are paid for their work as well. Given the economic significance of the Linux kernel, one would expect more committers (maintainers) to be paid for their work than authors. A possible explanation for not confirming this assumption is the long-term engagement of committers that motivates many to keep working outside traditional working time boundaries, which makes them fall outside our conservative definition of paid work.

Table 1. Percentage of work performed during working time (9am-5pm, Mon-Fri) for Linux (2005-2011) and the Ohloh projects (2000-2007)

                                        Percentage of total commits
                                        made during working time
Linux Kernel      author                45.00%
                  committer             51.36%
Ohloh Projects    known committers      47.3% (min. 28.2%, max. 58.8%)
                  extended committers   55.4% (min. 36.5%, max. 59.5%)

Figure 7. Data and trend line for percentage of commits made by authors to the Linux Kernel during working time for a given week
Figure 8. Data and trend line for percentage of commits made by committers to the Linux Kernel during working time for a given week
Figure 9. Data and trend line for percentage of commits made to the Ohloh projects during working time for a given week, known committers
Figure 10. Data and trend line for percentage of commits made to the Ohloh projects during working time for a given week, extended committers
Figure 11. Number of authors and committers with a given average percentage of paid work for the years 2005-2011 for the Linux Kernel
Figure 12. Number of committers with a given average percentage of paid work for the years 2000-2007 for the Ohloh projects, extended committers
As to the Ohloh projects, 17.97% of all extended committers, totaling 8,244 developers, are being paid for their work.
Common to these paid developers is that they do not work on open source projects in their spare time, i.e., fall outside the boundaries of the traditional open source enthusiast and volunteer categories.
3.4 Project Classification
Finally, not only are we interested in what percentage of developers are being paid to work on open source, we also would like to know how they are allocated to projects. It is fair to assume that some projects get more commercial attention than others. Thus, we investigate which projects receive this attention. The Linux kernel project is a single (albeit large) project, so in this Section we are looking at the Ohloh projects only.
We find that there is a large number of small (1-2 developers) projects with a long-tail distribution of size that are fully paid for by companies: All developers, frequently just one, are making their contributions only during working time. The top 5 smallest projects in our sample, fully paid for, are called subtle, Web-A, ShARPE, gst-openmax, and phpESP.
These smallest projects have low commit numbers (in the 100s only, sometimes less). Inspection by hand shows that many times, code is being committed in large chunks. This is uncommon in traditional open source software development, where the most frequent commit size is one line of code [16]. Thus, it is safe to assume that these small but still active projects are being developed in-house and are being provided in a snapshot-style to the public at appropriate times.
The largest projects in our sample maintain a paid-for developer percentage in the 10-20% range. Five example projects of this size are GNOME, Netbeans IDE, Eclipse Platform, KDE, and KVM. These are well-known open source projects that are being developed in an open collaborative style, and the paid developer population of these projects can serve as an indicator of healthy public open source projects.
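A per-project paid-for developer percentage of this kind could be computed along the following lines. The Python sketch is illustrative only: the function name, the input layout (a mapping from developer id to a pair of working time commits and total commits), and the reuse of the 95% threshold at project level are assumptions, and the project data shown is invented toy data.

```python
def paid_developer_percentage(developers: dict[str, tuple[int, int]],
                              threshold_percent: int = 95) -> float:
    """Percentage of a project's developers who made threshold_percent or
    more of their commits during working time.

    developers maps a developer id to (working_time_commits, total_commits).
    """
    if not developers:
        raise ValueError("project has no developers")
    paid = sum(
        1
        for working, total in developers.values()
        if total > 0 and working * 100 >= threshold_percent * total
    )
    return 100.0 * paid / len(developers)


# Toy data: a one-person in-house project and a larger mixed community.
small_project = {"solo": (120, 120)}
large_project = {
    "dev1": (100, 100),
    "dev2": (2, 50),
    "dev3": (30, 60),
    "dev4": (0, 10),
    "dev5": (5, 40),
}
print(paid_developer_percentage(small_project))
print(paid_developer_percentage(large_project))
```

A value of 100% corresponds to the fully paid small projects described above, while values in the 10-20% range correspond to the large collaborative projects.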
4. DISCUSSION OF FINDINGS
In this work, we are making the assumption that work performed during working time hours (Mon-Fri, 9am-5pm, in the respective local time zone) is paid work. This time frame has been defined as working time by most Western countries and thus we feel justified in considering it paid work. Even students typically have to go to class during that time and spending it on open source development implies economic self-sponsorship as it delays graduation and hence borrows against future income.
The time of a commit is not the actual time the work was performed; it is the point of time when it is committed (made public). Thus, the actual work is performed right up to that point in time. In other work we show that the median time between two commits of the same open source developer is about 100 minutes [15]. When we ran the analyses with shifted working time frames, we found little difference to the numbers from the 9am-5pm time frame and decided to ignore this imprecision. We believe it has no effect on the results.
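A sensitivity check with shifted working time frames can be sketched as follows. The Python code is a minimal illustration, not the analysis code behind the reported numbers: the function name working_time_share, the representation of a commit as a (weekday, hour) pair in local time, and the toy commits are assumptions.

```python
def working_time_share(local_commits: list[tuple[int, int]],
                       start_hour: int = 9, end_hour: int = 17) -> float:
    """Percentage of commits made Mon-Fri between start_hour (inclusive)
    and end_hour (exclusive).

    Each commit is a (weekday, hour) pair in local time, Monday being 0.
    """
    if not local_commits:
        raise ValueError("at least one commit is required")
    in_window = sum(
        1
        for weekday, hour in local_commits
        if weekday < 5 and start_hour <= hour < end_hour
    )
    return 100.0 * in_window / len(local_commits)


# Toy data: three weekday commits and one Saturday commit.
commits = [(0, 10), (1, 16), (2, 8), (5, 11)]
for start, end in [(9, 17), (8, 16), (10, 18), (8, 20)]:
    print(f"{start:02d}-{end:02d}: {working_time_share(commits, start, end):.1f}%")
```

Comparing the shares across several windows in this way shows how strongly a result depends on the exact working hour definition.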
There is a cultural bias implied by these working hours, as some countries work on other days than Monday to Friday. Our analysis of contributions by time zone (Section 2) demonstrates that open source software development is strongly dominated by Western societies, as witnessed by a sharp drop in activities around Christian holidays like Easter or Christmas.
One might argue that the Ohloh data is getting old. One advantage of the Ohloh data is that it draws broadly on the total population of available open source projects. It was seeded by the original providers of the Ohloh service with the most popular open source projects (by Yahoo search engine ranking) and has since been maintained by hand by the respective providers of open source projects. Unlike other data sources, the Ohloh data is much less biased to any particular subgroup of open source projects. If there is a bias, it is a bias towards active well-working projects, which happen to be those we are interested in.

Table 2. Distribution of volunteer (spare time) to paid (working time) developers, binned, over all years

                        Volunteer (Spare Time) Work   Mixed           Paid (Working Time) Work
Working Time Work %     0%         0.01%-5%           5.01%-94.99%    95%-99.99%    100%
Linux Kernel
  author                33.06%     0.35%              43.45%          0.17%         22.98%
  committer             11.59%     3.05%              74.09%          1.52%         9.76%
Ohloh Projects
  known committers      2.41%      1.21%              95.69%          0.00%         0.69%
  extended committers   7.04%      0.4%               74.58%          0.41%         17.56%

Still, it would be desirable to have new data. Unfortunately, there is no alternative at present: No public access to the new Ohloh data is available on the level of detail required, other newer data sources are substantially more biased towards particular subgroups of open source, and it is prohibitively expensive for a research group to build a comprehensive and representative data set for all of open source (which is why nobody has done it yet). Consequently, this is the best research we can do for now.
Our definition of "paid developer" is highly conservative. It is a person who does 95% or more of their work during working time hours only. It represents a regular developer with a regular life-style and presumably no interest in open source software development beyond their work. There are important and common exceptions to this type of person:
• Many paid developers are open source enthusiasts and keep working outside regular working hours.
• The software industry is by and large not unionized and tends to ignore regular working hours.
Consequently, our estimate presents a lower bound for the number of paid developers.
5. RELATED WORK
Since 2008, Corbet et al. have been providing statistics about the Linux kernel development annually [6]. They investigate topics like evolution of the release frequency as well as number of changes introduced per release. In addition, they provide a list of the most-active companies supporting the development of the Linux kernel and list the percentage of commits performed by each of them. Similar to the work presented in this paper, the reports distinguish between authors and committers. Corbet et al. consider a contribution commercial if it is made using a company's email address to identify the contributor. They also maintain a separate mapping list for regular contributors that allows tracking a person even if he or she changes the employer. They find that at least 75% of all contributions since 2005 can be assigned to company employees.
A "Report on the International Status of Open Source Software 2010" finds that the U.S.A., Australia, and the West European countries lead the development and adoption of open source software [19]. This is in line with our observation that weekly work as well as holiday drops line up well with Western cultural work patterns.
Godfrey and Tu studied the Linux kernel growth in 2000 [13], and Robles et al., following up on Godfrey and Tu, studied the Linux kernel growth in 2005 [23]. Robles et al. provide a good summary about what analyses were made in the area of evolution research of open source software projects. They study 123 stable and 457 development releases up to April 2005 (the point in time where the data for our analysis starts) and, by also counting the number of uncommented lines of code, confirm a super-linear growth rate that is even more significant than already shown in the preceding paper. At the same time, the authors point out that not all work in a project is programming, but that also many tasks, such as testing, are done outside of the code repository and thus are hard to measure. Independently of that paper, a study by Succi et al. about the growth in "libre" (open source) software systems confirms this super-linearity for the Linux kernel [24].
Both the proceedings of ICSE (the international conference on software engineering) and MSR (a conference on mining software repositories) as well as other conferences and journals by now provide extensive literature on empirical analyses of open source and closed source projects. An example of a classic open source study is [17] by Mockus et al. Topics of interest range from bug prediction [2] [4] [18] [26] through engineering practices [1] [21] [22], social structures and community management [3] [14], software evolution [11] [12], all the way to issues of global collaboration and distributed development [2]. Research methods themselves, mostly on data quality issues, are also analyzed [9] [27]. A few papers compare open source with commercial software development [1]. However, to the best of our knowledge none of this work addresses the issue of paid vs. volunteer work as discussed in this paper.
We did not find any research that analyzes the commercialization of open source software projects by investigating when what work was done. A reason might be that modern version control systems, such as Git and Mercurial, have allowed us to access commit history data in detail, including time zone information, only recently. Older systems, such as CVS or svn, only store a single UTC time stamp per commit.
6. CONCLUSIONS
This paper analyzes to what extent open source software development has become commercial paid-for software development. A paid contribution is defined as having been contributed during regular (Western) working hours, Mon-Fri, 9am-5pm. By studying the Linux kernel from 2005 to 2011 and the Ohloh projects, a large set of more than 5,000 active open source projects, from 2000 to 2007, we find that about 50% of all contributions to projects in our sample population have been paid work. Moreover, no change in this percentage has occurred for the Ohloh projects, suggesting that the ratio of paid-to-volunteer work is stable in open source for now.
Going one step further, we find that 10-20% of the developers engaged in our sample projects perform development work only during working hours, suggesting that they are fully paid for their work. Unlike traditional volunteers, they perform no work on our sample projects outside this time-frame, making our estimate a conservative one. We also find that many small projects are fully paid for, and that larger projects have a healthy mixture of paid and volunteer work in the 10-20% range as well. In future work, we intend to analyze the relationship between these categories of developers, company engagement, and project success.
REFERENCES
[1] C. Bird, A. Gourley, P. Devanbu, "Detecting Patch Submission and Acceptance in OSS Projects," in Proceedings of the Fourth International Workshop on Mining Software Repositories (MSR '07), pp. 26.
[2] C. Bird, N. Nagappan, P. Devanbu, H. Gall, B. Murphy, "Does distributed development affect software quality?: An empirical case study of Windows Vista," in Communications of the ACM, vol. 52, no. 8, pp. 85-93.
[3] C. Bird, D. Pattison, R. D'Souza, V. Filkov, P. Devanbu, "Latent social structure in open source projects," in SIGSOFT '08/FSE-16 Proceedings, 2008, pp. 24-35.
[4] E. Capra, "An Empirical Study on the Relationship Between Software Design Quality, Development Effort and Governance in Open Source Projects," in IEEE Transactions on Software Engineering, vol. 34, no. 6 (2008), pp. 765-782.
[5] J. Corbet, "How to participate in the Linux community," 2008, at http://www.linuxfoundation.org/content/how-participate-linux-community.
[6] J. Corbet, G. Kroah-Hartman, and A. McPherson, "Linux Kernel Development - How fast it is going, who is doing it, what they are doing, and who is sponsoring it?", 2012, from http://go.linuxfoundation.org/who-writes-linux-2012.
[7] C. Daffara, "Estimating the number of active and stable FLOSS projects", 2007, from http://robertogaloppini.net/2007/08/23/estimating-the-number-of-active-and-stable-floss-projects (Archived at http://www.webcitation.org/69t8UM0lR).
[8] A. Deshpande and D. Riehle, "The total growth of open source," in Proceedings of the fourth Conference on Open Source Systems (OSS 2008), Springer Verlag, 2008, pp. 197-209.
[9] M. Fischer, M. Pinzger, H. Gall, "Populating a release history database from version control and bug tracking systems," in Proceedings of the International Conference on Software Maintenance (ICSM 2003), pp. 23-32.
[10] FLOSSmole. Collaborative collection and analysis of free/libre/open source project data, 2012, from http://flossmole.org/ (Archived at http://www.webcitation.org/69t9UhDSR).
[11] B. Fluri, M. Würsch, H. C. Gall, "Do Code and Comments Co-Evolve? On the Relation between Source Code and Comment Changes," in Proceedings of the 14th Working Conference on Reverse Engineering (WCRE 2007), pp. 70-79.
[12] H. C. Gall, M. Lanza, "Software evolution: analysis and visualization," in Proceedings of the 28th International Conference on Software Engineering (ICSE 2006), pp. 1055-1056.
[13] M. W. Godfrey and Q. Tu, "Evolution in open source software: a case study," in Proceedings of the International Conference on Software Maintenance (ICSM), 2000, pp. 131-142.
[14] V. K. Gurbani, A. Garvert, J. D. Herbsleb, "A case study of open source tools and practices in a commercial setting," in Proceedings of the Fifth Workshop on Open Source Software Engineering, pp. 1-6.
[15] C. Kolassa, D. Riehle, M.A. Salim, "The empirical commit frequency distribution of open source projects," in Proceedings of the 2013 International Symposium on Open Collaboration (WikiSym + OpenSym 2013), ACM, 2013, paper C4.
[16] C. Kolassa, D. Riehle, M.A. Salim, "A Model of the Commit Size Distribution of Open Source," in Proceedings of the 39th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2013), LNCS 7741, Springer Verlag, 2013, pp. 52-66.
[17] A. Mockus, R. T. Fielding, and J. D. Herbsleb, "A case study of open source software development: The Apache server," in ICSE 2000 Proceedings, 2000, pp. 263-272.
[18] N. Nagappan, T. Ball, "Using Software Dependencies and Churn Metrics to Predict Field Failures: An Empirical Case Study," in First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), pp. 364-373.
[19] National Open Source Software Observatory, "Report on the International Status of Open Source Software," 2010 (Archived at http://www.webcitation.org/69t93TcPF).
[20] Ohloh, the open source network, 2012, online at http://www.ohloh.net/ (Archived at http://www.webcitation.org/69t9byLCw).
[21] J. W. Paulson, G. Succi, A. Eberlein, "An empirical study of open-source and closed-source software products," in Transactions on Software Engineering, vol. 30, no. 4 (April 2004), pp. 246-256.
[22] P. C. Rigby, D. M. German, M.-A. Storey, "Open source software peer review practices: a case study of the apache server," in Proceedings of the 30th International Conference on Software Engineering (ICSE 2008), IEEE, pp. 541-550.
[23] G. Robles, J. J. Amor, J. M. Gonzalez-Barahona, and I. Herraiz, "Evolution and growth in Large libre software projects," in Proceedings of the eighth international workshop on Principles of Software Evolution, 2005, pp. 165-174.
[24] G. Succi, J. Paulson, and A. Eberlein, "Preliminary results from an empirical study on the growth of open source and commercial software products," in EDSER-3 Workshop, co-located with ICSE, 2001.
[25] S. J. Vaughan-Nichols, Linux servers keep growing, Windows and Unix keep shrinking. ZDnet. (Archived at http://www.webcitation.org/69wwiYWT9)
[26] T. Zimmermann, N. Nagappan, "Predicting defects using network analysis on dependency graphs," in Proceedings of the 30th International Conference on Software Engineering (ICSE 2008), pp. 531-540.
[27] T. Zimmermann, P. Weißgerber, A. Zeller, "Mining version histories to guide software changes," in Proceedings of the 26th International Conference on Software Engineering (ICSE 2004), pp. 563-572.
