This document analyzes data on paid versus volunteer work in open source software development. The key findings are:
- About 50% of open source software development has been paid work for many years, based on a conservative analysis of data from the Linux kernel and a large sample of over 5,000 active open source projects.
- While commercial contributions to projects like the Linux kernel are well known, this paper provides broader evidence that open source has strong commercial support across many projects.
- Most open source projects balance paid developer work with volunteer work, and the ratio of volunteer to paid work can serve as an indicator for the health of an open source project and community.
Dirk Riehle
Computer Science Department, Friedrich-Alexander University Erlangen-Nürnberg
Martensstr. 3, 91058 Erlangen, Germany
dirk@riehle.org

Philipp Riemer
Computer Science Department, Friedrich-Alexander University Erlangen-Nürnberg
Martensstr. 3, 91058 Erlangen, Germany
contact@philippriemer.de

Carsten Kolassa
Software Engineering, RWTH Aachen University
Ahornstr. 55, 52074 Aachen, Germany
carsten@kolassa.de

Michael Schmidt
Mathematics Department, Friedrich-Alexander University Erlangen-Nürnberg
Martensstr. 3, 91058 Erlangen, Germany
michael.schmidt.nbg@gmail.com

ABSTRACT

Many open source projects have long become commercial. This paper shows just how much of open source software development is paid work and how much has remained volunteer work. Using a conservative approach, we find that about 50% of all open source software development has been paid work for many years now and that many small projects are fully paid for by companies. However, we also find that any non-trivial project balances the amount of paid developer work with volunteer work, and we suggest that the ratio of volunteer to paid work can serve as an indicator for the health of open source projects and aid the management of the respective communities.

Index Terms: Open source software, empirical software engineering, volunteer open source, paid open source.

1. INTRODUCTION

Open source software development has long become an important commercial activity. Corbet et al.'s analyses of recent Linux kernel releases show that a large part of this work is being carried out by developers using their companies' email addresses when submitting code [6], implying that they are paid for the work by their employers. However, while commercial contributions to the Linux kernel have been widely acknowledged, little is known about the overall commercial contribution to open source projects in the form of paid rather than volunteer development work.

In this paper we show that open source has both strong and broad commercial support by companies paying developers to perform open source software development. Also, we suggest that understanding the relationship between paid and volunteer work in open source projects will aid project leaders in steering their community.

This work makes the following contributions:
- It shows empirically that open source has strong commercial support across a broad range of projects,
- it shows the possible range of healthy paid-to-volunteer work ratios to help project steering,
- it presents measurements of
  - how much open source work is being performed during working time,
  - how open source working time work has changed over the years,
  - the percentage of open source programmers that are paid programmers, and,
  - the distribution of volunteer vs. paid work across open source projects,
- using the Linux kernel specifically and a large sample of active open source projects (more than 5,000 projects).

The paper is organized as follows:
In Section 2, we describe our research approach and define key terms. In Section 3 we present the main empirical results. In Section 4 we discuss our findings as well as their limitations. In Section 5 we review related work, and in Section 6 we present final conclusions.

To appear in the Proceedings of the 47th Hawaii International Conference on System Science (HICSS 2014). IEEE Press, 2014.

2. RESEARCH APPROACH

2.1 Definitions

We use the following definitions:
- An author of some piece of code is the creator of the code, i.e., the original developer.
- A commit is the process of putting some piece of code into a code repository.
- Code repository is used as a synonym for configuration management system.
- A committer is a software developer who has the necessary rights to commit to a code repository.
- Maintainer is a synonym for a committer (as used in Linux kernel development).
- A patch is a code contribution submitted to a committer for inclusion in the project.

A code contribution by an author indicates when the code was written, and a commit by a committer indicates when the committer integrated the code into the code base.

Author and committer are roles. Typically, in a two-step process, an author submits a patch and a committer integrates the patch into the main code base. An author who is also a committer can do both of these steps as one. The common case is that an author is not a committer, hence we separate both roles in our analysis of the Linux kernel.

Moreover, we define the following time-related terms using common governmental regulations in Western countries:
- Working time is the time from 9am to 5pm local time, Mondays to Fridays.
- Spare time is all the time that is not working time.

Consequently, working time and spare time depend on the time zone of the developer.

2.2 Data Sources

We use two data sources:
The Linux kernel and the Ohloh projects.

Our analysis of the Linux kernel development work is based on its public configuration management data found at Kernel.org [5]. Since 2005, it has been managed using Git, which in contrast to older configuration management systems lets us distinguish the author of some code, i.e., the original developer, from the committer of the code, who integrated it into the kernel code base. For this work, we downloaded the whole configuration management history from 2005 to 2011.

Our analysis of open source projects is based on a 2008 snapshot of the Ohloh open source project database [20]. Using Daffara's definition of "active projects" [7], we find that our database snapshot contains 5,117 active open source projects. Daffara estimates that there were about 18,000 active open source projects in the world by August 2007, so our sample represents about 30% of the total active project population at that time. While not wholly representative of open source at its time, it is close nevertheless.

2.3 Data Quality

Since 2005, the Linux kernel configuration management data (using Git) has been providing more precise information than traditional systems (CVS, svn). We can distinguish between authors and committers, and we can assess the exact time of a commit, whether a code contribution or code integration.

The 2008 Ohloh database snapshot is not quite as detailed. Collecting 8,705,118 commits from more than 9,192 projects, it does not directly provide all relevant data. Due to the diversity of configuration management systems used in open source, Ohloh cannot distinguish between an author and a committer; thus, we only have committer data at hand.

Another consequence of the variety of configuration management systems is that Ohloh stores all commit timestamps using UTC, ignoring the original time zone of a developer. However, for this work, we need the local time of a commit and hence the time zone. We address this problem by using location data that Ohloh provides to determine the time zone of individual developers. By hand, we identified 580 committers (out of 45,870 distinct committer ids), constituting 1.3% of the committer population. Those identified committers performed 646,705 of the 8 million commits, totaling about 8% of the work being performed. We call the set of identified committers the known committer set or the known committers, in short.

While we can argue that the original Ohloh data set is close to being representative of open source, the reduced number of known committers may not be. For one, we identified mostly committers of above average activity (1.3% of the population performing 8% of the work), so there is some bias. Thus, we need to understand whether this bias is relevant for the analysis presented in this paper.

For this, we ranked committers by number of commits and then binned the resulting committer sequence into 26 different bins. The 26 bins of known committers all have close-to-equal total numbers of commits and were suggested by R, the statistical analysis tool and environment we are using. Thus, the amount of work in each bin is about the same, but was performed by very different numbers of people.

Assuming paid work to be work performed Mon-Fri from 9am-5pm (see below for definition and discussion), we can calculate the percentage of total work performed that is paid work. Figure 1 shows this paid-work-percentage (of total work) by committer bin.

With a null hypothesis that no trend (bias) is apparent (and the alternative hypothesis that there is a bias), using a t-test and assuming a normal distribution, we cannot reject the null hypothesis at the 95% confidence level. We therefore have no reason to assume that reducing the overall committer set to the known committer set introduces a bias that impacts our analysis (but we also cannot exclude it).

In addition to the known committers set, we define an extended committers set or extended committers, in short. The extended committers set comprises all committers in the original Ohloh database, where the committer time zone is either known or assumed. The assumed time zone is determined using the following heuristic:
We first condense all commits of a committer in the database into a single week. For all committers where we do not know the time zone, we match their week on an hourly basis with the weeks of the known committers set. Using a least-squares approach, we identify the time zone that has a minimum difference to the established data. This provides us with the most probable time zone for the not-known committers so that we can determine the local time for each commit (ignoring work while traveling).

The known committers set provides a sharp picture of time-zone-based weekly work activities, including paid and volunteer work, while the extended committers set provides a richer (more data), but fuzzier picture of the weekly work activities of committers. In the following, we present both the known and extended committer data side-by-side.

2.4 Data Interpretation

We would like to understand how much work in open source is paid work and how it is distributed across a wide range of open source projects.

As introduced above, we assume that work performed during regular working hours (weekdays, 9am-5pm) is work paid for by companies or paid for through self-sponsorship of the developer.

One possible objection to this is that not all cultures have working weeks of Mon-Fri, 9am-5pm. For example, some cultures work on Saturdays. Figure 2 shows the distribution of commits over the different time zones of this planet. All countries that work on Saturdays, mostly Islamic countries, show very little open source activity (an interesting fact in itself). Thus, we feel safe to proceed with our definition of weekend and weekdays.

Also, one may argue that many people have working hours outside of 9am-5pm and that our estimate is then too conservative. For one, we would rather estimate paid work conservatively, but we also shifted working hour definitions around, to 8am-4pm, 10am-6pm, 8am-8pm, etc., with no significant change in the results. Thus, we decided to stick to the most common workday definition.

3. PAID WORK IN OPEN SOURCE

3.1 Total Work during Working Time

First, we investigate how much time is being spent on open source during regular working hours. Figures 3-6 show the workweek on an hourly basis for the years 2005-2011 for the Linux kernel and for the years 2000-2007 for the Ohloh data.

Figure 1. Paid-work-percentage of total work for 26 equal-commit-numbers committer bins (higher bin number means more committers in bin)

Figure 2. Distribution of percentage of total commits over time zones (light gray = known committers, dark gray = extended committers)

From perusing Figures 3-6, we can gain a number of insights already. The most obvious insights are that
- there is a clear difference between work days and weekend days: about twice as much work is being done on a work day as on a weekend day; and that
- most work is being performed during regular waking and working hours, i.e., from 9am to 5pm, even on the weekends; and that
- developers take lunch and dinner breaks, with work picking up again for a few hours after dinner before quieting down for the night.

With a working time definition of Mon-Fri, 9am-5pm, Table 1 shows the percentages of all code contributions and of all commits, respectively, made during working time for the Linux kernel and the Ohloh projects.
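The working time measure used throughout this section can be sketched in a few lines of Python. This is only an illustration of the Mon-Fri, 9am-5pm definition from Section 2.1, not the analysis code used for the paper: the function names, the representation of a commit as a pair of UTC timestamp and UTC offset, and the treatment of 5pm as an exclusive upper bound are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

WORK_START_HOUR = 9   # 9am local time, inclusive
WORK_END_HOUR = 17    # 5pm local time, exclusive (an assumption of this sketch)


def is_working_time(utc_time: datetime, utc_offset_hours: float) -> bool:
    """Return True if a commit falls on Mon-Fri, 9am-5pm in the developer's local time.

    `utc_time` must be a timezone-aware datetime; `utc_offset_hours` is the
    (known or assumed) time zone of the developer as an offset from UTC.
    """
    local = utc_time.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    is_weekday = local.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and WORK_START_HOUR <= local.hour < WORK_END_HOUR


def working_time_share(commits) -> float:
    """Fraction of commits made during working time.

    `commits` is an iterable of (utc_time, utc_offset_hours) pairs.
    """
    commits = list(commits)
    if not commits:
        return 0.0
    during_work = sum(1 for t, offset in commits if is_working_time(t, offset))
    return during_work / len(commits)


# Hypothetical sample data: four commits with different local times.
sample_commits = [
    (datetime(2011, 3, 7, 14, 0, tzinfo=timezone.utc), 1),    # Monday 3pm local
    (datetime(2011, 3, 7, 14, 0, tzinfo=timezone.utc), -8),   # Monday 6am local
    (datetime(2011, 3, 5, 10, 0, tzinfo=timezone.utc), 0),    # Saturday 10am local
    (datetime(2011, 3, 4, 23, 30, tzinfo=timezone.utc), 10),  # Saturday 9:30am local
]
print(working_time_share(sample_commits))
```

Applying the same classification per week rather than over the whole history would give the weekly data points discussed in the trend analysis, and shifting the two hour constants corresponds to the alternative workday definitions mentioned in Section 2.4.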
About 50% of all work contributed to open source software projects has been provided Monday to Friday, between 9am and 5pm.

3.2 Trends in Work during Working Time

Next, we look at how the working time work spent on open source has changed over the years. Figures 7-10 show how the percentage of commits made during working time changed over the years.

In Figures 7-10, each data point is the percentage of working time work for the given week. The moving average is a LOESS curve, and the grayed-out space around it indicates the 95% confidence interval for a data point. The widening of that space at the boundaries of the graph is an artifact of not using additional data beyond those boundaries.

Figure 3. Number of commits by authors (when code is developed) per hour counted over all weeks 2005-2011 for the Linux Kernel

Figure 4. Number of commits by committers (when code is integrated) per hour counted over all weeks 2005-2011 for the Linux Kernel

Figure 5. Number of commits by known committers per hour counted over all weeks 2000-2007 for the Ohloh projects

Figure 6. Number of commits by extended committers per hour counted over all weeks 2000-2007 for the Ohloh projects

The Linux kernel data in Figures 7-8 shows significant fluctuations, which reflect the rapid release cycle of the project. A release is performed about every 80 days [5]. The process is highly regulated, with defined time periods of increasing or decreasing activity. The activity is highest during the two-week merge window, after which stabilization kicks in and activity decreases rapidly. Thus, there is no apparent annual schedule.

The Ohloh data, a more diverse data set, also shows some fluctuations, but much less so. The main annual dips from 2002 onwards occur during Christmas week, where it seems natural to have a drop in working time work relative to spare time work (cf. Section 2 for the dominance in open source activity by Western cultures).

Looking at the Linux kernel data in Figures 7-8 again, we can see a clear upward trend for both authors and committers. Thus, from around 2007 through to 2010, increasing amounts of working time work in relation to spare time work were being spent on the Linux kernel. Starting in 2010, this growth largely plateaued.

In contrast to the Linux kernel data, the Ohloh data set shows a straight line. Using a likelihood ratio test (F-test), with a straight line as the null hypothesis, we have to reject any other hypothesis (at a confidence level of 98%) and conclude that no growth occurred. The percentage of working time work spent on the Ohloh projects has remained flat.

However, during these time frames, the total underlying data sets have grown substantially. The Linux kernel is growing at a polynomial rate [13] [23], while the combined Ohloh project data, and presumably all of open source, is growing at a near-exponential rate [8]. Thus, for the Ohloh project data, the year 2000 data is much sparser than the year 2007 data. Still, as we just showed, no working time growth occurred in our open source project data, and the working time percentage of total work performed stayed flat.

With growing market share [25], the economic significance of the Linux kernel has only been increasing, so it is not surprising to see growth in working time work being spent on it. What is surprising is that open source in total (using the Ohloh data as a proxy), which has been growing near-exponentially, has maintained a constant ratio of working time to spare time work. Thus, for every project with increasing economic significance that received more paid development work, new projects have been started with less working time engagement, but possibly growing into it.

It is too early to speculate about a stable state of open source in terms of a stable ratio of paid working time work to volunteer spare time work contributions, but open source appears to have reached at least an intermediate stable state. Thus, even with underlying near-exponential growth, we expect this ratio to remain stable for now.

3.3 Developer Classification

While overall weekly working times and working time trends are interesting, we also would like to know how many developers are earning their living by performing open source software development. Thus, we now look at individual developers and how much of their code is written during working time, i.e., to what extent they are being paid for their work.

Figures 11-12 show the distribution of developers over the percentage of work that is paid work (working time work) for the Linux kernel and Ohloh projects, respectively. It has been counted over all years. Please note that the y-axis is log-scale and that we are talking about contributors now, not just contributions. Only the extended committer data is shown, because the known set was too small to provide meaningful data for this particular discussion.

Both the Linux kernel and Ohloh projects are dominated by the extremes:
Developers doing all their work during spare time and developers doing all their work during working time. Table 2 shows the dominance of these extremes. Here, we define paid developers to be those who performed 95% or more of their commits during working time, and volunteer developers to be those who performed 95% or more of their commits during spare time, outside the weekdays 9am-5pm time frame.

Thus, at least 23.15% of all authors working on the Linux kernel, totaling 1,807 developers, are paid for their work. 11.28% of all committers working on the Linux kernel, totaling 37 developers, are paid for their work as well. Given the economic significance of the Linux kernel, one would expect more committers (maintainers) to be paid for their work than authors. A possible explanation for not confirming this assumption is the long-term engagement of committers that motivates many to keep working outside traditional working time boundaries, which makes them fall outside our conservative definition of paid work.

Table 1. Percentage of work performed during working time (9am-5pm, Mon-Fri) for Linux (2005-2011) and the Ohloh projects (2000-2007)

Data set         Role                  Percentage of total commits made during working time
Linux Kernel     author                45.00%
Linux Kernel     committer             51.36%
Ohloh Projects   known committers      47.3% (min. 28.2%, max. 58.8%)
Ohloh Projects   extended committers   55.4% (min. 36.5%, max. 59.5%)

Figure 7. Data and trend line for percentage of commits made by authors to the Linux Kernel during working time for a given week

Figure 8. Data and trend line for percentage of commits made by committers to the Linux Kernel during working time for a given week

Figure 9. Data and trend line for percentage of commits made to the Ohloh projects during working time for a given week, known committers

Figure 10. Data and trend line for percentage of commits made to the Ohloh projects during working time for a given week, extended committers

Figure 11. Number of authors and committers with a given average percentage of paid work for the years 2005-2011 for the Linux Kernel

Figure 12. Number of committers with a given average percentage of paid work for the years 2000-2007 for the Ohloh projects, extended committers

As to the Ohloh projects, 17.97% of all extended committers, totaling 8,244 developers, are being paid for their work. Common to these paid developers is that they do not work on open source projects in their spare time, i.e., they fall outside the boundaries of the traditional open source enthusiast and volunteer categories.

3.4 Project Classification

Finally, not only are we interested in what percentage of developers are being paid to work on open source, we also would like to know how they are allocated to projects. It is fair to assume that some projects get more commercial attention than others. Thus, we investigate which projects receive this attention.

The Linux kernel project is a single (albeit large) project, so in this section we are looking at the Ohloh projects only.

We find that there is a large number of small (1-2 developers) projects with a long-tail distribution of size that are fully paid for by companies:
All developers% /reFuently =ust one% are making their contri!utions only during 0orking time# 2he top ) smallest pro=ects in our sample% /ully paid /or% are called su!tle% 1e!-A% ShAR-E% gst-openmax% and phpES-# 2hese smallest pro=ects have lo0 commit num!ers Bin the '((8s only% sometimes lessD# >nspection !y hand sho0s that many times% code is !eing committed in large chunks# 2his is uncommon in traditional open source so/t0are development% 0here the most /reFuent commit siEe is one line o/ code :';<# 2hus% it is sa/e to assume that these small !ut still active pro=ects are !e- ing developed in-house and are !eing provided in a snapshot-style to the pu!lic at appropriate times# 2he largest pro=ects in our sample maintain a paid- /or developer percentage in the '(-4(H range# Five ex- ample pro=ects o/ this siEe are +N7"E% Net!eans >DE% Eclipse -lat/orm% .DE% and .I"# 2hese are 0ell- kno0n open source pro=ects that are !eing developed in an open colla!orative style% and the paid developer population o/ these pro=ects can serve as an indicator o/ healthy pu!lic open source pro=ects# &. DISCUSSION O. 
.INDIN/S >n this 0ork% 0e are making the assumption that 0ork per/ormed during 0orking time hours B"on-Fri% &am-)pm% in the resp# local time EoneD is paid 0ork# 2his time /rame has !een de/ined as 0orking time !y most 1estern countries and thus 0e /eel =usti/ied in considering it paid 0ork# Even students typically have to go to class during that time and spending it on open source development implies economic sel/-sponsorship as it delays graduation and hence !orro0s against /u- ture income# 2he time o/ a commit is not the actual time the 0ork 0as per/ormedJ it is the point o/ time 0hen it is committed Bmade pu!licD# 2hus% the actual 0ork is per- /ormed right up to that point in time# >n other 0ork 0e sho0 that the median time !et0een t0o commits o/ the same open source developer is a!out '((min :')<# 1hen 0e ran the analyses 0ith shi/ted 0orking time /rames% 0e /ound little di//erence to the num!ers /rom the &am-)pm time /rame and decided to ignore this im- precision# 1e !elieve it has no e//ect on the results# 2here is a cultural !ias implied !y these 0orking hours% as some countries 0ork on other days than "on- day to Friday# 7ur analysis o/ contri!utions !y time Eone BSection 4D demonstrates that open source so/t- 0are development is strongly dominated !y 1estern societies% as 0itnessed !y a sharp drop in activities around Christian holidays like Easter or Christmas# 7ne might argue that the 7hloh data is getting old# 7ne advantage o/ the 7hloh data is that it dra0s !roadly on the total population o/ availa!le open source pro=ects# >t 0as seeded !y the original providers o/ the 7hloh service 0ith the most popular open source pro=ects B!y Lahoo search engine rankingD and has since !een maintained !y hand !y the respective providers o/ open source pro=ects# Unlike other data sources% the 7hloh data it is much less !iased to any 5 1able 2. 
istribution of !olunteer (spare time) to paid (working time) de!elopers% binned% o!er all #ears Vo$untee 0S'ae Ti+e1 Wo( 2i3e) Pai) 0Wo(in* Ti+e1 Wo( !orkin" #ime !ork % 0% 00$%%5% 50$%%9&99% 95%%9999% $00% 9inux .ernel author $$#(;H (#$)H 6$#6)H (#'5H 44#&*H committer ''#)&H $#()H 56#(&H '#)4H H 7hloh -ro=ects kno0n committers 4#6'H '#4'H &)#;&H (#((H (#;&H extended committers 5#(6H (#6H 56#)*H (#6'H '5#);H particular su!group o/ open source pro=ects# >/ there is a !ias% it is a !ias to0ards active 0ell-0orking pro=ects% 0hich happen to !e those 0e are interested in# Still% it 0ould !e desira!le to have ne0 data# Un/or- tunately% there is no alternative at present? No pu!lic access to the ne0 7hloh data is availa!le on the level o/ detail reFuired% other ne0er data sources are su!- stantially more !iased to0ards particular su!groups o/ open source% and it is prohi!itively expensive /or a re- search group to !uild a comprehensive and representa- tive data set /or all o/ open source B0hich is 0hy no- !ody has done it yetD# ConseFuently% this is the !est re- search 0e can do /or no0# 7ur de/inition o/ Gpaid developerG is highly conser- vative# >t is a person 0ho does &)H or more o/ their 0ork during 0orking time hours only# >t represents a regular developer 0ith a regular li/e-style and presum- a!ly no interest in open source so/t0are development !eyond their 0ork# 2here are important and common exceptions to this type o/ person? @ "any paid developers are open source enthu- siasts and keep 0orking outside regular 0ork- ing hours# @ 2he so/t0are industry is !y and large not unioniEed and tends to ignore regular 0orking hours# ConseFuently% our estimate presents a lo0er !ound- ary /or the num!er o/ paid developers# 4. 
RELATED WORK Since 4((*% Cor!et et al# have !een providing sta- tistics a!out the 9inux kernel development annually :;<# 2hey investigate topics like evolution o/ the re- lease /reFuency as 0ell as num!er o/ changes intro- duced per release# >n addition% they provide a list o/ the most-active companies supporting the development o/ the 9inux kernel and list the percentage o/ commits per/ormed !y each o/ them# Similar to the 0ork pre- sented in this paper% the reports distinguish !et0een au- thors and committers# Cor!et et al# consider a contri!u- tion commercial% i/ it is made using a company8s email address to identi/y the contri!utor# 2hey also maintain a separate mapping list /or regular contri!utors that al- lo0s tracking a person even i/ he or she changes the employer# 2hey /ind that at least 5)H o/ all contri!u- tions since 4(() can !e assigned to company employ- ees# A GReport on the >nternational Status o/ 7pen Source So/t0are 4('(G /inds that the U#S#A#% Australia% and the 1est European countries lead the development and adoption o/ open source so/t0are :'&<# 2his is in line 0ith our o!servation that 0eekly 0ork as 0ell as holiday drops line up 0ell 0ith 1estern cultural 0ork patterns# +od/rey and 2u studied the 9inux kernel gro0th in 4((( :'$<% and Ro!les et al#% /ollo0ing up on +odrey and 2u% studied the 9inux kernel gro0th in 4(() :4$<# Ro!les et al# provide a good summary a!out 0hat analyses 0ere made in the area o/ evolution research o/ open source so/t0are pro=ects# 2hey study '4$ sta!le and 6)5 development releases up to April 4(() Bthe point in time 0here the data /or our analysis startsD and% !y also counting the num!er o/ uncommented lines o/ code% con/irm a super-linear gro0th rate% that is even more signi/icant than already sho0n in the pre- ceding paper# At the same time% the authors point out that not all 0ork in a pro=ect is programming% !ut that also many tasks% such as testing% are done outside o/ the code repository and thus are hard to 
measure# >nde- pendently o/ that paper% a study !y Succi et al# a!out the gro0th in Gli!reG Bopen sourceD so/t0are systems% con/irms this super-linearity /or the 9inux kernel :46<# Koth the proceedings o/ >CSE Bthe international con/erence on so/t0are engineeringD and "SR Ba con- /erence on mining so/t0are repositoriesD as 0ell as other con/erences and =ournals !y no0 provide exten- sive literature on empirical analyses o/ open source and closed source pro=ects# An example classic open source studies is :'5< !y "ockus et al# 2opics o/ interest range /rom !ug prediction :4< :6< :'*< :4;< through engineer- ing practices :'< :4'< :44<% social structures and com- munity management :$< :'6<% so/t0are evolution :''< :'4<% all the 0ay to issues o/ glo!al colla!oration and distri!uted development :4<# Research methods itsel/% mostly on data Fuality issues% are also analyEed :&< :45<# A /e0 papers compare open source 0ith commer- cial so/t0are development :'<# 3o0ever% to the !est o/ our kno0ledge none o/ this 0ork addresses the issue o/ paid vs# volunteer 0ork as discussed in this paper# 1e did not /ind any research that analyEes the com- mercialiEation o/ open source so/t0are pro=ects !y in- vestigating 0hen 0hat 0ork 0as done# A reason might !e that modern version control systems% such as +it and "ercurial% have allo0ed us to access commit his- tory data in detail% including time Eone in/ormation% only recently# 7lder systems% such as CIS or svn only store a single U2C time stamp per commit# 5. 
CONCLUSIONS 2his paper analyEes to 0hat extent open source so/t0are development has !ecome commercial paid-/or so/t0are development# A paid contri!ution is de/ined as having !een contri!uted during regular B1esternD 0orking hours% "on-Fri% &am-)pm# Ky studying the 9inux kernel /rom 4(() to 4('' and the 7hloh * pro=ects% a large set o/ more than )%((( active open source pro=ects% /rom 4((( to 4((5% 0e /ind that a!out )(H o/ all contri!utions to pro=ects in our sample pop- ulation have !een paid 0ork# "oreover% no change in this percentage has occurred /or the 7hloh pro=ects% suggesting that the ratio o/ paid-to-volunteer 0ork is sta!le in open source /or no0# +oing one step /urther% 0e /ind that '(-4(H o/ the developers engaged in our sample pro=ects per/orm de- velopment 0ork only during 0orking hours% suggesting that they are /ully paid /or their 0ork# Unlike tradi- tional volunteers% they per/orm no 0ork on our sample pro=ects outside this time-/rame% making our estimate a conservative one# 1e also /ind that many small pro=ects are /ully paid /or% and that larger pro=ects have a healthy mixture o/ paid and volunteer 0ork in the '(- 4(H range as 0ell# >n /uture 0ork% 0e intend to ana- lyEe the relationship !et0een these categories o/ devel- opers% company engagement% and pro=ect success# RE.ERENCES :'< C# Kird% A# +ourley% -# Devan!u GDetecting -atch Su!mission and Acceptance in 7SS -ro=ects%G in -roceedings o/ the Fourth >nternational 1orkshop on "ining So/t0are Repositories B"SR 8(5D% pp# 4;# :4< C# Kird% N# Nagappan% -# Devan!u% 3# +all% K# "urphy% GDoes distri!uted development a//ect so/t0are FualityM? 
An empirical case study of Windows Vista," in Communications of the ACM, vol. 52, no. 8, pp. 85-93.
[3] C. Bird, D. Pattison, R. D'Souza, V. Filkov, P. Devanbu, "Latent social structure in open source projects," in SIGSOFT '08/FSE-16 Proceedings, 2008, pp. 24-35.
[4] E. Capra, "An Empirical Study on the Relationship Between Software Design Quality, Development Effort and Governance in Open Source Projects," in IEEE Transactions on Software Engineering, vol. 34, no. 6 (2008), pp. 765-782.
[5] J. Corbet, "How to participate in the Linux community," 2008, at http://www.linuxfoundation.org/content/how-participate-linux-community.
[6] J. Corbet, G. Kroah-Hartman, and A. McPherson, "Linux Kernel Development - How fast it is going, who is doing it, what they are doing, and who is sponsoring it?", 2012, from http://go.linuxfoundation.org/who-writes-linux-2012.
[7] C. Daffara, "Estimating the number of active and stable FLOSS projects", 2007, from http://robertogaloppini.net/2007/08/23/estimating-the-number-of-active-and-stable-floss-projects (Archived at http://www.webcitation.org/69t8UM0lR).
[8] A. Deshpande and D. Riehle, "The total growth of open source," in Proceedings of the fourth Conference on Open Source Systems (OSS 2008), Springer Verlag, 2008, pp. 197-209.
[9] M. Fischer, M. Pinzger, H. Gall, "Populating a release history database from version control and bug tracking systems," in Proceedings of the International Conference on Software Maintenance (ICSM 2003), pp. 23-32.
[10] FLOSSmole. Collaborative collection and analysis of free/libre/open source project data, 2012, from http://flossmole.org/ (Archived at http://www.webcitation.org/69t9UhDSR).
[11] B. Fluri, M. Wursch, H. C. Gall, "Do Code and Comments Co-Evolve? On the Relation between Source Code and Comment Changes," in Proceedings of the 14th Working Conference on Reverse Engineering (WCRE 2007), pp. 70-79.
[12] H. C. Gall, M. Lanza, "Software evolution: analysis and visualization," in Proceedings of the 28th International Conference on Software Engineering (ICSE 2006), pp. 1055-1056.
[13] M. W. Godfrey and Q. Tu, "Evolution in open source software: a case study," in Proceedings of the International Conference on Software Maintenance (ICSM), 2000, pp. 131-142.
[14] V. K. Gurbani, A. Garvert, J. D. Herbsleb, "A case study of open source tools and practices in a commercial setting," in Proceedings of the Fifth Workshop on Open Source Software Engineering, pp. 1-6.
[15] C. Kolassa, D. Riehle, M.A. Salim, "The empirical commit frequency distribution of open source projects," in Proceedings of the 2013 International Symposium on Open Collaboration (WikiSym + OpenSym 2013), ACM, 2013, paper C4.
[16] C. Kolassa, D. Riehle, M.A. Salim, "A Model of the Commit Size Distribution of Open Source," in Proceedings of the 39th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2013), LNCS 7741, Springer Verlag, 2013, pp. 52-66.
[17] A. Mockus, R. T. Fielding, and J. D. Herbsleb, "A case study of open source software development: The Apache server," in ICSE 2000 Proceedings, 2000, pp. 263-272.
[18] N. Nagappan, T. Ball, "Using Software Dependencies and Churn Metrics to Predict Field Failures: An Empirical Case Study," in First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), pp. 364-373.
[19] National Open Source Software Observatory, "Report on the International Status of Open Source Software," 2010 (Archived at http://www.webcitation.org/69t93TcPF).
[20] Ohloh, the open source network, 2012, online at http://www.ohloh.net/ (Archived at http://www.webcitation.org/69t9byLCw).
[21] J. W. Paulson, G. Succi, A. Eberlein, "An empirical study of open-source and closed-source software products," in Transactions on Software Engineering, vol. 30, no. 4 (April 2004), pp. 246-256.
[22] P. C. Rigby, D. M. German, M.-A. Storey, "Open source software peer review practices: a case study of the apache server," in Proceedings of the 30th International Conference on Software Engineering (ICSE 2008), IEEE, pp. 541-550.
[23] G. Robles, J. J. Amor, J. M. Gonzalez-Barahona, and I. Herraiz, "Evolution and growth in Large libre software projects," in Proceedings of the eighth international workshop on Principles of Software Evolution, 2005, pp. 165-174.
[24] G. Succi, J. Paulson, and A. Eberlein, "Preliminary results from an empirical study on the growth of open source and commercial software products," in EDSER-3 Workshop, co-located with ICSE, 2001.
[25] S. J. Vaughan-Nichols, Linux servers keep growing, Windows and Unix keep shrinking. ZDnet. (Archived at http://www.webcitation.org/69wwiLWT9)
[26] T. Zimmermann, N. Nagappan, "Predicting defects using network analysis on dependency graphs," in Proceedings of the 30th International Conference on Software Engineering (ICSE 2008), pp. 531-540.
[27] T. Zimmermann, P. Weißgerber, A. Zeller, "Mining version histories to guide software changes," in Proceedings of the 26th International Conference on Software Engineering (ICSE 2004), pp. 563-572.