Who are the open source contributors?

Mani Jagadeesan
6 min read · Jan 16, 2021

Open source is everywhere. As developers, we use a very large number of open source packages and tools every single day. It is impossible to imagine a world without open source today, without all the amazing software that we use in our projects.

At an individual level, we know either from personal experience or from others that contributing to open source pays off eventually, sometimes indirectly. When we contribute to open source, we share our knowledge with thousands of others who can review our work, give feedback, or even improve upon it. This helps the original contributors improve over time, while also giving them significant visibility and networking opportunities that can help them in their chosen careers.

Is there a way to find out who these open source contributors are, and what kind of profile they have?

Stack Overflow has been collecting data from thousands of developers and publishing survey results that give interesting insights into overall trends such as tools and technologies, developer salaries, etc. They also collected information on whether a survey respondent contributes to open source software. This data on open source contributions is available for the years 2017, 2018 and 2019.

While Stack Overflow’s survey insights (2019) do not report any patterns among developers who contribute to open source, we can find them ourselves using the raw data. Let’s look at the profile of open source contributors by:

  1. Operating System that they use
  2. Salary
  3. Career satisfaction

Methodology

Our primary objective is to get insights into the typical profile of an open source developer. We have the Stack Overflow survey data for 2019, which records how often each respondent contributes to open source as a categorical variable (radio button selection) with the following options:

  • Once a month or more often
  • Less than once a month but more than once per year
  • Less than once per year
  • Never
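As a minimal sketch of how such a variable might be handled in pandas, the answer options can be stored as an ordered categorical so that later tables list the contributor groups in a sensible order rather than alphabetically. The column name OpenSourcer and the toy responses below are assumptions for illustration, not values read from the survey file.

```python
import pandas as pd

# Answer options from the survey question, ordered from most to least frequent.
FREQUENCY_ORDER = [
    "Once a month or more often",
    "Less than once a month but more than once per year",
    "Less than once per year",
    "Never",
]

# Toy responses for illustration only; these are not survey data.
responses = pd.Series(
    ["Never", "Once a month or more often", "Less than once per year", "Never"],
    name="OpenSourcer",  # assumed column name
)

# An ordered categorical remembers the order of the answer options.
open_sourcer = pd.Categorical(responses, categories=FREQUENCY_ORDER, ordered=True)

# Count the responses and list them in the order of the answer options.
counts = pd.Series(open_sourcer).value_counts().sort_index()
print(counts)
```

Options that nobody picked in the toy data still appear in the result with a count of zero, which keeps every table the same shape.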

We would like to see how it correlates with other categorical data like:

  • Operating System that they use (do open sourcers use Linux more than Windows? What about MacOS?)
  • Salary (do open sourcers end up with higher paying jobs?)
  • Career satisfaction (are these open sourcers happy with what they do?), etc.

Our approach: within each category of open source contributors (for example, developers contributing once a month or more), we calculate the percentage of people who fall into each value of the other variable, and then compare these percentages across the contributor categories.
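One way to sketch this calculation is with a pandas cross-tabulation, first as raw counts and then normalized so that every row sums to 100%. The column names OpenSourcer and OpSys and the toy rows are assumptions for illustration; the notebook behind this article may do it differently.

```python
import pandas as pd

# Toy data for illustration only; not the real survey responses.
df = pd.DataFrame({
    "OpenSourcer": ["Once a month or more often"] * 4 + ["Never"] * 4,
    "OpSys": [
        "Linux-based", "Linux-based", "MacOS", "Windows",
        "Windows", "Windows", "Windows", "MacOS",
    ],
})

# Raw counts: one row per contributor category, one column per operating system.
counts = pd.crosstab(df["OpenSourcer"], df["OpSys"])

# Row percentages: normalize="index" divides each row by its own total.
row_pct = pd.crosstab(df["OpenSourcer"], df["OpSys"], normalize="index") * 100

print(counts)
print(row_pct.round(1))
```

Because each row is divided by its own total, groups of very different sizes can be compared side by side.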

Open Source contributors by Operating System

There is a popular perception that open source contributors use Linux much more predominantly than Windows. Is there any truth in this?

We also see at many tech conferences and in YouTube video tutorials that there are a lot of developers using MacOS. How do they compare to Linux users? Let’s find out from the Stack Overflow survey responses.

Here are the raw numbers for various operating systems used by open-source contributors:

We can’t see much yet, other than the fact that there are more Linux users among those who contribute more frequently to open source. Converting these numbers to percentages will help us understand this data better:

In the table above, every row totals to 100%.

Finally, we can see a clear pattern in these numbers: developers making open-source contributions are more likely to use a Linux-based (36%) or MacOS (28%) operating system, 64% combined, than Windows (35%). This is notable in light of the market share enjoyed by Windows computers (~75%) based on sales figures and general usage statistics. (Source: https://gs.statcounter.com/os-market-share/desktop/worldwide)

We can also see in the last row: developers who do not contribute to open source prefer to use Windows computers.

So there is some truth in the claim that open source contributors prefer Linux or MacOS for their work.

Care should be taken to note that these row percentages support an inference in one direction only, and say nothing about the reverse direction. For example, we cannot say that Linux or MacOS users are twice as likely to make open source contributions. We can only observe from this data that the more frequently developers contribute to open source, the more they tend to prefer Linux or MacOS based operating systems.
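To illustrate why the direction matters, here is a small sketch with made-up counts (hypothetical numbers, not from the survey). Dividing by row totals answers "which operating system do contributors use?", while dividing by column totals answers "how many users of this operating system contribute?", and the two give very different percentages.

```python
import pandas as pd

# Hypothetical counts for illustration only; not taken from the survey.
counts = pd.DataFrame(
    {"Linux-based": [30, 100], "Windows": [20, 350]},
    index=["Contributes monthly", "Never contributes"],
)

# Share of each operating system within a contributor group (rows sum to 100).
os_given_group = counts.div(counts.sum(axis=1), axis=0) * 100

# Share of each contributor group within an operating system (columns sum to 100).
group_given_os = counts.div(counts.sum(axis=0), axis=1) * 100

print(os_given_group.round(1))
print(group_given_os.round(1))
```

In this toy table, 60% of monthly contributors use Linux, yet only about 23% of Linux users contribute monthly. A table normalized by rows, like the ones in this article, only supports the first kind of statement.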

Open Sourcers by salary

A good number of blogs and articles in the tech world mention that developers who contribute to open source get the rewards eventually, sometimes indirectly via much faster career progression. There may be some truth in this, but we now have an opportunity to test this with some real-world data.

One proxy for career progression is salary. If open source contributors do see greater opportunities, then they should also be getting higher salaries than average.

For the sake of analysis, we use the quantile-based discretization function in pandas, which automatically splits the data points into four equal-sized buckets (each bucket holds roughly the same number of respondents).
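The pandas function documented as "quantile-based discretization" is pd.qcut, so the bucketing step might look roughly like the sketch below. The column name ConvertedComp, the bucket labels, and the toy salaries are assumptions for illustration only.

```python
import pandas as pd

# Toy salaries for illustration only; not survey data.
salaries = pd.Series(
    [20_000, 35_000, 48_000, 52_000, 61_000, 75_000, 90_000, 150_000],
    name="ConvertedComp",  # assumed column name
)

# q=4 cuts at the 25th, 50th and 75th percentiles, giving four quartile buckets.
labels = ["Q1 (lowest)", "Q2", "Q3", "Q4 (highest)"]  # hypothetical labels
buckets = pd.qcut(salaries, q=4, labels=labels)

# Each bucket holds the same number of respondents in this toy example.
bucket_sizes = buckets.value_counts().sort_index()
print(bucket_sizes)
```

With real salary data, many respondents can report the same value, so the buckets may come out only approximately equal in size.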

Here is what the data says (number of people in each of the salary categories):

Let’s do the same in percentages within each row:

The first row suggests that developers who contribute to open source have a higher chance (29.34%) of ending up in better-paying jobs.

This is a notable increase over the group that never contributes to open source: in the last row, only 21% of those who never contribute end up in the highest salary bucket.
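To put the two figures side by side, here is a quick calculation of the gap, assuming the 29.34% in the first row refers to the highest salary bucket, just like the 21% in the last row (the text rounds the latter). The variable names are hypothetical.

```python
# Percentages quoted in the text above.
top_bucket_frequent = 29.34  # frequent contributors (assumed: highest salary bucket)
top_bucket_never = 21.0      # never contribute (rounded in the text)

# Absolute gap in percentage points, and the ratio between the two shares.
point_gap = top_bucket_frequent - top_bucket_never
relative_gap = top_bucket_frequent / top_bucket_never

print(f"{point_gap:.2f} percentage points")
print(f"{relative_gap:.2f}x")
```

Since the salary buckets are built to be equal in size, roughly 25% of all respondents land in each bucket, so one group sits above that baseline and the other below it.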

Open sourcers by Career Satisfaction

Are developers likely to contribute to open source software when they are satisfied with their jobs or careers? Let’s look at what the data says in terms of absolute numbers:

Here are the percentages of people in each contributor category:

Like before, each contributor category (row) sums to 100%.

There is a clear pattern emerging here: open source contributors report higher than average career satisfaction. Among those who contribute regularly to open source, nearly 47% report the highest possible career satisfaction. This is considerably higher than among those who never contribute to open source projects, of whom 36.7% report being very satisfied.

This is again a correlation and not causation, but there could be some element of indirect causation and truth here: those developers contributing to open source are actually in better jobs (possibly they get better over time) and therefore they are also happy with their jobs and careers.

Conclusion

Here are some of the insights we got from the developer survey data:

  1. Developers contributing to open source prefer Linux or MacOS over Windows for their work computer
  2. Open source contributors seem to earn slightly higher salaries
  3. Open source contributors report higher career satisfaction

We have to be careful here not to confuse correlation with causation. These are only correlations, so we cannot say that contributing to open source will bring all the other benefits.

Next Steps

The most recent developer survey from Stack Overflow (2020) does not collect data on open source contributions. But we do have this data in the surveys conducted in 2017 and 2018. If you are interested, I highly recommend analyzing this data to bring out more insights.

The Jupyter notebook used for this analysis can be found in this GitHub repo: https://github.com/mani04/stackoverflow-developer-survey-2019
