Methodology

This is a public report, and its contents may be used as long as the source is appropriately credited.

The number of respondents

More than 35,000 people responded to the Developer Ecosystem Survey 2023. To ensure we were working with the most representative sample possible, we cleaned the data through the process described below. As a result, the report is based on the input of 26,348 developers from 196 countries and regions, including one response reportedly from Antarctica. The data was weighted according to several criteria, as described in the closing portions of this section.

The data cleaning process

We used an incomplete response only if the respondent had answered at least the question about programming language use. We also used a set of criteria to identify and exclude suspicious responses, including these:

  • Surveys that were filled out too fast.
  • Surveys from identical IP addresses, as well as surveys with responses that were overwhelmingly similar. If two surveys with the same IP address were more than 75% identical, we kept the one that was more complete.
  • Surveys with conflicting answers, for example, “18–20 years old” and “more than 16 years of professional experience”.
  • Surveys with only a single option chosen for almost all the multiple-choice questions.
  • Surveys submitted from the same email address. In such cases, we kept the survey that was the most complete.
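As an illustration only, the same-IP rule above could be sketched in Python as follows. The similarity measure (share of matching answers among questions both respondents answered), the completeness measure (number of answered questions), and the field names `id`, `ip`, and `answers` are assumptions made for this sketch and do not describe the actual cleaning pipeline.

```python
from itertools import combinations


def similarity(a: dict, b: dict) -> float:
    """Share of questions answered by both respondents where the answers match."""
    shared = [q for q in a if q in b]
    if not shared:
        return 0.0
    return sum(a[q] == b[q] for q in shared) / len(shared)


def drop_near_duplicates(responses: list, threshold: float = 0.75) -> list:
    """For same-IP pairs that are more than `threshold` identical,
    keep the response with more answered questions (assumed completeness measure)."""
    dropped = set()
    for i, j in combinations(range(len(responses)), 2):
        if i in dropped or j in dropped:
            continue
        r1, r2 = responses[i], responses[j]
        if r1["ip"] != r2["ip"]:
            continue
        if similarity(r1["answers"], r2["answers"]) > threshold:
            # On a tie in completeness, the later response is dropped.
            loser = i if len(r1["answers"]) < len(r2["answers"]) else j
            dropped.add(loser)
    return [r for k, r in enumerate(responses) if k not in dropped]


# Hypothetical data: A and B share an IP and agree on every shared question.
responses = [
    {"id": "A", "ip": "10.0.0.1", "answers": {"q1": "a", "q2": "b", "q3": "c", "q4": "d", "q5": "e"}},
    {"id": "B", "ip": "10.0.0.1", "answers": {"q1": "a", "q2": "b", "q3": "c", "q4": "d"}},
    {"id": "C", "ip": "10.0.0.2", "answers": {"q1": "a", "q2": "b", "q3": "c", "q4": "d", "q5": "e"}},
    {"id": "D", "ip": "10.0.0.1", "answers": {"q1": "x", "q2": "y", "q3": "c", "q4": "z"}},
]
kept = drop_near_duplicates(responses)
```

In this toy data, B is removed as a less complete near-duplicate of A, C is kept because its IP address differs, and D is kept because it matches A on only one of four shared questions.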

Reducing the response burden

This year's survey consisted of 544 questions.

Our goal was to cover a variety of research areas, so each respondent was shown certain sections but not others, based on their previous answers. For example, questions about Go were shown only to programmers who use Go. In addition, we randomized questions and sections to further reduce the load on each respondent.

On average, participants spent 30 minutes completing the survey. We have worked to streamline the survey process, and we aim to make it even more efficient next year.

Targeting our audience

We invited potential respondents by using Twitter ads, Facebook ads, Instagram, Quora, and JetBrains’ own communication channels. We also posted links to user groups and tech community channels and asked respondents to share the survey with their peers.

Countries and regions

We collected sufficiently large samples from 16 countries: Argentina, Brazil, Canada, China, France, Germany, India, Japan, Mexico, South Korea, Spain, Türkiye, Russia, Ukraine, the United Kingdom, and the United States.

The remaining countries were distributed among six regions:

  • Middle East, Africa, Central Asia
  • Eastern Europe, Balkans, and the Caucasus
  • Benelux and Northern Europe
  • Rest of Europe (including Cyprus and Israel)
  • Other Southeast Asia and Oceania (including Australia and New Zealand)
  • Central and South America (excluding Argentina, Brazil, and Mexico)

For each geographical region, we collected at least 300 responses from external sources, such as ads or respondents’ referrals.

Localization

To maximize inclusivity and accommodate a diverse range of participants, the survey was available in 10 languages: English, Chinese, French, German, Japanese, Korean, Brazilian Portuguese, Russian, Spanish, and Turkish.

Sampling-bias reduction

We weighted the data according to where the responses came from. As a base, we took the responses collected from external sources that are less biased toward JetBrains users, such as paid ads on Twitter, Facebook, Instagram, and Quora, as well as respondents’ referrals. We considered each respondent’s source individually to generate results based on the weighting procedures.

We undertook three weighting stages to get a less-biased picture of the worldwide developer population.

Stage one: Adjusting for the populations of professional developers in each region

In the first stage, we assembled the responses collected while targeting different countries, and then we applied our estimations of the populations of professional developers in each country to these data.

First, we took the survey data we received from professional developers and working students who were directed to us via ads posted on various social networks in the 22 regions (the 16 countries and six regions listed above), along with the data that we received from peer referrals. Although we did not advertise the survey in Ukraine and Russia, we included data collected from these two countries in the report and weighted it using an approximation based on the 2021 data. Then, we weighted the responses according to our estimated populations of professional developers in those 22 regions. This ensured that the distribution of the responses corresponded to the population size of professional developers in each country.
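A minimal sketch of this kind of population weighting, assuming a simple post-stratification scheme: each respondent in a region gets the ratio of the region's share of the developer population to its share of the sample. The function name `region_weights` and the population figures are hypothetical and are not taken from the report.

```python
from collections import Counter


def region_weights(respondent_regions: list, dev_population: dict) -> dict:
    """Per-respondent weight for each region so that weighted region shares
    match the (estimated) population shares. Weights average to 1 overall."""
    counts = Counter(respondent_regions)
    total_population = sum(dev_population[r] for r in counts)
    n = len(respondent_regions)
    return {
        region: (dev_population[region] / total_population) / (counts[region] / n)
        for region in counts
    }


# Hypothetical sample: France is over-represented relative to Japan.
sample = ["France"] * 3 + ["Japan"]
population = {"France": 500_000, "Japan": 500_000}  # made-up estimates
weights = region_weights(sample, population)
```

Here each French respondent is weighted down to about 0.67 and the single Japanese respondent is weighted up to 2.0, so both countries contribute half of the weighted total, matching their assumed equal populations.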

Stage two: The proportions of currently employed and unemployed developers

In the second stage, we forced the proportion of students and unemployed respondents to be 17% in every country. We did this to maintain consistency with the previous year’s methodology, as that is the only estimate of their populations we have available.
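One way to picture this step is as a rescaling of the weights within a single country, so that students and unemployed respondents together carry 17% of the weighted total. The sketch below assumes exactly that; the function name `rescale_employment` and the input layout are hypothetical.

```python
def rescale_employment(weights: list, is_student_or_unemployed: list, target: float = 0.17) -> list:
    """Rescale weights within one country so that students and unemployed
    respondents hold `target` of the total weight; the total is preserved."""
    total = sum(weights)
    group = sum(w for w, flag in zip(weights, is_student_or_unemployed) if flag)
    rest = total - group
    if group == 0 or rest == 0:
        raise ValueError("both groups must be present in the country sample")
    group_factor = target * total / group
    rest_factor = (1 - target) * total / rest
    return [
        w * (group_factor if flag else rest_factor)
        for w, flag in zip(weights, is_student_or_unemployed)
    ]


# Hypothetical country sample: half of the respondents are students or unemployed.
adjusted = rescale_employment([1.0, 1.0, 1.0, 1.0], [True, True, False, False])
share = sum(adjusted[:2]) / sum(adjusted)
```

In this toy sample the share of students and unemployed respondents drops from 50% to the 17% target, while the total weight of the country stays unchanged.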

By this point, we had a distribution of responses from external sources weighted both by region and employment status.

Stage three: Employment status, programming languages, and JetBrains product usage

The third stage was more involved, as it required solving systems of equations. We took those weighted responses, and for the developers from each region, in addition to their employment status, we calculated the shares for each of the 30+ programming languages, as well as the shares for those who answered “I currently use JetBrains products” and “I have never heard of JetBrains or its products”. Those shares became constants in our equations.

The next step was to add two more groups of responses from other sources: JetBrains internal communication channels, such as JetBrains social media accounts and our research panel, and social network ad campaigns targeted at users of certain programming languages.

Solving the system of linear equations and inequalities

We composed a system of 30+ linear equations and inequalities that described:

  • The weighting coefficients for the respondents (as a hypothetical example, Fiona from our sample represents, on average, 180 software developers from France).
  • The specific values of their responses (for example, Pierre uses C++, he is fully employed, and he has never heard of JetBrains).
  • The necessary ratios among the responses (for example, 27% of developers have used C++ in the past 12 months, and so on).

To solve this system with the minimum variance of the weighting coefficients (which is important!), we used the dual method of Goldfarb and Idnani (1982, 1983), which allowed us to compute the optimal individual weighting coefficients for the 26,348 total respondents.
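To make the idea of minimum-variance weights concrete, here is a deliberately simplified sketch. It is not the Goldfarb and Idnani dual method: it handles equality constraints only, ignores inequalities (including non-negativity of the weights), and uses the closed-form least-norm solution instead. The function names and the toy constraint data are assumptions made for illustration.

```python
def solve_linear(m: list, rhs: list) -> list:
    """Solve m x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("constraints are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def min_variance_weights(a: list, b: list) -> list:
    """Minimise sum(w_i^2) subject to a @ w == b (equalities only).
    When one row fixes the sum of the weights, this is the same as
    minimising their variance. Solution: w = a^T (a a^T)^-1 b."""
    k, n = len(a), len(a[0])
    gram = [[sum(a[i][t] * a[j][t] for t in range(n)) for j in range(k)] for i in range(k)]
    lam = solve_linear(gram, b)
    return [sum(a[i][t] * lam[i] for i in range(k)) for t in range(n)]


# Toy example: four respondents, the first two use C++.
# Row 1 fixes the total weight at 4; row 2 forces a 27% weighted C++ share.
constraints = [[1, 1, 1, 1], [1, 1, 0, 0]]
targets = [4.0, 0.27 * 4.0]
w = min_variance_weights(constraints, targets)
```

In this toy case the two C++ users each receive a weight of 0.54 and the other two respondents 1.46, which satisfies both constraints while keeping the weights as even as possible. A full quadratic programming solver would additionally enforce the inequality constraints.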

Lingering bias

Despite these measures, some bias is likely present, as JetBrains users might have been more willing, on average, to complete the survey.

As much as we try to control the survey distribution and apply smart weighting, the communities and the developer ecosystem are constantly evolving, and the possibility of some unexpected data fluctuations cannot be completely eliminated.

We will continue to update and improve our methodology in the future. Stay tuned for the Developer Ecosystem Survey 2024!

Thank you for your time!

We hope you found our report useful. Share this report with your friends and colleagues.

If you have any questions or suggestions, please contact us at surveys@jetbrains.com.