The questions in this section were shown to developers involved in Data Analysis, Data Engineering, Machine Learning, or to those whose job role was Data Analyst / Data Engineer / Data Scientist. This survey was targeted specifically at developers, so the results may not be representative of the wider big data audience.

Big Data

Which statistics package(s) do you use to analyze and visualize data?

Spreadsheet editors are the most used tools for data analysis and visualization (46%).

Which big data analytics platforms do you use?

Most big data developers (68%) don’t use a specific data analytics platform. The most commonly used platform is Google Colab (19%).

Which big data tools do you use?

Jupyter is the most popular big data tool, used by 32% of big data developers. Other popular tools are Apache Spark (20%) and Apache Kafka (17%).

What Spark version do you use?

Where is most of your data hosted?

Data is mostly hosted on internal servers (36%) or locally (26%). AWS is used for data hosting by 21% of respondents; other types of hosting are less common.

Is IT your company’s core business?

Machine learning specialists more commonly work in core IT companies.

In which of the following sectors is your company primarily active?

In non-IT sectors, data engineers are more commonly employed in financial sectors, while machine learning specialists more often work in the education and science sectors.

Python, Scala and Java usage along with Apache Spark

Python is used along with Apache Spark by 66%, Java by 34%, and Scala by 11%.

Top 10 combinations of big data tools used

10% of respondents use both Apache Spark and Apache Kafka, and 9% use both Apache Spark and Apache Hadoop.

Top-3 languages used along with Apache Kafka

The three most popular languages used along with Apache Kafka are Python, Java, and SQL.

Python/R ratio in US, Europe, Russia and Asia

R is more widely used in Russia (5%), while Python is more widely used in Asia (59%).

Primary language by big data hosting usage

Python and Java are more commonly used with Google Cloud; JavaScript and PHP are more commonly used with AWS; and C# is more commonly used with Azure.

Big data tools usage by big data hosting usage

Jupyter and Apache Beam are more commonly used along with Google Cloud. Apache Spark and Apache Kafka are more commonly used among AWS users.

Primary language by involvement in Data Analysis / Data Engineering / Machine Learning

Machine learning specialists more commonly use Python, C++, and C, and less commonly use SQL and PHP, than developers involved in Data Analysis and Data Engineering.

Primary language by sectors

Python and R are more typically used by developers involved in education and science.

Big data tools usage by sectors

Jupyter is more commonly used in education and science. Apache Spark, Apache Kafka, Apache Hadoop, and Apache Hive are more frequently used in banking.

Share of Apache Spark usage by country or region

The largest shares of Apache Spark users are in China, India, South Korea, Spain, and Latin America.
