The questions in this section were shown to developers involved in Data Analysis, Data Engineering, Machine Learning, or to those whose job role was Data Analyst / Data Engineer / Data Scientist. This survey was targeted specifically at developers, so the results may not be representative of the wider big data audience.

Big Data

Which statistics package(s) do you use to analyze and visualize data?

Spreadsheet editor: 46%
Tableau: 16%
SPSS: 4%
SAS: 3%
Statistica: 2%
Stata: 1%
Other: 12%
None: 37%

Spreadsheet editors are the most used tools for data analysis and visualization (46%).

Which big data analytics platforms do you use?

No specific platform: 68%
Google Colab: 19%
Google AI Platform: 7%
Databricks: 4%
Microsoft Azure HDInsight: 3%
Cloudera Data Platform: 3%
Zeppelin: 2%
Other: 4%

The majority of big data developers don’t use specific data analytics platforms (68%). The most common data analytics platform used is Google Colab (19%).

Which big data tools do you use?

Jupyter: 32%
Apache Spark: 20%
Apache Kafka: 17%
Apache Hadoop/MapReduce: 12%
Apache Hive: 10%
Apache Flink: 4%
Dask: 3%
Apache Beam: 2%
Apache Pig: 1%
Apache Tez: 1%
Other: 2%
None: 49%

Jupyter is the most popular big data tool, used by 32% of big data developers. Other popular tools are Apache Spark (20%) and Apache Kafka (17%).

What Spark version do you use?

None: 1%
3.x: 39%
2.4: 34%
2.3: 9%
2.0–2.2: 5%
1.x: 1%
Custom distribution of Spark: 3%
Other: 0%
I'm not sure: 23%

Where is most of your data hosted?

Internal servers: 36%
Locally: 26%
AWS: 21%
Google Cloud: 8%
Azure: 5%
Other: 4%

Data is mostly hosted on internal servers (36%) or locally (26%). AWS is used for data hosting by 21% of respondents, while other types of hosting are less common.

Is IT your company’s core business?

Data Analysis: No 43%, Yes 57%
Data Engineering: No 42%, Yes 58%
Machine Learning: No 34%, Yes 66%

Machine learning specialists more commonly work in core IT companies.

In which of the following sectors is your company primarily active?

IT Services: 37%
Big Data / Data Analysis: 33%
Mobile Development: 22%
Cloud Computing / Platform: 21%
Fintech: 16%
Software Development Tools: 16%
Internet / Search Engines: 15%
IoT / Embedded: 12%
Data center Services: 11%
Healthcare IT: 10%
E-learning: 9%

In non-IT sectors, data engineers are more commonly employed in financial sectors, while machine learning specialists more often work in the education and science sectors.

In which of the following sectors is your company primarily active?

Banking / Real Estate / Mortgage Financing / Accounting / Finance / Insurance: 14%
Education / Training: 14%
Sales / Distribution / Retail: 12%
Manufacturing: 10%
Medicine / Health: 10%
Science: 9%
Government and Defense: 7%
Logistics / Transportation: 7%
Marketing: 7%
Administration / Management / Business Development: 5%
Entertainment / Mass Media and Information / Publishing: 5%

Python, Scala and Java usage along with Apache Spark

Python: 66%
Java: 34%
Scala: 11%
None of these: 15%

Python is used along with Apache Spark by 66%, Java by 34%, and Scala by 11%.

Top-10 combinations of used big data tools

Apache Kafka and Apache Spark: 10%
Apache Spark and Jupyter: 9%
Apache Hadoop/MapReduce and Apache Spark: 9%
Apache Hive and Apache Spark: 7%
Apache Hadoop/MapReduce and Apache Kafka: 7%
Apache Hadoop/MapReduce and Apache Hive: 6%
Apache Kafka and Jupyter: 6%
Apache Hive and Apache Kafka: 6%
Apache Hadoop/MapReduce and Jupyter: 5%
Apache Hive and Jupyter: 4%

10% use both Apache Spark and Apache Kafka. 9% use both Apache Spark and Apache Hadoop.
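To put the overlap in context, the combined share can be related to each tool's overall share reported earlier in this section (Apache Spark 20%, Apache Kafka 17%). The following is a rough sketch only: it assumes all three figures refer to the same base of respondents, it works from rounded percentages, and the names `tool_share`, `both_share`, and `share_also_using` are hypothetical helpers introduced for illustration.

```python
# Percent of big data developers, taken from the charts in this section.
tool_share = {"Apache Spark": 20, "Apache Kafka": 17}
both_share = 10  # respondents who use both Apache Spark and Apache Kafka


def share_also_using(both: float, base: float) -> float:
    """Estimate the percent of one tool's users who also use the other tool."""
    return round(100 * both / base, 1)


# Assumption: all three percentages share the same respondent base.
kafka_among_spark = share_also_using(both_share, tool_share["Apache Spark"])
spark_among_kafka = share_also_using(both_share, tool_share["Apache Kafka"])

print(f"Spark users who also use Kafka: about {kafka_among_spark}%")
print(f"Kafka users who also use Spark: about {spark_among_kafka}%")
```

Because the inputs are rounded to whole percentages, the results are estimates rather than survey findings.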

Top-3 languages used along with Apache Kafka

Python: 66%
Java: 34%
SQL: 27%

The three most popular languages used along with Apache Kafka are Python, Java, and SQL.

Python/R ratio in US, Europe, Russia and Asia

Region order: United States | Europe | Russia | Asia
Python: 49% | 44% | 54% | 59%
R: 2% | 2% | 5% | 2%
None of these: 51% | 56% | 45% | 40%

R is most widely used in Russia (5%), while Python is most widely used in Asia (59%).
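As a rough illustration of the ratio named in the heading, the Python and R shares can be divided region by region. This is a minimal sketch using the rounded percentages from the chart; the names `python_share`, `r_share`, and `python_to_r_ratio` are hypothetical and introduced only for this example.

```python
# Shares of respondents per region, copied from the chart (percent).
python_share = {"United States": 49, "Europe": 44, "Russia": 54, "Asia": 59}
r_share = {"United States": 2, "Europe": 2, "Russia": 5, "Asia": 2}


def python_to_r_ratio(region: str) -> float:
    """Return roughly how many Python users there are per R user in a region."""
    return python_share[region] / r_share[region]


ratios = {region: round(python_to_r_ratio(region), 1) for region in python_share}

for region, ratio in ratios.items():
    print(f"{region}: about {ratio} Python users per R user")
```

Since the R shares are small whole percentages, a rounding difference of half a point changes these ratios noticeably, so they are best read as orders of magnitude.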

Primary language by big data hosting usage

Python: 49%
JavaScript: 41%
SQL: 30%
PHP: 27%
Java: 23%
HTML / CSS: 19%
TypeScript: 15%
Shell scripting languages: 9%
Go: 8%
C#: 7%
C++: 6%

Python and Java are more commonly used with Google Cloud, JavaScript and PHP with AWS, and C# with Azure.

Big data tools usage by big data hosting usage

Jupyter: 30%
Apache Spark: 26%
Apache Kafka: 22%
Apache Hadoop/MapReduce: 14%
Apache Hive: 10%
Dask: 4%
Apache Flink: 3%
Apache Pig: 2%
Apache Beam: 1%
Apache Tez: 1%
Other: 3%
None: 47%

Jupyter and Apache Beam are more commonly used along with Google Cloud. Apache Spark and Apache Kafka are more commonly used among AWS users.

Primary language by involvement in Data Analysis / Data Engineering / Machine Learning

Python: 52%
JavaScript: 31%
SQL: 29%
Java: 28%
HTML / CSS: 18%
PHP: 18%
C++: 11%
C#: 10%
Shell scripting languages: 8%
TypeScript: 8%
Go: 6%
C: 5%
Kotlin: 4%

Machine Learning specialists more commonly use Python, C++, and C and less commonly use SQL and PHP in comparison with developers involved in Data Analysis and Data Engineering.

Primary language by sectors

Sector order: Core IT business | Banking / finance | Education and science | Sales / Distribution / Retail | Manufacturing
Python: 46% | 55% | 66% | 43% | 47%
JavaScript: 33% | 27% | 23% | 37% | 32%
Java: 29% | 34% | 19% | 21% | 20%
SQL: 29% | 36% | 21% | 40% | 29%
PHP: 20% | 12% | 15% | 29% | 15%
HTML / CSS: 16% | 15% | 16% | 22% | 18%
TypeScript: 14% | 9% | 6% | 9% | 10%
C#: 11% | 11% | 10% | 9% | 28%
C++: 10% | 7% | 19% | 6% | 12%
Go: 10% | 7% | 2% | 6% | 6%
Shell scripting languages: 10% | 14% | 13% | 9% | 9%
Kotlin: 5% | 5% | 3% | 6% | 9%
C: 4% | 4% | 6% | 1% | 4%
Scala: 4% | 5% | 0% | 3% | 4%
Rust: 3% | 1% | 1% | 1% | 1%
Swift: 3% | 1% | 3% | 1% | -
Dart: 2% | 2% | 1% | 3% | 2%
Ruby: 2% | 1% | 1% | 3% | 2%
R: 1% | 1% | 6% | 2% | -
Visual Basic: 1% | 1% | 0% | 4% | 4%
MATLAB: 0% | - | 4% | - | 1%
Other: 4% | 5% | 3% | 4% | 2%

Python and R are more typically used by developers involved in education and science.

Big data tools usage by sectors

Sector order: Core IT business | Banking / finance | Education and science | Sales / Distribution / Retail | Manufacturing
Jupyter: 29% | 33% | 38% | 27% | 26%
Apache Spark: 24% | 36% | 14% | 26% | 15%
Apache Kafka: 23% | 33% | 6% | 21% | 9%
Apache Hadoop/MapReduce: 15% | 24% | 7% | 12% | 4%
Apache Hive: 13% | 21% | 4% | 14% | 6%
Apache Flink: 7% | 9% | 1% | 3% | 2%
Dask: 3% | 7% | 6% | 1% | 4%
Apache Beam: 2% | 5% | 0% | 1% | 1%
Apache Pig: 2% | 3% | 2% | 1% | -
Apache Tez: 1% | 2% | - | 3% | -
Other: 3% | 2% | 4% | 2% | 1%
None: 46% | 40% | 53% | 53% | 61%

Jupyter is more commonly used in education and science. Apache Spark, Apache Kafka, Apache Hadoop, and Apache Hive are more frequently used in banking.

Share of Apache Spark usage by country or region

China: 29%
India: 29%
South Korea: 27%
Spain: 20%
Latin America: 19%
United States: 19%
Other South-East Asia and Oceania: 18%
Mexico: 18%
Canada: 17%
Ukraine: 16%
Belarus: 16%

The largest shares of Apache Spark users are in China, India, South Korea, Spain, and Latin America.
