The questions in this section were shown to developers involved in Data Analysis, Data Engineering, Machine Learning, or to those whose job role was Data Analyst / Data Engineer / Data Scientist. This survey was targeted specifically at developers, so the results may not be representative of the wider big data audience.
Big Data
Spreadsheet editors are the most used tools for data analysis and visualization (46%).
The majority of big data developers don’t use specific data analytics platforms (68%). The most commonly used data analytics platform is Google Colab (19%).
Jupyter is the most popular big data tool, used by 32% of big data developers. Other popular tools are Apache Spark (20%) and Apache Kafka (17%).
Data is mostly hosted on internal servers (36%) or locally (26%). AWS is used for data hosting by 21% of the respondents; other types of hosting are less common.
In non-IT sectors, data engineers are more commonly employed in the financial sector, while machine learning specialists more often work in the education and science sectors.
Among Apache Spark users, 66% use Python, 34% use Java, and 11% use Scala.
10% use both Apache Spark and Apache Kafka. 9% use both Apache Spark and Apache Hadoop.
The three most popular languages used along with Apache Kafka are Python, Java, and SQL.
R is more widely used in Russia (5%), while Python is more widely used in Asia (59%).
Python and Java are more commonly used with Google Cloud, JavaScript and PHP are more commonly used with AWS, and C# is more commonly used with Azure.
Jupyter and Apache Beam are more commonly used along with Google Cloud. Apache Spark and Apache Kafka are more commonly used among AWS users.
Compared with developers involved in Data Analysis and Data Engineering, Machine Learning specialists more commonly use Python, C++, and C, and less commonly use SQL and PHP.
Python and R are more typically used by developers involved in education and science.
Jupyter is more commonly used in education and science. Apache Spark, Apache Kafka, Apache Hadoop, and Apache Hive are more frequently used in banking.
The largest shares of Apache Spark users are in China, India, South Korea, Spain, and Latin America.
If you have any questions or suggestions, please contact us at surveys@jetbrains.com.