Big Data

The questions in this section were shown to developers involved in Data Analysis, Data Engineering, Machine Learning, or to those whose job role was Data Analyst / Data Engineer / Data Scientist. This survey was targeted specifically at developers, so the results may not be representative of the wider big data audience.

Which of these batch processing tools do you use?

Which of these streaming processing tools do you use?

Professionals who are not involved in data pipeline creation use traditional relational databases for building data lakes. Spark continues to be the most popular tool for both batch and stream processing.

Which of these orchestration tools do you use?

Quite predictably, Apache Airflow is the most popular orchestration tool, especially among data engineers. Interestingly, custom or self-built solutions account for 10% of the orchestration tools in use.

Which of these tools do you use for Spark execution?

Kubernetes, YARN, and Amazon EMR are the most popular options for Spark execution.

Which of these tools do you use for building data lakes?

Which of these MPP tools do you use?

The vast majority of respondents do not use MPP tools. Among those who do, BigQuery, Redshift, and Azure SQL Data Warehouse are the most popular.

Do you work with message brokers or message queues (e.g. Kafka, RabbitMQ)?

Which of these tools do you use for messaging and delivery?

Big Data: 2022

Thank you for your time!

We hope you found our report useful. Share this report with your friends and colleagues.

If you have any questions or suggestions, please contact us at surveys@jetbrains.com.