Big Data

The questions in this section were shown to developers involved in Data Analysis, Data Engineering, or Machine Learning, and to those whose job role was Data Analyst, Data Engineer, or Data Scientist.

Which of the following batch-processing tools do you use?

Which of the following streaming processing frameworks / tools do you use?

The Spark ecosystem continues to be the most popular choice for both batch and stream processing.

Which of the following orchestration tools do you use?

Predictably, Apache Airflow is the most popular orchestration tool, especially among data engineers. Interestingly, 9% of the orchestration tools in use are custom or self-made.

Which of the following tools do you use for Spark execution?

Kubernetes, YARN, and Amazon EMR are the most popular solutions for Spark execution. Kubernetes has been gaining popularity year after year, while YARN usage has decreased by 8 percentage points year over year. Companies tend to prefer integrating data engineering tools with the rest of their IT landscape rather than running separate systems like YARN.

Which of the following tools do you use for building data lakes?

Which of the following MPP tools do you use?

The majority of respondents do not use MPP tools, but those who do tend to go with BigQuery, Redshift, or Azure SQL Data Warehouse.

Do you usually create new clusters or always work with the same cluster?

Which of the following engines do you use for your data engineering tasks?

A significant majority (64%) reported not using any engines for their data engineering tasks. Among the engine users, BigQuery, Databricks, and AWS Athena are equally popular, each with a 10% share. Amazon EMR, Redshift, AWS Glue, and Azure Analysis Services follow closely.

Do you work with message brokers or message queues (e.g., Kafka, RabbitMQ, etc.)?

Which of the following tools do you use for data-engineering-related messaging and delivery?

Kafka stands out as the most popular choice for data-engineering-related messaging and delivery (58%), while RabbitMQ follows with 46%. Interestingly, only 2% of respondents stated that they do not use any messaging or delivery tools.

Do you run tests in your data engineering codebase?

Which testing frameworks do you use?

Most respondents don’t run tests in their data engineering codebase. Among the 31% who do, the most common answers are using no framework at all and using Great Expectations.

Big Data:

2023

Thank you for your time!

We hope you found our report useful. Share this report with your friends and colleagues.

If you have any questions or suggestions, please contact us at surveys@jetbrains.com.