Big Data
The questions in this section were shown to developers involved in Data Analysis, Data Engineering, Machine Learning, or to those whose job role was Data Analyst / Data Engineer / Data Scientist.
Quite predictably, Apache Airflow is the most popular orchestration tool, especially among data engineers. Interestingly, 9% of the orchestration tools being used are custom or self-made.
Spark execution platforms (previous year, current year):
Kubernetes: 37%, 45%
YARN: 30%, 22%
Amazon EMR: 27%, 24%
Google Dataproc: 11%, 11%
Azure HDInsight: 9%, 9%
Kubernetes, YARN, and Amazon EMR are the most popular cloud solutions for Spark execution. Kubernetes has been gaining in popularity year after year, while YARN usage has decreased by 8 percentage points year over year. Companies seem to prefer integrating their data engineering tools into the rest of their IT landscape rather than running separate systems like YARN.
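The year-over-year figures above are stated in percentage points, which is not the same as a relative change. The short Python sketch below works through that arithmetic using the shares listed for each platform. It assumes that the first value in each pair is the earlier survey year and the second is the later one, which matches the 8-point drop reported for YARN. The helper names `point_change` and `relative_change` are illustrative only.

```python
# Shares of Spark execution platforms, in percent of respondents.
# Assumption: each pair is (earlier year, later year).
shares = {
    "Kubernetes": (37, 45),
    "YARN": (30, 22),
    "Amazon EMR": (27, 24),
    "Google Dataproc": (11, 11),
    "Azure HDInsight": (9, 9),
}


def point_change(earlier, later):
    """Change in percentage points: a plain difference of two shares."""
    return later - earlier


def relative_change(earlier, later):
    """Change relative to the earlier share, expressed in percent."""
    return (later - earlier) / earlier * 100


changes = {name: point_change(*pair) for name, pair in shares.items()}

for name, (earlier, later) in shares.items():
    print(
        f"{name}: {changes[name]:+d} pp, "
        f"{relative_change(earlier, later):+.1f}% relative"
    )
```

Under that assumption, YARN's move from 30% to 22% is a drop of 8 percentage points, which is roughly a 27% decline relative to its earlier share.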
MPP tools (two values per tool, in the chart's series order):
BigQuery: 15%, 13%
Redshift: 13%, 11%
Azure SQL Data Warehouse: 11%, 8%
Azure Data Explorer: 9%, 10%
ClickHouse: 5%, 4%
The majority of respondents do not use MPP tools, but those who do tend to go with BigQuery, Redshift, or Azure SQL Data Warehouse.
[Chart: cluster usage. Options: I work without a dedicated cluster; I create new clusters for my development tasks; I do all of my work on one cluster that never stops; Other]
A significant majority (64%) reported not using any engines for their data engineering tasks. Among the engine users, BigQuery, Databricks, and AWS Athena are equally popular, each with a 10% share. Amazon EMR, Redshift, AWS Glue, and Azure Analysis Services follow closely.
Kafka stands out as the most popular choice for data-engineering-related messaging and delivery (58%), while RabbitMQ follows with 46%. Interestingly, only 2% of respondents stated that they do not use any messaging or delivery tools.
[Chart: testing frameworks. Options: I don’t use any frameworks; Great Expectations; Deequ; Other]
Most respondents don’t run tests in their engineering codebase. Among the 31% who do, the largest proportion either don’t use any frameworks or use Great Expectations.
Thank you for your time!
We hope you found our report useful.
If you have any questions or suggestions, please contact us at surveys@jetbrains.com.