Big Data
The questions in this section were shown to developers involved in Data Analysis, Data Engineering, Machine Learning, or to those whose job role was Data Analyst / Data Engineer / Data Scientist.
Quite predictably, Apache Airflow is the most popular orchestration tool, especially among data engineers. Interestingly, 9% of the orchestration tools being used are custom or self-made.
Kubernetes, YARN, and Amazon EMR are the most popular cloud solutions for Spark execution. Kubernetes has been gaining in popularity year after year, while YARN usage has decreased by 8 percentage points year over year. Companies tend to prefer integrating data engineering tools with the rest of their IT landscape rather than running separate systems like YARN.
The majority of respondents do not use MPP tools, but those who do tend to go with BigQuery, Redshift, or Azure SQL Data Warehouse.
A significant majority (64%) reported not using any engines for their data engineering tasks. Among the engine users, BigQuery, Databricks, and AWS Athena are equally popular, each with a 10% share. Amazon EMR, Redshift, AWS Glue, and Azure Analysis Services follow closely.
Kafka stands out as the most popular choice for data-engineering-related messaging and delivery (58%), while RabbitMQ follows with 46%. Interestingly, only 2% of respondents stated that they do not use any messaging or delivery tools.
Most respondents don’t run tests in their data engineering codebase. Among the 31% who do, the most common answers are using no framework at all and using Great Expectations.
Thank you for your time!
We hope you found our report useful. Share this report with your friends and colleagues.
If you have any questions or suggestions, please contact us at surveys@jetbrains.com.