Big Data
The questions in this section were shown to developers involved in Data Analysis, Data Engineering, Machine Learning, or to those whose job role was Data Analyst / Data Engineer / Data Scientist. This survey was targeted specifically at developers, so the results may not be representative of the wider big data audience.
Professionals who are not involved in data pipeline creation use traditional relational databases for building data lakes. Spark continues to be the most popular tool for batch and stream processing.
Quite predictably, Apache Airflow is the most popular orchestration tool – especially among data engineers. Interestingly, 10% of the orchestration tools in use are custom or self-built.
Kubernetes, YARN and Amazon EMR are the most popular cloud solutions for Spark execution.
The vast majority of respondents do not use MPP tools. Among those who do, BigQuery, Redshift, and Azure SQL Data Warehouse are the most popular.
Thank you for your time!
We hope you found our report useful. Share this report with your friends and colleagues.
If you have any questions or suggestions, please contact us at surveys@jetbrains.com.