Big Data

The questions in this section were shown to developers involved in Data Analysis, Data Engineering, Machine Learning, or to those whose job role was Data Analyst / Data Engineer / Data Scientist. This survey was targeted specifically at developers, so the results may not be representative of the wider big data audience.

Which of these batch processing tools do you use?

Which of these streaming processing tools do you use?

Professionals who are not involved in data pipeline creation tend to use traditional relational databases for building data lakes. Spark continues to be the most popular tool for both batch and stream processing.

Which of these orchestration tools do you use?

Predictably, Apache Airflow is the most popular orchestration tool, especially among data engineers. Interestingly, 10% of the orchestration tools in use are custom or self-built.