Data Science

The questions in this section were shown to developers involved in Business Intelligence, Data Analysis, Data Engineering, Machine Learning, or to those whose job role was Data Analyst / Data Engineer / Data Scientist or Business Analyst.

What kind of activity is data science, data analytics, or machine learning for you?

A considerable number of respondents combine data science responsibilities with other activities. These findings suggest that the field is being democratized, which implies potential for growth in the data science market.

Does your team or data department have a dedicated Machine Learning Engineer role?

PyCharm

An all-in-one Python IDE for building data pipelines, analyzing data, prototyping, and deploying ML models with excellent support for Python, scientific libraries, interactive Jupyter notebooks, Anaconda, SQL and NoSQL databases, and more.

What types of data do you analyze?

In which of the following activities are you involved?

What type of chart do you use most for data visualizations?

The majority of data science professionals find value in tried-and-true plots for data exploration and presentation. These chart types are widely used across data-related tasks such as data gathering, exploratory data analysis, data orchestration, and MLOps.

Datalore

Datalore by JetBrains is a collaborative data science and analytics platform for teams, accessible right from the browser. Datalore notebooks are compatible with Jupyter and offer smart coding assistance for Python, SQL, R, and Scala notebooks, as well as no-code visualizations and data wrangling. Datalore’s Report builder allows teams to turn a notebook full of code and experiments into a clear, data-driven story. Teams can share notebooks, edit them together in real time, and organize their projects in workspaces.

Does your team or data department have a dedicated Data Engineer role?

Close to half of all teams and departments have a dedicated Data Engineer or Machine Learning Engineer.

How did you learn data science, machine learning, or data engineering?

Specialized roles like Data Scientist, Data Engineer, and Machine Learning Engineer are relatively recent additions to the job market. Many respondents transition into these roles from related fields, necessitating the acquisition of new skills through self-study or online courses.

Including you, how many members does your data team have?

Over 50% of those working with data are in teams of five or more people.

Which IDEs or editors do you use for data science or data analytics?

How much of your working time is spent inside notebooks?

What do you use notebooks for?

Do you version your notebooks?

What versioning tools do you use?

While the majority of data science professionals do not version their notebooks, a substantial proportion (41%) opt to do so, and most of them choose Git or GitHub for versioning.
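One practical obstacle to versioning notebooks in Git is that `.ipynb` files are JSON documents that store cell outputs alongside the code, so diffs tend to be noisy. The sketch below is an illustration rather than a survey finding: it clears outputs from a notebook dictionary before it would be committed. The `strip_outputs` helper and the sample notebook are hypothetical, and the code assumes the nbformat 4 layout, where code cells carry `outputs` and `execution_count` fields.

```python
import json


def strip_outputs(notebook):
    """Return a copy of an nbformat-4 notebook dict with outputs cleared."""
    # Deep copy via a JSON round trip so the caller's dict is left untouched.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Hypothetical two-cell notebook used only for illustration.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["1 + 1"],
            "execution_count": 3,
            "outputs": [
                {
                    "output_type": "execute_result",
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                    "execution_count": 3,
                }
            ],
        },
    ],
}

cleaned = strip_outputs(example)
```

In practice the dictionary would come from `json.load` on a notebook file, and the cleaned result would be written back with `json.dump` before committing.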

What tools do you use to present the results of your research?

Various implementations of Jupyter notebooks are widely popular in data science, with common use cases including exploratory data analysis, experimenting with data and data querying, and model prototyping. Approximately 40% of data science professionals use Jupyter notebooks to present their work results, but, interestingly, many (almost 50%) spend only 10%–20% of their time using Jupyter notebooks.

What sorts of computational resources do you use for data science tasks?

The majority of respondents rely on local resources for their data science work.

What types of data sources do you work with?

Although the majority uses local files, the share of those using SQL databases grew by 10 percentage points over the past year, highlighting the importance of SQL for data science.

What type of data do you use the most?

Do you use synthetic data in your work?

Most polled data scientists process custom-collected data, with the most prevalent data types being transactional data, time series data, images, and machine-generated data. Interestingly, 30% work with synthetic data – data manufactured artificially rather than generated by real-world events.
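As a rough illustration of what artificially manufactured data can look like for the transactional case, the sketch below draws made-up records from simple random distributions. The field names, value ranges, and the `make_synthetic_transactions` helper are hypothetical and not taken from the survey; real synthetic data pipelines usually model the statistics of an actual dataset rather than fixed distributions.

```python
import random
from datetime import datetime, timedelta


def make_synthetic_transactions(n, seed=0):
    """Return n made-up transaction records (hypothetical schema)."""
    rng = random.Random(seed)  # seeded so repeated calls give the same rows
    start = datetime(2023, 1, 1)
    rows = []
    for i in range(n):
        rows.append({
            "transaction_id": i,
            # Timestamp: a random minute within a 30-day window.
            "timestamp": start + timedelta(minutes=rng.randrange(30 * 24 * 60)),
            # Amount: log-normal, so most values are small and a few are large.
            "amount": round(rng.lognormvariate(3.0, 1.0), 2),
            "channel": rng.choice(["web", "mobile", "store"]),
        })
    return rows


transactions = make_synthetic_transactions(5)
```

Because the generator is seeded, the same call produces the same records, which helps when synthetic data is used in tests or shared examples.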

Do you train machine learning or deep learning models?

Approximately 40% of all respondents train machine learning or deep learning models. However, this figure rises to more than 60% among those who consider data work their primary activity, which suggests that predictive modeling is becoming a central aspect of working with data.

How often do you retrain or update your machine learning models?

How much time do you spend each month on model training?

While half of the data science professionals retrain or update their machine learning models at least once a month, most spend less than 20 hours per month on the task.

Do you use GPUs to train your models?

The majority (81%) of data science professionals use GPUs for model training. Efficient use of graphics processors can shorten training time and allow more experiments in the same period, which makes GPUs an increasingly attractive resource for researchers and data specialists. This also underlines the relevance of technological innovation for machine learning.

How much VRAM do you usually need for your machine learning tasks?

Demand for more computing power is a clear trend in machine learning tasks. Nearly 80% of data science professionals now use 16 GB or more of VRAM, while the share of those using 8 GB decreased by six percentage points over the past year.

What sorts of methods and algorithms do you use?

Core machine learning algorithms, like regression and tree-based methods, remain prevalent, though a significant number of data science professionals also use neural networks. The rising popularity and user-friendliness of transformer networks might explain why 30% of respondents engage in NLP work. Interestingly, only 24% of participants reported using statistical testing in their work, which suggests that machine learning and deep learning have overtaken classical statistics as fundamental data skills.

Which enterprise machine learning solutions do you use?

Amazon services stand out as the most popular enterprise cloud solutions. Remarkably, there has been a significant increase (of over 10 percentage points) in the adoption of enterprise machine learning solutions compared to the previous year.

What machine learning frameworks do you use?

TensorFlow edges slightly ahead of scikit-learn and PyTorch in popularity, with Keras and XGBoost also showing solid adoption rates. Interestingly, a significant proportion of respondents (19%) reported not using any specific framework.

What tools do you use for tracking model training experiments?

TensorBoard is the most commonly used tool, with a 23% share, followed by MLflow with 10% and WandB with 7%. However, two-thirds of data science professionals aren’t using any specific tools for tracking their model training experiments.

Which of the following best describes the use of machine learning in your organization?

Machine learning and AI have become crucial components of daily business life, so it should come as no surprise that almost half of our respondents use various AI-based features integrated into the software they use.

Which enterprise cloud solutions do you use?

Which of the following data-driven activities are the most difficult to perform for you or your organization?

On average, what percentage of your team’s time is spent managing, cleaning, or labeling data?

What tools do you use for data cleaning?

Data quality is a typical issue for professionals and organizations that work with data: nearly 50% dedicate 30% or more of their time to data preparation. An Anaconda study also confirms that data cleaning is emerging as the most time-consuming part of data professionals’ workflows. Almost half of our respondents use integrated development environments (IDEs) to handle these tasks.

Thank you for your time!

We hope you found our report useful. Share this report with your friends and colleagues.

If you have any questions or suggestions, please contact us at surveys@jetbrains.com.
