DataSpell 2022.2 Help

Get started

DataSpell is an Integrated Development Environment (IDE) dedicated to exploratory data analysis and prototyping machine learning (ML) models. You can download it from https://www.jetbrains.com/dataspell/.


Ensure you are all set

Before you start, check if all required software is available for your environment and installed on your machine.

Supported languages

To start developing in DataSpell, you need to download and install Python from python.org and R from https://cran.r-project.org/.

DataSpell supports the following versions:

  • Python 2: version 2.7

  • Python 3: versions 3.6 through 3.11

  • R: version 3.4 and later
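
If you are not sure which Python version your system provides, you can check it from the interpreter itself. This is a minimal sketch; run it with the interpreter you plan to use in DataSpell:

    import sys

    # Print the interpreter version, for example 3.10.4
    print(sys.version.split()[0])

    # DataSpell 2022.2 supports Python 2.7 and Python 3.6 through 3.11
    major_minor = sys.version_info[:2]
    print("Supported:", major_minor == (2, 7) or (3, 6) <= major_minor <= (3, 11))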

Supported platforms

DataSpell is a cross-platform IDE that works on Windows, macOS, and Linux. Check the system requirements:

  • RAM

    Minimum: 4 GB of free RAM

    Recommended: 8 GB of total system RAM

  • CPU

    Minimum: Any modern CPU

    Recommended: Multi-core CPU. DataSpell supports multithreading for different operations and processes, so the more CPU cores it can use, the faster it runs.

  • Disk space

    Minimum: 2.5 GB and another 1 GB for caches

    Recommended: SSD drive with at least 5 GB of free space

  • Monitor resolution

    Minimum: 1024×768

    Recommended: 1920×1080

  • Operating system

    Minimum: Officially released 64-bit versions of the following (pre-release versions are not supported):

      • Microsoft Windows 8 or later

      • macOS 10.14 or later

      • Any Linux distribution that supports Gnome, KDE, or Unity DE. DataSpell is not available for some Linux distributions, such as RHEL6 or CentOS6, that do not include GLIBC 2.14 or later.

    Recommended: Latest 64-bit version of Windows, macOS, or Linux (for example, Debian, Ubuntu, or RHEL)

If you need assistance installing DataSpell, see the installation instructions.

Install Conda

If you plan to use a Conda environment:

  1. Download Anaconda.

  2. Install Anaconda using the installation instructions.

Find your way

Once you run DataSpell, it shows the Welcome screen, the starting point for your work with the IDE and for configuring its settings.

When you run DataSpell for the very first time, it suggests configuring an environment for the default workspace. Conda is the recommended option, as it has Jupyter and data science libraries (like pandas) available out of the box.

If you have a Conda environment installed on your machine, DataSpell will suggest it. If no Conda installation is detected, you'll be provided with a Conda download link so that you can download and install it first.

welcome screen

The workspace is a directory that contains all your notebooks and local datasets. You can attach other directories and projects to the workspace.

An environment is required to execute local notebooks.

  1. The following steps depend on your choice of environment:

    • Existing Conda environment

      Select any of the existing Conda interpreters. Alternatively, click Select an interpreter and specify a path to the Conda executable in your file system, for example, C:\Users\jetbrains\Anaconda3\python.exe.

    • New Conda environment

      • Specify the location of the new Conda environment in the text field, or click Conda environment location and browse for a location in your file system. Note that the target directory for the new Conda environment must be empty!

      • Select the Python version from the list.

      • Specify the location of the Conda executable file in the text field, or click Conda executable location and browse for it in the Conda installation directory. You're basically looking for the path that you used when installing Conda on your machine, for example, C:\Users\jetbrains\Anaconda3\python.exe.

    • Existing virtual environment

      Select any of the existing interpreters. Alternatively, click Select an interpreter and specify a path to the Python executable in your file system, for example, C:\Python36\python.exe.

    • New virtual environment

      • Specify the location of the new virtual environment in the text field, or click Virtual environment location and browse for a location in your file system. Note that the directory where the new virtual environment will be located must be empty!

      • Choose the base interpreter from the list, or click Choose the base interpreter and find a Python executable in your file system.

        If DataSpell detects no Python on your machine, it provides two options: to download the latest Python version from python.org or to specify a path to the Python executable (in case of a non-standard installation).

    • Using a system Python interpreter

      In the Interpreter field, type the fully qualified path to the required interpreter executable, or click the Browse button and, in the Select Python Interpreter dialog that opens, choose the desired Python executable and click OK.

      If DataSpell detects no Python on your machine, it provides two options: to download the latest Python version from python.org or to specify a path to the Python executable (in case of a non-standard installation).

      Downloading Python when installing the system interpreter

      You will need administrator privileges to install, remove, and upgrade packages for the system interpreter. When attempting to install an interpreter package through an intention action, you might receive the following error message:

      System Interpreter warning message

      As prompted, consider using a virtual environment for your project.

  2. Once you configure an environment, click Launch DataSpell. DataSpell creates a workspace so that you can start your work.

You can add local notebooks and datasets to the workspace, attach directories, and clone projects from Version Control Systems.

Get acquainted with the main UI elements:

UI overview

Refer to User interface for the detailed description.

Work with notebooks

In DataSpell, you can easily edit and execute notebook cells and examine execution outputs, including stream data, images, and other media. Here is a typical workflow:

  1. Open or create a notebook

  2. Add some code cells

  3. Execute the cells and evaluate the results

  4. Debug code cells, if needed

Create a notebook file

  1. Do one of the following:

    • Right-click the target directory in the Workspace tool window, and select New from the context menu.

    • Press Alt+Insert

  2. Select Jupyter Notebook.

  3. In the dialog that opens, type a filename, for example, example.

    Creating a new notebook file

    A notebook document has the *.ipynb extension and is marked with the corresponding icon: Jupyter Notebook file icon.

    A newly created notebook opens in the editor. It contains one code cell. You can change its type with the cell type selector in the notebook toolbar:

    Select a cell type

    Each cell has a toolbar for quick access to the basic actions, such as code execution or navigation. By default, cell toolbars are disabled. To enable them, open project settings/preferences (Ctrl+Alt+S), go to Jupyter, and select the Show cell toolbar checkbox.

Edit a notebook

  1. To edit a cell, just click it.

  2. Put some pandas code in the first code cell:

    import pandas as pd
    kernel_stats = pd.read_csv('libraries_by_python_version.csv')
    kernel_stats
  3. You do not need to install the pandas package in advance. Just click a highlighted line, press Alt+Enter, and select a suggested fix for the missing import statement.

  4. This example uses the libraries_by_python_version.csv dataset. Download it from libraries_by_python_version.csv and save it in the project directory. If you cannot download the file, see the stand-in sketch after this list.

  5. Add more code or Markdown cells to your notebook. You can add a code cell after the very last cell, add a code cell or Markdown cell right after the selected cell, and insert a new cell after executing the selected cell. You can find these actions in the Cell main menu item.

    Adding a new cell

    Let's put some matplotlib code to visualize the data frame of the first code cell.

    import matplotlib.pyplot as plt
    plt.pie(kernel_stats['total_count'], labels=kernel_stats['library'])
    plt.show()
    Two code cells are added to the notebook

    Again, there is no need to preinstall matplotlib and numpy. Use Alt+Enter to fix imports.
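
If you cannot download the dataset, you can generate a small stand-in file with the two columns the cells above rely on (library and total_count). This is a minimal sketch with made-up numbers, intended only to let the notebook run:

    import pandas as pd

    # Hypothetical sample data standing in for libraries_by_python_version.csv;
    # the real dataset contains actual statistics per library.
    sample = pd.DataFrame({
        'library': ['pandas', 'numpy', 'matplotlib', 'scikit-learn'],
        'total_count': [120, 150, 90, 60],
    })
    sample.to_csv('libraries_by_python_version.csv', index=False)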

You can edit code cells with the help of code insights, such as syntax highlighting, code completion, and so on.

Using code completion for matplotlib

Execute notebooks

You can execute the code of the notebook cells in many ways using the icons on the Jupyter notebook toolbar and cell toolbars, commands of the code cell context menu (right-click the code cell to open it), and the Run commands of the main menu.

  1. Once you’ve executed the cell, its output is shown below the code. You can click Open in new tab to preview tabular data in a separate tab of the editor.

    Cell execution

    Note that when you work with local notebooks, you don’t need to launch any Jupyter server in advance: just execute any cell and the server will be launched.

    The Jupyter tool window shows the execution status. You can preview the current values of the variables declared in your code in the Jupyter Variables tool window.

  2. Now execute the second cell. Its code depends on a variable from the first cell, so the order of cell execution is important.

    Run the second cell

    You can copy the built plot or save it as an image. To execute all cells, click Run All on the notebook toolbar.
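
Saving the chart from code is also possible. This is a minimal sketch using matplotlib's savefig; the filename and resolution are arbitrary:

    import matplotlib.pyplot as plt

    plt.pie(kernel_stats['total_count'], labels=kernel_stats['library'])
    # Write the chart to a file next to the notebook before showing it
    plt.savefig('libraries_pie.png', dpi=150, bbox_inches='tight')
    plt.show()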

Debug

  1. Click the gutter (the leftmost space in the editor) to set breakpoints in the selected cell.

  2. Press Shift+Alt+Enter for Windows/Linux or ⌥⇧↩ for macOS (or select the Debug Cell command from the extended set of actions in the cell toolbar). To debug the entire notebook, select Run | Debug from the main menu.

    Debug a cell
  3. Use the stepping toolbar buttons to choose on which line you want to stop next and switch to the Debugger tool window to preview the variable values.

    Debug tool window

Manage connections

You can run notebooks on different servers and kernels. You work with two types of Jupyter servers, managed and configured:

  • Managed servers are automatically launched by DataSpell for the current project. They are terminated when you close DataSpell.

  • Configured servers are local or remote Jupyter servers that you connect to by specifying their URL and token.

Configure servers

When you launch any Jupyter server, by default it uses the current project interpreter and the automatically selected port. However, you can select any other interpreter available in your DataSpell instance and specify an alternative port. You can also connect to any configured server if you know its URL and token.

  1. To open the server settings, select Configure Jupyter Server in the list of the Jupyter servers on the Jupyter notebook toolbar.

    Configure a Jupyter server
  2. To connect to any running Jupyter server, select Configured Server and specify the server's address, including a URL and a token (for example, http://localhost:8888/?token=<your token>).

  3. In the Jupyter toolbar, from the list of the servers, select Switch to the current Jupyter Server to explicitly switch to the configured server.

See Manage Jupyter notebook servers for more details.

Tune your working environment

A virtual environment based on a Python interpreter is required to execute Python code in your notebooks, so you need at least one environment configured on your machine.

When you open an existing project in DataSpell or connect to a Jupyter server, the IDE creates a virtual environment for you. In most cases, it is a Conda environment based on your Anaconda installation. You can select any other Conda environment on your machine or create a new one.

Change your environment

  • Change the environment with the Python interpreter selector located in the lower-right corner of the DataSpell UI. Click it and select the target environment from the list.

    Select an interpreter
  • Change the Conda environment with the Anaconda CLI

    1. In the Terminal window, run the ls command in the <Conda Home>/envs directory (for example, /Users/jetbrains/.conda/envs) and select the target environment.

    2. Navigate to the bin directory of your Anaconda installation (for example, anaconda3/bin).

    3. Execute the conda activate <env name> command (for example, conda activate my-conda-env).
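
To confirm which environment a notebook is actually using, you can run a quick check in a code cell. This is a minimal sketch; the path shown will depend on your setup:

    import sys

    # Path of the interpreter that backs the current kernel, for example
    # /Users/jetbrains/.conda/envs/my-conda-env/bin/python
    print(sys.executable)
    print(sys.version)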

Create a new environment

  1. Select Add interpreter in the Python interpreter selector.

    Add an interpreter
  2. In the Add Python Interpreter dialog, enter the name of the new environment, and specify the Anaconda base in the Conda executable field.

    Adding a new Conda environment

The reason for creating various Conda environments based on the same Anaconda installation is simple: you can install specific packages for each environment and use them for specific tasks and projects. You can also select other types of environments, such as venv or pipenv.

Install packages

  1. In the Python interpreter selector, choose the target environment and select Interpreter Settings.

    Managing packages
  2. Click + to add a new package. Click the Conda package manager button to manage packages from the Conda repository; otherwise, DataSpell uses pip.

  3. Type a package name in the Search field and locate the target package. If needed, specify a package version.

  4. Click Install. Close the window when the task completes.
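
A quick way to confirm that a package is available in the selected environment is to import it in a notebook cell. A minimal sketch, using pandas as the example package:

    import pandas as pd

    # If the import succeeds, the package is installed in the active environment
    print(pd.__version__)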

Go beyond

R language

With the R plugin installed in DataSpell, you can perform statistical computing using the R language and use coding assistance, visual debugging, smart running and preview tools, and other popular IDE features.

R plugin

Databases

As you might have noticed already, creating projects of various types requires a data source. It is also quite possible that you inject SQL statements into your source code.

DataSpell Professional does not enable you to create databases, but provides facilities to manage and query them. Once you are granted access to a certain database, you can configure one or more data sources within DataSpell that reflect the structure of the database and store the database access credentials. Based on this information, DataSpell establishes a connection to the database and provides the ability to retrieve or change information contained therein.

Access to the databases is provided by the Database tool window (View | Tool Windows | Database). It lets you view and modify data structures in your databases and perform other associated tasks.

DB tool window

See Database tools and SQL for details.

Last modified: 15 September 2022