PyCharm 2024.2 Help

Databricks

The Databricks plugin allows you to connect to a remote Databricks workspace right from the IDE.

With the Databricks plugin, you can:

• Connect to a remote Databricks workspace
• Run Python scripts and notebooks on a Databricks cluster or as workflows
• Synchronize your project files with a Databricks cluster

Prerequisites

    Make sure that the Databricks plugin is installed and enabled in PyCharm.

    Connect to Databricks workspace

    To create a new Databricks connection:

    1. Go to View | Tool Windows | Databricks to open the Databricks tool window.

    2. Click New connection. The Big Data Tools dialog opens.

    You can connect to your Databricks workspace using one of the following options:

    Connect via Profile

    1. In the Name field, enter a name for the connection to distinguish it from other connections.

    2. If you have a .databrickscfg file in your user home directory, it is used automatically for profile-based authentication. If the file contains several profiles, you can select the one to use from the drop-down menu (a sample file is shown after these steps).

    3. If you want to edit the .databrickscfg file, click Open .databrickscfg File to open it in the editor.

    4. Click Reload .databrickscfg File to reload the changed file.

    5. Click Test connection to ensure that all configuration parameters are correct.

    6. Click OK to save changes.
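
    If you do not have a .databrickscfg file yet, a minimal one might look like the following sketch. The host URLs and tokens below are placeholders, not real values; each bracketed section defines one profile that appears in the drop-down menu:

      # Profile used by default
      [DEFAULT]
      host  = https://my-workspace.cloud.databricks.com
      token = dapi0123456789abcdef

      # An additional, named profile
      [staging]
      host  = https://staging-workspace.cloud.databricks.com
      token = dapifedcba9876543210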


    Connect via Databricks CLI

    1. In the Name field, enter a name for the connection to distinguish it from other connections.

    2. In the URL field, enter the URL of your Databricks workspace.

    3. If you do not have the Databricks CLI installed, PyCharm installs it the first time you try to establish a connection (see the sign-in example after these steps).

    4. Click Test connection to ensure that all configuration parameters are correct.

    5. Click OK to save changes.
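
    The Databricks CLI option relies on the CLI's OAuth-based sign-in. As a rough sketch (assuming the Databricks CLI is available on your PATH, and using a placeholder workspace URL), the sign-in and a check of the cached profiles can also be performed manually in a terminal:

      # Open a browser window to authenticate against the workspace
      databricks auth login --host https://my-workspace.cloud.databricks.com

      # List the authentication profiles known to the CLI
      databricks auth profiles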


    Connect via Azure CLI

    1. In the Name field, enter a name for the connection to distinguish it from other connections.

    2. In the URL field, enter the URL of your Databricks workspace.

    3. If you do not have the Azure CLI installed, click the Install CLI link and follow the installation instructions on the website (a sign-in example is shown after these steps).

    4. Click Test connection to ensure that all configuration parameters are correct.

    5. Click OK to save changes.
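
    The Azure CLI option uses the credentials of your current Azure CLI session. As a minimal sketch (the commands below are standard Azure CLI commands, not part of the plugin), you can sign in and check the active account in a terminal before testing the connection:

      # Sign in to Azure; this opens a browser window
      az login

      # Verify which account and subscription are currently active
      az account show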


    Additionally, you can configure the following settings:

    • Enable connection: deselect if you want to disable this connection. By default, newly created connections are enabled.

    • Per project: select to enable these connection settings only for the current project. Deselect it if you want this connection to be visible in other projects.

    Run and synchronize files

    Run as Workflow

    When you run a workflow on a Databricks cluster, your series of tasks or operations is executed in a specific sequence across multiple machines in the cluster. Each task in your workflow might depend on the output of previous tasks.

    1. Open a .py or .ipynb file in the editor.

    2. Do one of the following:

      • Click Run as Workflow in the Databricks tool window.

      • Right-click in the editor and select Run as Workflow from the context menu.

    Run on Cluster

    When you run a job or a notebook on a Databricks cluster, your code is sent to the cluster and executed on multiple machines within it. This speeds up processing and analysis, which is especially beneficial when working with large amounts of data. A minimal example of a script that you might run this way is sketched after the steps below.

    1. Open a .py file in the editor.

    2. Do one of the following:

      • Click Run on Cluster in the Databricks tool window.

      • Right-click in the editor and select Run on Cluster from the context menu.
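
    As a minimal sketch of a script that could be run on a cluster this way (the table name below is a placeholder; replace it with a table that exists in your workspace):

      from pyspark.sql import SparkSession

      # On a Databricks cluster, an active Spark session already exists,
      # so getOrCreate() simply returns it.
      spark = SparkSession.builder.getOrCreate()

      # Read a table and run a simple distributed aggregation.
      trips = spark.read.table("samples.nyctaxi.trips")
      print(trips.count())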

    Synchronize project files

    You can synchronize your project files with a Databricks cluster:

    1. Specify a path to the folder on the Databricks cluster that you want to synchronize your files with.

    2. Click Start Sync.

    Last modified: 06 August 2024