DataGrip 2023.1 Help

Spark monitoring

With the Big Data Tools plugin, you can monitor your Spark jobs.

Typical workflow:

  1. Establish a connection to a Spark server

  2. Adjust the preview layout

  3. Filter out job parameters

Connect to a Spark server

  1. In the Big Data Tools window, click Add a connection and select Spark.

  2. In the Big Data Tools dialog that opens, specify the connection parameters:

    Configure Spark connection
    • Name: the name of the connection, used to distinguish it from other connections.

    • URL: the URL of the Spark server.

    Optionally, you can set up:

    • Per project: select to enable these connection settings only for the current project. Deselect it if you want this connection to be visible in other projects.

    • Enable connection: deselect to disable this connection. By default, newly created connections are enabled.

    • Enable tunneling: creates an SSH tunnel to the remote host. It can be useful if the target server is in a private network but an SSH connection to the host in the network is available.

      Select the checkbox and specify an SSH configuration (click ... to create a new SSH configuration).

    • Enable HTTP basic authentication: connects using HTTP basic authentication with the specified username and password.

    • Proxy: select if you want to use IDE proxy settings or if you want to specify custom proxy settings.

  3. Once you fill in the settings, click Test connection to ensure that all configuration parameters are correct. Then click OK.
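The URL and HTTP basic authentication settings can also be checked outside the IDE with a short script. The following is a minimal sketch, not a description of what the plugin does: it assumes the server exposes Spark's monitoring REST API under /api/v1, and the function names, host, username, and password are placeholders introduced for illustration.

```python
import base64
import urllib.request


def build_request(base_url, username=None, password=None):
    """Build a GET request for the application list of a Spark server.

    Assumption: the server exposes Spark's monitoring REST API under /api/v1.
    """
    url = base_url.rstrip("/") + "/api/v1/applications"
    request = urllib.request.Request(url)
    if username is not None and password is not None:
        # HTTP basic authentication sends base64("username:password").
        credentials = f"{username}:{password}".encode("utf-8")
        token = base64.b64encode(credentials).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def check_connection(base_url, username=None, password=None, timeout=5):
    """Return True if the server answers the request with HTTP 200."""
    request = build_request(base_url, username, password)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError.
        return False
```

For example, check_connection("http://localhost:18080", "user", "secret") returns False when the host is unreachable or rejects the credentials, which is roughly the kind of problem that Test connection is meant to surface.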

At any time, you can open the connection settings in one of the following ways:

  • Go to the Tools | Big Data Tools Settings page of the IDE settings (Control+Alt+S).

  • Click settings on the Spark monitoring tool window toolbar.

Once you have established a connection to the Spark server, the Spark monitoring tool window appears.

Spark monitoring: jobs

The window consists of several areas that show monitoring data for:

  • Application: a user application that is being executed on Spark.

  • Job: a parallel computation consisting of multiple tasks.

  • Stage: a set of tasks within a job.

  • Environment: runtime information and Spark server properties.

  • Executor: a process launched for an application that runs tasks and keeps data in memory or disk storage across them.

  • Storage: server storage utilization.

  • SQL: details about the execution of SQL queries.

You can also preview information on Tasks, which are units of work sent to one executor.

Refer to the Spark documentation for more information about these types of data.
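The areas listed above correspond closely to resources in Spark's own monitoring REST API. The following sketch shows that correspondence, assuming the server exposes the API under /api/v1. The mapping and the helper name endpoint_url are illustrative and are not taken from the plugin.

```python
# Assumption: paths follow Spark's monitoring REST API (/api/v1).
# How the plugin itself retrieves data is not described in this document.
AREA_ENDPOINTS = {
    "Application": "applications",
    "Job": "applications/{app_id}/jobs",
    "Stage": "applications/{app_id}/stages",
    "Environment": "applications/{app_id}/environment",
    "Executor": "applications/{app_id}/executors",
    "Storage": "applications/{app_id}/storage/rdd",
    "SQL": "applications/{app_id}/sql",
}


def endpoint_url(base_url, area, app_id=None):
    """Return the REST URL that serves data for a monitoring area."""
    path = AREA_ENDPOINTS[area]
    if "{app_id}" in path:
        if app_id is None:
            raise ValueError(f"area {area!r} requires an application ID")
        path = path.format(app_id=app_id)
    return f"{base_url.rstrip('/')}/api/v1/{path}"
```

Only the application list can be requested without an application ID. Every other area is scoped to a single application, which matches the way the tool window shows jobs and stages for the selected application.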

Adjust layout

  • In the list of the application jobs, select a job to preview.

  • To focus on a particular stage, switch to the Stages tab.

    Job stages
  • To manage visibility of the monitoring areas, use the following buttons:

    Item             Description
    Preview details  Shows details for the selected stage.
    Show tasks       Shows the list of the tasks executed during the selected stage.

    Showing stage details

  • Click Preview on web to preview any monitoring data in a browser.

Once you have set up the layout of the monitoring window by opening or closing preview areas, you can filter the monitoring data to preview particular job parameters.

Filter out the monitoring data

  • Use the following buttons in the Applications, Jobs, and Stages tabs to show details for the jobs and stages with a specific status.

    Item            Description
    Running jobs    Show running applications, jobs, or stages
    Succeeded jobs  Show succeeded applications, jobs, or stages
    Failed status   Show failed jobs or stages
    Unknown status  Show jobs or stages with unknown status
    Test skipped    Show skipped stages

  • Filter the list of applications by start time and end time. You can also limit the number of items in the filtered list.

    Filtering applications in Spark monitoring
  • Manage content within a table:

    • Click a column header to change the order of data in the column.

    • Click Show/Hide columns on the toolbar to select the columns to be shown in the table:

      Select columns to show in the table
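The status, time, and limit filters described above can be thought of as successive conditions applied to a list of records. The following sketch illustrates that logic only. The record layout (the "status" and "started" keys), the function name filter_items, and the sample data are hypothetical and do not reflect the plugin's internals.

```python
from datetime import datetime


def filter_items(items, statuses=None, start=None, end=None, limit=None):
    """Filter records by status and start-time window, then cap the count."""
    result = []
    for item in items:
        if statuses is not None and item["status"] not in statuses:
            continue
        if start is not None and item["started"] < start:
            continue
        if end is not None and item["started"] > end:
            continue
        result.append(item)
    return result if limit is None else result[:limit]


# Hypothetical sample records, made up for illustration only.
SAMPLE_JOBS = [
    {"id": 0, "status": "SUCCEEDED", "started": datetime(2023, 6, 21, 9, 0)},
    {"id": 1, "status": "FAILED", "started": datetime(2023, 6, 21, 9, 5)},
    {"id": 2, "status": "RUNNING", "started": datetime(2023, 6, 21, 9, 10)},
    {"id": 3, "status": "SUCCEEDED", "started": datetime(2023, 6, 21, 9, 15)},
]

# Keep only succeeded jobs, at most one item.
first_succeeded = filter_items(SAMPLE_JOBS, statuses={"SUCCEEDED"}, limit=1)
```

Passing several statuses at once, for example {"RUNNING", "FAILED"}, corresponds to having more than one status button active at the same time.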

At any time, you can click Refresh in the Spark monitoring tool window to refresh the monitoring data manually. Alternatively, you can configure automatic updates at a fixed interval using the list next to the Refresh button. You can select 5, 10, or 30 seconds.
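Automatic refresh at a fixed interval amounts to a simple polling loop. The following is a minimal sketch of that idea, not the plugin's implementation: auto_refresh and its parameters are hypothetical names, and fetch stands for any function that retrieves the current monitoring data.

```python
import time

# Intervals offered next to the Refresh button, in seconds.
ALLOWED_INTERVALS = (5, 10, 30)


def auto_refresh(fetch, interval, rounds, sleep=time.sleep):
    """Call fetch() `rounds` times, waiting `interval` seconds between calls."""
    if interval not in ALLOWED_INTERVALS:
        raise ValueError(f"interval must be one of {ALLOWED_INTERVALS}")
    snapshots = []
    for i in range(rounds):
        snapshots.append(fetch())
        if i < rounds - 1:
            # No wait is needed after the final round.
            sleep(interval)
    return snapshots
```

The sleep parameter is injectable so that the loop can be exercised without real delays. In normal use the default time.sleep applies.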

Last modified: 21 June 2023