Big Data Tools

The Big Data Tools plugin is available for PyCharm 2020.1 and later. It provides specific capabilities to monitor and process data with Zeppelin, AWS S3, Apache Spark, Apache Kafka, Apache Hive, Apache Flink, Google Cloud Storage, MinIO, Linode, DigitalOcean Spaces, Microsoft Azure, and Hadoop Distributed File System (HDFS).

You can create new or edit existing local or remote Zeppelin notebooks, execute code paragraphs, preview the resulting tables and graphs, and export the results to various formats.

User interface of the IDE with the Big Data Tools plugin enabled

Getting started with Big Data Tools in PyCharm

The basic workflow for big data processing in PyCharm includes the following steps:

Configure your environment

Install the Big Data Tools plugin.
Create a new project in PyCharm.
Configure a connection to the target server.
Work with your notebooks and data files.

Work with notebooks

Create and edit a notebook.
Execute the notebook.
Analyze your data.

Get familiar with the user interface

When you install the Big Data Tools plugin for PyCharm, the following user interface elements appear:

Big Data Tools window

The Big Data Tools window appears in the rightmost group of the tool windows. The window displays the list of the configured servers and files structured by folders. Even when no connections are configured, you can see the available types of servers to connect to.

Basic operations on notebooks are available from the context menu.

You can navigate through the directories and preview columnar structures of .csv, .parquet, .avro, and .orc files.
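To get a feel for what a columnar preview conveys, the following minimal sketch summarizes the columns of a small CSV sample using only the Python standard library. It is an illustration, not a description of how the plugin itself reads files; the sample data and the `infer_type` and `describe_columns` helpers are hypothetical.

```python
import csv
import io

# Hypothetical sample data standing in for a .csv file on a server.
SAMPLE_CSV = "id,name,score\n1,alice,9.5\n2,bob,7.25\n"


def infer_type(values):
    """Return the narrowest of int, float, or str that fits every value."""
    for caster, label in ((int, "int"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return label
        except ValueError:
            continue
    return "str"


def describe_columns(text):
    """Map each column name to a type inferred from the data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    columns = list(zip(*data))  # transpose rows into columns
    return {name: infer_type(col) for name, col in zip(header, columns)}


print(describe_columns(SAMPLE_CSV))
```

Formats such as .parquet, .avro, and .orc store their schema in the file itself, so inspecting them outside the IDE would need a dedicated reader library rather than this kind of inference.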

Basic operations on data files are available from the context menu. You can also move files by dragging them to the target directory on the target server.

For the basic operations with the servers, use the window toolbar:


	Adds a new connection to a server.
	Deletes the selected connection.
	Only for Zeppelin servers. Opens a window to search across all the available Zeppelin connections.
	Refreshes connections to all configured servers.
	Opens the connection settings for the selected server.
	Only for file storages. Displays the server content in a separate file viewer.

If you have any questions regarding the Big Data Tools plugin, click the Support link and select one of the available options. You can join the support Slack channel, submit a ticket in the YouTrack system, or copy the support email address to send your question.

Notebook editor

In the notebook editor, you can add and execute Python code paragraphs. When editing a code paragraph, you can use all the coding assistance features available for its language. Code warnings and errors are highlighted in the corresponding code constructs and marked in the scrollbar. The results of paragraph execution are shown in the preview area below each paragraph.
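As an illustration of what a Python paragraph might contain, the sketch below formats a small result set using Zeppelin's `%table` display convention (tab-separated columns, one row per line), which Zeppelin renders as a table in the paragraph output. The data and the `to_zeppelin_table` helper are hypothetical, and in an actual note the paragraph would typically begin with an interpreter directive such as `%python`, depending on how your interpreters are bound.

```python
# In a Zeppelin note, an interpreter directive such as %python would
# normally precede this code (assumption: a Python interpreter is bound).

def to_zeppelin_table(header, rows):
    """Build a %table string: tab-separated cells, one row per line."""
    lines = ["\t".join(header)]
    lines += ["\t".join(str(cell) for cell in row) for row in rows]
    return "%table " + "\n".join(lines)


# Hypothetical data: word counts computed in plain Python.
words = ["spark", "hive", "spark", "kafka", "spark", "hive"]
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Printing the string lets Zeppelin render it as a table.
print(to_zeppelin_table(["word", "count"], sorted(counts.items())))
```

Run outside Zeppelin, the same code simply prints the tab-separated text, which makes it easy to check the paragraph logic locally before executing it on a server.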

Use the notebook editor toolbar for the basic operations with notebooks:


	Executes all paragraphs in the notebook.
	Stops execution of the notebook paragraphs.
	Clears output previews for all paragraphs.
	Select Export Note Code to HTML to save the note as an HTML file. Select Toggle Code Visibility to hide code paragraphs (by default, all types of paragraphs are shown).
	Opens the Interpreter Bindings dialog to configure interpreters for the selected notebook.
	Opens the notebook in the browser or copies a link to it.
	Allows you to jump to a particular paragraph of a notebook.
	Shows the minimap for quick navigation through the notebook.

The toolbar of a local note lists the available Zeppelin servers, so that you can select one on which to execute the note.

The notebook editor toolbar also shows the status of the last paragraph execution.

Monitoring tool windows

These windows appear when you have connected to a Spark or Hadoop server.

Last modified: 25 August 2022