Configure Big Data Tools environment (IntelliJ IDEA Ultimate)
Before you start working with Big Data Tools, you need to install the required plugins and configure connections to servers.
Install the required plugins
Everything you do in IntelliJ IDEA happens within a project, so open an existing project (File | Open) or create a new one (File | New | Project).
Press Ctrl+Alt+S to open IDE settings and select Plugins | Marketplace.
Install the following plugins:
Big Data Tools
Scala
Also, make sure that the Python plugin is enabled. It should be installed by default in IntelliJ IDEA Ultimate.
Restart the IDE. After the restart, the Big Data Tools tab appears in the rightmost group of the tool windows. Click it to open the Big Data Tools window.
Once Big Data Tools support is enabled in the IDE, you can configure connections to Zeppelin, Spark, Google Storage, and S3 servers. You can also connect to HDFS, WebHDFS, AWS S3, and a local drive by using configuration files or URIs.
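As a rough illustration of how a URI identifies the kind of storage, the following Python sketch maps a URI's scheme to one of the storage types listed above. The `storage_kind` helper and its scheme table are hypothetical: `hdfs`, `webhdfs`, `s3`, `s3a`, and `file` are common Hadoop-style scheme names, but the exact set that the plugin accepts may differ.

```python
from urllib.parse import urlparse

# Hypothetical scheme table; the plugin's own URI handling may differ.
SCHEME_TO_STORAGE = {
    "hdfs": "HDFS",
    "webhdfs": "WebHDFS",
    "s3": "AWS S3",
    "s3a": "AWS S3",
    "file": "Local drive",
}


def storage_kind(uri: str) -> str:
    """Return the storage kind for a URI, or raise ValueError if the scheme is unknown."""
    scheme = urlparse(uri).scheme.lower()
    if scheme == "":
        # A bare path such as /tmp/input is treated as a local path.
        return "Local drive"
    try:
        return SCHEME_TO_STORAGE[scheme]
    except KeyError:
        raise ValueError(f"unsupported URI scheme: {scheme!r}") from None
```

For example, `storage_kind("s3a://my-bucket/logs")` returns `"AWS S3"`. Windows drive paths such as `C:\data` are not handled by this sketch: `urlparse` reads the drive letter as a scheme, so they raise `ValueError`.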
Configure a server connection
In the Big Data Tools window, click the add icon and select the server type. The Big Data Tools Connection dialog opens.
In the Big Data Tools Connection dialog, specify the connection parameters. The available parameters depend on the server type:
File Systems: FS | Local, FS | HDFS
Storages: AWS S3, MinIO, Linode, DigitalOcean Spaces, GS, Azure
Monitoring: Spark, Hadoop
Notebooks: Zeppelin
The parameters below apply to a local file system connection (FS | Local); other server types have their own sets of parameters.
Mandatory parameters:
Root path: a path to the root directory.
Name: the name of the connection, which distinguishes it from other connections.
Optionally, you can set up:
Per project: select to enable these connection settings only for the current project. Deselect it if you want this connection to be visible in other projects.
Enable connection: deselect this checkbox if, for some reason, you want to restrict the use of this connection. By default, newly created connections are enabled.
After you fill in the settings, click Test connection to make sure that all configuration parameters are correct. Then click OK.
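If Test connection fails, a quick way to rule out network problems is to check whether the server's host and port accept TCP connections at all. The following Python sketch does only that. The `is_reachable` helper is hypothetical; it is not how the IDE tests connections, and it checks neither credentials nor the server's protocol.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

A `True` result only means that something is listening on that port; the connection dialog can still reject wrong credentials or an unexpected server type.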
You can disable any connection that you temporarily do not need. Right-click the corresponding item in the Big Data Tools window and select Disable Connection from the context menu. The disabled server changes its appearance and behavior: you cannot preview its content. To restore the connection, right-click it and select Enable Connection from the context menu.
For your convenience, you can rename the server root and copy the path to it. To access these actions, right-click the target server in the Big Data Tools window and select the corresponding command from the context menu.
Now that you have established a connection to the server, you can start working with your notebooks. However, it is good practice to first make sure that all libraries and packages required for execution on a particular server are installed and available.
Configure notebook dependencies
From the main menu, select File | Project Structure.
In the Project Structure dialog, select Modules under Project Settings. Then select any of the configured connections in the list of modules and double-click System Dependencies.
Inspect the list of the added libraries. Click the list and start typing to search for a particular library.
If needed, modify the list of libraries:
Click the add icon to add a new library.
Click the documentation icon and specify the URL of the external documentation.
Click the exclude icon to select the items that you want IntelliJ IDEA to ignore (folders, archives, and folders within archives), and click OK.
Click the remove icon to remove the selected items from the library or to restore the selected excluded items. The items themselves are not deleted.
Manage Zeppelin interpreters
You can configure interpreters on a Zeppelin server. Once an interpreter is added, it is available for all notes on this server.
Configure Zeppelin interpreters
Open the interpreter settings in one of the following ways:
Click the interpreter settings icon on the notebook toolbar.
Right-click a Zeppelin server in the Big Data Tools tool window and select Open Interpreter Settings from the context menu.
Preview the list of available interpreters in the Interpreter Settings window.
Note that for Zeppelin 0.8 and earlier, this list is identical to the list shown in the Interpreter Bindings dialog. For Zeppelin 0.9, Interpreter Bindings shows only the interpreters in use. To filter the list of interpreters, type the target name in the Search field.
You can use the following actions on the interpreter toolbar:
Update the list of interpreters.
Open a dialog to add a new interpreter. You can add the new interpreter to an existing interpreter group and configure its settings.
Delete the selected interpreter.
Restart the selected interpreter.
Open a dialog to add, remove, and modify interpreter repositories.
Preview the settings of the target interpreter.
When an interpreter has resolved all dependencies and it is ready for use, its status is shown as Ready.
If the selected interpreter is the root of an interpreter group, you see the interpreters included in this group. For example, the spark group consists of %spark, %spark.sql, %spark.pyspark, %spark.ipyspark, %spark.r, %spark.ir, %spark.shiny, and %spark.kotlin.
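In a note, a paragraph selects its interpreter with a directive such as `%spark.sql`: the part before the dot names the group, and the part after it names an interpreter in that group, while a bare `%spark` selects the group's default interpreter. The following Python sketch splits a directive along those lines. The `parse_directive` helper and the `InterpreterRef` type are hypothetical, and the sketch does not check the names against a real server.

```python
from typing import NamedTuple


class InterpreterRef(NamedTuple):
    group: str
    name: str  # an empty string means the group's default interpreter


def parse_directive(directive: str) -> InterpreterRef:
    """Split a paragraph directive such as '%spark.sql' into group and name."""
    if not directive.startswith("%"):
        raise ValueError(f"not an interpreter directive: {directive!r}")
    # partition splits at the first dot only; a missing dot leaves name empty.
    group, _, name = directive[1:].partition(".")
    if not group:
        raise ValueError(f"missing interpreter group: {directive!r}")
    return InterpreterRef(group, name)
```

With this model, every directive in the spark group listed above yields `group == "spark"`, and `parse_directive("%spark")` yields an empty name for the default interpreter.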
Select the SHARED, SCOPED, or ISOLATED interpreter binding mode. In shared mode, every note that uses this interpreter shares a single interpreter instance. Scoped and isolated modes can be applied per user or per note. In scoped per-note mode, each note creates a new interpreter instance in the same interpreter process. In isolated per-note mode, each note creates a new interpreter process.
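The binding modes differ in which notes share a process and which share an instance. The following Python sketch is a simplified model, not Zeppelin's implementation: the hypothetical `allocation_keys` function returns a process key and an instance key, and two paragraphs that get equal keys would share that process or instance.

```python
from typing import Tuple

Key = Tuple[str, ...]


def allocation_keys(mode: str, note_id: str, user: str, per: str = "note") -> Tuple[Key, Key]:
    """Return (process_key, instance_key) under a binding mode (simplified model)."""
    if per not in ("note", "user"):
        raise ValueError(f"unknown dimension: {per!r}")
    # The "owner" is the note or the user, depending on the chosen dimension.
    owner: Key = (per, note_id if per == "note" else user)
    everyone: Key = ("global",)
    mode = mode.lower()
    if mode == "shared":
        # One process and one instance for all notes.
        return everyone, everyone
    if mode == "scoped":
        # One process, but a separate instance per note (or per user).
        return everyone, owner
    if mode == "isolated":
        # A separate process, and therefore a separate instance, per note (or per user).
        return owner, owner
    raise ValueError(f"unknown binding mode: {mode!r}")
```

Tagging the keys with `"note"` or `"user"` keeps a note ID from ever colliding with a user name in this model.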
To restrict access to the selected interpreter, select the Set permission checkbox and specify the owner names.
To connect to an interpreter process that is already running, select the Connect to existing process checkbox and specify its Host and Port on the target server.
You can add interpreter properties or modify the predefined set of properties and their values. A property is exported as an environment variable on the system if its name consists only of upper-case letters, digits, or underscores ([A-Z_0-9]). Otherwise, it is set as a common interpreter property. For more details, see the Apache Zeppelin documentation.
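The naming rule above can be expressed as a regular expression. This Python sketch splits a set of properties into those that would be exported as environment variables and those that would stay interpreter properties. The `classify_properties` helper is hypothetical, and the property names used with it are only illustrations.

```python
import re
from typing import Dict, Tuple

# The whole name must consist of upper-case letters, digits, or underscores.
ENV_VAR_NAME = re.compile(r"[A-Z_0-9]+")


def classify_properties(props: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split properties into (environment variables, interpreter properties)."""
    env: Dict[str, str] = {}
    interpreter: Dict[str, str] = {}
    for name, value in props.items():
        # fullmatch rejects names that only start with matching characters.
        target = env if ENV_VAR_NAME.fullmatch(name) else interpreter
        target[name] = value
    return env, interpreter
```

For example, `SPARK_HOME` would become an environment variable, while a dotted name such as `zeppelin.spark.maxResult`, or a mixed-case name such as `Spark_Home`, would stay an interpreter property.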
For example, you can add the zeppelin.SparkInterpreter.precode property and enter code in the Value field to be executed when the interpreter initializes.
Definitions from this code can then be resolved in a note after the interpreter is initialized.
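Conceptually, precode runs once when the interpreter starts, in the same scope that later paragraphs use, which is why its definitions resolve in a note. The toy Python model below illustrates only that ordering. `ToyInterpreter` and its `precode` key are hypothetical; the real `zeppelin.SparkInterpreter.precode` property holds code for the Spark interpreter, not Python evaluated with `exec`.

```python
from typing import Any, Dict


class ToyInterpreter:
    """Toy stand-in for an interpreter that supports a precode property."""

    def __init__(self, properties: Dict[str, str]) -> None:
        self.namespace: Dict[str, Any] = {}
        precode = properties.get("precode", "")  # hypothetical property key
        if precode:
            # Runs exactly once, at initialization, in the shared namespace.
            exec(precode, self.namespace)

    def run_paragraph(self, code: str) -> Dict[str, Any]:
        # Paragraphs run in the same namespace, so names defined by precode resolve.
        exec(code, self.namespace)
        return self.namespace
```

Without precode, the same paragraph would fail with a `NameError`, because nothing has defined the name it refers to.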
In the Dependencies area, add any library that you want to use with the selected interpreter. If needed, specify the files to exclude.
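Dependencies are often given as Maven coordinates. The Python sketch below assumes the common `groupId:artifactId:version` form for the artifact and a comma-separated list of `groupId:artifactId` entries for exclusions. Both formats and the `parse_dependency` helper are assumptions for illustration, and local file paths are not handled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dependency:
    group_id: str
    artifact_id: str
    version: str
    exclusions: List[str] = field(default_factory=list)


def parse_dependency(artifact: str, exclude: str = "") -> Dependency:
    """Parse 'groupId:artifactId:version' plus optional 'groupId:artifactId' exclusions."""
    parts = artifact.strip().split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected groupId:artifactId:version, got {artifact!r}")
    # Drop empty entries so that trailing commas are tolerated.
    exclusions = [e.strip() for e in exclude.split(",") if e.strip()]
    for entry in exclusions:
        pieces = entry.split(":")
        if len(pieces) != 2 or not all(pieces):
            raise ValueError(f"expected groupId:artifactId exclusion, got {entry!r}")
    group_id, artifact_id, version = parts
    return Dependency(group_id, artifact_id, version, exclusions)
```

For example, `parse_dependency("org.apache.commons:commons-csv:1.10.0", "commons-io:commons-io")` yields one dependency with a single exclusion.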
Click the refresh icon to update the list of interpreters. To restart the selected interpreter, click the restart icon.
Manage repositories
To open Repository Settings, click the repository settings icon on the interpreter toolbar.
You can refresh the list of repositories, add a new repository, and remove the selected repository.
To add a new repository, click the add icon and fill in the repository settings:
Mandatory parameters:
Id: a unique name of the repository
Url: the address of the repository
Optionally, you can set up:
Name: a username to access the repository
Password: a password to access the repository
Host: an HTTP or HTTPS server where the repository resides
Port: a port of the repository server
Name and Password for the Host: user credentials to access the server specified in Host
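To make the mandatory and optional fields concrete, here is a hypothetical Python data class that checks repository settings before you submit them. The field names, the accepted URL schemes (`http`, `https`, `file`), and the rule that a username and password are set together are assumptions of this sketch, not rules enforced by the dialog.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

ALLOWED_SCHEMES = ("http", "https", "file")  # assumption for this sketch


@dataclass
class RepositorySettings:
    repo_id: str                    # Id (mandatory)
    url: str                        # Url (mandatory)
    username: Optional[str] = None  # Name (optional)
    password: Optional[str] = None  # Password (optional)
    host: Optional[str] = None      # Host (optional)
    port: Optional[int] = None      # Port (optional)

    def validate(self) -> None:
        """Raise ValueError if the settings are incomplete or inconsistent."""
        if not self.repo_id.strip():
            raise ValueError("Id is mandatory")
        parsed = urlparse(self.url)
        if parsed.scheme not in ALLOWED_SCHEMES:
            raise ValueError(f"unsupported Url scheme: {self.url!r}")
        if parsed.scheme in ("http", "https") and not parsed.netloc:
            raise ValueError(f"Url has no server address: {self.url!r}")
        if (self.username is None) != (self.password is None):
            raise ValueError("set both a username and a password, or neither")
        if self.port is not None and not 0 < self.port < 65536:
            raise ValueError(f"Port out of range: {self.port}")
```

Calling `validate()` on `RepositorySettings("central", "https://repo1.maven.org/maven2")` passes, while an empty Id, a Url without a scheme, or a username without a password raises `ValueError`.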