PyCharm

Custom Spark cluster

Last modified: 11 February 2024

In the Spark Submit run configuration, you can use AWS EMR or Dataproc as remote servers to run your applications. Besides these two options, you can also configure your own custom Spark cluster: set up an SSH configuration to connect to the remote server and, optionally, configure a connection to a Spark History server and an SFTP connection.
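The SSH configuration referenced above can point at any host entry you can reach from your machine. As a minimal sketch, an OpenSSH config entry for the cluster's master node might look like this (the host name, user, and key path are placeholders, not values the IDE generates):

```
# ~/.ssh/config — illustrative entry for a custom Spark cluster's master node
Host spark-cluster
    HostName spark-master.example.com
    User spark
    IdentityFile ~/.ssh/id_ed25519
```

With an entry like this in place, the same host details can be reused for both the SSH configuration and the SFTP connection.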

If you've set up both Spark History and SFTP connections, they will be available under Custom Spark Cluster in the Big Data Tools tool window.


You can now select this cluster as a remote target in the Spark Submit run configuration. When you launch this run configuration, you'll be able to open the Spark job in the Services tool window by clicking the link in the application output.
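Conceptually, launching the run configuration amounts to invoking spark-submit on the remote host over SSH. A minimal sketch of such a call, assuming a standalone cluster — the master URL, application class, JAR path, and arguments are all illustrative placeholders:

```shell
# Illustrative spark-submit invocation on the remote host.
# Master URL, class name, JAR path, and arguments are placeholders.
spark-submit \
  --master spark://spark-master.example.com:7077 \
  --deploy-mode cluster \
  --class com.example.MyApp \
  /home/user/jobs/my-app.jar \
  arg1 arg2
```

If a Spark History server connection is configured, a finished job submitted this way can then be inspected there as well.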