PyCharm 2022.3 Help

HDFS

Connect to an HDFS server

  1. In the Big Data Tools window, click Add a connection and select HDFS.

  2. In the Big Data Tools dialog that opens, specify the connection parameters:

    HDFS connection
    • Name: a name that distinguishes this connection from the other connections.

    • Root path: a path on the target server that serves as the root of the HDFS connection.

      When the connection is successfully established, the Driver home path field shows the target IP address of the connection, including the port number. Example: hdfs://127.0.0.1:65224/.

    • Configuration source: select one of:

      • Configuration files directory: a path to the directory with the HDFS configuration files. See the samples of configuration files.

      • File system URI: URI of an HDFS server.

    Optionally, you can set up:

    • Per project: select to enable these connection settings only for the current project. Deselect it if you want this connection to be visible in other projects.

    • Enable connection: deselect to disable this connection. Newly created connections are enabled by default.

    • Username: enter a username to log in to the server. If not specified, the HADOOP_USER_NAME environment variable is used. If this variable is not defined, the user.name property is used. If Kerberos is enabled, it takes precedence over all three of these values.

    • Enable tunneling (NameNode operations only): creates an SSH tunnel to the remote host. This is useful when the target server is in a private network but an SSH connection to a host in that network is available. SSH tunneling currently works only for the following NameNode operations: listing files and getting meta information.

      Select the checkbox and specify a configuration of an SSH connection (click ... to create a new SSH configuration).

  3. Once you fill in the settings, click Test connection to ensure that all configuration parameters are correct. Then click OK.
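The username lookup order described in step 2 can be sketched in Python. This is only an illustration of the documented precedence, not the plugin's actual code: the function name resolve_hdfs_username, its parameters, and the use of a Kerberos principal string to represent "Kerberos is enabled" are all assumptions, and the operating system login name stands in for the JVM user.name property.

```python
import getpass
import os
from typing import Mapping, Optional


def resolve_hdfs_username(
    configured: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    kerberos_principal: Optional[str] = None,
) -> str:
    """Return the effective user name (hypothetical helper, not a plugin API)."""
    # Assumption: when Kerberos is enabled, its identity overrides everything else.
    if kerberos_principal:
        return kerberos_principal
    # 1. The value typed into the Username field.
    if configured:
        return configured
    # 2. The HADOOP_USER_NAME environment variable.
    hadoop_user = env.get("HADOOP_USER_NAME")
    if hadoop_user:
        return hadoop_user
    # 3. Stand-in for the JVM user.name property: the OS login name.
    return getpass.getuser()


# The Username field wins over the environment variable.
print(resolve_hdfs_username("alice", env={"HADOOP_USER_NAME": "bob"}))  # prints "alice"
# With no Username field, the environment variable is used.
print(resolve_hdfs_username(None, env={"HADOOP_USER_NAME": "bob"}))  # prints "bob"
```

Passing the environment as a parameter keeps the sketch easy to try with different values without changing the real process environment.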

Samples of Hadoop File System configuration files

Type

Sample configuration

HDFS

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://example.com:9000/</value>
  </property>
</configuration>

S3

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>sample_access_key</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>sample_secret_key</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://example.com/</value>
  </property>
</configuration>

WebHDFS

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.webhdfs.impl</name>
    <value>org.apache.hadoop.hdfs.web.WebHdfsFileSystem</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>webhdfs://master.example.com:50070/</value>
  </property>
</configuration>

WebHDFS and Kerberos

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.webhdfs.impl</name>
    <value>org.apache.hadoop.hdfs.web.WebHdfsFileSystem</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>webhdfs://master.example.com:50070</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>Kerberos</value>
  </property>
  <property>
    <name>dfs.web.authentication.kerberos.principal</name>
    <value>testuser@EXAMPLE.COM</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>
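
Before pointing the Configuration files directory option at a file like the samples above, it can help to check which file system URI the file actually declares. The following is a minimal sketch using only the Python standard library. The helper name read_properties and the inline SAMPLE text are illustrative assumptions, and the plugin itself may read the files differently.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Illustrative configuration text, modeled on the WebHDFS sample.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>webhdfs://master.example.com:50070/</value>
  </property>
</configuration>
"""


def read_properties(xml_text: str) -> dict:
    """Collect <name>/<value> pairs from a Hadoop-style configuration document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


props = read_properties(SAMPLE)
uri = urlsplit(props["fs.defaultFS"])
print(uri.scheme, uri.hostname, uri.port)  # prints: webhdfs master.example.com 50070
```

The scheme, host, and port printed here are the same parts that make up the value expected by the File system URI option.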
Last modified: 20 January 2023