Spark DataFrame coding assistance
The Spark plugin provides coding assistance for Apache Spark DataFrames in your Python code.
Completion for available columns
If you create a DataFrame or read it from a file, PyCharm will assist you in accessing the DataFrame columns, for example, while selecting or filtering DataFrames.
Detecting unresolved columns
If you refer to a column that doesn't exist in the DataFrame, PyCharm highlights it and suggests replacing it with one of the available column names.
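As a rough illustration of the idea behind this inspection (not the plugin's actual implementation), the check can be pictured as comparing each referenced column name with the known schema and proposing the closest match. The sketch below uses only the Python standard library; the helper name suggest_column and the similarity cutoff are assumptions made for the example.

```python
import difflib


def suggest_column(name, available, cutoff=0.6):
    """Return None if `name` is a known column, else the closest known name.

    Hypothetical helper for illustration only; raises LookupError when
    nothing in `available` is similar enough to suggest.
    """
    if name in available:
        return None  # the column resolves, nothing to report
    matches = difflib.get_close_matches(name, available, n=1, cutoff=cutoff)
    if not matches:
        raise LookupError(f"unresolved column {name!r}, no similar name found")
    return matches[0]


columns = ["name", "value", "planet"]
print(suggest_column("planet", columns))   # known column
print(suggest_column("planett", columns))  # misspelled column
```

The first call finds the column and reports nothing; the second treats the misspelled name as unresolved and proposes the closest available name.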
You can enable and disable this inspection in the IDE settings (Ctrl+Alt+S).
Getting a schema
Completion of column names and the corresponding inspection are available if PyCharm can access the DataFrame schema. The schema can be specified in multiple ways:
- Columns and their types are specified directly in the read method:

      df = (spark.read
            .schema("name STRING, value BIGINT, planet STRING")
            .parquet("aliens.parquet"))

- The schema is specified as a separate variable and then used in the read method:

      from pyspark.sql.types import LongType, StringType, StructField, StructType

      schema = StructType([
          StructField("name", StringType(), False),
          StructField("value", LongType(), False),
          StructField("planet", StringType(), False),
      ])
      df = spark.read.schema(schema).parquet("aliens.parquet")
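Either form gives the IDE a fixed list of column names and types to complete against. As a minimal sketch of what can be read from the DDL-style string used in the first form, the following standard-library snippet splits it into name and type pairs. The helper parse_flat_ddl is hypothetical and handles only flat schemas; it is not Spark's own parser and does not support nested or parameterized types.

```python
def parse_flat_ddl(ddl):
    """Split a flat DDL schema string into (column, type) pairs.

    Illustration only: assumes "name TYPE" entries separated by commas,
    with no nested or parameterized types.
    """
    columns = []
    for entry in ddl.split(","):
        # Split on the first run of whitespace: column name, then type.
        name, type_name = entry.split(None, 1)
        columns.append((name, type_name.strip().upper()))
    return columns


schema = parse_flat_ddl("name STRING, value BIGINT, planet STRING")
print([name for name, _ in schema])
```

The printed list contains the column names that completion would offer for this DataFrame. Note that BIGINT in the string form corresponds to LongType in the StructType form.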
If you have not specified a schema in either of these ways, you can use the dedicated inlay hint to infer the schema from a Parquet file. The file can be stored locally or in remote storage.
Infer schema from a file
1. Use the read.parquet() method in your Spark code, for example:

       df = spark.read.parquet("/myfilepath")

2. Click the Choose schema inlay hint.
3. In the window that opens, select a file from which the schema can be inferred.
The schema inferred from the selected file is displayed as an inlay hint next to the method. Hover over the hint to preview the available columns and their types, or click it to insert the schema using the schema method or to select another schema.
You can enable and disable this inlay hint in the IDE settings (Ctrl+Alt+S).