Work with Data Wrangler

Open Data Wrangler

Open a Jupyter notebook.

Run code cell to create a pandas dataframe. For example, run cell with the following code:

import pandas as pd

# Data
data = {
'Name': ['John', 'Anna', 'Peter', 'Linda', 'Dina', 'Kate', 'Tom', 'Emily'],
'Age': [22, 78, 22, 30, 45, 30, 35, 40],
'Gender': ['Male', 'Female', 'Male', 'Female', 'Female', 'Female', 'Male', 'Female'],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia', 'San Antonio', 'San Diego'],
'Occupation': ['Engineer', 'Doctor', 'Teacher', 'Nurse', 'Architect', 'Lawyer', 'Accountant', 'Scientist']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Display the DataFrame
df

In the upper-right corner of the output cell, click Open Data Wrangler.
Data Wrangler will open in a new tab:

Use Data Wrangler transformations

Transformation	Description
Sort and filter
Filter	Filters rows in a selected column based on a specified condition and value note Currently is not supported for string values
Clean and remove
Drop column	Removes a selected column from a table
Remove duplicates	Removes all rows that have duplicate values from a selected column
Drop missing values	Removes all rows with missing values from a selected column
Remove rows with NaN values	Removes rows that contain empty values from a table
Drop rows	Removes selected rows from a table
Find and replace
Find and replace	Replaces cells with a specified matching pattern from a selected column
Create additional column
Transform column with string	Transforms strings in a selected column. You can select one of the following transformations: Capitalize first character Convert text to lowercase Convert text to uppercase
One-hot encoding categorical variables	Splits categorical data from a selected column into a new column for each category
Other
Fill missing	Replaces cells with missing values with a new value in a selected column
Round numerical	Rounds numbers in a selected column to the specified number of decimal places: Round: rounds a number to the nearest integer. If the fraction of the number is 0.5 or higher, it rounds up. If it's less than 0.5, it rounds down. Floor: rounds a number down to the nearest integer Ceil: rounds a number up to the nearest integer
Split column	Splits a selected column into several columns based on a user defined delimiter
Change a type of column	Changes the data type of selected column
Normalize and scale
Min-Max scaling	Rescales a selected numerical column between a minimum and maximum value
Z-Score normalization	Transforms the data from a selected column into a distribution with a mean of 0 and a standard deviation of 1
Handling outliers or skewed distributions
Outlier detection with IQR	Detects outliers in a selected column using Interquartile Range
Reduce skewness	Reduces skewness by applying logarithmic or square root transformation to the data in a selected colum
Outlier detection with MAD	Detects outliers in a selected column using Median Absolute Deviation
Outlier detection with Euclidean distance	Detects outliers in a selected column using Euclidean Distance