
# How do I scale my features? - Part 1 (Linear Transformation)

Updated: Jun 28


In this blog, we will learn about different feature scaling techniques. We will also explore which technique fits which scenario best.

Before diving into how to scale your features, check out this blog to understand why feature scaling matters, when to use it, and when not to.

In this part, we focus only on linear transformations.

Check out the Key Takeaway section for highlights of this blog.

## MinMax Scaler

`x_scaled = (x - x_min)/ (x_max - x_min)`

• Use this scaler when you want the values to fall within a specific range.

• The default range is 0 to 1. A custom range can be any pair of numbers (min, max) with min < max; it does not have to consist of positive integers.

### Using Default Range (0 to 1)

Feature values will be scaled to lie between 0 and 1.
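As a minimal sketch of the formula above, here is the default 0 to 1 scaling in plain Python. The function name `min_max_scale` and the `ages` data are made up for illustration; they are not from the original notebook.

```python
def min_max_scale(values):
    """Scale a list of numbers to the range 0 to 1 (hypothetical helper)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # A constant feature has no range to scale by.
        raise ValueError("all values are identical; cannot min-max scale")
    return [(x - x_min) / span for x in values]


ages = [20, 30, 40, 50, 60]  # made-up sample feature
scaled = min_max_scale(ages)
print(scaled)
```

The smallest value maps to 0, the largest to 1, and everything else keeps its relative position in between.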

First 5 rows of scaled features

### Using Custom Range (2 to 10)

Feature values will be scaled to lie within the custom range. In our case, it is 2 to 10.
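A custom range can be sketched by first scaling to 0 to 1 and then stretching and shifting the result. The helper name `min_max_scale_to_range` and its `low`/`high` parameters are assumptions made for this example, as is the sample data.

```python
def min_max_scale_to_range(values, low, high):
    """Scale values linearly so that min -> low and max -> high (hypothetical helper)."""
    if low >= high:
        raise ValueError("low must be smaller than high")
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        raise ValueError("all values are identical; cannot min-max scale")
    # Step 1: scale to 0..1. Step 2: stretch to (high - low) and shift by low.
    return [low + (x - x_min) / span * (high - low) for x in values]


ages = [20, 30, 40, 50, 60]  # made-up sample feature
scaled_custom = min_max_scale_to_range(ages, low=2, high=10)
print(scaled_custom)
```

The same function also accepts ranges such as -1 to 1 or 0.5 to 2.5, since the only requirement is that the lower bound is smaller than the upper bound.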

First 5 rows of scaled features

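Although MinMax Scaler maps every value into the range 0 to 1, it is sensitive to outliers. Because `x_min` and `x_max` are set by the most extreme values, a single outlier stretches the range, so the inliers are squeezed into a very narrow band while the outlier sits far away from them.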
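The effect is easy to see on a toy feature. The numbers below are invented for illustration, with 1000 playing the role of the outlier.

```python
values = [10, 20, 30, 40, 1000]  # made-up data; 1000 is the outlier

x_min, x_max = min(values), max(values)
scaled = [(x - x_min) / (x_max - x_min) for x in values]

inliers = scaled[:-1]  # everything except the outlier
print([round(s, 4) for s in scaled])
print("inlier spread:", round(max(inliers) - min(inliers), 4))
```

The four inliers end up packed into roughly the bottom 3% of the 0 to 1 range, while the outlier alone takes the value 1.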

## Standard Scaler

`x_scaled = (x - mean) / standard_deviation`

Standardization is a two-step process:

1. Mean Removal: Subtracting the mean from each value centers the feature at zero.

2. Variance Scaling: Dividing by the standard deviation gives the feature unit variance.

Standard Scaler is useful when features with higher-magnitude variance would otherwise take precedence over other features in some algorithms, while features with lower-magnitude variance would contribute very little.

Standard Scaler cannot guarantee a balanced feature scale when the feature has outliers. Of the three measures of central tendency (mean, median, and mode), the mean is the most affected by outliers: computing the mean of a feature with outliers shifts it towards them. Since the mean (and the standard deviation) is used in the calculation, outliers affect the scaled values.
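The two steps can be sketched with Python's built-in `statistics` module. The helper name `standard_scale` and the data are hypothetical, and the use of the population standard deviation (dividing by n) is an assumption; using the sample standard deviation would give slightly different numbers.

```python
import statistics


def standard_scale(values):
    """Standardize values: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std (divide by n); an assumed choice
    if std == 0:
        raise ValueError("standard deviation is zero; cannot standardize")
    return [(x - mean) / std for x in values]


values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up feature with mean 5 and std 2
scaled_std = standard_scale(values)
print(scaled_std)
print("mean:", statistics.fmean(scaled_std), "std:", statistics.pstdev(scaled_std))
```

After scaling, the feature has a mean of 0 and a standard deviation of 1 (up to floating-point rounding).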

First 5 rows of scaled features

## MaxAbs Scaler

`x_scaled = x / max(abs(x))`

• If the feature has negative values, it is scaled to lie between -1 and 1.

• If the feature has only positive values, it is scaled to lie between 0 and 1.

• Each value of a feature is divided by the maximum absolute value of that feature.

• Like MinMax Scaler, it scales values into a fixed default range.
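Here is a small sketch of the points above in plain Python. The helper name `max_abs_scale` and the sample values are invented for illustration. Note that the divisor is the largest absolute value in the feature, which is not necessarily the largest value: in this example it comes from -4, not from 2.

```python
def max_abs_scale(values):
    """Divide every value by the largest absolute value in the feature."""
    max_abs = max(abs(x) for x in values)
    if max_abs == 0:
        raise ValueError("all values are zero; cannot max-abs scale")
    return [x / max_abs for x in values]


mixed = [-4, -2, 0, 1, 2]  # made-up feature with negative values
positive = [1, 2, 4]       # made-up feature with only positive values

scaled_mixed = max_abs_scale(mixed)
scaled_positive = max_abs_scale(positive)
print(scaled_mixed)
print(scaled_positive)
```

Zero stays at zero and the sign of every value is preserved, because the data is only divided, never shifted.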

First 5 rows of scaled features

MaxAbs Scaler is also affected by the presence of outliers, because an outlier can become the maximum absolute value that every other value is divided by.

## Robust Scaler

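`x_scaled = (x - median) / quartile_range`

Here `quartile_range` is the interquartile range (IQR), the difference between the 75th and 25th percentiles of the feature.

It is robust to outliers.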

It is a two-step process:

1. Median Removal: Subtracting the feature median from each value.

2. Quartile Range Scaling: Dividing each median-removed value by the quartile range.
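The two steps can be sketched with the standard library as follows. The helper name `robust_scale` and the data are hypothetical, and the choice of `method="inclusive"` for the quartiles is an assumption; other quartile conventions give slightly different results.

```python
import statistics


def robust_scale(values):
    """Subtract the median, then divide by the interquartile range (Q3 - Q1)."""
    median = statistics.median(values)
    # quantiles(n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; cannot robust scale")
    return [(x - median) / iqr for x in values]


values = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # made-up data; 100 is the outlier
scaled_robust = robust_scale(values)
print(scaled_robust)
```

The inliers land in a compact band around zero, while the outlier stays far away from them instead of compressing everything else.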

First 5 rows of scaled features

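Unlike MinMax, Standard, and MaxAbs Scaler, Robust Scaler is far less affected by outliers because it uses the median and the quartile range to scale the values. Of all the measures of central tendency, the median is the least affected by an outlier. Note that the outliers themselves are not removed: they remain extreme after scaling, but they no longer distort the scale of the inliers.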
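To make the comparison concrete, the sketch below computes the statistics each scaler relies on, once for a clean feature and once after replacing the largest value with an outlier. The data and the helper name `summary` are made up, and the quartile method is an assumed convention.

```python
import statistics


def summary(values):
    """Return the statistics used by Standard Scaler and Robust Scaler."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }


clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]    # made-up feature
dirty = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # same feature with one outlier

clean_stats = summary(clean)
dirty_stats = summary(dirty)
for name in ("mean", "std", "median", "iqr"):
    print(name, round(clean_stats[name], 2), "->", round(dirty_stats[name], 2))
```

The mean and standard deviation change substantially when the outlier is introduced, while the median and the quartile range stay exactly the same for this data.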

## Key Takeaway

• Standard Scaler, MinMax Scaler, and MaxAbs Scaler are all strongly affected by the presence of outliers in the features.

• Robust Scaler is the least affected by the presence of outliers.

• MinMax Scaler and MaxAbs Scaler can be used to scale values into a specific range.

• Standard Scaler is used to standardize a feature by removing the mean and applying variance scaling.

• Robust Scaler is used to standardize a feature that contains outliers.

I hope this blog helps you choose the right feature scaling technique.

Comments and feedback are most welcome.