Simple explanation of Feature Scaling: Scale to Succeed!

Prajwal Srinivas
2 min read · Sep 6, 2022


Feature Scaling, Normalization, Standardization: what they are, and when and how to use them.

Feature Scaling

Suppose a dataset contains several features measured in different units: one in kilograms, another in grams, another in liters, and so on. How can these features be used together for model building? This is an obstacle, because some machine learning algorithms (such as regression, KNN, and neural networks) are highly sensitive to feature magnitudes: algorithms that compute distances between feature values become biased towards numerically larger features if the data is not scaled (tree-based algorithms, Naive Bayes, and LDA are largely unaffected). This is where feature scaling comes in. Feature scaling is the process of normalizing, or in other words levelling, the range of the features in a dataset. Skipping it with these scale-sensitive algorithms will hurt model accuracy.

In reality, 8.2 feet > 140 cm. But a distance-based machine learning algorithm will interpret it as 140 > 8.2, because it compares only the raw numbers, not the units behind them.
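To see this numerically, here is a minimal sketch (the heights and weights are made-up values): the Euclidean distance between two people is dominated entirely by the feature with the larger magnitude.

```python
import numpy as np

# Two people: height in cm, weight in grams (made-up values)
a = np.array([170.0, 70_000.0])   # 170 cm, 70 kg
b = np.array([180.0, 71_000.0])   # 180 cm, 71 kg

# The Euclidean distance is dominated by the gram-valued feature;
# the 10 cm height difference barely registers.
print(np.linalg.norm(a - b))      # ~1000.05, almost all from the weight axis
```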

Types of Feature Scaling: Normalization and Standardization

Normalization, or min-max scaling, rescales the data to a particular range, i.e., from a minimum to a maximum. To find the new scaled value, subtract the dataset's minimum from the value, then divide by the range of the dataset: x_scaled = (x - min) / (max - min). This maps the feature to [0, 1], or sometimes [-1, 1]. Normalization is useful when there are no outliers, as it cannot cope with them: a single extreme value squeezes every other value into a narrow band.
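As a quick illustration, here is a minimal sketch using scikit-learn's MinMaxScaler (the column of values is made up):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[50.0], [60.0], [80.0], [100.0]])  # e.g. weights in kg

scaler = MinMaxScaler()             # default feature_range=(0, 1)
X_scaled = scaler.fit_transform(X)  # (x - min) / (max - min)

print(X_scaled.ravel())             # [0.  0.2 0.6 1. ]
```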

Standardization, or z-score normalization, centres the data around the mean such that the standard deviation is 1. To find the new scaled value, subtract the mean from the value and divide by the standard deviation: z = (x - mean) / std. The result is not bounded to a fixed range. Standardization is helpful when the data follows a Gaussian distribution, and it is far less affected by outliers than min-max scaling because there is no predefined range for the transformed features.
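And the standardized counterpart, again a minimal sketch with scikit-learn's StandardScaler on the same made-up column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[50.0], [60.0], [80.0], [100.0]])

scaler = StandardScaler()          # z = (x - mean) / std
X_std = scaler.fit_transform(X)

print(X_std.ravel())               # roughly [-1.17 -0.65  0.39  1.43]
print(X_std.mean(), X_std.std())   # 0.0 and 1.0 by construction
```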

To conclude, both techniques place dissimilar features on the same scale: normalization squeezes every feature into a common fixed range, while standardization makes features comparable by giving each a mean of 0 and a standard deviation of 1.

Hope this article provides clarity on feature scaling, its types, and their utility. Happy Learning!

Written by Prajwal Srinivas

Master's in Data Analytics Engineering student @ Northeastern University | Ex-HSBC | Ex-TCS
