Outlier

Outliers are extreme values that deviate significantly from the other observations in a data set. At first glance, an outlier looks as if it does not belong with the rest of the data, because it differs so clearly from the other values.

An outlier may occur because of natural variability in the data or because of experimental or human error. Outliers can therefore indicate an experimental error, or heavy skewness in the data (a heavy-tailed distribution).

When the sample size is small, outliers can distort the results of an analysis considerably. In statistical analysis, outliers can also affect the results of normality tests and invalidate basic assumptions such as constant variance in regression analysis.
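A small, made-up example shows how strongly one extreme value can pull a result when the sample is small. The sketch below is plain Python; the data and the helper name `least_squares` are hypothetical. It fits an ordinary least-squares line to five points, first as they are and then with the last value replaced by an outlier.

```python
# Minimal sketch (hypothetical data): effect of one outlier on a least-squares fit.

def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centred cross-products and centred squares
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]
clean = [2, 4, 6, 8, 10]         # lies exactly on y = 2x
with_outlier = [2, 4, 6, 8, 30]  # last value replaced by a hypothetical outlier

print(least_squares(xs, clean))         # (2.0, 0.0)
print(least_squares(xs, with_outlier))  # (6.0, -8.0)
```

With only five points, the single outlier triples the slope and moves the intercept from 0 to -8.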

Figure 2: Outliers tend to affect the mean more than the median or mode.
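The effect shown in Figure 2 can be checked with Python's standard `statistics` module. This is a minimal sketch on a hypothetical sample; `summarize` is an illustrative helper, not part of any library.

```python
import statistics

def summarize(values):
    """Return (mean, median, mode) of a sample."""
    return statistics.mean(values), statistics.median(values), statistics.mode(values)

sample = [2, 3, 3, 4, 5]       # hypothetical small sample
with_outlier = sample + [100]  # the same sample plus one extreme value

print(summarize(sample))        # mean 3.4, median 3, mode 3
print(summarize(with_outlier))  # mean 19.5, median 3.5, mode 3
```

Adding one extreme value moves the mean from 3.4 to 19.5, while the median only moves from 3 to 3.5 and the mode does not change.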

Detecting Outliers

When starting to look for outliers, you need to answer two important questions about your dataset:

Here are some of the techniques for detecting outliers:

Figure 4: Any data point whose Z-score lies more than three standard deviations from the mean is usually considered an outlier.
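The rule in Figure 4 can be sketched in a few lines of Python. This assumes a threshold of 3 and the population standard deviation (`statistics.pstdev`); the function `zscore_outliers` and the readings are hypothetical.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return (index, value) pairs whose absolute Z-score exceeds `threshold`.

    Z = (x - mean) / standard deviation, using the population standard deviation
    (an assumption of this sketch).
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical: nothing stands out
    return [(i, x) for i, x in enumerate(values) if abs((x - mean) / std) > threshold]

# Hypothetical measurements: 19 values near 10 and one extreme value
readings = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10,
            11, 9, 10, 10, 12, 9, 10, 11, 10, 50]
print(zscore_outliers(readings))  # [(19, 50)]
```

Two caveats apply. First, the outlier itself inflates the mean and standard deviation used to compute every Z-score. Second, with the population standard deviation no value in a sample of n points can have |Z| larger than √(n - 1), so with 10 or fewer points the threshold of 3 can never be exceeded; this is why the example uses 20 readings.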

It is an unsupervised model and needs to be re-calibrated each time a new batch of data is analyzed.

Then, for prediction, the algorithm compares an observation against the random split value stored in a “node”; that node has two child nodes, at which further random comparisons are made. The number of splits the algorithm needs to isolate an instance is called its “path length”. Because outliers are few and differ from the rest of the data, they are isolated after fewer splits, so they have shorter path lengths than the other observations.
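Random splitting and path length are the core of the Isolation Forest method. The following is a deliberately simplified, from-scratch Python sketch for one-dimensional data, not scikit-learn's `IsolationForest`: it follows only the branch that contains each point, and it omits the subsampling, height limit, and score normalization of the full algorithm. The function names and data are hypothetical.

```python
import random
import statistics

# Simplified 1-D sketch of the isolation idea, not the full Isolation Forest algorithm.

def path_length(x, values, rng, depth=0, max_depth=50):
    """Count random splits until x is isolated from the other values."""
    lo, hi = min(values), max(values)
    # Stop when x is alone (or only copies of x remain), or at the depth limit
    if len(values) <= 1 or lo == hi or depth >= max_depth:
        return depth
    split = rng.uniform(lo, hi)  # random split value for this node
    # Follow the child node that contains x
    if x < split:
        child = [v for v in values if v < split]
    else:
        child = [v for v in values if v >= split]
    return path_length(x, child, rng, depth + 1, max_depth)

def average_path_lengths(values, n_trees=200, seed=0):
    """Average path length of every value over n_trees random splitting runs."""
    rng = random.Random(seed)
    return [statistics.mean(path_length(x, values, rng) for _ in range(n_trees))
            for x in values]

data = [10, 11, 9, 10, 12, 10, 9, 11, 10, 50]  # hypothetical sample; 50 is the outlier
lengths = average_path_lengths(data)
for value, length in zip(data, lengths):
    print(value, round(length, 2))
```

In this sample, one random split anywhere above 12 already separates 50 from every other value, so its average path length should come out noticeably shorter than those of the clustered values.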

If not correctly optimized, training time can be very long and computationally expensive.

Dealing with Outliers

Below are a few common practices for dealing with outliers: