Hey Kowndinya,

Great question. I'm not sure whether it can handle blank values, so try it first. If you get an error, replace all blanks in your dataframe with an arbitrary string, such as "blank" or "null". The blanks will then be treated as just another level of a string-based categorical variable. If your variable uses a numeric encoding for its levels, choose an arbitrary numeric value instead, such as 0 or 99, as long as it doesn't already exist as a level of that categorical variable.
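Here is a minimal pandas sketch of the replacement described above. The dataframe, its column names (`merchant_type` as a string-coded categorical, `region_code` as a numeric-coded one), the "blank" label, and the sentinel 99 are all hypothetical choices for illustration. Note that `fillna` only catches real missing values, so empty strings are converted to NaN first.

```python
import numpy as np
import pandas as pd

# Hypothetical example data with blanks in both kinds of categorical column.
df = pd.DataFrame({
    "merchant_type": ["grocery", None, "travel", ""],  # string-coded levels
    "region_code": [1, 2, np.nan, 3],                  # numeric-coded levels
})

# String-coded column: treat empty strings as missing, then give blanks their own level.
df["merchant_type"] = df["merchant_type"].replace("", np.nan).fillna("blank")

# Numeric-coded column: pick a sentinel that is not already one of the levels.
sentinel = 99
if sentinel in set(df["region_code"].dropna()):
    raise ValueError(f"{sentinel} is already a level of region_code")
df["region_code"] = df["region_code"].fillna(sentinel).astype(int)

print(df)
```

Keep in mind that scikit-learn estimators generally expect numeric input, so a string column like `merchant_type` would still need to be encoded (for example, one-hot encoded) before fitting a model.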

Yes, isolation forest is meant to detect anomalous records. One use case is fraud detection.
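For context, a minimal sketch of an isolation forest flagging an anomalous record, assuming scikit-learn's `IsolationForest` and made-up two-column numeric data (200 ordinary records near the origin plus one far-away record). Real fraud data would first need its categorical columns encoded.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical data: 200 ordinary records clustered near the origin, plus one outlying record.
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)), [[8.0, 8.0]]])

model = IsolationForest(n_estimators=100, random_state=0)
model.fit(X)

# predict() returns 1 for inliers and -1 for records it considers anomalous.
labels = model.predict(X)

# score_samples(): lower values mean "easier to isolate", i.e. more anomalous.
scores = model.score_samples(X)

print("label of the outlying record:", labels[-1])
print("score of the outlying record:", scores[-1])
```

The idea behind the method is that anomalous records are isolated by fewer random splits than ordinary ones, which is why the outlying record gets a lower score.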

When replacing blanks, make sure you don't introduce "data leakage." For example, if every fraud record is "blank" or "null," and every "blank" or "null" record is fraud, then the placeholder is effectively encoding the label, and you have data leakage.
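One quick way to check for this, sketched with pandas under the assumption that you have a fraud label available for validation. The `is_fraud` column, the `merchant_type` column, and the helper `blank_fraud_overlap` are hypothetical names for illustration; the label is used only for this check, not as a model feature.

```python
import pandas as pd

def blank_fraud_overlap(df, column, blank_value="blank", label="is_fraud"):
    """Hypothetical helper: return (share of fraud records that are blank,
    share of blank records that are fraud) for one categorical column."""
    is_blank = df[column] == blank_value
    is_fraud = df[label] == 1
    both = (is_blank & is_fraud).sum()
    return both / is_fraud.sum(), both / is_blank.sum()

# Hypothetical data where the "blank" level lines up exactly with fraud.
df = pd.DataFrame({
    "merchant_type": ["blank", "grocery", "blank", "travel", "grocery", "blank"],
    "is_fraud":      [1,       0,         1,       0,        0,         1],
})

fraud_that_is_blank, blank_that_is_fraud = blank_fraud_overlap(df, "merchant_type")
print(f"{fraud_that_is_blank:.0%} of fraud records are blank")
print(f"{blank_that_is_fraud:.0%} of blank records are fraud")

if fraud_that_is_blank == 1.0 and blank_that_is_fraud == 1.0:
    print("Warning: 'blank' perfectly identifies fraud, which suggests data leakage.")
```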

--

a data scientist https://www.linkedin.com/in/andrewyoung16/
