Andrew Young
Feb 12, 2021

--

Hey Kowndinya,

Great question. I don't know if it can handle blank values, so the quickest way to find out is to try it. If you get an error, you can replace all blanks in your dataframe with an arbitrary string, like "blank" or "null". The placeholder will then be treated as just another level of a string-based categorical variable. If your variable has a numeric level encoding, pick an arbitrary numeric value instead, like 0 or 99, as long as it doesn't already exist as a level for that categorical variable.
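Here is a minimal sketch of that replacement, assuming your data is in a pandas dataframe. The column names (`channel`, `region_code`) and the toy values are made up for illustration, so swap in your own:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one string-based categorical column and one
# categorical column that uses a numeric level encoding.
df = pd.DataFrame({
    "channel": ["web", None, "store", "", "web"],
    "region_code": [1, 2, np.nan, 1, 3],
})

# String-based categorical: turn empty strings into missing values,
# then fill every missing value with a placeholder level.
df["channel"] = df["channel"].mask(df["channel"] == "").fillna("blank")

# Numeric level encoding: choose a value that is not already a level.
placeholder = 99
assert placeholder not in df["region_code"].dropna().unique()
df["region_code"] = df["region_code"].fillna(placeholder).astype(int)
```

After this, neither column has missing values, and "blank" and 99 behave like any other level.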

Yes, isolation forest is meant to detect anomalous records. One use case is fraud detection.

If you do the above, make sure you don't introduce "data leakage." In other words, if every fraud record has the "blank" or "null" placeholder and no other record does, the placeholder effectively encodes the fraud label, and you have data leakage.
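If you have known fraud labels for at least part of your data, one rough way to check for this is to compare the fraud rate among placeholder rows with the rate among all other rows. This is only a sketch: the `channel` and `is_fraud` columns and the toy values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: `channel` after blanks were replaced with "blank",
# plus a known fraud label (1 = fraud, 0 = not fraud).
df = pd.DataFrame({
    "channel": ["web", "blank", "store", "blank", "web", "store"],
    "is_fraud": [0, 1, 0, 1, 0, 0],
})

is_blank = df["channel"].eq("blank")

# Fraud rate among placeholder rows vs. among all other rows.
fraud_rate_blank = df.loc[is_blank, "is_fraud"].mean()
fraud_rate_other = df.loc[~is_blank, "is_fraud"].mean()

# Counts for each combination of placeholder flag and label.
table = pd.crosstab(is_blank, df["is_fraud"])
print(table)
print(fraud_rate_blank, fraud_rate_other)
```

In this toy data the two rates are 1.0 and 0.0, which is exactly the pattern to worry about: the placeholder alone separates fraud from non-fraud.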

--
