Classification problem

This isn't about an exercise; I just have a general question about supervised machine learning:
Suppose I have a database of bank-account transactions, and the dataset is large (billions of transactions). In some cases I know that a transaction is fraudulent, but for the majority of the other transactions I don't know whether they are fraudulent or not.
What strategy can I use to train a model in this situation?

Charles Wilis

Hi @wilisbr

Depending on how much of your data is unlabeled, you can use semi-supervised or self-supervised learning. You can also use anomaly detection algorithms (Isolation Forest, autoencoders, One-Class SVM, etc.).
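A minimal sketch of the semi-supervised option, using scikit-learn's `SelfTrainingClassifier` on synthetic data. One assumption goes beyond the question: self-training needs at least two labeled classes, so the sketch supposes that a small set of transactions is also confirmed legitimate. Unlabeled rows get the label `-1`, which is how scikit-learn's semi-supervised estimators mark unknown samples. The helper `make_toy_transactions` and the two-feature layout are hypothetical stand-ins for real transaction features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier


def make_toy_transactions(n_legit=2000, n_fraud=40, seed=0):
    """Hypothetical helper: synthetic 2-D 'transaction features'.

    Legitimate transactions cluster near the origin; frauds sit farther out.
    """
    rng = np.random.default_rng(seed)
    legit = rng.normal(loc=0.0, scale=1.0, size=(n_legit, 2))
    fraud = rng.normal(loc=4.0, scale=1.0, size=(n_fraud, 2))
    X = np.vstack([legit, fraud])
    y_true = np.concatenate([np.zeros(n_legit, dtype=int), np.ones(n_fraud, dtype=int)])
    return X, y_true


X, y_true = make_toy_transactions()
rng = np.random.default_rng(1)

# Hide most labels: -1 marks "unknown" for scikit-learn's semi-supervised estimators.
y_partial = np.full_like(y_true, -1)
fraud_idx = np.flatnonzero(y_true == 1)
legit_idx = np.flatnonzero(y_true == 0)
# Assumption: we know a handful of frauds AND a handful of confirmed-legit transactions.
known_fraud = rng.choice(fraud_idx, size=10, replace=False)
known_legit = rng.choice(legit_idx, size=50, replace=False)
y_partial[known_fraud] = 1
y_partial[known_legit] = 0

# class_weight="balanced" compensates for fraud being rare among the labeled rows.
base = LogisticRegression(class_weight="balanced", max_iter=1000)
# Self-training: fit on labeled rows, pseudo-label unlabeled rows predicted
# with probability >= threshold, refit, and repeat.
model = SelfTrainingClassifier(base, threshold=0.9)
model.fit(X, y_partial)

pred = model.predict(X)
recall = (pred[y_true == 1] == 1).mean()
print(f"fraud recall on the toy data: {recall:.2f}")
```

Each round adds only the unlabeled transactions the base classifier is confident about, so the labeled set grows gradually. On billions of rows you would likely train on a sample rather than fit everything in memory like this.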


I agree, anomaly detection seems appropriate.
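A minimal anomaly-detection sketch with scikit-learn's `IsolationForest`, again on synthetic data (the feature values and cluster positions are made up for illustration). The model is fitted without any labels; the known frauds are used only afterwards, to check whether they land among the transactions the model scores as most anomalous.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Synthetic stand-in for transaction features (assumption: 2 numeric features).
legit = rng.normal(0.0, 1.0, size=(5000, 2))
fraud = rng.normal(5.0, 0.5, size=(25, 2))
X = np.vstack([legit, fraud])
is_fraud = np.concatenate([np.zeros(len(legit), dtype=bool), np.ones(len(fraud), dtype=bool)])

# Unsupervised: the fraud labels are NOT used for fitting.
forest = IsolationForest(n_estimators=200, contamination="auto", random_state=0)
forest.fit(X)

# score_samples: lower means more anomalous, so negate to get "higher = more suspicious".
suspicion = -forest.score_samples(X)

# Use the known frauds only to evaluate: how many land in the top 1% most suspicious?
top_k = int(0.01 * len(X))
top_idx = np.argsort(suspicion)[::-1][:top_k]
caught = is_fraud[top_idx].sum()
print(f"{caught} of {is_fraud.sum()} known frauds are in the top {top_k} most anomalous")
```

This turns the few known frauds into an evaluation set rather than training labels, which fits the situation in the question where most transactions are unlabeled.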