Clustering-based Optimization in Fraud Detection Classifier Training

ConstanzeArnold
uploaded October 24, 2022

Fraud detection is an essential problem in the banking industry. Fraud causes financial losses and can do massive harm to the reputation of financial institutions, which makes it a prevalent and influential research area in practice. The goal is to train a binary classifier that labels each transaction as either fraudulent or regular. Fraudulent transactions are rare events, so the resulting data are highly imbalanced, and training classifiers on such imbalanced data sets still poses unsolved problems. Given a data set of transactions, we suggest splitting the classification process into several smaller ones: the training data set is clustered, and a separate sub-classifier is trained on each cluster. We chose XGBoost as the transaction classifier. At test time, a transaction is classified by the sub-classifier whose training-cluster center is closest to that transaction. In our case, the appropriate evaluation criterion is the F1 score, the harmonic mean of precision and recall; unlike accuracy, it is not inflated by correctly labeling the many regular transactions. For the experimental evaluation of the suggested strategy, we use a synthetic credit card transaction data set (https://data.world/ealtman/synthetic-credit-card-transactions) that simulates transactions of credit card users living in the United States. The experiments show a significant increase in the F1 score compared with training a single classifier without clustering.
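The cluster-then-route strategy described above can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions. It uses scikit-learn's `KMeans` for the clustering step, although the text does not name a clustering algorithm. It replaces the credit card data with a synthetic imbalanced data set from `make_classification`. The class name `ClusteredClassifier` and the choice of three clusters are hypothetical. The sketch uses `xgboost.XGBClassifier` when xgboost is installed and otherwise falls back to scikit-learn's `GradientBoostingClassifier`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBClassifier as BaseClassifier  # classifier named in the text
except ImportError:  # assumption: fallback so the sketch runs without xgboost
    from sklearn.ensemble import GradientBoostingClassifier as BaseClassifier


class ClusteredClassifier:
    """Hypothetical helper: cluster the training set, fit one sub-classifier
    per cluster, and route each query point to the sub-classifier whose
    cluster center is nearest."""

    def __init__(self, n_clusters=3, random_state=0):
        self.n_clusters = n_clusters
        self.random_state = random_state

    def fit(self, X, y):
        self.kmeans_ = KMeans(n_clusters=self.n_clusters, n_init=10,
                              random_state=self.random_state).fit(X)
        self.models_ = {}
        for k in range(self.n_clusters):
            mask = self.kmeans_.labels_ == k
            classes = np.unique(y[mask])
            if classes.size < 2:
                # A cluster with a single class (e.g. no fraud at all) cannot
                # train a binary classifier, so it gets a constant prediction.
                self.models_[k] = int(classes[0]) if classes.size else 0
            else:
                self.models_[k] = BaseClassifier().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        # KMeans.predict returns the index of the closest cluster center.
        nearest = self.kmeans_.predict(X)
        y_pred = np.zeros(len(X), dtype=int)
        for k, model in self.models_.items():
            mask = nearest == k
            if not mask.any():
                continue
            if isinstance(model, int):
                y_pred[mask] = model
            else:
                y_pred[mask] = model.predict(X[mask])
        return y_pred


# Synthetic stand-in for transaction data: about 3% positive ("fraud") labels.
X, y = make_classification(n_samples=4000, n_features=8, weights=[0.97, 0.03],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

baseline = BaseClassifier().fit(X_train, y_train)
clustered = ClusteredClassifier(n_clusters=3).fit(X_train, y_train)
print("F1 without clustering:", f1_score(y_test, baseline.predict(X_test)))
print("F1 with clustering:   ", f1_score(y_test, clustered.predict(X_test)))
```

Because k-means is distance-based, real transaction features would normally be standardized first (for example with scikit-learn's `StandardScaler`). The number of clusters would then be tuned on a validation set using the F1 score. The printed scores depend on the synthetic data and are not the results reported above.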