
Enhanced Fraud Detection Using Graph Neural Networks with Intel Optimizations


Posted on behalf of Authors: 

Aasavari Kakne
Chaojun Zhang
Tianyi Liu
Yixiu Chen
Rita Brugarolas Brufau
Fanli Lin
Minmin Hou

Fraud is a serious problem in the financial services sector, causing billions of dollars in losses every year. Traditionally, fraud detection solutions are based on classical machine learning approaches such as logistic regression and gradient boosting. However, these algorithms have limited predictive capability and tend to fall short when confronted with challenges such as complex and evolving fraudster behavior. In this blog, we detail the major challenges in fraud detection and how Intel’s tools and optimizations can help implement enhanced solutions.

Fraud Detection Challenges

  1. Severe class imbalance: Genuine card transactions significantly outnumber fraudulent ones, leading to highly imbalanced datasets (typically less than 1% fraud examples). This often causes machine learning models to become biased towards the majority class (non-fraudulent transactions) and thus struggle to detect fraud accurately.
  2. Evolving fraudulent transactions: Fraudulent behavior is constantly evolving and requires complex modeling, which is difficult to achieve using classical ML techniques. For example, a fraudster may make many small transactions (that appear legitimate) to evade detection.
  3. Scale of data and speed of fraud detection: Credit card transaction datasets contain billions of transactions, requiring distributed preprocessing and training to reduce the time to model deployment. Moreover, detecting fraudulent activity in time demands an efficient inference pipeline.
  4. Limitations of the human-in-the-loop mechanism: Fraud detection systems automatically flag potentially fraudulent transactions and typically require the intervention of a human in the loop who makes the final call. An insufficiently accurate model produces either too many false positives (human reviewers are overwhelmed by the volume of flagged transactions) or too many false negatives (money is lost because fraudulent transactions slip through). A fraud detection model therefore needs both high precision and high recall.
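The class imbalance in challenge 1 is easy to illustrate: on a dataset with 1% fraud, a model that never predicts fraud still looks highly accurate. A small self-contained Python sketch (synthetic numbers, not from the reference kit):

```python
# Illustration: with 1% fraud, a model that predicts "genuine" for every
# transaction reaches 99% accuracy while catching zero fraud.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0

# 1,000 transactions, 10 fraudulent (1% fraud rate)
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000          # "always genuine" baseline

print(accuracy(y_true, y_pred))  # 0.99 -- looks great
print(recall(y_true, y_pred))    # 0.0  -- catches no fraud at all
```

This is why plain accuracy is meaningless here and precision/recall-based metrics are used instead.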

Using Graph Neural Networks for Fraud Detection

In this fraud detection reference use case, we employ self-supervised graph neural networks (GNN) to learn users’ behavioral patterns and achieve 0.94 AUCPR (Area Under Precision Recall Curve) on IBM’s synthetically curated TabFormer dataset with 24 million transactions. Furthermore, we provide a low-code, fully customizable, distributed pipeline to boost developer productivity. Check out our fraud detection Jupyter notebook here!


Figure 1. Highlights of Intel’s GNN-enhanced fraud detection reference use case

High-level Architecture


Figure 2. Fraud detection reference use case architecture

Our workflow comprises three stages: feature engineering, GNN training, and XGBoost training. Each stage is config-driven, containerized, and supports a distributed workload.

Stage 1: Feature Engineering (Edge Featurization)

Our one-line run instruction lets users run preprocessing for both single-node and distributed pipelines in only a few minutes on Intel® Xeon® CPUs. Users can also bring their own raw datasets or their own preprocessing logic; to do so, please follow the instructions in the "Summary and Next Steps" section of the README.
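The actual preprocessing logic lives in the repository; as a rough, hypothetical illustration of what edge featurization means here, per-card aggregates can be derived from raw transactions and attached to each transaction (edge) as extra features. All field names and values below are made up for illustration:

```python
from collections import defaultdict

# Hypothetical raw transactions: (card_id, merchant_id, amount)
transactions = [
    ("card_1", "m_7", 25.00),
    ("card_1", "m_3", 900.00),
    ("card_2", "m_7", 12.50),
]

# Derive simple per-card aggregates, then attach them to each
# transaction (edge) as additional engineered features.
totals, counts = defaultdict(float), defaultdict(int)
for card, _, amount in transactions:
    totals[card] += amount
    counts[card] += 1

edge_features = [
    {
        "card": card,
        "merchant": merchant,
        "amount": amount,
        "card_avg_amount": totals[card] / counts[card],
    }
    for card, merchant, amount in transactions
]
print(edge_features[0]["card_avg_amount"])  # 462.5
```

In the reference use case this kind of logic is expressed through the config-driven pipeline rather than hand-written loops, but the underlying idea is the same.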

Stage 2: GNN Training (Node Featurization)

Credit card transaction datasets can be modeled as bipartite graphs in which cards and merchants are nodes and transactions are edges, a representation that lets us train powerful graph neural networks. Using a graphical representation of the TabFormer dataset, we train a GNN model on a self-supervised link prediction task, i.e., learning to predict whether a given edge is real or fake. Our GNN model combines a learnable embedding layer, a 2-layer GraphSAGE, and a 2-layer MLP (Multi-Layer Perceptron) to learn latent representations for cards and merchants solely from the graph topology, thus capturing complex behavioral patterns of credit card users. Lastly, the learned representations are augmented with the feature-engineered data and passed on to stage 3, i.e., fraud classification using XGBoost.
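The reference implementation lives in the GitHub repository; to make the architecture concrete, here is a minimal NumPy sketch of the two core ideas (one mean-aggregation GraphSAGE step over a bipartite graph, and a dot-product link score for the self-supervised task). Graph sizes, dimensions, and weights are illustrative only, not the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny bipartite graph: 3 cards, 2 merchants; edges are transactions.
edges = [(0, 0), (0, 1), (1, 0), (2, 1)]    # (card, merchant) pairs
card_emb = rng.normal(size=(3, 4))          # learnable card embeddings
merch_emb = rng.normal(size=(2, 4))         # learnable merchant embeddings
W = rng.normal(size=(8, 4))                 # one SAGE layer weight (toy)

def sage_step(card_emb, merch_emb, edges):
    """One GraphSAGE step for card nodes: mean-aggregate merchant
    neighbors, concatenate with the card's own embedding, project."""
    out = np.zeros_like(card_emb)
    for c in range(card_emb.shape[0]):
        nbrs = [m for (cc, m) in edges if cc == c]
        agg = merch_emb[nbrs].mean(axis=0)
        out[c] = np.concatenate([card_emb[c], agg]) @ W
    return np.maximum(out, 0.0)             # ReLU

h_card = sage_step(card_emb, merch_emb, edges)

# Self-supervised link prediction: score a real edge vs. a sampled
# "fake" edge; training pushes real-edge scores above fake ones.
def score(c, m):
    return float(h_card[c] @ merch_emb[m])

real, fake = score(0, 0), score(2, 0)       # (2, 0) is not in `edges`
print(h_card.shape)                          # (3, 4)
```

In the actual workflow the embeddings, SAGE layers, and the MLP scorer are trained end-to-end with negative sampling; this sketch only shows the message-passing and scoring shapes.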

Intel’s low-code, config-driven GNN workflow allows you to construct a graph dataset, train a GNN model, and augment GNN representations with feature-engineered data with ease, and each step can be customized to your own use case. For further details, please review Figure 3 and visit our graph neural networks and analytics GitHub repository.


Figure 3. Intel’s GNN workflow

Stage 3: XGBoost Training (Fraud Classification)

Using GNN-augmented features from stage 2, we train an XGBoost model with a supervised fraud classification task. XGBoost is a popular model for fraud detection because of its ability to handle large data ranges and severe class imbalance observed in credit card datasets such as TabFormer.
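The tuned hyper-parameters ship with the repository; as a hedged sketch of how XGBoost is commonly configured for this kind of imbalance, the positive (fraud) class can be weighted by the negative-to-positive ratio, with AUCPR as the evaluation metric. All counts and parameter values below are illustrative, not the published configuration:

```python
# Illustrative only: weight the rare fraud class by the
# negative/positive ratio, a common XGBoost practice for imbalance.
n_genuine, n_fraud = 990_000, 10_000         # hypothetical counts

params = {
    "objective": "binary:logistic",
    "eval_metric": "aucpr",                  # matches the blog's metric
    "scale_pos_weight": n_genuine / n_fraud, # 99.0
    "max_depth": 8,                          # illustrative values
    "eta": 0.1,
}
print(params["scale_pos_weight"])  # 99.0
```

These `params` would then be passed to `xgboost.train` (or tuned by Optuna, as in stage 3 of the reference workflow).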

Intel’s classical ML workflow integrates automated hyper-parameter optimization using Optuna for both single-node and distributed pipelines. We publish our best results and the corresponding hyper-parameters so that users can replicate them. Through GNN-enhanced features, we observe a boost in AUCPR for all three data splits in both the single-node and distributed pipelines. For more detailed information, please refer to the performance evaluation section below.

Performance Evaluation

Test Methodology

Our baseline is an XGBoost model trained on feature-engineered data, whereas the final model is an XGBoost model trained on GNN-enhanced data; each is evaluated in both single-node and distributed settings. We perform automated hyper-parameter optimization for each model (baseline and final) in both settings.

We assessed performance using two key metrics:

  • AUCPR: We used the Area Under the Precision-Recall Curve (AUCPR) to evaluate fraud detection accuracy because:
    1. It considers both precision and recall, making it robust for highly imbalanced datasets.
    2. It is particularly apt for fraud detection: increasing AUCPR reduces the volume of transactions needlessly flagged for human evaluation. With millions of incoming credit card transactions per minute, flagging too many transactions simply overwhelms the human annotators and makes it hard to catch fraud in time.
  • Training speed: We measured the time to train models on single versus two nodes to compare scalability.
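To make the metric concrete, AUCPR can be approximated as average precision over the ranked predictions. A small pure-Python sketch with made-up scores and labels:

```python
def average_precision(y_true, scores):
    """Average precision: mean of the precision values at each rank
    where a true positive is retrieved (a standard AUCPR estimator)."""
    ranked = [y for _, y in sorted(zip(scores, y_true), reverse=True)]
    tp, precisions = 0, []
    for rank, y in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / sum(y_true)

# Made-up model scores: the frauds (label 1) are mostly ranked high.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]
print(round(average_precision(y_true, scores), 3))  # 0.917
```

A perfect ranking (all frauds above all genuine transactions) yields 1.0, and random scoring on a 1%-fraud dataset yields roughly 0.01, which is why the 0.94 reported below is a strong result.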

Experimental Results

In this article, the term “preprocessing” refers to the feature engineering stage only, and the term "Baseline XGB" refers to the XGBoost model trained with feature engineering and XGBoost training only. The term "GNN Boosted XGB" refers to the XGBoost model trained using all stages, including feature engineering, GNN training, and XGBoost training. We compare the GNN-boosted XGB model with the baseline XGB model to evaluate the merit of adding the GNN training stage. As shown in Table 1 below, our single-node experiments showed a significant 6% performance improvement on the test set, raising AUCPR from 0.88 to 0.94 [1]. We further scaled out the experiments to 2 nodes, improving AUCPR from 0.90 to 0.94, a further 4% performance gain [1].


Table 1. AUCPR for classical ML (Baseline XGB vs. GNN Boosted XGB)

As shown in Figure 4 below, the distributed GNN delivered a 1.41x speedup in end-to-end execution time, the baseline XGBoost training showed a 3.07x speedup, and total distributed processing delivered a 1.27x speedup [1]. Please note that distributed data processing for GNN embeddings (e.g., data loading and data splitting) takes significantly longer than in single-node mode; this extra overhead limits the end-to-end gain of the GNN-boosted model.



Figure 4. Execution time for Single node vs. Distributed mode (2 nodes)

As shown in Figure 5 below, the distributed GNN delivered a 1.56x training-time speedup, the distributed GNN-boosted XGB a 1.58x speedup, and the baseline XGB a 4.75x speedup [1].



Figure 5. Training time for Single node vs. Distributed mode (2 nodes)


Conclusion

In this blog, we demonstrated the benefits of augmenting traditional ML techniques with graph neural networks to generate more expressive features for downstream fraud classification. To help data scientists overcome the knowledge barriers to GNNs and multi-node distributed training, we provide a low-code, config-driven user interface that enables easy customization of every step in our end-to-end fraud detection pipeline. Lastly, users can benefit from automated hyper-parameter optimization for XGBoost models.

We welcome your feedback, questions, and comments on our Fraud Detection reference kit. To receive support from our technical team, please submit a GitHub issue here.

We encourage you to check out Intel’s other AI Tools and Framework optimizations and learn about the unified, open, standards-based oneAPI programming model that forms the foundation of Intel’s AI Software Portfolio.


References

  1. TabFormer from IBM
  2. Enhanced Fraud Detection Using Graph Neural Networks from Intel [GitHub Repository]
  3. Intel Optimization for XGBoost
  4. Distributed Classical ML Workflow from Intel [GitHub Repository]
  5. Graph Neural Networks and Analytics from Intel [GitHub Repository]
  6. Optuna open-source hyperparameter optimization framework

Product and Performance Information

System Configurations

[1] The tests were conducted on a two-node cluster; each node was equipped with an Intel® Xeon® Platinum 8352Y CPU @ 2.20GHz and 512GB memory; the detailed configuration is listed in the tables below. Tested by Intel in July 2023.


Table 1. Hardware Configuration for Experiment


Table 2. Software Configuration for Experiment

Performance varies by use, configuration, and other factors.



About the Author
Susan is a Product Marketing Manager for AIML at Intel. She has her Ph.D. in Human Factors and Ergonomics, having used analytics to quantify and compare mental models of how humans learn complex operations. Throughout her well-rounded career, she has held roles in user-centered design, product management, customer insights, consulting, and operational risk.