Trading, FinTech & Analytics — Case Study

AI-Powered Fraud Detection System

The client was a regional bank processing roughly two million card transactions per day. Their existing fraud detection was a rule-based system maintained by a small fraud-ops team, which had grown to several hundred rules over five years. Rule maintenance had become a full-time job, false positives produced legitimate-customer friction that drove call-center volume, and an emerging pattern of fraud — synthetic identity plus low-value testing transactions — was slipping under the existing rules because each transaction looked normal in isolation.

96% detection accuracy
<1s decision latency
$2M annual fraud savings
-35% false positive rate
Category: Trading, FinTech & Analytics

Industry: Banking/Payments

Timeline: 14 weeks from kickoff to live cutover, with continuous post-launch tuning

Team size: 5 specialists

Project Overview

The full story

The practical problem was that fraud had become contextual rather than rule-detectable. The synthetic identity pattern relied on small transactions across many merchants in sequence, none of which would trip a velocity rule. Behavioral signals (typing rhythm at the merchant terminal, geographic context, account history shape) were not represented in the rule engine, and the bank had no way to evaluate a transaction against the customer’s recent activity within the strict latency budget that real-time payment authorization required.

We built a real-time fraud detection system that ran a gradient-boosted ensemble plus a sequence model over the customer’s recent transaction history, all behind a single decision endpoint with a strict latency budget. The model produced a fraud score and the top contributing factors per decision, which the fraud-ops team used both for investigation and for ongoing model refinement. The rule engine remained in place for known-pattern fraud, with the ML layer running in parallel and the orchestrator combining outputs.

What shipped was a fraud decision endpoint that returned an authorize, decline, or step-up decision in under one second, with the contributing factors logged for downstream investigation. Confirmed fraud losses fell by roughly $2M in the first year against the prior rule-only baseline. The ML layer caught patterns the rules missed, and the false-positive rate fell by 35% because it also let through benign transactions that the rules would have flagged. The fraud-ops team shifted from rule maintenance to investigation and signal development.

The Problem

Rule maintenance had become a full-time job, and emerging contextual fraud was slipping under rules that worked one transaction at a time.

01 Friction point

Several hundred rules accumulated over five years, with maintenance costs growing and unintended-interaction false positives rising.

02 Friction point

Synthetic identity plus low-value testing transactions slipped under velocity rules because each transaction looked normal in isolation.

03 Friction point

False positives drove call-center volume from legitimate customers, creating both cost and customer-experience drag.

04 Friction point

Behavioral and geographic context were absent from the rule engine, so the bank could not detect contextual fraud patterns.

05 Friction point

Real-time authorization had a strict latency budget that limited what additional logic could be added without breaking payment flow.

Our Approach

How we structured the engagement

Added ML in parallel with the rule engine, not as a replacement, so the bank kept its known-pattern coverage while gaining context.

  1. Phase 01 (Weeks 1-3)

    Discovery

    Reviewed six months of confirmed fraud and false-positive cases, segmented by pattern, and worked with fraud-ops on which patterns the rules covered well versus poorly. Output: a feature set for the ML layer covering account history, behavioral, and contextual signals, plus a latency budget per decision step.

  2. Phase 02 (Weeks 4-5)

    Architecture

    Designed a decision endpoint that ran the rule engine and the ML layer in parallel, combined outputs with a configurable policy, and returned an authorize, decline, or step-up decision. Used Kafka to ingest transaction context and AWS Lambda for the rule path, with the ML model on a low-latency inference endpoint.

  3. Phase 03 (Weeks 6-11)

    Build

    Built the gradient-boosted ensemble for transaction-level scoring and a separate sequence model for sequence-level context. Implemented the contributing-factor extraction so every decision logged which features drove the score. Built a feedback ingestion path so confirmed fraud and false positives flowed back into weekly retraining.

  4. Phase 04 (Weeks 12-14)

    Launch

    Ran in shadow mode for four weeks alongside the production rule engine, compared outputs, and tuned the combination policy until the ML layer added value without regressing the rules. Cut over with the combination policy active, monitored hourly during the first two weeks, and tuned thresholds against fraud-ops feedback.
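The shadow-mode comparison described in the Launch phase can be sketched roughly as follows. This is a minimal illustration rather than the tooling used on the project: the `ShadowRecord` schema and the `summarize_shadow` helper are hypothetical names, and the sketch assumes each shadowed transaction carries the decision the rule engine actually served, the decision the combined policy would have served, and a confirmed label once fraud-ops has one.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ShadowRecord:
    """One shadowed transaction (hypothetical schema)."""
    txn_id: str
    rule_decision: str        # decision actually served by the rule engine
    candidate_decision: str   # decision the combined policy would have served
    confirmed_fraud: Optional[bool] = None  # None until fraud-ops confirms


def summarize_shadow(records: Iterable[ShadowRecord]) -> dict:
    """Tally agreement plus the two kinds of disagreement worth reviewing."""
    counts = Counter()
    for r in records:
        counts["total"] += 1
        if r.rule_decision == r.candidate_decision:
            counts["agree"] += 1
            continue
        counts["disagree"] += 1
        if r.candidate_decision != "decline":
            continue
        if r.confirmed_fraud is False:
            # Candidate would have blocked a confirmed legitimate transaction.
            counts["new_false_positive"] += 1
        elif r.confirmed_fraud is True and r.rule_decision == "authorize":
            # Candidate would have blocked fraud that the rules let through.
            counts["new_catch"] += 1
    return dict(counts)
```

Tracking `new_false_positive` separately from `new_catch` is one way to check that a candidate policy adds detection without regressing the rules, which is the bar the text sets for cutover.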

System Architecture

What we built, component by component

  1. 01

    Transaction stream

    Kafka topic that captures every payment authorization request with full context for downstream scoring and analytics.

  2. 02

    Rule engine

    Existing rule-based detector retained for known-pattern fraud, running in parallel with the ML layer at decision time.

  3. 03

    Transaction scorer

    Gradient-boosted ensemble that scores each transaction based on account history, behavioral, and contextual features.

  4. 04

    Sequence model

    Captures fraud patterns that span multiple transactions in sequence, including the synthetic-identity testing pattern.

  5. 05

    Decision combiner

    Combines the rule output, transaction score, and sequence score under a configurable policy and returns the final decision.

  6. 06

    Feedback ingester

    Pulls confirmed fraud and confirmed false positives into the weekly retraining pipeline with structured labels.
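As a rough illustration of what "structured labels" from the feedback ingester could look like, the sketch below turns fraud-ops investigation outcomes into training rows. The `InvestigationOutcome` and `TrainingLabel` types and the `to_training_labels` helper are hypothetical; the source does not describe the actual label schema.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class InvestigationOutcome:
    """Result of a fraud-ops review (hypothetical schema)."""
    txn_id: str
    confirmed_fraud: Optional[bool]  # None while the case is still open


@dataclass(frozen=True)
class TrainingLabel:
    txn_id: str
    label: int    # 1 = confirmed fraud, 0 = confirmed legitimate
    source: str   # "confirmed_fraud" or "confirmed_false_positive"


def to_training_labels(outcomes: Iterable[InvestigationOutcome]) -> List[TrainingLabel]:
    """Keep only resolved cases; open cases are not ground truth yet."""
    labels = []
    for o in outcomes:
        if o.confirmed_fraud is None:
            continue
        source = "confirmed_fraud" if o.confirmed_fraud else "confirmed_false_positive"
        labels.append(TrainingLabel(o.txn_id, int(o.confirmed_fraud), source))
    return labels
```

Keeping the `source` field alongside the binary label would let a retraining job weight or audit false-positive corrections separately from confirmed fraud, if that turned out to be useful.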

Data Flow

A transaction arrives on the Kafka stream, the rule engine and the ML layer score it in parallel within the latency budget, and the decision combiner returns authorize, decline, or step-up. Contributing factors are logged with each decision, fraud-ops investigates flagged cases, and confirmed outcomes flow back through the feedback ingester into the weekly retraining job that updates both the transaction scorer and the sequence model.

[Data flow diagram. Nodes: Transaction stream, Rule engine, Transaction scorer, Sequence model, Decision combiner]
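One way to picture "scored in parallel within the latency budget" is the asyncio sketch below. It is an assumption-laden illustration, not the production code: `run_rule_engine` and `run_ml_layer` are stand-ins with simulated delays, the transaction fields are invented, and returning `None` for a path that misses the budget is an assumed behavior, since the source does not say what happened on timeout.

```python
import asyncio
from typing import Optional, Tuple


async def run_rule_engine(txn: dict) -> str:
    """Stand-in for the rule path; returns 'authorize' or 'decline'."""
    await asyncio.sleep(0.01)
    return "decline" if txn["amount"] > 5000 else "authorize"


async def run_ml_layer(txn: dict) -> float:
    """Stand-in for the ML inference call; returns a fraud score in [0, 1]."""
    await asyncio.sleep(txn.get("ml_delay_s", 0.02))
    return 0.9 if txn.get("suspicious") else 0.1


async def score_in_parallel(
    txn: dict, budget_s: float
) -> Tuple[Optional[str], Optional[float]]:
    """Run both paths concurrently and keep whatever finishes inside the budget."""
    rule_task = asyncio.create_task(run_rule_engine(txn))
    ml_task = asyncio.create_task(run_ml_layer(txn))
    done, pending = await asyncio.wait({rule_task, ml_task}, timeout=budget_s)
    for task in pending:
        task.cancel()  # do not let a slow path outlive the request
    rule_decision = rule_task.result() if rule_task in done else None
    ml_score = ml_task.result() if ml_task in done else None
    return rule_decision, ml_score
```

For example, `asyncio.run(score_in_parallel({"amount": 20, "suspicious": True}, budget_s=0.5))` yields both outputs, while a transaction whose simulated ML call exceeds the budget yields the rule decision and `None` for the score, leaving the combiner to decide on the rules alone.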
Key Decisions

The trade-offs we made and why

Decision 01 (Lead trade-off)

Ran ML in parallel with the rule engine rather than replacing it

Replacing the rules would have introduced risk on known fraud patterns the bank had spent years tuning. Running in parallel preserved the rule coverage and let the ML layer add context-aware detection on top, with the combination policy as the place to tune the trade-off.
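A combination policy of the kind described here might look like the sketch below. Everything in it is an assumption for illustration: the `CombinationPolicy` fields, the threshold values, taking the higher of the two model scores, and softening a rule hit to a step-up when the models see very little risk. The source confirms only that the policy was configurable and returned authorize, decline, or step-up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CombinationPolicy:
    """Tunable thresholds (illustrative values, not the bank's)."""
    decline_threshold: float = 0.90
    step_up_threshold: float = 0.60
    rule_soften_below: float = 0.10


def combine(
    rule_decision: str,
    txn_score: float,
    seq_score: float,
    policy: CombinationPolicy = CombinationPolicy(),
) -> str:
    """Return 'authorize', 'step_up', or 'decline'."""
    ml_score = max(txn_score, seq_score)  # assumed: the riskier model wins
    if rule_decision == "decline":
        # Assumed softening: a rule hit that both models consider benign
        # becomes a step-up challenge instead of a hard decline.
        return "step_up" if ml_score < policy.rule_soften_below else "decline"
    if ml_score >= policy.decline_threshold:
        return "decline"
    if ml_score >= policy.step_up_threshold:
        return "step_up"
    return "authorize"
```

Because the thresholds live in one small object, fraud-ops can adjust the trade-off between friction and coverage without retraining either model or editing rules.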

Decision 02

Split transaction scoring from sequence modeling

Transaction-level features and sequence-level features had different shapes and different model architectures. Splitting them produced cleaner training data and let each model do what it was good at, which combined to catch more patterns than a single model would have.

Decision 03

Logged contributing factors per decision

Black-box scores would have failed regulatory and fraud-ops needs alike. Contributing factors per decision gave fraud investigators a starting point and gave compliance a defensible record. SHAP made this practical on gradient-boosted trees without sacrificing model quality.
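Once per-feature attributions exist for a decision, picking the factors to log is a small step. The sketch below assumes the attribution values have already been computed (for instance by SHAP, as the text mentions) and simply ranks them; the `top_contributing_factors` helper and the feature names in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def top_contributing_factors(
    attributions: Dict[str, float], k: int = 3
) -> List[Tuple[str, float]]:
    """Rank features by absolute contribution, keeping the sign for direction."""
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:k]


# Illustrative attributions for one decision (made-up numbers).
example = {
    "merchant_count_1h": 0.42,
    "amount": -0.05,
    "account_age_days": -0.31,
    "geo_distance_km": 0.18,
}
factors = top_contributing_factors(example, k=2)
```

Keeping the sign matters for investigators: a positive value pushed the score toward fraud, while a negative one pulled it toward legitimate.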

Decision 04

Shadow mode for four weeks before the ML layer influenced any live decision

Real-money authorization is a high-stakes deployment surface. Four weeks of shadow comparison against rule output produced the confidence and the tuning data to cut over safely, and surfaced one model edge case that would have caused a notable false-positive spike on go-live.

Outcomes

What changed for the client

96% detection accuracy

F1 score (the harmonic mean of precision and recall) on a held-out three-month sample of confirmed fraud and confirmed legitimate transactions.
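The harmonic mean of precision and recall is commonly called the F1 score. A minimal sketch of the calculation from confusion counts follows; the function name is illustrative and any counts passed to it in examples are made up, not the project's evaluation data.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall, computed from confusion counts."""
    if true_positives == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

Unlike plain accuracy, this measure ignores true negatives, so it is not inflated by the large majority of legitimate transactions in a fraud dataset.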

<1s decision latency

P95 latency from authorization request received to decision returned, including both rule engine and ML layer paths.
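A P95 figure can be computed from raw latency samples in several ways. The sketch below uses the nearest-rank definition, which is an assumption; the source does not say which percentile method was used, and the helper name is illustrative.

```python
from typing import Sequence


def p95_latency_ms(samples_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of request-to-decision latencies."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    # ceil(0.95 * n) in integer arithmetic, as a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

A P95 target means up to one request in twenty may still be slower, which is why tail behavior beyond the reported percentile is usually tracked as well.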

$2M annual fraud savings

Reduction in confirmed fraud losses across the rollout year versus a counterfactual baseline from the prior rule-only period.

-35% false positive rate

Reduction in legitimate-customer declines, which materially reduced call-center volume from blocked transactions.

Tech Stack

The tools behind the system

Built with a deliberate stack chosen for production reliability and operational velocity.

4 components, production-grade
Python, Scikit-learn, Kafka, AWS Lambda
What we’d carry forward

Lessons learned from the build

01 Lesson

Adding ML alongside the rules was safer and ultimately better than replacing them. The combination policy gave the fraud-ops team a tunable surface that responded to new patterns faster than either system alone would have, and we would default to this architecture on any project that starts from an established rule engine.

02 Lesson

Sequence modeling caught patterns the transaction model alone never would have. Splitting the work was the right call even though it doubled the deployment surface, because each model could be tuned to its own pattern shape and retrained independently as those patterns evolved.

03 Lesson

Shadow mode is the deployment posture for real-money systems. Four weeks felt long in planning and felt short in retrospect. The edge case we caught in week three would have produced enough false positives on day one to undermine fraud-ops confidence in the entire system, and we would not skip shadow mode on any high-stakes deployment.
