AI-Powered Fraud Detection System
The client was a regional bank processing roughly two million card transactions per day. Their existing fraud detection was a rule-based system maintained by a small fraud-ops team, which had grown to several hundred rules over five years. Rule maintenance had become a full-time job, false positives produced legitimate-customer friction that drove call-center volume, and an emerging pattern of fraud — synthetic identity plus low-value testing transactions — was slipping under the existing rules because each transaction looked normal in isolation.
Category: Trading, FinTech & Analytics
Industry: Banking/Payments
Timeline: 14 weeks from kickoff to live cutover, with continuous post-launch tuning
Team: 5 specialists
The full story
The practical problem was that fraud had become contextual rather than rule-detectable. The synthetic identity pattern relied on a sequence of small transactions across many merchants, none of which would trip a velocity rule. Behavioral signals (typing rhythm at the merchant terminal, geographic context, the shape of the account history) were not represented in the rule engine, and the bank had no way to evaluate a transaction against the customer’s recent activity within the strict latency budget that real-time payment authorization required.
We built a real-time fraud detection system that ran a gradient-boosted ensemble over each transaction and a sequence model over the customer’s recent transaction history, all behind a single decision endpoint with a strict latency budget. Each decision produced a fraud score and its top contributing factors, which the fraud-ops team used both for investigation and for ongoing model refinement. The rule engine remained in place for known-pattern fraud, with the ML layer running in parallel and a decision combiner merging the outputs.
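A minimal sketch of the parallel scoring step, assuming an asyncio-based service. `rule_engine_check`, `ml_score`, their toy logic, and the 50 ms ML budget are hypothetical placeholders; falling back to the rule result when the ML path times out is an assumed design choice, not something the case study describes.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the rule engine and the ML inference endpoint.
async def rule_engine_check(txn: dict) -> bool:
    """Return True if a known-pattern rule fires (toy rule on amount)."""
    await asyncio.sleep(0.005)  # simulated rule-path latency
    return txn["amount"] > 5_000


async def ml_score(txn: dict) -> float:
    """Return a fraud probability in [0, 1] (toy model on one flag)."""
    await asyncio.sleep(txn.get("ml_delay", 0.01))  # simulated inference latency
    return 0.9 if txn.get("new_device") else 0.1


@dataclass
class Signals:
    rule_hit: bool
    ml_score: Optional[float]  # None when the ML path missed its budget


async def score_in_parallel(txn: dict, ml_budget_s: float = 0.05) -> Signals:
    """Start both paths at once; only the ML path is bounded by a timeout."""
    rule_task = asyncio.create_task(rule_engine_check(txn))
    ml_task = asyncio.create_task(ml_score(txn))
    try:
        score: Optional[float] = await asyncio.wait_for(ml_task, timeout=ml_budget_s)
    except asyncio.TimeoutError:
        score = None  # wait_for cancels the ML task; fall back to rules only
    return Signals(rule_hit=await rule_task, ml_score=score)


if __name__ == "__main__":
    print(asyncio.run(score_in_parallel({"amount": 20.0, "new_device": True})))
```

Degrading to rule-only on a timeout keeps the authorization inside its latency budget, at the cost of losing context-aware detection for that single request.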
What shipped was a fraud decision endpoint that returned an authorize, decline, or step-up decision in under one second, with the contributing factors logged for downstream investigation. The bank’s overall fraud loss dropped substantially in the first year because the ML layer caught patterns the rules missed. False-positive rates also fell, because the ML layer let through benign transactions that the rules alone would have flagged, and the fraud-ops team shifted from rule maintenance to investigation and signal development.
Rule maintenance had become a full-time job, and emerging contextual fraud was slipping under rules that worked one transaction at a time.
Several hundred rules had accumulated over five years; maintenance costs were growing, and false positives from unintended rule interactions were rising.
Synthetic identity plus low-value testing transactions slipped under velocity rules because each transaction looked normal in isolation.
False positives drove call-center volume from legitimate customers, creating both cost and customer-experience drag.
Behavioral and geographic context were absent from the rule engine, so the bank could not detect contextual fraud patterns.
Real-time authorization had a strict latency budget that limited what additional logic could be added without breaking payment flow.
How we structured the engagement
Added ML in parallel with the rule engine, not as a replacement, so the bank kept its known-pattern coverage while gaining context.
- Phase 01, Weeks 1-3
Discovery
Reviewed six months of confirmed fraud and false-positive cases, segmented by pattern, and worked with fraud-ops on which patterns the rules covered well versus poorly. Output: a feature set for the ML layer covering account history, behavioral, and contextual signals, plus a latency budget per decision step.
- Phase 02, Weeks 4-5
Architecture
Designed a decision endpoint that ran the rule engine and the ML layer in parallel, combined their outputs under a configurable policy, and returned an authorize, decline, or step-up decision. Used Kafka to ingest transaction context, AWS Lambda for the rule path, and a low-latency inference endpoint for the ML models.
- Phase 03, Weeks 6-11
Build
Built the gradient-boosted ensemble for transaction-level scoring and a separate sequence model for sequence-level context. Implemented the contributing-factor extraction so every decision logged which features drove the score. Built a feedback ingestion path so confirmed fraud and false positives flowed back into weekly retraining.
- Phase 04, Weeks 12-14
Launch
Ran in shadow mode for four weeks alongside the production rule engine, compared outputs, and tuned the combination policy until the ML layer added value without regressing the rules. Cut over with the combination policy active, monitored hourly during the first two weeks, and tuned thresholds against fraud-ops feedback.
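The shadow comparison can be sketched as replaying logged production traffic through a candidate policy and counting outcomes against confirmed labels. This is a minimal illustration that assumes the simplest policy ("decline if the rule fires or the ML score crosses a threshold"); `ShadowRecord`, `compare_policy`, `pick_threshold`, and the false-positive budget are hypothetical, not the bank's tuning procedure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class ShadowRecord:
    rule_hit: bool   # the production rule engine flagged this transaction
    ml_score: float  # shadow ML score, logged but never acted on
    is_fraud: bool   # confirmed label, known after investigation


def outcome_counts(records: List[ShadowRecord],
                   decline: Callable[[ShadowRecord], bool]) -> Tuple[int, int]:
    """Return (fraud caught, legitimate transactions declined) for one policy."""
    caught = sum(1 for r in records if r.is_fraud and decline(r))
    false_pos = sum(1 for r in records if not r.is_fraud and decline(r))
    return caught, false_pos


def compare_policy(records: List[ShadowRecord], threshold: float) -> dict:
    """Rule-only outcomes next to rule-or-ML outcomes at a given ML threshold."""
    return {
        "rule_only": outcome_counts(records, lambda r: r.rule_hit),
        "combined": outcome_counts(
            records, lambda r: r.rule_hit or r.ml_score >= threshold),
    }


def pick_threshold(records: List[ShadowRecord], candidates: Iterable[float],
                   max_extra_fp: int = 0) -> Optional[float]:
    """Lowest threshold whose false positives stay within budget of rule-only.

    Lowering the threshold can only add declines, so the lowest acceptable
    threshold is the one that catches the most fraud within the budget.
    """
    _, base_fp = outcome_counts(records, lambda r: r.rule_hit)
    for t in sorted(candidates):
        _, fp = outcome_counts(records, lambda r, t=t: r.rule_hit or r.ml_score >= t)
        if fp <= base_fp + max_extra_fp:
            return t
    return None
```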
What we built, component by component
- 01
Transaction stream
Kafka topic that captures every payment authorization request with full context for downstream scoring and analytics.
- 02
Rule engine
Existing rule-based detector retained for known-pattern fraud, running in parallel with the ML layer at decision time.
- 03
Transaction scorer
Gradient-boosted ensemble that scores each transaction based on account history, behavioral, and contextual features.
- 04
Sequence model
Captures fraud patterns that span multiple transactions in sequence, including the synthetic-identity testing pattern.
- 05
Decision combiner
Combines rule output, transaction score, and sequence score under a configurable policy and returns the final decision.
- 06
Feedback ingester
Pulls confirmed fraud and confirmed false positives into the weekly retraining pipeline with structured labels.
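A minimal sketch of the decision combiner, assuming both model scores are probabilities in [0, 1] and the policy is a weighted blend with two thresholds. The `Policy` fields, weights, and threshold values are hypothetical placeholders. Making the rule-hit action configurable reflects the point that the combination policy, rather than either system alone, is where the trade-off gets tuned.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(str, Enum):
    AUTHORIZE = "authorize"
    STEP_UP = "step_up"
    DECLINE = "decline"


# Severity order used when a rule hit and the ML decision disagree.
_SEVERITY = [Decision.AUTHORIZE, Decision.STEP_UP, Decision.DECLINE]


@dataclass(frozen=True)
class Policy:
    # Hypothetical knobs; the values are placeholders, not the bank's settings.
    txn_weight: float = 0.6
    seq_weight: float = 0.4
    step_up_at: float = 0.5
    decline_at: float = 0.85
    rule_hit_action: Decision = Decision.DECLINE


def combine(rule_hit: bool, txn_score: float, seq_score: float,
            policy: Policy) -> Decision:
    """Blend the two model scores, then let a rule hit escalate the outcome."""
    blended = policy.txn_weight * txn_score + policy.seq_weight * seq_score
    if blended >= policy.decline_at:
        ml_decision = Decision.DECLINE
    elif blended >= policy.step_up_at:
        ml_decision = Decision.STEP_UP
    else:
        ml_decision = Decision.AUTHORIZE
    if rule_hit:
        # Take the more severe of the configured rule action and the ML decision.
        return max(ml_decision, policy.rule_hit_action, key=_SEVERITY.index)
    return ml_decision
```

Setting `rule_hit_action` to `Decision.STEP_UP` is one way a policy could let low-scoring transactions that trip a rule through with extra verification instead of an outright decline.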
A transaction arrives on the Kafka stream, the rule engine and the ML layer score it in parallel within the latency budget, and the decision combiner returns authorize, decline, or step-up. Contributing factors are logged with each decision, fraud-ops investigates flagged cases, and confirmed outcomes flow back through the feedback ingester into the weekly retraining job that updates both the transaction scorer and the sequence model.
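The feedback step can be sketched as a weekly join between the decision log and confirmed outcomes. This assumes decisions are logged with their feature values and that fraud-ops outcomes arrive keyed by transaction ID; `DecisionLog`, `Outcome`, `build_training_batch`, and the `label_kind` values are hypothetical. Unconfirmed decisions are skipped rather than treated as legitimate, since confirmations such as chargebacks can arrive weeks after the transaction.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class DecisionLog:
    txn_id: str
    decided_at: datetime
    features: Dict[str, float]
    decision: str  # "authorize", "step_up", or "decline"


@dataclass
class Outcome:
    txn_id: str
    confirmed_fraud: bool  # set by fraud-ops investigation


def build_training_batch(logs: List[DecisionLog], outcomes: List[Outcome],
                         week_end: datetime, window_days: int = 7) -> List[dict]:
    """Join one week of decisions with confirmed outcomes into labeled rows."""
    labels = {o.txn_id: o.confirmed_fraud for o in outcomes}
    start = week_end - timedelta(days=window_days)
    batch = []
    for log in logs:
        if not (start <= log.decided_at < week_end):
            continue  # outside this retraining window
        label = labels.get(log.txn_id)
        if label is None:
            continue  # no confirmed outcome yet; do not guess a label
        if label:
            kind = "confirmed_fraud"
        elif log.decision != "authorize":
            kind = "false_positive"  # legitimate, but we declined or stepped up
        else:
            kind = "confirmed_legit"
        batch.append({**log.features, "label": int(label), "label_kind": kind})
    return batch
```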
The trade-offs we made and why
Ran ML in parallel with the rule engine rather than replacing it
Replacing the rules would have introduced risk on known fraud patterns the bank had spent years tuning. Running in parallel preserved the rule coverage and let the ML layer add context-aware detection on top, with the combination policy as the place to tune the trade-off.
Split transaction scoring from sequence modeling
Transaction-level features and sequence-level features had different shapes and different model architectures. Splitting them produced cleaner training data and let each model do what it was good at, which combined to catch more patterns than a single model would have.
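To illustrate why the two feature shapes differ, the sketch below contrasts a fixed-width transaction row with an ordered, padded sequence of recent transactions, and shows a per-merchant velocity rule that never fires on the low-value testing pattern. The `Txn` type, the step vector `[amount, minutes since previous, new merchant]`, the padding length, and the rule thresholds are hypothetical choices, not the features the bank used.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Txn:
    ts: datetime
    merchant_id: str
    amount: float


def transaction_features(txn: Txn) -> dict:
    """Fixed-width row describing one transaction (input shape for a tree ensemble)."""
    return {"amount": txn.amount, "hour": txn.ts.hour}


def velocity_rule(history: List[Txn], now: datetime, merchant_id: str,
                  max_hits: int = 3, window: timedelta = timedelta(hours=1)) -> bool:
    """Classic per-merchant velocity rule: fires on too many hits at one merchant."""
    hits = [t for t in history
            if t.merchant_id == merchant_id and now - window <= t.ts <= now]
    return len(hits) > max_hits


def sequence_input(history: List[Txn], max_len: int = 8) -> List[List[float]]:
    """Ordered per-step vectors [amount, minutes since previous, new merchant],
    truncated to the most recent max_len steps and left-padded with zeros."""
    steps, seen, prev = [], set(), None
    for t in sorted(history, key=lambda txn: txn.ts):
        gap = 0.0 if prev is None else (t.ts - prev).total_seconds() / 60
        steps.append([t.amount, gap, float(t.merchant_id not in seen)])
        seen.add(t.merchant_id)
        prev = t.ts
    steps = steps[-max_len:]
    padding = [[0.0, 0.0, 0.0] for _ in range(max_len - len(steps))]
    return padding + steps
```

On six $1 transactions at six different merchants five minutes apart, `velocity_rule` returns `False` for every merchant, while the sequence input shows a run of small, closely spaced, all-new-merchant steps that a sequence model can learn to flag.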
Logged contributing factors per decision
Black-box scores would have failed regulatory and fraud-ops needs alike. Contributing factors per decision gave fraud investigators a starting point and gave compliance a defensible record. SHAP made this practical on gradient-boosted trees without sacrificing model quality.
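Per-decision contributing factors of this kind are Shapley values. The sketch below computes them exactly by enumerating feature subsets against a single baseline, which is exponential in the number of features and workable only on a toy model; TreeSHAP, the algorithm behind the `shap` library's `TreeExplainer`, computes Shapley values for tree ensembles in polynomial time. The toy model, feature names, and all-zero baseline are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List, Tuple


def shapley_values(model: Callable[[Dict[str, float]], float],
                   x: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley values by brute-force subset enumeration (toy sizes only)."""
    names = list(x)
    n = len(names)

    def value(subset: set) -> float:
        # Features in the subset take the instance's values; the rest use the baseline.
        return model({f: (x[f] if f in subset else baseline[f]) for f in names})

    phi = {}
    for i in names:
        others = [f for f in names if f != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                subset = set(s)
                total += weight * (value(subset | {i}) - value(subset))
        phi[i] = total
    return phi


def top_factors(phi: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Features that pushed the score up the most, as logged with each decision."""
    return sorted(phi.items(), key=lambda kv: kv[1], reverse=True)[:k]


def toy_model(f: Dict[str, float]) -> float:
    # Hypothetical scorer with one interaction term between two flags.
    return (0.1 * f["amount_z"] + 0.6 * f["new_device"]
            + 0.3 * f["new_device"] * f["new_merchant"])
```

Because Shapley values satisfy the efficiency property, the attributions sum exactly to the difference between the transaction's score and the baseline score, so the logged factors account for the whole score. Interaction effects are split between the interacting features.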
Shadow mode for four weeks before serving any production decisions
Real-money authorization is a high-stakes deployment surface. Four weeks of shadow comparison against rule output produced the confidence and the tuning data to cut over safely, and surfaced one model edge case that would have caused a notable false-positive spike on go-live.
What changed for the client
Detection accuracy
F1 score (harmonic mean of precision and recall) on a held-out three-month sample of confirmed fraud and confirmed legitimate transactions.
Decision latency
P95 latency from receipt of the authorization request to return of the decision, covering both the rule engine and ML layer paths.
Annual fraud savings
Reduction in confirmed fraud losses over the rollout year versus a counterfactual baseline from the prior rule-only period.
False positive rate
Reduction in legitimate-customer declines, which materially reduced call-center volume from blocked transactions.
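The two metric definitions above can be made concrete. This sketch computes F1 from confusion-matrix counts and a nearest-rank P95 from a list of per-decision latencies. The case study does not say which percentile method was used, so nearest-rank is an assumption; interpolating definitions give slightly different values.

```python
from typing import Sequence


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined) here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def p95(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 95th percentile: smallest sample with at least 95% of samples at or below it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    return ordered[rank - 1]
```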
The tools behind the system
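The stack named in this case study: Kafka for transaction ingestion, AWS Lambda for the rule path, a low-latency inference endpoint for the ML models, and SHAP for per-decision contributing factors.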
Lessons learned from the build
Adding ML alongside the rules was safer and ultimately better than replacing them. The combination policy gave the fraud-ops team a tunable surface that responded to new patterns faster than either system alone would have, and we would default to this architecture on any project that modernizes a mature rule engine.
Sequence modeling caught patterns the transaction model alone never would have. Splitting the work was the right call even though it doubled the deployment surface, because each model could be tuned to its own pattern shape and retrained independently as those patterns evolved.
Shadow mode is the deployment posture for real-money systems. Four weeks felt long in planning and felt short in retrospect. The edge case we caught in week three would have produced enough false positives on day one to undermine fraud-ops confidence in the entire system, and we would not skip shadow mode on any high-stakes deployment.