Trading, FinTech & Analytics — Case Study

Real-Time Risk Scoring for Payments

The client was a payment processor serving roughly four thousand mid-market merchants, processing several million transactions per day. Chargebacks had become the second-largest cost line after interchange, driven by a mix of friendly fraud and merchant-specific patterns that the processor’s existing rule engine could not capture. The processor’s merchant customers were increasingly demanding pre-authorization risk signals to make their own decline decisions, which the existing infrastructure could not deliver in the milliseconds available during card-not-present authorization.

98% high-risk accuracy
-35% chargeback losses
<100 ms scoring latency
+1 merchant decision lever
Category

Trading, FinTech & Analytics

Industry

Banking/Payments

Timeline

14 weeks from kickoff to live cutover, followed by weekly post-launch retraining

Team size

5 specialists

Project Overview

The full story

The practical problem was that risk scoring had to happen inside the authorization latency budget while still incorporating signals from the merchant’s recent transaction history, the cardholder’s prior activity, and network-level patterns. Existing tooling either ran fast and used only transaction-level features, or used richer features and missed the latency budget. The processor also needed to expose scores to merchants without exposing the underlying signals, which existing risk products did not handle well.

We built a real-time risk scoring service that pre-computed cardholder and merchant feature vectors continuously, kept them in a low-latency feature store, and ran a gradient-boosted scorer inside a strict millisecond budget at authorization time. The service produced both a numeric risk score and a categorical reason that merchants could use in their decline logic without seeing the underlying features. Visa and Mastercard network-level signals were ingested via their data APIs and integrated into the cardholder feature vector.

What shipped was a risk scoring endpoint that processed each authorization in well under the latency budget, returned a score plus reason code to the processor, and exposed a filtered version to merchants for their own decision logic. Chargeback losses dropped substantially in the first six months, merchants gained a decision lever they previously did not have, and the processor turned the scoring service into a paid premium feature for higher-tier merchant accounts.

The Problem

Chargebacks were the second-largest cost line, and the existing rule engine could not capture merchant-specific patterns at authorization speed.

01 Friction point

Existing rule-based scoring missed merchant-specific patterns that drove a large share of chargebacks across the merchant book.

02 Friction point

The authorization-time latency budget ruled out richer features unless the scoring infrastructure was rebuilt around a feature store.

03 Friction point

Merchants demanded pre-authorization risk signals to make their own decline decisions but had no way to consume them at point of decision.

04 Friction point

Network-level signals from Visa and Mastercard data APIs were not integrated, leaving the processor blind to cross-merchant cardholder patterns.

05 Friction point

Black-box scores would not have been usable by merchants without reason codes that they could safely build into their own logic.

Our Approach

How we structured the engagement

We moved feature computation off the authorization path into a continuously updated feature store and kept only fast scoring at decision time.

  1. Phase 01, Weeks 1-3

    Discovery

    Reviewed twelve months of chargeback data segmented by merchant pattern, audited the existing rule engine for coverage gaps, and worked with five merchant customers on what risk signals would actually drive their decline decisions. Output: a feature set, a latency budget of well under one hundred milliseconds, and a reason-code taxonomy for merchant exposure.

  2. Phase 02, Weeks 4-5

    Architecture

    Designed a feature store that maintained cardholder and merchant vectors continuously via Kafka stream processing, with reads at single-digit millisecond latency. Picked XGBoost for the scorer because it hit the latency budget and produced reason codes via SHAP. Integrated Visa and Mastercard data APIs into the cardholder vector refresh path.

  3. Phase 03, Weeks 6-12

    Build

    Shipped the feature store and the streaming computation first because the scorer depended on it. Built the scorer next with stress testing against the latency budget. Implemented the merchant exposure path with reason codes only, no raw features. Wired the integration into the processor’s existing authorization flow with a fallback path on scorer failure.

  4. Phase 04, Weeks 13-14

    Launch

    Ran in shadow mode for four weeks across the full transaction volume, compared scores to actual chargeback outcomes, and tuned thresholds against merchant feedback. Cut over with the processor using the score and merchants opting in for the exposure path. Monitored latency and accuracy hourly during the first two weeks of live operation.
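The fallback path mentioned in the Build phase could look roughly like the sketch below. It is a minimal illustration, not the processor's integration: `score_with_fallback`, the thread pool, and the 50 ms timeout are all hypothetical, and the source does not say what the processor does once the fallback fires (for example, reverting to the existing rule engine), so the sketch only returns a marker.

```python
import concurrent.futures as cf
from typing import Any, Callable, Dict

# Hypothetical value; the real budget is only described as "well under 100 ms".
SCORER_TIMEOUT_S = 0.05

# Shared worker pool so each authorization does not pay thread start-up cost.
_pool = cf.ThreadPoolExecutor(max_workers=8)


def score_with_fallback(
    scorer: Callable[[Dict[str, Any]], float],
    txn: Dict[str, Any],
    timeout_s: float = SCORER_TIMEOUT_S,
) -> Dict[str, Any]:
    """Call the scorer, but never let it block the authorization flow.

    Returns the model score when it arrives in time, otherwise a marker
    telling the caller to take its fallback path.
    """
    future = _pool.submit(scorer, txn)
    try:
        return {"score": future.result(timeout=timeout_s), "source": "model"}
    except cf.TimeoutError:
        future.cancel()  # best effort; a call that is already running continues
        return {"score": None, "source": "fallback_timeout"}
    except Exception:
        # Any scorer error is treated as unavailability (fast fail).
        return {"score": None, "source": "fallback_error"}
```

The caller checks `source` before using `score`, so a slow or failing scorer degrades the decision rather than delaying it.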

System Architecture

What we built, component by component

  1. 01

    Feature store

    Continuously-maintained cardholder and merchant vectors with single-digit millisecond reads at authorization time.

  2. 02

    Streaming feature compute

    Kafka stream processors that update feature vectors on every transaction and on network-level signals from Visa and Mastercard.

  3. 03

    Risk scorer

    XGBoost model that produces a numeric risk score plus SHAP-derived reason codes inside the latency budget.

  4. 04

    Authorization integration

    Inline call from the processor’s authorization flow to the scorer, with a fast-fail fallback path on scorer unavailability.

  5. 05

    Merchant exposure path

    Filtered score plus reason code exposed to opted-in merchants for their own decline logic, without raw feature exposure.

  6. 06

    Outcome ingester

    Pulls chargeback and refund outcomes back into the weekly retraining pipeline with structured labels and merchant context.
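To make the feature store and streaming compute components concrete, here is a minimal in-memory sketch of a per-cardholder vector that is updated on every transaction. The feature names, the one-hour window, and the `on_transaction` handler are assumptions for illustration; in the system described above the handler would be driven by a Kafka stream processor and the vectors would live in a low-latency store, neither of which is shown.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Tuple

WINDOW_S = 3600  # assumed one-hour sliding window


@dataclass
class CardholderFeatures:
    """Sliding-window aggregates for one cardholder."""
    events: Deque[Tuple[float, float]] = field(default_factory=deque)  # (ts, amount)
    total: float = 0.0

    def update(self, ts: float, amount: float) -> None:
        self.events.append((ts, amount))
        self.total += amount
        # Evict events that have fallen out of the window.
        while self.events and self.events[0][0] <= ts - WINDOW_S:
            _, old_amount = self.events.popleft()
            self.total -= old_amount

    def vector(self) -> Dict[str, float]:
        n = len(self.events)
        return {
            "txn_count_1h": float(n),
            "amount_sum_1h": round(self.total, 2),
            "amount_avg_1h": round(self.total / n, 2) if n else 0.0,
        }


# Stand-in for the feature store: card_id -> continuously maintained vector.
store: Dict[str, CardholderFeatures] = defaultdict(CardholderFeatures)


def on_transaction(event: Dict) -> None:
    """Handler a stream processor would call for each transaction event."""
    store[event["card_id"]].update(event["ts"], event["amount"])
```

Because the aggregates are maintained incrementally, reading a vector at authorization time is a lookup rather than a recomputation over history.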

Data Flow

Transactions and network-level signals flow through Kafka into the feature store. At authorization time the processor calls the scorer, which reads feature vectors and returns a risk score plus reason code inside the latency budget. The score gates the processor’s own decision and is exposed in filtered form to opted-in merchants, while outcomes feed back into weekly retraining via the outcome ingester.
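The feedback leg of this flow, where outcomes become training labels, can be sketched as a join between scored transactions and reported outcomes. Everything named here is hypothetical (`build_training_rows`, the record fields, the label values), and the 90-day maturity window is an assumption: disputes can arrive well after authorization, so a recent transaction with no outcome is not yet safe to label as clean.

```python
from datetime import date, timedelta
from typing import Dict, List

MATURITY_DAYS = 90  # assumed; not stated in the source


def build_training_rows(
    scored: Dict[str, Dict],
    outcomes: Dict[str, str],
    today: date,
) -> List[Dict]:
    """Join scored transactions with outcomes into labeled training rows.

    scored:   txn_id -> {"merchant_id", "score", "auth_date"}
    outcomes: txn_id -> "chargeback" or "refund" (absent means none reported)
    """
    cutoff = today - timedelta(days=MATURITY_DAYS)
    rows = []
    for txn_id, rec in scored.items():
        outcome = outcomes.get(txn_id)
        if outcome is None and rec["auth_date"] > cutoff:
            # Too recent: absence of a dispute is not yet evidence of a clean sale.
            continue
        rows.append({
            "txn_id": txn_id,
            "merchant_id": rec["merchant_id"],  # merchant context for the label
            "score": rec["score"],
            "label": outcome or "clean",
        })
    return rows
```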

Key Decisions

The trade-offs we made and why

Decision 01: Lead trade-off

Moved feature computation off the authorization path

Computing features at authorization time made the latency budget impossible to meet. Pre-computing them continuously into a feature store reduced the authorization-time work to a fast lookup plus a fast inference, which is what made richer features compatible with the latency requirement.
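The "fast lookup plus fast inference" shape can be illustrated as follows. This is a sketch under assumptions: the two stores are plain dictionaries standing in for the feature store, `model` is any callable standing in for the trained scorer, and the `card_` and `merch_` prefixes are invented for the example.

```python
import time
from typing import Callable, Dict, Mapping

FeatureVector = Mapping[str, float]


def score_authorization(
    txn: Mapping,
    cardholder_store: Mapping[str, FeatureVector],
    merchant_store: Mapping[str, FeatureVector],
    model: Callable[[Dict[str, float]], float],
) -> Dict[str, float]:
    """Authorization-time path: two key lookups, one inference call."""
    start = time.perf_counter()

    # Lookups only; nothing is aggregated over history here.
    card_vec = cardholder_store.get(txn["card_id"], {})
    merch_vec = merchant_store.get(txn["merchant_id"], {})

    features = {"amount": float(txn["amount"])}
    features.update({f"card_{k}": v for k, v in card_vec.items()})
    features.update({f"merch_{k}": v for k, v in merch_vec.items()})

    score = model(features)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return {"score": score, "elapsed_ms": elapsed_ms}
```

An unseen cardholder or merchant yields an empty vector, so the model has to tolerate missing features; how the production scorer handles that case is not described in the source.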

Decision 02

Used XGBoost over a neural model

Inference latency at the percentile we needed was a hard requirement. Gradient-boosted trees ran inference fast enough and produced reason codes via SHAP. A neural model at the same accuracy would not have matched them on either dimension at this transaction volume.
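One way SHAP output can become a reason code is to map the feature with the largest risk-increasing contribution onto a small taxonomy. The sketch below assumes the per-feature contributions are already available as a dictionary (XGBoost can produce them with `Booster.predict(..., pred_contribs=True)`); the feature names, the `REASON_CODES` taxonomy, and the top-contributor rule are all hypothetical, not the taxonomy built in Discovery.

```python
from typing import Mapping

# Hypothetical taxonomy: many internal features collapse to a few codes.
REASON_CODES = {
    "card_txn_count_1h": "VELOCITY",
    "card_amount_sum_1h": "VELOCITY",
    "merch_chargeback_rate_30d": "MERCHANT_PATTERN",
    "network_risk_flag": "NETWORK_SIGNAL",
}


def reason_code(contributions: Mapping[str, float]) -> str:
    """Pick the code for the feature that pushed the score up the most.

    contributions: feature name -> SHAP value (positive raises the risk score).
    """
    positive = {name: value for name, value in contributions.items() if value > 0}
    if not positive:
        return "NO_ELEVATED_RISK"
    top_feature = max(positive, key=positive.get)
    return REASON_CODES.get(top_feature, "OTHER")
```

Because several features share one code, the code tells a merchant what kind of risk was seen without identifying which signal triggered it.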

Decision 03

Exposed only reason codes, never raw features, to merchants

Raw feature exposure would have leaked the scoring logic and let merchants game the system. Reason codes carried the actionable information without exposing the underlying signal, which is what made the merchant-facing path commercially safe and operationally useful.
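A simple way to enforce this boundary is an allow-list projection of the internal scoring record. The sketch below is illustrative only: the field names and `to_merchant_view` are hypothetical, and the source does not specify what the "filtered" score looks like beyond excluding raw features.

```python
from typing import Any, Dict, Mapping, Optional

# Only these fields may ever cross the merchant boundary (assumed field names).
MERCHANT_FIELDS = ("score", "reason_code")


def to_merchant_view(
    internal: Mapping[str, Any], opted_in: bool
) -> Optional[Dict[str, Any]]:
    """Project the internal scoring record down to what merchants may see.

    Returns None for merchants that have not opted in.
    """
    if not opted_in:
        return None
    return {name: internal[name] for name in MERCHANT_FIELDS}
```

An allow-list is used rather than a deny-list so that a field added to the internal record later stays hidden from merchants unless someone explicitly exposes it.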

Decision 04

Shadow-mode against full volume before cutover

Payments authorization is unforgiving on errors, and a bad score on day one would have cost real money. Four weeks of shadow comparison validated both latency and accuracy at full production volume and gave the processor and merchants the confidence to commit to the score in production.
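The shadow comparison amounts to replaying logged scores against the outcomes that later materialized and checking how each candidate threshold would have performed. A minimal sketch, assuming shadow scores and chargeback outcomes have already been joined into `(score, was_chargeback)` pairs; the function names and thresholds are hypothetical.

```python
from typing import Dict, Iterable, Sequence, Tuple

Scored = Sequence[Tuple[float, bool]]  # (shadow score, later charged back?)


def precision_recall(threshold: float, scored: Scored) -> Tuple[float, float]:
    """Precision and recall if everything scoring >= threshold were flagged."""
    tp = fp = fn = 0
    for score, charged_back in scored:
        flagged = score >= threshold
        if flagged and charged_back:
            tp += 1
        elif flagged and not charged_back:
            fp += 1
        elif charged_back:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def sweep(scored: Scored, thresholds: Iterable[float]) -> Dict[float, Tuple[float, float]]:
    """Evaluate several candidate thresholds against the same shadow log."""
    return {t: precision_recall(t, scored) for t in thresholds}
```

Running the sweep on shadow traffic lets thresholds be tuned before any score influences a live decision.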

Outcomes

What changed for the client

98% high-risk accuracy

F1 score (the harmonic mean of precision and recall) for high-risk transactions on a held-out chargeback sample over the first three months post-launch.
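For reference, the harmonic mean of precision and recall is the standard F1 score. With TP, FP, and FN denoting true positives, false positives, and false negatives among high-risk calls, the definition is as follows; the source reports only the combined figure, not precision and recall separately.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
```

Because the harmonic mean is pulled toward the smaller of the two values, a high F1 requires both precision and recall to be high.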

-35% chargeback losses

Reduction in chargeback dollars across the processor book over the first six months versus the prior-year baseline.

<100 ms scoring latency

P95 latency from authorization request to score returned to the processor, including feature store read and inference.
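A P95 figure like this can be computed from logged per-request latencies with the nearest-rank method, sketched below. The function name and sample values are made up for illustration, and the source does not say which percentile method the monitoring used.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up end-to-end latencies in milliseconds (feature read plus inference).
latencies_ms = [12, 15, 11, 90, 14, 13, 16, 18, 17, 19]
p95 = percentile_nearest_rank(latencies_ms, 95)
within_budget = p95 < 100
```

Tracking a high percentile rather than the mean matters here because a single slow outlier, like the 90 ms sample above, barely moves the average but is exactly what threatens the authorization budget.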

+1 merchant decision lever

New decision input available to opted-in merchants at authorization time, previously unavailable in any pre-authorization tool.

In their words
The reason codes are what made this usable for our merchants. They could build the score into their own decline logic without us giving away the model.
Chief Risk Officer, mid-market payment processor
Tech Stack

The tools behind the system

Built with a deliberate stack chosen for production reliability and operational velocity.

4 components, production-grade
Python, XGBoost, Kafka, AWS Lambda
What we’d carry forward

Lessons learned from the build

01 Lesson

Moving feature computation off the authorization path was the architectural decision that made everything else possible. Without a feature store, we would have been forced to use a thinner feature set and would have left meaningful accuracy on the table. Feature stores belong in any sub-second decisioning system.

02 Lesson

Reason codes were a product feature, not just a model feature. Once merchants started building reason codes into their own logic, the score became sticky in a way a raw number would not have been. We would design the reason-code taxonomy alongside the model from day one next time.

03 Lesson

Shadow mode against full volume caught two latency hotspots and one accuracy edge case that synthetic load testing would have missed. Production traffic has shape that synthetic testing cannot reproduce, and we would commit to full-volume shadow on any future high-stakes decisioning deployment.
