Real-Time Risk Scoring for Payments
The client was a payment processor serving roughly four thousand mid-market merchants, processing several million transactions per day. Chargebacks had become the second-largest cost line after interchange, driven by a mix of friendly fraud and merchant-specific patterns that the processor’s existing rule engine could not capture. The processor’s merchant customers were increasingly demanding pre-authorization risk signals to make their own decline decisions, which the existing infrastructure could not deliver in the milliseconds available during card-not-present authorization.
Trading, FinTech & Analytics
Banking/Payments
14 weeks from kickoff to live cutover with continuous post-launch retraining
5 specialists
The full story
The practical problem was that risk scoring had to happen inside the authorization latency budget while still incorporating signals from the merchant’s recent transaction history, the cardholder’s prior activity, and network-level patterns. Existing tooling either ran fast and used only transaction-level features, or used richer features and missed the latency budget. The processor also needed to expose scores to merchants without exposing the underlying signals, which existing risk products did not handle well.
We built a real-time risk scoring service that pre-computed cardholder and merchant feature vectors continuously, kept them in a low-latency feature store, and ran a gradient-boosted scorer inside a strict millisecond budget at authorization time. The service produced both a numeric risk score and a categorical reason that merchants could use in their decline logic without seeing the underlying features. Visa and Mastercard network-level signals were ingested via their data APIs and integrated into the cardholder feature vector.
What shipped was a risk scoring endpoint that processed each authorization in well under the latency budget, returned a score plus reason code to the processor, and exposed a filtered version to merchants for their own decision logic. Chargeback losses dropped substantially in the first six months, merchants gained a decision lever they previously did not have, and the processor turned the scoring service into a paid premium feature for higher-tier merchant accounts.
Chargebacks were the second-largest cost line, and the existing rule engine could not capture merchant-specific patterns at authorization speed.
Existing rule-based scoring missed merchant-specific patterns that drove a large share of chargebacks across the merchant book.
Latency budget at authorization time made richer feature use impossible without rebuilding the scoring infrastructure around a feature store.
Merchants demanded pre-authorization risk signals to make their own decline decisions but had no way to consume them at point of decision.
Network-level signals from Visa and Mastercard data APIs were not integrated, leaving the processor blind to cross-merchant cardholder patterns.
Black-box scores would not have been usable by merchants without reason codes that they could safely build into their own logic.
How we structured the engagement
Moved feature computation off the authorization path into a continuously updated feature store, and kept only fast scoring at decision time.
- Phase 01 (Weeks 1-3)
Discovery
Reviewed twelve months of chargeback data segmented by merchant pattern, audited the existing rule engine for coverage gaps, and worked with five merchant customers on what risk signals would actually drive their decline decisions. Output: a feature set, a latency budget of well under one hundred milliseconds, and a reason-code taxonomy for merchant exposure.
- Phase 02 (Weeks 4-5)
Architecture
Designed a feature store that maintained cardholder and merchant vectors continuously via Kafka stream processing, with reads at single-digit millisecond latency. Picked XGBoost for the scorer because it hit the latency budget and produced reason codes via SHAP. Integrated Visa and Mastercard data APIs into the cardholder vector refresh path.
- Phase 03 (Weeks 6-12)
Build
Shipped the feature store and the streaming computation first because the scorer depended on them. Built the scorer next and stress-tested it against the latency budget. Implemented the merchant exposure path with reason codes only, no raw features. Wired the integration into the processor’s existing authorization flow with a fallback path on scorer failure.
- Phase 04 (Weeks 13-14)
Launch
Ran in shadow mode for four weeks across the full transaction volume, compared scores to actual chargeback outcomes, and tuned thresholds against merchant feedback. Cut over with the processor using the score and merchants opting in for the exposure path. Monitored latency and accuracy hourly during the first two weeks of live operation.
What we built, component by component
- 01
Feature store
Continuously maintained cardholder and merchant vectors with single-digit millisecond reads at authorization time.
- 02
Streaming feature compute
Kafka stream processors that update feature vectors on every transaction and on network-level signals from Visa and Mastercard.
- 03
Risk scorer
XGBoost model that produces a numeric risk score plus SHAP-derived reason codes inside the latency budget.
- 04
Authorization integration
Inline call from the processor’s authorization flow to the scorer, with a fast-fail fallback path on scorer unavailability.
- 05
Merchant exposure path
Filtered score plus reason code exposed to opted-in merchants for their own decline logic, without raw feature exposure.
- 06
Outcome ingester
Pulls chargeback and refund outcomes back into the weekly retraining pipeline with structured labels and merchant context.
Transactions and network-level signals flow through Kafka into the feature store. At authorization time the processor calls the scorer, which reads feature vectors and returns a risk score plus reason code inside the latency budget. The score gates the processor’s own decision and is exposed in filtered form to opted-in merchants, while outcomes feed back into weekly retraining via the outcome ingester.
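The fast-fail fallback on the authorization path can be sketched roughly as follows. This is a minimal illustration, not the processor's actual integration: `score_with_fallback`, `SCORER_TIMEOUT_S`, the `SCORER_UNAVAILABLE` code, and the 50 ms value are all assumptions, since the case study states neither the exact budget nor the fallback policy.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Assumed values for illustration; the case study does not state the real
# budget or what the processor does when the scorer is unavailable.
SCORER_TIMEOUT_S = 0.05
FALLBACK = {"score": None, "reason_code": "SCORER_UNAVAILABLE"}

_pool = ThreadPoolExecutor(max_workers=4)


def score_with_fallback(scorer, txn, timeout_s=SCORER_TIMEOUT_S):
    """Return the scorer's result, or a fallback if it is slow or fails."""
    future = _pool.submit(scorer, txn)
    try:
        result = future.result(timeout=timeout_s)
    except FutureTimeout:
        # The worker thread keeps running; we only stop waiting for it.
        return {**FALLBACK, "fallback": True}
    except Exception:
        # Any scorer error is treated the same way as a timeout.
        return {**FALLBACK, "fallback": True}
    return {**result, "fallback": False}


def slow_scorer(txn):
    """Hypothetical scorer that overruns the budget."""
    time.sleep(0.2)
    return {"score": 0.9, "reason_code": "VELOCITY"}


print(score_with_fallback(slow_scorer, {"amount": 120.0}))
```

The point of the wrapper is that authorization never waits longer than the budget: a slow or failing scorer degrades to a marked fallback response that the caller can route to its existing rules.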
The trade-offs we made and why
Moved feature computation off the authorization path
Computing features at authorization time made the latency budget impossible. Pre-computing continuously into a feature store turned the authorization-time work into a fast lookup plus a fast inference, which is what made richer features compatible with the latency requirement.
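As a rough sketch of that split, the class below keeps aggregation on the stream side and leaves only a lookup on the authorization side. It is an in-memory stand-in under stated assumptions: the `FeatureStore` class, the one-hour window, and the feature names are hypothetical, and a plain method call takes the place of the Kafka stream processor.

```python
from collections import defaultdict, deque

WINDOW_S = 3600  # assumed one-hour rolling window, for illustration only

EMPTY_VECTOR = {"txn_count_1h": 0, "amount_sum_1h": 0.0, "amount_avg_1h": 0.0}


class FeatureStore:
    """In-memory stand-in for a low-latency feature store."""

    def __init__(self):
        self._events = defaultdict(deque)  # card_id -> deque of (ts, amount)
        self._vectors = {}                 # card_id -> precomputed features

    def on_transaction(self, card_id, ts, amount):
        """Stream side: runs per event, off the authorization path."""
        events = self._events[card_id]
        events.append((ts, amount))
        # Evict events that have fallen out of the rolling window.
        while events[0][0] <= ts - WINDOW_S:
            events.popleft()
        total = sum(a for _, a in events)
        self._vectors[card_id] = {
            "txn_count_1h": len(events),
            "amount_sum_1h": total,
            "amount_avg_1h": total / len(events),
        }

    def get(self, card_id):
        """Authorization side: one dictionary lookup, no aggregation."""
        return self._vectors.get(card_id, dict(EMPTY_VECTOR))


store = FeatureStore()
store.on_transaction("card-1", ts=0, amount=10.0)
store.on_transaction("card-1", ts=1800, amount=30.0)
print(store.get("card-1"))
```

In this sketch a vector is refreshed only when a new event arrives for that card, so a production store would also need some form of time-based expiry.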
Used XGBoost over a neural model
Inference latency at the percentile we needed was a hard requirement. Gradient-boosted trees ran inference fast enough and produced reason codes via SHAP. A neural model at the same accuracy would not have matched them on either dimension at this transaction volume.
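One way to turn per-feature contributions into a reason code is sketched below. It assumes the contributions have already been computed (for example as SHAP values for the scored transaction); the feature names, the `REASON_TAXONOMY` mapping, and the "largest positive contribution wins" rule are illustrative assumptions, not the taxonomy the team shipped.

```python
# Hypothetical mapping from model features to merchant-safe reason codes.
REASON_TAXONOMY = {
    "txn_count_1h": "VELOCITY",
    "amount_vs_card_avg": "UNUSUAL_AMOUNT",
    "merchant_chargeback_rate": "MERCHANT_PATTERN",
    "network_risk_flag": "NETWORK_SIGNAL",
}


def reason_code(contributions, min_contribution=0.0):
    """Map the feature pushing risk up the most to a categorical code.

    `contributions` is a dict of feature name -> signed contribution to the
    risk score, where positive values increase risk.
    """
    risky = {f: c for f, c in contributions.items() if c > min_contribution}
    if not risky:
        return "NO_ELEVATED_RISK"
    top_feature = max(risky, key=risky.get)
    # Features outside the taxonomy collapse into a generic code.
    return REASON_TAXONOMY.get(top_feature, "OTHER")


example = {
    "txn_count_1h": 0.8,
    "amount_vs_card_avg": 0.3,
    "merchant_chargeback_rate": -0.2,
}
print(reason_code(example))
```

Keeping the taxonomy as an explicit mapping means codes can be reviewed and versioned separately from the model, which matters once merchants build logic on them.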
Exposed only reason codes, never raw features, to merchants
Raw feature exposure would have leaked the scoring logic and let merchants game the system. Reason codes carried the actionable information without exposing the underlying signal, which is what made the merchant-facing path commercially safe and operationally useful.
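A minimal sketch of the merchant-facing filter is an allow-list projection: the response is built from named fields rather than by deleting sensitive ones, so new internal fields never leak by default. The `InternalScore` type, its field names, and the rounding of the score are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InternalScore:
    """Hypothetical full scorer output, visible to the processor only."""
    score: float
    reason_code: str
    features: dict       # raw signals, never exposed to merchants
    contributions: dict  # per-feature contributions, never exposed


def merchant_view(internal):
    """Build the merchant response from an explicit allow-list of fields."""
    return {
        # Coarsening the score is an assumption; it limits how precisely
        # a merchant could probe the model with repeated requests.
        "score": round(internal.score, 2),
        "reason_code": internal.reason_code,
    }


internal = InternalScore(
    score=0.8731,
    reason_code="VELOCITY",
    features={"txn_count_1h": 14},
    contributions={"txn_count_1h": 0.8},
)
print(merchant_view(internal))
```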
Ran shadow mode against full volume before cutover
Payments authorization is unforgiving on errors, and a bad score on day one would have cost real money. Four weeks of shadow comparison validated both latency and accuracy at full production volume and gave the processor and merchants the confidence to commit to the score in production.
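The shadow comparison can be illustrated with a small evaluation helper: scores are logged without affecting decisions, then joined with chargeback outcomes and evaluated at candidate thresholds. The `shadow_report` function and the sample records are hypothetical; real outcome data arrives with a long delay and would be joined by transaction ID.

```python
def shadow_report(records, threshold):
    """Evaluate shadow scores against outcomes at one decision threshold.

    `records` is an iterable of (shadow_score, became_chargeback) pairs.
    """
    tp = fp = fn = 0
    for score, became_chargeback in records:
        flagged = score >= threshold
        if flagged and became_chargeback:
            tp += 1
        elif flagged:
            fp += 1
        elif became_chargeback:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "threshold": threshold,
        "flagged": tp + fp,
        "precision": precision,
        "recall": recall,
    }


# Illustrative records only, not data from the engagement.
records = [(0.9, True), (0.8, False), (0.4, True), (0.1, False)]
for threshold in (0.3, 0.5):
    print(shadow_report(records, threshold))
```

Sweeping the threshold this way shows the trade-off that threshold tuning works against: lowering it catches more chargebacks at the cost of flagging more good transactions.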
What changed for the client
high-risk accuracy
F1 score (the harmonic mean of precision and recall) for high-risk transactions on a held-out chargeback sample over the first three months post-launch.
chargeback losses
Reduction in chargeback dollars across the processor book over the first six months versus the prior-year baseline.
scoring latency
P95 latency from authorization request to score returned to the processor, including feature store read and inference.
merchant decision lever
New decision input available to opted-in merchants at authorization time, previously unavailable in any pre-authorization tool.
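For readers who want the accuracy and latency metrics above pinned down, the sketch below shows one common way to compute each. The function names are hypothetical, and the nearest-rank percentile is one of several accepted definitions; the case study does not say which one was used.

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct
    percent of all samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Illustrative latencies in milliseconds, not measurements from the system.
latencies_ms = [12, 14, 11, 13, 15, 12, 40, 13, 12, 14]
print(f1_score(0.5, 0.5))
print(percentile_nearest_rank(latencies_ms, 95))
```

A tail percentile is the relevant latency figure here because a single slow outlier, like the 40 ms sample in the example, barely moves the mean but is exactly what breaches an authorization budget.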
“The reason codes are what made this usable for our merchants. They could build the score into their own decline logic without us giving away the model.”
Lessons learned from the build
Moving feature computation off the authorization path was the architectural decision that made everything else possible. Without a feature store, we would have been forced to use a thinner feature set and would have left meaningful accuracy on the table. Feature stores belong in any sub-second decisioning system.
Reason codes were a product feature, not just a model feature. Once merchants started building reason codes into their own logic, the score became sticky in a way a raw number would not have been. We would design the reason-code taxonomy alongside the model from day one next time.
Shadow mode against full volume caught two latency hotspots and one accuracy edge case that synthetic load testing would have missed. Production traffic has shape that synthetic testing cannot reproduce, and we would commit to full-volume shadow on any future high-stakes decisioning deployment.
