Data & Automation Platforms — Case Study

Automated Loan Processing & Underwriting Platform

The client was a regional consumer lender serving roughly two hundred thousand active borrowers across personal and small-business loan products. Approval timelines had grown to fourteen days under volume pressure, which was costing them deals to faster competitors and creating compliance risk where time-bound disclosures were missed. The underwriting team was capable but spent the majority of their time on document extraction and rule-based checks rather than judgment calls.

48hr time to decision
92% risk accuracy
-40% processing cost
<12% override rate
Category

Data & Automation Platforms

Industry

Banking/FinTech

Timeline

16 weeks from kickoff to full cutover

Team size

5 specialists

Project Overview

The full story

The operational problem was that applicant documents (pay stubs, tax returns, bank statements, employer letters) arrived in fifteen different formats, and the existing OCR product produced output that required human reconciliation against the actual application form. Credit scoring relied on a model that was retrained quarterly on a static feature set, which meant new fraud patterns and new income types such as gig work were under-represented in the score.

We rebuilt the pipeline around a document-understanding model that produced structured output mapped directly to the underwriting schema, then layered an updated credit-risk model that incorporated alternative income signals and recent bureau data with explicit reason codes. Every underwriter decision was logged with the model’s recommendation and the human override if any, so the model retraining loop was data-driven rather than calendar-driven.

What shipped was an underwriter workstation that opens a loan file with the data already extracted and the recommended decision pre-populated, with reason codes and the supporting documents linked inline. The underwriter approves, declines, or overrides with a structured reason that flows back into model training. Time-bound disclosures fire automatically based on file state. The model retrains weekly on the override stream, which closed the gap on new income patterns within a quarter.

The Problem

Approvals had grown to fourteen days because underwriters were doing document extraction work instead of judgment work.

01 Friction point

Applicant documents arrived in fifteen formats and the existing OCR output required line-by-line human reconciliation against the application.

02 Friction point

Credit scoring retrained quarterly on a static feature set, missing recent fraud patterns and gig-economy income signals.

03 Friction point

Compliance disclosures with time-bound deadlines were occasionally missed under volume pressure, creating regulatory exposure.

04 Friction point

Underwriter overrides of model recommendations were captured as free text, so retraining could not learn from the most informative cases.

05 Friction point

Customer-facing status updates were manual, which drove avoidable inbound support volume during the approval wait.

Our Approach

How we structured the engagement

We made document understanding the foundation and built every downstream decision on structured, reviewable extraction output.

  1. Phase 01, Weeks 1-3

    Discovery

    Audited two thousand recent loan files to taxonomize document types, extraction failure modes, and override reasons. Worked with compliance to map every regulatory deadline to a file state. Output: an extraction schema, an updated feature set for credit scoring, and a structured override taxonomy.

  2. Phase 02, Weeks 4-6

    Architecture

    Designed a document-understanding service using a layout-aware transformer fine-tuned on the existing document corpus, with confidence per extracted field. Built a credit-risk model using XGBoost with reason codes generated per prediction via SHAP for compliance defensibility.

  3. Phase 03, Weeks 7-13

    Build

    Shipped extraction first because every downstream component depended on it. Built the underwriter workstation around the extraction output with confidence highlighting per field. Implemented the structured override flow with reason taxonomy and wired weekly retraining triggered by override volume.

  4. Phase 04, Weeks 14-16

    Launch

    Ran a six-week parallel pilot where every loan was processed by both the new platform and the legacy workflow. Compared decisions, measured the override rate, and refined the model until the share of recommendations overridden by underwriters dropped below twelve percent. Cut over to platform-only in week fifteen.
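
The cutover gate from the pilot can be illustrated with a minimal Python sketch. The `PilotDecision` record, its field names, and the helper functions are assumptions made for this example and not the code that shipped; only the twelve percent threshold comes from the engagement.

```python
from dataclasses import dataclass

# The twelve percent gate is from the case study; everything else is illustrative.
OVERRIDE_RATE_GATE = 0.12


@dataclass(frozen=True)
class PilotDecision:
    loan_id: str
    model_recommendation: str  # what the new platform recommended
    underwriter_decision: str  # what the underwriter decided on the new platform
    legacy_decision: str       # what the legacy workflow decided for the same loan


def override_rate(decisions: list[PilotDecision]) -> float:
    """Share of model recommendations that the underwriter changed."""
    if not decisions:
        raise ValueError("no pilot decisions to evaluate")
    overridden = sum(
        1 for d in decisions if d.underwriter_decision != d.model_recommendation
    )
    return overridden / len(decisions)


def legacy_agreement_rate(decisions: list[PilotDecision]) -> float:
    """Share of loans where the new platform and legacy workflow agreed."""
    if not decisions:
        raise ValueError("no pilot decisions to evaluate")
    agreed = sum(1 for d in decisions if d.underwriter_decision == d.legacy_decision)
    return agreed / len(decisions)


def ready_for_cutover(decisions: list[PilotDecision]) -> bool:
    """True once the override rate is below the gate."""
    return override_rate(decisions) < OVERRIDE_RATE_GATE
```

With nine accepted recommendations and one override, `override_rate` is 0.10 and `ready_for_cutover` returns `True`; a second override in eleven loans pushes the rate above the gate.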

System Architecture

What we built, component by component

  1. 01

    Document understanding

    Layout-aware transformer, fine-tuned on the loan-document corpus, that produces structured fields with per-field confidence.

  2. 02

    Identity and fraud checks

    Runs identity verification, sanctions screening, and pattern-based fraud rules on every file before scoring.

  3. 03

    Credit-risk model

    XGBoost classifier with SHAP-derived reason codes, retrained weekly on the structured override stream.

  4. 04

    Underwriter workstation

    File view with extracted data, model recommendation, confidence per field, and structured override capture.

  5. 05

    Compliance engine

    Watches file state transitions and fires time-bound disclosures and notices automatically with audit logging.

  6. 06

    Retraining pipeline

    Weekly job that ingests structured overrides as training signal, re-fits the model, and gates promotion on holdout metrics.
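
The promotion gate in the retraining pipeline can be sketched roughly as follows. The case study does not say which holdout metrics were used, so the `HoldoutMetrics` fields (AUC and accuracy) and both tolerances are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HoldoutMetrics:
    # Hypothetical metric set; the real gate may track different measures.
    auc: float
    accuracy: float


def should_promote(
    candidate: HoldoutMetrics,
    production: HoldoutMetrics,
    min_auc_gain: float = 0.0,
    max_accuracy_drop: float = 0.005,
) -> bool:
    """Promote the re-fitted model only if it does not regress on the holdout set.

    The candidate must match or beat production AUC by `min_auc_gain` and may
    lose at most `max_accuracy_drop` in accuracy. Both tolerances are
    illustrative defaults, not values from the engagement.
    """
    auc_ok = candidate.auc - production.auc >= min_auc_gain
    accuracy_ok = production.accuracy - candidate.accuracy <= max_accuracy_drop
    return auc_ok and accuracy_ok
```

Keeping the gate as a pure function of two metric snapshots makes each weekly promotion decision easy to log and replay.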

Data Flow

An application arrives with documents, the document-understanding service extracts structured fields with confidence, and the identity and fraud checks run in parallel. The credit-risk model produces a recommendation with reason codes, the underwriter workstation surfaces the file for human decision, and structured overrides flow into the retraining pipeline which closes the loop on a weekly cadence.

Document understanding
Identity and fraud checks
Credit-risk model
Underwriter workstation
Compliance engine
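
The flow above can be sketched in Python under some loud assumptions: the three functions below are stubs with hypothetical names and toy logic, standing in for the document-understanding service, the identity and fraud checks, and the credit-risk model. The point is only the shape of the flow, with extraction and checks running in parallel before scoring.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_fields(application: dict) -> dict:
    # Stub for the document-understanding service: returns one field with confidence.
    pay_stub = application["documents"]["pay_stub"]
    return {"monthly_income": {"value": pay_stub["monthly_income"], "confidence": 0.97}}


def run_identity_and_fraud_checks(application: dict) -> dict:
    # Stub for identity verification, sanctions screening, and fraud rules.
    return {
        "identity_verified": application.get("id_verified", False),
        "sanctions_hit": False,
        "fraud_flags": [],
    }


def recommend(fields: dict, checks: dict) -> dict:
    # Toy rule in place of the credit-risk model; the 3000 cutoff is invented.
    if not checks["identity_verified"] or checks["sanctions_hit"] or checks["fraud_flags"]:
        return {"decision": "decline", "reason_codes": ["PRE_SCORING_CHECK_FAILED"]}
    if fields["monthly_income"]["value"] >= 3000:
        return {"decision": "approve", "reason_codes": []}
    return {"decision": "decline", "reason_codes": ["INCOME_BELOW_THRESHOLD"]}


def process_application(application: dict) -> dict:
    """Run extraction and checks concurrently, then score with both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fields_future = pool.submit(extract_fields, application)
        checks_future = pool.submit(run_identity_and_fraud_checks, application)
        fields = fields_future.result()
        checks = checks_future.result()
    return {
        "application_id": application["application_id"],
        "fields": fields,
        "checks": checks,
        "recommendation": recommend(fields, checks),
    }
```

The returned dictionary is what a workstation would surface to the underwriter: extracted fields, check results, and the recommendation with its reason codes.
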
Key Decisions

The trade-offs we made and why

Decision 01 (lead trade-off)

Fine-tuned a layout-aware transformer over generic OCR

Generic OCR returned strings and left structure-recovery to a brittle rule layer. A layout-aware model fine-tuned on the actual document corpus produced fields directly mapped to the underwriting schema, which removed the reconciliation step entirely.
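
To make "fields directly mapped to the underwriting schema" concrete, here is a minimal sketch of what structured extraction output with per-field confidence might look like. The schema field names, the `ExtractedField` type, and the 0.90 review threshold are all assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical subset of an underwriting schema.
UNDERWRITING_SCHEMA = {"employer_name", "gross_monthly_income"}

# Hypothetical threshold below which a field is highlighted for human review.
REVIEW_THRESHOLD = 0.90


@dataclass(frozen=True)
class ExtractedField:
    name: str             # must be a key in the underwriting schema
    value: str
    confidence: float     # model confidence in [0, 1]
    source_document: str  # lets the workstation link back to the document


def to_underwriting_record(fields: list[ExtractedField]) -> dict[str, str]:
    """Map extracted fields onto the schema, rejecting unknown field names."""
    unknown = [f.name for f in fields if f.name not in UNDERWRITING_SCHEMA]
    if unknown:
        raise ValueError(f"fields not in underwriting schema: {unknown}")
    return {f.name: f.value for f in fields}


def fields_needing_review(
    fields: list[ExtractedField], threshold: float = REVIEW_THRESHOLD
) -> list[ExtractedField]:
    """Return the fields whose confidence falls below the review threshold."""
    return [f for f in fields if f.confidence < threshold]
```

Because every field carries its own confidence and source document, a reviewer only needs to look at the low-confidence values instead of reconciling the whole file.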

Decision 02

Used XGBoost with SHAP over a neural credit model

Regulatory defensibility required per-decision reason codes. SHAP on gradient-boosted trees produced reason codes regulators recognized and that the compliance team had existing playbooks for, which made deployment a question of weeks rather than months.
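
As an illustration of how per-feature contributions become reason codes, the sketch below takes a dictionary of contributions for one prediction and returns the top adverse factors. In practice those contributions would come from SHAP (for example via `shap.TreeExplainer` on the XGBoost model); here they are hand-written inputs, and the feature names, the code labels, and the sign convention (positive means higher predicted risk) are assumptions.

```python
# Hypothetical mapping from model features to reason-code text.
REASON_CODES = {
    "debt_to_income": "R01: Debt-to-income ratio too high",
    "recent_delinquencies": "R02: Recent delinquencies on bureau file",
    "months_at_employer": "R03: Short time with current employer",
    "gig_income_share": "R04: High share of variable income",
}


def top_reason_codes(contributions: dict[str, float], max_codes: int = 2) -> list[str]:
    """Return reason codes for the features that pushed risk up the most.

    Assumes a positive contribution increases predicted default risk, so only
    positive values are treated as adverse factors.
    """
    adverse = [(name, value) for name, value in contributions.items() if value > 0]
    adverse.sort(key=lambda item: item[1], reverse=True)
    return [
        REASON_CODES.get(name, f"R99: {name}")  # fallback for unmapped features
        for name, _ in adverse[:max_codes]
    ]


example = {
    "debt_to_income": 0.42,
    "months_at_employer": -0.10,
    "recent_delinquencies": 0.31,
    "gig_income_share": 0.05,
}
print(top_reason_codes(example))
```

For the example input, the two largest positive contributions are `debt_to_income` and `recent_delinquencies`, so the R01 and R02 codes are returned in that order.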

Decision 03

Made overrides structured rather than free text

Free-text overrides were a black hole for retraining. A structured taxonomy of override reasons turned every override into a labeled training example, which closed the model improvement loop without requiring manual labeling work.
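
A minimal sketch of a structured override record, assuming a small hypothetical taxonomy: the `OverrideReason` values, the `UnderwriterDecision` type, and the training-example layout are illustrative and not the taxonomy produced in discovery.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OverrideReason(Enum):
    # Hypothetical taxonomy entries.
    INCOME_UNDERSTATED = "income_understated"
    DOCUMENT_MISREAD = "document_misread"
    COMPENSATING_ASSETS = "compensating_assets"
    SUSPECTED_FRAUD = "suspected_fraud"


@dataclass(frozen=True)
class UnderwriterDecision:
    loan_id: str
    model_recommendation: str
    final_decision: str
    override_reason: Optional[OverrideReason] = None

    def __post_init__(self) -> None:
        # An override must carry a reason; an accepted recommendation must not.
        if self.is_override and self.override_reason is None:
            raise ValueError("an override requires a structured reason")
        if not self.is_override and self.override_reason is not None:
            raise ValueError("a reason was given but the decision is not an override")

    @property
    def is_override(self) -> bool:
        return self.final_decision != self.model_recommendation


def to_training_example(decision: UnderwriterDecision, features: dict) -> dict:
    """Turn a logged decision into a labeled example for retraining."""
    reason = decision.override_reason.value if decision.override_reason else None
    return {
        "features": features,
        "label": decision.final_decision,
        "was_override": decision.is_override,
        "override_reason": reason,
    }
```

Validating at construction time means a free-text or missing reason never reaches the training set.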

Decision 04

Built the compliance engine as a state-machine listener

Embedding disclosure logic in the underwriter workstation would have coupled compliance to the UI. A state-machine listener on file transitions kept compliance decoupled and made the audit trail cleaner for regulators reviewing the system.
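
The state-machine listener idea can be sketched as below. The state names, disclosure names, and deadline windows in `DISCLOSURE_RULES` are placeholders invented for the example and are not regulatory guidance; the `LoanFile` and `ComplianceListener` classes are likewise hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Placeholder rules: (from_state, to_state) -> (disclosure name, time allowed).
DISCLOSURE_RULES = {
    ("received", "in_review"): ("initial_disclosure", timedelta(days=3)),
    ("in_review", "declined"): ("adverse_action_notice", timedelta(days=30)),
}


@dataclass
class ComplianceListener:
    audit_log: list = field(default_factory=list)

    def on_transition(self, loan_id: str, from_state: str, to_state: str, at: datetime) -> None:
        """Record a disclosure obligation if this transition triggers one."""
        rule = DISCLOSURE_RULES.get((from_state, to_state))
        if rule is None:
            return
        disclosure, window = rule
        self.audit_log.append({
            "loan_id": loan_id,
            "disclosure": disclosure,
            "triggered_at": at,
            "due_by": at + window,
        })


@dataclass
class LoanFile:
    loan_id: str
    state: str = "received"
    listeners: list = field(default_factory=list)

    def transition(self, to_state: str, at: datetime) -> None:
        """Change state and notify listeners; the file knows nothing about disclosures."""
        from_state = self.state
        self.state = to_state
        for listener in self.listeners:
            listener.on_transition(self.loan_id, from_state, to_state, at)
```

Because the listener only sees transitions, the disclosure rules can change without touching the workstation, and the audit log is a by-product of normal operation.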

Outcomes

What changed for the client

48hr time to decision

Median end-to-end approval time dropped from fourteen days to forty-eight hours across the post-launch volume.

92% risk accuracy

Model accuracy on a held-out twelve-month back-test against actual repayment performance, measured on a stratified sample.

-40% processing cost

Change in per-loan operating cost, including underwriter time and platform expense, measured over the first ninety days.

<12% override rate

Share of model recommendations overridden by underwriters after model tuning completed, used as the cutover gate.

In their words
We freed our underwriters from data entry and gave compliance a cleaner audit trail than they had before. The override stream is what keeps the model honest.
Chief Risk Officer, Regional consumer lender
Tech Stack

The tools behind the system

Built with a deliberate stack chosen for production reliability and operational velocity.

5 components, production-grade
Python, OCR APIs, XGBoost, FastAPI, AWS
What we’d carry forward

Lessons learned from the build

01 Lesson

Document understanding was the leverage point. Every hour invested in extraction quality paid back four times downstream because the credit model, the compliance engine, and the workstation all consumed the same structured output. We would invest even more here next time.

02 Lesson

Reason codes are not optional in regulated work. We baked SHAP into the model from day one and it removed friction with compliance entirely. A model without reason codes would have stalled at risk review regardless of accuracy.

03 Lesson

Structured overrides created a virtuous training loop we did not predict the shape of. Within two quarters the model was catching patterns the original underwriters had been quietly compensating for manually. Capturing the override taxonomy was the highest-leverage data decision we made.
