Automated Loan Processing & Underwriting Platform
The client was a regional consumer lender serving roughly two hundred thousand active borrowers across personal and small-business loan products. Approval timelines had grown to fourteen days under volume pressure, which was costing them deals to faster competitors and creating compliance risk where time-bound disclosures were missed. The underwriting team was capable but spent the majority of their time on document extraction and rule-based checks rather than judgment calls.
Data & Automation Platforms
Banking/FinTech
16 weeks from kickoff to full cutover
5 specialists
The full story
The operational problem was that applicant documents (pay stubs, tax returns, bank statements, employer letters) arrived in fifteen different formats, and the existing OCR product produced output that required human reconciliation against the actual application form. Credit scoring relied on a model that was retrained quarterly on a static feature set, which meant new fraud patterns and new income types such as gig work were under-represented in the score.
We rebuilt the pipeline around a document-understanding model that produced structured output mapped directly to the underwriting schema, then layered on an updated credit-risk model that incorporated alternative income signals and recent bureau data with explicit reason codes. Every underwriter decision was logged alongside the model’s recommendation and any human override, so the retraining loop was driven by data rather than by the calendar.
What shipped was an underwriter workstation that opens a loan file with the data already extracted and the recommended decision pre-populated, with reason codes and the supporting documents linked inline. The underwriter approves, declines, or overrides with a structured reason that flows back into model training. Time-bound disclosures fire automatically based on file state. The model retrains weekly on the override stream, which closed the gap on new income patterns within a quarter.
Approvals had grown to fourteen days because underwriters were doing document extraction work instead of judgment work.
Applicant documents arrived in fifteen formats and the existing OCR output required line-by-line human reconciliation against the application.
Credit scoring retrained quarterly on a static feature set, missing recent fraud patterns and gig-economy income signals.
Compliance disclosures with time-bound deadlines were occasionally missed under volume pressure, creating regulatory exposure.
Underwriter overrides of model recommendations were captured as free text, so retraining could not learn from the most informative cases.
Customer-facing status updates were manual, which drove avoidable inbound support volume during the approval wait.
How we structured the engagement
Made document understanding the foundation and built every downstream decision on structured, reviewable extraction output.
- Phase 01, Weeks 1-3
Discovery
Audited two thousand recent loan files to taxonomize document types, extraction failure modes, and override reasons. Worked with compliance to map every regulatory deadline to a file state. Output: an extraction schema, an updated feature set for credit scoring, and a structured override taxonomy.
- Phase 02, Weeks 4-6
Architecture
Designed a document-understanding service using a layout-aware transformer fine-tuned on the existing document corpus, with confidence per extracted field. Built a credit-risk model using XGBoost with reason codes generated per prediction via SHAP for compliance defensibility.
- Phase 03, Weeks 7-13
Build
Shipped extraction first because every downstream component depended on it. Built the underwriter workstation around the extraction output with confidence highlighting per field. Implemented the structured override flow with reason taxonomy and wired weekly retraining triggered by override volume.
- Phase 04, Weeks 14-16
Launch
Ran a six-week parallel pilot in which every loan was processed by both the new platform and the legacy workflow. Compared decisions, measured the override rate, and refined the model until the share of recommendations overridden dropped below twelve percent. Cut over to platform-only in week fifteen.
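The cutover gate described in the launch phase comes down to simple arithmetic over the pilot decisions. The sketch below is a minimal illustration, not the production code: the `PilotDecision` record and its field names are hypothetical, and only the twelve percent threshold comes from the pilot criteria above.

```python
from __future__ import annotations

from dataclasses import dataclass

# "Below twelve percent" is the cutover criterion stated for the pilot.
CUTOVER_THRESHOLD = 0.12


@dataclass(frozen=True)
class PilotDecision:
    """Hypothetical record of one loan processed during the parallel pilot."""
    loan_id: str
    model_recommendation: str   # e.g. "approve" or "decline"
    underwriter_decision: str   # what the human finally decided


def override_rate(decisions: list[PilotDecision]) -> float:
    """Share of model recommendations the underwriter did not follow."""
    if not decisions:
        raise ValueError("no decisions to evaluate")
    overridden = sum(
        1 for d in decisions
        if d.underwriter_decision != d.model_recommendation
    )
    return overridden / len(decisions)


def ready_for_cutover(decisions: list[PilotDecision]) -> bool:
    """True once the override rate is strictly below the cutover threshold."""
    return override_rate(decisions) < CUTOVER_THRESHOLD
```

With ten pilot decisions and one override, the rate is 0.10 and the gate passes; with two overrides it is 0.20 and the gate holds.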
What we built, component by component
- 01
Document understanding
Layout-aware transformer, fine-tuned on the loan-document corpus, that produces structured fields with per-field confidence.
- 02
Identity and fraud checks
Runs identity verification, sanctions screening, and pattern-based fraud rules on every file before scoring.
- 03
Credit-risk model
XGBoost classifier with SHAP-derived reason codes, retrained weekly on the structured override stream.
- 04
Underwriter workstation
File view with extracted data, model recommendation, confidence per field, and structured override capture.
- 05
Compliance engine
Watches file state transitions and fires time-bound disclosures and notices automatically with audit logging.
- 06
Retraining pipeline
Weekly job that ingests structured overrides as training signal, re-fits the model, and gates promotion on holdout metrics.
An application arrives with documents. The document-understanding service extracts structured fields with per-field confidence while the identity and fraud checks run in parallel. The credit-risk model then produces a recommendation with reason codes, the underwriter workstation surfaces the file for a human decision, and structured overrides flow into the retraining pipeline, which closes the loop on a weekly cadence.
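To make the per-field confidence idea concrete, here is a minimal sketch of how extracted fields might be flagged for underwriter attention. The `ExtractedField` type, the field names, and the 0.90 threshold are all assumptions for illustration; the actual schema and thresholds are not described in this case study.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed review threshold; a real system would likely tune this per field type.
REVIEW_THRESHOLD = 0.90


@dataclass(frozen=True)
class ExtractedField:
    """One structured field produced by the extraction step (hypothetical shape)."""
    name: str
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction model


def fields_needing_review(
    fields: list[ExtractedField],
    threshold: float = REVIEW_THRESHOLD,
) -> list[str]:
    """Return the names of fields whose confidence falls below the threshold."""
    return [f.name for f in fields if f.confidence < threshold]


extracted = [
    ExtractedField("gross_monthly_income", "5400.00", 0.98),
    ExtractedField("employer_name", "Acme Logistcs", 0.71),
    ExtractedField("pay_period_end", "2024-03-15", 0.95),
]
print(fields_needing_review(extracted))  # ['employer_name']
```

Keeping the confidence next to each value is what lets the workstation highlight only the uncertain fields rather than asking the underwriter to re-check the whole file.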
The trade-offs we made and why
Fine-tuned a layout-aware transformer over generic OCR
Generic OCR returned strings and left structure-recovery to a brittle rule layer. A layout-aware model fine-tuned on the actual document corpus produced fields directly mapped to the underwriting schema, which removed the reconciliation step entirely.
Used XGBoost with SHAP over a neural credit model
Regulatory defensibility required per-decision reason codes. SHAP on gradient-boosted trees produced reason codes regulators recognized and that the compliance team had existing playbooks for, which made deployment a question of weeks rather than months.
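As a rough illustration of how per-feature contributions can become reason codes, the sketch below assumes the contributions have already been computed upstream (for instance by a SHAP explainer over the tree model) and that a positive value pushes the prediction toward higher risk. The feature names and the `REASON_CODES` table are hypothetical, not the client's actual taxonomy.

```python
from __future__ import annotations

# Hypothetical mapping from model feature to a human-readable reason code.
REASON_CODES = {
    "debt_to_income": "R01: Debt-to-income ratio too high",
    "recent_delinquencies": "R02: Recent delinquencies on bureau file",
    "income_volatility": "R03: Income varies significantly month to month",
    "months_at_employer": "R04: Short time with current employer",
}


def top_reason_codes(contributions: dict[str, float], k: int = 2) -> list[str]:
    """Pick the k features that push risk up the most and map them to codes.

    Assumes positive contributions increase predicted risk; features that
    lower risk are never reported as adverse reasons.
    """
    adverse = [(name, value) for name, value in contributions.items() if value > 0]
    adverse.sort(key=lambda item: item[1], reverse=True)
    return [REASON_CODES.get(name, f"R99: {name}") for name, _ in adverse[:k]]


example = {
    "debt_to_income": 0.42,
    "months_at_employer": -0.10,
    "recent_delinquencies": 0.65,
    "income_volatility": 0.05,
}
print(top_reason_codes(example))
# ['R02: Recent delinquencies on bureau file', 'R01: Debt-to-income ratio too high']
```

Because the contributions are additive per prediction, each decision carries its own ranked reasons, which is the property that made this approach straightforward to defend.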
Made overrides structured rather than free text
Free-text overrides were a black hole for retraining. A structured taxonomy of override reasons turned every override into a labeled training example, which closed the model improvement loop without requiring manual labeling work.
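A structured override can be sketched as an enumerated reason attached to each decision, which converts directly into a labeled training row. The reason values and record shapes below are illustrative assumptions; the real taxonomy came out of the discovery audit and is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class OverrideReason(Enum):
    """Hypothetical override taxonomy; the actual categories are not published."""
    INCOME_NOT_CAPTURED = "income_not_captured"
    DOCUMENT_MISREAD = "document_misread"
    SUSPECTED_FRAUD = "suspected_fraud"
    POLICY_EXCEPTION = "policy_exception"


@dataclass(frozen=True)
class UnderwriterDecision:
    loan_id: str
    model_recommendation: str
    final_decision: str
    override_reason: Optional[OverrideReason] = None


def to_training_example(decision: UnderwriterDecision, features: dict) -> dict:
    """Turn a logged decision into a labeled row for the retraining job."""
    is_override = decision.final_decision != decision.model_recommendation
    if is_override and decision.override_reason is None:
        # The whole point of the taxonomy: no override without a reason.
        raise ValueError("an override must carry a structured reason")
    return {
        "loan_id": decision.loan_id,
        "features": dict(features),
        "label": decision.final_decision,
        "overridden": is_override,
        "override_reason": decision.override_reason.value if is_override else None,
    }
```

Rejecting an override that lacks a reason at capture time is what removes the need for manual labeling later.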
Built the compliance engine as a state-machine listener
Embedding disclosure logic in the underwriter workstation would have coupled compliance to the UI. A state-machine listener on file transitions kept compliance decoupled and made the audit trail cleaner for regulators reviewing the system.
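The listener pattern can be sketched as a lookup from state transitions to disclosures, with every trigger written to an audit log. The state names, disclosure names, and deadline windows below are placeholders chosen for illustration; they are not the client's rules and should not be read as regulatory deadlines.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

# Placeholder rules: (from_state, to_state) -> (disclosure name, time window).
DISCLOSURE_RULES = {
    ("received", "in_review"): ("application_acknowledgement", timedelta(days=3)),
    ("in_review", "declined"): ("adverse_action_notice", timedelta(days=30)),
}


@dataclass
class ComplianceListener:
    """Reacts to file state transitions; knows nothing about the UI."""
    audit_log: list = field(default_factory=list)

    def on_transition(
        self, loan_id: str, from_state: str, to_state: str, at: datetime
    ) -> Optional[dict]:
        rule = DISCLOSURE_RULES.get((from_state, to_state))
        if rule is None:
            return None  # this transition carries no disclosure obligation
        disclosure, window = rule
        entry = {
            "loan_id": loan_id,
            "disclosure": disclosure,
            "triggered_at": at,
            "due_by": at + window,
        }
        self.audit_log.append(entry)  # append-only record for later review
        return entry
```

Because the listener only sees transitions, the workstation can change freely without touching disclosure logic, and the audit log doubles as the evidence trail.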
What changed for the client
time to decision
Median end-to-end approval time dropped from fourteen days to forty-eight hours across the post-launch volume.
risk accuracy
Model accuracy on a held-out twelve-month back-test against actual repayment performance on a stratified sample.
processing cost
Per-loan operating cost including underwriter time and platform expense, measured over the first ninety days.
override rate
Share of model recommendations overridden by underwriters after model tuning completed, used as the cutover gate.
“We freed our underwriters from data entry and gave compliance a cleaner audit trail than they had before. The override stream is what keeps the model honest.”
The tools behind the system
Built with a deliberate stack chosen for production reliability and operational velocity.
Lessons learned from the build
Document understanding was the leverage point. Every hour invested in extraction quality paid back four times downstream because the credit model, the compliance engine, and the workstation all consumed the same structured output. We would invest even more here next time.
Reason codes are not optional in regulated work. We baked SHAP into the model from day one and it removed friction with compliance entirely. A model without reason codes would have stalled at risk review regardless of accuracy.
Structured overrides created a virtuous training loop we did not predict the shape of. Within two quarters the model was catching patterns the original underwriters had been quietly compensating for manually. Capturing the override taxonomy was the highest-leverage data decision we made.
