Generative AI Solutions — Case Study

AI Lead Generation Platform

The client was a fast-growing B2B sales platform serving mid-market revenue teams across North America. Their existing prospecting workflow leaned on a patchwork of standalone data vendors, each priced per seat and exported as a CSV that an SDR cleaned manually before importing into HubSpot. As the user base grew past four hundred seats, the per-seat data spend was outpacing platform ARPU, and customer support tickets about stale or duplicated contacts were becoming the top driver of churn.

+40% lead accuracy

-70% manual prospecting time

10+ sources unified

60 s time to CRM-ready set

The full story

The specific user pain was operational. SDRs were spending the first ninety minutes of every shift reconciling exports — copying titles between Apollo and Lusha, hand-checking email validity through a third tool, then chasing duplicates that crept in from LinkedIn scrape jobs. The data was three to six weeks old by the time a sequence went out. Sales leaders had no way to attribute conversion lift to a specific data source, so they could not negotiate pricing with their vendors.

We designed a single ingestion plane that pulled from ten verified sources behind one normalized contact schema, with a streaming deduplication engine keyed on company domain plus a fuzzy match on name and title. An enrichment step computed a confidence score for each field and wrote back to the CRM only when the score cleared a tenant-configurable threshold. We layered an attribution model on top so the platform could show which source produced the highest reply rate per segment.
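The per-field confidence gate can be sketched as a weighted blend of the three signals the architecture names for the scorer: source priority, recency, and cross-source agreement. This is a minimal sketch under stated assumptions: the `FieldCandidate` shape, the priority values, the 0.4/0.3/0.3 weights, and the 90-day linear recency decay are all hypothetical, not the production formula.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class FieldCandidate:
    """One source's value for one contact field (hypothetical shape)."""
    source: str
    value: str
    observed_at: datetime

# Hypothetical per-source priority for one field; 1.0 = most trusted.
SOURCE_PRIORITY = {"zoominfo": 1.0, "apollo": 0.8, "hunter": 0.6}

def field_confidence(chosen: FieldCandidate, candidates: list,
                     now: datetime, max_age_days: int = 90) -> float:
    """Blend source priority, recency and cross-source agreement into [0, 1]."""
    priority = SOURCE_PRIORITY.get(chosen.source, 0.3)  # unknown sources rank low
    age_days = (now - chosen.observed_at).days
    # Linear decay from 1.0 (fresh) to 0.0 (max_age_days old), clamped.
    recency = min(1.0, max(0.0, 1.0 - age_days / max_age_days))
    agreeing = sum(
        1 for c in candidates
        if c.value.strip().lower() == chosen.value.strip().lower()
    )
    agreement = agreeing / max(len(candidates), 1)
    # Assumed weights; a real scorer would tune these per field.
    return 0.4 * priority + 0.3 * recency + 0.3 * agreement

def should_write_back(score: float, tenant_threshold: float) -> bool:
    """Only write to the CRM when the score clears the tenant's threshold."""
    return score >= tenant_threshold

now = datetime(2024, 1, 31, tzinfo=timezone.utc)
candidates = [
    FieldCandidate("zoominfo", "VP Sales", now - timedelta(days=9)),
    FieldCandidate("apollo", "vp sales", now - timedelta(days=30)),
    FieldCandidate("hunter", "Director of Sales", now - timedelta(days=60)),
]
score = field_confidence(candidates[0], candidates, now)
print(round(score, 2), should_write_back(score, tenant_threshold=0.8))
```

Keeping the threshold as a parameter rather than a constant mirrors the tenant-configurable behavior described above: the same score can pass for one tenant and be rejected for a stricter one.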

What shipped was a unified prospecting workspace where an SDR pastes a target account list and gets a CRM-ready contact set in under sixty seconds, with provenance and confidence visible per row. HubSpot and Salesforce sync runs continuously rather than nightly. The vendor selection tool recommends which sources to retain based on actual outbound performance, giving sales ops a defensible position in vendor renewal conversations.

The Problem

Sales teams were paying for ten data tools and still importing stale, duplicated contacts into the CRM by hand each morning.

01 Friction point

Ten separate data vendors with overlapping coverage, paid per seat, and no shared schema across the exports SDRs received daily.

02 Friction point

Manual deduplication and email validation consumed roughly ninety minutes per rep per day before any outbound work began.

03 Friction point

Contact records aged three to six weeks before a sequence reached the prospect, which suppressed reply rates and burned sending domains.

04 Friction point

No attribution back to source meant sales ops could not negotiate vendor pricing or justify cutting the lowest-performing feeds.

05 Friction point

CRM ingestion ran as a nightly cron, so updated titles and role changes did not surface until the following day at the earliest.

Our Approach

How we structured the engagement

We treated this as a data engineering problem first and a model problem second — clean inputs before clever inference.

Phase 01 · Weeks 1-2

Discovery

Sat with four SDR teams for a full week, instrumented the existing CSV workflow, and measured time-to-clean per record. Audited the ten vendor APIs for rate limits, field coverage, and licensing terms. Output: a normalized contact schema and a ranked source-priority table per field.
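A ranked source-priority table per field can be read as a field-level survivorship rule: for each field, keep the value from the highest-ranked source that supplied one, and record which source won. The sketch below assumes a simple dict-of-lists table; the field names, source keys, and rankings are hypothetical, not the client's actual table.

```python
# Hypothetical ranked source-priority table: earlier sources win per field.
PRIORITY = {
    "email": ["hunter", "zoominfo", "apollo"],
    "title": ["linkedin", "zoominfo", "apollo"],
    "phone": ["lusha", "cognism", "zoominfo"],
}

def merge_by_priority(records: dict) -> dict:
    """Merge per-source records into one contact.

    `records` maps source name -> {field: value}. Returns
    {field: (value, winning_source)} so provenance is kept per field.
    """
    merged = {}
    for field_name, ranking in PRIORITY.items():
        for source in ranking:
            value = records.get(source, {}).get(field_name)
            if value:  # skip sources that are missing or left the field empty
                merged[field_name] = (value, source)
                break
    return merged

contact = merge_by_priority({
    "apollo": {"email": "jane@acme.com", "title": "VP Sales"},
    "linkedin": {"title": "Vice President, Sales"},
    "cognism": {"phone": "+1 555 0100"},
})
print(contact)
```

Returning the winning source alongside each value is what makes per-row provenance possible later in the pipeline.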

Phase 02 · Weeks 3-4

Architecture

Designed a streaming ingestion plane with one connector per vendor, a Kafka topic per source, and a single deduplication consumer keyed on domain plus fuzzy name match. Picked PostgreSQL with the citext extension as the system of record and Redis for the in-flight match cache.
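The matching rule (exact match on normalized company domain, then a fuzzy comparison of name plus title inside a five-minute in-flight window) can be sketched as follows. Assumptions: the standard library's `difflib.SequenceMatcher` stands in for whatever fuzzy scorer ran in production, a plain dict stands in for the Redis cache, and the 0.85 similarity cutoff and the `InFlightCache` class are hypothetical.

```python
import time
from difflib import SequenceMatcher
from typing import Optional

WINDOW_SECONDS = 300      # five-minute in-flight window from the architecture
SIMILARITY_CUTOFF = 0.85  # assumed cutoff, not taken from the case study

def normalize_domain(raw: str) -> str:
    """Lower-case and strip scheme, 'www.' and any path from a domain."""
    d = raw.strip().lower()
    for prefix in ("https://", "http://", "www."):
        if d.startswith(prefix):
            d = d[len(prefix):]
    return d.split("/")[0]

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

class InFlightCache:
    """Dict stand-in for the Redis match cache: domain -> [(seen_at, contact)]."""

    def __init__(self, window: float = WINDOW_SECONDS):
        self.window = window
        self.entries: dict = {}

    def match_or_add(self, contact: dict, now: Optional[float] = None) -> dict:
        """Return an in-window duplicate if one exists, else cache `contact`."""
        now = time.time() if now is None else now
        key = normalize_domain(contact["domain"])
        # Expire entries outside the window (Redis would use a key TTL).
        live = [(t, c) for t, c in self.entries.get(key, [])
                if now - t <= self.window]
        probe = f"{contact['name']} {contact['title']}"
        for _, existing in live:
            candidate = f"{existing['name']} {existing['title']}"
            if similarity(probe, candidate) >= SIMILARITY_CUTOFF:
                self.entries[key] = live
                return existing
        live.append((now, contact))
        self.entries[key] = live
        return contact

cache = InFlightCache()
a = {"domain": "https://www.acme.com", "name": "Jane Doe", "title": "VP Sales"}
b = {"domain": "acme.com/about", "name": "Jane Doe", "title": "VP, Sales"}
c = {"domain": "acme.com", "name": "Jane Doe", "title": "VP Sales"}
first = cache.match_or_add(a, now=0.0)
dup = cache.match_or_add(b, now=60.0)     # same person, inside the window
late = cache.match_or_add(c, now=1000.0)  # outside the window: treated as new
```

Bucketing by domain first keeps the fuzzy comparison cheap: only contacts at the same company are ever compared.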

Phase 03 · Weeks 5-9

Build

Shipped the ten connectors in parallel pairs, then the deduplication engine, then the confidence scorer. Used a fine-tuned DistilBERT for title normalization and a heuristic ensemble for email validity. Wrote a tenant-scoped sync layer for HubSpot and Salesforce with exponential backoff and dead-letter handling.
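The sync layer's retry behavior (exponential backoff, idempotency keys, and a dead-letter queue) can be sketched without any vendor SDK. Assumptions: `push` is a hypothetical callable standing in for a HubSpot or Salesforce client call, `TransientError` marks retryable failures such as rate limits, the idempotency key is a hash of tenant, contact, and payload, and the attempt count and delays are placeholders.

```python
import hashlib
import json
import random
import time
from typing import Callable

class TransientError(Exception):
    """Hypothetical retryable failure (e.g. a rate limit or 5xx response)."""

def idempotency_key(tenant_id: str, contact_id: str, payload: dict) -> str:
    """Same tenant + contact + payload always yields the same key."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(f"{tenant_id}:{contact_id}:{body}".encode()).hexdigest()

def sync_with_backoff(push: Callable[[dict, str], None], tenant_id: str,
                      contact_id: str, payload: dict, dead_letter: list,
                      max_attempts: int = 5, base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Push one record; retry transient errors, dead-letter after the last try.

    Non-transient exceptions are not caught and propagate to the caller.
    """
    key = idempotency_key(tenant_id, contact_id, payload)
    for attempt in range(max_attempts):
        try:
            push(payload, key)  # the key lets the receiver drop duplicate writes
            return True
        except TransientError:
            if attempt == max_attempts - 1:
                break
            # Exponential backoff with full jitter: upper bound doubles each try.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    dead_letter.append({"tenant": tenant_id, "contact": contact_id,
                        "key": key, "payload": payload})
    return False

calls = []
def flaky_push(payload, key):
    calls.append(key)
    if len(calls) < 3:  # fail twice, then succeed
        raise TransientError("rate limited")

def always_fail(payload, key):
    raise TransientError("service unavailable")

dlq = []
ok = sync_with_backoff(flaky_push, "t1", "c42", {"email": "jane@acme.com"},
                       dlq, sleep=lambda s: None)
failed = sync_with_backoff(always_fail, "t1", "c43", {"email": "x@acme.com"},
                           dlq, sleep=lambda s: None)
print(ok, failed, len(dlq))
```

Injecting `sleep` keeps the retry path testable without real delays; every retry of a record reuses the same idempotency key.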

Phase 04 · Weeks 10-11

Launch

Rolled out to three design-partner accounts behind a feature flag, ran a four-week soak with daily accuracy audits, then promoted to general availability. Built the attribution dashboard during the soak, based on what design partners actually asked about in weekly review calls.

System Architecture

What we built, component by component

01
Source connectors
Ten vendor-specific clients with per-source rate limiting, retry, and schema translation into the normalized contact format.
02
Ingestion stream
Kafka topic per source plus a fan-in topic that the deduplication consumer reads, providing replay and audit history.
03
Deduplication engine
Keyed on domain plus a fuzzy name-and-title match; holds in-flight candidates in a Redis cache with a five-minute window.
04
Confidence scorer
Per-field score combining source priority, recency, and cross-source agreement, written alongside every contact record.
05
Contact store
PostgreSQL with citext columns, partitioned by tenant, with row-level security and per-tenant retention policies.
06
CRM sync layer
Tenant-scoped HubSpot and Salesforce workers with exponential backoff, idempotency keys, and a dead-letter queue.
07
Attribution service
Joins outbound activity from the CRM back to the source that supplied each contact, exposed via a sales-ops dashboard.
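At its core, the attribution service joins outbound CRM activity back to the source that supplied each contact and reports a reply rate per source and segment. This minimal in-memory sketch assumes provenance is a `contact_id -> (source, segment)` mapping and that activity arrives as `(contact_id, event)` pairs with hypothetical `"sent"` and `"replied"` event names; the real service would read both from the contact store and CRM.

```python
from collections import defaultdict

def reply_rate_by_source(provenance: dict, activity: list) -> dict:
    """Reply rate per (source, segment): replied contacts / contacted contacts."""
    contacted = defaultdict(set)
    replied = defaultdict(set)
    for contact_id, event in activity:
        group = provenance.get(contact_id)
        if group is None:
            continue  # activity for a contact this platform did not supply
        if event == "sent":
            contacted[group].add(contact_id)
        elif event == "replied":
            replied[group].add(contact_id)
    # Only count replies from contacts that were actually sent a message.
    return {g: len(replied[g] & ids) / len(ids) for g, ids in contacted.items()}

provenance = {
    "c1": ("apollo", "mid-market"),
    "c2": ("apollo", "mid-market"),
    "c3": ("zoominfo", "mid-market"),
}
activity = [("c1", "sent"), ("c2", "sent"), ("c3", "sent"),
            ("c1", "replied"), ("c3", "replied")]
rates = reply_rate_by_source(provenance, activity)
print(rates)
```

Using sets of contact IDs rather than raw event counts keeps repeated replies from the same contact from inflating a source's rate.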

Data Flow

A user request triggers parallel pulls across the ten source connectors, each writing to its own Kafka topic. The deduplication consumer merges matching records into a single contact, the confidence scorer annotates each field, and the contact store accepts the row only when the per-field threshold is met. The CRM sync layer then pushes the record outbound, and the attribution service waits for downstream reply or meeting events to close the loop.

Source connectors → Ingestion stream → Deduplication engine → Confidence scorer → Contact store

Key Decisions

The trade-offs we made and why

Decision 01 · Lead trade-off

Chose Kafka over a database trigger pipeline

Triggers would have coupled vendor latency to the write path and made replay during a vendor outage impossible. Kafka let us decouple ingestion from deduplication, replay a bad day cleanly, and add the eleventh vendor without touching existing consumers.
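To illustrate why a log made replay and new vendors cheap, here is an in-memory stand-in for per-source topics, a fan-in topic, and a consumer that tracks its own offset. No Kafka client is used; the `Topic`, `Consumer`, and `publish` names and the topic naming scheme are hypothetical. Re-processing a bad day amounts to rewinding the consumer's offset rather than re-pulling from vendors.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    """Append-only log: records are only ever appended, never mutated."""
    name: str
    records: list = field(default_factory=list)

    def append(self, record: dict) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

@dataclass
class Consumer:
    """Tracks its own read position, like a committed consumer offset."""
    topic: Topic
    offset: int = 0

    def poll(self) -> list:
        batch = self.topic.records[self.offset:]
        self.offset = len(self.topic.records)
        return batch

    def seek(self, offset: int) -> None:
        self.offset = offset  # rewind to replay from an earlier point

sources = {s: Topic(f"contacts.{s}") for s in ("apollo", "lusha")}
fan_in = Topic("contacts.all")

def publish(source: str, record: dict) -> None:
    """Write to the source's own topic and to the shared fan-in topic."""
    sources[source].append(record)
    fan_in.append({"source": source, **record})

dedup = Consumer(fan_in)
publish("apollo", {"email": "jane@acme.com"})
publish("lusha", {"email": "jane@acme.com"})
first_pass = dedup.poll()   # both records
dedup.seek(0)               # e.g. after fixing a deduplication bug
replayed = dedup.poll()     # the same records again, no vendor re-pull

# A new vendor only needs a new topic; the dedup consumer is unchanged.
sources["hunter"] = Topic("contacts.hunter")
publish("hunter", {"email": "jane@acme.com"})
new_batch = dedup.poll()
```

The consumer never knows how many sources exist; it only reads the fan-in log, which is the decoupling the decision above relies on.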

Decision 02

Used PostgreSQL with citext over Elasticsearch

Most queries were exact-match on domain or email, not full-text. Postgres gave us row-level security per tenant, transactional updates from the sync workers, and lower operational cost than a search cluster. We pushed fuzzy matching into a Redis-backed in-flight cache instead.

Decision 03

Fine-tuned DistilBERT for title normalization rather than calling GPT

Title strings are short and repetitive, and the latency budget per record was under fifty milliseconds. A six-megabyte fine-tune ran on CPU at the ingest node and removed a per-call API cost that would have broken the per-record economics at scale.

Decision 04

Wrote a custom CRM sync layer instead of using a third-party iPaaS

iPaaS pricing per record would have eroded gross margin past five hundred customers, and we needed tenant-scoped retry semantics for compliance. The custom layer is roughly twelve hundred lines of Python and is the most boring, most stable part of the system.

Outcomes

What changed for the client

+40% lead accuracy

Measured as the percentage of contacts that resulted in a verified email open or reply within the first sequence step.

-70% manual prospecting time

Average daily minutes spent on CSV reconciliation and validation across a sample of sixty SDRs before and after rollout.

10+ sources unified

Apollo, Lusha, ZoomInfo, Cognism, LinkedIn Sales Navigator, Clearbit, Hunter, RocketReach, UpLead, and Snov, with a connector contract for the eleventh.

60 s time to CRM-ready set

Median wall-clock time from pasting a target account list to a fully synced, scored contact set in the destination CRM.

In their words

“We expected an integration project and got a sales-ops weapon. The attribution view paid for the build inside one renewal cycle.”

Head of Revenue Operations, Series B SalesTech platform

Tech Stack

The tools behind the system

Built with a deliberate stack chosen for production reliability and operational velocity.

8 components · Production-grade

Python · Node.js · React.js · FastAPI · AI/ML · Docker · PostgreSQL · AWS

What we’d carry forward

Lessons learned from the build

01 Lesson

Investing two full weeks of discovery, including a full week observing SDR workflow, before writing any code paid for itself five times over during the build. The schema we shipped was visibly closer to how reps actually think about a contact than the one we would have written from vendor docs alone.

02 Lesson

We underestimated how much value the attribution dashboard would carry. It was a stretch goal in the original scope and became the single feature that closed the renewal. Next time we would build it first and let the pipeline serve it, rather than the other way around.

03 Lesson

Per-field confidence scoring was the right design but the wrong default. Tenants kept asking for visibility into why a record was rejected. We would expose the score and rejection reason in the UI from day one rather than treating it as an internal signal.

Related Services

Similar delivery work usually starts in these service areas

If you are exploring a similar product, workflow, or implementation challenge, these are the service tracks that usually fit best.

Generative AI Solutions · Product & PoC Development

Industry Context

Where this project sits in the bigger market picture

Patterns for AI features, internal tooling, and product delivery in SaaS businesses.


Similar Project?

Build a result-driven AI product with a team that has shipped before

If you are exploring a similar product, workflow, or AI use case, we can help scope the right architecture, delivery model, and first milestone.


More Relevant Work

Related case studies worth reviewing next


Generative AI Solutions

Chatbot with Voice – Real-Time AI Conversations with Video & Voice Cloning

Increased customer interaction time by 3.5x.


Generative AI Solutions

AI-Powered Music Generation Platform

Reduced music production time by 90%.


Generative AI Solutions

VoiceAI Application for Advanced Voice Synthesis and Cloning

Cut production voice-over costs by 85%.


Start with clarity

Have an AI idea, messy workflow, or product vision? Let's make it buildable.

Bring the problem. We'll help shape the product, define the architecture, and show the fastest path to a serious first version.

A practical first roadmap in the discovery call
Architecture, timeline, and delivery options in plain English
Security, scalability, and reliability discussed upfront


