Case Study

Auto Parts Data Platform for a European B2B Marketplace

EU Auto Parts Group

Automotive aftermarket

Germany

−45%

Wrong-fit returns

From 7.4% to 4.1% of B2B revenue; ~€1.1M in annual cost avoided

−85%

Search latency (p95)

4.2s → 620ms; abandon rate before first click: 11% → 3.8%

6–9 wks → 4–7 days

Supplier onboarding

Adapter pattern + steward console replaced manual schema mapping

Duration

9 months (Phase 1–5)

Team Size

7 people

Services

3 services

Client Context

The CTO opened our first call with a spreadsheet. Three tabs: return reasons, SKU counts by supplier, and a column labelled 'time to correct'. He'd been tracking wrong-fit returns for 14 months. The number wasn't moving. Their catalog had grown organically inside a 12-year-old SQL Server monolith — 40+ suppliers, each delivering data in their own Excel, CSV, or proprietary XML format, no canonical schema, no public API. 3,800 garages and BMW/Mercedes/VW dealer networks were buying parts and returning about 7 in every 100 because the fitment data was wrong. 'We've tried cleaning this manually,' he said. 'We can't keep up.' Then he showed us the codebase. The normalization logic was stored procedures — 18,000 lines, no tests, last touched in 2019. When we asked who understood it, he said: 'One person. He retired last year.'

The Challenge

Business Challenge

Wrong-fit returns had reached 7.4% of B2B revenue — nearly 3× the DACH aftermarket benchmark of 2.5–3.5%. Each return cost €38 in logistics, restock and credit-note handling. At 8.6M annual order lines, that was €2.4M in avoidable cost. Sales were losing ground to two pan-European marketplaces because catalog search was slow, fitment unreliable, and EAN/OEM cross-references were missing for 31% of SKUs. The nightly FTP/import cycle meant partner systems were always 18–24 hours stale.

Technical Challenge

10.2M raw SKUs across 40+ supplier feeds, zero canonical product or fitment schema. 'Part number' meant something different in each supplier's file. Vehicle compatibility was 100% free-text ('Fits BMW E46, most variants'). Search ran on a full-text index over a denormalized SQL Server table — p95 latency of 4.2 seconds. No API surface existed; partners integrated via nightly FTP exports. The legacy stored-procedure normalization layer had no test coverage, no replay capability, and was owned by a single engineer who had left. Rebuilding it meant deciphering 18,000 lines of undocumented T-SQL.

Signals Before We Started

7.4% wrong-fit return rate (DACH benchmark: 2.5–3.5%); €2.4M annual avoidable cost
31% of SKUs missing OEM cross-references or EAN; 12% were duplicates across suppliers
4.2s p95 search latency; 11% of sessions abandoned before the first result click
Supplier onboarding: 6–9 weeks per brand due to manual schema mapping
Vehicle compatibility: 100% free-text — no structured fitment data at all
18,000-line stored-procedure normalization layer, zero test coverage, sole owner retired

Our Solution

Overview

We delivered a modern aftermarket catalog platform with three technical pillars: (1) a supplier ingestion pipeline that normalizes heterogeneous feeds into a canonical schema using a per-supplier adapter pattern with replayable raw payload storage; (2) a fitment enrichment engine that joins OEM/EAN cross-references with KBA/TecDoc vehicle reference data and confidence-scores ambiguous mappings rather than silently dropping them; (3) an Elasticsearch-backed B2B search surface with custom OEM-number analyzers, multi-signal relevance tuning, and REST/webhook APIs for partners and the existing SAP ERP. The architecture decision that shaped everything was event sourcing on the ingestion path: every supplier row arrives as an immutable event on Kafka before any normalization happens. That single decision let us replay 14 months of ingestion when we added a new cross-reference field — without it, the schema change would have been a 9-month backfill project.
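
The replay property described above can be illustrated with a minimal in-memory sketch. It is not the platform's code: `RawEvent`, `normalize_v1`, `normalize_v2`, and the payload field names are hypothetical, and a Python list stands in for the Kafka log and the Blob archive. The point is only that raw events are never mutated, so a schema change becomes a re-run of a new normalizer over the same log.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RawEvent:
    """One supplier row, stored exactly as received (hypothetical shape)."""
    supplier: str
    offset: int
    payload: dict


# Append-only log; a list stands in for the Kafka topic / Blob archive.
EVENT_LOG: list[RawEvent] = []


def append(supplier: str, payload: dict) -> RawEvent:
    """Record the raw row before any normalization happens."""
    event = RawEvent(supplier, len(EVENT_LOG), dict(payload))
    EVENT_LOG.append(event)
    return event


def normalize_v1(event: RawEvent) -> dict:
    # Hypothetical first canonical schema: part number only.
    return {"part_no": event.payload["PartNo"].strip().upper()}


def normalize_v2(event: RawEvent) -> dict:
    # Hypothetical schema change: add a cross-reference field.
    row = normalize_v1(event)
    row["oem_ref"] = event.payload.get("OEM", "").strip() or None
    return row


def replay(normalizer: Callable[[RawEvent], dict]) -> list[dict]:
    """Rebuild canonical rows from the untouched raw events."""
    return [normalizer(event) for event in EVENT_LOG]


append("supplier-a", {"PartNo": " ab-123 ", "OEM": "11 42 7 953 129"})
append("supplier-b", {"PartNo": "cd-456"})

catalog_v1 = replay(normalize_v1)
catalog_v2 = replay(normalize_v2)  # same events, new schema
```

Because `replay` reads only from the log, adding `oem_ref` requires no change to stored data; rows that never carried the field simply come out as `None`.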

Architecture

.NET 8 microservices on Azure Kubernetes Service (AKS). Why microservices here but not on the construction ERP? Scale and team boundaries: ingestion, enrichment, and search have very different scaling profiles and are owned by different squads. Ingestion workers scale horizontally on Kafka consumer-group lag; search is read-heavy and is cached aggressively in Redis; enrichment is CPU-heavy during initial fitment resolution but mostly idle afterwards.

PostgreSQL 15 for canonical catalog state (Part, Brand, Category, Fitment, CrossReference aggregates)
Elasticsearch 8 for search, with custom analyzers for OEM reference numbers (which contain hyphens and slashes that standard tokenizers destroy)
Apache Kafka for the ingestion event log and domain event bus
Azure Blob Storage for immutable raw supplier payloads
Redis as a read-through cache in front of Elasticsearch
Angular 17 steward console
Azure API Management as the external gateway, with per-partner OAuth2 client credentials and rate limiting
Observability: OpenTelemetry traces → Grafana Tempo; metrics → Grafana; logs → Loki

The hardest architectural call was fitment confidence scoring: rather than a binary accept/reject on ambiguous vehicle matches, we publish a score (0.0–1.0) alongside every fitment edge. Results with a score < 0.7 are surfaced to human stewards but still appear in search, flagged as 'verify fitment'. This keeps inventory visible while protecting garages from confirmed wrong fits.

Approach

1. Discovery + canonical schema design with all 8 hero suppliers in the room
2. Immutable event-sourced ingestion: raw payload on Blob, then Kafka, then normalize
3. Per-supplier adapter: CSV / XLSX / XML / JSON, each with its own validation contract
4. Fitment enrichment engine with KBA/TecDoc join and confidence scoring (0.0–1.0)
5. Elasticsearch custom OEM-number analyzer + multi-signal relevance tuning
6. Partner REST + webhook APIs via APIM; SAP IDoc bridge for ERP integration
7. Phased rollout: 8 hero suppliers (62% of revenue) before touching the long tail

Platform Modules

The system was delivered as the following modules — each with its own owner, integration contract and rollout plan.

Supplier Gateway

Per-supplier adapters (CSV/XLSX/XML/JSON) with schema validation, dead-letter queue, and replayable raw payload archive on Azure Blob. Every supplier row is an immutable event before normalization begins — enabling full ingestion replay when the canonical schema evolves.
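
A minimal sketch of the adapter pattern with a validation contract and a dead-letter list, under stated assumptions: the registry, supplier ids, required-field sets, and the semicolon CSV dialect are all hypothetical, and only CSV and JSON are shown. The production adapters are .NET services; this only illustrates the shape of the idea.

```python
import csv
import io
import json
from typing import Callable, Iterator

# Each adapter turns one raw payload into an iterator of row dicts.
Adapter = Callable[[str], Iterator[dict]]


def csv_adapter(raw: str) -> Iterator[dict]:
    # Assumes a semicolon-delimited file with a header row.
    yield from csv.DictReader(io.StringIO(raw), delimiter=";")


def json_adapter(raw: str) -> Iterator[dict]:
    # Assumes the payload is a JSON array of objects.
    yield from json.loads(raw)


# Hypothetical registry: supplier id -> (adapter, required fields).
REGISTRY: dict[str, tuple[Adapter, set[str]]] = {
    "supplier-a": (csv_adapter, {"part_no", "brand"}),
    "supplier-b": (json_adapter, {"part_no", "brand"}),
}


def ingest(supplier: str, raw: str) -> tuple[list[dict], list[dict]]:
    """Return (accepted rows, dead-lettered rows) for one raw payload."""
    adapter, required = REGISTRY[supplier]
    accepted, dead_letter = [], []
    for row in adapter(raw):
        missing = sorted(f for f in required if not row.get(f))
        if missing:
            # Keep the failing row and the reason so it can be triaged later.
            dead_letter.append({"row": row, "missing": missing})
        else:
            accepted.append(row)
    return accepted, dead_letter


ok, dlq = ingest("supplier-a", "part_no;brand\nAB-123;Bosch\n;Hella\n")
```

Onboarding a new supplier in this shape means adding one registry entry and, if the format is new, one adapter function; the validation and dead-letter path stay shared.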

Canonical Catalog Service

Source-of-truth Part, Brand, Category, Fitment and CrossReference aggregates in PostgreSQL 15. Conflict-resolution rules handle the case where two suppliers publish contradictory data for the same OEM number — the most recent authoritative supplier wins, with audit history preserved.
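
One way to read the conflict-resolution rule is sketched below. It assumes that a lower `rank` means a more authoritative supplier and that recency only breaks ties within the same rank; the text does not spell out that ordering, so treat it as an interpretation. `Claim`, `CanonicalPart`, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Claim:
    """What one supplier says about one OEM number (hypothetical shape)."""
    oem_number: str
    supplier: str
    rank: int  # assumed convention: lower = more authoritative
    received_at: datetime
    description: str


@dataclass
class CanonicalPart:
    current: Claim
    history: list[Claim] = field(default_factory=list)


def apply_claim(catalog: dict[str, CanonicalPart], claim: Claim) -> None:
    """Update the winning claim for an OEM number, keeping every claim in history."""
    part = catalog.get(claim.oem_number)
    if part is None:
        catalog[claim.oem_number] = CanonicalPart(current=claim, history=[claim])
        return
    part.history.append(claim)  # audit trail records losers as well as winners
    cur = part.current
    better_rank = claim.rank < cur.rank
    same_rank_newer = claim.rank == cur.rank and claim.received_at > cur.received_at
    if better_rank or same_rank_newer:
        part.current = claim


catalog: dict[str, CanonicalPart] = {}
apply_claim(catalog, Claim("11427953129", "supplier-b", 2, datetime(2024, 1, 10), "Oil filter"))
apply_claim(catalog, Claim("11427953129", "supplier-a", 1, datetime(2024, 1, 5), "Oil filter element"))
apply_claim(catalog, Claim("11427953129", "supplier-b", 2, datetime(2024, 2, 1), "Filter"))
```

In this run the rank-1 claim from `supplier-a` stays current even though a later rank-2 claim arrives, while all three claims remain in `history`.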

Fitment Enrichment Engine

Joins canonical parts with KBA/TecDoc vehicle reference data. The hard problem: engine code 'M20B25' applies to 14 different BMW E30/E36 production variants with subtly different part applicability. The engine resolves this by scoring confidence per fitment edge (0.0–1.0) and routing sub-threshold matches to stewards rather than silently discarding them.
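
The routing half of this policy is simple enough to sketch. How the score itself is computed is not described here and is not shown; `FitmentEdge`, `route`, and the returned field names are hypothetical. Only the 0.7 threshold and the 'verify fitment' flag come from the case study.

```python
from dataclasses import dataclass
from typing import Optional

STEWARD_THRESHOLD = 0.7  # from the text: score < 0.7 goes to the steward queue


@dataclass(frozen=True)
class FitmentEdge:
    part_id: str
    vehicle_id: str
    score: float  # confidence in the range 0.0-1.0


def route(edge: FitmentEdge) -> dict:
    """Decide how an edge is presented; low scores are flagged, never hidden."""
    if not 0.0 <= edge.score <= 1.0:
        raise ValueError(f"score out of range: {edge.score}")
    needs_review = edge.score < STEWARD_THRESHOLD
    badge: Optional[str] = "verify fitment" if needs_review else None
    return {
        "part_id": edge.part_id,
        "vehicle_id": edge.vehicle_id,
        "searchable": True,  # the edge stays in search either way
        "badge": badge,
        "steward_queue": needs_review,
    }


low = route(FitmentEdge("part-1", "bmw-e30-variant-a", 0.6))
high = route(FitmentEdge("part-1", "bmw-e30-variant-b", 0.93))
```

The design choice is visible in the return value: `searchable` does not depend on the score, so a binary reject never happens; the score only changes the badge and whether a steward sees the edge.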

Search & Discovery

Elasticsearch 8 with a custom OEM-number analyzer that handles hyphens and slashes as structural separators instead of letting a standard tokenizer split the number apart. This is critical because '12-34-5-678901' and '12345678901' are the same part in different supplier formats. Relevance combines brand tier, fitment confidence, OEM-exact match, and a real-time stock weight.
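
One possible way to make the two formats match is separator-stripping normalization, sketched below. This is a swapped-in technique for illustration and may differ from the production analyzer. The settings dict uses Elasticsearch's `pattern_replace` character filter with the `keyword` tokenizer and `lowercase` token filter; the names `oem_strip_separators` and `oem_number` and the exact separator set are assumptions. `normalize_oem` is a local Python equivalent that is convenient for unit tests.

```python
import re

# Index settings sketch: strip separators first, then keep the whole
# number as a single term instead of splitting it into fragments.
OEM_ANALYSIS = {
    "analysis": {
        "char_filter": {
            "oem_strip_separators": {
                "type": "pattern_replace",
                "pattern": "[-/.\\s]",  # assumed separator set
                "replacement": "",
            }
        },
        "analyzer": {
            "oem_number": {
                "type": "custom",
                "char_filter": ["oem_strip_separators"],
                "tokenizer": "keyword",
                "filter": ["lowercase"],
            }
        },
    }
}

_SEPARATORS = re.compile(r"[-/.\s]")


def normalize_oem(value: str) -> str:
    """Local equivalent of the analyzer sketch above."""
    return _SEPARATORS.sub("", value).lower()


same_part = normalize_oem("12-34-5-678901") == normalize_oem("12345678901")
```

Applying the same analyzer at index time and at query time is what lets a hyphenated query find an unhyphenated document, and the reverse.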

Partner API & Webhooks

REST + webhook surface behind Azure API Management. OAuth2 client credentials per partner, per-partner rate limits, signed webhooks (HMAC-SHA256) with replay protection. Contract tests against partner OpenAPI specs run on every CI build.
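
A receiver-side sketch of signed webhooks with a timestamp window, using only the Python standard library. The 5-minute window and HMAC-SHA256 come from the case study; the `timestamp.body` signing layout, the hex encoding, and the function names are assumptions, not the platform's documented wire format.

```python
import hashlib
import hmac
import time
from typing import Optional

REPLAY_WINDOW_SECONDS = 300  # 5-minute window


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """HMAC-SHA256 over 'timestamp.body' (assumed layout), hex-encoded."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(
    secret: bytes,
    timestamp: int,
    body: bytes,
    signature: str,
    now: Optional[float] = None,
) -> bool:
    """Reject stale deliveries first, then compare signatures in constant time."""
    current = time.time() if now is None else now
    if abs(current - timestamp) > REPLAY_WINDOW_SECONDS:
        return False
    expected = sign(secret, timestamp, body)
    return hmac.compare_digest(expected, signature)


secret = b"partner-shared-secret"  # hypothetical value
sent_at = 1_700_000_000
payload = b'{"event": "catalog.part.changed", "part_id": "p-1"}'
signature = sign(secret, sent_at, payload)
```

Including the timestamp in the signed message matters: otherwise an attacker could replay an old body with a fresh timestamp header. Within the window, a receiver could additionally deduplicate on an event id.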

Steward Console

Angular admin app for the data team: triage ingestion exceptions, resolve low-confidence fitment matches, audit cross-reference changes. UX iteration in week 14 reduced clicks-per-exception from 5 to 2, nearly doubling daily throughput without adding headcount.

Data Flow

Supplier files arrive via SFTP or S3 bucket trigger and land in Azure Blob as immutable raw payloads (never overwritten, always versioned). The matching Supplier Gateway adapter reads the raw payload, validates against the supplier's declared schema, and emits a `catalog.supplier.row.received` event onto a dedicated Kafka topic per supplier. The Canonical Catalog Service consumes these events, applies deduplication (last-write-wins per OEM number per authoritative supplier rank), and writes the canonical Part aggregate to PostgreSQL while emitting `catalog.part.changed`. The Fitment Enrichment Engine reacts to `catalog.part.changed`, joins against the KBA vehicle reference dataset, resolves engine codes to applicable part variants, scores confidence per edge, and writes fitment edges back to PostgreSQL. A downstream Indexer service projects enriched read models into Elasticsearch, applying the custom OEM-number analyzer at index time. Partner REST queries hit a Redis read-through cache in front of Elasticsearch (TTL 60s for stock, 6h for fitment); price and fitment changes flow to subscribed partners via signed webhooks within 90 seconds of the Kafka event. The Steward Console subscribes to a `catalog.exception` topic and surfaces unresolved conflicts and low-confidence fitment edges to human reviewers.
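
The read-through cache step can be sketched in a few lines. The TTLs (60 s for stock, 6 h for fitment) are from the text; the class, the `loader` callback standing in for an Elasticsearch query, and the in-memory dict standing in for Redis are hypothetical.

```python
import time
from typing import Callable

# TTLs from the text: 60 seconds for stock, 6 hours for fitment.
TTL_SECONDS = {"stock": 60, "fitment": 6 * 60 * 60}


class ReadThroughCache:
    """In-memory stand-in for the Redis layer in front of search."""

    def __init__(
        self,
        loader: Callable[[str, str], object],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # called on a miss, e.g. a search query
        self._clock = clock
        self._entries: dict[tuple[str, str], tuple[float, object]] = {}

    def get(self, kind: str, key: str) -> object:
        now = self._clock()
        hit = self._entries.get((kind, key))
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh entry: serve from cache
        value = self._loader(kind, key)  # miss or expired: read through
        self._entries[(kind, key)] = (now + TTL_SECONDS[kind], value)
        return value


calls: list[tuple[str, str]] = []


def fake_backend(kind: str, key: str) -> str:
    calls.append((kind, key))
    return f"{kind}:{key}:v{len(calls)}"


fake_now = [0.0]
cache = ReadThroughCache(fake_backend, clock=lambda: fake_now[0])
first = cache.get("stock", "p-1")
second = cache.get("stock", "p-1")  # served from cache, no backend call
```

Keeping the TTL per data kind reflects the trade-off in the text: stock changes quickly and tolerates only a short staleness window, while fitment changes rarely and can be cached for hours.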

Integrations

40+ supplier feeds (Bosch, Mahle, Hella, Continental, and long tail) — CSV, XLSX, XML, JSON via FTP/S3/SFTP
KBA/TecDoc-style vehicle reference dataset (licensed; updated quarterly)
SAP S/4HANA ERP: orders, stock and pricing via webhook + SAP IDoc bridge (BizTalk adapter)
DATEV-compatible accounting export for the finance team
Azure AD B2C for partner identity and access management

Delivery Timeline

Phased delivery — each phase had explicit goals, measurable outcomes and a checkpoint before progression.

Phase 1 — Discovery & canonical schema
Weeks 1–4
Goals
- Map every supplier feed format; find the 12 ways 'part number' is expressed
- Define canonical Part, Brand, Fitment, CrossReference entities with all stakeholders
- Baseline data quality: missing OEM, missing EAN, duplicates, fitment coverage
- Align on KPI tree: returns %, search p95, onboarding days
Outcomes
✓ Canonical schema v1 signed off; 8 hero suppliers prioritized (62% of revenue)
✓ Baseline: 31% missing OEM, 18% missing EAN, 12% duplicate SKUs across suppliers, 0% structured fitment
✓ Decision: event-source the ingestion path to enable replay; this was the pivotal architectural call

Phase 2 — Ingestion pipelines & canonical catalog
Weeks 5–12
Goals
- Build per-supplier adapters (CSV, XLSX, XML, JSON over FTP/S3/SFTP)
- Validation, deduplication, and conflict-resolution rules in the canonical service
- Steward console MVP: review ingestion exceptions
Outcomes
✓ 8 hero suppliers ingesting daily; ingestion success ≥ 99.2%; dead-letter queue for exceptions
✓ Deduplication collapsed 10.2M raw rows → 8.6M canonical SKUs
✓ Steward console handling ~400 exceptions/day; the first UX iteration, in week 14, later nearly doubled throughput

Phase 3 — Fitment enrichment engine
Weeks 9–18
Goals
- Integrate KBA vehicle reference dataset; build engine-code → applicable-parts resolver
- Confidence scoring: 0.0–1.0 per fitment edge; score < 0.7 → steward queue
- Vehicle compatibility queryable at p95 < 80ms
Outcomes
✓ Fitment coverage on high-velocity SKUs: 47% → 91%
✓ Confidence-scored matches: 6.2% of edges scored 0.4–0.7, surfaced to stewards rather than dropped
✓ Key insight: 'score, don't reject' kept 540K SKUs in search that a binary threshold would have hidden

Phase 4 — Search & partner APIs
Weeks 14–24
Goals
- Reindex 8.6M canonical SKUs into Elasticsearch with custom OEM-number analyzer
- Tune relevance: brand tier boost, OEM-exact priority, fitment confidence weight, stock availability
- Expose REST + webhook APIs via APIM; SAP IDoc bridge via BizTalk adapter
Outcomes
✓ Search p95: 4.2s → 620ms; abandon rate before first click: 11% → 3.8%
✓ OEM-exact queries: precision improved from 61% to 94% after custom tokenizer (hyphens in part numbers were the root cause of the previous 39% mismatch rate)
✓ First webhook integration with the SAP ERP live week 22; partner sandbox onboarded 12 garages in 3 weeks

Phase 5 — Rollout & decommission
Weeks 22–36
Goals
- Onboard remaining 32 suppliers in four waves
- Blue-green migration off the legacy SQL Server catalog with 4-week parallel period
- Decommission 18,000-line stored-procedure normalization layer
Outcomes
✓ Full 40+ supplier base live; average onboarding: 6–9 weeks → 4–7 days
✓ Legacy catalog decommissioned week 34 after zero regressions in parallel period
✓ Returns: 7.4% → 4.1% three months post-cutover; fitment coverage sustained at 91%+

Technology Stack

.NET 8

PostgreSQL 15

Elasticsearch 8

Apache Kafka

Azure (AKS, APIM, AD B2C, Blob)

Angular 17

Redis

OpenTelemetry / Grafana

The Results

Measurable impact delivered within 9 months (Phase 1–5).

−45%

Wrong-fit returns

From 7.4% to 4.1% of B2B revenue; ~€1.1M in annual cost avoided

−85%

Search latency (p95)

4.2s → 620ms; abandon rate before first click: 11% → 3.8%

6–9 wks → 4–7 days

Supplier onboarding

Adapter pattern + steward console replaced manual schema mapping

0% → 91%

Fitment coverage

From 100% free-text to structured fitment on high-velocity SKUs
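
For readers who want to check the headline percentages, the arithmetic is a plain relative change on the before/after figures quoted above. The cost line assumes, as a simplification, that avoidable cost scales linearly with the return rate; the function name is hypothetical.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from before to after (negative means a reduction)."""
    return (after - before) / before * 100


# Wrong-fit returns: 7.4% -> 4.1% of B2B revenue.
returns_delta = relative_change(7.4, 4.1)

# Search latency p95: 4.2 s -> 620 ms, both expressed in milliseconds.
latency_delta = relative_change(4200, 620)

# Assumption: the EUR 2.4M avoidable cost shrinks in proportion to the rate.
cost_avoided_eur = 2_400_000 * (7.4 - 4.1) / 7.4

print(f"returns: {returns_delta:.1f}%")
print(f"latency: {latency_delta:.1f}%")
print(f"cost avoided: ~EUR {cost_avoided_eur / 1e6:.1f}M")
```

Rounded, these give the reported −45%, −85%, and ~€1.1M.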

Security & Compliance

✓ GDPR: partner PII isolated in EU-West-1 only; no cross-region replication of personal data
✓ OAuth2 client credentials + per-partner rate limits + IP allowlisting at APIM
✓ Audit trail of every steward action (who changed which mapping, when, why, before/after value)
✓ Signed webhooks (HMAC-SHA256) with timestamp replay protection (5-minute window)
✓ Least-privilege service principals; no shared production credentials; secrets in Azure Key Vault
✓ Supplier raw payload archive encrypted at rest (AES-256); retained for 7 years for dispute resolution

Delivery & Operations

✓ GitHub Actions CI: adapter tests, OEM-number analyzer unit tests, contract tests against partner OpenAPI specs
✓ Blue-green deploys on AKS via Argo Rollouts; 5% canary → 100% on green SLO for 10 minutes
✓ OpenTelemetry traces to Grafana Tempo; SLO alerts on search p95 > 1s and fitment coverage < 89%
✓ Kafka consumer-lag dashboards per supplier; alert if lag > 2h of events during business hours
✓ Quarterly fitment-quality drill: replay 1,000 golden vehicle/part pairs and compare against expected results
✓ On-call rotation: German PO handles business escalations; Vireon engineering lead handles platform incidents
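
The quarterly fitment-quality drill in the list above amounts to a golden-set regression check. A minimal sketch follows, assuming each golden pair carries an expected fits/does-not-fit answer; `run_drill`, the `fits` callback, and the sample ids are hypothetical.

```python
from typing import Callable

Pair = tuple[str, str]  # (vehicle_id, part_id)


def run_drill(
    golden: dict[Pair, bool],
    fits: Callable[[str, str], bool],
) -> tuple[float, list[dict]]:
    """Compare current fitment answers with the golden set.

    Returns the agreement rate and the list of mismatching pairs.
    """
    mismatches = []
    for (vehicle, part), expected in golden.items():
        actual = fits(vehicle, part)
        if actual != expected:
            mismatches.append(
                {"vehicle": vehicle, "part": part, "expected": expected, "actual": actual}
            )
    agreement = 1 - len(mismatches) / len(golden) if golden else 1.0
    return agreement, mismatches


# Tiny hypothetical golden set; the real drill uses 1,000 pairs.
golden_pairs = {
    ("bmw-e46-320i", "p1"): True,
    ("bmw-e46-320i", "p2"): False,
    ("vw-golf-4", "p1"): False,
    ("vw-golf-4", "p3"): True,
}
current_fitment = {("bmw-e46-320i", "p1"), ("vw-golf-4", "p1"), ("vw-golf-4", "p3")}

agreement, mismatches = run_drill(golden_pairs, lambda v, p: (v, p) in current_fitment)
```

Returning the mismatching pairs, not just the rate, gives stewards a concrete worklist after each drill.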

What we'd do again

Key Learnings

Event-source the ingestion path before any normalization. We replayed 14 months of ingestion when we added a new cross-reference field — without immutable raw payloads on object storage, that would have been a 9-month backfill project instead of a 3-day re-run.
Score, don't reject. A 0.6-confidence fitment edge resolved by a human steward is better than a hidden SKU. Our 'score < 0.7 → steward queue' policy kept 540K SKUs visible in search that a binary threshold would have dropped — 6.4% of catalog revenue exposure.
OEM-number search is a specialized NLP problem, not a full-text search problem. Standard tokenizers destroy the structural information encoded in part numbers. Build the custom analyzer in week 1, not week 16.
Search relevance is a product, not a config file. We allocated a dedicated engineer to relevance for the last 6 weeks of Phase 4. That work alone closed ~40% of 'cannot find a part' support tickets, which was more user value than any single feature shipped in the same period.

Honest retrospective

What We'd Do Differently

Every project teaches us something we hadn't anticipated. Here's what we'd change if we were starting again.

We locked the canonical schema in week 3 and didn't hold a formal review until week 20. Two suppliers had attributes we hadn't modelled — we built adapters around the gap. That created cross-reference inconsistency that took 3 weeks to reconcile. Schema review should be a standing bi-weekly agenda item for the first 12 weeks of any data platform project.
The steward console UX was an afterthought. The first version was technically correct but required 5 screen transitions per exception row; at 400 exceptions/day, that was 2,000 transitions daily, 1,200 of them unnecessary. A UX-focused sprint in week 14 reduced it to 2 transitions. We should have run a structured task-analysis session with the data team before building any UI, not after.
We underestimated the OEM-number tokenization problem. Standard Elasticsearch analyzers split tokens on hyphens; auto-part OEM numbers use hyphens as structural separators ('12-34-5-678901'). We caught this during relevance testing in week 16, not week 5. A dedicated 'search quality baseline' sprint at the start of Phase 4 would have caught it 6 weeks earlier.
