Supermarket Demand Forecasting MLOps Blueprint

Jorn Lee, Founder & Product Engineer

March 24, 2026 • 12 min read

A production blueprint for supermarket demand forecasting with MLOps controls, drift management, and planner-friendly override workflows.

Most supermarket forecasting projects we audit have a model that scores well in a notebook and is ignored by the planners who are supposed to use it. The model can't see the in-store promo that finance approved on Friday afternoon, it doesn't know that store 037 is hosting a wedding fair next weekend, and when its prediction is wrong nobody can explain why — so the planner reverts to her spreadsheet and the project quietly dies.

This blueprint is what we shipped for a 120-store APAC grocery chain where promo go-live dropped from 3 days to 2 hours and click-and-collect cancellations dropped from 9% to 3.2%. The headline lesson: MLOps for retail is 30% modeling and 70% planner workflow.

Architecture

The override loop is what turns the system from a model into a product: a planner can override the forecast with a reason code, and each override is recorded as a training signal.

Feature pipeline

We compute features in a daily batch (T-1 sales, weather, promo flags) and on a real-time stream (footfall, on-shelf availability, current promo lift). Features land in a feature store keyed by (store, SKU, day) with strict versioning. Training and inference read from the same store, so there is no training/serving skew and no "works in the notebook" surprise.
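A minimal sketch of the keying and versioning idea, assuming a toy in-memory store. The class names, the `v3` version label, and the feature names are illustrative and not taken from the production system; a real deployment would sit on a proper feature-store product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FeatureKey:
    """The (store, SKU, day) key described above."""
    store_id: str
    sku: str
    day: date


class FeatureStore:
    """Toy in-memory store: one immutable feature row per (key, version)."""

    def __init__(self) -> None:
        self._rows: dict[tuple[FeatureKey, str], dict[str, float]] = {}

    def write(self, key: FeatureKey, version: str, features: dict[str, float]) -> None:
        slot = (key, version)
        if slot in self._rows:
            # Strict versioning: a published row is never overwritten in place.
            raise ValueError(f"{key} already written for version {version}")
        self._rows[slot] = dict(features)

    def read(self, key: FeatureKey, version: str) -> dict[str, float]:
        # Training and inference both call this method; there is no second code path.
        return dict(self._rows[(key, version)])


store = FeatureStore()
key = FeatureKey("037", "yogurt-500g", date(2026, 3, 20))
store.write(key, "v3", {"sales_t1": 38.0, "promo_active": 1.0, "rain_mm": 4.5})
training_row = store.read(key, "v3")
inference_row = store.read(key, "v3")
```

Because both consumers go through the same `read` call with an explicit version, a change in feature logic has to ship as a new version rather than silently altering what a trained model sees.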

Model strategy

We use a two-tier model: a gradient-boosted tree model (LightGBM) for baseline demand and a smaller calibration model for promo lift. Single deep-learning models perform marginally better in offline tests but are very hard to debug when a planner asks "why did you predict 40 units of this yogurt today?". Tree models with SHAP explanations win in production because trust beats one extra percentage point of MAPE.
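One way the two tiers could be composed is sketched below. The text does not say how the promo calibration is applied, so the multiplicative lift, the `promo_active` flag, and the function names are assumptions; the two models are passed in as plain callables so the sketch stays independent of any particular library.

```python
from typing import Callable, Mapping

Features = Mapping[str, float]


def two_tier_forecast(
    features: Features,
    baseline_model: Callable[[Features], float],
    promo_lift_model: Callable[[Features], float],
) -> float:
    """Baseline demand, scaled by a promo lift only when a promo is active."""
    baseline_units = max(baseline_model(features), 0.0)  # demand cannot be negative
    if features.get("promo_active", 0.0) < 0.5:
        return baseline_units
    # Assumption: the calibration model returns a multiplier (1.0 means no lift).
    lift = max(promo_lift_model(features), 0.0)
    return baseline_units * lift


# Stand-in models for illustration; in practice these would be trained estimators.
def demo_baseline(features: Features) -> float:
    return 30.0


def demo_lift(features: Features) -> float:
    return 1.5


promo_day = two_tier_forecast({"promo_active": 1.0}, demo_baseline, demo_lift)
normal_day = two_tier_forecast({"promo_active": 0.0}, demo_baseline, demo_lift)
```

Keeping the lift as a separate, small model means a post-promo recalibration can retrain only that tier and leave the baseline untouched.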

Planner console — the actual product

The console shows, for each (store, SKU, day):

The forecast and a 90% confidence band.
The top three SHAP factors in plain language ("+18 units: weekend", "+12 units: promo active", "−4 units: rain forecast").
An override input with a mandatory reason code from a fixed taxonomy (local event, supplier shortage, competitor promo, etc.).
The history of the planner's last overrides and how they performed.

That last item changed behavior more than anything else. Planners who saw that their gut overrides had a 14% higher error rate than the model started overriding less, and only on signals the model genuinely couldn't see.
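The console pieces above can be sketched in a few functions, assuming per-feature contributions (for example SHAP values expressed in units) have already been computed upstream. Only the three reason codes named in the list are included, and every class and function name here is hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

MINUS = "\u2212"  # typographic minus, matching the console examples


class ReasonCode(Enum):
    # Only the codes named in the text; a real taxonomy would have more entries.
    LOCAL_EVENT = "local event"
    SUPPLIER_SHORTAGE = "supplier shortage"
    COMPETITOR_PROMO = "competitor promo"


@dataclass(frozen=True)
class Override:
    model_units: float
    planner_units: float
    reason: ReasonCode
    actual_units: float  # known only after the sales day has passed


def top_factors(contributions: dict[str, float], n: int = 3) -> list[str]:
    """Render the n largest contributions (by magnitude) as plain-language lines."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    lines = []
    for label, units in ranked[:n]:
        sign = "+" if units >= 0 else MINUS
        lines.append(f"{sign}{abs(units):.0f} units: {label}")
    return lines


def make_override(model_units: float, planner_units: float,
                  reason_text: str, actual_units: float) -> Override:
    # ReasonCode(...) raises ValueError for anything outside the taxonomy,
    # so an override cannot be saved with free text or no reason at all.
    return Override(model_units, planner_units, ReasonCode(reason_text), actual_units)


def scorecard(history: list[Override]) -> dict[str, float]:
    """MAPE of the planner's numbers vs. the model's on the overridden rows."""
    rows = [o for o in history if o.actual_units != 0]
    if not rows:
        raise ValueError("no overrides with non-zero actuals to score")
    model = sum(abs(o.model_units - o.actual_units) / abs(o.actual_units) for o in rows)
    planner = sum(abs(o.planner_units - o.actual_units) / abs(o.actual_units) for o in rows)
    return {"model_mape": model / len(rows), "planner_mape": planner / len(rows)}
```

The scorecard only compares rows where the planner actually overrode the model, which is the comparison a planner needs to judge whether a given kind of override is paying off.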

Drift and recalibration

We monitor drift per (category, region) on a weekly cadence:

MAPE drift > 15% vs. trailing 8-week baseline triggers a recalibration ticket.
Feature distribution drift (PSI > 0.2 on key features) triggers an investigation.
Post-promo, we always recalibrate within 7 days because promo dynamics change faster than baseline demand.
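The three rules above can be sketched as follows. This assumes the 15% MAPE threshold is relative to the trailing baseline and that PSI is computed over pre-binned counts with the standard formula; the function names and action labels are illustrative.

```python
import math
from datetime import date, timedelta


def psi(expected_counts: list[float], actual_counts: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index over matching histogram bins."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both histograms must use the same bins")
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor each share at eps so an empty bin does not produce log(0).
        e_share = max(e / e_total, eps)
        a_share = max(a / a_total, eps)
        score += (a_share - e_share) * math.log(a_share / e_share)
    return score


def weekly_drift_actions(current_mape: float, trailing_mapes: list[float],
                         feature_psi: dict[str, float]) -> list[str]:
    """Apply the MAPE and PSI thresholds for one (category, region) slice."""
    window = trailing_mapes[-8:]  # trailing 8-week baseline
    if not window:
        raise ValueError("need at least one week of history")
    baseline = sum(window) / len(window)
    actions = []
    # Assumption: "drift > 15%" means more than 15% worse relative to baseline.
    if current_mape > baseline * 1.15:
        actions.append("recalibration_ticket")
    drifted = sorted(name for name, value in feature_psi.items() if value > 0.2)
    if drifted:
        actions.append("investigate:" + ",".join(drifted))
    return actions


def recalibration_due(promo_end: date) -> date:
    """Latest date for the post-promo recalibration (within 7 days)."""
    return promo_end + timedelta(days=7)
```

For example, a slice whose MAPE moves from a 0.20 baseline to 0.24 while one feature has a PSI of 0.31 would get both a recalibration ticket and an investigation for that feature.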

Recalibration is automated end to end, but promotion is gated by human approval in the model registry. We learned the hard way that fully automatic promotion can ship silent regressions.
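A minimal sketch of that gate, assuming a registry with three stages. The stage names and functions are hypothetical and do not correspond to any specific registry product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Stage(Enum):
    CANDIDATE = "candidate"
    APPROVED = "approved"
    PRODUCTION = "production"


@dataclass
class ModelVersion:
    name: str
    version: int
    stage: Stage = Stage.CANDIDATE
    approved_by: Optional[str] = None


def approve(model: ModelVersion, reviewer: str) -> None:
    """Record a named human sign-off; the automated pipeline never calls this."""
    if not reviewer.strip():
        raise ValueError("approval requires a named reviewer")
    if model.stage is not Stage.CANDIDATE:
        raise ValueError("only candidate versions can be approved")
    model.stage = Stage.APPROVED
    model.approved_by = reviewer


def promote(model: ModelVersion) -> None:
    """Final pipeline step: refuses to run unless a human has approved."""
    if model.stage is not Stage.APPROVED:
        raise PermissionError(f"{model.name} v{model.version} has no human approval")
    model.stage = Stage.PRODUCTION
```

The recalibration job can train, evaluate, and register a candidate unattended, but `promote` fails until someone has called `approve`, which leaves an audit trail of who accepted each version.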

What we'd do differently

We initially built one global model per category. It was easier to operate but consistently failed on stores with unusual profiles (tourist areas, urban convenience). Splitting into store-cluster models (5 clusters by behavior) cut MAPE by 22% without exploding operational cost.
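Routing a store to its cluster model could look like the sketch below. It assumes the cluster centroids were fitted offline on normalized behavior features; the two features used (weekend sales share, normalized basket size), the cluster names, and the function names are illustrative only.

```python
import math
from typing import Callable, Mapping

Features = Mapping[str, float]


def assign_cluster(profile: tuple[float, ...],
                   centroids: dict[str, tuple[float, ...]]) -> str:
    """Return the id of the centroid closest to the store's behavior profile."""
    return min(centroids, key=lambda cid: math.dist(profile, centroids[cid]))


def forecast_for_store(profile: tuple[float, ...], features: Features,
                       centroids: dict[str, tuple[float, ...]],
                       cluster_models: dict[str, Callable[[Features], float]]) -> tuple[str, float]:
    """Pick the cluster model for this store and run it on the day's features."""
    cluster_id = assign_cluster(profile, centroids)
    return cluster_id, cluster_models[cluster_id](features)


# Hypothetical centroids: (weekend sales share, normalized basket size).
demo_centroids = {"suburban": (0.30, 0.80), "tourist": (0.55, 0.25)}
demo_models = {"suburban": lambda f: 40.0, "tourist": lambda f: 65.0}
cluster, units = forecast_for_store((0.50, 0.30), {}, demo_centroids, demo_models)
```

With a handful of clusters, the operational footprint stays close to the single-model setup: one extra lookup per store and one model per (category, cluster) instead of per store.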

Closing

A forecasting model that planners trust and override transparently is worth 10× a higher-accuracy model that gets ignored. Build the planner workflow first; the modeling is the easy part.
