
Real-Time Feature Store Architecture

Project

10 Feb, 2025

at DataCorp

Built the team’s first feature store to solve train-serve skew, so that ML models see the same feature values in training and in production.

Architecture

Offline store — Parquet files on S3, computed via Airflow batch jobs
Online store — Redis cluster with sub-10ms reads for real-time serving
Feature registry — centralized catalog with lineage, ownership, and freshness SLAs
SDK — Python client for consistent feature retrieval in notebooks, training, and serving
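
To illustrate how a single client can serve both stores, here is a minimal sketch. It is not the actual SDK: `FeatureClient`, its method names, and the feature names are hypothetical, and plain Python containers stand in for the Redis cluster and the Parquet files. The point is only that training and serving request features through the same list of names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FeatureClient:
    """Hypothetical client: one feature list, two retrieval paths."""

    # Stand-in for the Redis cluster: entity id -> {feature name: latest value}
    online: Dict[str, Dict[str, float]] = field(default_factory=dict)
    # Stand-in for Parquet rows on S3: one dict per (entity, timestamp) row
    offline: List[Dict[str, object]] = field(default_factory=list)

    def get_online(self, entity_id: str, features: List[str]) -> Dict[str, float]:
        """Serving path: latest value of each requested feature for one entity."""
        row = self.online[entity_id]
        return {name: row[name] for name in features}

    def get_offline(self, features: List[str]) -> List[Dict[str, object]]:
        """Training path: historical rows projected to the same feature names."""
        keys = ["entity_id", "ts"] + features
        return [{key: row[key] for key in keys} for row in self.offline]


# The same feature list drives both paths (names are made up for illustration).
FEATURES = ["orders_7d", "avg_basket"]

client = FeatureClient(
    online={"user:1": {"orders_7d": 3.0, "avg_basket": 41.5}},
    offline=[
        {"entity_id": "user:1", "ts": "2025-01-01",
         "orders_7d": 2.0, "avg_basket": 39.0, "unused_col": 1},
    ],
)

serving_row = client.get_online("user:1", FEATURES)
training_rows = client.get_offline(FEATURES)
```

Because both methods take the same `FEATURES` list, a model cannot be trained on a column that the serving path does not also return.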

Key Design Decisions

Chose Redis over DynamoDB for online store — 3x lower p99 latency at our scale
Parquet over Delta Lake for offline — simpler, team already familiar, good enough for batch
Built a custom registry instead of adopting Feast because Feast didn’t fit our schema requirements
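
As a rough picture of what a registry entry with lineage, ownership, and a freshness SLA might hold, consider the sketch below. The `FeatureEntry` class, its field names, and the example values are assumptions for illustration, not the registry's real schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Tuple


@dataclass(frozen=True)
class FeatureEntry:
    """Hypothetical registry record for one feature."""

    name: str
    owner: str                   # owning team or person
    upstream: Tuple[str, ...]    # lineage: sources this feature is derived from
    freshness_sla: timedelta     # maximum allowed age of the latest value

    def is_stale(self, last_updated: datetime, now: datetime) -> bool:
        """True when the latest value is older than the SLA allows."""
        return now - last_updated > self.freshness_sla


# Example values are invented for illustration.
entry = FeatureEntry(
    name="orders_7d",
    owner="growth-ml",
    upstream=("orders_raw",),
    freshness_sla=timedelta(hours=24),
)

now = datetime(2025, 2, 10, 12, 0, tzinfo=timezone.utc)
updated_recently = entry.is_stale(now - timedelta(hours=6), now)
updated_long_ago = entry.is_stale(now - timedelta(hours=30), now)
```

Here `updated_recently` is `False` and `updated_long_ago` is `True`, since only the second timestamp exceeds the 24-hour SLA.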

Impact

Eliminated train-serve skew for all production models. Feature reuse across teams went from 0% to 60%, cutting roughly $2K/month in duplicate computation costs.
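
One way to check for train-serve skew is a parity test that compares the offline and online values of the same entity. The sketch below is an assumption about how such a check could look, not a description of the checks used on this project; `find_skew` and the tolerance are hypothetical.

```python
import math
from typing import Dict, List


def find_skew(
    offline_row: Dict[str, float],
    online_row: Dict[str, float],
    rel_tol: float = 1e-6,
) -> List[str]:
    """Return the names of features whose offline and online values disagree."""
    mismatched = []
    for name, offline_value in offline_row.items():
        online_value = online_row.get(name)
        # A feature missing from the online row counts as skew too.
        if online_value is None or not math.isclose(
            offline_value, online_value, rel_tol=rel_tol
        ):
            mismatched.append(name)
    return sorted(mismatched)


# Example values are invented for illustration.
skewed = find_skew(
    {"orders_7d": 3.0, "avg_basket": 41.5},
    {"orders_7d": 3.0, "avg_basket": 40.0},
)
```

In this example `skewed` is `["avg_basket"]`, because only that feature differs between the two rows.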