Create Feature Store
1. Objective
The objective of this workflow is to solve the "Training-Serving Skew". Without a feature store, Data Scientists write SQL to get training data, and Backend Engineers write Java/Go to calculate the same features for inference. They never match perfectly. A Feature Store (like Feast) provides a single definition of a feature (e.g., "User Clicks last 30d") that is served consistently in batch (Offline) and real-time (Online).
2. Context & Scope
In Scope
This workflow covers Feature Definition (YAML/Python), configuring the Offline Store (BigQuery/Snowflake/Parquet), configuring the Online Store (Redis/DynamoDB), and Materialization.
Assumption: You have a Data Warehouse with raw data.
Out of Scope
- Data Transformation: The Feature Store serves features. It generally assumes the heavy transformations (aggregations) are managed by dbt or a stream processor, though some stores support on-the-fly transforms.
- Model Training: This workflow outputs the Dataset, not the Model.
3. When to Use / When Not to Use
✅ Use This Workflow When
- You have 5+ models sharing the same features (e.g., "Customer Churn" and "Upsell Propensity" both need "Last Login").
- You need low-latency (< 10ms) feature retrieval for real-time inference.
- You struggle with point-in-time correctness (Time travel queries).
❌ Do NOT Use This Workflow When
- You have 1 model and it runs in batch mode once a week. (Just use a SQL view).
- You are a solo data scientist prototyping. (Overkill).
4. Inputs (Required/Optional)
Required Inputs
| Input | Description | Format | Example |
|---|---|---|---|
| REPO_NAME | Project name. | String | credit_scoring_features |
| OFFLINE_STORE | History source. | Config | BigQuery/File |
| ONLINE_STORE | Cache. | Config | Redis/Sqlite |
Optional Inputs
| Input | Description | Default | Condition |
|---|---|---|---|
| TTL | Maximum feature age before a value is considered stale. | 24h | Set per FeatureView; bounds how far back historical joins look. |
5. Outputs (Artifacts)
| Artifact | Format | Destination | Quality Criteria |
|---|---|---|---|
| Feature Registry | registry.pb | S3/GCS | Source of truth. |
| Historical Dataset | DataFrame | Notebook | Point-in-time correct. |
6. Operating Modes
⚡ Fast Mode
Timebox: 1 hour. Scope: Local Feast. Details: Feast with a local Parquet file as the Offline store and SQLite as the Online store. Good for understanding concepts.
🎯 Standard Mode (Default)
Timebox: 3 days. Scope: Cloud deployment. Details: Feast with Snowflake (Offline) and Redis (Online), plus a CI/CD pipeline that runs `feast apply` so registry changes deploy automatically.
🔬 Deep Mode
Timebox: 2 weeks. Scope: Streaming aggregation. Details: Tecton or Feast with Kafka push sources. Features update in sub-second latency, with complex sliding-window aggregations handled by stream processors.
7. Constraints & Guardrails
Technical Constraints
- Latency: Online store lookups must be fast, keyed by the Entity Key (e.g., `user_id`). Avoid complex computation at query time.
- Cost: Redis is expensive. Do not store massive embedding vectors in Redis unless necessary; use tiered storage or a separate Vector DB.
Security & Privacy
CAUTION
Access Control: Features often contain PII. The Feature Store is a centralized honeypot. Ensure IAM roles restrict who can read features from the Online Store.
Compliance
- Lineage: You must be able to trace which Raw Data version produced the "Customer Age" feature used in Model V3.
8. Procedure
Phase 1: Feature Definition
Objective: Define the logic.
Create `feature_definitions.py`. Define the Entity and the Feature View:

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Int64

# Entity: the join key used to look up features
user = Entity(name="user", join_keys=["user_id"])

# Feature View: schema + freshness (TTL) + batch source
user_stats_view = FeatureView(
    name="user_stats",
    entities=[user],
    ttl=timedelta(days=1),
    schema=[Field(name="clicks", dtype=Int64)],
    source=bigquery_source,  # a BigQuerySource defined elsewhere in the repo
)
```

Run `feast plan` to verify.
Verify: Features are valid and source is reachable.
Phase 2: Materialization
Objective: Sync.
Deploy the registry: `feast apply`. Then materialize (load data from the Offline to the Online store): `feast materialize-incremental $(date -u +%Y-%m-%dT%H:%M:%S)`. This runs a job that copies recent rows from BigQuery into Redis so they are ready for low-latency access. Note: in production, schedule this via Airflow.
Verify: Redis contains keys for the users.
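Conceptually, incremental materialization keeps only the latest value per entity in the online store. A minimal in-memory sketch (plain Python, illustrative only; the row layout and store shapes are assumptions, not the Feast implementation):

```python
from datetime import datetime

# Hypothetical offline rows: (user_id, clicks, event_timestamp)
offline_rows = [
    (1, 5, datetime(2024, 1, 1)),
    (1, 6, datetime(2024, 1, 2)),
    (2, 3, datetime(2024, 1, 1)),
]

def materialize(rows, online_store, since):
    """Copy the latest row per entity (newer than `since`) into the online store."""
    for user_id, clicks, ts in sorted(rows, key=lambda r: r[2]):
        if ts > since:
            online_store[user_id] = {"clicks": clicks, "ts": ts}
    return online_store

online = materialize(offline_rows, {}, since=datetime(2023, 12, 31))
print(online[1]["clicks"])  # latest value per entity wins
```

The real job does the same "latest value per entity key" upsert, just at warehouse scale and with watermark tracking so re-runs stay incremental.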
Phase 3: Retrieval
Objective: Serve.
Offline (Training): `store.get_historical_features(entity_df=events, features=["user_stats:clicks"])`. Crucial: this performs a point-in-time join (ASOF join). It fetches the click count as it was at the time of each event, avoiding leakage.
Online (Inference): `store.get_online_features(features=["user_stats:clicks"], entity_rows=[{"user_id": 123}])`. Returns the latest value from Redis.
Verify: Offline and Online values match for the most recent timestamp.
9. Technical Considerations
Time Travel: The hardest part of Data Engineering for ML. get_historical_features handles the complexity of "The user had 5 clicks yesterday, but 6 today. The training label is from yesterday, so I need 5."
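That point-in-time rule can be sketched as a plain-Python ASOF lookup (illustrative only; the history layout is an assumption, and Feast performs this join at warehouse scale inside `get_historical_features`):

```python
from datetime import datetime

# Hypothetical feature history per user: (timestamp, value), sorted ascending
clicks_history = {
    123: [(datetime(2024, 1, 1), 5), (datetime(2024, 1, 2), 6)],
}

def asof_lookup(history, event_time):
    """Return the latest feature value recorded at or before event_time."""
    value = None
    for ts, v in history:
        if ts <= event_time:
            value = v
        else:
            break
    return value

# A training label dated Jan 1 must see 5 clicks, not today's 6
print(asof_lookup(clicks_history[123], datetime(2024, 1, 1, 12)))
```

Joining on the entity key alone, without this timestamp condition, would leak tomorrow's click count into yesterday's training example.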
Feature Engineering vs Feature Store: The Store assumes features are already computed or simple projections. Complex logic (e.g., "TF-IDF of last 100 tweets") usually belongs in the pipeline before the store.
Serialization: Ensure types (Int32 vs Int64) match between the Source (Parquet) and Destination (Redis) to avoid silent overflows.
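For example, a 64-bit counter silently wraps when forced into a 32-bit slot. A quick demonstration using `ctypes` to mimic a fixed-width Int32 column (the counter value is made up):

```python
import ctypes

big_count = 3_000_000_000  # fits in Int64, exceeds the Int32 max (2_147_483_647)

as_int32 = ctypes.c_int32(big_count).value  # silent wraparound, no error raised
print(as_int32)  # a large negative number instead of 3 billion
```

No exception is thrown, which is exactly why these mismatches slip through until a model starts seeing negative click counts.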
10. Quality Gates (Definition of Done)
Checklist
- [ ] `feast apply` runs.
- [ ] Materialization job scheduled.
- [ ] Point-in-time retrieval verified.
- [ ] Latency < 20ms for P99.
Validation
| Criterion | Method | Threshold |
|---|---|---|
| Consistency | Batch vs Online | < 1% discrepancy |
| Freshness | Data Lag | < Materialization Interval |
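The consistency gate can be automated with a small comparison script. A sketch, assuming you have already pulled matching offline and online values into plain dicts keyed by entity (the sample values are made up):

```python
def discrepancy_rate(offline_values, online_values):
    """Fraction of shared entities whose offline and online feature values disagree."""
    keys = offline_values.keys() & online_values.keys()
    if not keys:
        return 0.0
    mismatches = sum(1 for k in keys if offline_values[k] != online_values[k])
    return mismatches / len(keys)

offline = {1: 5, 2: 3, 3: 7}
online = {1: 5, 2: 4, 3: 7}  # user 2 is stale in the online store
rate = discrepancy_rate(offline, online)
print("PASS" if rate < 0.01 else f"FAIL: {rate:.0%} mismatch")
```

Run this over a sample of recent entities after each materialization; a rising discrepancy rate usually points at a failed or lagging materialization job.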
11. Failure Modes & Recovery
| Failure Mode | Symptoms | Recovery Action |
|---|---|---|
| Stale Features | Model accuracy drops. | Materialization job failed. Check Airflow logs. Re-run backfill. |
| Redis OOM | Eviction errors. | You stored too much history or large blobs. Reduce TTL or filter input rows. |
| Schema Mismatch | Field type does not match. | Source Table changed (e.g. float to string). Update FeatureView schema and re-apply. |
12. Copy-Paste Prompt
TIP: One-Click Agent Invocation. Copy the prompt below, replace placeholders, and paste into your agent.
Role: Act as a Senior ML Ops Engineer.
Task: Execute the Create Feature Store workflow.
## Objective & Scope
- **Goal**: Implement a centralized Feature Store (Feast) to eliminate Training-Serving skew.
- **Scope**: Feature Definitions, Offline/Online Store config, Materialization, and Point-in-time retrieval.
## Inputs
- [ ] REPO_NAME: Project name.
- [ ] OFFLINE_STORE: Historical source (e.g., BigQuery config).
- [ ] ONLINE_STORE: Low-latency cache (e.g., Redis config).
## Output Artifacts
- [ ] Feature Registry (registry.pb)
- [ ] Historical Dataset (DataFrame)
## Execution Steps
1. **Define**
- Initialize Feast. Define `Entity` (join keys) and `FeatureView` (schema, TTL, source) in Python/YAML.
2. **Deploy**
- Run `feast apply`. Trigger `feast materialize-incremental` to sync online store.
3. **Test**
- Verify Offline retrieval (historical join) and Online retrieval (latency check).
## Quality Gates
- [ ] Registry successfully applied.
- [ ] Materialization job completed.
- [ ] Point-in-time correctness verified.
- [ ] Online lookup latency < 20ms.
## Failure Handling
- If blocked, output a "Clarification Brief" detailing missing info or blockers.
## Constraints
- **Security**: Restrict access to PII features.
- **Technical**: Match data types (Int32/64) between stores.
## Command
Now execute this workflow step-by-step.

Appendix: Change Log
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2026-01-14 | AI Engineering Team | Initial release |