
Create Feature Store


1. Objective

The objective of this workflow is to eliminate training-serving skew. Without a feature store, Data Scientists write SQL to build training data while Backend Engineers write Java/Go to compute the same features for inference, and the two implementations never match perfectly. A Feature Store (like Feast) provides a single definition of each feature (e.g., "User clicks, last 30 days") that is served consistently in batch (Offline) and in real time (Online).


2. Context & Scope

In Scope

This workflow covers Feature Definition (YAML/Python), configuring the Offline Store (BigQuery/Snowflake/Parquet), configuring the Online Store (Redis/DynamoDB), and Materialization.

Assumption: You have a Data Warehouse with raw data.

Out of Scope

  • Data Transformation: The Feature Store serves features. It generally assumes the heavy transformations (aggregations) are managed by dbt or a stream processor, though some stores support on-the-fly transforms.
  • Model Training: This workflow outputs the Dataset, not the Model.

3. When to Use / When Not to Use

Use This Workflow When

  • You have 5+ models sharing the same features (e.g., "Customer Churn" and "Upsell Propensity" both need "Last Login").
  • You need low-latency (< 10ms) feature retrieval for real-time inference.
  • You struggle with point-in-time correctness (Time travel queries).

Do NOT Use This Workflow When

  • You have 1 model and it runs in batch mode once a week. (Just use a SQL view).
  • You are a solo data scientist prototyping. (Overkill).

4. Inputs (Required/Optional)

Required Inputs

| Input | Description | Format | Example |
|---|---|---|---|
| REPO_NAME | Project name. | String | credit_scoring_features |
| OFFLINE_STORE | Historical data source. | Config | BigQuery / File |
| ONLINE_STORE | Low-latency cache. | Config | Redis / SQLite |

Optional Inputs

| Input | Description | Default | Condition |
|---|---|---|---|
| TTL | Feature freshness: how far back to look for values. | 24h | Set per FeatureView. |

5. Outputs (Artifacts)

| Artifact | Format | Destination | Quality Criteria |
|---|---|---|---|
| Feature Registry | registry.pb | S3/GCS | Source of truth for feature definitions. |
| Historical Dataset | DataFrame | Notebook | Point-in-time correct. |

6. Operating Modes

Fast Mode

Timebox: 1 hour
Scope: Local Feast.
Details: Feast with a local Parquet file as the Offline store and SQLite as the Online store. Good for understanding the concepts.

🎯 Standard Mode (Default)

Timebox: 3 days
Scope: Cloud deployment.
Details: Configuring Feast with Snowflake (Offline) and Redis (Online), plus a CI/CD pipeline that runs `feast apply` automatically to push registry changes.

🔬 Deep Mode

Timebox: 2 weeks
Scope: Streaming aggregation.
Details: Tecton, or Feast with Kafka push sources. Features are updated with sub-second latency. Complex sliding-window aggregations are managed by stream processors.


7. Constraints & Guardrails

Technical Constraints

  • Latency: Online store lookups must be a fast key-value read on the Entity Key (e.g., user_id). Avoid complex feature computation or key-construction logic at query time.
  • Cost: Redis is expensive. Do not store massive embedding vectors in Redis unless necessary. Use tiered storage or separate Vector DBs.

Security & Privacy

CAUTION

Access Control: Features often contain PII, and the Feature Store is a centralized honeypot. Ensure IAM roles restrict who can read features from the Online Store.

Compliance

  • Lineage: You must be able to trace which Raw Data version produced the "Customer Age" feature used in Model V3.

8. Procedure

Phase 1: Feature Definition

Objective: Define the logic.

Create `feature_definitions.py`. Define the Entity: `user = Entity(name="user", join_keys=["user_id"])`. Then define the Feature View:

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Int64

user = Entity(name="user", join_keys=["user_id"])

user_stats_view = FeatureView(
    name="user_stats",
    entities=[user],
    ttl=timedelta(days=1),
    schema=[Field(name="clicks", dtype=Int64)],
    source=bigquery_source,  # a BigQuerySource defined earlier in the file
)
```

Run `feast plan` to verify.

Verify: Features are valid and source is reachable.

Phase 2: Materialization

Objective: Sync.

Deploy the registry: `feast apply`.
Materialize (load data from Offline to Online): `feast materialize-incremental $(date -u +"%Y-%m-%dT%H:%M:%S")`.
This runs a job that copies recent data from BigQuery to Redis so it is ready for low-latency access.
Note: In production, schedule this via Airflow.

Verify: Redis contains keys for the users.
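As a library-free sketch of what the incremental materialization job does (hypothetical data; in practice Feast runs this for you): copy rows newer than the last watermark from the offline store into the online store, keeping only the latest value per entity key.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical offline rows: (user_id, clicks, event_timestamp)
now = datetime.now(timezone.utc)
offline_rows = [
    (1, 5, now - timedelta(days=2)),
    (1, 6, now - timedelta(hours=1)),
    (2, 3, now - timedelta(hours=3)),
]

def materialize_incremental(offline, online, watermark):
    """Copy rows newer than the last watermark into the online store,
    keeping only the latest value per entity key."""
    new_rows = sorted((r for r in offline if r[2] > watermark), key=lambda r: r[2])
    for user_id, clicks, ts in new_rows:
        online[user_id] = clicks  # last write wins = latest value
    return max((r[2] for r in new_rows), default=watermark)

online_store = {}
watermark = now - timedelta(days=1)  # timestamp of the last successful run
watermark = materialize_incremental(offline_rows, online_store, watermark)
print(online_store)  # {1: 6, 2: 3}
```

Note that the two-day-old row for user 1 is skipped: it is older than the watermark, and its value has already been superseded.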

Phase 3: Retrieval

Objective: Serve.

Offline (Training): `store.get_historical_features(entity_df=events, features=["user_stats:clicks"])`. Crucial: this performs a point-in-time join (ASOF join). It fetches the click count as it was at the time of each event, avoiding label leakage.

Online (Inference): `store.get_online_features(features=["user_stats:clicks"], entity_rows=[{"user_id": 123}])`. Returns the latest value from Redis.

Verify: Offline and Online values match for the most recent timestamp.
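The contract between the two retrieval paths can be illustrated in plain Python (hypothetical data; `get_online` and `get_historical` stand in for the store APIs): online retrieval is a keyed lookup of the latest value, while offline retrieval must answer "what was the value at time t".

```python
from datetime import datetime

# Hypothetical feature history for one user: (timestamp, clicks)
history = [
    (datetime(2024, 1, 1), 3),
    (datetime(2024, 1, 2), 5),
    (datetime(2024, 1, 3), 6),
]

def get_online(history):
    """Online store semantics: latest value only (what Redis holds)."""
    return max(history)[1]

def get_historical(history, event_time):
    """Offline store semantics: value as of event_time (point-in-time correct)."""
    eligible = [(ts, v) for ts, v in history if ts <= event_time]
    return max(eligible)[1] if eligible else None

assert get_online(history) == 6
# A training label observed on Jan 2 must see 5 clicks, not the later 6:
assert get_historical(history, datetime(2024, 1, 2)) == 5
```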


9. Technical Considerations

Time Travel: The hardest part of Data Engineering for ML. get_historical_features handles the complexity of "The user had 5 clicks yesterday, but 6 today. The training label is from yesterday, so I need 5."
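That scenario can be reproduced with a pandas `merge_asof`, which has the same ASOF-join semantics (the data here is made up for illustration):

```python
import pandas as pd

# Feature history: the user's click count as it changed over time
features = pd.DataFrame({
    "user_id": [1, 1],
    "event_timestamp": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "clicks": [5, 6],
})

# Training events: the label was observed on Jan 1
entity_df = pd.DataFrame({
    "user_id": [1],
    "event_timestamp": pd.to_datetime(["2024-01-01 12:00"]),
})

# ASOF join: for each event, take the latest feature row at or before it
joined = pd.merge_asof(
    entity_df.sort_values("event_timestamp"),
    features.sort_values("event_timestamp"),
    on="event_timestamp",
    by="user_id",
)
print(joined["clicks"].tolist())  # [5] — not the later value 6
```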

Feature Engineering vs Feature Store: The Store assumes features are already computed or simple projections. Complex logic (e.g., "TF-IDF of last 100 tweets") usually belongs in the pipeline before the store.

Serialization: Ensure types (Int32 vs Int64) match between the Source (Parquet) and Destination (Redis) to avoid silent overflows.
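A quick NumPy demonstration of why this matters (illustrative values): a counter that fits in Int64 wraps silently when cast to Int32, with no exception raised.

```python
import numpy as np

# A value that fits in Int64 but exceeds the Int32 range (max 2_147_483_647)
big = np.int64(3_000_000_000)

# Casting to Int32 on the way into the online store wraps silently:
truncated = big.astype(np.int32)
print(big, truncated)  # 3000000000 -1294967296
```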


10. Quality Gates (Definition of Done)

Checklist

  • [ ] feast apply runs.
  • [ ] Materialization job scheduled.
  • [ ] Point-in-time retrieval verified.
  • [ ] P99 latency < 20ms.

Validation

| Criterion | Method | Threshold |
|---|---|---|
| Consistency | Compare Batch vs Online values | < 1% discrepancy |
| Freshness | Data lag | < Materialization interval |
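The consistency gate can be sketched as follows (stub dictionaries stand in for real reads from the offline and online stores; `consistency_rate` is a hypothetical helper):

```python
def consistency_rate(batch_values, online_values):
    """Fraction of shared entity keys whose batch and online values disagree."""
    keys = batch_values.keys() & online_values.keys()
    mismatches = sum(1 for k in keys if batch_values[k] != online_values[k])
    return mismatches / len(keys)

# Stub reads for the same entity keys from both stores
batch = {1: 5, 2: 3, 3: 7, 4: 9}
online = {1: 5, 2: 3, 3: 7, 4: 8}  # key 4 has drifted

rate = consistency_rate(batch, online)
# Gate: fail the pipeline if discrepancy exceeds the 1% threshold
print("PASS" if rate < 0.01 else "FAIL")
```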

11. Failure Modes & Recovery

| Failure Mode | Symptoms | Recovery Action |
|---|---|---|
| Stale Features | Model accuracy drops. | Materialization job failed: check Airflow logs and re-run the backfill. |
| Redis OOM | Eviction errors. | Too much history or large blobs stored: reduce TTL or filter input rows. |
| Schema Mismatch | Field type does not match. | Source table changed (e.g., float to string): update the FeatureView schema and re-apply. |

12. Copy-Paste Prompt

TIP

One-Click Agent Invocation Copy the prompt below, replace placeholders, and paste into your agent.

```text
Role: Act as a Senior ML Ops Engineer.
Task: Execute the Create Feature Store workflow.

## Objective & Scope
- **Goal**: Implement a centralized Feature Store (Feast) to eliminate Training-Serving skew.
- **Scope**: Feature Definitions, Offline/Online Store config, Materialization, and Point-in-time retrieval.

## Inputs
- [ ] REPO_NAME: Project name.
- [ ] OFFLINE_STORE: Historical source (e.g., BigQuery config).
- [ ] ONLINE_STORE: Low-latency cache (e.g., Redis config).

## Output Artifacts
- [ ] Feature Registry (registry.pb)
- [ ] Historical Dataset (DataFrame)

## Execution Steps
1. **Define**
   - Initialize Feast. Define `Entity` (join keys) and `FeatureView` (schema, TTL, source) in Python/YAML.
2. **Deploy**
   - Run `feast apply`. Trigger `feast materialize-incremental` to sync online store.
3. **Test**
   - Verify Offline retrieval (historical join) and Online retrieval (latency check).

## Quality Gates
- [ ] Registry successfully applied.
- [ ] Materialization job completed.
- [ ] Point-in-time correctness verified.
- [ ] Online lookup latency < 20ms.

## Failure Handling
- If blocked, output a "Clarification Brief" detailing missing info or blockers.

## Constraints
- **Security**: Restrict access to PII features.
- **Technical**: Match data types (Int32/64) between stores.

## Command
Now execute this workflow step-by-step.
```

Appendix: Change Log

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2026-01-14 | AI Engineering Team | Initial release |
