Create Feature Store
1. Objective
The objective of this workflow is to solve the "Training-Serving Skew". Without a feature store, Data Scientists write SQL to get training data, and Backend Engineers write Java/Go to calculate the same features for inference. They never match perfectly. A Feature Store (like Feast) provides a single definition of a feature (e.g., "User Clicks last 30d") that is served consistently in batch (Offline) and real-time (Online).
2. Context & Scope
In Scope
This workflow covers Feature Definition (YAML/Python), configuring the Offline Store (BigQuery/Snowflake/Parquet), configuring the Online Store (Redis/DynamoDB), and Materialization.
Assumption: You have a Data Warehouse with raw data.
Out of Scope
- Data Transformation: The Feature Store serves features. It generally assumes the heavy transformations (aggregations) are managed by dbt or a stream processor, though some stores support on-the-fly transforms.
- Model Training: This workflow outputs the Dataset, not the Model.
3. When to Use / When Not to Use
✅ Use This Workflow When
- You have 5+ models sharing the same features (e.g., "Customer Churn" and "Upsell Propensity" both need "Last Login").
- You need low-latency (< 10ms) feature retrieval for real-time inference.
- You struggle with point-in-time correctness (Time travel queries).
❌ Do NOT Use This Workflow When
- You have 1 model and it runs in batch mode once a week. (Just use a SQL view).
- You are a solo data scientist prototyping. (Overkill).
4. Inputs (Required/Optional)
Required Inputs
| Input | Description | Format | Example |
|---|---|---|---|
| REPO_NAME | Project name. | String | credit_scoring_features |
| OFFLINE_STORE | History source. | Config | BigQuery/File |
| ONLINE_STORE | Cache. | Config | Redis/Sqlite |
Optional Inputs
| Input | Description | Default | Condition |
|---|---|---|---|
| TTL | Maximum feature age before a value is considered stale. | 24h | Set per FeatureView; bounds how far back historical joins look. |
5. Outputs (Artifacts)
| Artifact | Format | Destination | Quality Criteria |
|---|---|---|---|
| Feature Registry | registry.pb | S3/GCS | Source of truth. |
| Historical Dataset | DataFrame | Notebook | Point-in-time correct. |
6. Operating Modes
⚡ Fast Mode
Timebox: 1 hour. Scope: Local Feast. Details: Feast with a local Parquet file as the Offline store and SQLite as the Online store. Good for understanding concepts.
🎯 Standard Mode (Default)
Timebox: 3 days. Scope: Cloud deployment. Details: Feast with Snowflake (Offline) and Redis (Online), plus a CI/CD pipeline that runs `feast apply` so registry changes deploy automatically.
🔬 Deep Mode
Timebox: 2 weeks. Scope: Streaming aggregation. Details: Tecton or Feast with Kafka push sources. Features update in sub-second latency, with complex sliding-window aggregations handled by stream processors.
7. Constraints & Guardrails
Technical Constraints
- Latency: Online store lookups must be fast, keyed by the Entity Key (e.g., `user_id`). Avoid complex computation at query time.
- Cost: Redis is expensive. Do not store massive embedding vectors in Redis unless necessary; use tiered storage or a separate Vector DB.
Security & Privacy
CAUTION
Access Control: Features often contain PII. The Feature Store is a centralized honeypot. Ensure IAM roles restrict who can read features from the Online Store.
Compliance
- Lineage: You must be able to trace which Raw Data version produced the "Customer Age" feature used in Model V3.
8. Procedure
Phase 1: Feature Definition
Objective: Define the logic.
Create `feature_definitions.py`. Define the Entity and the Feature View:

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.types import Int64

# Entity: the join key used to look up features
user = Entity(name="user", join_keys=["user_id"])

# Feature View: schema + freshness (TTL) + batch source
user_stats_view = FeatureView(
    name="user_stats",
    entities=[user],
    ttl=timedelta(days=1),
    schema=[Field(name="clicks", dtype=Int64)],
    source=bigquery_source,  # a BigQuerySource defined elsewhere in the repo
)
```

Run `feast plan` to verify.
Verify: Features are valid and source is reachable.
Phase 2: Materialization
Objective: Sync.
Deploy the registry: `feast apply`. Then materialize (load data from the Offline to the Online store): `feast materialize-incremental $(date -u +%Y-%m-%dT%H:%M:%S)`. This runs a job that copies recent rows from BigQuery into Redis so they are ready for low-latency access. Note: in production, schedule this via Airflow.
Verify: Redis contains keys for the users.
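Conceptually, incremental materialization keeps only the latest value per entity in the online store. A minimal in-memory sketch (plain Python, illustrative only; the row layout and store shapes are assumptions, not the Feast implementation):

```python
from datetime import datetime

# Hypothetical offline rows: (user_id, clicks, event_timestamp)
offline_rows = [
    (1, 5, datetime(2024, 1, 1)),
    (1, 6, datetime(2024, 1, 2)),
    (2, 3, datetime(2024, 1, 1)),
]

def materialize(rows, online_store, since):
    """Copy the latest row per entity (newer than `since`) into the online store."""
    for user_id, clicks, ts in sorted(rows, key=lambda r: r[2]):
        if ts > since:
            online_store[user_id] = {"clicks": clicks, "ts": ts}
    return online_store

online = materialize(offline_rows, {}, since=datetime(2023, 12, 31))
print(online[1]["clicks"])  # latest value per entity wins
```

The real job does the same "latest value per entity key" upsert, just at warehouse scale and with watermark tracking so re-runs stay incremental.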
Phase 3: Retrieval
Objective: Serve.
Offline (Training): `store.get_historical_features(entity_df=events, features=["user_stats:clicks"])`. Crucial: this performs a point-in-time join (ASOF join). It fetches the click count as it was at the time of each event, avoiding leakage.
Online (Inference): `store.get_online_features(features=["user_stats:clicks"], entity_rows=[{"user_id": 123}])`. Returns the latest value from Redis.
Verify: Offline and Online values match for the most recent timestamp.
9. Technical Considerations
Time Travel: The hardest part of Data Engineering for ML. get_historical_features handles the complexity of "The user had 5 clicks yesterday, but 6 today. The training label is from yesterday, so I need 5."
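That point-in-time rule can be sketched as a plain-Python ASOF lookup (illustrative only; the history layout is an assumption, and Feast performs this join at warehouse scale inside `get_historical_features`):

```python
from datetime import datetime

# Hypothetical feature history per user: (timestamp, value), sorted ascending
clicks_history = {
    123: [(datetime(2024, 1, 1), 5), (datetime(2024, 1, 2), 6)],
}

def asof_lookup(history, event_time):
    """Return the latest feature value recorded at or before event_time."""
    value = None
    for ts, v in history:
        if ts <= event_time:
            value = v
        else:
            break
    return value

# A training label dated Jan 1 must see 5 clicks, not today's 6
print(asof_lookup(clicks_history[123], datetime(2024, 1, 1, 12)))
```

Joining on the entity key alone, without this timestamp condition, would leak tomorrow's click count into yesterday's training example.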
Feature Engineering vs Feature Store: The Store assumes features are already computed or simple projections. Complex logic (e.g., "TF-IDF of last 100 tweets") usually belongs in the pipeline before the store.
Serialization: Ensure types (Int32 vs Int64) match between the Source (Parquet) and Destination (Redis) to avoid silent overflows.
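For example, a 64-bit counter silently wraps when forced into a 32-bit slot. A quick demonstration using `ctypes` to mimic a fixed-width Int32 column (the counter value is made up):

```python
import ctypes

big_count = 3_000_000_000  # fits in Int64, exceeds the Int32 max (2_147_483_647)

as_int32 = ctypes.c_int32(big_count).value  # silent wraparound, no error raised
print(as_int32)  # a large negative number instead of 3 billion
```

No exception is thrown, which is exactly why these mismatches slip through until a model starts seeing negative click counts.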
10. Quality Gates (Definition of Done)
Checklist
- [ ] `feast apply` runs.
- [ ] Materialization job scheduled.
- [ ] Point-in-time retrieval verified.
- [ ] Latency < 20ms for P99.
Validation
| Criterion | Method | Threshold |
|---|---|---|
| Consistency | Batch vs Online | < 1% discrepancy |
| Freshness | Data Lag | < Materialization Interval |
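The consistency gate can be automated with a small comparison script. A sketch, assuming you have already pulled matching offline and online values into plain dicts keyed by entity (the sample values are made up):

```python
def discrepancy_rate(offline_values, online_values):
    """Fraction of shared entities whose offline and online feature values disagree."""
    keys = offline_values.keys() & online_values.keys()
    if not keys:
        return 0.0
    mismatches = sum(1 for k in keys if offline_values[k] != online_values[k])
    return mismatches / len(keys)

offline = {1: 5, 2: 3, 3: 7}
online = {1: 5, 2: 4, 3: 7}  # user 2 is stale in the online store
rate = discrepancy_rate(offline, online)
print("PASS" if rate < 0.01 else f"FAIL: {rate:.0%} mismatch")
```

Run this over a sample of recent entities after each materialization; a rising discrepancy rate usually points at a failed or lagging materialization job.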
11. Failure Modes & Recovery
| Failure Mode | Symptoms | Recovery Action |
|---|---|---|
| Stale Features | Model accuracy drops. | Materialization job failed. Check Airflow logs. Re-run backfill. |
| Redis OOM | Eviction errors. | You stored too much history or large blobs. Reduce TTL or filter input rows. |
| Schema Mismatch | Field type does not match. | Source Table changed (e.g. float to string). Update FeatureView schema and re-apply. |
12. Copy-Paste Prompt
TIP: One-Click Agent Invocation. Copy the prompt below, replace placeholders, and paste into your agent.
Role: Act as a Senior ML Ops Engineer.
Task: Execute the Create Feature Store workflow.
## Objective & Scope
- **Goal**: Implement a centralized Feature Store (Feast) to eliminate Training-Serving skew.
- **Scope**: Feature Definitions, Offline/Online Store config, Materialization, and Point-in-time retrieval.
## Inputs
- [ ] REPO_NAME: Project name.
- [ ] OFFLINE_STORE: Historical source (e.g., BigQuery config).
- [ ] ONLINE_STORE: Low-latency cache (e.g., Redis config).
## Output Artifacts
- [ ] Feature Registry (registry.pb)
- [ ] Historical Dataset (DataFrame)
## Execution Steps
1. **Define**
- Initialize Feast. Define `Entity` (join keys) and `FeatureView` (schema, TTL, source) in Python/YAML.
2. **Deploy**
- Run `feast apply`. Trigger `feast materialize-incremental` to sync online store.
3. **Test**
- Verify Offline retrieval (historical join) and Online retrieval (latency check).
## Quality Gates
- [ ] Registry successfully applied.
- [ ] Materialization job completed.
- [ ] Point-in-time correctness verified.
- [ ] Online lookup latency < 20ms.
## Failure Handling
- If blocked, output a "Clarification Brief" detailing missing info or blockers.
## Constraints
- **Security**: Restrict access to PII features.
- **Technical**: Match data types (Int32/64) between stores.
## Command
Now execute this workflow step-by-step.

Appendix: Change Log
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0.0 | 2026-01-14 | AI Engineering Team | Initial release |