Skip to content

📐 Modeling Fundamentals

Level: Foundation Solves: Hiểu rõ các tầng abstraction trong data modeling và cách chuyển đổi giữa chúng

Ba Tầng Data Modeling

💡 Giáo sư Tom

Data modeling không phải là vẽ boxes và lines. Đó là quá trình chuyển đổi business requirements thành database schema có thể implement được. Mỗi tầng có mục đích riêng - skip tầng nào là bạn đang tạo technical debt.

Tổng quan Three-Schema Architecture

┌─────────────────────────────────────────────────────────────────┐
│                 THREE-SCHEMA ARCHITECTURE                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │              CONCEPTUAL MODEL                            │    │
│  │  • Business entities và relationships                    │    │
│  │  • Technology-agnostic                                   │    │
│  │  • Stakeholder communication                             │    │
│  │  • "What data do we need?"                               │    │
│  └─────────────────────────────────────────────────────────┘    │
│                           │                                     │
│                           ▼                                     │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │               LOGICAL MODEL                              │    │
│  │  • Attributes và data types (abstract)                   │    │
│  │  • Keys và relationships                                 │    │
│  │  • Normalization applied                                 │    │
│  │  • "How is data structured?"                             │    │
│  └─────────────────────────────────────────────────────────┘    │
│                           │                                     │
│                           ▼                                     │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │              PHYSICAL MODEL                              │    │
│  │  • Database-specific types                               │    │
│  │  • Indexes, partitions, constraints                      │    │
│  │  • Storage và performance considerations                 │    │
│  │  • "How is data implemented?"                            │    │
│  └─────────────────────────────────────────────────────────┘    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Conceptual Data Model

Mục đích

  • Communication tool giữa business và technical teams
  • Capture business entitiesrelationships
  • Không quan tâm đến implementation details

Components

ComponentMô tảVí dụ
EntityBusiness object cần lưu trữCustomer, Order, Product
RelationshipLiên kết giữa entitiesCustomer places Order
CardinalitySố lượng instances trong relationshipOne-to-Many, Many-to-Many

Ví dụ: E-Commerce Conceptual Model

┌─────────────────────────────────────────────────────────────────┐
│                 E-COMMERCE CONCEPTUAL MODEL                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌──────────┐         places          ┌──────────┐             │
│   │ Customer │─────────────────────────│  Order   │             │
│   └──────────┘         1:N             └──────────┘             │
│        │                                    │                   │
│        │ has                                │ contains          │
│        │ 1:N                                │ 1:N               │
│        ▼                                    ▼                   │
│   ┌──────────┐                        ┌──────────┐              │
│   │ Address  │                        │OrderItem │              │
│   └──────────┘                        └──────────┘              │
│                                             │                   │
│                                             │ references        │
│                                             │ N:1               │
│                                             ▼                   │
│                                       ┌──────────┐              │
│                                       │ Product  │              │
│                                       └──────────┘              │
│                                             │                   │
│                                             │ belongs to        │
│                                             │ N:1               │
│                                             ▼                   │
│                                       ┌──────────┐              │
│                                       │ Category │              │
│                                       └──────────┘              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Logical Data Model

Mục đích

  • Define attributes cho mỗi entity
  • Specify data types (abstract, không database-specific)
  • Apply normalization rules
  • Define primary keysforeign keys

Components

ComponentMô tảVí dụ
AttributeProperty của entitycustomer_name, email
Primary Key (PK)Unique identifiercustomer_id
Foreign Key (FK)Reference đến entity khácorder.customer_id
Data TypeKiểu dữ liệu abstractString, Integer, Date

Ví dụ: E-Commerce Logical Model

┌─────────────────────────────────────────────────────────────────┐
│                  E-COMMERCE LOGICAL MODEL                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────┐      ┌─────────────────────┐           │
│  │      Customer       │      │       Order         │           │
│  ├─────────────────────┤      ├─────────────────────┤           │
│  │ PK customer_id      │──┐   │ PK order_id         │           │
│  │    email            │  │   │ FK customer_id      │◄──────────┤
│  │    first_name       │  │   │    order_date       │           │
│  │    last_name        │  │   │    status           │           │
│  │    created_at       │  │   │    total_amount     │           │
│  └─────────────────────┘  │   └─────────────────────┘           │
│                           │              │                      │
│                           └──────────────┘                      │
│                                          │                      │
│  ┌─────────────────────┐      ┌─────────────────────┐           │
│  │      Product        │      │     OrderItem       │           │
│  ├─────────────────────┤      ├─────────────────────┤           │
│  │ PK product_id       │◄─────│ PK order_item_id    │           │
│  │ FK category_id      │      │ FK order_id         │◄──────────┤
│  │    name             │      │ FK product_id       │           │
│  │    description      │      │    quantity         │           │
│  │    price            │      │    unit_price       │           │
│  │    stock_quantity   │      └─────────────────────┘           │
│  └─────────────────────┘                                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Physical Data Model

Mục đích

  • Database-specific implementation
  • Define exact data types (VARCHAR(255), BIGINT, etc.)
  • Specify indexes, constraints, partitions
  • Consider storageperformance

Components

ComponentMô tảVí dụ
TablePhysical storage structurecustomers, orders
ColumnDatabase column với specific typeVARCHAR(100), BIGINT
IndexPerformance optimizationidx_customer_email
ConstraintData integrity rulesNOT NULL, UNIQUE, CHECK
PartitionData distribution strategyPARTITION BY RANGE

Ví dụ: PostgreSQL Physical Model

sql
-- Physical Model: PostgreSQL Implementation
CREATE TABLE customers (
    customer_id     BIGSERIAL PRIMARY KEY,
    email           VARCHAR(255) NOT NULL UNIQUE,
    first_name      VARCHAR(100) NOT NULL,
    last_name       VARCHAR(100) NOT NULL,
    created_at      TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at      TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX idx_customers_email ON customers(email);
CREATE INDEX idx_customers_created_at ON customers(created_at);

CREATE TABLE orders (
    order_id        BIGSERIAL PRIMARY KEY,
    customer_id     BIGINT NOT NULL REFERENCES customers(customer_id),
    order_date      TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    status          VARCHAR(20) NOT NULL DEFAULT 'pending'
                    CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled')),
    total_amount    DECIMAL(12, 2) NOT NULL CHECK (total_amount >= 0),
    created_at      TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_order_date ON orders(order_date);

ERD Conventions

Notation Styles

StyleCardinalitySử dụng
ChenDiamonds cho relationshipsAcademic, conceptual
Crow's FootCrow's foot symbolsIndustry standard
UMLMultiplicity numbersSoftware engineering
IDEF1XDots và linesGovernment, defense
┌─────────────────────────────────────────────────────────────────┐
│                  CROW'S FOOT NOTATION                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Symbol          Meaning              Example                   │
│  ──────          ───────              ───────                   │
│                                                                 │
│    ──┼──         One (mandatory)      Customer must exist       │
│                                                                 │
│    ──○──         Zero or One          Address is optional       │
│                                                                 │
│    ──┼<          One or Many          Order has items           │
│                                                                 │
│    ──○<          Zero or Many         Customer may have orders  │
│                                                                 │
│                                                                 │
│  RELATIONSHIP EXAMPLES:                                         │
│                                                                 │
│  ┌──────────┐              ┌──────────┐                         │
│  │ Customer │──────┼──○<───│  Order   │                         │
│  └──────────┘              └──────────┘                         │
│  "One customer has zero or many orders"                         │
│                                                                 │
│  ┌──────────┐              ┌──────────┐                         │
│  │  Order   │──────┼──┼<───│OrderItem │                         │
│  └──────────┘              └──────────┘                         │
│  "One order has one or many order items"                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Best Practices

Naming Conventions

⚠️ Consistency is King

Chọn một convention và stick với nó. Mixing conventions là nguồn gốc của bugs và confusion.

ElementConventionVí dụ
Tablessnake_case, pluralcustomers, order_items
Columnssnake_case, singularcustomer_id, created_at
Primary Keys{table_singular}_idcustomer_id, order_id
Foreign Keys{referenced_table}_idcustomer_id in orders
Indexesidx_{table}_idx_orders_customer_id
Constraints{type}{table}chk_orders_status

Common Anti-Patterns

Anti-PatternVấn đềGiải pháp
God TableMột table chứa mọi thứNormalize, tách entities
Polymorphic FKFK trỏ đến nhiều tablesSeparate tables hoặc inheritance
EAV ModelEntity-Attribute-ValueProper schema design
Soft Deletes Everywhereis_deleted flagsAudit tables, temporal tables
No ConstraintsMissing FK, CHECKAlways define constraints

🔍 Common Review Comments

⚠️ Design Review Feedback

Những comments dưới đây thường xuất hiện trong design review sessions. Hiểu và anticipate chúng sẽ giúp bạn tránh iteration loops tốn thời gian.

Review CommentÝ nghĩaCách khắc phục
"Grain statement ở đâu?"Reviewer không thấy declaration rõ ràng về granularityLuôn khai báo grain ở đầu mỗi entity/table design
"FK này tạo tight coupling cross bounded contexts"Foreign key connects hai domains khác nhau → breaking microservice boundariesSử dụng eventual consistency hoặc domain events thay vì hard FK
"Natural key vs surrogate key decision chưa documented"Trade-off giữa business meaning và technical flexibility chưa rõDocument decision trong ADR, default to surrogate cho flexibility
"Cardinality này sẽ explode khi scale"1:N relationship có thể trở thành 1:1M → performance issuesModel với partition/sharding strategy, hoặc bucket pattern
"Missing audit columns"Không có created_at, updated_at, created_byThêm audit columns cho mọi entity table, không chỉ fact tables
"Naming convention không consistent"Mix snake_case và camelCase, singular và pluralEstablish và enforce naming standard trong team
"Đây là God Entity — nên decompose"Entity có quá nhiều attributes, violates SRPTách thành multiple focused entities với clear boundaries

⚠️ Anti-patterns

❌ Architectural Debt

Những anti-patterns này không chỉ là bad practice — chúng tạo ra technical debt tích lũy theo thời gian, dẫn đến incidents, data corruption, và refactoring projects tốn millions.

Anti-pattern 1: God Table

┌─────────────────────────────────────────────────────────────────┐
│                    GOD TABLE ANTI-PATTERN                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ❌ PROBLEM:                                                    │
│  CREATE TABLE everything (                                      │
│      id, customer_name, order_date, product_name,               │
│      shipping_address, payment_method, inventory_count,         │
│      supplier_contact, employee_who_processed, ...              │
│      -- 100+ columns from different domains                     │
│  );                                                             │
│                                                                 │
│  CONSEQUENCES:                                                  │
│  • Lock contention: Mọi transaction compete for same table      │
│  • Schema coupling: Change ở domain A affects domain B          │
│  • Query performance: Full scans với 100+ columns               │
│  • Team conflicts: Multiple teams editing same table            │
│  • Migration nightmare: Mọi change là high-risk                 │
│                                                                 │
│  REAL INCIDENT: E-commerce company với 150-column orders table  │
│  → 30-minute checkout outages during peak sales                 │
│  → 6-month refactoring project to decompose                     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Anti-pattern 2: Polymorphic Foreign Key

┌─────────────────────────────────────────────────────────────────┐
│               POLYMORPHIC FK ANTI-PATTERN                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ❌ PROBLEM:                                                    │
│  CREATE TABLE comments (                                        │
│      id BIGINT PRIMARY KEY,                                     │
│      commentable_type VARCHAR(50),  -- 'Post', 'Product', etc.  │
│      commentable_id BIGINT,         -- Could be post or product │
│      content TEXT                                               │
│  );                                                             │
│                                                                 │
│  CONSEQUENCES:                                                  │
│  • No referential integrity: Database cannot enforce FK         │
│  • Query complexity: CASE statements everywhere                 │
│  • Index inefficiency: Composite index on (type, id) required   │
│  • Type safety gone: Application bugs cause orphan records      │
│  • Schema discovery: Impossible to understand relationships     │
│                                                                 │
│  BETTER APPROACH:                                               │
│  • Separate tables: post_comments, product_comments             │
│  • Or: Proper inheritance pattern (CTI, STI)                    │
│  • Or: Reference table với union view                           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Anti-pattern 3: Entity-Attribute-Value (EAV)

┌─────────────────────────────────────────────────────────────────┐
│                    EAV ANTI-PATTERN                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ❌ PROBLEM:                                                    │
│  CREATE TABLE entity_attributes (                               │
│      entity_id BIGINT,                                          │
│      attribute_name VARCHAR(100),                               │
│      attribute_value TEXT                                       │
│  );                                                             │
│                                                                 │
│  -- Query: Get all attributes for entity 123                    │
│  -- Result: 50 rows that need to be pivoted                     │
│                                                                 │
│  CONSEQUENCES:                                                  │
│  • Query hell: Simple SELECT becomes pivot nightmare            │
│  • No type safety: Everything is TEXT                           │
│  • No constraints: Cannot enforce required fields               │
│  • Index useless: Every query is full scan                      │
│  • Reporting impossible: BI tools cannot understand schema      │
│  • Storage bloat: Attribute names repeated per row              │
│                                                                 │
│  WHEN ACCEPTABLE:                                               │
│  • Truly dynamic user-defined fields (rare)                     │
│  • Feature flags / configuration (limited scope)                │
│  • NEVER for core business entities                             │
│                                                                 │
│  BETTER: JSONB column cho flexible fields + proper schema       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Anti-pattern 4: Soft Deletes Everywhere

┌─────────────────────────────────────────────────────────────────┐
│               SOFT DELETE ANTI-PATTERN                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ❌ PROBLEM:                                                    │
│  ALTER TABLE customers ADD COLUMN is_deleted BOOLEAN;           │
│  ALTER TABLE orders ADD COLUMN is_deleted BOOLEAN;              │
│  ALTER TABLE products ADD COLUMN is_deleted BOOLEAN;            │
│  -- ... every table gets is_deleted                             │
│                                                                 │
│  CONSEQUENCES:                                                  │
│  • Query pollution: WHERE is_deleted = FALSE everywhere         │
│  • Forgotten filters: Bugs showing deleted data                 │
│  • Unique constraints broken: Deleted email blocks new signup   │
│  • Storage bloat: Deleted data stays forever                    │
│  • Compliance risk: GDPR requires actual deletion               │
│  • Index bloat: Deleted rows still in indexes                   │
│                                                                 │
│  BETTER APPROACHES:                                             │
│  • Audit/History tables: Move deleted to separate table         │
│  • Temporal tables: Built-in versioning (SQL:2011)              │
│  • Event sourcing: Immutable event log                          │
│  • Hard delete + audit log: Delete real data, log event         │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Anti-pattern 5: No Constraints

┌─────────────────────────────────────────────────────────────────┐
│               MISSING CONSTRAINTS ANTI-PATTERN                  │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ❌ PROBLEM:                                                    │
│  CREATE TABLE orders (                                          │
│      id BIGINT,            -- No PRIMARY KEY!                   │
│      customer_id BIGINT,   -- No FOREIGN KEY!                   │
│      status VARCHAR(50),   -- No CHECK constraint!              │
│      total DECIMAL         -- Can be negative!                  │
│  );                                                             │
│  -- "We validate in the application"                            │
│                                                                 │
│  CONSEQUENCES:                                                  │
│  • Orphan records: Orders without customers                     │
│  • Invalid states: status = 'shiped' (typo)                     │
│  • Data corruption: Negative totals, duplicate IDs              │
│  • Trust erosion: No one trusts the data for reporting          │
│  • Debugging nightmare: "Where did this bad data come from?"    │
│                                                                 │
│  REALITY: Applications have bugs. Databases should be the       │
│  last line of defense for data integrity.                       │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

🔄 Migration Notes

Transitioning Between Model Layers

┌─────────────────────────────────────────────────────────────────┐
│               MODEL LAYER TRANSITIONS                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  CONCEPTUAL → LOGICAL:                                          │
│  ─────────────────────                                          │
│  1. Identify primary keys cho mỗi entity                        │
│  2. Resolve M:N relationships thành junction tables             │
│  3. Add foreign keys và cardinality constraints                 │
│  4. Apply normalization (target 3NF minimum)                    │
│  5. Document business rules as constraints                      │
│                                                                 │
│  COMMON PITFALLS:                                               │
│  • Skipping M:N resolution → schema không implementable         │
│  • Ignoring cardinality → wrong FK directions                   │
│  • Over-normalizing → join hell                                 │
│                                                                 │
│  LOGICAL → PHYSICAL:                                            │
│  ─────────────────────                                          │
│  1. Choose database-specific types (VARCHAR(255), BIGINT, etc.) │
│  2. Design indexes based on query patterns                      │
│  3. Plan partitioning strategy cho large tables                 │
│  4. Add audit columns (created_at, updated_at)                  │
│  5. Consider denormalization cho hot paths                      │
│                                                                 │
│  COMMON PITFALLS:                                               │
│  • Over-indexing → write performance degradation                │
│  • Wrong data types → storage bloat, comparison issues          │
│  • Ignoring partitioning → table becomes unmanageable           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Schema Evolution Considerations

PhaseBackward Compatible?Key Consideration
Conceptual changesN/ABusiness alignment, stakeholder buy-in
Logical changesDocument impactData contract implications
Physical changesCriticalZero-downtime migration required

💡 Migration Best Practice

Khi migrate từ legacy schema, luôn sử dụng expand-contract pattern:

  1. Expand: Add new columns/tables alongside old
  2. Migrate: Backfill data, update applications
  3. Contract: Remove deprecated elements after grace period

📎 Cross-References