
Primitive Data Types Deep Dive

Every primitive Rust type has a fixed, well-defined size and alignment. No exceptions.

Primitive Types Overview

Rust splits primitives into two groups:

┌────────────────────────────────────────────────────────────────────┐
│                     PRIMITIVE TYPES                                 │
├──────────────────────────────┬─────────────────────────────────────┤
│         SCALAR               │           COMPOUND                  │
│   (Single value)             │   (Multiple values)                 │
├──────────────────────────────┼─────────────────────────────────────┤
│ • Integers (i8..i128,        │ • Tuples: (T1, T2, ...)            │
│            u8..u128)         │ • Arrays: [T; N]                   │
│ • Floats (f32, f64)          │                                     │
│ • Boolean (bool)             │   → Fixed size                     │
│ • Character (char)           │   → Stack allocated                │
└──────────────────────────────┴─────────────────────────────────────┘

Scalar Types: Integers

Integer Types Table

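| Type | Size | Range | Use case |
|---|---|---|---|
| i8 | 1 byte | -128 → 127 | Tiny signed values, FFI |
| u8 | 1 byte | 0 → 255 | Bytes, ASCII chars |
| i16 | 2 bytes | -32,768 → 32,767 | Audio samples |
| u16 | 2 bytes | 0 → 65,535 | Unicode BMP, ports |
| i32 | 4 bytes | -2.1B → 2.1B | Default integer type |
| u32 | 4 bytes | 0 → 4.2B | Colors (RGBA), timestamps |
| i64 | 8 bytes | ±9.2 quintillion | Database IDs |
| u64 | 8 bytes | 0 → 18.4 quintillion | File sizes, hashes |
| i128 | 16 bytes | ±170 undecillion | Crypto, UUID |
| u128 | 16 bytes | 0 → 340 undecillion | Crypto, UUID |
| isize | Pointer-sized (4 or 8 bytes) | Platform-dependent | Pointer arithmetic |
| usize | Pointer-sized (4 or 8 bytes) | Platform-dependent | Indexing, lengths |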

Memory Layout

rust
let a: i8 = -1;      // 0xFF (two's complement)
let b: u8 = 255;     // 0xFF
let c: i32 = 42;     // 0x0000002A (little-endian: 2A 00 00 00)
Memory layout (little-endian, e.g. x86_64):

a: i8 = -1
┌────┐
│ FF │  ← 1 byte, two's complement
└────┘

b: u8 = 255
┌────┐
│ FF │  ← 1 byte, same bit pattern, different interpretation
└────┘

c: i32 = 42
┌────┬────┬────┬────┐
│ 2A │ 00 │ 00 │ 00 │  ← 4 bytes, least significant byte first
└────┴────┴────┴────┘
 Low              High
 Address          Address
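The byte patterns above can be checked with the standard integer methods `to_le_bytes`, `to_be_bytes` and `to_ne_bytes`. A small sketch; `i32_layout_le` is a hypothetical helper name, and the native-order line matches the diagram only on little-endian targets such as x86_64:

```rust
/// Hypothetical helper: the bytes of `x` in little-endian order (lowest address first).
fn i32_layout_le(x: i32) -> [u8; 4] {
    x.to_le_bytes()
}

fn main() {
    let a: i8 = -1;
    let b: u8 = 255;
    let c: i32 = 42;

    // Same bit pattern, different interpretation: -1i8 reinterpreted as u8 is 255
    assert_eq!(a as u8, b);
    assert_eq!(a.to_le_bytes(), b.to_le_bytes()); // both [0xFF]

    // Little-endian: least significant byte first, on every target
    assert_eq!(i32_layout_le(c), [0x2A, 0x00, 0x00, 0x00]);
    // Big-endian for comparison: most significant byte first
    assert_eq!(c.to_be_bytes(), [0x00, 0x00, 0x00, 0x2A]);

    // Native order of the machine running this code
    println!("native bytes of 42i32: {:02X?}", c.to_ne_bytes());
}
```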

Integer Overflow Behavior

rust
fn main() {
    let x: u8 = 255;
    
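    // Debug builds (overflow checks on): panics at runtime
    // let y = x + 1;  // thread 'main' panicked at 'attempt to add with overflow'
    // (with a constant like this one, rustc's arithmetic_overflow lint rejects it at compile time)
    
    // Release builds (overflow checks off by default): wraps around (two's complement)
    // let y = x + 1;  // y = 0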
    
    // Explicit behavior:
    let wrapped = x.wrapping_add(1);   // 0 - always wraps
    let saturated = x.saturating_add(1); // 255 - clamps at max
    let checked = x.checked_add(1);    // None - returns Option
    let overflowed = x.overflowing_add(1); // (0, true) - returns tuple
    
    println!("wrapped: {}", wrapped);     // 0
    println!("saturated: {}", saturated); // 255
    println!("checked: {:?}", checked);   // None
    println!("overflowed: {:?}", overflowed); // (0, true)
}

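⚠️ DIFFERENCE FROM C/C++

In C/C++, signed integer overflow is Undefined Behavior. Rust defines the behavior instead:

  • Debug: panic
  • Release: wrapping (configurable via the overflow-checks profile setting)

This rules out the miscompilations that UB permits and surfaces overflow bugs early, which helps prevent security vulnerabilities rooted in integer overflow.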
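Where overflow is a real risk, for example when computing an allocation size from untrusted input, the `checked_*` methods turn it into an explicit error path instead of a silently wrapped value. A minimal sketch; `buffer_size` is a hypothetical helper, not a standard API:

```rust
/// Hypothetical helper: total byte size for `count` elements of `elem_size` bytes,
/// or `None` if the multiplication would overflow `usize`.
fn buffer_size(count: usize, elem_size: usize) -> Option<usize> {
    count.checked_mul(elem_size)
}

fn main() {
    match buffer_size(1_000, 16) {
        Some(n) => println!("allocate {} bytes", n), // allocate 16000 bytes
        None => println!("size overflow, refusing to allocate"),
    }
    // A wrapped product here would yield a tiny buffer and a later out-of-bounds write
    assert_eq!(buffer_size(usize::MAX, 2), None);
}
```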


Scalar Types: Floats (IEEE 754)

Float Types

| Type | Size | Precision | Range |
|---|---|---|---|
| f32 | 4 bytes | ~7 decimal digits | ±3.4 × 10³⁸ |
| f64 | 8 bytes | ~15 decimal digits | ±1.8 × 10³⁰⁸ |

The default float type is f64 (an unsuffixed literal such as 2.0 is inferred as f64), which has enough precision for most use cases.

IEEE 754 Binary Representation

rust
let x: f32 = 3.14159;
f32 (32 bits) Layout:
┌─────┬──────────┬─────────────────────────┐
│ S   │ Exponent │       Mantissa          │
│ 1   │ 8 bits   │       23 bits           │
├─────┼──────────┼─────────────────────────┤
│  0  │ 10000000 │ 10010010000111111010000 │
└─────┴──────────┴─────────────────────────┘
  ↓       ↓              ↓
 Sign   128-127=1    1.mantissa
 (+)    (2^1)        (1.570795...)

Value = (-1)^S × 2^(E-127) × 1.M = 1 × 2 × 1.570795... ≈ 3.14159
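The three fields can be extracted from a real f32 with `f32::to_bits` plus shifts and masks. A sketch, assuming a normal (non-zero, non-subnormal) value; `f32_fields` is a hypothetical helper. For 3.14159 the stored bits are 0x40490FD0, i.e. sign 0, biased exponent 128 and mantissa 0x490FD0:

```rust
/// Hypothetical helper: splits an f32 into (sign, biased exponent, mantissa bits).
fn f32_fields(x: f32) -> (u32, u32, u32) {
    let bits = x.to_bits();
    let sign = bits >> 31;              // 1 bit
    let exponent = (bits >> 23) & 0xFF; // 8 bits, biased by 127
    let mantissa = bits & 0x7F_FFFF;    // 23 bits; the leading 1 is implicit
    (sign, exponent, mantissa)
}

fn main() {
    let x: f32 = 3.14159;
    let (s, e, m) = f32_fields(x);
    println!("sign={} exponent={:08b} mantissa={:023b}", s, e, m);
    // sign=0 exponent=10000000 mantissa=10010010000111111010000

    // Rebuild the value from the fields (valid for normal numbers only)
    let significand = 1.0 + m as f32 / (1u32 << 23) as f32;
    let value = (-1f32).powi(s as i32) * 2f32.powi(e as i32 - 127) * significand;
    assert_eq!(value, x);
}
```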

Float Precision Traps

rust
fn main() {
    // WARNING: float equality may not behave as you expect!
    let a: f64 = 0.1 + 0.2;
    let b: f64 = 0.3;
    
    println!("a = {:.17}", a);  // 0.30000000000000004
    println!("b = {:.17}", b);  // 0.29999999999999999
    println!("a == b: {}", a == b);  // false!
    
    // ✅ BETTER: compare within a tolerance (an absolute epsilon suits values near 1.0)
    let epsilon = 1e-10;
    println!("Close enough: {}", (a - b).abs() < epsilon);  // true
}
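A fixed absolute epsilon such as 1e-10 only suits values near 1.0: above roughly 1e6 the gap between adjacent f64 values already exceeds it, and for tiny values it is far too loose. A common alternative combines a relative and an absolute tolerance. A minimal sketch; `approx_eq` and its tolerance parameters are hypothetical, and suitable tolerances depend on the computation:

```rust
/// Hypothetical helper: true if `a` and `b` differ by at most `abs_tol`,
/// or by at most `rel_tol` times the larger of the two magnitudes.
fn approx_eq(a: f64, b: f64, rel_tol: f64, abs_tol: f64) -> bool {
    if a == b {
        return true; // exact matches, including infinities of the same sign
    }
    let diff = (a - b).abs(); // NaN if either input is NaN, so the checks below fail
    diff <= abs_tol || diff <= rel_tol * a.abs().max(b.abs())
}

fn main() {
    println!("{}", approx_eq(0.1 + 0.2, 0.3, 1e-9, 0.0));   // true
    println!("{}", approx_eq(1e20 + 1e5, 1e20, 1e-9, 0.0)); // true: relative error ~1e-15
    println!("{}", approx_eq(1.0, 1.001, 1e-9, 0.0));       // false
}
```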

Special Float Values

rust
fn main() {
    let inf: f64 = f64::INFINITY;      // 1.0 / 0.0
    let neg_inf: f64 = f64::NEG_INFINITY;
    let nan: f64 = f64::NAN;           // 0.0 / 0.0
    
    // NaN is NOT equal to anything, including itself!
    println!("NaN == NaN: {}", nan == nan);  // false
    println!("NaN.is_nan(): {}", nan.is_nan());  // true
}
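Because NaN breaks equality and ordering, f64 implements only PartialOrd, not Ord, so `sort()` is unavailable on a `Vec<f64>`. A sketch of the standard tools: `partial_cmp` returns `None` when NaN is involved, while `f64::total_cmp` (available in recent stable Rust) defines a total order that also places NaN. `sort_floats` is a hypothetical helper name:

```rust
use std::cmp::Ordering;

/// Hypothetical helper: sorts in place using the total order of `f64::total_cmp`.
fn sort_floats(values: &mut [f64]) {
    values.sort_by(|a, b| a.total_cmp(b));
}

fn main() {
    let nan = f64::NAN;

    // partial_cmp cannot order NaN against anything
    assert_eq!(nan.partial_cmp(&1.0), None);
    assert_eq!(1.0_f64.partial_cmp(&2.0), Some(Ordering::Less));

    // max/min return the other argument when exactly one is NaN
    assert_eq!(f64::max(1.0, nan), 1.0);

    // total_cmp never fails; -0.0 sorts before +0.0, and NaN lands at one end
    let mut v = vec![3.0, nan, f64::NEG_INFINITY, 0.0, -0.0, f64::INFINITY];
    sort_floats(&mut v);
    println!("{:?}", v);
}
```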

Scalar Types: Boolean

rust
let t: bool = true;
let f: bool = false;
Bool Memory Layout:
┌────┐
│ 01 │  true  = 0x01
└────┘
┌────┐
│ 00 │  false = 0x00
└────┘
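     1 byte (not 1 bit!)

Why 1 byte instead of 1 bit?

  • The smallest addressable unit of memory is 1 byte, and every bool needs its own address so that &bool works
  • Atomic operations (e.g. AtomicBool) operate on whole bytes, not individual bits
  • Performance: reading or writing a single bit needs extra shift and mask instructions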
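The 1-byte representation can be observed with `std::mem::size_of` and casts. A small sketch; note also that only 0x00 and 0x01 are valid bool bit patterns, and producing any other value (for example through `transmute`) is undefined behavior:

```rust
use std::mem::{align_of, size_of};

fn main() {
    // One byte per bool, even in an array: [bool; 8] does NOT pack into 1 byte
    assert_eq!(size_of::<bool>(), 1);
    assert_eq!(align_of::<bool>(), 1);
    assert_eq!(size_of::<[bool; 8]>(), 8);

    // Casting exposes the stored byte values
    assert_eq!(true as u8, 1);
    assert_eq!(false as u8, 0);

    // Each bool has its own address, which is what makes `&bool` possible
    let flags = [true, false];
    let p0 = &flags[0] as *const bool as usize;
    let p1 = &flags[1] as *const bool as usize;
    println!("address gap: {} byte", p1 - p0); // 1
}
```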

💡 OPTIMIZATION

When you need many booleans, use the bitflags crate or manual bit manipulation:

rust
let flags: u8 = 0b0000_0101;  // 8 booleans in 1 byte (here bits 0 and 2 are set)
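A minimal sketch of the manual approach: each flag is one bit of a u8, set with `|`, cleared with `& !`, toggled with `^`, and tested with `&`. The flag names `FLAG_READ`, `FLAG_WRITE`, `FLAG_EXEC` and the helper `is_set` are hypothetical; the bitflags crate wraps the same idea in a typed API:

```rust
// Hypothetical flag names, one bit each
const FLAG_READ: u8 = 1 << 0;  // 0b0000_0001
const FLAG_WRITE: u8 = 1 << 1; // 0b0000_0010
const FLAG_EXEC: u8 = 1 << 2;  // 0b0000_0100

/// Hypothetical helper: true if the bit for `flag` is set in `flags`.
fn is_set(flags: u8, flag: u8) -> bool {
    (flags & flag) != 0
}

fn main() {
    let mut flags: u8 = 0b0000_0101; // READ and EXEC set
    flags |= FLAG_WRITE;             // set WRITE   -> 0b0000_0111
    flags &= !FLAG_EXEC;             // clear EXEC  -> 0b0000_0011
    flags ^= FLAG_READ;              // toggle READ -> 0b0000_0010

    println!("{:08b}", flags); // 00000010
    assert!(is_set(flags, FLAG_WRITE));
    assert!(!is_set(flags, FLAG_READ));
}
```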

Scalar Types: Character (Unicode)

rust
let c: char = 'A';
let emoji: char = '🦀';
let chinese: char = '中';

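Why is char 4 bytes?

A Rust char represents a Unicode Scalar Value: any Unicode code point from U+0000 to U+10FFFF, excluding the surrogates U+D800 to U+DFFF. The largest value, U+10FFFF, needs 21 bits, so a char is stored as a 32-bit (4-byte) value.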

char Memory Layout (4 bytes = 32 bits):

'A' (U+0041):
┌────┬────┬────┬────┐
│ 41 │ 00 │ 00 │ 00 │  Little-endian
└────┴────┴────┴────┘

'🦀' (U+1F980 - Crab emoji):
┌────┬────┬────┬────┐
│ 80 │ F9 │ 01 │ 00 │  0x0001F980 in little-endian
└────┴────┴────┴────┘

'中' (U+4E2D - Chinese character):
┌────┬────┬────┬────┐
│ 2D │ 4E │ 00 │ 00 │  0x00004E2D
└────┴────┴────┴────┘
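The code points and their 4-byte little-endian images can be checked with `as u32` and `u32::to_le_bytes`, and `char::from_u32` shows why not every 32-bit value is a valid char. A small sketch:

```rust
fn main() {
    // A char is a Unicode scalar value; `as u32` exposes the code point
    assert_eq!('A' as u32, 0x41);
    assert_eq!('\u{1F980}' as u32, 0x1F980); // 🦀
    assert_eq!('\u{4E2D}' as u32, 0x4E2D);   // 中

    // Little-endian byte order, lowest address first
    assert_eq!(('A' as u32).to_le_bytes(), [0x41, 0x00, 0x00, 0x00]);
    assert_eq!(('\u{1F980}' as u32).to_le_bytes(), [0x80, 0xF9, 0x01, 0x00]);

    // Not every u32 is a char: surrogates and values above 0x10FFFF are rejected
    assert_eq!(char::from_u32(0x4E2D), Some('\u{4E2D}'));
    assert_eq!(char::from_u32(0xD800), None);   // surrogate
    assert_eq!(char::from_u32(0x110000), None); // beyond U+10FFFF
    println!("all char checks passed");
}
```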

char vs String bytes

rust
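fn main() {
    let c = '🦀';
    println!("char size: {} bytes", std::mem::size_of::<char>()); // 4, for every char
    println!("UTF-8 length of c: {} bytes", c.len_utf8());        // 4
    
    let s = "🦀";
    println!("UTF-8 length of s: {} bytes", s.len()); // 4 (length of the string data, not of the &str itself)
    
    // UTF-8 encoding of 🦀
    for b in s.bytes() {
        print!("{:02X} ", b);  // F0 9F A6 80
    }
    println!();
}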

Important: a char always occupies 4 bytes, whereas the same character encoded as UTF-8 (as in str and String) takes 1 to 4 bytes.
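To see the difference, `char::len_utf8` reports how many bytes a character takes in UTF-8, while `str::len` counts bytes and `chars().count()` counts scalar values. A small sketch using escape sequences for é (U+00E9), 中 and 🦀:

```rust
fn main() {
    // Same 4-byte char type, different UTF-8 lengths
    let chars = ['A', '\u{E9}', '\u{4E2D}', '\u{1F980}']; // A, é, 中, 🦀
    for c in chars {
        println!("{:?}: char = 4 bytes, UTF-8 = {} byte(s)", c, c.len_utf8());
    }
    // UTF-8 lengths are 1, 2, 3 and 4 bytes respectively

    let s: String = chars.iter().collect();
    println!("s.len() = {}", s.len());                     // 10 bytes of UTF-8
    println!("s.chars().count() = {}", s.chars().count()); // 4 chars
}
```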


Compound Types: Tuples

A tuple groups several values, which may have different types:

rust
let tup: (i32, f64, u8) = (500, 6.4, 1);

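Tuple Memory Layout

Tuple (i32, f64, u8) laid out in declaration order (the layout a #[repr(C)] struct with these fields gets):

┌────────────┬─────────┬───────────────────────────┬───────┬────────┐
│   i32      │ padding │         f64               │  u8   │ padding│
│  500       │  (4B)   │        6.4                │   1   │  (7B)  │
├────────────┼─────────┼───────────────────────────┼───────┼────────┤
│ 4 bytes    │ 4 bytes │       8 bytes             │1 byte │ 7 bytes│
└────────────┴─────────┴───────────────────────────┴───────┴────────┘
 Offset: 0    4         8                           16      17-23

Total: 24 bytes (with alignment padding)

Tuples use Rust's default representation, which does not guarantee field order. The compiler may reorder fields to reduce padding, and current rustc does so. For example:

┌───────────────────────────┬────────────┬───────┬────────┐
│         f64               │   i32      │  u8   │ padding│
│        6.4                │  500       │   1   │  (3B)  │
├───────────────────────────┼────────────┼───────┼────────┤
│       8 bytes             │ 4 bytes    │1 byte │ 3 bytes│
└───────────────────────────┴────────────┴───────┴────────┘
 Offset: 0                   8            12      13-15

Total: 16 bytes (e.g. on x86_64)

Why padding?

  • f64 requires 8-byte alignment, so in declaration order 4 padding bytes follow the i32 to start f64 at offset 8
  • After the u8, trailing padding rounds the total size up to a multiple of the largest alignment (8), so every element in an array of such tuples stays aligned
  • Placing fields in decreasing order of alignment leaves less padding: 16 bytes instead of 24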
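The padding rules can be checked with `std::mem::size_of`. A sketch comparing a `#[repr(C)]` struct, which keeps declaration order, with the plain tuple, whose default representation lets the compiler reorder fields. `CLayout` is a hypothetical struct; on a typical x86_64 target this prints 24, 16 and 8:

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct with the tuple's fields in the same order.
// repr(C) forbids reordering, so it gets the declaration-order layout.
#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    a: i32,
    b: f64,
    c: u8,
}

fn main() {
    // 4 (i32) + 4 (padding) + 8 (f64) + 1 (u8) + 7 (padding) = 24
    println!("repr(C) struct: {} bytes", size_of::<CLayout>());
    // Default representation: rustc may reorder fields to cut padding
    println!("(i32, f64, u8): {} bytes", size_of::<(i32, f64, u8)>());
    // Alignment is that of the most-aligned field (f64)
    println!("alignment:      {} bytes", align_of::<(i32, f64, u8)>());
}
```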

Tuple Access

rust
let tup = (500, 6.4, 1);

// Destructuring
let (x, y, z) = tup;

// Dot notation (0-indexed)
println!("{}, {}, {}", tup.0, tup.1, tup.2);

Compound Types: Arrays

An array is a fixed-size collection of values of the same type, stored inline (on the stack when it is a local variable):

rust
let arr: [i32; 5] = [1, 2, 3, 4, 5];
let zeros: [u8; 100] = [0; 100];  // 100 zeros

Array Memory Layout

[i32; 5] = [1, 2, 3, 4, 5]

┌─────────┬─────────┬─────────┬─────────┬─────────┐
│  1      │  2      │  3      │  4      │  5      │
├─────────┼─────────┼─────────┼─────────┼─────────┤
│ 4 bytes │ 4 bytes │ 4 bytes │ 4 bytes │ 4 bytes │
└─────────┴─────────┴─────────┴─────────┴─────────┘
 Offset:0     4         8        12        16

Total: 20 bytes, CONTIGUOUS memory
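Contiguity can be observed by measuring each element's byte offset from the start of the array. A sketch; `element_offsets` is a hypothetical helper that only compares addresses and never dereferences raw pointers:

```rust
use std::mem::size_of;

/// Hypothetical helper: byte offset of each element from the start of the slice.
fn element_offsets(values: &[i32]) -> Vec<usize> {
    let base = values.as_ptr() as usize;
    values
        .iter()
        .map(|elem| elem as *const i32 as usize - base)
        .collect()
}

fn main() {
    let arr: [i32; 5] = [1, 2, 3, 4, 5];
    // 20 bytes: 5 × 4, with no per-element header or gap
    assert_eq!(size_of::<[i32; 5]>(), 5 * size_of::<i32>());
    println!("{:?}", element_offsets(&arr)); // [0, 4, 8, 12, 16]
}
```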

Array vs Slice vs Vec

| Type | Size known at | Allocated on | Resizable |
|---|---|---|---|
| [T; N] | Compile time | Stack (inline) | No |
| &[T] | Runtime | Points into stack or heap memory | No |
| Vec<T> | Runtime | Heap | Yes |
rust
fn main() {
    let arr: [i32; 5] = [1, 2, 3, 4, 5];  // Stack: 20 bytes
    let slice: &[i32] = &arr[1..4];        // Fat pointer: 16 bytes on 64-bit (ptr + len)
    let vec: Vec<i32> = vec![1, 2, 3];     // Stack: 24 bytes on 64-bit (ptr + len + cap)
                                            // Heap: 12 bytes (data)
}
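The handle sizes in the comments can be confirmed with `size_of`, expressed in machine words so the check holds on both 32-bit and 64-bit targets. A sketch; the Vec figure reflects the standard library's current pointer, capacity and length layout rather than a documented guarantee:

```rust
use std::mem::size_of;

fn main() {
    let word = size_of::<usize>(); // 8 on 64-bit, 4 on 32-bit targets

    // The array itself is the data
    assert_eq!(size_of::<[i32; 5]>(), 20);
    // A slice reference is a fat pointer: data pointer + length
    assert_eq!(size_of::<&[i32]>(), 2 * word);
    // A Vec handle is pointer + capacity + length; the elements live on the heap
    assert_eq!(size_of::<Vec<i32>>(), 3 * word);

    let arr = [1, 2, 3, 4, 5];
    let slice = &arr[1..4];
    println!("slice = {:?}, len = {}", slice, slice.len()); // slice = [2, 3, 4], len = 3
}
```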

Bounds Checking

rust
fn main() {
    let arr = [1, 2, 3, 4, 5];
    
    let idx: usize = 10;
    // let val = arr[idx];  // Panics at runtime: index out of bounds (with a constant index like this one, rustc rejects it at compile time instead)
    
    // ✅ Safe access
    match arr.get(idx) {
        Some(val) => println!("Value: {}", val),
        None => println!("Index out of bounds"),
    }
}

💡 DIFFERENCE FROM C/C++

C/C++ arrays have no bounds checking, which leads to buffer overflow vulnerabilities. Safe Rust always checks bounds; the optimizer can elide a check when it can prove the index is in range.
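Indexing is not the only way to walk an array. A sketch contrasting an index-based loop, where each `values[i]` carries a bounds check unless the optimizer removes it, with an iterator-based loop that never indexes at all; `sum_indexed` and `sum_iter` are hypothetical names, and whether a given check is elided depends on the compiler version and optimization level:

```rust
/// Index-based loop: each `values[i]` is a bounds-checked access in principle,
/// though the optimizer can often drop the check because `i < values.len()`.
fn sum_indexed(values: &[i32]) -> i64 {
    let mut total = 0i64;
    for i in 0..values.len() {
        total += values[i] as i64;
    }
    total
}

/// Iterator-based loop: no index, so there is no per-element bounds check to elide.
fn sum_iter(values: &[i32]) -> i64 {
    values.iter().map(|&v| v as i64).sum()
}

fn main() {
    let data = [1, 2, 3, 4, 5];
    assert_eq!(sum_indexed(&data), sum_iter(&data));
    println!("{}", sum_iter(&data)); // 15
}
```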


Deep Dive: isize and usize

Architecture-Dependent Sizes

| Architecture | usize | isize |
|---|---|---|
| 32-bit (x86, ARM32) | 4 bytes | 4 bytes |
| 64-bit (x86_64, ARM64) | 8 bytes | 8 bytes |

Why do they exist?

  • Indexing into memory requires pointer-sized integers
  • Array and slice lengths must fit in the address space
  • Pointer arithmetic needs a signed type (isize) for negative offsets
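Since usize and isize track the pointer width, the relationship can be asserted at runtime or tested at compile time with `cfg!(target_pointer_width = ...)`. A small sketch:

```rust
use std::mem::size_of;

fn main() {
    // usize/isize are exactly as wide as a pointer on the current target
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());
    assert_eq!(size_of::<isize>(), size_of::<*const u8>());

    // The same width is available as a compile-time configuration value
    if cfg!(target_pointer_width = "64") {
        assert_eq!(usize::BITS, 64);
    } else if cfg!(target_pointer_width = "32") {
        assert_eq!(usize::BITS, 32);
    }
    println!("usize is {} bits here", usize::BITS);
}
```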

Use Cases

rust
fn main() {
    let arr = [1, 2, 3, 4, 5];
    
    // ✅ Array indexing MUST use usize
    let idx: usize = 2;
    println!("{}", arr[idx]);
    
    // ❌ DOES NOT COMPILE
    // let idx: u32 = 2;
    // println!("{}", arr[idx]);  // compile error (E0277): arrays cannot be indexed by `u32`; use `arr[idx as usize]`
    
    // ✅ Lengths are returned as usize
    let len: usize = arr.len();
    println!("len = {}", len);
    
    // ✅ Pointer arithmetic
    let ptr = arr.as_ptr();
    unsafe {
        let offset: isize = 2;  // Can be negative!
        // SAFETY: offset 2 stays inside the 5-element array
        let elem = *ptr.offset(offset);
        println!("elem = {}", elem);
    }
}

Portability Implications

rust
// ⚠️ WARNING: this code is not portable!
fn bad_serialize(len: usize) -> [u8; 4] {
    (len as u32).to_le_bytes()  // Silently truncates lengths above u32::MAX on 64-bit!
}

// ✅ CORRECT: serialize with a type wide enough for usize on every target
fn good_serialize(len: usize) -> [u8; 8] {
    (len as u64).to_le_bytes()  // Always fits (usize is at most 64 bits on current targets)
}
32-bit system:
usize::MAX = 4,294,967,295 (4GB address space)
┌────────────────┐
│   4 bytes      │
└────────────────┘

64-bit system:
usize::MAX = 18,446,744,073,709,551,615 (16 exabytes)
┌────────────────────────────────────┐
│             8 bytes                │
└────────────────────────────────────┘
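If the wire format must stay at 4 bytes, a fallible conversion with `u32::try_from` reports oversized lengths instead of silently truncating them. A sketch; `serialize_len_u32` is a hypothetical function:

```rust
use std::convert::TryFrom;

/// Hypothetical alternative: keep a 4-byte wire format, but fail loudly
/// instead of truncating when the length does not fit in a u32.
fn serialize_len_u32(len: usize) -> Result<[u8; 4], std::num::TryFromIntError> {
    let len32 = u32::try_from(len)?; // Err if len > u32::MAX
    Ok(len32.to_le_bytes())
}

fn main() {
    // 258 = 0x0102, little-endian bytes: 02 01 00 00
    assert_eq!(serialize_len_u32(258).ok(), Some([0x02, 0x01, 0x00, 0x00]));

    match serialize_len_u32(usize::MAX) {
        Ok(bytes) => println!("fits: {:?}", bytes),        // taken on 32-bit targets
        Err(e) => println!("refusing to truncate: {}", e), // taken on 64-bit targets
    }
}
```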

Memory Size Summary

rust
use std::mem::size_of;

fn main() {
    // Scalars
    println!("bool:  {} byte", size_of::<bool>());    // 1
    println!("i8:    {} byte", size_of::<i8>());      // 1
    println!("i32:   {} bytes", size_of::<i32>());    // 4
    println!("i64:   {} bytes", size_of::<i64>());    // 8
    println!("i128:  {} bytes", size_of::<i128>());   // 16
    println!("f32:   {} bytes", size_of::<f32>());    // 4
    println!("f64:   {} bytes", size_of::<f64>());    // 8
    println!("char:  {} bytes", size_of::<char>());   // 4
    println!("usize: {} bytes", size_of::<usize>()); // 8 (on 64-bit)
    
    // Compounds
    println!("[i32; 5]: {} bytes", size_of::<[i32; 5]>());  // 20
    println!("(i32, f64, u8): {} bytes", size_of::<(i32, f64, u8)>());  // 16 (fields reordered; 24 in declaration order)
    
    // Unit type
    println!("(): {} bytes", size_of::<()>());  // 0 (ZST)
}

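Summary Table

| Type | Size | Stack/Heap | Key insight |
|---|---|---|---|
| Integers | 1-16 bytes | Stack | Two's complement, overflow behavior defined |
| Floats | 4-8 bytes | Stack | IEEE 754, NaN needs special handling |
| bool | 1 byte | Stack | 1 byte, not 1 bit, so every bool is addressable |
| char | 4 bytes | Stack | Unicode scalar value, not a UTF-8 byte |
| Tuple | Sum of fields + padding | Stack | Heterogeneous, fixed size; field order may be rearranged |
| Array | N × size_of::<T>() | Stack | Homogeneous, fixed size, contiguous |
| usize/isize | 4 or 8 bytes | Stack | Architecture-dependent |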