Primitive Data Types Deep Dive
Every primitive Rust type has a size and alignment fixed at compile time for its target, with no exceptions.
Primitive Types Overview
Rust divides its primitive types into two groups:
```
┌────────────────────────────────────────────────────────────────────┐
│                          PRIMITIVE TYPES                           │
├──────────────────────────────┬─────────────────────────────────────┤
│ SCALAR                       │ COMPOUND                            │
│ (Single value)               │ (Multiple values)                   │
├──────────────────────────────┼─────────────────────────────────────┤
│ • Integers (i8..i128,        │ • Tuples: (T1, T2, ...)             │
│   u8..u128)                  │ • Arrays: [T; N]                    │
│ • Floats (f32, f64)          │                                     │
│ • Boolean (bool)             │ → Fixed size                        │
│ • Character (char)           │ → Stack allocated                   │
└──────────────────────────────┴─────────────────────────────────────┘
```
Scalar Types: Integers
Integer Types Table
| Type | Size | Range | Use Case |
|---|---|---|---|
| i8 | 1 byte | -128 → 127 | Tiny signed values, FFI |
| u8 | 1 byte | 0 → 255 | Bytes, ASCII chars |
| i16 | 2 bytes | -32,768 → 32,767 | Audio samples |
| u16 | 2 bytes | 0 → 65,535 | Unicode BMP, ports |
| i32 | 4 bytes | -2.1B → 2.1B | Default integer type |
| u32 | 4 bytes | 0 → 4.3B | Colors (RGBA), timestamps |
| i64 | 8 bytes | ±9.2 quintillion | Database IDs |
| u64 | 8 bytes | 0 → 18.4 quintillion | File sizes, hashes |
| i128 | 16 bytes | ±170 undecillion | Crypto, UUID |
| u128 | 16 bytes | 0 → 340 undecillion | Crypto, UUID |
| isize | 4 or 8 bytes (pointer width) | Depends on platform | Pointer arithmetic |
| usize | 4 or 8 bytes (pointer width) | Depends on platform | Indexing, lengths |
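You do not have to memorize the ranges in the table. Every integer type exposes its range through the associated constants `MIN` and `MAX`. Here is a minimal sketch that prints a few of them:

```rust
fn main() {
    // Each integer type carries its range as associated constants.
    println!("i8:    {} ..= {}", i8::MIN, i8::MAX);
    println!("u16:   {} ..= {}", u16::MIN, u16::MAX);
    println!("i32:   {} ..= {}", i32::MIN, i32::MAX);
    println!("u32:   {} ..= {}", u32::MIN, u32::MAX);
    println!("u64:   {} ..= {}", u64::MIN, u64::MAX);
    println!("u128:  {} ..= {}", u128::MIN, u128::MAX);
    // The usize range depends on the target's pointer width.
    println!("usize: {} ..= {}", usize::MIN, usize::MAX);
}
```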
Memory Layout
```rust
let a: i8 = -1;   // 0xFF (two's complement)
let b: u8 = 255;  // 0xFF
let c: i32 = 42;  // 0x0000002A (little-endian bytes: 2A 00 00 00)
assert_eq!(a as u8, b);                                // same bit pattern
assert_eq!(c.to_le_bytes(), [0x2A, 0x00, 0x00, 0x00]); // least significant byte first
```
```
Memory Layout (Little-Endian, e.g. x86_64):
a: i8 = -1
┌────┐
│ FF │ ← 1 byte, two's complement
└────┘
b: u8 = 255
┌────┐
│ FF │ ← 1 byte, same bit pattern, different interpretation
└────┘
c: i32 = 42
┌────┬────┬────┬────┐
│ 2A │ 00 │ 00 │ 00 │ ← 4 bytes, least significant byte first
└────┴────┴────┴────┘
Low              High
address       address
```
Integer Overflow Behavior
```rust
fn main() {
    let x: u8 = 255;

    // DEBUG build: an overflow at runtime panics with "attempt to add with overflow".
    // RELEASE build (default settings): it wraps (two's complement), so 255 + 1 == 0.
    // Note: this exact line does not compile at all, because rustc sees the
    // constant overflow and the deny-by-default `arithmetic_overflow` lint rejects it.
    // let y = x + 1;

    // Explicit behavior, identical in debug and release:
    let wrapped = x.wrapping_add(1);       // 0: always wraps
    let saturated = x.saturating_add(1);   // 255: clamps at max
    let checked = x.checked_add(1);        // None: returns Option
    let overflowed = x.overflowing_add(1); // (0, true): returns (value, overflowed)
    println!("wrapped: {}", wrapped);          // 0
    println!("saturated: {}", saturated);      // 255
    println!("checked: {:?}", checked);        // None
    println!("overflowed: {:?}", overflowed);  // (0, true)
}
```
⚠️ C/C++ DIFFERENCE
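In C/C++, signed integer overflow is undefined behavior (unsigned overflow wraps). Rust defines the behavior in every build:
- Debug: panic
- Release: wrapping (configurable with the `overflow-checks` setting of a Cargo profile)
Because overflow is never undefined behavior, Rust avoids a whole class of integer-overflow security vulnerabilities. An unexpected wrap can still cause logic bugs in release builds.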
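The explicit methods matter most where overflow would be a security problem, such as computing an allocation size from untrusted input. The sketch below uses a hypothetical helper, `buffer_len` (it is not part of any library). It chains `checked_mul` with the `?` operator, so any overflow produces `None` in both debug and release builds instead of a silently wrapped size:

```rust
/// Hypothetical helper: byte length of a width x height image buffer.
/// Returns None instead of wrapping if any multiplication overflows u32.
fn buffer_len(width: u32, height: u32, bytes_per_pixel: u32) -> Option<u32> {
    width.checked_mul(height)?.checked_mul(bytes_per_pixel)
}

fn main() {
    println!("{:?}", buffer_len(1920, 1080, 4));  // Some(8294400)
    println!("{:?}", buffer_len(u32::MAX, 2, 1)); // None
}
```

Returning `Option` forces the caller to handle the overflow case before allocating anything.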
Scalar Types: Floats (IEEE 754)
Float Types
| Type | Size | Precision | Range |
|---|---|---|---|
| f32 | 4 bytes | ~7 decimal digits | ±3.4 × 10³⁸ |
| f64 | 8 bytes | ~15 decimal digits | ±1.8 × 10³⁰⁸ |
The default float type is f64, which has enough precision for most use cases.
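This sketch checks two of the claims above. First, an unannotated float literal is inferred as `f64`. Second, `f32` runs out of precision early: its 24-bit significand cannot represent 2^24 + 1 = 16,777,217, which rounds to 16,777,216, while `f64` stores that integer exactly.

```rust
use std::mem::size_of_val;

fn main() {
    // No annotation and no other constraint: the literal falls back to f64.
    let x = 2.5;
    println!("x occupies {} bytes", size_of_val(&x)); // 8

    // 2^24 + 1 is the first positive integer f32 cannot represent exactly.
    let n: u32 = 16_777_217;
    println!("as f32: {}", n as f32); // 16777216
    println!("as f64: {}", n as f64); // 16777217
}
```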
IEEE 754 Binary Representation
```rust
let x: f32 = 3.14159;
println!("{:032b}", x.to_bits()); // 01000000010010010000111111010000
```
```
f32 (32 bits) Layout:
┌─────┬──────────┬─────────────────────────┐
│  S  │ Exponent │        Mantissa         │
│  1  │  8 bits  │         23 bits         │
├─────┼──────────┼─────────────────────────┤
│  0  │ 10000000 │ 10010010000111111010000 │
└─────┴──────────┴─────────────────────────┘
   ↓        ↓                  ↓
 Sign   128 - 127 = 1      1.mantissa
  (+)     (2^1)           (≈ 1.570795)
Value = (-1)^S × 2^(E-127) × 1.M = 1 × 2^1 × 1.570795 ≈ 3.14159
```
Float Precision Traps
```rust
fn main() {
    // WARNING: float equality may not behave the way you expect!
    let a: f64 = 0.1 + 0.2;
    let b: f64 = 0.3;
    println!("a = {:.17}", a); // 0.30000000000000004
    println!("b = {:.17}", b); // 0.29999999999999999
    println!("a == b: {}", a == b); // false!
    // ✅ CORRECT: compare within a tolerance (epsilon).
    // An absolute epsilon only suits values of moderate magnitude; scale it for large ones.
    let epsilon = 1e-10;
    println!("Close enough: {}", (a - b).abs() < epsilon); // true
}
```
Special Float Values
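```rust
fn main() {
    let inf: f64 = f64::INFINITY;         // same as 1.0 / 0.0
    let neg_inf: f64 = f64::NEG_INFINITY; // same as -1.0 / 0.0
    let nan: f64 = f64::NAN;              // 0.0 / 0.0 also yields NaN
    println!("{} {}", inf, neg_inf);      // inf -inf
    // NaN is NOT equal to anything, including itself!
    // This is why f64 implements PartialEq/PartialOrd but not Eq/Ord.
    println!("NaN == NaN: {}", nan == nan);     // false
    println!("NaN.is_nan(): {}", nan.is_nan()); // true
}
```
Scalar Types: Boolean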
```rust
let t: bool = true;
let f: bool = false;
```
```
Bool Memory Layout:
┌────┐
│ 01 │ true = 0x01
└────┘
┌────┐
│ 00 │ false = 0x00
└────┘
1 byte (not 1 bit!)
```
Why 1 byte instead of 1 bit?
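- The smallest addressable unit of memory is 1 byte, and a `&bool` must be able to point at its own value
- Atomic operations (e.g. `AtomicBool`) require byte-aligned data
- Performance: reading or writing a single bit needs extra shift and mask instructions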
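You can check these sizes with `std::mem::size_of`. Only the bit patterns `0x00` and `0x01` are valid for a `bool`; storing any other byte is undefined behavior. The compiler can therefore use the unused patterns, which is why `Option<bool>` still fits in 1 byte with current rustc. A minimal sketch:

```rust
use std::mem::size_of;

fn main() {
    // One byte per bool, so eight bools take eight bytes.
    println!("bool:         {} byte", size_of::<bool>());         // 1
    println!("[bool; 8]:    {} bytes", size_of::<[bool; 8]>());   // 8
    // None is stored in one of the invalid bit patterns (a "niche").
    println!("Option<bool>: {} byte", size_of::<Option<bool>>()); // 1
    // Casting to an integer exposes the stored value.
    println!("{} {}", true as u8, false as u8);                   // 1 0
}
```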
💡 OPTIMIZATION
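When you need many booleans, use the bitflags crate or manual bit manipulation:
```rust
let flags: u8 = 0b0000_0101;           // up to 8 booleans in 1 byte (bits 0 and 2 set)
let bit2_set = flags & (1 << 2) != 0;  // true: test a single flag with a mask
```
Scalar Types: Character (Unicode)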
```rust
let c: char = 'A';
let emoji: char = '🦀';
let chinese: char = '中';
```
Why is char 4 bytes?
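A Rust char represents a Unicode Scalar Value: any Unicode code point from U+0000 to U+10FFFF except the surrogates (U+D800 to U+DFFF). The largest value, U+10FFFF, needs 21 bits, so a char is stored as a 32-bit (4-byte) integer.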
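Rust enforces the scalar-value rule when a char is constructed: `char::from_u32` returns `None` for surrogates and for anything above U+10FFFF. The short sketch below shows this and also works out how many bits the largest `char` needs:

```rust
fn main() {
    println!("{:?}", char::from_u32(0x41));      // Some('A')
    println!("{:?}", char::from_u32(0xD800));    // None (surrogate)
    println!("{:?}", char::from_u32(0x11_0000)); // None (above U+10FFFF)

    let max = char::MAX as u32;
    // 0x10FFFF has 11 leading zero bits in a u32, so it needs 32 - 11 = 21 bits.
    println!("char::MAX = U+{:X}, needs {} bits", max, u32::BITS - max.leading_zeros());
    // char::MAX = U+10FFFF, needs 21 bits
}
```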
```
char Memory Layout (4 bytes = 32 bits):
'A' (U+0041):
┌────┬────┬────┬────┐
│ 41 │ 00 │ 00 │ 00 │ Little-endian
└────┴────┴────┴────┘
'🦀' (U+1F980, crab emoji):
┌────┬────┬────┬────┐
│ 80 │ F9 │ 01 │ 00 │ 0x0001F980 in little-endian
└────┴────┴────┴────┘
'中' (U+4E2D, Chinese character):
┌────┬────┬────┬────┐
│ 2D │ 4E │ 00 │ 00 │ 0x00004E2D in little-endian
└────┴────┴────┴────┘
```
char vs String bytes
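```rust
fn main() {
    let c = '🦀';
    println!("char size: {} bytes", std::mem::size_of::<char>()); // 4
    println!("UTF-8 length of c: {} bytes", c.len_utf8());        // 4
    let s = "🦀";
    // s.len() is the length of the UTF-8 data, not the size of the &str itself
    // (a &str is a 16-byte fat pointer on 64-bit targets).
    println!("s.len(): {} bytes", s.len()); // 4 (UTF-8 encoded)
    // UTF-8 encoding of 🦀
    for b in s.bytes() {
        print!("{:02X} ", b); // F0 9F A6 80
    }
    println!();
}
```
Important: a char (fixed 4 bytes) is not the same as its UTF-8 encoding (1 to 4 bytes, variable length).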
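To see how the UTF-8 length varies across scripts, the sketch below encodes a few characters with `char::encode_utf8` into a 4-byte buffer (4 bytes is the maximum UTF-8 length) and compares the result with the fixed 4-byte `char`:

```rust
use std::mem::size_of;

fn main() {
    for c in ['A', 'é', '中', '🦀'] {
        // A 4-byte buffer is always large enough for one char's UTF-8 form.
        let mut buf = [0u8; 4];
        let encoded: &str = c.encode_utf8(&mut buf);
        println!(
            "{}: char = {} bytes, UTF-8 = {} byte(s) {:02X?}",
            c,
            size_of::<char>(),
            encoded.len(),
            encoded.as_bytes()
        );
    }
}
```

The UTF-8 lengths come out as 1, 2, 3 and 4 bytes respectively, while `size_of::<char>()` stays 4 every time.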
Compound Types: Tuples
A tuple groups several values, which may have different types:
```rust
let tup: (i32, f64, u8) = (500, 6.4, 1);
```
Tuple Memory Layout
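```
Tuple (i32, f64, u8) laid out in declaration order (C-style layout):
┌──────────┬─────────┬───────────────┬──────┬─────────┐
│   i32    │ padding │      f64      │  u8  │ padding │
│   500    │         │      6.4      │  1   │         │
├──────────┼─────────┼───────────────┼──────┼─────────┤
│ 4 bytes  │ 4 bytes │    8 bytes    │ 1 B  │ 7 bytes │
└──────────┴─────────┴───────────────┴──────┴─────────┘
Offsets: i32 at 0, padding 4..=7, f64 at 8, u8 at 16, padding 17..=23
Total: 24 bytes (with alignment padding)
```
Why is there padding?
- `f64` requires 8-byte alignment (on x86_64), so `i32` (4 bytes) is followed by 4 bytes of padding and `f64` starts at offset 8
- After `u8`, padding rounds the total size up to a multiple of the largest alignment (8)
However, tuples use Rust's default layout (`repr(Rust)`), which does not guarantee field order. rustc reorders fields to reduce padding, so in practice `size_of::<(i32, f64, u8)>()` is 16 bytes on x86_64, not 24. Only a `#[repr(C)]` struct with the same fields guarantees the declaration-order layout shown above.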
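You can observe the difference directly. The sketch below defines a hypothetical struct, `ReprC`, with the same three fields as the tuple and marks it `#[repr(C)]`, which pins the fields to declaration order; the tuple keeps the default layout. The sizes in the comments are what current rustc reports on x86_64. The default layout is not a stability guarantee.

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct mirroring (i32, f64, u8) with C layout:
// fields stay in declaration order, so padding is inserted as in the diagram.
#[allow(dead_code)]
#[repr(C)]
struct ReprC {
    a: i32,
    b: f64,
    c: u8,
}

fn main() {
    println!("#[repr(C)] struct:  {} bytes", size_of::<ReprC>());          // 24 on x86_64
    println!("tuple (repr(Rust)): {} bytes", size_of::<(i32, f64, u8)>()); // 16 on x86_64
    // Both take the alignment of their most-aligned field (f64).
    println!("alignment: {} and {}", align_of::<ReprC>(), align_of::<(i32, f64, u8)>());
}
```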
Tuple Access
```rust
let tup = (500, 6.4, 1);
// Destructuring
let (x, y, z) = tup;
println!("{}, {}, {}", x, y, z);
// Dot notation (0-indexed)
println!("{}, {}, {}", tup.0, tup.1, tup.2);
```
Compound Types: Arrays
An array is a fixed-size collection of elements of one type, stored inline (on the stack when it is a local variable):
```rust
let arr: [i32; 5] = [1, 2, 3, 4, 5];
let zeros: [u8; 100] = [0; 100]; // 100 zeros
```
Array Memory Layout
```
[i32; 5] = [1, 2, 3, 4, 5]
┌─────────┬─────────┬─────────┬─────────┬─────────┐
│    1    │    2    │    3    │    4    │    5    │
├─────────┼─────────┼─────────┼─────────┼─────────┤
│ 4 bytes │ 4 bytes │ 4 bytes │ 4 bytes │ 4 bytes │
└─────────┴─────────┴─────────┴─────────┴─────────┘
0         4         8         12        16        20   (byte offsets)
Total: 20 bytes, CONTIGUOUS memory, no padding between elements
```
Array vs Slice vs Vec
| Type | Size known at | Allocated on | Resizable |
|---|---|---|---|
| `[T; N]` | Compile-time | Stack | ❌ |
| `&[T]` | Runtime | Points to stack/heap | ❌ |
| `Vec<T>` | Runtime | Heap | ✅ |
```rust
fn main() {
    let arr: [i32; 5] = [1, 2, 3, 4, 5]; // Stack: 20 bytes
    let slice: &[i32] = &arr[1..4];      // Fat pointer: 16 bytes on 64-bit (ptr + len)
    let vec: Vec<i32> = vec![1, 2, 3];   // Stack: 24 bytes on 64-bit (ptr + len + cap)
                                         // Heap: 12 bytes (data)
    println!("{:?} {:?} {:?}", arr, slice, vec);
}
```
Bounds Checking
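```rust
fn main() {
    let arr = [1, 2, 3, 4, 5];
    let idx: usize = 10;
    // let val = arr[idx]; // index out of bounds: panics at runtime
    //                     // (with a constant index like this one, rustc's deny-by-default
    //                     // `unconditional_panic` lint reports it at compile time instead)
    // ✅ Safe access
    match arr.get(idx) {
        Some(val) => println!("Value: {}", val),
        None => println!("Index out of bounds"),
    }
}
```
💡 C/C++ DIFFERENCE
C and C++ do not bounds-check raw array or pointer indexing, which leads to buffer overflow vulnerabilities. Rust always checks bounds when indexing (the optimizer may elide the check when it can prove the index is in range).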
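One practical consequence is that code which avoids explicit indexing never pays for a bounds check at all. The sketch below compares an indexed loop with an iterator. In the loop, each `v[i]` is checked, although the optimizer can often prove `i < v.len()` from the loop range and drop the check. The iterator walks the slice without indexing:

```rust
fn sum_indexed(v: &[i32]) -> i32 {
    let mut total = 0;
    for i in 0..v.len() {
        total += v[i]; // bounds-checked index (the check may be optimized away)
    }
    total
}

fn sum_iter(v: &[i32]) -> i32 {
    // No indexing, so there is no bounds check to perform.
    v.iter().sum()
}

fn main() {
    let arr = [1, 2, 3, 4, 5];
    println!("{} {}", sum_indexed(&arr), sum_iter(&arr)); // 15 15
}
```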
Deep Dive: isize and usize
Architecture-Dependent Sizes
| Architecture | usize | isize |
|---|---|---|
| 32-bit (x86, ARM32) | 4 bytes | 4 bytes |
| 64-bit (x86_64, ARM64) | 8 bytes | 8 bytes |
Why do they exist?
- Indexing into memory requires pointer-sized integers
- Array and slice lengths must fit in the address space
- Pointer arithmetic needs a signed type (`isize`) for negative offsets
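You can check the "pointer-sized" property directly. This minimal sketch compares `usize` with a raw pointer and reads the width from `usize::BITS` and from the `target_pointer_width` cfg:

```rust
use std::mem::size_of;

fn main() {
    // usize and isize are as wide as a pointer on the compilation target.
    println!("usize: {} bytes, *const u8: {} bytes", size_of::<usize>(), size_of::<*const u8>());
    println!("isize: {} bytes", size_of::<isize>());
    println!("usize::BITS = {}", usize::BITS);

    if cfg!(target_pointer_width = "64") {
        println!("64-bit target");
    } else if cfg!(target_pointer_width = "32") {
        println!("32-bit target");
    }
}
```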
Use Cases
```rust
fn main() {
    let arr = [1, 2, 3, 4, 5];
    // ✅ Array indexing MUST use usize
    let idx: usize = 2;
    println!("{}", arr[idx]);
    // ❌ DOES NOT COMPILE
    // let idx: u32 = 2;
    // println!("{}", arr[idx]); // error[E0277]: ... cannot be indexed by `u32`
    // ✅ Lengths are returned as usize
    let len: usize = arr.len();
    println!("len = {}", len);
    // ✅ Pointer arithmetic uses isize offsets
    let ptr = arr.as_ptr();
    let offset: isize = 2; // Can be negative!
    // SAFETY: offset 2 stays inside the 5-element array.
    let elem = unsafe { *ptr.offset(offset) };
    println!("elem = {}", elem); // 3
}
```
Portability Implications
```rust
// ⚠️ WARNING: this code is not portable!
fn bad_serialize(len: usize) -> [u8; 4] {
    (len as u32).to_le_bytes() // Silently truncates lengths above u32::MAX on 64-bit!
}
// ✅ CORRECT: serialize with a fixed width large enough for any usize
fn good_serialize(len: usize) -> [u8; 8] {
    (len as u64).to_le_bytes() // Always fits: usize is at most 64 bits on current targets
}
```
```
32-bit system:
usize::MAX = 4,294,967,295 (4 GiB address space)
┌────────────────┐
│    4 bytes     │
└────────────────┘
64-bit system:
usize::MAX = 18,446,744,073,709,551,615 (16 EiB address space)
┌────────────────────────────────┐
│            8 bytes             │
└────────────────────────────────┘
```
Memory Size Summary
```rust
use std::mem::size_of;

fn main() {
    // Scalars
    println!("bool: {} byte", size_of::<bool>());    // 1
    println!("i8: {} byte", size_of::<i8>());        // 1
    println!("i32: {} bytes", size_of::<i32>());     // 4
    println!("i64: {} bytes", size_of::<i64>());     // 8
    println!("i128: {} bytes", size_of::<i128>());   // 16
    println!("f32: {} bytes", size_of::<f32>());     // 4
    println!("f64: {} bytes", size_of::<f64>());     // 8
    println!("char: {} bytes", size_of::<char>());   // 4
    println!("usize: {} bytes", size_of::<usize>()); // 8 (on 64-bit)
    // Compounds
    println!("[i32; 5]: {} bytes", size_of::<[i32; 5]>()); // 20
    // Tuple fields may be reordered: 16 with current rustc on x86_64, not 24
    println!("(i32, f64, u8): {} bytes", size_of::<(i32, f64, u8)>()); // 16
    // Unit type
    println!("(): {} bytes", size_of::<()>()); // 0 (zero-sized type, ZST)
}
```
Summary Table
| Type | Size | Stack/Heap | Key Insight |
|---|---|---|---|
| Integers | 1-16 bytes | Stack | Two's complement, overflow defined |
| Floats | 4-8 bytes | Stack | IEEE 754, NaN special handling |
| bool | 1 byte | Stack | Not 1 bit: every value must be byte-addressable |
| char | 4 bytes | Stack | Unicode scalar value, not a UTF-8 byte |
| Tuple | Sum + padding | Stack | Heterogeneous, fixed size, field order not guaranteed |
| Array | N × `size_of::<T>()` | Stack | Homogeneous, fixed size, contiguous |
| usize/isize | 4 or 8 bytes | Stack | Architecture-dependent |
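The summary lists sizes, but the opening claim also covers alignment. The companion sketch below prints `std::mem::align_of` for the same types. It also checks that each type's size is a multiple of its alignment, a rule Rust always upholds. The values for types such as `f64`, `u64`, `i128` and `usize` vary between targets, so the sketch does not claim exact numbers for them.

```rust
use std::mem::{align_of, size_of};

// Prints size and alignment for one type and checks the size/alignment invariant.
fn report<T>(name: &str) {
    let (size, align) = (size_of::<T>(), align_of::<T>());
    assert_eq!(size % align, 0, "size must be a multiple of alignment");
    println!("{:<16} size {:>2}, align {:>2}", name, size, align);
}

fn main() {
    report::<bool>("bool");
    report::<u8>("u8");
    report::<i16>("i16");
    report::<i32>("i32");
    report::<f32>("f32");
    report::<char>("char");
    report::<f64>("f64");
    report::<u64>("u64");
    report::<i128>("i128");
    report::<usize>("usize");
    report::<[i32; 5]>("[i32; 5]");
    report::<(i32, f64, u8)>("(i32, f64, u8)");
    report::<()>("()");
}
```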