Understanding MinIO AIStor’s internal data I/O path is essential for architects and operators who need to know how the system achieves durability, consistency, and performance in distributed object storage.
Answer
MinIO AIStor implements a distributed, erasure-coded object storage system with strong consistency through quorum-based operations. The architecture ensures data durability through Reed-Solomon erasure coding while maintaining strict consistency guarantees via quorum validation on both read and write paths.
Object Write Flow
The write path ensures atomic, durable commits with erasure coding protection.
Write Sequence
```
Client Request
       │
       ▼
┌─────────────────────┐
│ Put Object Handler  │
└─────────────────────┘
       │
       ▼
┌─────────────────────┐
│   Namespace Lock    │ ← Lock on bucket/object
└─────────────────────┘
       │
       ▼
┌─────────────────────┐
│  Erasure Encoding   │ ← Reed-Solomon: data + parity shards
└─────────────────────┘
       │
       ▼
┌─────────────────────┐
│   Parallel Write    │ ← Workers write to all online disks
└─────────────────────┘
       │
       ▼
┌─────────────────────┐
│  Quorum Validation  │ ← Verify write quorum met
└─────────────────────┘
       │
       ▼
┌─────────────────────┐
│   Atomic Commit     │ ← Rename temp → final, write xl.meta
└─────────────────────┘
```
Write Steps
| Step | Operation | Description |
|---|---|---|
| 1 | Entry | Request arrives at Put Object handler |
| 2 | Lock Acquisition | Namespace lock obtained on bucket/object path |
| 3 | Erasure Encoding | Data split into dataBlocks shards with parityBlocks parity via Reed-Solomon |
| 4 | Parallel Write | Writer workers write encoded shards to all online disks concurrently |
| 5 | Quorum Validation | Verify writes succeeded on ≥ Write Quorum disks |
| 6 | Atomic Commit | Rename temp data to final location, write xl.meta metadata |
| 7 | Post-Write | Failed partial writes queued to MRF (Most Recent Failures) for healing |
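Steps 4 through 7 can be sketched as follows. This is a minimal, illustrative Go sketch, not the AIStor implementation: `writeShard`, `parallelWrite`, and the simulated offline disk are all hypothetical names, and the real system writes through per-disk worker pools with rollback of temp files.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// writeShard is a stand-in for writing one erasure-coded shard to one
// disk; here disk 6 is simulated as offline.
func writeShard(disk int, shard []byte) error {
	if disk == 6 {
		return errors.New("disk offline")
	}
	return nil
}

// parallelWrite fans shard writes out to all disks concurrently, then
// verifies that the success count meets the write quorum. Disks that
// failed after quorum was met would be queued to MRF for healing.
func parallelWrite(shards [][]byte, writeQuorum int) ([]int, error) {
	var mu sync.Mutex
	var wg sync.WaitGroup
	var failed []int
	for disk, shard := range shards {
		wg.Add(1)
		go func(disk int, shard []byte) {
			defer wg.Done()
			if err := writeShard(disk, shard); err != nil {
				mu.Lock()
				failed = append(failed, disk)
				mu.Unlock()
			}
		}(disk, shard)
	}
	wg.Wait()
	if len(shards)-len(failed) < writeQuorum {
		return failed, errors.New("write quorum not met: rolling back")
	}
	return failed, nil // failed disks go to the MRF heal queue
}

func main() {
	shards := make([][]byte, 8) // 4 data + 4 parity
	failed, err := parallelWrite(shards, 5)
	fmt.Println(failed, err) // quorum met despite one offline disk
}
```

With one of eight disks offline, seven writes succeed, which exceeds the quorum of five, so the commit proceeds and the missed disk is left for healing.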
Write Quorum Calculation[1]
Write Quorum = dataBlocks (or dataBlocks + 1 if dataBlocks == parityBlocks)

Example with EC:4 (4 data + 4 parity on 8 disks):
- Write Quorum = 4 + 1 = 5 disks must succeed
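The rule can be sketched in Go, mirroring the `defaultWQuorum()` logic cited in the Source Code References (the function body here is an illustrative reconstruction, not the exact AIStor source):

```go
package main

import "fmt"

// defaultWQuorum: the write quorum equals the number of data blocks,
// plus one when data and parity counts are equal, so that two
// conflicting half-writes can never both claim quorum.
func defaultWQuorum(dataBlocks, parityBlocks int) int {
	if dataBlocks == parityBlocks {
		return dataBlocks + 1
	}
	return dataBlocks
}

func main() {
	fmt.Println(defaultWQuorum(4, 4)) // EC:4 on 8 disks → 5
	fmt.Println(defaultWQuorum(6, 2)) // EC:2 on 8 disks → 6
}
```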
Object Read Flow
The read path prioritizes consistency and enables on-read healing for corrupted data.
Read Sequence
```
Client Request
       │
       ▼
┌─────────────────────┐
│ Get Object Handler  │
└─────────────────────┘
       │
       ▼
┌─────────────────────┐
│ Parallel Meta Read  │ ← Read xl.meta from all disks
└─────────────────────┘
       │
       ▼
┌─────────────────────┐
│  Quorum Selection   │ ← Determine latest valid metadata
└─────────────────────┘
       │
       ▼
┌─────────────────────┐
│  Erasure Decoding   │ ← Read from dataBlocks disks
└─────────────────────┘
       │
       ▼
┌─────────────────────┐
│  On-Read Healing    │ ← Queue corrupted shards for repair
└─────────────────────┘
       │
       ▼
┌─────────────────────┐
│    Return Data      │
└─────────────────────┘
```
Read Steps
| Step | Operation | Description |
|---|---|---|
| 1 | Entry | Request arrives at Get Object handler |
| 2 | Metadata Read | Read workers fetch xl.meta from all disks in parallel |
| 3 | Quorum Selection | Quorum algorithm determines the latest valid metadata version |
| 4 | Erasure Decoding | Parallelized read from dataBlocks disks to reconstruct object |
| 5 | On-Read Healing | Corrupted shards detected via bitrot checksums queued for repair |
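Step 3 above can be sketched as a simple agreement count. This is a deliberately simplified stand-in for AIStor's metadata quorum algorithm: `latestQuorumVersion` is a hypothetical name, and real xl.meta comparison considers more than a single version number.

```go
package main

import "fmt"

// latestQuorumVersion takes the object version (e.g. the mod-time in
// xl.meta) reported by each disk and returns the newest version that
// at least readQuorum disks agree on. Disks reporting 0 are treated
// as unreadable.
func latestQuorumVersion(versions []int64, readQuorum int) (int64, bool) {
	counts := map[int64]int{}
	for _, v := range versions {
		if v != 0 {
			counts[v]++
		}
	}
	var best int64
	found := false
	for v, n := range counts {
		if n >= readQuorum && (!found || v > best) {
			best, found = v, true
		}
	}
	return best, found
}

func main() {
	// 8 disks; one lags with stale version 99, one is unreadable (0).
	versions := []int64{100, 100, 100, 100, 99, 100, 0, 100}
	v, ok := latestQuorumVersion(versions, 4)
	fmt.Println(v, ok) // the latest version with quorum agreement
}
```

Because six disks agree on version 100 and the read quorum is four, the stale and unreadable disks are simply outvoted; their shards are then candidates for on-read healing.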
Read Quorum Calculation[2]
Read Quorum = totalDisks - parityBlocks

Example with EC:4 (8 total disks, 4 parity):
- Read Quorum = 8 - 4 = 4 disks must be available
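As with the write side, this mirrors the `defaultRQuorum()` logic cited in the Source Code References (illustrative reconstruction, not the exact AIStor source):

```go
package main

import "fmt"

// defaultRQuorum: reads need just enough disks to reconstruct the
// object, i.e. totalDisks - parityBlocks (= dataBlocks). Equivalently,
// reads tolerate up to parityBlocks disk failures.
func defaultRQuorum(totalDisks, parityBlocks int) int {
	return totalDisks - parityBlocks
}

func main() {
	fmt.Println(defaultRQuorum(8, 4))  // EC:4 on 8 disks → 4
	fmt.Println(defaultRQuorum(16, 4)) // EC:4 on 16 disks → 12
}
```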
Consistency Guarantees
MinIO provides strong consistency through quorum-based operations.
| Guarantee | Behavior |
|---|---|
| Write Consistency | Succeeds only when data + metadata committed to ≥ Write Quorum disks |
| Read Consistency | Requires ≥ Read Quorum disks available with valid data |
| Atomicity | Partial writes never visible to readers; failures trigger full rollback |
| Durability | Data survives up to parityBlocks disk failures |
Consistency Model
Strong Consistency:
- Read-after-write: Guaranteed (same or different client)
- List-after-write: Guaranteed
- No stale reads: Quorum ensures latest committed version
Failure Handling:
- Write failure → Full rollback, no partial data visible
- Read with degraded disks → Reconstruct from available shards
- Bitrot detection → On-read healing queues repairs

Key Components
xl.meta
The metadata file stored alongside each object containing:
- Object version information
- Erasure coding parameters
- Checksum data for bitrot detection
- Part information for multipart uploads
MRF (Most Recent Failures)
A queue system that:
- Tracks partial operations that achieved quorum but didn’t write to all disks
- Tracks detected corruptions for background healing
- Ensures eventual consistency for degraded operations
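The MRF pattern can be sketched as a producer/consumer queue. This is a minimal sketch under stated assumptions: `healEntry` and the channel-based queue are hypothetical, and the real MRF persists entries and rate-limits healing.

```go
package main

import "fmt"

// healEntry records one partial operation for background healing.
type healEntry struct {
	bucket, object string
	failedDisks    []int
}

func main() {
	queue := make(chan healEntry, 16)

	// Producer: a write met quorum, but disk 6 was offline at the time.
	queue <- healEntry{bucket: "photos", object: "cat.jpg", failedDisks: []int{6}}
	close(queue)

	// Consumer: a background healer drains the queue and re-creates the
	// missing shards from the surviving ones.
	for e := range queue {
		fmt.Printf("healing %s/%s on disks %v\n", e.bucket, e.object, e.failedDisks)
	}
}
```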
Namespace Locking
Distributed locking mechanism that:
- Prevents concurrent writes to same object
- Ensures serializable operations
- Coordinates across all nodes in the erasure set
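The single-node core of this mechanism can be sketched as a per-path mutex map. This is a hypothetical simplification: the real system extends the idea into a distributed lock coordinated across all nodes in the erasure set.

```go
package main

import (
	"fmt"
	"sync"
)

// nsLocker serializes writers on a bucket/object path with one mutex
// per path, created lazily.
type nsLocker struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newNSLocker() *nsLocker {
	return &nsLocker{locks: make(map[string]*sync.Mutex)}
}

// lock blocks until the caller holds the lock for path, returning the
// mutex so the caller can Unlock it when done.
func (n *nsLocker) lock(path string) *sync.Mutex {
	n.mu.Lock()
	l, ok := n.locks[path]
	if !ok {
		l = &sync.Mutex{}
		n.locks[path] = l
	}
	n.mu.Unlock()
	l.Lock()
	return l
}

func main() {
	locker := newNSLocker()
	var count int
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l := locker.lock("bucket/object") // concurrent writers serialize here
			count++
			l.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println(count) // all 100 increments survive: no lost updates
}
```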
Performance Characteristics
| Operation | Parallelism | Limiting Factor |
|---|---|---|
| Write | All disks written concurrently | Slowest disk in quorum |
| Read | dataBlocks disks read concurrently | Reconstruction overhead if degraded |
| Metadata | All disks queried in parallel | Quorum response time |
Optimization Tips
- Balanced erasure sets: Ensure similar disk performance within each set
- Network bandwidth: Size network for parallel shard transfers
- Disk health: Monitor for slow disks that impact quorum operations
Source Code References
- cmd/erasure.go:71-77, defaultWQuorum(): Write quorum = dataCount (or dataCount + 1 if dataCount == parityCount)
- cmd/erasure.go:80-82, defaultRQuorum(): Read quorum = setDriveCount - defaultParityCount