How does MinIO AIStor handle the data I/O path internally?

Asked by muratkars Answered by muratkars January 4, 2026

Understanding MinIO AIStor’s internal data I/O path helps architects and operators see how the system achieves durability, consistency, and performance in distributed object storage.

Answer

MinIO AIStor implements a distributed, erasure-coded object storage system with strong consistency through quorum-based operations. The architecture ensures data durability through Reed-Solomon erasure coding while maintaining strict consistency guarantees via quorum validation on both read and write paths.


Object Write Flow

The write path ensures atomic, durable commits with erasure coding protection.

Write Sequence

Client Request
┌─────────────────────┐
│ Put Object Handler │
└─────────────────────┘
┌─────────────────────┐
│ Namespace Lock │ ← Lock on bucket/object
└─────────────────────┘
┌─────────────────────┐
│ Erasure Encoding │ ← Reed-Solomon: data + parity shards
└─────────────────────┘
┌─────────────────────┐
│ Parallel Write │ ← Workers write to all online disks
└─────────────────────┘
┌─────────────────────┐
│ Quorum Validation │ ← Verify write quorum met
└─────────────────────┘
┌─────────────────────┐
│ Atomic Commit │ ← Rename temp → final, write xl.meta
└─────────────────────┘

Write Steps

Step  Operation          Description
1     Entry              Request arrives at the Put Object handler
2     Lock Acquisition   Namespace lock obtained on the bucket/object path
3     Erasure Encoding   Data split into dataBlocks shards plus parityBlocks parity shards via Reed-Solomon
4     Parallel Write     Writer workers write encoded shards to all online disks concurrently
5     Quorum Validation  Writes verified to have succeeded on ≥ Write Quorum disks
6     Atomic Commit      Temp data renamed to its final location; xl.meta metadata written
7     Post-Write         Failed partial writes queued to MRF (Most Recent Failures) for healing
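The fan-out and quorum check in steps 4–5 can be sketched in Go. This is a simplified stand-in, not MinIO’s actual code: writeShard and the boolean “disk online” slice are hypothetical illustrations of the pattern.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// writeShard stands in for a real disk write of one encoded shard;
// here a disk marked offline simply fails. (Hypothetical helper.)
func writeShard(online bool) error {
	if !online {
		return errors.New("disk offline")
	}
	return nil
}

// writeAllShards fans the encoded shards out to every disk concurrently,
// then checks the success count against the write quorum — the same
// shape as steps 4–5 above.
func writeAllShards(disks []bool, writeQuorum int) error {
	var wg sync.WaitGroup
	errs := make([]error, len(disks))
	for i, online := range disks {
		wg.Add(1)
		go func(i int, online bool) {
			defer wg.Done()
			errs[i] = writeShard(online)
		}(i, online)
	}
	wg.Wait()

	ok := 0
	for _, err := range errs {
		if err == nil {
			ok++
		}
	}
	if ok < writeQuorum {
		return fmt.Errorf("write quorum not met: %d/%d", ok, writeQuorum)
	}
	return nil // disks that failed would be queued to MRF for healing
}

func main() {
	// 5 of 8 disks online with a quorum of 5 → the write succeeds.
	disks := []bool{true, true, true, true, true, false, false, false}
	fmt.Println(writeAllShards(disks, 5)) // prints <nil> (quorum met)
}
```

Note that the commit is only attempted after the quorum check passes, which is what keeps partial writes invisible to readers.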

Write Quorum Calculation[1]

Write Quorum = dataBlocks
(or dataBlocks + 1 if dataBlocks == parityBlocks)

Example with EC:4 (4 data + 4 parity on 8 disks):

  • Write Quorum = 4 + 1 = 5 disks must succeed
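As a sketch, the formula above maps to a small Go function. writeQuorum here is a hypothetical reimplementation of the cited defaultWQuorum logic, not the actual source:

```go
package main

import "fmt"

// writeQuorum returns dataBlocks, bumped by one when data and parity
// counts are equal — so two conflicting writes can never both reach
// quorum on disjoint halves of the disks.
func writeQuorum(dataBlocks, parityBlocks int) int {
	if dataBlocks == parityBlocks {
		return dataBlocks + 1
	}
	return dataBlocks
}

func main() {
	// EC:4 on 8 disks (4 data + 4 parity) → quorum of 5
	fmt.Println(writeQuorum(4, 4)) // prints 5
	// 6 data + 2 parity → quorum of 6
	fmt.Println(writeQuorum(6, 2)) // prints 6
}
```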

Object Read Flow

The read path prioritizes consistency and enables on-read healing for corrupted data.

Read Sequence

Client Request
┌─────────────────────┐
│ Get Object Handler │
└─────────────────────┘
┌─────────────────────┐
│ Parallel Meta Read │ ← Read xl.meta from all disks
└─────────────────────┘
┌─────────────────────┐
│ Quorum Selection │ ← Determine latest valid metadata
└─────────────────────┘
┌─────────────────────┐
│ Erasure Decoding │ ← Read from dataBlocks disks
└─────────────────────┘
┌─────────────────────┐
│ On-Read Healing │ ← Queue corrupted shards for repair
└─────────────────────┘
┌─────────────────────┐
│ Return Data │
└─────────────────────┘

Read Steps

Step  Operation          Description
1     Entry              Request arrives at the Get Object handler
2     Metadata Read      Read workers fetch xl.meta from all disks in parallel
3     Quorum Selection   MinIO’s quorum algorithm determines the latest valid metadata version
4     Erasure Decoding   Parallelized reads from dataBlocks disks reconstruct the object
5     On-Read Healing    Shards with bitrot checksum mismatches are queued for repair

Read Quorum Calculation[2]

Read Quorum = totalDisks - parityBlocks

Example with EC:4 (8 total disks, 4 parity):

  • Read Quorum = 8 - 4 = 4 disks must be available
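The read-quorum formula maps to an equally small sketch in Go (readQuorum is a hypothetical reimplementation of the cited defaultRQuorum logic):

```go
package main

import "fmt"

// readQuorum: a read needs only enough disks to cover the data shards,
// i.e. total disks minus the parity shards.
func readQuorum(totalDisks, parityBlocks int) int {
	return totalDisks - parityBlocks
}

func main() {
	// EC:4 (8 total disks, 4 parity) → 4 disks must be available.
	fmt.Println(readQuorum(8, 4)) // prints 4
}
```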

Consistency Guarantees

MinIO provides strong consistency through quorum-based operations.

Guarantee          Behavior
Write Consistency  Succeeds only when data + metadata are committed to ≥ Write Quorum disks
Read Consistency   Requires ≥ Read Quorum disks available with valid data
Atomicity          Partial writes are never visible to readers; failures trigger a full rollback
Durability         Data survives up to parityBlocks disk failures

Consistency Model

Strong Consistency:
- Read-after-write: Guaranteed (same or different client)
- List-after-write: Guaranteed
- No stale reads: Quorum ensures latest committed version

Failure Handling:
- Write failure → Full rollback, no partial data visible
- Read with degraded disks → Reconstruct from available shards
- Bitrot detection → On-read healing queues repairs

Key Components

xl.meta

The metadata file stored alongside each object containing:

  • Object version information
  • Erasure coding parameters
  • Checksum data for bitrot detection
  • Part information for multipart uploads
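As a rough mental model, the fields listed above could be sketched as a Go struct. This is a hypothetical, heavily simplified view for illustration only; the real xl.meta is a versioned binary structure with many more fields.

```go
package main

import "fmt"

// xlMeta is a hypothetical, simplified view of the xl.meta contents
// described above — not MinIO's actual on-disk format.
type xlMeta struct {
	VersionID    string   // object version information
	DataBlocks   int      // erasure coding parameters
	ParityBlocks int      //   (data and parity shard counts)
	Checksums    [][]byte // per-shard checksums for bitrot detection
	PartSizes    []int64  // part information for multipart uploads
}

func main() {
	m := xlMeta{VersionID: "v1", DataBlocks: 4, ParityBlocks: 4}
	fmt.Println(m.DataBlocks + m.ParityBlocks) // total shards: prints 8
}
```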

MRF (Most Recent Failures)

A queue system that tracks:

  • Partial operations that achieved quorum but didn’t write to all disks
  • Corruptions detected during reads, awaiting background healing

By replaying these queued entries, MRF ensures eventual consistency for degraded operations.

Namespace Locking

Distributed locking mechanism that:

  • Prevents concurrent writes to same object
  • Ensures serializable operations
  • Coordinates across all nodes in the erasure set
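A single-node sketch of this pattern is one mutex per bucket/object path, serializing writers to the same object. The nsLock type below is a hypothetical illustration; the real mechanism coordinates the lock across all nodes in the erasure set rather than within one process.

```go
package main

import (
	"fmt"
	"sync"
)

// nsLock hands out one mutex per namespace path, so concurrent writes
// to the same object serialize while writes to different objects don't.
type nsLock struct {
	mu    sync.Mutex             // guards the map itself
	locks map[string]*sync.Mutex // one lock per bucket/object path
}

func newNSLock() *nsLock {
	return &nsLock{locks: map[string]*sync.Mutex{}}
}

// lock acquires (creating if needed) the mutex for the given path.
func (n *nsLock) lock(path string) *sync.Mutex {
	n.mu.Lock()
	m, ok := n.locks[path]
	if !ok {
		m = &sync.Mutex{}
		n.locks[path] = m
	}
	n.mu.Unlock()
	m.Lock()
	return m
}

func main() {
	nl := newNSLock()
	m := nl.lock("bucket/object")
	// ... perform the erasure-coded write under the lock ...
	m.Unlock()
	fmt.Println("released") // prints released
}
```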

Performance Characteristics

Operation  Parallelism                         Limiting Factor
Write      All disks written concurrently      Slowest disk in the quorum
Read       dataBlocks disks read concurrently  Reconstruction overhead if degraded
Metadata   All disks queried in parallel       Quorum response time

Optimization Tips

  • Balanced erasure sets: Ensure similar disk performance within each set
  • Network bandwidth: Size network for parallel shard transfers
  • Disk health: Monitor for slow disks that impact quorum operations

Source Code References
  1. cmd/erasure.go:71-77 - defaultWQuorum(): Write quorum = dataCount (or dataCount + 1 if dataCount == parityCount)
  2. cmd/erasure.go:80-82 - defaultRQuorum(): Read quorum = setDriveCount - defaultParityCount