How does MinIO AIStor erasure coding work internally?

Asked by muratkars Answered by muratkars January 4, 2026

Understanding MinIO AIStor’s erasure coding implementation helps architects design resilient storage systems and operators troubleshoot data protection issues. This answer is based on the implementation by Klaus Post, the engineer who wrote MinIO’s Reed-Solomon erasure coding.

Answer

MinIO uses Klaus Post’s Reed-Solomon implementation with configurable data/parity ratios and deterministic shard placement. The system encodes data into multiple shards distributed across disks, enabling reconstruction even when multiple disks fail. Quorum-based operations ensure consistency while maximizing availability.


Encoding Logic

How Data Is Encoded

Original Object (e.g., 2 MB)
┌─────────────────────────────────┐
│ Split into dataBlocks shards    │
└─────────────────────────────────┘
                ↓
┌─────────────────────────────────┐
│ Reed-Solomon over GF(2^8)       │ ← Galois Field 256
│ Compute parityBlocks shards     │
└─────────────────────────────────┘
                ↓
┌─────────────────────────────────┐
│ Write all shards to disks       │
└─────────────────────────────────┘

Encoding Parameters

Parameter     | Description                         | Typical Value
--------------|-------------------------------------|------------------------------
dataBlocks    | Number of data shards               | Varies by EC config
parityBlocks  | Number of parity shards             | Varies by EC config
blockSize[1]  | Size of data block before splitting | 1 MiB default
shardSize     | Size of each shard                  | ceil(blockSize / dataBlocks)

Shard Size Calculation

shardSize = ceil(blockSize / dataBlocks)

Example: 1 MiB block with 8 data blocks
shardSize = ceil(1 MiB / 8) = 128 KiB per shard

Actual shard sizes vary with the EC configuration; for smaller objects they are typically around 256 KB.
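The ceiling division above can be sketched in Go without floating point; this is an illustrative helper, not MinIO's actual code:

```go
package main

import "fmt"

// shardSize computes the per-shard size as the ceiling of
// blockSize / dataBlocks, using integer arithmetic only.
func shardSize(blockSize, dataBlocks int64) int64 {
	return (blockSize + dataBlocks - 1) / dataBlocks
}

func main() {
	// 1 MiB block split across 8 data shards → 131072 bytes (128 KiB) per shard.
	fmt.Println(shardSize(1<<20, 8))
}
```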

Library

MinIO uses the high-performance Reed-Solomon library:

github.com/klauspost/reedsolomon

Key features:

  • SIMD-optimized (AVX2, AVX512, NEON)
  • GF(2^8) - Galois Field with 256 elements
  • Concurrent encoding/decoding
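To make GF(2^8) concrete, here is a minimal carry-less multiplication sketch reducing by the polynomial 0x11d, a modulus commonly used for Reed-Solomon codes (the actual library replaces this loop with lookup tables and SIMD instructions):

```go
package main

import "fmt"

// gfMul multiplies two elements of GF(2^8) bit by bit, reducing by
// x^8 + x^4 + x^3 + x^2 + 1 (0x11d). Addition in this field is XOR,
// so sums never carry and every element fits in one byte.
func gfMul(a, b byte) byte {
	var p byte
	for b > 0 {
		if b&1 == 1 {
			p ^= a // "add" (XOR) the current multiple of a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1d // reduce modulo the field polynomial
		}
		b >>= 1
	}
	return p
}

func main() {
	fmt.Println(gfMul(3, 3))   // (x+1)^2 = x^2 + 1 = 5, no reduction needed
	fmt.Println(gfMul(2, 128)) // x * x^7 = x^8, reduced by 0x1d → 29
}
```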

Parity Block Placement

MinIO uses deterministic algorithms to distribute shards across disks.

Placement Algorithms

Algorithm        | Description                     | Use Case
-----------------|---------------------------------|--------------------
CRCMOD[2]        | Legacy CRC32-based rotation     | Older deployments
SIPMOD[2]        | SipHash-2-4 with deployment ID  | Current default
SIPMOD+PARITY[2] | Parity-aware distribution       | Optimized placement

Deterministic Ordering

Object Key + Deployment ID
            ↓
┌─────────────────────────────────┐
│ Hash Function (SipHash-2-4)     │
└─────────────────────────────────┘
            ↓
┌─────────────────────────────────┐
│ CRC32 Rotation for Ordering     │
└─────────────────────────────────┘
            ↓
┌─────────────────────────────────┐
│ Map to Erasure Set              │
└─────────────────────────────────┘
            ↓
Disk 1   Disk 2   Disk 3   ...  Disk N
 [D1]     [D2]     [P1]    ...   [PN]

Why Deterministic Placement?

  • Consistent reads: Any node can locate shards without coordination
  • Balanced distribution: Objects spread evenly across disks
  • Predictable healing: System knows where shards should be
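The hash-then-rotate idea can be sketched with the legacy CRCMOD-style approach (Go's standard library has no SipHash, so this uses CRC32; the current SIPMOD algorithm swaps in SipHash-2-4 keyed by the deployment ID, but the rotation is the same pattern):

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// hashOrder sketches deterministic placement: hash the object key,
// take it modulo the set size, and emit a rotation of the disk
// indices 1..cardinality starting at that offset. Every node running
// the same function on the same key derives the same shard layout.
func hashOrder(key string, cardinality int) []int {
	start := int(crc32.ChecksumIEEE([]byte(key)) % uint32(cardinality))
	order := make([]int, cardinality)
	for i := range order {
		order[i] = 1 + (start+i)%cardinality
	}
	return order
}

func main() {
	// Same key → same order, with no coordination between nodes.
	fmt.Println(hashOrder("bucket/object.bin", 8))
}
```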

Quorum Mathematics

Definitions

N = Total Disks in Erasure Set
P = Parity Blocks
D = Data Blocks = N - P

Quorum Formulas[3]

Quorum Type  | Formula                | Purpose
-------------|------------------------|--------------------------------------
Write Quorum | D (or D + 1 if D == P) | Minimum disks for a successful write
Read Quorum  | D = N - P              | Minimum disks for reconstruction

Why +1 When D == P?

When data and parity blocks are equal (e.g., EC:4 on 8 disks), requiring exactly D disks creates ambiguity about which version is authoritative. Adding +1 ensures a clear majority.

Example Calculations

12 disks with 4 parity (EC:4):

N = 12, P = 4
D = 12 - 4 = 8
Write Quorum = 8 (D ≠ P, so just D)
Read Quorum = 8
Fault Tolerance: Can lose 4 disks and still operate
Storage Efficiency: 8/12 = 66.7% usable

8 disks with 4 parity (EC:4):

N = 8, P = 4
D = 8 - 4 = 4
Write Quorum = 5 (D == P, so D + 1)
Read Quorum = 4
Fault Tolerance: Can lose 4 disks for reads, 3 for writes
Storage Efficiency: 4/8 = 50% usable

16 disks with 4 parity (EC:4):

N = 16, P = 4
D = 16 - 4 = 12
Write Quorum = 12
Read Quorum = 12
Fault Tolerance: Can lose 4 disks
Storage Efficiency: 12/16 = 75% usable
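The three worked examples above reduce to a few lines of Go; this is a sketch of the formulas, not MinIO's internal functions:

```go
package main

import "fmt"

// quorums applies the formulas above: read quorum is the data-block
// count D = N - P; write quorum is D, bumped to D + 1 when D == P to
// avoid ambiguity about which half of a split set is authoritative.
func quorums(n, p int) (write, read int) {
	d := n - p
	write = d
	if d == p {
		write = d + 1
	}
	return write, d
}

func main() {
	for _, cfg := range []struct{ n, p int }{{12, 4}, {8, 4}, {16, 4}} {
		w, r := quorums(cfg.n, cfg.p)
		fmt.Printf("N=%d P=%d → write quorum %d, read quorum %d\n", cfg.n, cfg.p, w, r)
	}
}
```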

Failure Handling

Error Categories

Category          | Description                                 | Resolution
------------------|---------------------------------------------|-----------------------------------
Bitrot            | Silent data corruption detected by checksum | On-read healing queues repair
Disk Not Found    | Disk offline or failed                      | Reconstruct from available shards
Missing Parts     | Partial object data                         | Heal from parity shards
Checksum Mismatch | Data integrity failure                      | Reconstruct and replace shard

Reconstruction Process

Available Shards < Total Shards?
├── No  → Direct read (no reconstruction needed)
└── Yes → Available ≥ Read Quorum?
    ├── Yes → Reed-Solomon reconstruction
    │         └── Rebuild missing shards
    └── No  → Read fails (insufficient data)

Reconstruction Requirements

  • Minimum shards needed: dataBlocks (Read Quorum)
  • Can reconstruct from: Any combination of data + parity shards
  • Performance impact: Reconstruction requires more CPU and I/O
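The decision tree above can be expressed as a small Go function; the names here are illustrative, not taken from the MinIO codebase:

```go
package main

import "fmt"

// readPlan enumerates the three outcomes of the decision tree above.
type readPlan int

const (
	directRead  readPlan = iota // all shards present, no math needed
	reconstruct                 // enough shards to run Reed-Solomon
	failRead                    // below read quorum, read fails
)

// planRead decides how a read proceeds given how many shards survive.
func planRead(available, total, dataBlocks int) readPlan {
	switch {
	case available == total:
		return directRead
	case available >= dataBlocks: // read quorum = dataBlocks
		return reconstruct
	default:
		return failRead
	}
}

func main() {
	// 12-disk set with 8 data blocks (EC:4):
	fmt.Println(planRead(12, 12, 8) == directRead)  // all shards intact
	fmt.Println(planRead(9, 12, 8) == reconstruct)  // 3 lost, still ≥ quorum
	fmt.Println(planRead(7, 12, 8) == failRead)     // 5 lost, below quorum
}
```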

Self-Test Verification[4]

MinIO runs comprehensive erasure coding verification at startup.

What Gets Tested

Startup Self-Test:
├── Test all (data, parity) configurations
│   ├── Data blocks: 2 to 14
│   └── Parity blocks: 1 to 8
├── Encode test data
├── Corrupt random shards
├── Verify reconstruction
└── Fatal error on any mismatch

Why Self-Test?

Purpose                      | Benefit
-----------------------------|----------------------------------------------------
Hardware validation          | Detects CPU/memory issues affecting encoding
Library verification         | Ensures the Reed-Solomon implementation is correct
Silent corruption prevention | Fatal error prevents serving corrupt data

Test Coverage

Configurations tested at startup:
- EC:2 through EC:8 parity levels
- Data blocks from 2 to 14
- All combinations within valid ranges
If ANY test fails → Server refuses to start

This aggressive testing ensures MinIO never operates with a faulty erasure coding implementation that could silently corrupt data.
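The encode → corrupt → reconstruct → verify pattern can be illustrated with a toy single-parity XOR code standing in for Reed-Solomon (XOR parity can only survive one lost shard, so this is a sketch of the self-test loop's shape, not of MinIO's actual verification):

```go
package main

import (
	"bytes"
	"fmt"
	"math/rand"
)

// xorParity returns the XOR of all data shards — a one-parity
// stand-in for Reed-Solomon encoding.
func xorParity(shards [][]byte) []byte {
	parity := make([]byte, len(shards[0]))
	for _, s := range shards {
		for i, b := range s {
			parity[i] ^= b
		}
	}
	return parity
}

// selfTest encodes random data for each data-block count, "loses" one
// shard, rebuilds it from the parity, and verifies the result — the
// same loop shape a startup self-test uses across configurations.
func selfTest() bool {
	for dataBlocks := 2; dataBlocks <= 14; dataBlocks++ {
		shards := make([][]byte, dataBlocks)
		for i := range shards {
			shards[i] = make([]byte, 64)
			rand.Read(shards[i])
		}
		parity := xorParity(shards)

		// Rebuild a random lost shard: missing = parity XOR survivors.
		lost := rand.Intn(dataBlocks)
		rebuilt := make([]byte, 64)
		copy(rebuilt, parity)
		for i, s := range shards {
			if i == lost {
				continue
			}
			for j, b := range s {
				rebuilt[j] ^= b
			}
		}
		if !bytes.Equal(rebuilt, shards[lost]) {
			return false // a real server would refuse to start here
		}
	}
	return true
}

func main() {
	fmt.Println(selfTest())
}
```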


Storage Efficiency Reference

Erasure Set Size | Parity | Data Blocks | Efficiency | Fault Tolerance
-----------------|--------|-------------|------------|-----------------
8 disks          | 2      | 6           | 75%        | 2 disk failures
8 disks          | 4      | 4           | 50%        | 4 disk failures
12 disks         | 4      | 8           | 66.7%      | 4 disk failures
16 disks         | 4      | 12          | 75%        | 4 disk failures
16 disks         | 8      | 8           | 50%        | 8 disk failures
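Each row of the table follows from one ratio; a short sketch to reproduce it for any configuration:

```go
package main

import "fmt"

// efficiency returns usable capacity as a fraction of raw capacity:
// D = N - P data blocks out of N total disks. Fault tolerance for
// reads is simply the parity count P.
func efficiency(n, p int) float64 {
	return float64(n-p) / float64(n)
}

func main() {
	for _, cfg := range []struct{ n, p int }{{8, 2}, {8, 4}, {12, 4}, {16, 4}, {16, 8}} {
		fmt.Printf("%2d disks, EC:%d → %.1f%% usable, tolerates %d disk failures\n",
			cfg.n, cfg.p, 100*efficiency(cfg.n, cfg.p), cfg.p)
	}
}
```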

Source Code References
  1. cmd/object-api-common.go:31 - blockSizeV2 = 1 * humanize.MiByte (default block size)
  2. cmd/format-erasure.go:47-53 - Distribution algorithms: CRCMOD, SIPMOD, SIPMOD+PARITY
  3. cmd/erasure.go:71-82 - defaultWQuorum() and defaultRQuorum() implementations
  4. cmd/erasure-coding.go:137-194 - erasureSelfTest() verifies all data/parity configurations at startup