How does MinIO AIStor self-healing work internally?

Asked and answered by muratkars on January 4, 2026

Understanding MinIO AIStor’s self-healing process is essential for operators who need to ensure data integrity and manage recovery operations in distributed deployments.

Answer

MinIO implements background detection and repair of corrupted or missing data with persistent progress tracking. The self-healing system continuously monitors data integrity, detects various error conditions, and automatically reconstructs damaged or missing shards using erasure coding.


Detection

MinIO uses multiple detection mechanisms to identify data integrity issues.

Detection Functions

Function                      Purpose                               Scope
----------------------------  ------------------------------------  ----------------------
Heal Objects on Disk          Evaluates individual disk health      Per-disk scanning
Check Objects with All Parts  Validates metadata + parts integrity  Full object validation
Objects That Are Dangling     Identifies unrecoverable objects      Orphan cleanup
List Online Disks             Compares modTime across disks         Version consistency

Detection Flow

┌─────────────────────────────────────────────────────────┐
│ Detection Layer │
├─────────────────────────────────────────────────────────┤
│ │
│ Scanner Process │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────┐ │
│ │ For each object in erasure set: │ │
│ │ │ │
│ │ 1. Read xl.meta from all disks │ │
│ │ 2. Compare modTime across disks │ │
│ │ 3. Validate metadata integrity │ │
│ │ 4. Check all parts exist │ │
│ │ 5. Verify checksums (deep mode) │ │
│ └─────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Error Detected? ──► Queue for Healing │
│ │
└─────────────────────────────────────────────────────────┘
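The per-object detection loop above can be sketched in a few lines of Python. This is illustrative only: MinIO itself is written in Go, and `XlMeta`, `read_xl_meta`, and `parts_exist` are hypothetical stand-ins, not real MinIO identifiers.

```python
from dataclasses import dataclass

@dataclass
class XlMeta:
    mod_time: float   # version modification time from xl.meta
    parts: list       # expected part identifiers

def detect(disks, read_xl_meta, parts_exist):
    """Return the set of disks whose copy of the object needs healing."""
    metas, to_heal = {}, set()
    for d in disks:
        try:
            metas[d] = read_xl_meta(d)            # step 1: read xl.meta
        except FileNotFoundError:
            to_heal.add(d)                        # missing metadata -> heal
    if not metas:
        return set(disks)                         # nothing readable anywhere
    latest = max(m.mod_time for m in metas.values())
    for d, m in metas.items():
        if m.mod_time < latest:                   # step 2: stale version
            to_heal.add(d)
        elif not parts_exist(d, m.parts):         # step 4: parts missing
            to_heal.add(d)
    return to_heal                                # queued for healing
```

The returned set corresponds to the "Queue for Healing" branch of the diagram.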

Error Types

The healing system detects and handles various error conditions.

Error Categories

Error Type                 Description               Detection Method
-------------------------  ------------------------  ------------------------
File Not Found[1]          Missing xl.meta metadata  Disk read failure
File Version Not Found[1]  Missing specific version  Version lookup failure
File Corrupt[1]            Corrupted xl.meta         Metadata parsing failure
Part Missing               Missing data part         Part enumeration
Part Corrupt               Corrupted data part       Checksum mismatch

Error Detection Matrix

┌─────────────────────────────────────────────────────────┐
│ Error Detection │
├─────────────────────────────────────────────────────────┤
│ │
│ xl.meta Check │
│ │ │
│ ├── Not Found ────────► File Not Found │
│ ├── Parse Error ──────► File Corrupt │
│ └── Version Missing ──► File Version Not Found │
│ │
│ Parts Check │
│ │ │
│ ├── Part Missing ─────► Part Missing │
│ └── Checksum Fail ────► Part Corrupt │
│ │
└─────────────────────────────────────────────────────────┘

Scan Modes

MinIO supports multiple scanning modes for different use cases.

Mode                 Description                   Use Case                  Performance Impact
-------------------  ----------------------------  ------------------------  ------------------
Normal Mode[2]       Regular metadata scanning     Continuous background     Low
Deep Mode[2]         Full checksum validation      Periodic integrity audit  High
Uncommitted Scan[2]  Fast dangling data detection  Quick cleanup             Medium

Mode Comparison

Normal Mode (Default)
├── Reads xl.meta only
├── Compares versions across disks
├── Fast, low I/O impact
└── Detects: Missing files, version mismatches

Deep Mode (Thorough)
├── Reads xl.meta AND all parts
├── Validates every checksum
├── High I/O, CPU usage
└── Detects: Bitrot, silent corruption

Uncommitted Scan (Cleanup)
├── Scans for orphaned temp files
├── Identifies incomplete uploads
├── Medium I/O impact
└── Detects: Dangling data, failed writes
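The practical difference between normal and deep mode is which checks run per part. A minimal Python sketch, with SHA-256 standing in for MinIO's actual bitrot hash (HighwayHash); `scan`, `part_data`, and `expected_sha` are illustrative names, not MinIO's API:

```python
import hashlib

def scan(part_data, expected_sha, deep=False):
    """Normal mode checks part presence; deep mode also verifies every checksum."""
    issues = []
    for name, want in expected_sha.items():
        if name not in part_data:
            issues.append((name, "part-missing"))
        elif deep and hashlib.sha256(part_data[name]).hexdigest() != want:
            issues.append((name, "part-corrupt"))   # bitrot / silent corruption
    return issues
```

Normal mode never reads part contents, which is why its I/O impact stays low; deep mode hashes every byte, which is why it is reserved for periodic audits.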

Healing Process

When issues are detected, MinIO executes a structured repair process.

Healing Steps

Step 1: Read xl.meta from all disks
Step 2: Determine read quorum
Step 3: Select latest metadata
Step 4: Check parts integrity
Step 5: Reconstruct missing shards
Step 6: Write to outdated disks
Step 7: Atomic commit to disks

Detailed Healing Flow

┌─────────────────────────────────────────────────────────┐
│ Healing Process │
├─────────────────────────────────────────────────────────┤
│ │
│ 1. READ METADATA │
│ └── Fetch xl.meta from all disks in erasure set │
│ │
│ 2. QUORUM CHECK │
│ └── Verify read quorum available │
│ └── If < quorum → Mark unrecoverable │
│ │
│ 3. SELECT LATEST │
│ └── Compare modTime across valid copies │
│ └── Choose most recent as authoritative │
│ │
│ 4. INTEGRITY CHECK │
│ └── Validate all parts against metadata │
│ └── Identify missing/corrupt parts │
│ │
│ 5. RECONSTRUCT │
│ └── Read available shards (data + parity) │
│ └── Reed-Solomon decode missing shards │
│ │
│ 6. WRITE REPAIRS │
│ └── Write reconstructed shards to affected disks │
│ └── Update xl.meta on repaired disks │
│ │
│ 7. ATOMIC COMMIT │
│ └── Rename temp files to final location │
│ └── Ensure all-or-nothing semantics │
│ │
└─────────────────────────────────────────────────────────┘
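Steps 2 and 3 (quorum check and latest selection) reduce to a small amount of logic. An illustrative Python sketch; `select_latest` and its shape are assumptions for exposition, not MinIO's real API:

```python
def select_latest(metas, read_quorum):
    """Steps 2-3: enforce read quorum, then pick the newest copy as authoritative.

    `metas` maps disk id -> modTime for every disk whose xl.meta was readable.
    Returns (authoritative_mod_time, outdated_disks), or None if unrecoverable.
    """
    if len(metas) < read_quorum:
        return None                          # below read quorum: mark unrecoverable
    latest = max(metas.values())
    outdated = {d for d, t in metas.items() if t < latest}
    return latest, outdated                  # outdated disks receive repairs (steps 5-6)
```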

Reconstruction Requirements

Scenario     Available Shards          Outcome
-----------  ------------------------  -------------------------------
Full health  All N shards              No healing needed
Degraded     ≥ D shards (read quorum)  Reconstruction possible
Critical     < D shards                Unrecoverable, logged as failed
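The table above follows directly from erasure-coding arithmetic: with N = D + P shards, any D surviving shards suffice to rebuild the rest. A single-parity (P = 1) XOR example makes this concrete; MinIO's actual codec is Reed-Solomon, which generalizes the same idea to P > 1:

```python
from functools import reduce

def xor_parity(shards):
    """XOR each byte column: the P = 1 special case of erasure-coded parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*shards))

# D = 3 data shards, P = 1 parity shard (N = 4); any D shards rebuild the rest.
data = [b"aa", b"bb", b"cc"]
parity = xor_parity(data)

# Lose data[1], then reconstruct it from the D = 3 surviving shards.
recovered = xor_parity([data[0], data[2], parity])
```

Lose two of the four shards here (fewer than D survive) and no amount of XOR recovers the data, which is exactly the "Critical" row above.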

Fresh Drive Healing

When a new or replacement drive is added, MinIO initiates a full drive healing process.

Progress Tracking Files[3]

File                           Purpose                            Location
-----------------------------  ---------------------------------  ----------
.healing.bin                   State file tracking progress       Drive root
.healing.failed-list.json.zst  Compressed list of failed objects  Drive root

Drive Healing Flow

New/Replacement Drive Detected
              │
              ▼
┌─────────────────────────────────────────────────────────┐
│ Initialize Healing │
│ └── Create .healing.bin state file │
└─────────────────────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────────────────────┐
│ Scan Erasure Set │
│ └── Enumerate all objects that should exist on drive │
└─────────────────────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────────────────────┐
│ For Each Object: │
│ ├── Check if shard exists on new drive │
│ ├── If missing → Reconstruct from peers │
│ ├── If corrupt → Replace with reconstructed │
│ └── Update progress in .healing.bin │
└─────────────────────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────────────────────┐
│ Retry Failed Objects │
│ └── Up to 5 attempts per object │
│ └── Failures logged to .healing.failed-list.json.zst │
└─────────────────────────────────────────────────────────┘
              │
              ▼
┌─────────────────────────────────────────────────────────┐
│ Complete │
│ └── Remove .healing.bin (success) │
│ └── Keep failed list for investigation │
└─────────────────────────────────────────────────────────┘
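The role of `.healing.bin` (resumable, atomically updated progress) can be modeled as follows. This is an illustrative sketch only: the real file is a compact binary format, not JSON, and `HealingTracker` is a hypothetical name:

```python
import json, os

class HealingTracker:
    """Sketch of persistent healing progress, loosely modeled on .healing.bin."""

    def __init__(self, path):
        self.path = path
        self.state = {"objects_scanned": 0, "objects_healed": 0, "failed": []}
        if os.path.exists(path):              # resume after a restart
            with open(path) as f:
                self.state = json.load(f)

    def update(self, scanned=0, healed=0, failed=None):
        self.state["objects_scanned"] += scanned
        self.state["objects_healed"] += healed
        if failed:
            self.state["failed"].append(failed)
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:             # write-then-rename = atomic update
            json.dump(self.state, f)
        os.replace(tmp, self.path)

    def finish(self):
        os.remove(self.path)                  # success: drop the state file
```

The write-then-rename pattern mirrors how MinIO keeps progress consistent even if the node crashes mid-update, which is what lets drive healing survive restarts.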

Retry Configuration

Parameter             Value                   Description
--------------------  ----------------------  -----------------------
Max Retries           5 attempts              Per-object retry limit
Failed List Format    JSON (zstd compressed)  Space-efficient storage
Progress Persistence  Continuous              Survives restarts
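The retry behavior above amounts to a small loop. `heal_with_retries` and `heal_once` are hypothetical names; in MinIO the surviving failures are persisted to `.healing.failed-list.json.zst` rather than returned:

```python
def heal_with_retries(objects, heal_once, max_retries=5):
    """Retry each object up to max_retries times; return the list that never healed."""
    failed = []
    for obj in objects:
        # any() stops at the first successful attempt
        if not any(heal_once(obj) for _ in range(max_retries)):
            failed.append(obj)
    return failed
```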

Background Healing Triggers

Healing can be triggered by multiple events:

Trigger            Description                   Priority
-----------------  ----------------------------  ----------------
On-Read Detection  Corruption found during read  High (immediate)
Scanner Cycle      Periodic background scan      Normal
Drive Replacement  New drive added to pool       High
Admin Command      Manual healing request        Configurable
MRF Queue          Failed write retry            Normal

Monitoring Healing

Key Metrics

Metric                  Description                      Alert Threshold
----------------------  -------------------------------  --------------------
Healing Rate            Objects healed per second        Below expected rate
Failed Objects          Objects that couldn’t be healed  > 0
Queue Depth             Objects pending healing          Growing continuously
Drive Healing Progress  Percentage complete              Stalled progress
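A stalled-progress alert, for example, can be derived from successive samples of the healed-object counter. This is an illustrative sketch, not a MinIO API:

```python
def healing_stalled(healed_counts, window=3):
    """Flag a stalled heal: the healed-object counter unchanged for `window` samples."""
    recent = healed_counts[-window:]
    return len(recent) == window and len(set(recent)) == 1
```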

Health Check Commands

# Check healing status
mc admin heal ALIAS --dry-run
# Start manual healing
mc admin heal ALIAS/bucket --recursive
# View healing progress
mc admin heal ALIAS --verbose

Best Practices

  1. Monitor failed lists: Investigate .healing.failed-list.json.zst for unrecoverable objects
  2. Schedule deep scans: Run periodic deep mode scans during low-traffic periods
  3. Replace failed drives promptly: Minimize time in degraded state
  4. Size for healing bandwidth: Ensure network can handle reconstruction I/O
  5. Track healing metrics: Alert on stalled or slow healing progress

Source Code References
  1. cmd/storage-errors.go:70,75,108 - Error definitions: errFileNotFound, errFileVersionNotFound, errFileCorrupt
  2. cmd/data-scanner.go:888-890 - Scan modes: HealNormalScan, HealDeepScan based on bitrot detection
  3. cmd/background-newdisks-heal-ops.go:42-43 - healingTrackerFilename = ".healing.bin", healingTrackerFailedList = ".healing.failed-list.json.zst"