What APIs are available for maintenance automation and operational tooling?

Asked by muratkars Answered by muratkars July 17, 2025

Understanding MinIO’s operational APIs is essential for building robust automation, ensuring safe maintenance procedures, and integrating with enterprise monitoring and orchestration systems.

This question addresses:

  • Available APIs for operational automation
  • Safe node maintenance procedures
  • Durability and availability queries
  • Programmatic repair operations

Answer

MinIO follows an API-first philosophy: administrative operations are exposed programmatically, so operational tasks can be automated end to end.

Available APIs and Tools

1. mc admin CLI

  • Command-line interface for all admin operations
  • REST API underneath for direct integration
  • Scriptable and automation-friendly
  • Idempotent operations for safety

2. madmin-go SDK

  • Native Go SDK for administrative operations
  • Full programmatic access to cluster management
  • Used by mc admin internally
  • Direct integration for Go applications

3. REST API

  • Direct HTTP/HTTPS endpoints
  • JSON responses for easy parsing
  • Authentication via access/secret keys (admin requests are signed with AWS Signature Version 4)
  • Language-agnostic integration

4. Prometheus Metrics Endpoint

  • Comprehensive metrics exposure
  • Real-time monitoring data
  • Grafana-ready dashboards
  • AlertManager integration support
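
As a rough illustration of consuming the metrics endpoint from a script, the sketch below parses Prometheus text-format output and extracts an offline-node count. The metric name `minio_cluster_nodes_offline_total` and the sample payload are assumptions for illustration; check the metric names your MinIO release actually exposes (typically scraped from `/minio/v2/metrics/cluster`, which may require a bearer token).

```python
def parse_metric_values(exposition_text, metric_name):
    """Return every sample value for metric_name from Prometheus text output."""
    values = []
    for raw in exposition_text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comments
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # name{labels} value [timestamp]
            name, _, after_brace = line.partition("{")
            _, closing, remainder = after_brace.rpartition("}")
            if not closing:
                continue  # malformed line, ignore it
        else:
            # name value [timestamp]
            name, _, remainder = line.partition(" ")
        if name != metric_name:
            continue
        fields = remainder.split()
        if fields:
            values.append(float(fields[0]))
    return values


def offline_node_count(exposition_text,
                       metric_name="minio_cluster_nodes_offline_total"):
    """Return the highest reported offline count, or None if the metric is absent."""
    values = parse_metric_values(exposition_text, metric_name)
    return int(max(values)) if values else None


# Hypothetical sample payload, for illustration only
SAMPLE = """\
# HELP minio_cluster_nodes_offline_total Total number of MinIO nodes offline
# TYPE minio_cluster_nodes_offline_total gauge
minio_cluster_nodes_offline_total{server="node1:9000"} 1
minio_cluster_nodes_online_total{server="node1:9000"} 3
"""

if __name__ == "__main__":
    print(offline_node_count(SAMPLE))
```

Returning `None` when the metric is missing lets a caller distinguish "no nodes offline" from "metric not found", which matters if the metric name differs in your deployment.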

Node Maintenance Operations

Query Before Maintenance:

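# Check whether node1 can be taken down without losing quorum (HTTP 200 = safe)
curl -I "http://node1:9000/minio/health/cluster?maintenance=true"
# Check specific node status
mc admin info myminio --json | jq '.info.servers[] | select(.endpoint=="node1:9000")'
# Preview what healing would repair (dry run, changes nothing)
mc admin heal myminio --dry-run
# Watch live API calls (streams until interrupted with Ctrl+C)
mc admin trace myminio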
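
For scripts that prefer not to depend on `jq`, the same node-status check can be sketched in Python. This assumes the JSON printed by `mc admin info --json` lists servers under `info.servers` with `endpoint` and `state` fields; the sample document is hypothetical, so verify the shape against your own output.

```python
import json


def servers_not_online(info_json_text):
    """Return (endpoint, state) pairs for every server whose state is not "online"."""
    doc = json.loads(info_json_text)
    # Assumption: servers live under "info" -> "servers"; fall back to a
    # top-level "servers" key if the "info" wrapper is absent.
    container = doc.get("info", doc)
    servers = container.get("servers") or []
    return [
        (server.get("endpoint", "?"), server.get("state", "unknown"))
        for server in servers
        if server.get("state") != "online"
    ]


# Hypothetical sample, for illustration only
SAMPLE = json.dumps({
    "status": "success",
    "info": {
        "servers": [
            {"endpoint": "node1:9000", "state": "online"},
            {"endpoint": "node2:9000", "state": "offline"},
        ]
    },
})

if __name__ == "__main__":
    for endpoint, state in servers_not_online(SAMPLE):
        print(f"Warning: {endpoint} is {state}")
```

In practice the input would come from something like `subprocess.run(["mc", "admin", "info", "myminio", "--json"], capture_output=True, text=True).stdout`.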

API Calls for Safety Checks:

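// Using the madmin-go SDK (newer releases add a /vN suffix to the import path)
import (
    "context"
    "fmt"
    "log"

    "github.com/minio/madmin-go"
)
// Initialize admin client (the last argument toggles TLS)
mdmClient, err := madmin.New("minio:9000", "access", "secret", false)
if err != nil {
    log.Fatalln(err)
}
// Fetch cluster info, which includes per-server state
info, err := mdmClient.ServerInfo(context.Background())
if err != nil {
    log.Fatalln(err)
}
// Warn about any server that is not online
for _, server := range info.Servers {
    if server.State != "online" {
        fmt.Printf("Warning: %s is %s\n", server.Endpoint, server.State)
    }
}
// madmin-go has no MaintenanceSafe helper; to check whether node1 can go down,
// query http://node1:9000/minio/health/cluster?maintenance=true instead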

Readiness and Health Endpoints

Health Check Endpoints:

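# Liveness probe - is the server running?
curl -I http://minio:9000/minio/health/live
# Readiness probe - can the server accept requests?
curl -I http://minio:9000/minio/health/ready
# Cluster health - does the cluster have write quorum?
curl -I http://minio:9000/minio/health/cluster
# Maintenance check - would the cluster keep quorum if this node went down?
curl -I "http://minio:9000/minio/health/cluster?maintenance=true"

Response Indicates:

  • HTTP 200: The check passed; with maintenance=true, the node can be taken down safely
  • HTTP 503: The cluster does not currently have quorum
  • HTTP 412 (maintenance=true only): Taking the node down would break quorum
  • The response body is empty; the status code and response headers carry the result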
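
A minimal sketch of turning these status codes into a go/no-go decision, using only the Python standard library. It assumes the cluster health endpoint accepts a `?maintenance=true` query parameter and answers 412 when taking the queried node down would break quorum; the function names and result labels are illustrative, not part of any MinIO SDK.

```python
import urllib.error
import urllib.request


def interpret_health_status(status_code, maintenance=False):
    """Map a health-endpoint HTTP status code to a short verdict string."""
    if status_code == 200:
        return "safe" if maintenance else "healthy"
    if status_code == 412 and maintenance:
        # Assumed meaning: this node going down would lose quorum
        return "quorum-at-risk"
    if status_code == 503:
        return "unavailable"
    return "unknown"


def probe_cluster_health(base_url, maintenance=False, timeout=5.0):
    """Query /minio/health/cluster on base_url and return a verdict string."""
    url = base_url.rstrip("/") + "/minio/health/cluster"
    if maintenance:
        url += "?maintenance=true"
    request = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            code = response.status
    except urllib.error.HTTPError as err:
        # Non-2xx responses arrive as HTTPError; the code is what we want
        code = err.code
    return interpret_health_status(code, maintenance)


if __name__ == "__main__":
    # No network call here: just show how codes are interpreted
    for code in (200, 412, 503):
        print(code, interpret_health_status(code, maintenance=True))
```

A caller would proceed with maintenance only when `probe_cluster_health("http://node1:9000", maintenance=True)` returns `"safe"`; connection errors are left to propagate so an unreachable node is never mistaken for a safe one.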

Repair Operations API

Trigger Targeted Healing:

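# Heal a specific bucket, including all objects in it
mc admin heal --recursive myminio/bucket
# Heal a specific object
mc admin heal myminio/bucket/object
# Restart healing across the deployment after maintenance
mc admin heal --recursive --force-start myminio
# Monitor healing progress (recent mc releases print heal status when run without flags)
mc admin heal myminio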

Programmatic Healing:

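// Heal options shared by the start and status calls
opts := madmin.HealOpts{
    Recursive: true,
    DryRun:    false,
    Remove:    false,
}
// Start healing (empty client token, no force-start or force-stop)
healStart, _, err := mdmClient.Heal(context.Background(), "bucket", "prefix", opts, "", false, false)
if err != nil {
    log.Fatalln(err)
}
// Monitor progress: call Heal again with the client token returned at start
for {
    _, healStatus, err := mdmClient.Heal(context.Background(), "bucket", "prefix", opts, healStart.ClientToken, false, false)
    if err != nil {
        log.Fatalln(err)
    }
    if healStatus.Summary == "finished" || healStatus.Summary == "stopped" {
        break
    }
    time.Sleep(5 * time.Second)
}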

Automation Best Practices

1. Pre-Maintenance Checks:

#!/bin/bash
# Automated maintenance script
# Check cluster health (HTTP 200 means the cluster has write quorum)
if ! curl -sf -o /dev/null "http://node1:9000/minio/health/cluster"; then
    echo "Cluster unhealthy, aborting"
    exit 1
fi
# Verify quorum would survive taking node1 down (HTTP 412 means it would not)
if ! curl -sf -o /dev/null "http://node1:9000/minio/health/cluster?maintenance=true"; then
    echo "Quorum at risk, aborting"
    exit 1
fi
# Safe to proceed
echo "Starting maintenance on node1"
2. Idempotent Operations:

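  • Most admin commands are idempotent (status queries, health checks, repeated heal requests)
  • These are generally safe to retry on failure
  • Commands with side effects, such as service restarts, should be checked before re-running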
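
Because retries are only safe for idempotent operations, a retry wrapper is a natural fit for this kind of command. The sketch below shows one way to retry with exponential backoff; `retry` and `run_mc` are hypothetical helper names, and the attempt count and delays are arbitrary defaults to tune for your environment.

```python
import subprocess
import time


def retry(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds, backing off exponentially between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as err:  # broad on purpose: any failure triggers a retry
            last_error = err
            if attempt < attempts - 1:
                # Delays grow as base_delay, 2x, 4x, ...
                sleep(base_delay * (2 ** attempt))
    raise last_error


def run_mc(*args):
    """Run an mc command and return its stdout; raises if mc exits non-zero."""
    result = subprocess.run(
        ["mc", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    # Example: retry an idempotent status query (requires mc and a configured alias)
    print(retry(lambda: run_mc("admin", "info", "myminio", "--json")))
```

The `sleep` parameter is injectable so the backoff can be skipped in tests; only wrap commands you know are safe to repeat.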

3. Monitoring Integration:

# Prometheus alert rule
- alert: MinIONodeDown
  expr: minio_cluster_nodes_offline_total > 0
  for: 5m
  annotations:
    summary: "MinIO node offline"
    description: "{{ $value }} MinIO node(s) reported offline"

Why MinIO Rarely Needs Maintenance

Important Note: MinIO typically does not require node maintenance. These operations should be limited to:

  • OS-level upgrades
  • Hardware replacements
  • Network infrastructure changes
  • Security patching

MinIO handles most operational tasks automatically:

  • Self-healing for bit rot
  • Automatic rebalancing
  • Online upgrades
  • Transparent failover

Comprehensive API Coverage

Operation        CLI   SDK   REST   Prometheus
Health Check     Yes   Yes   Yes    Yes
Heal Operations  Yes   Yes   Yes    -
Node Status      Yes   Yes   Yes    Yes
Quorum Check     Yes   Yes   Yes    Yes
Performance      Yes   Yes   Yes    Yes
Configuration    Yes   Yes   Yes    -

Real-World Automation Example

Automated Rolling Upgrade:

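import time
import requests

def safe_node_maintenance(node):
    # Ask the node whether the cluster keeps quorum if it goes down:
    # HTTP 200 means safe, HTTP 412 means quorum would be lost
    health = requests.get(
        f"http://{node}/minio/health/cluster?maintenance=true", timeout=5)
    return health.status_code == 200

def wait_until_healthy(node, attempts=30, delay=10):
    # After maintenance, poll until the node reports a healthy cluster again.
    # Healing of the returning node runs in the background; admin heal calls
    # need signed requests, so use mc or madmin-go for them, not plain HTTP.
    for _ in range(attempts):
        try:
            url = f"http://{node}/minio/health/cluster"
            if requests.get(url, timeout=5).status_code == 200:
                return True
        except requests.RequestException:
            pass
        time.sleep(delay)
    return False

# Process each node in turn; cluster_nodes, perform_maintenance, and
# log_error are placeholders for your own inventory and tooling
for node in cluster_nodes:
    if not safe_node_maintenance(node):
        log_error(f"Cannot maintain {node}")
        continue
    perform_maintenance(node)
    if not wait_until_healthy(node):
        log_error(f"{node} did not recover, stopping")
        break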

Key Advantages

The comprehensive API coverage enables:

  • Full automation - No manual intervention needed
  • Safe operations - Pre-flight checks prevent issues
  • Integration flexibility - Works with any tooling
  • Observability - Complete visibility into operations
  • Script-friendly - Idempotent and predictable

This API-first approach makes MinIO ideal for modern DevOps practices, enabling infrastructure-as-code, GitOps workflows, and complete operational automation.
