What operational metrics (v3) are surfaced by the storage system?

Asked by muratkars Answered by muratkars July 17, 2025

Understanding the operational metrics available through MinIO’s v3 metrics endpoint is essential for monitoring, alerting, and optimizing storage system performance and health.

This question covers:

  • v3 metrics endpoint capabilities
  • Available metric categories and types
  • Prometheus integration via v3 API
  • Operational monitoring best practices

Answer

MinIO’s v3 metrics endpoint exposes operational metrics in a Prometheus-compatible format, giving detailed visibility into storage system performance, health, and resource utilization.

V3 Metrics Endpoint Overview

Endpoint Access:

# v3 metrics endpoint (send a bearer token unless MINIO_PROMETHEUS_AUTH_TYPE=public)
curl http://minio:9000/minio/metrics/v3/cluster
# Generate a Prometheus scrape configuration for v3
mc admin prometheus generate myminio --api-version v3
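
The endpoint returns plain text in the Prometheus exposition format, so a scrape can be inspected without a Prometheus server. The following is a minimal sketch of a parser for that format; the sample text is made up for illustration (it reuses metric names from this answer), and the parser deliberately ignores optional trailing timestamps.

```python
import re

# One sample per line: name, optional {labels}, then the value.
# Simplification: optional trailing timestamps are not handled.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical scrape output, not captured from a real cluster.
SAMPLE_TEXT = """\
# HELP minio_cluster_nodes_online Number of online nodes
# TYPE minio_cluster_nodes_online gauge
minio_cluster_nodes_online 4
minio_s3_requests_total{api="GetObject",server="minio1:9000"} 120
"""

for name, labels, value in parse_metrics(SAMPLE_TEXT):
    print(name, labels, value)
```

In practice the text would come from an HTTP GET against the metrics endpoint rather than from a literal string.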

Core System Metrics

Cluster Health:

  • minio_cluster_nodes_online - Number of online nodes
  • minio_cluster_nodes_total - Total configured nodes
  • minio_cluster_drive_errors_total - Drive error count
  • minio_cluster_drives_online - Online drives count
  • minio_cluster_drives_total - Total configured drives

Storage Capacity:

  • minio_cluster_capacity_usable_total - Total usable capacity
  • minio_cluster_capacity_usable_free - Available free capacity
  • minio_cluster_usage_object_total - Total object count
  • minio_cluster_usage_buckets_total - Total bucket count
  • minio_disk_storage_available - Per-disk available storage
  • minio_disk_storage_total - Per-disk total storage
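
As a rough illustration of how the two cluster capacity gauges combine into a usage figure, here is a small sketch. The byte values are hypothetical, and the function only mirrors the arithmetic `1 - free / total`; it does not query MinIO.

```python
def usage_fraction(usable_total_bytes, usable_free_bytes):
    """Return the used share of usable capacity as a value between 0 and 1."""
    if usable_total_bytes <= 0:
        raise ValueError("usable_total_bytes must be positive")
    return 1 - usable_free_bytes / usable_total_bytes


# Hypothetical readings of minio_cluster_capacity_usable_total and
# minio_cluster_capacity_usable_free, in bytes.
total = 100 * 1024**4  # 100 TiB usable
free = 12 * 1024**4    # 12 TiB free

print(f"{usage_fraction(total, free):.0%} used")
```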

Performance Metrics

Request Statistics:

  • minio_http_requests_total - Total HTTP requests by method
  • minio_http_request_duration_seconds - Request duration histogram
  • minio_http_requests_in_flight - Current concurrent requests
  • minio_http_traffic_sent_bytes - Outbound traffic
  • minio_http_traffic_received_bytes - Inbound traffic

Operation Performance:

  • minio_s3_requests_total - S3 API requests by operation
  • minio_s3_errors_total - S3 API errors by type
  • minio_s3_request_duration_seconds - S3 operation latency
  • minio_s3_time_to_first_byte_seconds - TTFB measurements
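
The `_total` metrics above are counters, so the useful signal is how fast they grow between scrapes. The sketch below shows the idea behind a per-second rate and an error ratio, assuming two readings taken a known number of seconds apart. It is a simplification of what PromQL's `rate()` does, and the function names are illustrative only.

```python
def counter_rate(prev_value, curr_value, interval_seconds):
    """Per-second increase of a counter between two scrapes."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if curr_value < prev_value:
        # The counter went backwards, which usually means the process
        # restarted; count only what accumulated since the reset.
        increase = curr_value
    else:
        increase = curr_value - prev_value
    return increase / interval_seconds


def error_ratio(request_rate, error_rate):
    """Share of requests that failed; 0.0 when there is no traffic."""
    return error_rate / request_rate if request_rate > 0 else 0.0


# Hypothetical readings of minio_s3_requests_total and
# minio_s3_errors_total, taken 30 seconds apart.
requests_per_second = counter_rate(1000, 1600, 30)
errors_per_second = counter_rate(10, 25, 30)
print(requests_per_second, errors_per_second,
      error_ratio(requests_per_second, errors_per_second))
```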

I/O Performance:

  • minio_disk_api_latency_microseconds - Disk operation latency
  • minio_disk_storage_used - Per-disk storage utilization
  • minio_inter_node_traffic_sent_bytes - Bytes sent to other nodes in the cluster
  • minio_inter_node_traffic_received_bytes - Bytes received from other nodes in the cluster

Advanced Operational Metrics

Healing and Recovery:

  • minio_heal_objects_total - Objects requiring healing
  • minio_heal_objects_heal_total - Successfully healed objects
  • minio_heal_objects_error_total - Healing operation errors
  • minio_heal_time_last_activity_nano_seconds - Last healing activity

Cache Performance (AIStor):

  • minio_cache_hits_total - Cache hit count
  • minio_cache_misses_total - Cache miss count
  • minio_cache_data_served - Data served from cache
  • minio_cache_usage_percent - Cache utilization percentage

Security and Access:

  • minio_audit_failed_messages - Failed audit log messages
  • minio_audit_target_queue_length - Audit queue depth
  • minio_iam_since_last_sync_millis - IAM sync freshness
  • minio_bucket_replication_failed_bytes - Replication failures

Resource Utilization

Memory and CPU:

  • minio_go_routine_total - Active goroutines
  • minio_process_resident_memory_bytes - Memory usage
  • minio_process_cpu_total_seconds - CPU utilization
  • minio_process_io_read_bytes - Process I/O read
  • minio_process_io_write_bytes - Process I/O write

Network Statistics:

  • minio_network_sent_bytes_total - Network bytes sent
  • minio_network_received_bytes_total - Network bytes received
  • minio_network_rpc_errors_total - RPC communication errors

Prometheus Configuration

V3 Metrics Endpoint:

# Access the v3 metrics endpoint
curl http://minio:9000/minio/metrics/v3/cluster
# Generate a v3 Prometheus scrape configuration
mc admin prometheus generate myminio --api-version v3
# Print the current v3 metrics for one category (here: cluster)
mc admin prometheus metrics myminio cluster --api-version v3

Sample Prometheus Config for V3:

scrape_configs:
  - job_name: 'minio-v3'
    metrics_path: /minio/metrics/v3/cluster
    scheme: http
    static_configs:
      - targets: ['minio1:9000', 'minio2:9000', 'minio3:9000']
    scrape_interval: 30s
    scrape_timeout: 10s

Key Alert Rules

Critical System Alerts:

groups:
  - name: minio-alerts
    rules:
      # Node availability
      - alert: MinIONodeDown
        expr: minio_cluster_nodes_online < minio_cluster_nodes_total
        for: 5m
        annotations:
          summary: "MinIO cluster has offline nodes"
      # Storage capacity
      - alert: MinIOStorageUsageHigh
        expr: (1 - minio_cluster_capacity_usable_free / minio_cluster_capacity_usable_total) > 0.85
        for: 10m
        annotations:
          summary: "MinIO storage usage above 85%"
      # Performance degradation
      - alert: MinIOHighLatency
        expr: histogram_quantile(0.95, sum by (le) (rate(minio_s3_request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        annotations:
          summary: "MinIO p95 request latency above 1s"
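
The latency alert depends on estimating a percentile from histogram buckets. As a rough sketch of how that estimate works, the function below interpolates linearly inside the bucket that contains the target rank, which is the general approach PromQL's `histogram_quantile` takes. The bucket boundaries and counts are hypothetical, and the function assumes cumulative counts with a final `+Inf` bucket.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, nothing to estimate
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound) or count == lower_count:
                # No finite upper edge (or an empty bucket) to interpolate
                # within, so report the highest known finite bound.
                return lower_bound
            share = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * share
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical request-duration buckets: (upper bound in seconds, count).
buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
print(histogram_quantile(0.95, buckets))
```

Because the result is interpolated, its accuracy is limited by how finely the buckets are spaced around the threshold being alerted on.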

Grafana Dashboard Metrics

Essential Dashboard Panels:

  1. Cluster Overview:

    • minio_cluster_nodes_online / minio_cluster_nodes_total * 100 (Node health %)
    • minio_cluster_capacity_usable_free (Available capacity)
    • rate(minio_s3_requests_total[5m]) (Request rate)

  2. Performance Monitoring:

    • histogram_quantile(0.95, sum by (le) (rate(minio_s3_request_duration_seconds_bucket[5m]))) (95th percentile latency)
    • rate(minio_http_traffic_sent_bytes[5m]) (Throughput)
    • minio_http_requests_in_flight (Concurrent requests)

  3. Resource Utilization:

    • minio_process_resident_memory_bytes (Memory usage)
    • rate(minio_process_cpu_total_seconds[5m]) (CPU usage)
    • minio_disk_storage_used / minio_disk_storage_total * 100 (Disk utilization)

Advanced Metrics Analysis

Capacity Planning:

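# Predict capacity exhaustion within the next 30 days
predict_linear(minio_cluster_capacity_usable_free[7d], 30*24*3600) < 0
# Projected object-count growth over 30 days, extrapolated from the last hour
# (the object count is a gauge, so use deriv() and scale per-second to 30 days)
deriv(minio_cluster_usage_object_total[1h]) * 3600 * 24 * 30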
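
To make the capacity forecast less of a black box, here is a minimal sketch of the least-squares extrapolation that `predict_linear` is based on. The samples are hypothetical daily readings of free capacity. One difference to keep in mind: this sketch extrapolates from the last sample's timestamp, whereas PromQL extrapolates from the query evaluation time.

```python
def predict_linear(samples, seconds_ahead):
    """Extrapolate (timestamp, value) samples with a least-squares line."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("samples must span more than one timestamp")
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    last_t = samples[-1][0]
    return slope * (last_t + seconds_ahead) + intercept


DAY = 24 * 3600
# Hypothetical free capacity in TiB: starts at 100, shrinks by 10 per day.
history = [(day * DAY, 100 - 10 * day) for day in range(8)]
forecast = predict_linear(history, 30 * DAY)
print(forecast, "exhausted within 30 days" if forecast < 0 else "ok")
```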

Performance Analysis:

# Request distribution by operation
sum(rate(minio_s3_requests_total[5m])) by (api)
# Error rate by status code
sum(rate(minio_s3_errors_total[5m])) by (error_code)
# Bandwidth utilization per node
sum(rate(minio_http_traffic_sent_bytes[5m])) by (server)
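
The `sum(...) by (label)` pattern used in these queries groups series by one label and adds their values. A small sketch of that aggregation, using hypothetical per-second request rates and the `api` and `server` labels described in this answer:

```python
from collections import defaultdict


def sum_by(samples, label):
    """Sum sample values grouped by one label, like sum(...) by (label)."""
    totals = defaultdict(float)
    for labels, value in samples:
        # Series without the label fall into an empty-string group.
        totals[labels.get(label, "")] += value
    return dict(totals)


# Hypothetical request rates per (api, server) series.
rates = [
    ({"api": "GetObject", "server": "minio1:9000"}, 40.0),
    ({"api": "GetObject", "server": "minio2:9000"}, 35.0),
    ({"api": "PutObject", "server": "minio1:9000"}, 12.5),
]

print(sum_by(rates, "api"))
print(sum_by(rates, "server"))
```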

Custom Metrics and Labels

Metric Labels:

  • server - MinIO server endpoint
  • api - S3 API operation (GetObject, PutObject, etc.)
  • bucket - Bucket name for bucket-specific metrics
  • error_code - HTTP/S3 error codes
  • drive - Individual drive identifier

Business Metrics:

# Configure custom labels
mc admin prometheus generate myminio \
  --include-labels "bucket,api,server"
# Add business context
mc admin config set myminio prometheus \
  job_id="production-cluster" \
  site="datacenter-1"

Metrics Collection Best Practices

Scraping Configuration:

  • Scrape interval: 30-60 seconds for most metrics
  • High-frequency metrics: 15 seconds for performance-critical metrics
  • Long-term storage: Use recording rules for aggregations
  • Retention: Configure appropriate retention policies

Performance Impact:

  • Metrics collection: <1% CPU overhead
  • Network usage: ~10KB per scrape
  • Recommended: Dedicated monitoring network
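
To put the scrape interval and per-scrape size into perspective, the arithmetic can be worked through directly. This sketch takes the ~10KB per scrape figure above at face value (treated here as 10,000 bytes) and assumes three scrape targets; both numbers are illustrative rather than measured.

```python
SECONDS_PER_DAY = 24 * 3600


def daily_scrape_bytes(bytes_per_scrape, interval_seconds, targets=1):
    """Approximate bytes transferred per day for a given scrape interval."""
    scrapes_per_day = SECONDS_PER_DAY // interval_seconds
    return bytes_per_scrape * scrapes_per_day * targets


# Compare the 30-second default with the 15-second high-frequency interval.
for interval in (30, 15):
    megabytes = daily_scrape_bytes(10_000, interval, targets=3) / 1_000_000
    print(f"{interval}s interval: {megabytes:.1f} MB/day")
```

Halving the interval doubles the transfer volume, which is one reason to reserve short intervals for the few metrics that need them.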

Integration Examples

Alertmanager Integration:

# Critical storage alert
- alert: MinIOCriticalStorage
  expr: minio_cluster_capacity_usable_free < 100*1024*1024*1024 # 100 GiB
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "MinIO cluster critically low on storage"
    runbook_url: "https://docs.company.com/runbooks/minio-storage"

External Monitoring:

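# Python monitoring script
import requests

def check_minio_health(url='http://minio:9000/minio/metrics/v3/cluster'):
    """Return True when all nodes are online (assumes unauthenticated scrapes are allowed)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    values = {}
    for line in response.text.splitlines():
        if line and not line.startswith('#'):
            series, value = line.rsplit(' ', 1)
            values[series.split('{')[0]] = float(value)
    online = values.get('minio_cluster_nodes_online')
    return online is not None and online == values.get('minio_cluster_nodes_total')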

Documentation References

For comprehensive details on the v3 metrics implementation, refer to the official MinIO metrics documentation.

Key Advantages

MinIO’s v3 metrics endpoint provides:

  • Enhanced observability - Improved metric granularity and categorization
  • Prometheus native - Industry-standard format with v3 enhancements
  • Real-time visibility - Sub-minute metric updates
  • Actionable insights - Metrics tied to operational decisions
  • Scalable monitoring - Efficient collection at any scale
  • V3 improvements - Enhanced metric organization and performance

The v3 metrics endpoint represents the latest evolution in MinIO’s monitoring capabilities, providing enhanced observability for enterprise deployments with improved performance and metric organization.
