Understanding the operational metrics available through MinIO’s v3 metrics endpoint is essential for monitoring, alerting, and optimizing storage system performance and health.
This question covers:
- v3 metrics endpoint capabilities
- Available metric categories and types
- Prometheus integration via v3 API
- Operational monitoring best practices
Answer
MinIO’s v3 metrics endpoint provides comprehensive operational metrics in a Prometheus-compatible format, offering detailed visibility into storage system performance, health, and resource utilization.
V3 Metrics Endpoint Overview
Endpoint Access:
# v3 metrics endpoint
curl http://minio:9000/minio/v3/metrics/cluster

# Generate Prometheus configuration for v3
mc admin prometheus generate myminio

Core System Metrics
Cluster Health:
- minio_cluster_nodes_online - Number of online nodes
- minio_cluster_nodes_total - Total configured nodes
- minio_cluster_drive_errors_total - Drive error count
- minio_cluster_drives_online - Online drives count
- minio_cluster_drives_total - Total configured drives
Storage Capacity:
- minio_cluster_capacity_usable_total - Total usable capacity
- minio_cluster_capacity_usable_free - Available free capacity
- minio_cluster_usage_object_total - Total object count
- minio_cluster_usage_buckets_total - Total bucket count
- minio_disk_storage_available - Per-disk available storage
- minio_disk_storage_total - Per-disk total storage
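As a rough illustration of how the cluster health and capacity gauges above combine, the sketch below parses a few lines of Prometheus text exposition format and derives node health and storage usage percentages. The sample values and the helper names `parse_samples` and `cluster_summary` are invented for illustration; real endpoint output carries labels and many more series.

```python
# Hypothetical sample in Prometheus text exposition format; values are invented.
SAMPLE = """\
# TYPE minio_cluster_nodes_online gauge
minio_cluster_nodes_online 3
minio_cluster_nodes_total 4
minio_cluster_capacity_usable_total 1000
minio_cluster_capacity_usable_free 500
"""


def parse_samples(text):
    """Map metric name -> value, skipping comments and dropping labels."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        name, _, value = line.rpartition(" ")
        samples[name.split("{")[0]] = float(value)
    return samples


def cluster_summary(samples):
    """Derive node health and storage usage percentages from raw gauges."""
    node_health = (
        100 * samples["minio_cluster_nodes_online"] / samples["minio_cluster_nodes_total"]
    )
    used_fraction = 1 - (
        samples["minio_cluster_capacity_usable_free"]
        / samples["minio_cluster_capacity_usable_total"]
    )
    return {"node_health_pct": node_health, "storage_used_pct": 100 * used_fraction}


print(cluster_summary(parse_samples(SAMPLE)))
```

Because the parser drops labels, it only suits cluster-level series that appear once; per-disk metrics would need the labels kept.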
Performance Metrics
Request Statistics:
- minio_http_requests_total - Total HTTP requests by method
- minio_http_request_duration_seconds - Request duration histogram
- minio_http_requests_in_flight - Current concurrent requests
- minio_http_traffic_sent_bytes - Outbound traffic
- minio_http_traffic_received_bytes - Inbound traffic
Operation Performance:
- minio_s3_requests_total - S3 API requests by operation
- minio_s3_errors_total - S3 API errors by type
- minio_s3_request_duration_seconds - S3 operation latency
- minio_s3_time_to_first_byte_seconds - TTFB measurements
I/O Performance:
- minio_disk_api_latency_microseconds - Disk operation latency
- minio_disk_storage_used - Per-disk storage utilization
- minio_inter_node_traffic_sent_bytes - Inter-node traffic sent
- minio_inter_node_traffic_received_bytes - Inter-node traffic received
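Most of the request and traffic metrics above are counters, so dashboards usually show their rate of change rather than the raw value. A minimal sketch of that calculation follows, assuming samples arrive as (timestamp in seconds, value) pairs; `per_second_rate` is a hypothetical helper, and it omits the window extrapolation that Prometheus's own `rate()` applies.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter over (timestamp, value) samples.

    A drop in value is treated as a counter reset (for example after a
    process restart), in which case the new value counts as the increase.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += (curr - prev) if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return increase / elapsed


# Invented scrapes of minio_s3_requests_total, taken 30 seconds apart.
scrapes = [(0, 1000), (30, 1300), (60, 1600)]
print(per_second_rate(scrapes))
```

Handling resets matters in practice: without it, a node restart would show up as a large negative request rate.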
Advanced Operational Metrics
Healing and Recovery:
- minio_heal_objects_total - Objects requiring healing
- minio_heal_objects_heal_total - Successfully healed objects
- minio_heal_objects_error_total - Healing operation errors
- minio_heal_time_last_activity_nano_seconds - Last healing activity
Cache Performance (AIStor):
- minio_cache_hits_total - Cache hit count
- minio_cache_misses_total - Cache miss count
- minio_cache_data_served - Data served from cache
- minio_cache_usage_percent - Cache utilization percentage
Security and Access:
- minio_audit_failed_messages - Failed audit log messages
- minio_audit_target_queue_length - Audit queue depth
- minio_iam_since_last_sync_millis - IAM sync freshness
- minio_bucket_replication_failed_bytes - Replication failures
Resource Utilization
Memory and CPU:
- minio_go_routine_total - Active goroutines
- minio_process_resident_memory_bytes - Memory usage
- minio_process_cpu_total_seconds - CPU utilization
- minio_process_io_read_bytes - Process I/O read
- minio_process_io_write_bytes - Process I/O write
Network Statistics:
- minio_network_sent_bytes_total - Network bytes sent
- minio_network_received_bytes_total - Network bytes received
- minio_network_rpc_errors_total - RPC communication errors
Prometheus Configuration
V3 Metrics Endpoint:
# Access v3 metrics endpoint
curl http://minio:9000/minio/v3/metrics/cluster

# Generate v3 Prometheus configuration
mc admin prometheus generate myminio

# Enable specific v3 metric types
mc admin prometheus metrics myminio --type cluster,node,bucket

Sample Prometheus Config for V3:
scrape_configs:
  - job_name: 'minio-v3'
    metrics_path: /minio/v3/metrics/cluster
    scheme: http
    static_configs:
      - targets: ['minio1:9000', 'minio2:9000', 'minio3:9000']
    scrape_interval: 30s
    scrape_timeout: 10s

Key Alert Rules
Critical System Alerts:
groups:
  - name: minio-alerts
    rules:
      # Node availability
      - alert: MinIONodeDown
        expr: minio_cluster_nodes_online < minio_cluster_nodes_total
        for: 5m
        annotations:
          summary: "MinIO cluster has offline nodes"

      # Storage capacity
      - alert: MinIOStorageUsageHigh
        expr: (1 - minio_cluster_capacity_usable_free / minio_cluster_capacity_usable_total) > 0.85
        for: 10m
        annotations:
          summary: "MinIO storage usage above 85%"

      # Performance degradation
      - alert: MinIOHighLatency
        expr: histogram_quantile(0.95, sum by (le) (rate(minio_s3_request_duration_seconds_bucket[5m]))) > 1
        for: 5m
        annotations:
          summary: "MinIO request latency high"

Grafana Dashboard Metrics
Essential Dashboard Panels:
- Cluster Overview:
  - minio_cluster_nodes_online / minio_cluster_nodes_total * 100 (Node health %)
  - minio_cluster_capacity_usable_free (Available capacity)
  - rate(minio_s3_requests_total[5m]) (Request rate)
- Performance Monitoring:
  - histogram_quantile(0.95, sum by (le) (rate(minio_s3_request_duration_seconds_bucket[5m]))) (95th percentile latency)
  - rate(minio_http_traffic_sent_bytes[5m]) (Throughput)
  - minio_http_requests_in_flight (Concurrent connections)
- Resource Utilization:
  - minio_process_resident_memory_bytes (Memory usage)
  - rate(minio_process_cpu_total_seconds[5m]) (CPU usage)
  - minio_disk_storage_used / minio_disk_storage_total * 100 (Disk utilization)
Advanced Metrics Analysis
Capacity Planning:
# Predict capacity exhaustion
predict_linear(minio_cluster_capacity_usable_free[7d], 30*24*3600) < 0

# Growth rate analysis (projected object growth over 30 days)
deriv(minio_cluster_usage_object_total[1h]) * 3600 * 24 * 30

Performance Analysis:
# Request distribution by operation
sum(rate(minio_s3_requests_total[5m])) by (api)

# Error rate by status code
sum(rate(minio_s3_errors_total[5m])) by (error_code)

# Bandwidth utilization per node
sum(rate(minio_http_traffic_sent_bytes[5m])) by (server)

Custom Metrics and Labels
Metric Labels:
- server - MinIO server endpoint
- api - S3 API operation (GetObject, PutObject, etc.)
- bucket - Bucket name for bucket-specific metrics
- error_code - HTTP/S3 error codes
- drive - Individual drive identifier
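To make the role of these labels concrete, the sketch below groups labeled samples by a single label, in the spirit of a `sum(...) by (label)` aggregation. It sums raw sample values rather than rates, and the sample lines, label sets, and the `sum_by` helper are assumptions for illustration only.

```python
import re
from collections import defaultdict

# Matches one sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
# Matches key="value" pairs, allowing escaped characters inside the value.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def sum_by(text, metric, label):
    """Sum the values of `metric`, grouped by the value of one label."""
    totals = defaultdict(float)
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        totals[labels.get(label, "")] += float(match.group("value"))
    return dict(totals)


# Invented samples; real label sets depend on the MinIO version.
SAMPLE = """\
minio_s3_requests_total{api="GetObject",server="minio1:9000"} 120
minio_s3_requests_total{api="GetObject",server="minio2:9000"} 80
minio_s3_requests_total{api="PutObject",server="minio1:9000"} 40
"""

print(sum_by(SAMPLE, "minio_s3_requests_total", "api"))
print(sum_by(SAMPLE, "minio_s3_requests_total", "server"))
```

Grouping by `api` collapses the per-server series into one total per operation, while grouping by `server` does the reverse.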
Business Metrics:
# Configure custom labels
mc admin prometheus generate myminio \
  --include-labels "bucket,api,server"

# Add business context
mc admin config set myminio prometheus \
  job_id="production-cluster" \
  site="datacenter-1"

Metrics Collection Best Practices
Scraping Configuration:
- Scrape interval: 30-60 seconds for most metrics
- High-frequency metrics: 15 seconds for performance-critical metrics
- Long-term storage: Use recording rules for aggregations
- Retention: Configure appropriate retention policies
Performance Impact:
- Metrics collection: <1% CPU overhead
- Network usage: ~10KB per scrape
- Recommended: Dedicated monitoring network
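The network figure above can be turned into a daily estimate with simple arithmetic. This sketch assumes the ~10KB per scrape quoted above and a fixed scrape interval; `daily_scrape_volume` is a hypothetical helper, and real payload sizes vary with cluster size and enabled metrics.

```python
def daily_scrape_volume(targets, scrape_interval_s, bytes_per_scrape=10 * 1024):
    """Estimate bytes transferred per day for metric scrapes.

    bytes_per_scrape defaults to the ~10KB figure quoted above; actual
    payloads grow with the number of drives, buckets, and enabled metrics.
    """
    scrapes_per_day = 24 * 3600 // scrape_interval_s
    return targets * scrapes_per_day * bytes_per_scrape


# Three targets scraped every 30 seconds.
volume = daily_scrape_volume(targets=3, scrape_interval_s=30)
print(f"{volume / 1024**2:.1f} MiB per day")
```

Halving the scrape interval doubles this volume, which is one reason to reserve 15-second scraping for the metrics that need it.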
Integration Examples
Alertmanager Integration:
# Critical storage alert
- alert: MinIOCriticalStorage
  expr: minio_cluster_capacity_usable_free < 100*1024*1024*1024  # 100 GiB
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "MinIO cluster critically low on storage"
    runbook_url: "https://docs.company.com/runbooks/minio-storage"

External Monitoring:
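# Python monitoring script
import requests

def check_minio_health():
    """Return True when every configured node is reported online."""
    response = requests.get('http://minio:9000/minio/v2/metrics/cluster', timeout=10)
    response.raise_for_status()
    # Parse the text exposition format: each sample line is "name{labels} value"
    samples = {}
    for line in response.text.splitlines():
        if line and not line.startswith('#'):
            name, _, value = line.rpartition(' ')
            samples[name.split('{')[0]] = float(value)
    # Healthy when the online node count matches the configured total
    online = samples.get('minio_cluster_nodes_online')
    return online is not None and online == samples.get('minio_cluster_nodes_total')

Documentation References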
For comprehensive details on v3 metrics implementation:
- Metrics and Alerts Guide - Complete monitoring setup and configuration
- mc admin prometheus metrics Reference - Command reference and metric types
Key Advantages
MinIO’s v3 metrics endpoint provides:
- Enhanced observability - Improved metric granularity and categorization
- Prometheus native - Industry-standard format with v3 enhancements
- Real-time visibility - Sub-minute metric updates
- Actionable insights - Metrics tied to operational decisions
- Scalable monitoring - Efficient collection at any scale
- V3 improvements - Enhanced metric organization and performance
The v3 metrics endpoint represents the latest evolution in MinIO’s monitoring capabilities, providing enhanced observability for enterprise deployments with improved performance and metric organization.