How to optimize MinIO for large file uploads and downloads?

Asked by claude Answered by claude January 14, 2025

Question

How do I optimize MinIO for handling large file uploads and downloads (multi-GB files)? What are the best practices for maximizing throughput and minimizing transfer times for large objects?

Answer

Optimizing MinIO for large file transfers requires configuration tuning at multiple levels: server, client, network, and application. Here’s a comprehensive guide to achieve maximum performance for large file operations.

1. Server-Side Optimization

MinIO Server Configuration

Terminal window
# /etc/minio/minio.conf - optimized for large files
# NOTE: files loaded via systemd's EnvironmentFile= do not strip trailing
# comments, so every comment here is on its own line. Several of these keys
# vary by MinIO release - verify with `mc admin config` on your deployment.
# Core settings
MINIO_ROOT_USER=minio-admin
MINIO_ROOT_PASSWORD=SecurePassword123!
# Raise the concurrent-request ceiling and allow long-running uploads
MINIO_API_REQUESTS_MAX=20000
MINIO_API_REQUESTS_DEADLINE=10m
# Fastest metadata scanning (values range from slowest to fastest)
MINIO_SCANNER_SPEED=fastest
# Compression trades CPU for bandwidth - disable for maximum throughput
MINIO_COMPRESSION_ENABLE=off
# MINIO_COMPRESSION_EXTENSIONS=".txt,.log,.csv"
# Disk caching for repeated downloads (older MinIO releases only - the
# cache subsystem was removed from current releases)
MINIO_CACHE_DRIVES="/tmp/cache1,/tmp/cache2"
MINIO_CACHE_QUOTA=80
MINIO_CACHE_AFTER=0
MINIO_CACHE_WATERMARK_LOW=70
MINIO_CACHE_WATERMARK_HIGH=90
# Healing: minimal delay between heal operations
MINIO_HEAL_MAX_SLEEP=1s

Systemd Service Optimization

/etc/systemd/system/minio.service
[Unit]
Description=MinIO Object Storage Server
Documentation=https://min.io/docs/minio/linux/index.html
Wants=network-online.target
After=network-online.target
AssertFileIsExecutable=/usr/local/bin/minio
[Service]
WorkingDirectory=/usr/local/
User=minio-user
Group=minio-user
# Resource limits for large files (systemd does not allow trailing
# comments, so each note is on its own line)
LimitNOFILE=1048576
LimitNPROC=1048576
LimitCORE=infinity
LimitMEMLOCK=infinity
# Memory settings: soft threshold at 60G, hard cap at 64G
MemoryAccounting=true
MemoryHigh=60G
MemoryMax=64G
# CPU settings: up to 8 cores, normal scheduling policy, raised priority
CPUAccounting=true
CPUQuota=800%
CPUSchedulingPolicy=other
Nice=-10
# I/O settings: real-time I/O class, high priority
IOAccounting=true
IOSchedulingClass=realtime
IOSchedulingPriority=4
BlockIOAccounting=true
# Network accounting
IPAccounting=true
EnvironmentFile=-/etc/minio/minio.conf
ExecStartPre=/bin/bash -c "if [ -z \"${MINIO_VOLUMES}\" ]; then echo \"Variable MINIO_VOLUMES not set\"; exit 1; fi"
ExecStart=/usr/local/bin/minio server $MINIO_OPTS $MINIO_VOLUMES
Restart=always
TimeoutStopSec=infinity
SendSIGKILL=no
[Install]
WantedBy=multi-user.target

2. Storage Infrastructure Optimization

Storage Configuration

Terminal window
# XFS filesystem optimization for large files
mkfs.xfs -f -i size=512 -d agcount=16 /dev/nvme0n1
# Mount options for performance
mount -o noatime,largeio,inode64,swalloc /dev/nvme0n1 /opt/minio/data1
# Add to /etc/fstab
echo "/dev/nvme0n1 /opt/minio/data1 xfs defaults,noatime,largeio,inode64,swalloc 0 2" >> /etc/fstab
# RAID 0 striping across multiple drives - note that MinIO's own erasure
# coding is designed for individual (JBOD) drives, so prefer passing the
# drives to MinIO directly unless you have a specific reason to stripe
mdadm --create --verbose /dev/md0 --level=0 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
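Before striping drives, it is worth checking which component actually caps throughput. A minimal sketch (the per-drive and NIC figures below are illustrative assumptions, not measurements):

```python
def raid0_ceiling_mb_s(drives: int, per_drive_mb_s: float, network_mb_s: float) -> float:
    """Effective throughput ceiling: the striped-drive aggregate or the NIC,
    whichever is smaller."""
    return min(drives * per_drive_mb_s, network_mb_s)

# Four NVMe drives at ~3000 MB/s each behind a 10 Gbps (~1250 MB/s) NIC:
# the network, not the storage, is the bottleneck
print(raid0_ceiling_mb_s(4, 3000, 1250))  # 1250
```

If the network is the ceiling, adding more drives changes nothing; spend the effort on the NIC or on more nodes instead.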

I/O Scheduler Optimization

Terminal window
# Set I/O scheduler for NVMe drives
echo none > /sys/block/nvme0n1/queue/scheduler
# Optimize queue depth for high throughput
echo 1024 > /sys/block/nvme0n1/queue/nr_requests
# Increase readahead for large sequential reads
echo 4096 > /sys/block/nvme0n1/queue/read_ahead_kb
# Disable request merging - on fast NVMe the merge bookkeeping can cost
# more CPU than it saves
echo 1 > /sys/block/nvme0n1/queue/nomerges

3. Network Optimization

System Network Tuning

Terminal window
# /etc/sysctl.conf - Network optimization for large transfers
# TCP buffer sizes
net.core.rmem_max = 268435456
net.core.wmem_max = 268435456
net.core.rmem_default = 67108864
net.core.wmem_default = 67108864
net.ipv4.tcp_rmem = 4096 87380 268435456
net.ipv4.tcp_wmem = 4096 65536 268435456
# TCP window scaling
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
# TCP congestion control (BBR for high bandwidth)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
# Network interface buffers
net.core.netdev_max_backlog = 30000
net.core.netdev_budget = 600
# Connection limits
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
# TCP optimizations
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1
# Apply changes
sysctl -p
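The 268435456-byte (256 MB) ceilings in tcp_rmem/tcp_wmem above are not arbitrary: they should cover the bandwidth-delay product of your worst link. A quick sketch:

```python
def bdp_bytes(bandwidth_gbps: float, rtt_ms: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to keep a
    link of the given bandwidth full at the given round-trip time."""
    return int(bandwidth_gbps * 1e9 / 8 * rtt_ms / 1000)

# A 10 Gbps link with 100 ms RTT needs ~125 MB of socket buffer,
# so a 256 MB ceiling leaves comfortable headroom
print(bdp_bytes(10, 100))  # 125000000
```

For LAN-only deployments (sub-millisecond RTT) the defaults are usually sufficient; the large ceilings matter mainly for high-latency WAN transfers.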

Network Interface Optimization

Terminal window
# Increase network interface ring buffers
ethtool -G eth0 rx 4096 tx 4096
# Enable hardware offloading
ethtool -K eth0 gso on gro on tso on
# Set interrupt coalescing for throughput
ethtool -C eth0 adaptive-rx on adaptive-tx on
# Steer receive packet processing across CPUs (RPS);
# rps_cpus takes a hexadecimal CPU bitmask - "ffff" spreads rx-0 over CPUs 0-15
echo ffff > /sys/class/net/eth0/queues/rx-0/rps_cpus
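Because rps_cpus is a hex CPU bitmask rather than a CPU count, it is easy to get wrong by hand. A small helper to generate the mask for the first N CPUs:

```python
def rps_mask(cpus: int) -> str:
    """Hexadecimal bitmask enabling CPUs 0..cpus-1, in the format
    accepted by /sys/class/net/*/queues/rx-*/rps_cpus."""
    if cpus <= 0:
        raise ValueError("need at least one CPU")
    return format((1 << cpus) - 1, "x")

print(rps_mask(4))   # f     -> CPUs 0-3
print(rps_mask(16))  # ffff  -> CPUs 0-15
```

Write the result to the rps_cpus file for each receive queue you want spread across those cores.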

4. Client-Side Optimization

MinIO Client (mc) Configuration

Terminal window
# Register the deployment (mc alias set replaces the older mc config host add)
mc alias set myminio http://minio.example.com:9000 ACCESS_KEY SECRET_KEY
# mc handles multipart uploads automatically; recent mc releases expose
# explicit knobs on `mc put` (verify flags with `mc put --help` on your version)
mc put --parallel 16 --part-size 128MiB large-file.bin myminio/bucket/
# Recursive copy of a directory tree
mc cp --recursive large-dataset/ myminio/bucket/

AWS CLI Optimization

~/.aws/config
[default]
region = us-east-1
output = json
s3 =
    max_concurrent_requests = 20
    max_bandwidth = 1GB/s
    multipart_threshold = 128MB
    multipart_chunksize = 64MB
    max_queue_size = 10000
# Use AWS CLI for large transfers
aws s3 cp large-file.bin s3://bucket/ \
--endpoint-url http://minio.example.com:9000 \
--cli-read-timeout 0 \
--cli-write-timeout 0
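One constraint to keep in mind when picking multipart_chunksize: S3-compatible multipart uploads are capped at 10,000 parts, so the chunk size bounds the largest object you can upload. A quick check:

```python
import math

MAX_PARTS = 10_000  # S3 multipart part-count limit


def part_count(object_size: int, chunk_size: int) -> int:
    """Number of multipart parts an object of object_size bytes needs."""
    return math.ceil(object_size / chunk_size)


def min_chunk_size(object_size: int) -> int:
    """Smallest part size (bytes) that keeps the object within 10,000 parts."""
    return math.ceil(object_size / MAX_PARTS)


gib = 1024 ** 3
# A 5 TiB object with 64 MiB chunks exceeds the limit:
print(part_count(5 * 1024 * gib, 64 * 1024 * 1024))          # 81920 - too many
# Minimum viable chunk size for 5 TiB, in MiB:
print(min_chunk_size(5 * 1024 * gib) // (1024 * 1024))       # 524
```

So 64 MB chunks are fine up to roughly 640 GB objects; beyond that, raise the chunk size.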

5. Application-Level Optimization

Optimized Upload Implementation (Go)

package main

import (
	"bufio"
	"context"
	"fmt"
	"log"
	"os"
	"runtime"
	"sync"
	"time"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

type OptimizedUploader struct {
	client      *minio.Client
	bucketName  string
	workers     int
	partSize    int64
	uploadQueue chan UploadTask
	wg          sync.WaitGroup
}

type UploadTask struct {
	filePath   string
	objectName string
	fileSize   int64
}

func NewOptimizedUploader(endpoint, accessKey, secretKey, bucket string) *OptimizedUploader {
	client, err := minio.New(endpoint, &minio.Options{
		Creds:  credentials.NewStaticV4(accessKey, secretKey, ""),
		Secure: false,
	})
	if err != nil {
		log.Fatal(err)
	}
	return &OptimizedUploader{
		client:      client,
		bucketName:  bucket,
		workers:     runtime.NumCPU() * 4, // 4x CPU cores
		partSize:    128 * 1024 * 1024,    // 128MB parts
		uploadQueue: make(chan UploadTask, 1000),
	}
}

func (u *OptimizedUploader) UploadLargeFile(filePath, objectName string) error {
	file, err := os.Open(filePath)
	if err != nil {
		return err
	}
	defer file.Close()

	stat, err := file.Stat()
	if err != nil {
		return err
	}
	fileSize := stat.Size()

	options := minio.PutObjectOptions{
		PartSize:              uint64(u.partSize),
		ContentType:           "application/octet-stream",
		SendContentMd5:        false, // skip MD5 computation for speed
		DisableContentSha256:  true,  // skip SHA-256 for speed
		ConcurrentStreamParts: true,
		NumThreads:            uint(u.workers),
	}

	// Wrap the file in a large buffered reader to cut syscall overhead
	bufferedReader := bufio.NewReaderSize(file, 4*1024*1024)

	start := time.Now()
	_, err = u.client.PutObject(
		context.Background(),
		u.bucketName,
		objectName,
		bufferedReader,
		fileSize,
		options,
	)
	if err != nil {
		return err
	}
	duration := time.Since(start)
	throughput := float64(fileSize) / duration.Seconds() / (1024 * 1024) // MB/s
	fmt.Printf("Upload completed: %s (%.2f MB/s)\n", objectName, throughput)
	return nil
}

// Concurrent upload manager
func (u *OptimizedUploader) StartWorkers() {
	for i := 0; i < u.workers; i++ {
		go u.worker()
	}
}

func (u *OptimizedUploader) worker() {
	for task := range u.uploadQueue {
		if err := u.UploadLargeFile(task.filePath, task.objectName); err != nil {
			log.Printf("Upload failed for %s: %v", task.filePath, err)
		}
		u.wg.Done()
	}
}

func (u *OptimizedUploader) QueueUpload(filePath, objectName string) {
	stat, err := os.Stat(filePath)
	if err != nil {
		log.Printf("Failed to stat file %s: %v", filePath, err)
		return
	}
	u.wg.Add(1)
	u.uploadQueue <- UploadTask{
		filePath:   filePath,
		objectName: objectName,
		fileSize:   stat.Size(),
	}
}

func (u *OptimizedUploader) WaitForCompletion() {
	u.wg.Wait()
	close(u.uploadQueue)
}

// Usage example
func main() {
	uploader := NewOptimizedUploader(
		"minio.example.com:9000",
		"access-key",
		"secret-key",
		"large-files",
	)
	uploader.StartWorkers()
	// Queue multiple large files
	uploader.QueueUpload("/path/to/large-file-1.bin", "file-1.bin")
	uploader.QueueUpload("/path/to/large-file-2.bin", "file-2.bin")
	uploader.QueueUpload("/path/to/large-file-3.bin", "file-3.bin")
	uploader.WaitForCompletion()
	fmt.Println("All uploads completed")
}

Python Implementation with Optimization

import os
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

from minio import Minio
from minio.error import S3Error


class OptimizedMinIOUploader:
    def __init__(self, endpoint, access_key, secret_key, bucket_name):
        self.client = Minio(
            endpoint,
            access_key=access_key,
            secret_key=secret_key,
            secure=False,
        )
        self.bucket_name = bucket_name
        self.part_size = 128 * 1024 * 1024  # 128MB
        self.max_workers = os.cpu_count() * 4

    def upload_large_file(self, file_path, object_name=None):
        """Upload a large file with optimized settings."""
        if object_name is None:
            object_name = os.path.basename(file_path)
        file_size = os.path.getsize(file_path)
        start_time = time.time()
        try:
            with open(file_path, 'rb') as file_data:
                self.client.put_object(
                    self.bucket_name,
                    object_name,
                    file_data,
                    file_size,
                    part_size=self.part_size,
                    num_parallel_uploads=self.max_workers // 2,
                )
            duration = time.time() - start_time
            throughput = (file_size / (1024 * 1024)) / duration  # MB/s
            print(f"Upload completed: {object_name} ({throughput:.2f} MB/s)")
            return True
        except S3Error as e:
            print(f"Upload failed for {object_name}: {e}")
            return False

    def upload_multiple_files(self, file_list, max_concurrent=None):
        """Upload multiple files concurrently."""
        if max_concurrent is None:
            max_concurrent = min(self.max_workers, len(file_list))
        results = []
        with ThreadPoolExecutor(max_workers=max_concurrent) as executor:
            future_to_file = {
                executor.submit(self.upload_large_file, file_path): file_path
                for file_path in file_list
            }
            for future in as_completed(future_to_file):
                file_path = future_to_file[future]
                try:
                    results.append((file_path, future.result()))
                except Exception as e:
                    print(f"Error uploading {file_path}: {e}")
                    results.append((file_path, False))
        return results

    # Note: minio-py manages multipart uploads internally and does not
    # expose part-level resume; an interrupted put_object call must be
    # retried from the beginning.


# Usage example
if __name__ == "__main__":
    uploader = OptimizedMinIOUploader(
        "minio.example.com:9000",
        "access-key",
        "secret-key",
        "large-files",
    )
    # Upload a single large file
    uploader.upload_large_file("/path/to/large-file.bin")
    # Upload multiple files concurrently
    files = [
        "/path/to/file1.bin",
        "/path/to/file2.bin",
        "/path/to/file3.bin",
    ]
    results = uploader.upload_multiple_files(files, max_concurrent=8)
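Multi-GB transfers are far more exposed to transient network failures than small ones, so it is worth wrapping each upload in a retry with exponential backoff. A minimal, SDK-agnostic sketch (the helper name and the injectable sleep are illustrative, not part of minio-py):

```python
import time


def retry_with_backoff(fn, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on exception, retry with delays of base_delay * 2**n.
    Re-raises the last exception once attempts are exhausted."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Demo with a function that fails twice, and a no-op sleep so it runs instantly
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("transient failure")
    return "uploaded"


print(retry_with_backoff(flaky, sleep=lambda d: None))  # uploaded
```

In practice you would wrap `uploader.upload_large_file(...)` in the lambda and catch only retryable errors (connection resets, 5xx responses), not authorization failures.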

6. Load Balancer Optimization

HAProxy Configuration for Large Files

/etc/haproxy/haproxy.cfg
global
daemon
maxconn 40000
# Buffer optimization for large files
tune.bufsize 65536
tune.maxrewrite 8192
tune.http.maxhdr 200
defaults
mode http
option httplog
option dontlognull
retries 3
# Timeout optimization for large transfers
timeout connect 10s
timeout client 3600s # 1 hour for large uploads
timeout server 3600s # 1 hour for large uploads
timeout http-request 300s # 5 minutes for request headers
timeout http-keep-alive 10s
    # Connection optimization - keep-alive so backend connections can be
    # reused (http-server-close would defeat http-reuse in the backend)
    option http-keep-alive
    option tcp-smart-accept
    option tcp-smart-connect
frontend minio_frontend
bind *:9000
# Connection limits
maxconn 10000
# Request size limits (disable for large files)
# option http-buffer-request - Commented out for streaming
default_backend minio_backend
backend minio_backend
balance roundrobin
# Health checks
option httpchk GET /minio/health/live
http-check expect status 200
# Server configuration
server minio1 10.0.1.11:9000 check maxconn 2500 weight 100
server minio2 10.0.1.12:9000 check maxconn 2500 weight 100
server minio3 10.0.1.13:9000 check maxconn 2500 weight 100
server minio4 10.0.1.14:9000 check maxconn 2500 weight 100
# Connection pooling
http-reuse aggressive
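The 3600s client/server timeouts above can be sanity-checked against your worst-case transfer: a timeout must cover the largest object at the slowest plausible throughput, plus margin. A quick sketch:

```python
def min_timeout_seconds(object_gb: float, worst_mb_s: float, safety: float = 2.0) -> int:
    """Seconds a proxy timeout must cover to let an object of object_gb
    gigabytes complete at worst_mb_s MB/s, with a safety multiplier."""
    return int(object_gb * 1024 / worst_mb_s * safety)

# A 50 GB object at a degraded 30 MB/s, doubled for safety margin:
print(min_timeout_seconds(50, 30))  # 3413 - the 3600s timeout just covers it
```

If you regularly move objects larger than this, raise the timeouts (or switch those listeners to TCP mode) rather than letting HAProxy kill in-flight uploads.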

7. Monitoring Large File Transfers

Performance Monitoring Script

large-file-monitor.sh
#!/bin/bash
MINIO_ALIAS="myminio"
BUCKET="large-files"
TEST_FILE="/tmp/test-large-file.bin"
TEST_SIZE="1G"
echo "=== MinIO Large File Performance Test ==="
echo "Timestamp: $(date)"
echo
# Create test file (zeros are fine while compression is disabled; use
# /dev/urandom if compression is on, or the numbers will be inflated)
echo "Creating test file (${TEST_SIZE})..."
dd if=/dev/zero of=$TEST_FILE bs=1M count=1024 status=progress
# Upload test
echo "Testing upload performance..."
UPLOAD_START=$(date +%s.%N)
mc cp $TEST_FILE $MINIO_ALIAS/$BUCKET/test-upload.bin
UPLOAD_END=$(date +%s.%N)
UPLOAD_TIME=$(echo "$UPLOAD_END - $UPLOAD_START" | bc)
UPLOAD_SPEED=$(echo "scale=2; 1024 / $UPLOAD_TIME" | bc)
echo "Upload completed: ${UPLOAD_TIME}s (${UPLOAD_SPEED} MB/s)"
# Download test
echo "Testing download performance..."
DOWNLOAD_START=$(date +%s.%N)
mc cp $MINIO_ALIAS/$BUCKET/test-upload.bin /tmp/test-download.bin
DOWNLOAD_END=$(date +%s.%N)
DOWNLOAD_TIME=$(echo "$DOWNLOAD_END - $DOWNLOAD_START" | bc)
DOWNLOAD_SPEED=$(echo "scale=2; 1024 / $DOWNLOAD_TIME" | bc)
echo "Download completed: ${DOWNLOAD_TIME}s (${DOWNLOAD_SPEED} MB/s)"
# Cleanup
rm -f $TEST_FILE /tmp/test-download.bin
mc rm $MINIO_ALIAS/$BUCKET/test-upload.bin
# System metrics during test
echo ""
echo "=== System Metrics ==="
echo "CPU Usage: $(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | awk -F'%' '{print $1}')%"
echo "Memory Usage: $(free | awk 'NR==2{printf "%.1f%%", $3*100/$2 }')"
echo "Network Connections: $(netstat -an | grep :9000 | wc -l)"
echo "Disk Utilization: $(iostat -x 1 1 | awk '/^(sd|nvme|md)/ {sum+=$NF; n++} END {if (n) printf "%.1f%%", sum/n}')"

8. Performance Tuning Checklist

Server Configuration

  • ✅ Increase API workers and request limits
  • ✅ Optimize memory allocation
  • ✅ Configure appropriate part sizes
  • ✅ Disable compression for maximum throughput
  • ✅ Use fast storage (NVMe SSDs)

Network Configuration

  • ✅ Increase TCP buffer sizes
  • ✅ Enable BBR congestion control
  • ✅ Optimize network interface settings
  • ✅ Use 10Gbps+ network connections

Client Configuration

  • ✅ Use appropriate multipart thresholds
  • ✅ Enable concurrent transfers
  • ✅ Optimize part sizes for your network
  • ✅ Use connection pooling

Storage Configuration

  • ✅ Use XFS filesystem with optimized mount options
  • ✅ Set I/O scheduler to ‘none’ for NVMe
  • ✅ Increase read-ahead buffers
  • ✅ Use RAID 0 for maximum throughput

9. Expected Performance Targets

File Size  | Network | Expected Throughput | Optimization Focus
-----------|---------|---------------------|-----------------------
100MB-1GB  | 1Gbps   | 100-120 MB/s        | Part size, concurrency
1-10GB     | 10Gbps  | 800-1200 MB/s       | Network buffers, I/O
10GB+      | 10Gbps+ | 1-5 GB/s            | Storage, parallelism
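These targets translate directly into expected wall-clock times, which is often the number stakeholders actually care about. A small estimator:

```python
def transfer_seconds(size_gb: float, throughput_mb_s: float) -> float:
    """Wall-clock seconds to move size_gb gigabytes at throughput_mb_s MB/s."""
    return size_gb * 1024 / throughput_mb_s

# 10 GB at the 1 Gbps target (~110 MB/s) vs the 10 Gbps target (~1000 MB/s)
print(round(transfer_seconds(10, 110), 1))   # 93.1
print(round(transfer_seconds(10, 1000), 1))  # 10.2
```

If measured times are far above these estimates, work through the bottleneck list in the next section.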

10. Troubleshooting Performance Issues

Common Bottlenecks

  1. Small part sizes: Increase to 64-128MB for large files
  2. Network congestion: Monitor bandwidth utilization
  3. Storage latency: Check disk I/O metrics
  4. CPU limitations: Monitor CPU usage during transfers
  5. Memory pressure: Ensure adequate RAM for buffers
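Memory pressure (item 5) can be estimated up front, since each concurrent multipart upload holds its own part buffers in RAM. A rough sketch (the parts-in-flight figure is an assumption; check your SDK's buffering behavior):

```python
def upload_buffer_bytes(workers: int, parts_in_flight: int, part_size_mb: int) -> int:
    """Rough RAM needed for upload buffers: concurrent workers, times the
    parts each holds in flight, times the part size."""
    return workers * parts_in_flight * part_size_mb * 1024 * 1024

# 16 workers, 4 parts in flight each, 128 MB parts -> 8 GiB of buffers
print(upload_buffer_bytes(16, 4, 128) / (1024 ** 3))  # 8.0
```

If that figure approaches available RAM, reduce either the part size or the concurrency rather than letting the system swap.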

Performance Debugging Commands

Terminal window
# Monitor network throughput
iftop -i eth0
# Monitor disk I/O
iotop -ao
# Monitor MinIO metrics
mc admin prometheus metrics myminio
# Check multipart upload status
mc admin trace myminio --verbose --all
# Monitor system performance
htop
vmstat 1
iostat -x 1

This comprehensive optimization guide will help you achieve maximum performance for large file transfers with MinIO, ensuring efficient utilization of your storage and network infrastructure.
