The AIStor Inventory API is a bucket-level object metadata reporting system that generates comprehensive inventory reports listing objects in a bucket. It follows the AWS S3 Inventory model while extending it with AIStor-specific capabilities such as Parquet and Iceberg output, advanced filtering, and distributed job orchestration across cluster nodes.
The Inventory API enables you to generate on-demand or scheduled reports about every object in a bucket — including metadata like size, ETag, tags, encryption status, replication state, and more. Reports are written in CSV, JSON, Parquet, or Iceberg format to a destination bucket of your choice.
This is an enterprise feature available in AIStor. All operations are managed through `mc inventory` commands or the S3-compatible REST API.
Use Cases
Compliance and Auditing
Generate periodic inventory snapshots to prove regulatory compliance (GDPR, HIPAA, SOX). Track which objects exist, their encryption status, object lock settings, and retention dates.
```yaml
schedule: daily
includeFields:
  - EncryptionStatus
  - ObjectLockMode
  - ObjectLockRetainUntilDate
  - ObjectLockLegalHoldStatus
```

Storage Cost Optimization
Identify large, old, or rarely accessed objects for tiering or deletion. Filter by size and age to find optimization targets.
```yaml
filters:
  size:
    greaterThan: 100MiB
  lastModified:
    olderThan: 90d
includeFields:
  - StorageClass
  - Tier
  - TieringStatus
  - AccessTime
```

Data Lifecycle Management
Discover objects with excessive versions, stale delete markers, or multipart uploads that need cleanup.
```yaml
versions: all
filters:
  versionsCount:
    greaterThan: 10
includeFields:
  - IsDeleteMarker
  - IsLatest
  - IsMultipart
```

Analytics and Data Catalog Integration
Export Parquet or Iceberg reports directly into your data lakehouse for ad-hoc analytics with tools like Spark, Trino, or DuckDB.
```yaml
destination:
  format: parquet
  compression: on
includeFields:
  - Tags
  - UserMetadata
  - ReplicationStatus
```

Replication Verification
Audit replication status across buckets to ensure data redundancy policies are being met.
```yaml
includeFields:
  - ReplicationStatus
filters:
  prefix:
    - "critical-data/"
```

Quick Start
Step 1: Generate a Configuration Template
```sh
mc inventory generate myminio/my-source-bucket my-job > config.yml
```

Step 2: Edit the Configuration
A minimal configuration:
```yaml
apiVersion: v1
id: my-job
destination:
  bucket: my-dest-bucket
  prefix: inventory-reports/
  format: csv
  compression: on
schedule: once
versions: current
```

Step 3: Submit the Configuration
```sh
mc inventory put myminio/my-source-bucket config.yml
```

The job starts processing immediately after submission.
Step 4: Monitor Progress
```sh
# One-shot status check
mc inventory status myminio/my-source-bucket my-job

# Live watch mode with auto-refresh
mc inventory status --watch myminio/my-source-bucket my-job
```

Status output includes: job state, objects scanned/matched, records written, execution time, manifest path, and error details.
Step 5: Access the Report
```sh
mc ls -r myminio/my-dest-bucket/inventory-reports/
```

Output folder structure:
```
inventory-reports/
  my-source-bucket/
    my-job/
      2026-02-23T10-30Z/
        files/
          file-001.csv.zst
          file-002.csv.zst
        manifest.json
```

Download and decompress:
```sh
mc cp myminio/my-dest-bucket/inventory-reports/.../file-001.csv.zst ./
zstd -d file-001.csv.zst
```

API Reference
S3-Compatible Endpoints
All endpoints use the `?minio-inventory` query parameter on the bucket path.
| Operation | Method | Query Parameters | IAM Permission |
|---|---|---|---|
| Generate template | GET | ?minio-inventory&id={id}&generate | s3:PutInventoryConfiguration |
| Put config | PUT | ?minio-inventory&id={id} | s3:PutInventoryConfiguration |
| Get config | GET | ?minio-inventory&id={id} | s3:GetInventoryConfiguration |
| Delete config | DELETE | ?minio-inventory&id={id} | s3:PutInventoryConfiguration |
| List configs | GET | ?minio-inventory&continuation-token={token} | s3:GetInventoryConfiguration |
| Get job status | GET | ?minio-inventory&id={id}&status | s3:GetInventoryConfiguration |
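As an illustration of the query pattern above, the request URL can be assembled programmatically. This is a minimal sketch: the endpoint, bucket, and job id are placeholder values, `inventory_status_url` is a hypothetical helper, and the AWS Signature V4 authentication a real request would need is omitted entirely.

```python
from urllib.parse import quote

def inventory_status_url(endpoint: str, bucket: str, job_id: str) -> str:
    """Build the job-status URL for the S3-compatible inventory API.

    Sketch only: real requests must also carry AWS Signature V4
    authentication headers, which are not shown here.
    """
    return f"{endpoint}/{quote(bucket)}?minio-inventory&id={quote(job_id)}&status"

url = inventory_status_url("https://aistor.example.com", "my-bucket", "my-job")
# e.g. "https://aistor.example.com/my-bucket?minio-inventory&id=my-job&status"
```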
Admin Control Endpoints
These require the `admin:InventoryControl` permission.
| Operation | Method | Path |
|---|---|---|
| Cancel job | POST | /minio/admin/v3/inventory/{bucket}/{id}/cancel |
| Suspend job | POST | /minio/admin/v3/inventory/{bucket}/{id}/suspend |
| Resume job | POST | /minio/admin/v3/inventory/{bucket}/{id}/resume |
mc Commands
| Command | Purpose |
|---|---|
| `mc inventory generate ALIAS/BUCKET ID` | Generate a YAML configuration template |
| `mc inventory put ALIAS/BUCKET FILE` | Create or replace an inventory configuration |
| `mc inventory get ALIAS/BUCKET ID` | Retrieve an existing configuration |
| `mc inventory list ALIAS/BUCKET` | List all configurations for a bucket |
| `mc inventory list --all ALIAS` | List all configurations across all buckets |
| `mc inventory status ALIAS/BUCKET ID` | Check job status |
| `mc inventory status --watch ALIAS/BUCKET ID` | Live-watch job progress |
| `mc inventory status --all ALIAS` | Show status of all jobs across all buckets |
| `mc inventory delete ALIAS/BUCKET ID` | Delete a configuration |
| `mc inventory cancel ALIAS/BUCKET ID` | Cancel the current execution |
| `mc inventory suspend ALIAS/BUCKET ID` | Suspend job and pause schedule |
| `mc inventory resume ALIAS/BUCKET ID` | Resume a suspended job |
| `mc inventory migrate-from-batch FILE ID` | Convert batch catalog YAML to inventory format |
Configuration Reference
YAML Structure
```yaml
apiVersion: v1            # Required, must be "v1"
id: my-inventory-job      # Required, 1-64 chars [a-zA-Z0-9._-]

destination:
  bucket: dest-bucket     # Required
  prefix: reports/        # Optional
  format: csv             # csv | json | parquet | iceberg (default: csv)
  compression: on         # on | off (default: on)

schedule: once            # once | hourly | daily | weekly | monthly | yearly
mode: fast                # fast | strict (default: fast)
versions: all             # all | current (default: all)

includeFields:            # Optional additional fields
  - ETag
  - Tags
  - UserMetadata
  - AccessTime

filters:                  # Optional object filters
  prefix: ["videos/"]
  lastModified:
    olderThan: 30d
  size:
    greaterThan: 1MiB
  name:
    match: "*.mp4"
  tags:
    and:
      - key: project
        valueString:
          match: "ares-*"
```

Output Formats
| Format | Compression | Best For |
|---|---|---|
| CSV | ZSTD | Human-readable reports, spreadsheet import |
| JSON (NDJSON) | ZSTD | Programmatic processing, streaming |
| Parquet | Snappy | Analytics engines (Spark, Trino, DuckDB) |
| Iceberg | — | Data lakehouse integration via catalog |
Default Fields (Always Included)
`Bucket`, `Key`, `SequenceNumber`, `Size`, `LastModifiedDate`

When `versions: all`, also: `VersionID`, `IsDeleteMarker`, `IsLatest`
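As a sketch of how a consumer might read these default columns from a CSV report, the following assumes the report files carry no header row (with column order taken from the manifest's `fileSchema` instead); the sample row is invented for illustration.

```python
import csv
import io

# Default schema with versions: current. Assumption: no header row in the
# report file itself; the column order is supplied externally.
FIELDS = ["Bucket", "Key", "SequenceNumber", "Size", "LastModifiedDate"]
sample = 'my-bucket,"videos/intro.mp4",1,1048576,2026-02-23T10:30:00.000Z\n'

rows = list(csv.DictReader(io.StringIO(sample), fieldnames=FIELDS))
row = rows[0]
size_bytes = int(row["Size"])  # numeric columns arrive as strings
```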
Optional Fields
| Field | Description |
|---|---|
| `ETag` | Object’s ETag |
| `StorageClass` | Storage class |
| `IsMultipart` | Multipart upload indicator |
| `EncryptionStatus` | Server-side encryption status |
| `IsBucketKeyEnabled` | Bucket key encryption enabled |
| `KmsKeyArn` | KMS key ARN |
| `ChecksumAlgorithm` | Checksum algorithm used |
| `Tags` | Object tags (query string format) |
| `UserMetadata` | User-defined metadata (query string format) |
| `AccessTime` | Last access time |
| `ReplicationStatus` | Replication status |
| `ObjectLockRetainUntilDate` | Object lock retention date |
| `ObjectLockMode` | Lock mode (GOVERNANCE/COMPLIANCE) |
| `ObjectLockLegalHoldStatus` | Legal hold status (on/off) |
| `Tier` | Storage tier |
| `TieringStatus` | Tiering status |
Schedule Options
| Schedule | Behavior |
|---|---|
| `once` | Runs one time immediately (default) |
| `hourly` | Runs at the beginning of every hour after previous completion |
| `daily` | Runs at midnight UTC the day after previous completion |
| `weekly` | Runs on Sunday at midnight UTC following previous completion |
| `monthly` | Runs on the first Sunday of the month following previous completion |
| `yearly` | Runs on the first Sunday of January following previous completion |
Periodic schedules are based on completion time, not fixed calendar intervals. The scheduler detects missed windows and queues jobs accordingly.
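The completion-based daily rule can be sketched as follows. This is only an illustration of the rule as stated above, not the actual scheduler code; `next_daily_run` is a hypothetical name, and missed-window handling and the other schedule types are ignored.

```python
from datetime import datetime, timedelta, timezone

def next_daily_run(completed_at: datetime) -> datetime:
    """Next daily run: midnight UTC the day after the previous completion."""
    next_day = completed_at.astimezone(timezone.utc).date() + timedelta(days=1)
    return datetime(next_day.year, next_day.month, next_day.day,
                    tzinfo=timezone.utc)

done = datetime(2026, 2, 23, 10, 30, tzinfo=timezone.utc)
next_run = next_daily_run(done)  # midnight UTC on 2026-02-24
```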
Filtering Capabilities
Prefix filtering — include only objects under specific prefixes:
```yaml
filters:
  prefix:
    - "videos/"
    - "images/"
```

Age filtering — relative durations or absolute timestamps:
```yaml
filters:
  lastModified:
    olderThan: 30d                 # Relative duration
    after: "2025-01-01T00:00:00Z"  # Absolute timestamp
```

Size filtering — human-readable units:
```yaml
filters:
  size:
    greaterThan: 10MiB
    lessThan: 1GiB
```

Name filtering — glob, substring, or regex:
```yaml
filters:
  name:
    match: "archive-*.zip"   # Glob
    contains: "backup"       # Substring
    regex: "report-\\d{4}"   # Regex
```

Tag/metadata filtering — combine with AND/OR logic:
```yaml
filters:
  tags:
    and:
      - key: project
        valueString:
          match: "ares-*"
      - key: status
        valueString:
          contains: "complete"
```

Architecture
Scheduler-Executor Model
The Inventory system uses a distributed Scheduler-Executor architecture designed for reliable, scalable operation across multi-node clusters.
```
┌─────────────────────────────────────────────────────┐
│ Cluster                                             │
│                                                     │
│  ┌─────────────┐                                    │
│  │ Scheduler   │  Singleton (leader-elected)        │
│  │ Runs: 15min │  Scans configs, creates Schedule   │
│  └──────┬──────┘                                    │
│         │ writes Schedule to .minio.sys             │
│         ▼                                           │
│  ┌─────────────────────────────────────────────┐    │
│  │ Schedule Object                             │    │
│  │   • PendingJobs                             │    │
│  │   • LockExpiredJobs                         │    │
│  │   • ReadyToRetryJobs                        │    │
│  └──────┬──────────┬──────────┬──────────────┘      │
│         │          │          │                     │
│         ▼          ▼          ▼                     │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐             │
│  │ Executor │ │ Executor │ │ Executor │             │
│  │ Node 1   │ │ Node 2   │ │ Node N   │             │
│  │ Polls:2m │ │ Polls:2m │ │ Polls:2m │             │
│  │ Max: 5   │ │ Max: 5   │ │ Max: 5   │             │
│  └──────────┘ └──────────┘ └──────────┘             │
└─────────────────────────────────────────────────────┘
```

Scheduler — a cluster-wide singleton (leader-elected via distributed lock) that runs every 15 minutes[1]. It scans all inventory configurations, detects jobs that are due, and produces a centralized Schedule stored in `.minio.sys/inventory/__schedule__.bin`.
Executor — runs on every node in the cluster, polling the Schedule every 2 minutes[2] (with ±20% jitter to prevent thundering herd). Each executor can run up to 5 concurrent jobs[3]. Jobs are claimed via ETag-based optimistic locking — if two nodes try to claim the same job, only one succeeds.
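The ±20% polling jitter can be pictured with a small sketch. This is illustrative only; `next_poll_delay` is a hypothetical name, not a server function.

```python
import random

def next_poll_delay(base_seconds: float = 120.0, jitter: float = 0.2) -> float:
    """Polling interval with +/-20% jitter, so executors do not all hit
    the Schedule object at the same instant."""
    return base_seconds * (1.0 + random.uniform(-jitter, jitter))

delay = next_poll_delay()  # somewhere in [96.0, 144.0] seconds
```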
Job Lifecycle
```
Pending ──► Running ──► Completed  (once jobs)
                  ├───► Sleeping   (periodic jobs)
                  └───► Errored ──► Pending  (retry, up to 10 attempts)
                                └─► Failed   (max retries exceeded)
```

Control operations available at any point:
- Cancel — stops current execution; periodic jobs continue their schedule
- Suspend — stops execution AND pauses the schedule until resumed
- Resume — restores execution and reactivates the schedule
Distributed Lock Mechanism
- Lock lease duration: 30 minutes[4], refreshed every 10 seconds during execution
- Lock expiry detection uses a 150% buffer (45 minutes for 30-minute locks)
- If a node crashes, the lock expires and another node picks up the job automatically
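The 150% expiry buffer can be expressed as a simple check. This is an illustration of the rule above; `lock_expired` is a hypothetical helper, not the server implementation.

```python
from datetime import datetime, timedelta, timezone

def lock_expired(last_refresh: datetime, now: datetime,
                 lease_minutes: int = 30, buffer: float = 1.5) -> bool:
    """Treat a lock as abandoned only after 150% of its lease has passed
    without a refresh: 45 minutes for the default 30-minute lease."""
    return now - last_refresh > timedelta(minutes=lease_minutes * buffer)

t0 = datetime(2026, 2, 23, 10, 0, tzinfo=timezone.utc)
within = lock_expired(t0, t0 + timedelta(minutes=40))  # False: inside buffer
crashed = lock_expired(t0, t0 + timedelta(minutes=50))  # True: node presumed dead
```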
Fast vs Strict Mode
| Mode | Disk Reads | Speed | Consistency |
|---|---|---|---|
| fast (default) | Single disk[5] | Faster | Objects modified during scan may be missed |
| strict | Optimal disk set | Slower | Higher consistency for concurrent writes |
Both modes may miss objects modified during the scan. Use strict when accuracy is more important than speed.
Operational Considerations
IAM Permissions
Two permission sets control inventory access:
S3 permissions (for configuration management):
```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "s3:PutInventoryConfiguration",
      "s3:GetInventoryConfiguration"
    ],
    "Resource": ["arn:aws:s3:::my-bucket"]
  }]
}
```

Admin permission (for job control operations — cancel, suspend, resume):
```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["admin:InventoryControl"],
    "Resource": ["arn:aws:s3:::*"]
  }]
}
```

A read-only inventory user needs only `s3:GetInventoryConfiguration` to view configurations and job status.
Monitoring with Prometheus
Inventory metrics are available at `/minio/metrics/v3/inventory`.
Cluster-level metrics:
| Metric | Description |
|---|---|
| `minio_inventory_jobs_completed_count` | Total completed jobs |
| `minio_inventory_jobs_active_count` | Currently running jobs |
| `minio_inventory_jobs_failed_count` | Total failed jobs |
| `minio_inventory_jobs_queued_count` | Jobs waiting for execution |
| `minio_inventory_objects_scanned_count` | Total objects scanned |
| `minio_inventory_bytes_scanned_count` | Total bytes scanned |
| `minio_inventory_total_configs` | Total inventory configurations |
| `minio_inventory_running_jobs` | Currently running jobs |
Node-level metrics:
| Metric | Description |
|---|---|
| `minio_inventory_node_running_jobs` | Jobs running on this node |
| `minio_inventory_node_pending_jobs` | Jobs pending on this node |
| `minio_inventory_node_job_execution_errors` | Execution errors on this node |
Manifest Files
Each execution writes an AWS S3-compatible manifest with a MinIO extension:
```json
{
  "sourceBucket": "my-bucket",
  "destinationBucket": "dest-bucket",
  "version": "2016-11-30",
  "fileFormat": "CSV (ZSTD compressed)",
  "fileSchema": "Bucket,Key,Size,LastModifiedDate,...",
  "files": [
    {"key": "...", "size": 1024, "MD5checksum": "abc123"}
  ],
  "minioExtension": {
    "status": "completed",
    "scannedObjects": 12500,
    "matchedObjects": 8300,
    "partialResultsAvailable": false
  }
}
```

The `minioExtension` field is optional and ignored by AWS S3-compatible tools. AIStor-aware consumers can use it to distinguish completed from canceled/suspended jobs and to check for partial results.
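A downstream consumer might use the extension like this. `completed_report_files` is a hypothetical helper, and the inline manifest is a trimmed version of the example above.

```python
import json

# Trimmed manifest: only the fields this sketch needs.
manifest = json.loads("""{
  "files": [{"key": "inventory-reports/files/file-001.csv.zst",
             "size": 1024, "MD5checksum": "abc123"}],
  "minioExtension": {"status": "completed", "partialResultsAvailable": false}
}""")

def completed_report_files(m: dict) -> list:
    """Return report file keys only for fully completed runs."""
    ext = m.get("minioExtension", {})       # extension is optional
    if ext.get("status", "completed") != "completed":
        return []                           # canceled/suspended: skip
    return [f["key"] for f in m.get("files", [])]

keys = completed_report_files(manifest)     # one key: the run completed
```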
Automatic Recovery
The system includes several self-healing mechanisms:
- Corrupt metadata recovery — the scheduler detects orphaned or missing metadata and automatically cleans up or recreates job state
- Panic recovery — scheduler, executor, and individual job panics are caught, logged with stack traces, and automatically restarted after a 1-minute backoff
- Lock expiry recovery — if a node crashes mid-execution, the lock expires and another node picks up the job
- Retry logic — failed jobs are retried up to 10 times[6] with a 10-minute[7] delay between attempts
Performance Tuning
| Parameter | Default | Impact |
|---|---|---|
| Concurrent jobs per node | 5[3] | More concurrency = higher throughput but more resource consumption |
| Max records per output file | 1,000,000[8] | Larger files = fewer files but more memory during processing |
| Record batch size | 200 | Records buffered before writing |
| Metrics reporting interval | Every 1,000 objects | Scanned count updates |
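The record batch size can be pictured as a buffered-writer loop. This is an illustrative sketch under the defaults above; `batched` is a hypothetical helper, not the server code.

```python
from typing import Iterable, Iterator, List

def batched(records: Iterable[dict], size: int = 200) -> Iterator[List[dict]]:
    """Buffer records into fixed-size batches before writing."""
    batch: List[dict] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:            # flush the final partial batch
        yield batch

sizes = [len(b) for b in batched([{"key": i} for i in range(450)])]
# → [200, 200, 50]
```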
Recommendations:
- Use fast mode for large-scale inventory where slight inconsistency is acceptable
- Use Parquet format for analytics workloads — columnar storage enables efficient queries
- Apply prefix filters to narrow scope and reduce execution time
- Schedule during off-peak hours for periodic jobs on resource-constrained clusters
- The inventory system does not use or depend on the data scanner — it uses `ObjectLayer.Walk()` directly for listing
Migrating from Batch Catalog
If you have existing batch catalog YAML files (apiVersion v2), convert them to inventory format:
```sh
mc inventory migrate-from-batch batch-job.yaml my-inventory-id > inventory-job.yaml
mc inventory put myminio/my-bucket inventory-job.yaml
```

The migration converts `apiVersion: v2` to `v1`, removes the `bucket` field (specified via the API endpoint instead), and adds the inventory `id` field. All YAML comments are preserved.
Common Scenarios
Stop a Runaway Job
```sh
mc inventory status myminio/my-bucket my-job
mc inventory cancel myminio/my-bucket my-job
```

For periodic jobs, this only stops the current execution — future runs continue. To fully stop:
```sh
mc inventory suspend myminio/my-bucket my-job
```

Pause for Maintenance
```sh
# Suspend before maintenance
mc inventory suspend myminio/bucket1 job1
mc inventory suspend myminio/bucket2 job2

# Perform maintenance...

# Resume after maintenance
mc inventory resume myminio/bucket1 job1
mc inventory resume myminio/bucket2 job2
```

Update a Running Job’s Configuration
```sh
# Delete the old configuration (running job stops within 10 seconds)
mc inventory delete myminio/my-bucket my-job

# Create new configuration with same ID
mc inventory put myminio/my-bucket updated-config.yml
```

View All Inventory Jobs Across Cluster
```sh
# List all configurations across all buckets
mc inventory list --all myminio

# Watch all jobs in real-time
mc inventory status --watch --all myminio
```

Source Code References
[1] `defaultSchedulerInterval = 15 * time.Minute` (`internal/inventory/system-params.go:239`)
[2] `defaultExecutorInterval = 2 * time.Minute` (`internal/inventory/system-params.go:240`)
[3] `maxConcurrentInventoryJobs = 5` (`cmd/inventory.go:43`)
[4] `defaultLockDuration = 30 * time.Minute` (`internal/inventory/system-params.go:243`)
[5] fast mode: `askDisks = "disk"`; strict mode: `askDisks = "optimal"` (`cmd/inventory.go:477-479`)
[6] `defaultMaxRetryAttempts = uint8(10)` (`internal/inventory/system-params.go:244`)
[7] `defaultRetryOnErrorDelay = 10 * time.Minute` (`internal/inventory/system-params.go:242`)
[8] `defaultInventoryMaxRecordsPerFile = 1_000_000` (`internal/inventory/system-params.go:245`)