How do I use the AIStor Inventory API to generate object reports?

Asked by muratkars · Answered by muratkars · February 22, 2026

The AIStor Inventory API is a bucket-level object metadata reporting system that generates comprehensive inventory reports listing objects in a bucket. It follows the AWS S3 Inventory model while extending it with AIStor-specific capabilities such as Parquet and Iceberg output, advanced filtering, and distributed job orchestration across cluster nodes.

Answer

The Inventory API enables you to generate on-demand or scheduled reports about every object in a bucket — including metadata like size, ETag, tags, encryption status, replication state, and more. Reports are written in CSV, JSON, Parquet, or Iceberg format to a destination bucket of your choice.

This is an enterprise feature available in AIStor. All operations are managed through `mc inventory` commands or the S3-compatible REST API.


Use Cases

Compliance and Auditing

Generate periodic inventory snapshots to prove regulatory compliance (GDPR, HIPAA, SOX). Track which objects exist, their encryption status, object lock settings, and retention dates.

```yaml
schedule: daily
includeFields:
  - EncryptionStatus
  - ObjectLockMode
  - ObjectLockRetainUntilDate
  - ObjectLockLegalHoldStatus
```

Storage Cost Optimization

Identify large, old, or rarely accessed objects for tiering or deletion. Filter by size and age to find optimization targets.

```yaml
filters:
  size:
    greaterThan: 100MiB
  lastModified:
    olderThan: 90d
includeFields:
  - StorageClass
  - Tier
  - TieringStatus
  - AccessTime
```

Data Lifecycle Management

Discover objects with excessive versions, stale delete markers, or multipart uploads that need cleanup.

```yaml
versions: all
filters:
  versionsCount:
    greaterThan: 10
includeFields:
  - IsDeleteMarker
  - IsLatest
  - IsMultipart
```

Analytics and Data Catalog Integration

Export Parquet or Iceberg reports directly into your data lakehouse for ad-hoc analytics with tools like Spark, Trino, or DuckDB.

```yaml
destination:
  format: parquet
  compression: on
includeFields:
  - Tags
  - UserMetadata
  - ReplicationStatus
```

Replication Verification

Audit replication status across buckets to ensure data redundancy policies are being met.

```yaml
includeFields:
  - ReplicationStatus
filters:
  prefix:
    - "critical-data/"
```

Quick Start

Step 1: Generate a Configuration Template

```sh
mc inventory generate myminio/my-source-bucket my-job > config.yml
```

Step 2: Edit the Configuration

A minimal configuration:

```yaml
apiVersion: v1
id: my-job
destination:
  bucket: my-dest-bucket
  prefix: inventory-reports/
  format: csv
  compression: on
schedule: once
versions: current
```

Step 3: Submit the Configuration

```sh
mc inventory put myminio/my-source-bucket config.yml
```

The job starts processing immediately after submission.

Step 4: Monitor Progress

```sh
# One-shot status check
mc inventory status myminio/my-source-bucket my-job

# Live watch mode with auto-refresh
mc inventory status --watch myminio/my-source-bucket my-job
```

Status output includes: job state, objects scanned/matched, records written, execution time, manifest path, and error details.

Step 5: Access the Report

```sh
mc ls -r myminio/my-dest-bucket/inventory-reports/
```

Output folder structure:

```
inventory-reports/
  my-source-bucket/
    my-job/
      2026-02-23T10-30Z/
        files/
          file-001.csv.zst
          file-002.csv.zst
        manifest.json
```

Download and decompress:

```sh
mc cp myminio/my-dest-bucket/inventory-reports/.../file-001.csv.zst ./
zstd -d file-001.csv.zst
```
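Once decompressed, a CSV report can be post-processed with ordinary tooling. A minimal Python sketch that totals object sizes per top-level prefix; the column layout below assumes the default fields (Bucket, Key, SequenceNumber, Size, LastModifiedDate), so check your manifest's `fileSchema` for the actual order:

```python
import csv
import io

# Sample rows in the assumed default-field layout:
# Bucket, Key, SequenceNumber, Size, LastModifiedDate
SAMPLE = """\
my-source-bucket,videos/a.mp4,1,104857600,2026-01-10T08:00:00Z
my-source-bucket,videos/b.mp4,2,52428800,2026-01-11T09:00:00Z
my-source-bucket,logs/app.log,3,2048,2026-02-01T00:00:00Z
"""

def total_size_by_prefix(reader):
    """Aggregate object sizes by top-level key prefix."""
    totals = {}
    for bucket, key, seq, size, modified in reader:
        prefix = key.split("/", 1)[0] if "/" in key else ""
        totals[prefix] = totals.get(prefix, 0) + int(size)
    return totals

totals = total_size_by_prefix(csv.reader(io.StringIO(SAMPLE)))
print(totals)  # {'videos': 157286400, 'logs': 2048}
```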

API Reference

S3-Compatible Endpoints

All endpoints use the ?minio-inventory query parameter on the bucket path.

| Operation | Method | Query Parameters | IAM Permission |
|---|---|---|---|
| Generate template | GET | `?minio-inventory&id={id}&generate` | `s3:PutInventoryConfiguration` |
| Put config | PUT | `?minio-inventory&id={id}` | `s3:PutInventoryConfiguration` |
| Get config | GET | `?minio-inventory&id={id}` | `s3:GetInventoryConfiguration` |
| Delete config | DELETE | `?minio-inventory&id={id}` | `s3:PutInventoryConfiguration` |
| List configs | GET | `?minio-inventory&continuation-token={token}` | `s3:GetInventoryConfiguration` |
| Get job status | GET | `?minio-inventory&id={id}&status` | `s3:GetInventoryConfiguration` |
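The endpoint shapes above compose mechanically. A sketch that builds the unsigned URL for each operation; a real request would additionally need AWS Signature V4 authentication, and the endpoint hostname here is made up:

```python
from urllib.parse import quote

def inventory_url(endpoint, bucket, job_id="", generate=False, status=False):
    """Build the ?minio-inventory URL for a bucket-level operation."""
    url = f"{endpoint}/{quote(bucket)}?minio-inventory"
    if job_id:
        url += f"&id={quote(job_id)}"
    if generate:
        url += "&generate"
    if status:
        url += "&status"
    return url

print(inventory_url("https://aistor.example.com", "my-bucket",
                    "my-job", status=True))
# https://aistor.example.com/my-bucket?minio-inventory&id=my-job&status
```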

Admin Control Endpoints

These require the admin:InventoryControl permission.

| Operation | Method | Path |
|---|---|---|
| Cancel job | POST | `/minio/admin/v3/inventory/{bucket}/{id}/cancel` |
| Suspend job | POST | `/minio/admin/v3/inventory/{bucket}/{id}/suspend` |
| Resume job | POST | `/minio/admin/v3/inventory/{bucket}/{id}/resume` |

mc Commands

| Command | Purpose |
|---|---|
| `mc inventory generate ALIAS/BUCKET ID` | Generate a YAML configuration template |
| `mc inventory put ALIAS/BUCKET FILE` | Create or replace an inventory configuration |
| `mc inventory get ALIAS/BUCKET ID` | Retrieve an existing configuration |
| `mc inventory list ALIAS/BUCKET` | List all configurations for a bucket |
| `mc inventory list --all ALIAS` | List all configurations across all buckets |
| `mc inventory status ALIAS/BUCKET ID` | Check job status |
| `mc inventory status --watch ALIAS/BUCKET ID` | Live-watch job progress |
| `mc inventory status --all ALIAS` | Show status of all jobs across all buckets |
| `mc inventory delete ALIAS/BUCKET ID` | Delete a configuration |
| `mc inventory cancel ALIAS/BUCKET ID` | Cancel the current execution |
| `mc inventory suspend ALIAS/BUCKET ID` | Suspend the job and pause its schedule |
| `mc inventory resume ALIAS/BUCKET ID` | Resume a suspended job |
| `mc inventory migrate-from-batch FILE ID` | Convert batch catalog YAML to inventory format |

Configuration Reference

YAML Structure

```yaml
apiVersion: v1           # Required, must be "v1"
id: my-inventory-job     # Required, 1-64 chars [a-zA-Z0-9._-]
destination:
  bucket: dest-bucket    # Required
  prefix: reports/       # Optional
  format: csv            # csv | json | parquet | iceberg (default: csv)
  compression: on        # on | off (default: on)
schedule: once           # once | hourly | daily | weekly | monthly | yearly
mode: fast               # fast | strict (default: fast)
versions: all            # all | current (default: all)
includeFields:           # Optional additional fields
  - ETag
  - Tags
  - UserMetadata
  - AccessTime
filters:                 # Optional object filters
  prefix: ["videos/"]
  lastModified:
    olderThan: 30d
  size:
    greaterThan: 1MiB
  name:
    match: "*.mp4"
  tags:
    and:
      - key: project
        valueString:
          match: "ares-*"
```
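The `id` constraint above (1 to 64 characters drawn from `[a-zA-Z0-9._-]`) can be checked client-side before submission. A small sketch, assuming the documented character set is exhaustive; `is_valid_inventory_id` is an illustrative helper, not part of any SDK:

```python
import re

# Documented constraint: 1-64 chars from [a-zA-Z0-9._-]
ID_PATTERN = re.compile(r"^[a-zA-Z0-9._-]{1,64}$")

def is_valid_inventory_id(job_id: str) -> bool:
    """Return True if job_id satisfies the documented id constraint."""
    return bool(ID_PATTERN.fullmatch(job_id))

print(is_valid_inventory_id("my-inventory-job"))  # True
print(is_valid_inventory_id("bad id!"))           # False
print(is_valid_inventory_id("x" * 65))            # False
```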

Output Formats

| Format | Compression | Best For |
|---|---|---|
| CSV | ZSTD | Human-readable reports, spreadsheet import |
| JSON (NDJSON) | ZSTD | Programmatic processing, streaming |
| Parquet | Snappy | Analytics engines (Spark, Trino, DuckDB) |
| Iceberg | N/A | Data lakehouse integration via catalog |

Default Fields (Always Included)

`Bucket`, `Key`, `SequenceNumber`, `Size`, `LastModifiedDate`

When `versions: all` is set, reports also include `VersionID`, `IsDeleteMarker`, and `IsLatest`.

Optional Fields

| Field | Description |
|---|---|
| `ETag` | Object's ETag |
| `StorageClass` | Storage class |
| `IsMultipart` | Multipart upload indicator |
| `EncryptionStatus` | Server-side encryption status |
| `IsBucketKeyEnabled` | Bucket key encryption enabled |
| `KmsKeyArn` | KMS key ARN |
| `ChecksumAlgorithm` | Checksum algorithm used |
| `Tags` | Object tags (query string format) |
| `UserMetadata` | User-defined metadata (query string format) |
| `AccessTime` | Last access time |
| `ReplicationStatus` | Replication status |
| `ObjectLockRetainUntilDate` | Object lock retention date |
| `ObjectLockMode` | Lock mode (GOVERNANCE/COMPLIANCE) |
| `ObjectLockLegalHoldStatus` | Legal hold status (on/off) |
| `Tier` | Storage tier |
| `TieringStatus` | Tiering status |

Schedule Options

| Schedule | Behavior |
|---|---|
| `once` | Runs one time immediately (default) |
| `hourly` | Runs at the beginning of every hour after previous completion |
| `daily` | Runs at midnight UTC the day after previous completion |
| `weekly` | Runs on Sunday at midnight UTC following previous completion |
| `monthly` | Runs on the first Sunday of the month following previous completion |
| `yearly` | Runs on the first Sunday of January following previous completion |

Periodic schedules are based on completion time, not fixed calendar intervals. The scheduler detects missed windows and queues jobs accordingly.
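The completion-based semantics can be made concrete for the daily case. A sketch under the stated rule ("midnight UTC the day after previous completion"); `next_daily_run` is a hypothetical helper, not the product's scheduler code:

```python
from datetime import datetime, timedelta, timezone

def next_daily_run(completed_at: datetime) -> datetime:
    """Daily jobs run at midnight UTC the day *after* the previous
    completion, so the next window is the start of the following day."""
    next_day = (completed_at + timedelta(days=1)).date()
    return datetime(next_day.year, next_day.month, next_day.day,
                    tzinfo=timezone.utc)

done = datetime(2026, 2, 22, 10, 30, tzinfo=timezone.utc)
print(next_daily_run(done).isoformat())  # 2026-02-23T00:00:00+00:00
```

Note that a job finishing at 23:59 UTC still targets midnight of the very next day, which is why periodic runs drift relative to fixed calendar intervals.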

Filtering Capabilities

Prefix filtering — include only objects under specific prefixes:

```yaml
filters:
  prefix:
    - "videos/"
    - "images/"
```

Age filtering — relative durations or absolute timestamps:

```yaml
filters:
  lastModified:
    olderThan: 30d                  # Relative duration
    after: "2025-01-01T00:00:00Z"   # Absolute timestamp
```

Size filtering — human-readable units:

```yaml
filters:
  size:
    greaterThan: 10MiB
    lessThan: 1GiB
```

Name filtering — glob, substring, or regex:

```yaml
filters:
  name:
    match: "archive-*.zip"   # Glob
    contains: "backup"       # Substring
    regex: "report-\\d{4}"   # Regex
```

Tag/metadata filtering — combine with AND/OR logic:

```yaml
filters:
  tags:
    and:
      - key: project
        valueString:
          match: "ares-*"
      - key: status
        valueString:
          contains: "complete"
```
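These filters behave as predicates ANDed over each object record. A rough client-side sketch of the matching semantics (glob via `fnmatch`, sizes parsed from human-readable units); the field names and the `matches`/`parse_size` helpers are illustrative, not the server's internal representation:

```python
import fnmatch
from datetime import datetime, timedelta, timezone

UNITS = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

def parse_size(text: str) -> int:
    """Parse human-readable sizes like '10MiB' into bytes."""
    for unit, factor in UNITS.items():
        if text.endswith(unit):
            return int(float(text[:-len(unit)]) * factor)
    return int(text)

def matches(obj, *, prefixes=None, older_than_days=None,
            greater_than=None, name_glob=None):
    """AND together whichever filters are supplied."""
    if prefixes and not any(obj["key"].startswith(p) for p in prefixes):
        return False
    if older_than_days is not None:
        cutoff = datetime.now(timezone.utc) - timedelta(days=older_than_days)
        if obj["last_modified"] > cutoff:
            return False
    if greater_than is not None and obj["size"] <= parse_size(greater_than):
        return False
    if name_glob and not fnmatch.fnmatch(obj["key"].rsplit("/", 1)[-1],
                                         name_glob):
        return False
    return True

obj = {"key": "videos/archive-2025.zip", "size": 200 * 1024**2,
       "last_modified": datetime(2025, 1, 1, tzinfo=timezone.utc)}
print(matches(obj, prefixes=["videos/"], older_than_days=90,
              greater_than="100MiB", name_glob="archive-*.zip"))  # True
```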

Architecture

Scheduler-Executor Model

The Inventory system uses a distributed Scheduler-Executor architecture designed for reliable, scalable operation across multi-node clusters.

```
┌──────────────────────────────────────────────────────┐
│                       Cluster                        │
│                                                      │
│  ┌─────────────┐                                     │
│  │  Scheduler  │  Singleton (leader-elected)         │
│  │ Runs: 15min │  Scans configs, creates Schedule    │
│  └──────┬──────┘                                     │
│         │ writes Schedule to .minio.sys              │
│         ▼                                            │
│  ┌─────────────────────────────────────────────┐     │
│  │               Schedule Object               │     │
│  │  • PendingJobs                              │     │
│  │  • LockExpiredJobs                          │     │
│  │  • ReadyToRetryJobs                         │     │
│  └──────┬──────────────┬──────────────┬────────┘     │
│         │              │              │              │
│         ▼              ▼              ▼              │
│    ┌──────────┐   ┌──────────┐   ┌──────────┐        │
│    │ Executor │   │ Executor │   │ Executor │        │
│    │  Node 1  │   │  Node 2  │   │  Node N  │        │
│    │ Polls:2m │   │ Polls:2m │   │ Polls:2m │        │
│    │  Max: 5  │   │  Max: 5  │   │  Max: 5  │        │
│    └──────────┘   └──────────┘   └──────────┘        │
│                                                      │
└──────────────────────────────────────────────────────┘
```

Scheduler — a cluster-wide singleton (leader-elected via distributed lock) that runs every 15 minutes[1]. It scans all inventory configurations, detects jobs that are due, and produces a centralized Schedule stored in `.minio.sys/inventory/__schedule__.bin`.

Executor — runs on every node in the cluster, polling the Schedule every 2 minutes[2] (with ±20% jitter to prevent thundering herd). Each executor can run up to 5 concurrent jobs[3]. Jobs are claimed via ETag-based optimistic locking — if two nodes try to claim the same job, only one succeeds.
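The polling jitter can be sketched numerically. The exact formula AIStor uses is not documented here; this only illustrates the stated ±20% bound, and `jittered_interval` is an illustrative name:

```python
import random

BASE_POLL_SECONDS = 120  # executors poll the Schedule every 2 minutes

def jittered_interval(base=BASE_POLL_SECONDS, spread=0.20):
    """Return the base interval perturbed by up to +/-20%, so executors
    on different nodes do not all poll in lockstep."""
    return base * (1 + random.uniform(-spread, spread))

intervals = [jittered_interval() for _ in range(5)]
print(all(96.0 <= i <= 144.0 for i in intervals))  # True
```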

Job Lifecycle

```
Pending ──► Running ──► Completed             (once jobs)
                    ──► Sleeping              (periodic jobs)
                    ──► Errored ──► Pending   (retry, up to 10 attempts)
                                ──► Failed    (max retries exceeded)
```

Control operations available at any point:

  • Cancel — stops current execution; periodic jobs continue their schedule
  • Suspend — stops execution AND pauses the schedule until resumed
  • Resume — restores execution and reactivates the schedule

Distributed Lock Mechanism

  • Lock lease duration: 30 minutes[4], refreshed every 10 seconds during execution
  • Lock expiry detection uses a 150% buffer (45 minutes for 30-minute locks)
  • If a node crashes, the lock expires and another node picks up the job automatically

Fast vs Strict Mode

| Mode | Disk Reads | Speed | Consistency |
|---|---|---|---|
| `fast` (default) | Single disk[5] | Faster | Objects modified during scan may be missed |
| `strict` | Optimal disk set | Slower | Higher consistency for concurrent writes |

Both modes may miss objects modified during the scan. Use strict when accuracy is more important than speed.


Operational Considerations

IAM Permissions

Two permission sets control inventory access:

S3 permissions (for configuration management):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "s3:PutInventoryConfiguration",
      "s3:GetInventoryConfiguration"
    ],
    "Resource": ["arn:aws:s3:::my-bucket"]
  }]
}
```

Admin permission (for job control operations — cancel, suspend, resume):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["admin:InventoryControl"],
    "Resource": ["arn:aws:s3:::*"]
  }]
}
```

A read-only inventory user needs only s3:GetInventoryConfiguration to view configurations and job status.

Monitoring with Prometheus

Inventory metrics are available at `/minio/metrics/v3/inventory`.

Cluster-level metrics:

| Metric | Description |
|---|---|
| `minio_inventory_jobs_completed_count` | Total completed jobs |
| `minio_inventory_jobs_active_count` | Currently running jobs |
| `minio_inventory_jobs_failed_count` | Total failed jobs |
| `minio_inventory_jobs_queued_count` | Jobs waiting for execution |
| `minio_inventory_objects_scanned_count` | Total objects scanned |
| `minio_inventory_bytes_scanned_count` | Total bytes scanned |
| `minio_inventory_total_configs` | Total inventory configurations |
| `minio_inventory_running_jobs` | Currently running jobs |

Node-level metrics:

| Metric | Description |
|---|---|
| `minio_inventory_node_running_jobs` | Jobs running on this node |
| `minio_inventory_node_pending_jobs` | Jobs pending on this node |
| `minio_inventory_node_job_execution_errors` | Execution errors on this node |
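These metrics are served in the standard Prometheus text exposition format, so any scrape body can be filtered with ordinary tooling. A sketch that picks inventory samples out of a scrape (the sample values below are made up for illustration):

```python
SAMPLE_SCRAPE = """\
# HELP minio_inventory_jobs_completed_count Total completed jobs
# TYPE minio_inventory_jobs_completed_count counter
minio_inventory_jobs_completed_count 42
minio_inventory_jobs_active_count 3
minio_node_some_other_metric 7
"""

def inventory_metrics(body: str) -> dict:
    """Extract minio_inventory_* samples from Prometheus text format."""
    out = {}
    for line in body.splitlines():
        if line.startswith("minio_inventory_"):
            name, value = line.rsplit(" ", 1)
            out[name] = float(value)
    return out

print(inventory_metrics(SAMPLE_SCRAPE))
# {'minio_inventory_jobs_completed_count': 42.0, 'minio_inventory_jobs_active_count': 3.0}
```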

Manifest Files

Each execution writes an AWS S3-compatible manifest with a MinIO extension:

```json
{
  "sourceBucket": "my-bucket",
  "destinationBucket": "dest-bucket",
  "version": "2016-11-30",
  "fileFormat": "CSV (ZSTD compressed)",
  "fileSchema": "Bucket,Key,Size,LastModifiedDate,...",
  "files": [
    {"key": "...", "size": 1024, "MD5checksum": "abc123"}
  ],
  "minioExtension": {
    "status": "completed",
    "scannedObjects": 12500,
    "matchedObjects": 8300,
    "partialResultsAvailable": false
  }
}
```

The minioExtension field is optional and ignored by AWS S3-compatible tools. AIStor-aware consumers can use it to distinguish completed from canceled/suspended jobs and to check for partial results.
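A consumer can check the extension before trusting a report. A sketch assuming the manifest shape shown above; `report_is_complete` is an illustrative helper:

```python
import json

MANIFEST = json.loads("""{
  "sourceBucket": "my-bucket",
  "files": [{"key": "files/file-001.csv.zst", "size": 1024,
             "MD5checksum": "abc123"}],
  "minioExtension": {"status": "completed", "scannedObjects": 12500,
                     "matchedObjects": 8300,
                     "partialResultsAvailable": false}
}""")

def report_is_complete(manifest: dict) -> bool:
    """Trust a report only if the job completed without partial results.
    A missing minioExtension means a plain AWS-style manifest, which we
    treat as complete."""
    ext = manifest.get("minioExtension")
    if ext is None:
        return True
    return (ext.get("status") == "completed"
            and not ext.get("partialResultsAvailable"))

print(report_is_complete(MANIFEST))  # True
```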

Automatic Recovery

The system includes several self-healing mechanisms:

  • Corrupt metadata recovery — the scheduler detects orphaned or missing metadata and automatically cleans up or recreates job state
  • Panic recovery — scheduler, executor, and individual job panics are caught, logged with stack traces, and automatically restarted after a 1-minute backoff
  • Lock expiry recovery — if a node crashes mid-execution, the lock expires and another node picks up the job
  • Retry logic — failed jobs are retried up to 10 times[6] with a 10-minute[7] delay between attempts

Performance Tuning

| Parameter | Default | Impact |
|---|---|---|
| Concurrent jobs per node | 5[3] | More concurrency = higher throughput but more resource consumption |
| Max records per output file | 1,000,000[8] | Larger files = fewer files but more memory during processing |
| Record batch size | 200 | Records buffered before writing |
| Metrics reporting interval | Every 1,000 objects | Scanned count updates |

Recommendations:

  • Use fast mode for large-scale inventory where slight inconsistency is acceptable
  • Use Parquet format for analytics workloads — columnar storage enables efficient queries
  • Apply prefix filters to narrow scope and reduce execution time
  • Schedule during off-peak hours for periodic jobs on resource-constrained clusters
  • The inventory system does not use or depend on the data scanner — it uses ObjectLayer.Walk() directly for listing

Migrating from Batch Catalog

If you have existing batch catalog YAML files (apiVersion v2), convert them to inventory format:

```sh
mc inventory migrate-from-batch batch-job.yaml my-inventory-id > inventory-job.yaml
mc inventory put myminio/my-bucket inventory-job.yaml
```

The migration converts apiVersion: v2 to v1, removes the bucket field (specified via the API endpoint instead), and adds the inventory id field. All YAML comments are preserved.
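The same transformation can be expressed over a parsed configuration. A rough sketch on plain dicts (the real `mc` command also preserves YAML comments, which this does not; `batch_to_inventory` is an illustrative helper):

```python
def batch_to_inventory(batch: dict, inventory_id: str) -> dict:
    """Mirror the documented migration: bump apiVersion v2 -> v1,
    drop the bucket field (the bucket now comes from the API path),
    and add the inventory id."""
    converted = {k: v for k, v in batch.items() if k != "bucket"}
    converted["apiVersion"] = "v1"
    converted["id"] = inventory_id
    return converted

batch_cfg = {"apiVersion": "v2", "bucket": "my-bucket",
             "destination": {"bucket": "dest-bucket", "format": "csv"}}
print(batch_to_inventory(batch_cfg, "my-inventory-id"))
# {'apiVersion': 'v1', 'destination': {'bucket': 'dest-bucket', 'format': 'csv'}, 'id': 'my-inventory-id'}
```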


Common Scenarios

Stop a Runaway Job

```sh
mc inventory status myminio/my-bucket my-job
mc inventory cancel myminio/my-bucket my-job
```

For periodic jobs, this only stops the current execution — future runs continue. To fully stop:

```sh
mc inventory suspend myminio/my-bucket my-job
```

Pause for Maintenance

```sh
# Suspend before maintenance
mc inventory suspend myminio/bucket1 job1
mc inventory suspend myminio/bucket2 job2

# Perform maintenance...

# Resume after maintenance
mc inventory resume myminio/bucket1 job1
mc inventory resume myminio/bucket2 job2
```

Update a Running Job’s Configuration

```sh
# Delete the old configuration (running job stops within 10 seconds)
mc inventory delete myminio/my-bucket my-job

# Create new configuration with the same ID
mc inventory put myminio/my-bucket updated-config.yml
```

View All Inventory Jobs Across Cluster

```sh
# List all configurations across all buckets
mc inventory list --all myminio

# Watch all jobs in real-time
mc inventory status --watch --all myminio
```

Source Code References
  1. `internal/inventory/system-params.go:239` - `defaultSchedulerInterval = 15 * time.Minute`
  2. `internal/inventory/system-params.go:240` - `defaultExecutorInterval = 2 * time.Minute`
  3. `cmd/inventory.go:43` - `maxConcurrentInventoryJobs = 5`
  4. `internal/inventory/system-params.go:243` - `defaultLockDuration = 30 * time.Minute`
  5. `cmd/inventory.go:477-479` - fast mode: `askDisks = "disk"`, strict mode: `askDisks = "optimal"`
  6. `internal/inventory/system-params.go:244` - `defaultMaxRetryAttempts = uint8(10)`
  7. `internal/inventory/system-params.go:242` - `defaultRetryOnErrorDelay = 10 * time.Minute`
  8. `internal/inventory/system-params.go:245` - `defaultInventoryMaxRecordsPerFile = 1_000_000`