How do I use the AIStor Inventory API to generate object reports?

Asked by muratkars · Answered by muratkars · February 22, 2026

The AIStor Inventory API is a bucket-level object metadata reporting system that generates comprehensive inventory reports listing objects in a bucket. It follows the AWS S3 Inventory model while extending it with AIStor-specific capabilities such as Parquet and Iceberg output, advanced filtering, and distributed job orchestration across cluster nodes.

Answer

The Inventory API enables you to generate on-demand or scheduled reports about every object in a bucket — including metadata like size, ETag, tags, encryption status, replication state, and more. Reports are written in CSV, JSON, Parquet, or Iceberg format to a destination bucket of your choice.

This is an enterprise feature available in AIStor. All operations are managed through `mc inventory` commands or the S3-compatible REST API.


Use Cases

Compliance and Auditing

Generate periodic inventory snapshots to prove regulatory compliance (GDPR, HIPAA, SOX). Track which objects exist, their encryption status, object lock settings, and retention dates.

```yaml
schedule: daily
includeFields:
  - EncryptionStatus
  - ObjectLockMode
  - ObjectLockRetainUntilDate
  - ObjectLockLegalHoldStatus
```

Storage Cost Optimization

Identify large, old, or rarely accessed objects for tiering or deletion. Filter by size and age to find optimization targets.

```yaml
filters:
  size:
    greaterThan: 100MiB
  lastModified:
    olderThan: 90d
includeFields:
  - StorageClass
  - Tier
  - TieringStatus
  - AccessTime
```

Data Lifecycle Management

Discover objects with excessive versions, stale delete markers, or multipart uploads that need cleanup.

```yaml
versions: all
filters:
  versionsCount:
    greaterThan: 10
includeFields:
  - IsDeleteMarker
  - IsLatest
  - IsMultipart
```

Analytics and Data Catalog Integration

Export Parquet or Iceberg reports directly into your data lakehouse for ad-hoc analytics with tools like Spark, Trino, or DuckDB.

```yaml
destination:
  format: parquet
  compression: on
includeFields:
  - Tags
  - UserMetadata
  - ReplicationStatus
```

Replication Verification

Audit replication status across buckets to ensure data redundancy policies are being met.

```yaml
includeFields:
  - ReplicationStatus
filters:
  prefix:
    - "critical-data/"
```

Quick Start

Step 1: Generate a Configuration Template

```sh
mc inventory generate myminio/my-source-bucket my-job > config.yml
```

Step 2: Edit the Configuration

A minimal configuration:

```yaml
apiVersion: v1
id: my-job
destination:
  bucket: my-dest-bucket
  prefix: inventory-reports/
  format: csv
  compression: on
schedule: once
versions: current
```

Step 3: Submit the Configuration

```sh
mc inventory put myminio/my-source-bucket config.yml
```

The job starts processing immediately after submission.

Step 4: Monitor Progress

```sh
# One-shot status check
mc inventory status myminio/my-source-bucket my-job

# Live watch mode with auto-refresh
mc inventory status --watch myminio/my-source-bucket my-job
```

Status output includes: job state, objects scanned/matched, records written, execution time, manifest path, and error details.

Step 5: Access the Report

```sh
mc ls -r myminio/my-dest-bucket/inventory-reports/
```

Output folder structure:

```
inventory-reports/
  my-source-bucket/
    my-job/
      2026-02-23T10-30Z/
        files/
          file-001.csv.zst
          file-002.csv.zst
        manifest.json
```

Download and decompress:

```sh
mc cp myminio/my-dest-bucket/inventory-reports/.../file-001.csv.zst ./
zstd -d file-001.csv.zst
```
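Once decompressed, a CSV report can be post-processed with ordinary tooling. A minimal Python sketch that totals object sizes per top-level prefix; the column layout below assumes the default fields (Bucket, Key, SequenceNumber, Size, LastModifiedDate), so check your manifest's `fileSchema` for the actual order:

```python
import csv
import io

# Sample rows in the assumed default-field layout:
# Bucket, Key, SequenceNumber, Size, LastModifiedDate
SAMPLE = """\
my-source-bucket,videos/a.mp4,1,104857600,2026-01-10T08:00:00Z
my-source-bucket,videos/b.mp4,2,52428800,2026-01-11T09:00:00Z
my-source-bucket,logs/app.log,3,2048,2026-02-01T00:00:00Z
"""

def total_size_by_prefix(reader):
    """Aggregate object sizes by top-level key prefix."""
    totals = {}
    for bucket, key, seq, size, modified in reader:
        prefix = key.split("/", 1)[0] if "/" in key else ""
        totals[prefix] = totals.get(prefix, 0) + int(size)
    return totals

totals = total_size_by_prefix(csv.reader(io.StringIO(SAMPLE)))
print(totals)  # {'videos': 157286400, 'logs': 2048}
```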

API Reference

S3-Compatible Endpoints

All endpoints use the ?minio-inventory query parameter on the bucket path.

| Operation | Method | Query Parameters | IAM Permission |
|---|---|---|---|
| Generate template | GET | `?minio-inventory&id={id}&generate` | `s3:PutInventoryConfiguration` |
| Put config | PUT | `?minio-inventory&id={id}` | `s3:PutInventoryConfiguration` |
| Get config | GET | `?minio-inventory&id={id}` | `s3:GetInventoryConfiguration` |
| Delete config | DELETE | `?minio-inventory&id={id}` | `s3:PutInventoryConfiguration` |
| List configs | GET | `?minio-inventory&continuation-token={token}` | `s3:GetInventoryConfiguration` |
| Get job status | GET | `?minio-inventory&id={id}&status` | `s3:GetInventoryConfiguration` |
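The endpoint shapes above compose mechanically. A sketch that builds the unsigned URL for each operation; a real request would additionally need AWS Signature V4 authentication, and the endpoint hostname here is made up:

```python
from urllib.parse import quote

def inventory_url(endpoint, bucket, job_id="", generate=False, status=False):
    """Build the ?minio-inventory URL for a bucket-level operation."""
    url = f"{endpoint}/{quote(bucket)}?minio-inventory"
    if job_id:
        url += f"&id={quote(job_id)}"
    if generate:
        url += "&generate"
    if status:
        url += "&status"
    return url

print(inventory_url("https://aistor.example.com", "my-bucket",
                    "my-job", status=True))
# https://aistor.example.com/my-bucket?minio-inventory&id=my-job&status
```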

Admin Control Endpoints

These require the admin:InventoryControl permission.

| Operation | Method | Path |
|---|---|---|
| Cancel job | POST | `/minio/admin/v3/inventory/{bucket}/{id}/cancel` |
| Suspend job | POST | `/minio/admin/v3/inventory/{bucket}/{id}/suspend` |
| Resume job | POST | `/minio/admin/v3/inventory/{bucket}/{id}/resume` |

mc Commands

| Command | Purpose |
|---|---|
| `mc inventory generate ALIAS/BUCKET ID` | Generate a YAML configuration template |
| `mc inventory put ALIAS/BUCKET FILE` | Create or replace an inventory configuration |
| `mc inventory get ALIAS/BUCKET ID` | Retrieve an existing configuration |
| `mc inventory list ALIAS/BUCKET` | List all configurations for a bucket |
| `mc inventory list --all ALIAS` | List all configurations across all buckets |
| `mc inventory status ALIAS/BUCKET ID` | Check job status |
| `mc inventory status --watch ALIAS/BUCKET ID` | Live-watch job progress |
| `mc inventory status --all ALIAS` | Show status of all jobs across all buckets |
| `mc inventory delete ALIAS/BUCKET ID` | Delete a configuration |
| `mc inventory cancel ALIAS/BUCKET ID` | Cancel the current execution |
| `mc inventory suspend ALIAS/BUCKET ID` | Suspend the job and pause its schedule |
| `mc inventory resume ALIAS/BUCKET ID` | Resume a suspended job |
| `mc inventory migrate-from-batch FILE ID` | Convert batch catalog YAML to inventory format |

Configuration Reference

YAML Structure

```yaml
apiVersion: v1           # Required, must be "v1"
id: my-inventory-job     # Required, 1-64 chars [a-zA-Z0-9._-]
destination:
  bucket: dest-bucket    # Required
  prefix: reports/       # Optional
  format: csv            # csv | json | parquet | iceberg (default: csv)
  compression: on        # on | off (default: on)
schedule: once           # once | hourly | daily | weekly | monthly | yearly
mode: fast               # fast | strict (default: fast)
versions: all            # all | current (default: all)
includeFields:           # Optional additional fields
  - ETag
  - Tags
  - UserMetadata
  - AccessTime
filters:                 # Optional object filters
  prefix: ["videos/"]
  lastModified:
    olderThan: 30d
  size:
    greaterThan: 1MiB
  name:
    match: "*.mp4"
  tags:
    and:
      - key: project
        valueString:
          match: "ares-*"
```
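The `id` constraint above (1 to 64 characters drawn from `[a-zA-Z0-9._-]`) can be checked client-side before submission. A small sketch, assuming the documented character set is exhaustive; `is_valid_inventory_id` is an illustrative helper, not part of any SDK:

```python
import re

# Documented constraint: 1-64 chars from [a-zA-Z0-9._-]
ID_PATTERN = re.compile(r"^[a-zA-Z0-9._-]{1,64}$")

def is_valid_inventory_id(job_id: str) -> bool:
    """Return True if job_id satisfies the documented id constraint."""
    return bool(ID_PATTERN.fullmatch(job_id))

print(is_valid_inventory_id("my-inventory-job"))  # True
print(is_valid_inventory_id("bad id!"))           # False
print(is_valid_inventory_id("x" * 65))            # False
```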

Output Formats

| Format | Compression | Best For |
|---|---|---|
| CSV | ZSTD | Human-readable reports, spreadsheet import |
| JSON (NDJSON) | ZSTD | Programmatic processing, streaming |
| Parquet | Snappy | Analytics engines (Spark, Trino, DuckDB) |
| Iceberg | N/A | Data lakehouse integration via catalog |

Default Fields (Always Included)

`Bucket`, `Key`, `SequenceNumber`, `Size`, `LastModifiedDate`

When `versions: all` is set, reports also include `VersionID`, `IsDeleteMarker`, and `IsLatest`.

Optional Fields

| Field | Description |
|---|---|
| `ETag` | Object's ETag |
| `StorageClass` | Storage class |
| `IsMultipart` | Multipart upload indicator |
| `EncryptionStatus` | Server-side encryption status |
| `IsBucketKeyEnabled` | Bucket key encryption enabled |
| `KmsKeyArn` | KMS key ARN |
| `ChecksumAlgorithm` | Checksum algorithm used |
| `Tags` | Object tags (query string format) |
| `UserMetadata` | User-defined metadata (query string format) |
| `AccessTime` | Last access time |
| `ReplicationStatus` | Replication status |
| `ObjectLockRetainUntilDate` | Object lock retention date |
| `ObjectLockMode` | Lock mode (GOVERNANCE/COMPLIANCE) |
| `ObjectLockLegalHoldStatus` | Legal hold status (on/off) |
| `Tier` | Storage tier |
| `TieringStatus` | Tiering status |

Schedule Options

| Schedule | Behavior |
|---|---|
| `once` | Runs one time immediately (default) |
| `hourly` | Runs at the beginning of every hour after previous completion |
| `daily` | Runs at midnight UTC the day after previous completion |
| `weekly` | Runs on Sunday at midnight UTC following previous completion |
| `monthly` | Runs on the first Sunday of the month following previous completion |
| `yearly` | Runs on the first Sunday of January following previous completion |

Periodic schedules are based on completion time, not fixed calendar intervals. The scheduler detects missed windows and queues jobs accordingly.
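The completion-based semantics can be made concrete for the daily case. A sketch under the stated rule ("midnight UTC the day after previous completion"); `next_daily_run` is a hypothetical helper, not the product's scheduler code:

```python
from datetime import datetime, timedelta, timezone

def next_daily_run(completed_at: datetime) -> datetime:
    """Daily jobs run at midnight UTC the day *after* the previous
    completion, so the next window is the start of the following day."""
    next_day = (completed_at + timedelta(days=1)).date()
    return datetime(next_day.year, next_day.month, next_day.day,
                    tzinfo=timezone.utc)

done = datetime(2026, 2, 22, 10, 30, tzinfo=timezone.utc)
print(next_daily_run(done).isoformat())  # 2026-02-23T00:00:00+00:00
```

Note that a job finishing at 23:59 UTC still targets midnight of the very next day, which is why periodic runs drift relative to fixed calendar intervals.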

Filtering Capabilities

Prefix filtering — include only objects under specific prefixes:

```yaml
filters:
  prefix:
    - "videos/"
    - "images/"
```

Age filtering — relative durations or absolute timestamps:

```yaml
filters:
  lastModified:
    olderThan: 30d                  # Relative duration
    after: "2025-01-01T00:00:00Z"   # Absolute timestamp
```

Size filtering — human-readable units:

```yaml
filters:
  size:
    greaterThan: 10MiB
    lessThan: 1GiB
```

Name filtering — glob, substring, or regex:

```yaml
filters:
  name:
    match: "archive-*.zip"   # Glob
    contains: "backup"       # Substring
    regex: "report-\\d{4}"   # Regex
```

Tag/metadata filtering — combine with AND/OR logic:

```yaml
filters:
  tags:
    and:
      - key: project
        valueString:
          match: "ares-*"
      - key: status
        valueString:
          contains: "complete"
```
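These filters behave as predicates ANDed over each object record. A rough client-side sketch of the matching semantics (glob via `fnmatch`, sizes parsed from human-readable units); the field names and the `matches`/`parse_size` helpers are illustrative, not the server's internal representation:

```python
import fnmatch
from datetime import datetime, timedelta, timezone

UNITS = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

def parse_size(text: str) -> int:
    """Parse human-readable sizes like '10MiB' into bytes."""
    for unit, factor in UNITS.items():
        if text.endswith(unit):
            return int(float(text[:-len(unit)]) * factor)
    return int(text)

def matches(obj, *, prefixes=None, older_than_days=None,
            greater_than=None, name_glob=None):
    """AND together whichever filters are supplied."""
    if prefixes and not any(obj["key"].startswith(p) for p in prefixes):
        return False
    if older_than_days is not None:
        cutoff = datetime.now(timezone.utc) - timedelta(days=older_than_days)
        if obj["last_modified"] > cutoff:
            return False
    if greater_than is not None and obj["size"] <= parse_size(greater_than):
        return False
    if name_glob and not fnmatch.fnmatch(obj["key"].rsplit("/", 1)[-1],
                                         name_glob):
        return False
    return True

obj = {"key": "videos/archive-2025.zip", "size": 200 * 1024**2,
       "last_modified": datetime(2025, 1, 1, tzinfo=timezone.utc)}
print(matches(obj, prefixes=["videos/"], older_than_days=90,
              greater_than="100MiB", name_glob="archive-*.zip"))  # True
```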

Architecture

Scheduler-Executor Model

The Inventory system uses a distributed Scheduler-Executor architecture designed for reliable, scalable operation across multi-node clusters.

```
┌──────────────────────────────────────────────────────┐
│                       Cluster                        │
│                                                      │
│  ┌─────────────┐                                     │
│  │  Scheduler  │  Singleton (leader-elected)         │
│  │ Runs: 15min │  Scans configs, creates Schedule    │
│  └──────┬──────┘                                     │
│         │ writes Schedule to .minio.sys              │
│         ▼                                            │
│  ┌─────────────────────────────────────────────┐     │
│  │               Schedule Object               │     │
│  │  • PendingJobs                              │     │
│  │  • LockExpiredJobs                          │     │
│  │  • ReadyToRetryJobs                         │     │
│  └──────┬──────────────┬──────────────┬────────┘     │
│         │              │              │              │
│         ▼              ▼              ▼              │
│    ┌──────────┐   ┌──────────┐   ┌──────────┐        │
│    │ Executor │   │ Executor │   │ Executor │        │
│    │  Node 1  │   │  Node 2  │   │  Node N  │        │
│    │ Polls:2m │   │ Polls:2m │   │ Polls:2m │        │
│    │  Max: 5  │   │  Max: 5  │   │  Max: 5  │        │
│    └──────────┘   └──────────┘   └──────────┘        │
│                                                      │
└──────────────────────────────────────────────────────┘
```

Scheduler — a cluster-wide singleton (leader-elected via distributed lock) that runs every 15 minutes[1]. It scans all inventory configurations, detects jobs that are due, and produces a centralized Schedule stored in `.minio.sys/inventory/__schedule__.bin`.

Executor — runs on every node in the cluster, polling the Schedule every 2 minutes[2] (with ±20% jitter to prevent thundering herd). Each executor can run up to 5 concurrent jobs[3]. Jobs are claimed via ETag-based optimistic locking — if two nodes try to claim the same job, only one succeeds.
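The polling jitter can be sketched numerically. The exact formula AIStor uses is not documented here; this only illustrates the stated ±20% bound, and `jittered_interval` is an illustrative name:

```python
import random

BASE_POLL_SECONDS = 120  # executors poll the Schedule every 2 minutes

def jittered_interval(base=BASE_POLL_SECONDS, spread=0.20):
    """Return the base interval perturbed by up to +/-20%, so executors
    on different nodes do not all poll in lockstep."""
    return base * (1 + random.uniform(-spread, spread))

intervals = [jittered_interval() for _ in range(5)]
print(all(96.0 <= i <= 144.0 for i in intervals))  # True
```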

Job Lifecycle

```
Pending ──► Running ──► Completed             (once jobs)
                    ──► Sleeping              (periodic jobs)
                    ──► Errored ──► Pending   (retry, up to 10 attempts)
                                ──► Failed    (max retries exceeded)
```

Control operations available at any point:

  • Cancel — stops current execution; periodic jobs continue their schedule
  • Suspend — stops execution AND pauses the schedule until resumed
  • Resume — restores execution and reactivates the schedule

Distributed Lock Mechanism

  • Lock lease duration: 30 minutes[4], refreshed every 10 seconds during execution
  • Lock expiry detection uses a 150% buffer (45 minutes for 30-minute locks)
  • If a node crashes, the lock expires and another node picks up the job automatically

Fast vs Strict Mode

| Mode | Disk Reads | Speed | Consistency |
|---|---|---|---|
| `fast` (default) | Single disk[5] | Faster | Objects modified during scan may be missed |
| `strict` | Optimal disk set | Slower | Higher consistency for concurrent writes |

Both modes may miss objects modified during the scan. Use strict when accuracy is more important than speed.


Operational Considerations

IAM Permissions

Two permission sets control inventory access:

S3 permissions (for configuration management):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "s3:PutInventoryConfiguration",
      "s3:GetInventoryConfiguration"
    ],
    "Resource": ["arn:aws:s3:::my-bucket"]
  }]
}
```

Admin permission (for job control operations — cancel, suspend, resume):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["admin:InventoryControl"],
    "Resource": ["arn:aws:s3:::*"]
  }]
}
```

A read-only inventory user needs only s3:GetInventoryConfiguration to view configurations and job status.

Monitoring with Prometheus

Inventory metrics are available at `/minio/metrics/v3/inventory`.

Cluster-level metrics:

| Metric | Description |
|---|---|
| `minio_inventory_jobs_completed_count` | Total completed jobs |
| `minio_inventory_jobs_active_count` | Currently running jobs |
| `minio_inventory_jobs_failed_count` | Total failed jobs |
| `minio_inventory_jobs_queued_count` | Jobs waiting for execution |
| `minio_inventory_objects_scanned_count` | Total objects scanned |
| `minio_inventory_bytes_scanned_count` | Total bytes scanned |
| `minio_inventory_total_configs` | Total inventory configurations |
| `minio_inventory_running_jobs` | Currently running jobs |

Node-level metrics:

| Metric | Description |
|---|---|
| `minio_inventory_node_running_jobs` | Jobs running on this node |
| `minio_inventory_node_pending_jobs` | Jobs pending on this node |
| `minio_inventory_node_job_execution_errors` | Execution errors on this node |
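These metrics are served in the standard Prometheus text exposition format, so any scrape body can be filtered with ordinary tooling. A sketch that picks inventory samples out of a scrape (the sample values below are made up for illustration):

```python
SAMPLE_SCRAPE = """\
# HELP minio_inventory_jobs_completed_count Total completed jobs
# TYPE minio_inventory_jobs_completed_count counter
minio_inventory_jobs_completed_count 42
minio_inventory_jobs_active_count 3
minio_node_some_other_metric 7
"""

def inventory_metrics(body: str) -> dict:
    """Extract minio_inventory_* samples from Prometheus text format."""
    out = {}
    for line in body.splitlines():
        if line.startswith("minio_inventory_"):
            name, value = line.rsplit(" ", 1)
            out[name] = float(value)
    return out

print(inventory_metrics(SAMPLE_SCRAPE))
# {'minio_inventory_jobs_completed_count': 42.0, 'minio_inventory_jobs_active_count': 3.0}
```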

Manifest Files

Each execution writes an AWS S3-compatible manifest with a MinIO extension:

```json
{
  "sourceBucket": "my-bucket",
  "destinationBucket": "dest-bucket",
  "version": "2016-11-30",
  "fileFormat": "CSV (ZSTD compressed)",
  "fileSchema": "Bucket,Key,Size,LastModifiedDate,...",
  "files": [
    {"key": "...", "size": 1024, "MD5checksum": "abc123"}
  ],
  "minioExtension": {
    "status": "completed",
    "scannedObjects": 12500,
    "matchedObjects": 8300,
    "partialResultsAvailable": false
  }
}
```

The minioExtension field is optional and ignored by AWS S3-compatible tools. AIStor-aware consumers can use it to distinguish completed from canceled/suspended jobs and to check for partial results.
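A consumer can check the extension before trusting a report. A sketch assuming the manifest shape shown above; `report_is_complete` is an illustrative helper:

```python
import json

MANIFEST = json.loads("""{
  "sourceBucket": "my-bucket",
  "files": [{"key": "files/file-001.csv.zst", "size": 1024,
             "MD5checksum": "abc123"}],
  "minioExtension": {"status": "completed", "scannedObjects": 12500,
                     "matchedObjects": 8300,
                     "partialResultsAvailable": false}
}""")

def report_is_complete(manifest: dict) -> bool:
    """Trust a report only if the job completed without partial results.
    A missing minioExtension means a plain AWS-style manifest, which we
    treat as complete."""
    ext = manifest.get("minioExtension")
    if ext is None:
        return True
    return (ext.get("status") == "completed"
            and not ext.get("partialResultsAvailable"))

print(report_is_complete(MANIFEST))  # True
```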

Automatic Recovery

The system includes several self-healing mechanisms:

  • Corrupt metadata recovery — the scheduler detects orphaned or missing metadata and automatically cleans up or recreates job state
  • Panic recovery — scheduler, executor, and individual job panics are caught, logged with stack traces, and automatically restarted after a 1-minute backoff
  • Lock expiry recovery — if a node crashes mid-execution, the lock expires and another node picks up the job
  • Retry logic — failed jobs are retried up to 10 times[6] with a 10-minute[7] delay between attempts

Performance Tuning

| Parameter | Default | Impact |
|---|---|---|
| Concurrent jobs per node | 5[3] | More concurrency = higher throughput but more resource consumption |
| Max records per output file | 1,000,000[8] | Larger files = fewer files but more memory during processing |
| Record batch size | 200 | Records buffered before writing |
| Metrics reporting interval | Every 1,000 objects | Scanned count updates |

Recommendations:

  • Use fast mode for large-scale inventory where slight inconsistency is acceptable
  • Use Parquet format for analytics workloads — columnar storage enables efficient queries
  • Apply prefix filters to narrow scope and reduce execution time
  • Schedule during off-peak hours for periodic jobs on resource-constrained clusters
  • The inventory system does not use or depend on the data scanner — it uses ObjectLayer.Walk() directly for listing

Migrating from Batch Catalog

If you have existing batch catalog YAML files (apiVersion v2), convert them to inventory format:

```sh
mc inventory migrate-from-batch batch-job.yaml my-inventory-id > inventory-job.yaml
mc inventory put myminio/my-bucket inventory-job.yaml
```

The migration converts apiVersion: v2 to v1, removes the bucket field (specified via the API endpoint instead), and adds the inventory id field. All YAML comments are preserved.
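The same transformation can be expressed over a parsed configuration. A rough sketch on plain dicts (the real `mc` command also preserves YAML comments, which this does not; `batch_to_inventory` is an illustrative helper):

```python
def batch_to_inventory(batch: dict, inventory_id: str) -> dict:
    """Mirror the documented migration: bump apiVersion v2 -> v1,
    drop the bucket field (the bucket now comes from the API path),
    and add the inventory id."""
    converted = {k: v for k, v in batch.items() if k != "bucket"}
    converted["apiVersion"] = "v1"
    converted["id"] = inventory_id
    return converted

batch_cfg = {"apiVersion": "v2", "bucket": "my-bucket",
             "destination": {"bucket": "dest-bucket", "format": "csv"}}
print(batch_to_inventory(batch_cfg, "my-inventory-id"))
# {'apiVersion': 'v1', 'destination': {'bucket': 'dest-bucket', 'format': 'csv'}, 'id': 'my-inventory-id'}
```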


Common Scenarios

Stop a Runaway Job

```sh
mc inventory status myminio/my-bucket my-job
mc inventory cancel myminio/my-bucket my-job
```

For periodic jobs, this only stops the current execution — future runs continue. To fully stop:

```sh
mc inventory suspend myminio/my-bucket my-job
```

Pause for Maintenance

```sh
# Suspend before maintenance
mc inventory suspend myminio/bucket1 job1
mc inventory suspend myminio/bucket2 job2

# Perform maintenance...

# Resume after maintenance
mc inventory resume myminio/bucket1 job1
mc inventory resume myminio/bucket2 job2
```

Update a Running Job’s Configuration

```sh
# Delete the old configuration (running job stops within 10 seconds)
mc inventory delete myminio/my-bucket my-job

# Create new configuration with the same ID
mc inventory put myminio/my-bucket updated-config.yml
```

View All Inventory Jobs Across Cluster

```sh
# List all configurations across all buckets
mc inventory list --all myminio

# Watch all jobs in real-time
mc inventory status --watch --all myminio
```

Source Code References
  1. `internal/inventory/system-params.go:239` - `defaultSchedulerInterval = 15 * time.Minute`
  2. `internal/inventory/system-params.go:240` - `defaultExecutorInterval = 2 * time.Minute`
  3. `cmd/inventory.go:43` - `maxConcurrentInventoryJobs = 5`
  4. `internal/inventory/system-params.go:243` - `defaultLockDuration = 30 * time.Minute`
  5. `cmd/inventory.go:477-479` - fast mode: `askDisks = "disk"`, strict mode: `askDisks = "optimal"`
  6. `internal/inventory/system-params.go:244` - `defaultMaxRetryAttempts = uint8(10)`
  7. `internal/inventory/system-params.go:242` - `defaultRetryOnErrorDelay = 10 * time.Minute`
  8. `internal/inventory/system-params.go:245` - `defaultInventoryMaxRecordsPerFile = 1_000_000`