How does MinIO AIStor replace Apache Ranger and Atlas for native authorization and data governance?

Asked by rohit-minio Answered by muratkars February 16, 2026

Enterprise data platforms built on Cloudera rely on Apache Ranger for policy-based authorization (RBAC, ABAC, row/column masking, tag-based policies) and Apache Atlas for data classification, tagging, and lineage. Organizations migrating to MinIO AIStor or building new lakehouse architectures need equivalent governance capabilities without introducing Hadoop-era dependencies. This Q&A describes how AIStor decomposes governance into composable layers that together provide full Ranger + Atlas equivalence.

Answer

MinIO AIStor does not replicate Ranger + Atlas as a monolithic authorization system. Instead, it decomposes governance into layers aligned with modern lakehouse architecture: storage-layer IAM with ABAC conditions, tag-based policy enforcement, external authorization delegation via OPA, query-engine masking via Trino/Spark, and Object Lambda transformation for direct S3 access. Each layer is independently scalable and replaceable.

The key architectural insight: Ranger’s row/column masking operates at the query engine layer, not the storage layer. HDFS has no concept of rows or columns. Ranger intercepts SQL queries in Hive or Presto and rewrites them to apply filters and masking functions. Atlas tags metadata entities, not bytes on disk. MinIO AIStor does not need to replicate Ranger at the storage layer. It provides the policy primitives that query engines and applications leverage, plus its own object-level governance for direct S3 API access.


What Ranger + Atlas Solve

Capability                    | Ranger                            | Atlas
------------------------------|-----------------------------------|--------------------------
Centralized policy management | Policy admin UI + policy engine   | -
RBAC                          | Role/group-based policies         | -
ABAC                          | Attribute conditions on policies  | -
Tag-based policies            | Consumes tags, enforces policies  | Produces and manages tags
Row-level filtering           | SQL rewrite in Hive/Presto        | -
Column masking                | SQL rewrite in Hive/Presto        | -
Data classification           | -                                 | Automated/manual tagging
Data lineage                  | -                                 | Process/column lineage
Audit                         | Per-request audit log             | Entity audit trail

Architecture Overview

┌──────────────────────────────────────────────────────────────────┐
│ Query Engines │
│ (Trino / Spark / Dremio / StarRocks) │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Engine-native column masking and row-level filtering │ │
│ │ (Trino Access Control SPI / Spark DataSourceV2 views) │ │
│ └────────────────────────────────────────────────────────────┘ │
└────────────────────────────┬─────────────────────────────────────┘
│ S3 API / Iceberg REST Catalog
┌────────────────────────────▼─────────────────────────────────────┐
│ MinIO AIStor │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────────────────┐ │
│ │ IAM / ABAC │ │ Tag-Based │ │ Object Lambda │ │
│ │ Policies │ │ Policies │ │ (object transformation)│ │
│ └──────┬──────┘ └──────┬───────┘ └────────────┬────────────┘ │
│ │ │ │ │
│ ┌──────▼────────────────▼──────┐ ┌────────────▼────────────┐ │
│ │ Policy Plugin Webhook │ │ Transformation │ │
│ │ (OPA / Styra DAS / │ │ Service (PII redact, │ │
│ │ custom policy engine) │ │ field filtering) │ │
│ └──────────────────────────────┘ └─────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Object Tagging + Bucket Notifications │ │
│ │ (auto-classify on ingest, apply governance tags) │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ AIStor Tables / Iceberg (warehouse, namespace, table IAM) │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Audit Log → SIEM (Splunk / Elastic / Kafka) │ │
│ └────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────┘

1. RBAC and ABAC: Native IAM Policies

AIStor’s IAM policy engine uses AWS IAM policy syntax with an extensive set of condition keys[1], providing RBAC and ABAC natively without external components.

Available Condition Keys

Identity attributes:

  • aws:username, aws:userid for IAM user identity
  • aws:groups for group membership array
  • jwt:* for any OpenID Connect claim (jwt:sub, jwt:email, jwt:groups, jwt:department, etc.)
  • ldap:username, ldap:user, ldap:groups for LDAP identity and groups
  • aws:PrincipalType for Account, User, or AssumedRole

Resource attributes:

  • s3:ExistingObjectTag/{key} for tags on an existing object
  • s3:RequestObjectTag/{key} for tags in the current request
  • s3:RequestObjectTagKeys for the array of tag keys in the request
  • s3tables:namespace, s3tables:tableName, s3tables:viewName for Iceberg table attributes

Environment attributes:

  • aws:SourceIp for requester IP address (preserves X-Forwarded-For)
  • aws:SecureTransport for TLS/non-TLS
  • aws:CurrentTime, aws:EpochTime for temporal conditions
  • s3:x-amz-server-side-encryption-aws-kms-key-id for encryption key constraints

RBAC Example: Department-Scoped Access

Users in the finance group can read objects under the finance/ prefix:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "FinanceGroupReadAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::datalake",
        "arn:aws:s3:::datalake/finance/*"
      ],
      "Condition": {
        "StringLike": {
          "s3:prefix": ["finance/*"]
        }
      }
    }
  ]
}

This policy is attached to a group. Users inherit it through group membership, which is standard RBAC.

ABAC Example: Attribute-Driven Access via OIDC Claims

Users whose OIDC department claim matches the object prefix get access. No per-user policy management required:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DepartmentScopedAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::datalake/${jwt:department}/*"
    }
  ]
}

A user with jwt:department=engineering can read datalake/engineering/*. A user with jwt:department=finance can read datalake/finance/*. One policy covers all departments.
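The substitution mechanics can be illustrated with a small Python sketch. The helper names here (`expand_policy_resource`, `matches`) are hypothetical, not AIStor code; the sketch only shows how a `${jwt:...}` variable expands from session claims and how the resulting wildcard resource is matched:

```python
import re

def expand_policy_resource(resource: str, claims: dict) -> str:
    """Expand ${jwt:<claim>} variables in a policy Resource ARN."""
    return re.sub(r"\$\{jwt:([^}]+)\}", lambda m: claims.get(m.group(1), ""), resource)

def matches(resource_arn: str, object_arn: str) -> bool:
    """Wildcard-match an object ARN against an expanded Resource ARN."""
    pattern = "^" + re.escape(resource_arn).replace(r"\*", ".*") + "$"
    return re.match(pattern, object_arn) is not None

claims = {"department": "engineering"}
resource = expand_policy_resource("arn:aws:s3:::datalake/${jwt:department}/*", claims)
print(resource)  # arn:aws:s3:::datalake/engineering/*
print(matches(resource, "arn:aws:s3:::datalake/engineering/model.pt"))  # True
print(matches(resource, "arn:aws:s3:::datalake/finance/q4.csv"))        # False
```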

ABAC Example: Network-Restricted Access

Deny access to restricted data from outside the corporate network:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CorpNetOnly",
      "Effect": "Deny",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::restricted-data/*",
      "Condition": {
        "NotIpAddress": {
          "aws:SourceIp": "10.0.0.0/8"
        }
      }
    }
  ]
}

2. Tag-Based Governance: Object Tagging + Tag-Based Policies

This replaces the Atlas tagging and Ranger tag-based policy enforcement loop. AIStor supports S3 object tagging (up to 10 key-value pairs per object), and IAM policies can reference those tags in conditions[1].

Classification Enforcement

Deny access to objects tagged classification=pii unless the requester belongs to the pii-authorized group:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyPIIUnlessAuthorized",
      "Effect": "Deny",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::datalake/*",
      "Condition": {
        "StringEquals": {
          "s3:ExistingObjectTag/classification": "pii"
        },
        "StringNotEquals": {
          "aws:groups": "pii-authorized"
        }
      }
    }
  ]
}

Prevent Untagged Uploads

Require all uploaded objects to carry a classification tag:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequireClassificationTag",
      "Effect": "Deny",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::governed-bucket/*",
      "Condition": {
        "Null": {
          "s3:RequestObjectTag/classification": "true"
        }
      }
    }
  ]
}

This ensures every object entering the governed bucket has been classified: the equivalent of Atlas's mandatory classification, enforced at the storage layer.
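The `Null` condition semantics can trip people up, so here is a minimal Python model of the deny statement above (a simplified sketch, not AIStor's evaluator): the deny fires exactly when the request carries no classification tag.

```python
def null_condition_denies(request_tags: dict) -> bool:
    """Model of the policy: Deny s3:PutObject when the Null condition
    {"s3:RequestObjectTag/classification": "true"} matches, i.e. the
    classification tag is absent from the request."""
    tag_is_absent = "classification" not in request_tags
    return tag_is_absent  # the deny applies only to untagged uploads

print(null_condition_denies({}))                            # True  -> upload rejected
print(null_condition_denies({"classification": "public"}))  # False -> upload allowed
```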

Automated Classification Pipeline

To replace Atlas’s automated tagging, use bucket notifications to trigger a classification service on object upload:

Object Upload → Bucket Notification (webhook/Kafka)
→ Classification Service (NLP, regex, ML model)
→ PutObjectTagging (apply classification, sensitivity, retention tags)
→ Tag-based IAM policies enforce access

Bucket notification configuration:

mc event add myminio/datalake arn:minio:sqs::classify:webhook \
--event put --prefix "raw/"

The classification service receives the event, reads the object, applies detection rules (PII patterns, sensitivity heuristics, schema inspection for structured formats), and calls PutObjectTagging to apply governance tags. From that point forward, tag-based IAM policies govern access automatically.
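The detection step of such a classification service can be sketched in a few lines of Python. The pattern set below is illustrative only (a real service would add ML/NLP checks and then call PutObjectTagging via an S3 SDK with the returned tags):

```python
import re

# Hypothetical detection rules; a production classifier would go well beyond regex.
PII_PATTERNS = {
    "ssn": re.compile(rb"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(rb"[\w.+-]+@[\w-]+\.[\w.]+"),
    "credit_card": re.compile(rb"\b(?:\d[ -]?){13,16}\b"),
}

def classify(content: bytes) -> dict:
    """Return governance tags for an object based on simple pattern matching."""
    hits = [name for name, pat in PII_PATTERNS.items() if pat.search(content)]
    if hits:
        return {"classification": "pii", "pii-types": ",".join(sorted(hits))}
    return {"classification": "internal"}

print(classify(b"customer: jane@corp.example, ssn 123-45-6789"))
print(classify(b"quarterly report, no sensitive fields"))
```

The returned dictionary maps directly onto the S3 tag set applied by PutObjectTagging, which is what the tag-based IAM policies in the previous section then enforce.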


3. External Authorization: Policy Plugin Webhook

For organizations that need centralized policy management or policy logic that exceeds what IAM policy JSON can express, AIStor supports an external authorization webhook (policy_plugin)[2] that can delegate access decisions to OPA or a custom policy engine.

When configured, AIStor sends the request and credential details for every API call to the external endpoint and expects an allow/deny response. The plugin's decision is final: it overrides all other policy evaluation, and IAM policies, bucket policies, and session policies are not consulted[3].

Configuration

MINIO_POLICY_PLUGIN_URL=https://policy-engine:8181/v1/data/minio/authz/allow
MINIO_POLICY_PLUGIN_AUTH_TOKEN=<bearer-token>
MINIO_POLICY_PLUGIN_ENABLE_HTTP2=on

Request Context

Every API request sends a rich context payload to the policy engine:

{
  "input": {
    "account": "jsmith",
    "groups": ["engineering", "ml-team"],
    "action": "s3:GetObject",
    "bucket": "datalake",
    "object": "training-data/customers.parquet",
    "owner": false,
    "claims": {
      "department": "engineering",
      "clearance": "L2",
      "sub": "jsmith@corp.example.com"
    },
    "conditions": {
      "ExistingObjectTag/classification": ["confidential"],
      "ExistingObjectTag/data-owner": ["finance"],
      "SourceIp": ["10.0.5.42"],
      "SecureTransport": ["true"],
      "CurrentTime": ["2026-02-17T14:30:00Z"]
    }
  }
}

The policy engine has access to: the user’s identity and group membership, the action being performed, the target resource, the object’s tags, the requester’s network location, OIDC/LDAP claims, and temporal context.
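To make the decision flow concrete, here is a toy Python model of a policy engine consuming that payload. The rules mirror the Rego policy in the OPA section (illustrative only, with deny rules winning over allows; `authorize` is a hypothetical name, not part of any AIStor API):

```python
def authorize(inp: dict) -> bool:
    """Toy decision over the webhook payload: deny rules win, then allows."""
    tags = inp.get("conditions", {})
    groups = set(inp.get("groups", []))

    # Tag-based deny: PII objects require the pii-authorized group.
    if "pii" in tags.get("ExistingObjectTag/classification", []) \
            and "pii-authorized" not in groups:
        return False

    # RBAC: admins get full access.
    if "admin" in groups:
        return True

    # ABAC: department claim must match the object's top-level prefix.
    dept = inp.get("claims", {}).get("department", "")
    return bool(dept) and inp.get("object", "").startswith(dept + "/")

request = {
    "account": "jsmith",
    "groups": ["engineering", "ml-team"],
    "action": "s3:GetObject",
    "object": "training-data/customers.parquet",
    "claims": {"department": "engineering"},
    "conditions": {"ExistingObjectTag/classification": ["confidential"]},
}
print(authorize(request))  # False: prefix is training-data/, not engineering/
```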

OPA Integration

Open Policy Agent with Rego policies provides the closest analog to Ranger’s centralized policy engine:

package minio.authz

import rego.v1

default allow := false
default deny := false

# Final decision queried by the webhook: any permit rule, unless a deny rule fires
allow if {
	permit
	not deny
}

# RBAC: admin group has full access
permit if {
	"admin" in input.groups
}

# ABAC: users can access objects in their department's prefix
permit if {
	department := input.claims.department
	startswith(input.object, concat("/", [department, ""]))
}

# Tag-based: deny access to PII unless in pii-authorized group
deny if {
	input.conditions["ExistingObjectTag/classification"][_] == "pii"
	not "pii-authorized" in input.groups
}

# Tag-based: data steward override
permit if {
	data_owner := input.conditions["ExistingObjectTag/data-owner"][_]
	sprintf("steward-%s", [data_owner]) in input.groups
}

Deployment pattern for centralized management:

Policy Authors → Git Repository (Rego policies)
→ CI/CD Pipeline (lint, test, bundle)
→ OPA Bundle Server (S3/HTTP)
→ OPA Instances (co-located with MinIO or standalone)
← MinIO policy plugin webhook

This gives you version-controlled policies, CI/CD for policy changes, centralized management, and distributed enforcement without the Hadoop dependency. The key difference from Ranger is that AIStor’s policy plugin is a stateless decision point, not a policy management system. Policy authoring, storage, and administration live entirely in the external engine.


4. Data Masking and Row Filtering

Data masking operates at two layers depending on the access pattern. This is architecturally identical to how Ranger works. Ranger’s masking lives in the query engine (Hive plugin), not in HDFS.

Layer 1: Query Engine Masking (SQL Access)

When users access data through SQL engines (Trino, Spark, Dremio, StarRocks), the engine’s native access control handles masking. This is the direct replacement for Ranger’s Hive/Presto plugin.

Trino Access Control SPI example (rules.json):

{
  "catalogs": [
    {
      "catalog": "iceberg",
      "allow": true
    }
  ],
  "tables": [
    {
      "catalog": "iceberg",
      "schema": "finance",
      "table": "customers",
      "privileges": ["SELECT"]
    }
  ],
  "columnMasks": [
    {
      "catalog": "iceberg",
      "schema": "finance",
      "table": "customers",
      "column": "ssn",
      "expression": "CASE WHEN user() IN ('admin', 'compliance') THEN ssn ELSE 'XXX-XX-' || substr(ssn, 8) END"
    },
    {
      "catalog": "iceberg",
      "schema": "finance",
      "table": "customers",
      "column": "email",
      "expression": "CASE WHEN user() IN ('admin', 'marketing') THEN email ELSE regexp_replace(email, '(.).*@', '$1***@') END"
    }
  ],
  "rowFilters": [
    {
      "catalog": "iceberg",
      "schema": "finance",
      "table": "transactions",
      "filter": "region IN (SELECT allowed_region FROM access_control.user_regions WHERE username = user())"
    }
  ]
}

This configuration masks SSN to show only last 4 digits for non-privileged users, masks email addresses for non-marketing users, and filters transaction rows so users only see their authorized regions.
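To sanity-check the intent of those expressions, the same logic can be mirrored in plain Python (hypothetical helpers, not part of any Trino plugin). Note that SQL's substr is 1-indexed, so substr(ssn, 8) keeps the last four digits of an 11-character SSN:

```python
import re

def mask_ssn(ssn: str, user: str) -> str:
    # Mirrors: CASE WHEN user() IN ('admin','compliance') THEN ssn
    #          ELSE 'XXX-XX-' || substr(ssn, 8) END
    return ssn if user in ("admin", "compliance") else "XXX-XX-" + ssn[7:]

def mask_email(email: str, user: str) -> str:
    # Mirrors: regexp_replace(email, '(.).*@', '$1***@') for non-privileged users
    return email if user in ("admin", "marketing") else re.sub(r"(.).*@", r"\1***@", email)

print(mask_ssn("123-45-6789", "analyst"))         # XXX-XX-6789
print(mask_ssn("123-45-6789", "admin"))           # 123-45-6789
print(mask_email("jane.doe@corp.com", "analyst")) # j***@corp.com
```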

Trino + AIStor deployment:

Trino Coordinator
├── Iceberg Connector → AIStor (S3 API + Iceberg REST Catalog)
├── Access Control SPI → rules.json or custom plugin (OPA-based)
└── Column Masking + Row Filtering applied at query time

Trino authenticates to AIStor using STS credentials scoped to the user’s session policy. AIStor enforces object-level access (which tables/namespaces the user can reach). Trino enforces column masking and row filtering within the query results.

Layer 2: Object Lambda Transformation (Direct S3 Access)

When applications access data directly via the S3 API (ML pipelines, ETL jobs, application services), Object Lambda[4] provides object-level transformation before delivery. Object Lambda is a transparent proxy. AIStor sends a presigned URL and identity context to an external webhook function, which fetches the original object, applies transformation logic, and returns the result. AIStor itself does not parse or inspect the object’s internal structure (Parquet columns, CSV rows, JSON fields). Row and column semantics are implemented entirely by the transformation function.

Configuration:

MINIO_LAMBDA_WEBHOOK_ENABLE_pii_redact=on
MINIO_LAMBDA_WEBHOOK_ENDPOINT_pii_redact=http://masking-service:5000
MINIO_LAMBDA_WEBHOOK_AUTH_TOKEN_pii_redact=<token>

Masking service implementation (Python/Flask):

from flask import Flask, request, make_response
import requests

app = Flask(__name__)

# Per-field masking rules keyed by column name
MASKING_RULES = {
    "ssn": lambda val, groups: val if "pii-authorized" in groups else "XXX-XX-" + val[-4:],
    "email": lambda val, groups: val if "marketing" in groups else val[0] + "***@" + val.split("@")[1],
    "phone": lambda val, groups: val if "pii-authorized" in groups else "***-***-" + val[-4:],
}

def apply_masking(content, event):
    # Parse the object (e.g. with pyarrow for Parquet), derive the caller's
    # groups from the event payload, and apply MASKING_RULES to each sensitive
    # field. Format-specific parsing is omitted here for brevity.
    ...

@app.route("/", methods=["POST"])
def mask_object():
    event = request.json
    context = event["getObjectContext"]
    # Fetch the original object via the presigned URL AIStor supplies
    r = requests.get(context["inputS3Url"])
    # Apply masking based on content type and user context
    masked_content = apply_masking(r.content, event)
    resp = make_response(masked_content, 200)
    # Route/token headers let AIStor correlate the response with the request
    resp.headers["x-amz-request-route"] = context["outputRoute"]
    resp.headers["x-amz-request-token"] = context["outputToken"]
    return resp

Client-side invocation (Go):

// Using the MinIO Go SDK (github.com/minio/minio-go/v7)
reqParams := make(url.Values)
reqParams.Set("lambdaArn", "arn:minio:s3-object-lambda::pii_redact:webhook")
presignedURL, err := s3Client.PresignedGetObject(
	ctx, "datalake", "customers.parquet",
	time.Hour, reqParams,
)

Applications that require masked data use the Lambda ARN in their requests. Applications with full access (authorized analytics, compliance) use standard GetObject without the Lambda ARN. Access to the unmasked path is controlled by IAM policies.


5. Table-Level Access Control: AIStor Tables / Iceberg

For structured data in Iceberg format, AIStor Tables provides fine-grained IAM actions at the warehouse, namespace, table, and view level[5]. There are 18 distinct actions in the s3tables: namespace covering CRUD operations across the entire table hierarchy.

Supported AIStor Tables IAM Actions

Resource Level | Actions
---------------|------------------------------------------------------------------------
Warehouse      | CreateWarehouse, DeleteWarehouse, GetWarehouse, ListWarehouses
Table Bucket   | CreateTableBucket, GetTableBucket, ListTableBuckets
Namespace      | CreateNamespace, DeleteNamespace, GetNamespace, ListNamespaces
Table          | CreateTable, DeleteTable, GetTable, UpdateTable, RenameTable, ListTables
View           | UpdateView

Namespace-Scoped Access

Grant a data engineering team full access to the staging namespace while restricting production:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "StagingFullAccess",
      "Effect": "Allow",
      "Action": [
        "s3tables:ListTables",
        "s3tables:CreateTable",
        "s3tables:GetTable",
        "s3tables:UpdateTable",
        "s3tables:DeleteTable"
      ],
      "Resource": [
        "arn:aws:s3tables:::bucket/lakehouse",
        "arn:aws:s3tables:::bucket/lakehouse/table/*"
      ],
      "Condition": {
        "StringEquals": {
          "s3tables:namespace": "staging"
        }
      }
    },
    {
      "Sid": "ProductionReadOnly",
      "Effect": "Allow",
      "Action": [
        "s3tables:GetTable",
        "s3tables:ListTables"
      ],
      "Resource": [
        "arn:aws:s3tables:::bucket/lakehouse",
        "arn:aws:s3tables:::bucket/lakehouse/table/*"
      ],
      "Condition": {
        "StringEquals": {
          "s3tables:namespace": "production"
        }
      }
    }
  ]
}

6. Session Policies via STS

AIStor’s STS (Security Token Service)[6] supports temporary credentials with scoped session policies. A user’s broad permissions can be narrowed to a specific task, dataset, or time window.

Supported STS methods:

Method                      | Identity Source
----------------------------|------------------------------------------------
AssumeRole                  | IAM user credentials
AssumeRoleWithWebIdentity   | OpenID Connect (Okta, Azure AD, Keycloak, etc.)
AssumeRoleWithLDAPIdentity  | LDAP / Active Directory
AssumeRoleWithCertificate   | Client TLS certificate
AssumeRoleWithCustomToken   | Custom Identity Plugin

Session policy scoping:

The effective permissions for an STS session are the intersection of the user’s attached policies and the session policy[3]. This enforces least-privilege without modifying the user’s base permissions:

Effective Permissions = User Policies ∩ Session Policy

Example: Scoped session for a Spark job

A data pipeline assumes a role with a session policy that restricts access to a specific namespace:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3tables:GetTable", "s3tables:ListTables"],
      "Resource": "arn:aws:s3tables:::bucket/lakehouse/table/*",
      "Condition": {
        "StringEquals": {
          "s3tables:namespace": "etl-workspace"
        }
      }
    }
  ]
}

Even if the base user has broad access, the Spark job can only read tables in etl-workspace for the duration of the session.
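The intersection rule is easy to model. This sketch reduces policies to sets of allowed actions (ignoring resources and conditions for brevity) and shows why a session credential is never broader than the base user:

```python
# Base user's attached policies, reduced to a set of allowed actions
user_policy_actions = {"s3:GetObject", "s3:PutObject", "s3tables:GetTable",
                       "s3tables:ListTables", "s3tables:DeleteTable"}

# Inline session policy; requesting extra actions grants nothing new
session_policy_actions = {"s3tables:GetTable", "s3tables:ListTables",
                          "s3:DeleteObject"}

# Effective Permissions = User Policies ∩ Session Policy
effective = user_policy_actions & session_policy_actions
print(sorted(effective))  # ['s3tables:GetTable', 's3tables:ListTables']
```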


7. Audit and Compliance

AIStor produces per-request audit logs[7] that can be streamed to external systems for compliance, forensics, and anomaly detection.

Configuration:

mc admin config set myminio audit_webhook \
enable=on \
endpoint=http://splunk-hec:8088/services/collector

Audit log fields include:

  • Requester identity (user, groups, source IP)
  • Action performed (S3 operation)
  • Target resource (bucket, object, table)
  • Policy evaluation result (allowed/denied)
  • Request and response metadata
  • Timestamp

This provides the equivalent of Ranger’s audit log. For SIEM integration, stream audit events to Splunk, Elasticsearch, or Kafka for correlation, alerting, and compliance reporting.
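Once audit events land in a SIEM, denied-request alerting reduces to a simple filter. The event shape below is illustrative (field names are assumptions for the sketch, not AIStor's exact audit schema):

```python
def denied_events(events):
    """Yield audit events whose policy evaluation denied the request."""
    for ev in events:
        if ev.get("result") == "denied":
            yield ev

events = [
    {"user": "jsmith", "action": "s3:GetObject",
     "resource": "datalake/finance/q4.csv", "result": "denied"},
    {"user": "adimov", "action": "s3:PutObject",
     "resource": "datalake/raw/events.json", "result": "allowed"},
]
for ev in denied_events(events):
    print(ev["user"], ev["action"], ev["resource"])
```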


Policy Evaluation Order

AIStor evaluates authorization as follows[3]:

  1. External policy plugin (if configured): the plugin’s decision is final and immediate. IAM policies, bucket policies, and session policies are not consulted.

  2. Without an external plugin, evaluation proceeds through:

    • Owner check: the root account bypasses all policy evaluation
    • Session policy intersection: for STS and service account credentials, effective permissions are the intersection of parent policies and the inline session policy
    • IAM policy evaluation: Deny statements evaluated first; an explicit deny immediately denies the request
    • Bucket policies: evaluated for anonymous/public access when no IAM credentials are present

This follows the standard AWS IAM evaluation model: implicit deny by default, explicit deny always wins, explicit allow required for access.
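The evaluation order above reduces to a short decision function. This model uses boolean placeholders instead of real policy documents (a simplified sketch, not AIStor's IsAllowed implementation) to capture plugin-first and deny-wins:

```python
def is_allowed(plugin=None, is_owner=False, session_allows=None,
               explicit_deny=False, explicit_allow=False):
    """Model of the evaluation order: plugin decision is final, owner
    bypasses policies, a session policy can only narrow, explicit deny
    beats explicit allow, and the default is implicit deny."""
    if plugin is not None:          # 1. external plugin, if configured
        return plugin
    if is_owner:                    # 2. root account bypass
        return True
    if session_allows is False:     # 3. session policy must also allow
        return False
    if explicit_deny:               # 4. explicit deny always wins
        return False
    return explicit_allow           # 5. explicit allow, else implicit deny

print(is_allowed(explicit_allow=True, explicit_deny=True))  # False
print(is_allowed(plugin=True, explicit_deny=True))          # True: plugin is final
print(is_allowed())                                         # False: implicit deny
```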


Capability Mapping: Ranger + Atlas to AIStor

Ranger / Atlas Capability     | AIStor Equivalent                                 | Layer
------------------------------|---------------------------------------------------|------------------
RBAC (role/group policies)    | IAM policies with group conditions                | Native
ABAC (attribute conditions)   | IAM policy conditions (JWT, LDAP, IP, time, tags) | Native
Tag-based policy enforcement  | ExistingObjectTag / RequestObjectTag conditions   | Native
Data classification / tagging | Object tagging + classification webhook           | Native + webhook
Column masking (SQL access)   | Trino Access Control SPI / Spark views            | Query engine
Column masking (S3 access)    | Object Lambda transformation                      | Native + function
Row-level filtering (SQL)     | Trino row filters / Spark DataSourceV2            | Query engine
Row-level filtering (S3)      | Object Lambda transformation                      | Native + function
Centralized policy management | Policy plugin webhook + OPA                       | Native + OPA
Table/database permissions    | AIStor Tables IAM actions (18 actions)            | Native
Namespace permissions         | AIStor Tables namespace condition keys            | Native
Temporary scoped access       | STS session policies                              | Native
Encryption policy enforcement | KMS key conditions in IAM policies                | Native
Data lineage                  | OpenLineage (from Spark/Trino/Airflow)            | External
Audit                         | Audit webhook to SIEM                             | Native

Migration from Ranger + Atlas

  1. Inventory existing Ranger policies. Export as JSON, map actions to AIStor IAM actions.
  2. Map Atlas tags to S3 object tags. Build a tag migration script that reads Atlas’s metadata store and applies equivalent S3 object tags via PutObjectTagging.
  3. Translate Ranger tag-based policies to AIStor IAM policies using ExistingObjectTag conditions.
  4. Migrate Ranger row/column masking rules to Trino Access Control SPI rules (for SQL access) and Object Lambda functions (for direct S3 access).
  5. Replace Ranger audit with AIStor audit webhook to your existing SIEM.
  6. Deploy OPA if centralized policy management is required beyond what IAM policies cover.
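Step 1 can be partially automated. The sketch below translates a heavily simplified Ranger path policy into an IAM-style statement; real Ranger exports carry many more fields, and the action mapping table here is an assumption for illustration:

```python
# Hypothetical, simplified mapping of Ranger access types to S3 actions
RANGER_TO_S3_ACTIONS = {"read": ["s3:GetObject", "s3:ListBucket"],
                        "write": ["s3:PutObject"],
                        "delete": ["s3:DeleteObject"]}

def ranger_to_iam(ranger_policy: dict, bucket: str) -> dict:
    """Translate a (simplified) Ranger path policy into one IAM statement."""
    actions = sorted({a for item in ranger_policy["policyItems"]
                      for acc in item["accesses"]
                      for a in RANGER_TO_S3_ACTIONS[acc["type"]]})
    path = ranger_policy["resources"]["path"]["values"][0].strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": ranger_policy["name"].replace("-", ""),
            "Effect": "Allow",
            "Action": actions,
            "Resource": f"arn:aws:s3:::{bucket}/{path}/*",
        }],
    }

ranger = {"name": "finance-read",
          "resources": {"path": {"values": ["/finance"]}},
          "policyItems": [{"groups": ["finance"], "accesses": [{"type": "read"}]}]}
print(ranger_to_iam(ranger, "datalake"))
```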

Summary

Rather than replicating Ranger + Atlas as a monolithic system, AIStor decomposes governance into layers:

  • Storage layer (AIStor): IAM policy engine with ABAC conditions, object tagging with tag-based policy enforcement, STS session policy scoping, encryption policy enforcement, and audit logging. These are native capabilities requiring no external components.
  • Decision layer (policy plugin + OPA): External authorization delegation for centralized policy management. AIStor provides the integration point; the external engine owns the policy logic.
  • Transformation layer (Object Lambda): Object-level transformation proxy for data masking and redaction on direct S3 access.
  • Query layer (Trino / Spark / Dremio): Column masking and row-level filtering for SQL access patterns, using each engine’s native access control.
  • Compute layer (OpenLineage from Spark/Trino/Airflow): Data lineage, which is the one Atlas capability that lives outside the storage tier by design.

Together, these layers provide a composable authorization and governance architecture equivalent to Ranger + Atlas, without Hadoop-era dependencies, and with each layer independently scalable and replaceable.

Source Code References
  1. cmd/bucket-policy.go:163-280 - IAM condition key injection including ExistingObjectTag, RequestObjectTag, RequestObjectTagKeys, aws:SourceIp, aws:SecureTransport, aws:CurrentTime, aws:EpochTime, aws:username, aws:userid, aws:groups, aws:PrincipalType, and JWT/LDAP claim passthrough
  2. internal/config/policy/plugin/config.go:32-34 - Policy plugin environment variables: MINIO_POLICY_PLUGIN_URL, MINIO_POLICY_PLUGIN_AUTH_TOKEN, MINIO_POLICY_PLUGIN_ENABLE_HTTP2
  3. cmd/iam.go:2590-2601 - IsAllowed() checks AuthZPlugin first; when configured, plugin decision is final and overrides all internal policy evaluation
  4. internal/config/lambda/target/webhook.go:43-48 - Object Lambda webhook environment variables: MINIO_LAMBDA_WEBHOOK_ENABLE, MINIO_LAMBDA_WEBHOOK_ENDPOINT, MINIO_LAMBDA_WEBHOOK_AUTH_TOKEN, plus mTLS support via CLIENT_CERT/CLIENT_KEY
  5. cmd/table-auth.go:658-677 - insertTableConditionsFromContext() injects s3tables:namespace, s3tables:tableName, s3tables:viewName condition keys for table-level policy evaluation
  6. cmd/sts-handlers.go:63-67 - All five STS methods: AssumeRole, AssumeRoleWithWebIdentity, AssumeRoleWithLDAPIdentity, AssumeRoleWithCertificate, AssumeRoleWithCustomToken
  7. internal/logger/config_external.go - Audit webhook configuration: audit_webhook with endpoint, auth_token, batch_size, TLS client cert support