Enterprise data platforms built on Cloudera rely on Apache Ranger for policy-based authorization (RBAC, ABAC, row/column masking, tag-based policies) and Apache Atlas for data classification, tagging, and lineage. Organizations migrating to MinIO AIStor or building new lakehouse architectures need equivalent governance capabilities without introducing Hadoop-era dependencies. This Q&A describes how AIStor decomposes governance into composable layers that together provide full Ranger + Atlas equivalence.
Answer
MinIO AIStor does not replicate Ranger + Atlas as a monolithic authorization system. Instead, it decomposes governance into layers aligned with modern lakehouse architecture: storage-layer IAM with ABAC conditions, tag-based policy enforcement, external authorization delegation via OPA, query-engine masking via Trino/Spark, and Object Lambda transformation for direct S3 access. Each layer is independently scalable and replaceable.
The key architectural insight: Ranger’s row/column masking operates at the query engine layer, not the storage layer. HDFS has no concept of rows or columns. Ranger intercepts SQL queries in Hive or Presto and rewrites them to apply filters and masking functions. Atlas tags metadata entities, not bytes on disk. MinIO AIStor does not need to replicate Ranger at the storage layer. It provides the policy primitives that query engines and applications leverage, plus its own object-level governance for direct S3 API access.
What Ranger + Atlas Solve
| Capability | Ranger | Atlas |
|---|---|---|
| Centralized policy management | Policy admin UI + policy engine | - |
| RBAC | Role/group-based policies | - |
| ABAC | Attribute conditions on policies | - |
| Tag-based policies | Consumes tags, enforces policies | Produces and manages tags |
| Row-level filtering | SQL rewrite in Hive/Presto | - |
| Column masking | SQL rewrite in Hive/Presto | - |
| Data classification | - | Automated/manual tagging |
| Data lineage | - | Process/column lineage |
| Audit | Per-request audit log | Entity audit trail |
Architecture Overview
```
┌──────────────────────────────────────────────────────────────────┐
│                          Query Engines                           │
│              (Trino / Spark / Dremio / StarRocks)                │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │   Engine-native column masking and row-level filtering     │  │
│  │   (Trino Access Control SPI / Spark DataSourceV2 views)    │  │
│  └────────────────────────────────────────────────────────────┘  │
└────────────────────────────┬─────────────────────────────────────┘
                             │ S3 API / Iceberg REST Catalog
┌────────────────────────────▼─────────────────────────────────────┐
│                          MinIO AIStor                            │
│                                                                  │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────────────────┐  │
│  │  IAM / ABAC │  │  Tag-Based   │  │     Object Lambda       │  │
│  │  Policies   │  │  Policies    │  │ (object transformation) │  │
│  └──────┬──────┘  └──────┬───────┘  └────────────┬────────────┘  │
│         │                │                       │               │
│  ┌──────▼────────────────▼──────┐   ┌────────────▼────────────┐  │
│  │   Policy Plugin Webhook      │   │   Transformation        │  │
│  │   (OPA / Styra DAS /         │   │   Service (PII redact,  │  │
│  │    custom policy engine)     │   │   field filtering)      │  │
│  └──────────────────────────────┘   └─────────────────────────┘  │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │         Object Tagging + Bucket Notifications              │  │
│  │    (auto-classify on ingest, apply governance tags)        │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │ AIStor Tables / Iceberg (warehouse, namespace, table IAM)  │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │        Audit Log → SIEM (Splunk / Elastic / Kafka)         │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘
```

1. RBAC and ABAC: Native IAM Policies
AIStor’s IAM policy engine uses AWS IAM policy syntax with an extensive set of condition keys[1], providing RBAC and ABAC natively without external components.
Available Condition Keys
Identity attributes:
- `aws:username`, `aws:userid` for IAM user identity
- `aws:groups` for group membership array
- `jwt:*` for any OpenID Connect claim (`jwt:sub`, `jwt:email`, `jwt:groups`, `jwt:department`, etc.)
- `ldap:username`, `ldap:user`, `ldap:groups` for LDAP identity and groups
- `aws:PrincipalType` for Account, User, or AssumedRole
Resource attributes:
- `s3:ExistingObjectTag/{key}` for tags on an existing object
- `s3:RequestObjectTag/{key}` for tags in the current request
- `s3:RequestObjectTagKeys` for the array of tag keys in the request
- `s3tables:namespace`, `s3tables:tableName`, `s3tables:viewName` for Iceberg table attributes
Environment attributes:
- `aws:SourceIp` for the requester IP address (preserves X-Forwarded-For)
- `aws:SecureTransport` for TLS/non-TLS
- `aws:CurrentTime`, `aws:EpochTime` for temporal conditions
- `s3:x-amz-server-side-encryption-aws-kms-key-id` for encryption key constraints
RBAC Example: Department-Scoped Access
Users in the finance group can read objects under the finance/ prefix:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "FinanceGroupReadAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::datalake",
        "arn:aws:s3:::datalake/finance/*"
      ],
      "Condition": {
        "StringLike": {
          "s3:prefix": ["finance/*"]
        }
      }
    }
  ]
}
```

This policy is attached to a group. Users inherit it through group membership, which is standard RBAC.
ABAC Example: Attribute-Driven Access via OIDC Claims
Users whose OIDC department claim matches the object prefix get access. No per-user policy management required:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DepartmentScopedAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::datalake/${jwt:department}/*"
    }
  ]
}
```

A user with `jwt:department=engineering` can read `datalake/engineering/*`. A user with `jwt:department=finance` can read `datalake/finance/*`. One policy covers all departments.
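The mechanics of policy-variable substitution can be sketched in a few lines of Python. This is an illustrative simulation only — the actual evaluation happens inside AIStor's policy engine — but it shows why one policy covers every department:

```python
import fnmatch

def resolve_resource(pattern: str, claims: dict) -> str:
    """Substitute ${jwt:<claim>} policy variables with the caller's OIDC claims."""
    for key, value in claims.items():
        pattern = pattern.replace("${jwt:%s}" % key, value)
    return pattern

def is_allowed(resource_pattern: str, claims: dict, requested: str) -> bool:
    """Match a requested object ARN against the resolved resource pattern."""
    return fnmatch.fnmatchcase(requested, resolve_resource(resource_pattern, claims))

pattern = "arn:aws:s3:::datalake/${jwt:department}/*"
print(is_allowed(pattern, {"department": "engineering"},
                 "arn:aws:s3:::datalake/datalake/model.parquet".replace("datalake/datalake", "datalake/engineering")))
print(is_allowed(pattern, {"department": "finance"},
                 "arn:aws:s3:::datalake/engineering/model.parquet"))  # False: wrong department
```

The same substitution applies to `ldap:*` variables for LDAP-federated identities.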
ABAC Example: Network-Restricted Access
Deny access to restricted data from outside the corporate network:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CorpNetOnly",
      "Effect": "Deny",
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::restricted-data/*",
      "Condition": {
        "NotIpAddress": {
          "aws:SourceIp": "10.0.0.0/8"
        }
      }
    }
  ]
}
```

2. Tag-Based Governance: Object Tagging + Tag-Based Policies
This replaces the Atlas tagging and Ranger tag-based policy enforcement loop. AIStor supports S3 object tagging (up to 10 key-value pairs per object) and IAM policies can reference those tags in conditions[1].
Classification Enforcement
Deny access to objects tagged classification=pii unless the requester belongs to the pii-authorized group:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyPIIUnlessAuthorized",
      "Effect": "Deny",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::datalake/*",
      "Condition": {
        "StringEquals": {
          "s3:ExistingObjectTag/classification": "pii"
        },
        "StringNotEquals": {
          "aws:groups": "pii-authorized"
        }
      }
    }
  ]
}
```

Prevent Untagged Uploads
Require all uploaded objects to carry a classification tag:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequireClassificationTag",
      "Effect": "Deny",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::governed-bucket/*",
      "Condition": {
        "Null": {
          "s3:RequestObjectTag/classification": "true"
        }
      }
    }
  ]
}
```

This ensures every object entering the governed bucket has been classified. It is the equivalent of Atlas mandatory classification, enforced at the storage layer.
Automated Classification Pipeline
To replace Atlas’s automated tagging, use bucket notifications to trigger a classification service on object upload:
```
Object Upload
  → Bucket Notification (webhook/Kafka)
  → Classification Service (NLP, regex, ML model)
  → PutObjectTagging (apply classification, sensitivity, retention tags)
  → Tag-based IAM policies enforce access
```

Bucket notification configuration:
```
mc event add myminio/datalake arn:minio:sqs::classify:webhook \
  --event put --prefix "raw/"
```

The classification service receives the event, reads the object, applies detection rules (PII patterns, sensitivity heuristics, schema inspection for structured formats), and calls PutObjectTagging to apply governance tags. From that point forward, tag-based IAM policies govern access automatically.
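The core of such a classification service is a function that maps object content to governance tags. The sketch below uses two illustrative regex detectors (real deployments would use a broader ruleset or an ML model); the tag names and the MinIO Python SDK wiring shown in the comment are assumptions to adapt to your pipeline:

```python
import re

# Illustrative PII detectors -- extend with your own patterns or an ML model.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def classify(content: str) -> dict:
    """Return governance tags for an object based on simple pattern detection."""
    hits = [name for name, pattern in PII_PATTERNS.items() if pattern.search(content)]
    return {
        "classification": "pii" if hits else "internal",
        "detected": ",".join(hits) if hits else "none",
    }

# In the notification handler, read the object named in the event and apply
# the tags, e.g. with the MinIO Python SDK:
#
#   tags = Tags.new_object_tags()
#   tags.update(classify(body))
#   client.set_object_tags(bucket, key, tags)

print(classify("contact: jane@corp.example.com, SSN 123-45-6789"))
```

Once the tags are applied, the `ExistingObjectTag` policies above take effect on the next read without any further coordination.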
3. External Authorization: Policy Plugin Webhook
For organizations that need centralized policy management or policy logic that exceeds what IAM policy JSON can express, AIStor supports an external authorization webhook (policy_plugin)[2] that can delegate access decisions to OPA or a custom policy engine.
When configured, AIStor sends request and credential details for every API call to the external endpoint and expects an allow/deny response. The plugin’s decision is final. When an external plugin is configured, it completely overrides all other policy evaluation. IAM policies, bucket policies, and session policies are not consulted[3].
Configuration
```
MINIO_POLICY_PLUGIN_URL=https://policy-engine:8181/v1/data/minio/authz/allow
MINIO_POLICY_PLUGIN_AUTH_TOKEN=<bearer-token>
MINIO_POLICY_PLUGIN_ENABLE_HTTP2=on
```

Request Context
Every API request sends a rich context payload to the policy engine:
```json
{
  "input": {
    "account": "jsmith",
    "groups": ["engineering", "ml-team"],
    "action": "s3:GetObject",
    "bucket": "datalake",
    "object": "training-data/customers.parquet",
    "owner": false,
    "claims": {
      "department": "engineering",
      "clearance": "L2",
      "sub": "jsmith@corp.example.com"
    },
    "conditions": {
      "ExistingObjectTag/classification": ["confidential"],
      "ExistingObjectTag/data-owner": ["finance"],
      "SourceIp": ["10.0.5.42"],
      "SecureTransport": ["true"],
      "CurrentTime": ["2026-02-17T14:30:00Z"]
    }
  }
}
```

The policy engine has access to: the user's identity and group membership, the action being performed, the target resource, the object's tags, the requester's network location, OIDC/LDAP claims, and temporal context.
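Because the plugin contract is just "payload in, decision out," the decision logic can be any function of this payload. The toy authorizer below (a sketch — the `{"result": ...}` response shape follows the OPA convention; confirm the exact shape your engine and AIStor version exchange) implements an admin override plus a department-prefix rule:

```python
def authorize(payload: dict) -> dict:
    """Toy external-authorization decision over the plugin payload.

    A real deployment serves this over HTTP: AIStor POSTs the payload and
    reads the decision from the response body.
    """
    inp = payload["input"]
    # Admins may perform any action.
    if "admin" in inp.get("groups", []):
        return {"result": True}
    # Everyone else may only read objects under their department's prefix.
    department = inp.get("claims", {}).get("department")
    allowed = (
        inp["action"] == "s3:GetObject"
        and department is not None
        and inp["object"].startswith(department + "/")
    )
    return {"result": allowed}

request = {"input": {"account": "jsmith", "groups": ["engineering"],
                     "action": "s3:GetObject", "bucket": "datalake",
                     "object": "engineering/train.csv",
                     "claims": {"department": "engineering"}}}
print(authorize(request))  # {'result': True}
```

Note that under this toy rule the sample payload above (object `training-data/customers.parquet`) would be denied — the function only demonstrates how decisions derive from the payload fields.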
OPA Integration
Open Policy Agent with Rego policies provides the closest analog to Ranger’s centralized policy engine:
```rego
package minio.authz

import rego.v1

default allow := false
default deny := false

# RBAC: admin group has full access
allow if {
    "admin" in input.groups
}

# ABAC: users can access objects in their department's prefix
allow if {
    department := input.claims.department
    startswith(input.object, concat("/", [department, ""]))
}

# Tag-based: deny access to PII unless in pii-authorized group
deny if {
    input.conditions["ExistingObjectTag/classification"][_] == "pii"
    not "pii-authorized" in input.groups
}

# Tag-based: data steward override
allow if {
    data_owner := input.conditions["ExistingObjectTag/data-owner"][_]
    sprintf("steward-%s", [data_owner]) in input.groups
}

# Final decision
result := {"allow": allow, "deny": deny}
```

Deployment pattern for centralized management:
```
Policy Authors
  → Git Repository (Rego policies)
  → CI/CD Pipeline (lint, test, bundle)
  → OPA Bundle Server (S3/HTTP)
  → OPA Instances (co-located with MinIO or standalone)
  ← MinIO policy plugin webhook
```

This gives you version-controlled policies, CI/CD for policy changes, centralized management, and distributed enforcement without the Hadoop dependency. The key difference from Ranger is that AIStor's policy plugin is a stateless decision point, not a policy management system. Policy authoring, storage, and administration live entirely in the external engine.
4. Data Masking and Row Filtering
Data masking operates at two layers depending on the access pattern. This is architecturally identical to how Ranger works. Ranger’s masking lives in the query engine (Hive plugin), not in HDFS.
Layer 1: Query Engine Masking (SQL Access)
When users access data through SQL engines (Trino, Spark, Dremio, StarRocks), the engine’s native access control handles masking. This is the direct replacement for Ranger’s Hive/Presto plugin.
Trino Access Control SPI example (rules.json):
```json
{
  "catalogs": [
    {"catalog": "iceberg", "allow": true}
  ],
  "tables": [
    {
      "catalog": "iceberg",
      "schema": "finance",
      "table": "customers",
      "privileges": ["SELECT"]
    }
  ],
  "columnMasks": [
    {
      "catalog": "iceberg",
      "schema": "finance",
      "table": "customers",
      "column": "ssn",
      "expression": "CASE WHEN user() IN ('admin', 'compliance') THEN ssn ELSE 'XXX-XX-' || substr(ssn, 8) END"
    },
    {
      "catalog": "iceberg",
      "schema": "finance",
      "table": "customers",
      "column": "email",
      "expression": "CASE WHEN user() IN ('admin', 'marketing') THEN email ELSE regexp_replace(email, '(.).*@', '$1***@') END"
    }
  ],
  "rowFilters": [
    {
      "catalog": "iceberg",
      "schema": "finance",
      "table": "transactions",
      "filter": "region IN (SELECT allowed_region FROM access_control.user_regions WHERE username = user())"
    }
  ]
}
```

This configuration masks SSN to show only the last 4 digits for non-privileged users, masks email addresses for non-marketing users, and filters transaction rows so users only see their authorized regions.
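The effect of the two mask expressions can be previewed outside Trino. This Python sketch mirrors the CASE expressions above (translating SQL's 1-indexed `substr(ssn, 8)` and `regexp_replace` into Python equivalents):

```python
import re

def mask_ssn(ssn: str, user: str) -> str:
    """Mirror of the ssn column mask: privileged users see the raw value."""
    if user in ("admin", "compliance"):
        return ssn
    return "XXX-XX-" + ssn[7:]  # SQL substr(ssn, 8) is 1-indexed

def mask_email(email: str, user: str) -> str:
    """Mirror of regexp_replace(email, '(.).*@', '$1***@')."""
    if user in ("admin", "marketing"):
        return email
    return re.sub(r"(.).*@", r"\1***@", email)

print(mask_ssn("123-45-6789", "analyst"))                  # XXX-XX-6789
print(mask_email("jane.doe@corp.example.com", "analyst"))  # j***@corp.example.com
print(mask_ssn("123-45-6789", "compliance"))               # 123-45-6789
```

Keeping a unit-testable mirror of the mask expressions is a cheap way to catch regressions when `rules.json` changes.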
Trino + AIStor deployment:
```
Trino Coordinator
  ├── Iceberg Connector → AIStor (S3 API + Iceberg REST Catalog)
  ├── Access Control SPI → rules.json or custom plugin (OPA-based)
  └── Column Masking + Row Filtering applied at query time
```

Trino authenticates to AIStor using STS credentials scoped to the user's session policy. AIStor enforces object-level access (which tables/namespaces the user can reach). Trino enforces column masking and row filtering within the query results.
Layer 2: Object Lambda Transformation (Direct S3 Access)
When applications access data directly via the S3 API (ML pipelines, ETL jobs, application services), Object Lambda[4] provides object-level transformation before delivery. Object Lambda is a transparent proxy. AIStor sends a presigned URL and identity context to an external webhook function, which fetches the original object, applies transformation logic, and returns the result. AIStor itself does not parse or inspect the object’s internal structure (Parquet columns, CSV rows, JSON fields). Row and column semantics are implemented entirely by the transformation function.
Configuration:
```
MINIO_LAMBDA_WEBHOOK_ENABLE_pii_redact=on
MINIO_LAMBDA_WEBHOOK_ENDPOINT_pii_redact=http://masking-service:5000
MINIO_LAMBDA_WEBHOOK_AUTH_TOKEN_pii_redact=<token>
```

Masking service implementation (Python/Flask):
```python
from flask import Flask, request, make_response
import requests
import io
import pyarrow as pa
import pyarrow.parquet as pq

app = Flask(__name__)

MASKING_RULES = {
    "ssn": lambda val, groups: val if "pii-authorized" in groups else "XXX-XX-" + val[-4:],
    "email": lambda val, groups: val if "marketing" in groups else val[0] + "***@" + val.split("@")[1],
    "phone": lambda val, groups: val if "pii-authorized" in groups else "***-***-" + val[-4:],
}

def apply_masking(content, event):
    # Sketch for Parquet payloads; assumes non-null string columns. Group
    # extraction is illustrative -- adjust to the identity fields your
    # deployment's event payload actually carries.
    groups = event.get("userIdentity", {}).get("groups", [])
    table = pq.read_table(io.BytesIO(content))
    columns = {}
    for name in table.column_names:
        col = table.column(name)
        if name in MASKING_RULES:
            rule = MASKING_RULES[name]
            col = pa.array([rule(v.as_py(), groups) for v in col])
        columns[name] = col
    out = io.BytesIO()
    pq.write_table(pa.table(columns), out)
    return out.getvalue()

@app.route("/", methods=["POST"])
def mask_object():
    event = request.json
    context = event["getObjectContext"]

    # Fetch the original object via the presigned URL supplied by AIStor
    r = requests.get(context["inputS3Url"])

    # Apply masking based on content and user context
    masked_content = apply_masking(r.content, event)

    resp = make_response(masked_content, 200)
    resp.headers["x-amz-request-route"] = context["outputRoute"]
    resp.headers["x-amz-request-token"] = context["outputToken"]
    return resp
```

Client-side invocation (Go):
```go
reqParams := make(url.Values)
reqParams.Set("lambdaArn", "arn:minio:s3-object-lambda::pii_redact:webhook")

presignedURL, err := s3Client.PresignedGetObject(
    ctx, "datalake", "customers.parquet", time.Hour, reqParams,
)
```

Applications that require masked data use the Lambda ARN in their requests. Applications with full access (authorized analytics, compliance) use standard GetObject without the Lambda ARN. Access to the unmasked path is controlled by IAM policies.
5. Table-Level Access Control: AIStor Tables / Iceberg
For structured data in Iceberg format, AIStor Tables provides fine-grained IAM actions at the warehouse, namespace, table, and view level[5]. There are 18 distinct s3tables: actions covering CRUD operations across the entire table hierarchy.
Supported AIStor Tables IAM Actions
| Resource Level | Actions |
|---|---|
| Warehouse | CreateWarehouse, DeleteWarehouse, GetWarehouse, ListWarehouses |
| Table Bucket | CreateTableBucket, GetTableBucket, ListTableBuckets |
| Namespace | CreateNamespace, DeleteNamespace, GetNamespace, ListNamespaces |
| Table | CreateTable, DeleteTable, GetTable, UpdateTable, RenameTable, ListTables |
| View | UpdateView |
Namespace-Scoped Access
Grant a data engineering team full access to the staging namespace while restricting production:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "StagingFullAccess",
      "Effect": "Allow",
      "Action": [
        "s3tables:ListTables",
        "s3tables:CreateTable",
        "s3tables:GetTable",
        "s3tables:UpdateTable",
        "s3tables:DeleteTable"
      ],
      "Resource": [
        "arn:aws:s3tables:::bucket/lakehouse",
        "arn:aws:s3tables:::bucket/lakehouse/table/*"
      ],
      "Condition": {
        "StringEquals": {
          "s3tables:namespace": "staging"
        }
      }
    },
    {
      "Sid": "ProductionReadOnly",
      "Effect": "Allow",
      "Action": [
        "s3tables:GetTable",
        "s3tables:ListTables"
      ],
      "Resource": [
        "arn:aws:s3tables:::bucket/lakehouse",
        "arn:aws:s3tables:::bucket/lakehouse/table/*"
      ],
      "Condition": {
        "StringEquals": {
          "s3tables:namespace": "production"
        }
      }
    }
  ]
}
```

6. Session Policies via STS
AIStor’s STS (Security Token Service)[6] supports temporary credentials with scoped session policies. A user’s broad permissions can be narrowed to a specific task, dataset, or time window.
Supported STS methods:
| Method | Identity Source |
|---|---|
| AssumeRole | IAM user credentials |
| AssumeRoleWithWebIdentity | OpenID Connect (Okta, Azure AD, Keycloak, etc.) |
| AssumeRoleWithLDAPIdentity | LDAP / Active Directory |
| AssumeRoleWithCertificate | Client TLS certificate |
| AssumeRoleWithCustomToken | Custom Identity Plugin |
Session policy scoping:
The effective permissions for an STS session are the intersection of the user’s attached policies and the session policy[3]. This enforces least-privilege without modifying the user’s base permissions:
```
Effective Permissions = User Policies ∩ Session Policy
```

Example: Scoped session for a Spark job
A data pipeline assumes a role with a session policy that restricts access to a specific namespace:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3tables:GetTable", "s3tables:ListTables"],
      "Resource": "arn:aws:s3tables:::bucket/lakehouse/table/*",
      "Condition": {
        "StringEquals": {
          "s3tables:namespace": "etl-workspace"
        }
      }
    }
  ]
}
```

Even if the base user has broad access, the Spark job can only read tables in `etl-workspace` for the duration of the session.
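The intersection semantics can be illustrated with plain Python sets. This is a deliberate simplification — real evaluation intersects full policy documents (resources, conditions, and deny statements), not flat action lists — but it captures why a session can never exceed the parent's permissions:

```python
def effective_actions(user_policy: set, session_policy: set) -> set:
    """Effective permissions are the intersection of the user's attached
    policies and the inline session policy (simplified to action sets)."""
    return user_policy & session_policy

user = {"s3tables:GetTable", "s3tables:ListTables",
        "s3tables:CreateTable", "s3tables:DeleteTable"}
session = {"s3tables:GetTable", "s3tables:ListTables",
           "s3tables:CreateWarehouse"}  # not held by the user -> not granted

print(sorted(effective_actions(user, session)))
# ['s3tables:GetTable', 's3tables:ListTables']
```

Note that `s3tables:CreateWarehouse` appears in the session policy but not in the result: a session policy can only narrow, never widen, the parent's permissions.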
7. Audit and Compliance
AIStor produces per-request audit logs[7] that can be streamed to external systems for compliance, forensics, and anomaly detection.
Configuration:
```
mc admin config set myminio audit_webhook \
  enable=on \
  endpoint=http://splunk-hec:8088/services/collector
```

Audit log fields include:
- Requester identity (user, groups, source IP)
- Action performed (S3 operation)
- Target resource (bucket, object, table)
- Policy evaluation result (allowed/denied)
- Request and response metadata
- Timestamp
This provides the equivalent of Ranger’s audit log. For SIEM integration, stream audit events to Splunk, Elasticsearch, or Kafka for correlation, alerting, and compliance reporting.
Policy Evaluation Order
AIStor evaluates authorization as follows[3]:
1. External policy plugin (if configured): the plugin's decision is final and immediate. IAM policies, bucket policies, and session policies are not consulted.
2. Without an external plugin, evaluation proceeds through:
   - Owner check: the root account bypasses all policy evaluation
   - Session policy intersection: for STS and service account credentials, effective permissions are the intersection of parent policies and the inline session policy
   - IAM policy evaluation: Deny statements are evaluated first; an explicit deny immediately denies the request
   - Bucket policies: evaluated for anonymous/public access when no IAM credentials are present
This follows the standard AWS IAM evaluation model: implicit deny by default, explicit deny always wins, explicit allow required for access.
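That evaluation order can be condensed into a short function. This is a sketch of the order described above, not AIStor's actual evaluator — statements are simplified to effect/action pairs and the session-policy intersection step is omitted:

```python
def is_allowed(request, plugin=None, is_owner=False, statements=()):
    """Sketch of the evaluation order: plugin override -> owner bypass
    -> explicit deny -> explicit allow -> implicit deny."""
    if plugin is not None:
        return plugin(request)  # plugin decision is final; nothing else runs
    if is_owner:
        return True             # root account bypasses policy evaluation
    matched = [s for s in statements if request["action"] in s["actions"]]
    if any(s["effect"] == "Deny" for s in matched):
        return False            # explicit deny always wins
    # explicit allow required; otherwise implicit deny
    return any(s["effect"] == "Allow" for s in matched)

policy = [{"effect": "Allow", "actions": {"s3:GetObject"}},
          {"effect": "Deny", "actions": {"s3:DeleteObject"}}]

print(is_allowed({"action": "s3:GetObject"}, statements=policy))     # True
print(is_allowed({"action": "s3:PutObject"}, statements=policy))     # False (implicit deny)
print(is_allowed({"action": "s3:DeleteObject"}, statements=policy))  # False (explicit deny)
```

The `plugin` branch returning before any statement is inspected is the behavioral consequence of "the plugin's decision is final."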
Capability Mapping: Ranger + Atlas to AIStor
| Ranger / Atlas Capability | AIStor Equivalent | Layer |
|---|---|---|
| RBAC (role/group policies) | IAM policies with group conditions | Native |
| ABAC (attribute conditions) | IAM policy conditions (JWT, LDAP, IP, time, tags) | Native |
| Tag-based policy enforcement | ExistingObjectTag / RequestObjectTag conditions | Native |
| Data classification / tagging | Object tagging + classification webhook | Native + webhook |
| Column masking (SQL access) | Trino Access Control SPI / Spark views | Query engine |
| Column masking (S3 access) | Object Lambda transformation | Native + function |
| Row-level filtering (SQL) | Trino row filters / Spark DataSourceV2 | Query engine |
| Row-level filtering (S3) | Object Lambda transformation | Native + function |
| Centralized policy management | Policy plugin webhook + OPA | Native + OPA |
| Table/database permissions | AIStor Tables IAM actions (18 actions) | Native |
| Namespace permissions | AIStor Tables namespace condition keys | Native |
| Temporary scoped access | STS session policies | Native |
| Encryption policy enforcement | KMS key conditions in IAM policies | Native |
| Data lineage | OpenLineage (from Spark/Trino/Airflow) | External |
| Audit | Audit webhook to SIEM | Native |
Migration from Ranger + Atlas
- Inventory existing Ranger policies. Export as JSON, map actions to AIStor IAM actions.
- Map Atlas tags to S3 object tags. Build a tag migration script that reads Atlas's metadata store and applies equivalent S3 object tags via `PutObjectTagging`.
- Translate Ranger tag-based policies to AIStor IAM policies using `ExistingObjectTag` conditions.
- Replace Ranger audit with AIStor audit webhook to your existing SIEM.
- Deploy OPA if centralized policy management is required beyond what IAM policies cover.
Summary
Rather than replicating Ranger + Atlas as a monolithic system, AIStor decomposes governance into layers:
- Storage layer (AIStor): IAM policy engine with ABAC conditions, object tagging with tag-based policy enforcement, STS session policy scoping, encryption policy enforcement, and audit logging. These are native capabilities requiring no external components.
- Decision layer (policy plugin + OPA): External authorization delegation for centralized policy management. AIStor provides the integration point; the external engine owns the policy logic.
- Transformation layer (Object Lambda): Object-level transformation proxy for data masking and redaction on direct S3 access.
- Query layer (Trino / Spark / Dremio): Column masking and row-level filtering for SQL access patterns, using each engine’s native access control.
- Compute layer (OpenLineage from Spark/Trino/Airflow): Data lineage, which is the one Atlas capability that lives outside the storage tier by design.
Together, these layers provide a composable authorization and governance architecture equivalent to Ranger + Atlas, without Hadoop-era dependencies, and with each layer independently scalable and replaceable.
Source Code References
- `cmd/bucket-policy.go:163-280` — IAM condition key injection including `ExistingObjectTag`, `RequestObjectTag`, `RequestObjectTagKeys`, `aws:SourceIp`, `aws:SecureTransport`, `aws:CurrentTime`, `aws:EpochTime`, `aws:username`, `aws:userid`, `aws:groups`, `aws:PrincipalType`, and JWT/LDAP claim passthrough
- `internal/config/policy/plugin/config.go:32-34` — Policy plugin environment variables: `MINIO_POLICY_PLUGIN_URL`, `MINIO_POLICY_PLUGIN_AUTH_TOKEN`, `MINIO_POLICY_PLUGIN_ENABLE_HTTP2`
- `cmd/iam.go:2590-2601` — `IsAllowed()` checks AuthZPlugin first; when configured, the plugin decision is final and overrides all internal policy evaluation
- `internal/config/lambda/target/webhook.go:43-48` — Object Lambda webhook environment variables: `MINIO_LAMBDA_WEBHOOK_ENABLE`, `MINIO_LAMBDA_WEBHOOK_ENDPOINT`, `MINIO_LAMBDA_WEBHOOK_AUTH_TOKEN`, plus mTLS support via `CLIENT_CERT`/`CLIENT_KEY`
- `cmd/table-auth.go:658-677` — `insertTableConditionsFromContext()` injects `s3tables:namespace`, `s3tables:tableName`, `s3tables:viewName` condition keys for table-level policy evaluation
- `cmd/sts-handlers.go:63-67` — All five STS methods: `AssumeRole`, `AssumeRoleWithWebIdentity`, `AssumeRoleWithLDAPIdentity`, `AssumeRoleWithCertificate`, `AssumeRoleWithCustomToken`
- `internal/logger/config_external.go` — Audit webhook configuration: `audit_webhook` with endpoint, auth_token, batch_size, TLS client cert support