Data flow should support AIDP standard catalog access
Overview
Oracle AIDP provides a managed Spark runtime and a standard catalog for organizing enterprise data assets such as tables and volumes. Oracle Data Flow, in contrast, operates as an ephemeral job cluster that spins up Spark resources on demand to execute customer workloads. Today, Data Flow jobs cannot leverage the standard catalog that AIDP exposes, creating friction for teams that want a unified metadata layer and governance surface across both services.
Problem Statement
- AIDP native workflows are scoped to AIDP objects alone, limiting an enterprise’s ability to orchestrate cross-system processes that include external schedulers and operational tooling.
- Enterprises need a consistent way to manage Spark jobs, metadata, and governance controls across heterogeneous environments; the current separation between Data Flow and the catalog makes this difficult.
Proposed Solution
Enable Oracle Data Flow job clusters to authenticate with, read from, and write to the Oracle AIDP standard catalog. This capability would let each Data Flow job treat the AIDP catalog as the system of record for discovering, creating, and managing data objects (tables, volumes, and future asset types) during ETL execution. Configuration would be handled through either the Data Flow console or the SDK, by referencing a catalog profile (credentials, tenancy information, and permissions) that the service can use at runtime.
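As a sketch of what a run-level catalog profile could look like, the snippet below translates a profile into Spark configuration overrides that a Data Flow run might carry. Everything here is an assumption for illustration: the `spark.sql.catalog.aidp.*` keys, the `StandardCatalog` class name, and the profile fields do not exist today; only the general pattern of registering a named Spark catalog via configuration is standard Spark practice.

```python
# Hypothetical sketch: build the Spark configuration a Data Flow run
# could carry to authenticate against the AIDP standard catalog.
# All keys, class names, and profile fields below are assumptions;
# no such interface exists in Data Flow today.

def build_catalog_spark_conf(profile: dict) -> dict:
    """Translate a catalog profile into Spark configuration overrides."""
    required = {"catalog_endpoint", "tenancy_ocid", "auth_principal"}
    missing = required - profile.keys()
    if missing:
        raise ValueError(f"catalog profile missing fields: {sorted(missing)}")
    return {
        # Register the AIDP catalog as a named Spark catalog (assumed names).
        "spark.sql.catalog.aidp": "oracle.aidp.spark.StandardCatalog",
        "spark.sql.catalog.aidp.uri": profile["catalog_endpoint"],
        "spark.sql.catalog.aidp.tenancy": profile["tenancy_ocid"],
        "spark.sql.catalog.aidp.auth": profile["auth_principal"],
    }

profile = {
    "catalog_endpoint": "https://aidp.example.oraclecloud.com/catalog",
    "tenancy_ocid": "ocid1.tenancy.oc1..example",
    "auth_principal": "resource_principal",
}
conf = build_catalog_spark_conf(profile)
```

Keeping the profile as a single named object (rather than loose key/value pairs) would let the console, SDK, and CLI all reference the same governed credential set.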
Key Benefits
- Unified ETL execution surface – Customers can run their ETL pipelines on Data Flow while using the standard catalog as a single metadata layer, eliminating divergent definitions.
- Scheduler-friendly orchestration – Because Data Flow jobs can already be triggered from external schedulers such as Autosys, Control-M, or OCI native schedulers, catalog-aware jobs inherit the same flexibility without being constrained by the AIDP workflow engine.
- SDK-driven automation – Data Flow’s SDK support allows programmatic submission and monitoring of ETL jobs; once catalog access is enabled, those SDK flows can seamlessly manage metadata operations as part of the same job submission.
- Enterprise integration – Large enterprises often need to coordinate AIDP-managed assets with other platforms (data warehouses, operational stores, regulatory systems). Allowing Data Flow to interact with the catalog provides a controllable interface that can participate in broader integration and synchronization patterns.
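The SDK-driven automation benefit above amounts to a submit-and-poll loop with catalog configuration attached at submission time. The sketch below uses a stub client standing in for a real OCI SDK client (wiring up actual credentials is out of scope here), and the catalog configuration parameter is an assumption illustrating the proposed capability.

```python
import time

# Sketch of an SDK-driven submit-and-poll flow. StubDataFlowClient is a
# stand-in for a real OCI SDK client; passing catalog configuration on
# run creation is the proposed (not yet existing) capability.

class StubDataFlowClient:
    def create_run(self, application_id: str, configuration: dict) -> str:
        # A real client would return a run OCID from the service;
        # the stub also fakes a lifecycle-state sequence for polling.
        self._states = iter(["ACCEPTED", "IN_PROGRESS", "SUCCEEDED"])
        return "ocid1.dataflowrun.oc1..example"

    def get_run_state(self, run_id: str) -> str:
        return next(self._states)

def run_etl_job(client, application_id: str, catalog_conf: dict,
                poll_seconds: float = 0.0) -> str:
    """Submit a run with catalog configuration and wait for a terminal state."""
    run_id = client.create_run(application_id, configuration=catalog_conf)
    while True:
        state = client.get_run_state(run_id)
        if state in ("SUCCEEDED", "FAILED", "CANCELED"):
            return state
        time.sleep(poll_seconds)

final_state = run_etl_job(
    StubDataFlowClient(),
    "ocid1.dataflowapplication.oc1..example",
    {"spark.sql.catalog.aidp.uri": "https://aidp.example/catalog"},
)
```

An external scheduler such as Autosys or Control-M would wrap exactly this kind of call, using the terminal state to gate downstream dependencies.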
Example Use Cases
- Core ETL ingestion – Nightly ingestion jobs running in Data Flow can read source systems, land transformed data into AIDP-managed tables, and register them in the catalog automatically.
- External scheduler governance – Autosys or other scheduler-managed workflows can orchestrate Data Flow jobs that update catalog entries while coordinating dependent systems such as downstream analytics or reporting platforms.
- SDK-based DevOps pipelines – Infrastructure-as-code pipelines can submit Data Flow jobs via SDK, ensuring catalog updates are versioned and repeatable alongside application deployments.
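To make the core ETL ingestion case concrete, the sketch below generates the catalog-aware SQL a nightly Data Flow job might issue once the AIDP catalog is registered as a Spark catalog. The catalog name (`aidp`), schema, and table names are illustrative assumptions; the point is that creating the table in the catalog and registering it become the same operation.

```python
# Sketch of the catalog-aware SQL a nightly ingestion job might run
# against an AIDP catalog registered as the Spark catalog `aidp`.
# Catalog, schema, and table names are illustrative assumptions.

def nightly_ingestion_statements(run_date: str) -> list:
    """Return the SQL a Data Flow job could run to land and register data."""
    return [
        # Creating the table in the catalog registers it automatically.
        "CREATE TABLE IF NOT EXISTS aidp.sales.orders "
        "(order_id BIGINT, amount DECIMAL(12,2), order_date DATE)",
        # Land the transformed partition for this run's date.
        "INSERT OVERWRITE aidp.sales.orders "
        "SELECT order_id, amount, order_date FROM staging_orders "
        f"WHERE order_date = DATE '{run_date}'",
    ]

statements = nightly_ingestion_statements("2024-06-01")
```

In a real job these statements would be executed via `spark.sql(...)` inside the Data Flow application, with the run date supplied by the scheduler.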