Enable User-Controlled Data Profiling to Optimize Performance in Oracle Analytics

Charles.Ma-Oracle

When a user creates a dataset in Oracle Analytics Server (OAS) or Oracle Analytics Cloud (OAC) Data Visualization (DV), data profiling is currently performed automatically. However, users experience significant performance issues when profiling large datasets: profiling may still be incomplete after more than 10 minutes.

Typical datasets in this environment (sourced from Vertica) include:
~10 billion records total, ~1,000 records added per day, 93 columns
~50 billion records, 42 columns
~15 billion records, 122 columns

To address performance and usability concerns, we propose adding a configurable Data Profiling mode with the following options:
No Profiling – Skip profiling entirely for faster dataset creation.
Sample-Based Profiling (default) – Profile a representative sample for balanced performance and insight.
Full Dataset Profiling – Profile the entire dataset, with a performance warning for very large data volumes.
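The three proposed modes could be modeled as a single setting that decides how much data the profiler reads. The following is a minimal sketch under stated assumptions, not how Oracle Analytics implements profiling. `ProfilingMode`, `plan_profiling`, the 1,000,000-row sample cap, and the 1-billion-cell warning threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ProfilingMode(Enum):
    """Hypothetical setting mirroring the three proposed options."""
    NONE = "none"      # No Profiling
    SAMPLE = "sample"  # Sample-Based Profiling (proposed default)
    FULL = "full"      # Full Dataset Profiling


# Hypothetical tuning values for illustration only; not Oracle defaults.
DEFAULT_SAMPLE_ROWS = 1_000_000
FULL_PROFILE_WARN_CELLS = 1_000_000_000  # warn above ~1 billion cell values


@dataclass
class ProfilingPlan:
    rows_to_scan: int
    warning: Optional[str] = None


def plan_profiling(total_rows: int, num_columns: int,
                   mode: ProfilingMode = ProfilingMode.SAMPLE,
                   sample_rows: int = DEFAULT_SAMPLE_ROWS) -> ProfilingPlan:
    """Decide how many rows to profile for the selected mode."""
    if mode is ProfilingMode.NONE:
        # Skip profiling entirely: dataset creation performs no profiling scan.
        return ProfilingPlan(rows_to_scan=0)
    if mode is ProfilingMode.SAMPLE:
        # Cap the scan at a fixed sample size, independent of table size.
        return ProfilingPlan(rows_to_scan=min(total_rows, sample_rows))
    # FULL: scan every row, but warn when the data volume is very large.
    cells = total_rows * num_columns
    warning = None
    if cells > FULL_PROFILE_WARN_CELLS:
        warning = (f"Full profiling will read {cells:,} cell values; "
                   "this may take a long time.")
    return ProfilingPlan(rows_to_scan=total_rows, warning=warning)


if __name__ == "__main__":
    # Dataset sizes from the post as (rows, columns).
    for rows, cols in [(10_000_000_000, 93), (50_000_000_000, 42),
                       (15_000_000_000, 122)]:
        for mode in ProfilingMode:
            plan = plan_profiling(rows, cols, mode)
            print(f"{rows:,} x {cols} [{mode.value}]: "
                  f"scan {plan.rows_to_scan:,} rows; warning={plan.warning}")
```

Capping the sample by row count keeps the cost of the default mode roughly constant as a table grows. The cost of the full mode scales with rows × columns. For the largest dataset listed above, that is 50 billion × 42 = 2.1 trillion values.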

Why This Matters
Performance Optimization – Large-scale datasets often contain billions of rows; forcing full profiling significantly delays dataset creation and consumes system resources unnecessarily.
User Control & Flexibility – Different use cases have different needs. Analysts working with exploratory or time-sensitive data may prefer faster creation with sampling or no profiling, while data stewards might require full profiling for validation purposes.
Efficient Resource Utilization – Profiling large datasets puts heavy load on database connections, memory, and compute resources. Allowing selective profiling can reduce impact on both the OAS/OAC server and source systems (e.g., Vertica).
Improved User Experience – Long-running or incomplete profiling operations lead to frustration and workflow interruptions. Configurable profiling modes enable smoother dataset setup and analysis.
Transparency & Control – Presenting users with clear options and warnings empowers them to make informed decisions about the trade-off between accuracy and performance.

Status: Under Review

Comments

Luis E. Rivas -Oracle

Thanks for submitting this idea. We are actively working on giving users more control over how profiling is performed.

One clarification: representative random sampling is already applied to large datasets. Profiling the entire dataset would introduce unnecessary processing time and overhead, so sampling is used to ensure efficiency while maintaining accuracy.
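The comment does not say which sampling method Oracle Analytics uses. As a generic illustration only, the following sketch uses reservoir sampling (Algorithm R). This is one standard way to draw a uniform random sample of k rows in a single pass using O(k) memory. The sketch then computes a few per-column statistics on the sample. The function names and the chosen statistics are hypothetical and are not the product's implementation.

```python
import random
from typing import Any, Dict, Iterable, List, Optional, Sequence


def reservoir_sample(rows: Iterable[Sequence[Any]], k: int,
                     rng: Optional[random.Random] = None) -> List[Sequence[Any]]:
    """Algorithm R: uniform random sample of k rows in one pass, O(k) memory."""
    rng = rng or random.Random()
    reservoir: List[Sequence[Any]] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Row i (0-based) enters the reservoir with probability k / (i + 1),
            # replacing a uniformly chosen existing slot.
            j = rng.randint(0, i)  # inclusive bounds
            if j < k:
                reservoir[j] = row
    return reservoir


def profile_sample(sample: List[Sequence[Any]],
                   columns: Sequence[str]) -> Dict[str, Dict[str, Any]]:
    """Compute simple per-column statistics on the sampled rows (None = NULL)."""
    profile: Dict[str, Dict[str, Any]] = {}
    for idx, name in enumerate(columns):
        values = [row[idx] for row in sample]
        non_null = [v for v in values if v is not None]
        profile[name] = {
            "sampled_rows": len(values),
            "null_count": len(values) - len(non_null),
            "distinct_in_sample": len(set(non_null)),
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
        }
    return profile


if __name__ == "__main__":
    # Synthetic stream: an id column and a "bucket" column with some NULLs.
    stream = ((n, None if n % 10 == 0 else n % 7) for n in range(1_000_000))
    sample = reservoir_sample(stream, k=10_000, rng=random.Random(42))
    print(profile_sample(sample, ["id", "bucket"]))
```

Statistics computed on a sample are estimates. The distinct count is a lower bound on the true distinct count, and the sample minimum and maximum can miss the true extremes. Null rates, however, tend to be estimated well from a uniform sample.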