Oracle Analytics Idea Lab


Enable User-Controlled Data Profiling to Optimize Performance in Oracle Analytics

Under Review
Charles.Ma-Oracle

When a dataset is created in Oracle Analytics Server (OAS) or Oracle Analytics Cloud (OAC) Data Visualization (DV), data profiling currently runs automatically. With large datasets this causes significant performance problems: profiling may still be incomplete after more than 10 minutes.

Typical dataset usage patterns (sourced from Vertica) include:
~10 billion records total, ~1,000 records added per day, 93 columns
~50 billion records, 42 columns
~15 billion records, 122 columns

To address performance and usability concerns, we propose adding a configurable Data Profiling mode with the following options:
No Profiling – Skip profiling entirely for faster dataset creation.
Sample-Based Profiling (default) – Profile a representative sample for balanced performance and insight.
Full Dataset Profiling – Profile the entire dataset, with a performance warning for very large data volumes.
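To make the three proposed modes concrete, here is a minimal Python sketch of how a client might decide how many rows to profile under each one. Everything in it is hypothetical: `ProfilingMode`, `rows_to_profile`, the 100,000-row sample size, and the 1-billion-row warning threshold are illustrative choices, not Oracle Analytics settings or APIs.

```python
import warnings
from enum import Enum


class ProfilingMode(Enum):
    """Hypothetical profiling modes mirroring the proposal."""
    NONE = "none"      # skip profiling entirely
    SAMPLE = "sample"  # proposed default
    FULL = "full"      # profile every row


# Hypothetical values, for illustration only.
DEFAULT_SAMPLE_ROWS = 100_000
FULL_PROFILE_WARNING_ROWS = 1_000_000_000


def rows_to_profile(total_rows: int,
                    mode: ProfilingMode = ProfilingMode.SAMPLE,
                    sample_rows: int = DEFAULT_SAMPLE_ROWS) -> int:
    """Return how many rows the profiler would scan under the given mode."""
    if mode is ProfilingMode.NONE:
        return 0
    if mode is ProfilingMode.SAMPLE:
        # Never sample more rows than the dataset actually has.
        return min(total_rows, sample_rows)
    # FULL mode: scan everything, but warn on very large volumes.
    if total_rows >= FULL_PROFILE_WARNING_ROWS:
        warnings.warn(
            f"Full profiling will scan {total_rows:,} rows and may take a long time."
        )
    return total_rows
```

For the ~10 billion-row Vertica table above, `rows_to_profile(10_000_000_000)` returns 100,000 under the default mode, 0 with `ProfilingMode.NONE`, and the full row count (plus a warning) with `ProfilingMode.FULL`.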

Why This Matters
Performance Optimization – Large-scale datasets often contain billions of rows; forcing full profiling significantly delays dataset creation and consumes system resources unnecessarily.
User Control & Flexibility – Different use cases have different needs. Analysts working with exploratory or time-sensitive data may prefer faster creation with sampling or no profiling, while data stewards might require full profiling for validation purposes.
Efficient Resource Utilization – Profiling large datasets puts a heavy load on database connections, memory, and compute. Letting users choose whether and how much to profile can reduce that load on both the OAS/OAC server and the source systems (e.g., Vertica).
Improved User Experience – Long-running or incomplete profiling operations lead to frustration and workflow interruptions. Configurable profiling modes enable smoother dataset setup and analysis.
Transparency & Control – Presenting clear options and warnings lets users make informed decisions about the trade-off between accuracy and performance.


Under Review · Last Updated

Thanks for submitting this idea. We are exploring ways to provide users more control of the profiling. We will update the idea once this makes progress.

Comments

  • Thanks for submitting this idea. We are actively working on giving users more control over how profiling is performed.

    One clarification: representative random sampling is already applied to large datasets. Profiling the entire dataset would introduce unnecessary processing time and overhead, so sampling is used to ensure efficiency while maintaining accuracy.
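The reply above notes that large datasets are already profiled from a representative random sample, but it does not say which sampling method is used. As one generic illustration of the technique (not necessarily what Oracle Analytics does), the sketch below uses reservoir sampling (Algorithm R), which draws a uniform sample of k rows in a single pass without knowing the table size in advance, and then computes simple per-column statistics on that sample. The function names `reservoir_sample` and `profile_column` are hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k rows chosen uniformly at random from a stream (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep row i (0-based) with probability k / (i + 1),
            # replacing a uniformly chosen existing entry.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


def profile_column(values: List[Optional[T]]) -> dict:
    """Summarize one sampled column: row count, null fraction, distinct non-null values."""
    non_null = [v for v in values if v is not None]
    null_fraction = (1 - len(non_null) / len(values)) if values else 0.0
    return {
        "rows": len(values),
        "null_fraction": null_fraction,
        "distinct_in_sample": len(set(non_null)),
    }
```

Because the statistics come only from the sample, a figure such as the distinct-value count describes the sample rather than the full table; on large datasets it can only match or underestimate the true cardinality.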