Enable User-Controlled Data Profiling to Optimize Performance in Oracle Analytics
When a dataset is created in Oracle Analytics Server / Oracle Analytics Cloud Data Visualization (DV), data profiling currently runs automatically. Users are seeing significant performance problems when large datasets are profiled: in some cases profiling has still not completed after more than 10 minutes.
Typical dataset usage patterns (sourced from Vertica) include:
~10 billion records total, ~1,000 records added per day, 93 columns
~50 billion records, 42 columns
~15 billion records, 122 columns
To address performance and usability concerns, we propose adding a configurable Data Profiling mode with the following options:
No Profiling – Skip profiling entirely for faster dataset creation.
Sample-Based Profiling (default) – Profile a representative sample for balanced performance and insight.
Full Dataset Profiling – Profile the entire dataset, with a performance warning for very large data volumes.
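To make the three proposed modes concrete, here is a minimal sketch in Python. It is illustrative only: `ProfilingMode`, `rows_to_profile`, `sample_size`, and `warn_threshold` are hypothetical names and not part of any Oracle Analytics API, and an in-memory list stands in for the source table. A real implementation would presumably push the sampling down to the source database rather than fetch rows first.

```python
import random
from enum import Enum


class ProfilingMode(Enum):
    """Hypothetical setting mirroring the three proposed options."""
    NONE = "none"      # skip profiling entirely
    SAMPLE = "sample"  # profile a random sample (proposed default)
    FULL = "full"      # profile every row, warning on large volumes


def rows_to_profile(rows, mode=ProfilingMode.SAMPLE, sample_size=1000,
                    warn_threshold=1_000_000, seed=None):
    """Return (rows selected for profiling, warning message or None)."""
    rows = list(rows)

    if mode is ProfilingMode.NONE:
        # Nothing is profiled, so dataset creation is not delayed.
        return [], None

    if mode is ProfilingMode.FULL:
        # Profile everything, but tell the user when the volume is large.
        warning = None
        if len(rows) > warn_threshold:
            warning = f"Full profiling of {len(rows)} rows may be slow."
        return rows, warning

    # SAMPLE: take a uniform random sample without replacement.
    if len(rows) <= sample_size:
        return rows, None
    rng = random.Random(seed)
    return rng.sample(rows, sample_size), None
```

For example, `rows_to_profile(data, ProfilingMode.FULL, warn_threshold=1000)` returns all rows plus a warning string when `data` has more than 1,000 rows, while `ProfilingMode.NONE` returns an empty selection so that profiling is skipped.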
Why This Matters
Performance Optimization – Large-scale datasets often contain billions of rows; forcing full profiling significantly delays dataset creation and consumes system resources unnecessarily.
User Control & Flexibility – Different use cases have different needs. Analysts working with exploratory or time-sensitive data may prefer faster creation with sampling or no profiling, while data stewards might require full profiling for validation purposes.
Efficient Resource Utilization – Profiling large datasets puts heavy load on database connections, memory, and compute resources. Allowing selective profiling can reduce impact on both the OAS/OAC server and source systems (e.g., Vertica).
Improved User Experience – Long-running or incomplete profiling operations lead to frustration and workflow interruptions. Configurable profiling modes enable smoother dataset setup and analysis.
Transparency & Control – Clear options and warnings let users make informed decisions about the trade-off between accuracy and performance.
Comments
Thanks for submitting this idea. We are actively working on giving users more control over how profiling is performed.
One clarification: representative random sampling is already applied to large datasets. Profiling the entire dataset would introduce unnecessary processing time and overhead, so sampling is used to ensure efficiency while maintaining accuracy.
