Currently, Data Profiling in Oracle Analytics Server/Cloud DV runs only on a sample of the data, typically a subset of rows. This works for quick overviews, but it limits accuracy and reliability, especially for large datasets or columns with sparse or skewed distributions.
Requested Enhancement:
Allow users to choose between:
- Sample-Based Profiling (default for performance)
- Full Dataset Profiling (optional, with a warning for large datasets)
Why This Matters:
- Misleading summaries: Outliers, null patterns, or rare categorical values may not appear in the sample but are critical for analysis.
- Data quality issues: Profiling is often used to detect issues like nulls, duplicates, or format anomalies, which may only be visible across the full dataset.
- Governance & trust: Data stewards and analysts need complete views to confidently certify datasets.
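To illustrate the points above, here is a minimal Python sketch of how a sample-based profile can miss nulls and rare categories that a full profile finds. Everything in it is an assumption made for illustration: the column contents, the `profile` helper, the sample size, and the "first N rows" sampling strategy are hypothetical and are not a description of how Oracle Analytics actually samples or profiles data.

```python
from collections import Counter

def profile(values):
    """Return a tiny profile of one column: row count, null count, distinct values."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "categories": set(counts),
    }

# Hypothetical status column with 100,000 rows: common values come first,
# while the nulls and one rare category only occur near the end.
column = ["ACTIVE"] * 60_000 + ["CLOSED"] * 39_990 + [None] * 7 + ["FRAUD_HOLD"] * 3

# Assumed sampling strategy and size (first N rows); not the product's real behavior.
SAMPLE_SIZE = 1_000
sample_profile = profile(column[:SAMPLE_SIZE])
full_profile = profile(column)

# Categories that exist in the data but never show up in the sample-based profile.
missed_categories = full_profile["categories"] - sample_profile["categories"]

print("sample nulls:", sample_profile["nulls"], "| full nulls:", full_profile["nulls"])
print("categories missed by sample:", sorted(missed_categories))
```

In this made-up column the sample reports zero nulls and a single category, while the full pass finds 7 nulls and two more categories, including the rare `FRAUD_HOLD` value. A random sample would do better than the first N rows, but values this rare could still easily be absent from it, which is the case the Full Dataset Profiling option is meant to cover.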