
Gathering Stats On Ext Tables based on Big Data SQL

2846373 Member Posts: 1
edited Jan 24, 2019 3:34PM in Big Data Connectors/Hadoop

We have some external tables (on Exadata) which are actually Hive tables in our BDA environment, defined via Big Data SQL.  Some of these tables contain more than 800 million rows.

We know that we should gather stats for those external tables in order to enable storage indexes on the BDA. What is the fastest way to gather stats?

Thanks.


Answers

  • Mgubar-Oracle
    Mgubar-Oracle Member Posts: 9 Employee
    edited Jan 24, 2019 3:34PM

    The best way to gather stats is by using auto_sample_size - and potentially a degree of parallelism.  For example:

    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS (
        ownname          => 'OWNER',
        tabname          => 'THE_TABLE',
        estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
        degree           => 12);
    END;
    /

    Note that gathering statistics doesn't generate storage indexes; the statistics are used by the optimizer.  Storage indexes are created automatically based on query patterns: Big Data SQL examines the WHERE clause and builds SIs on the columns referenced in that clause.  If a scan returns no rows for an HDFS block, then an SI is created for that block.

    This means that you can influence SI creation.  If you know that users will be querying certain columns (e.g. x, y and z), then "warm up" the SIs by running a query that you know will return zero rows, with columns x, y and z in its WHERE clause.  The warm-up query itself may be slower - but it's a great way to get a nice performance boost for subsequent queries.
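
    As a rough sketch of such a warm-up query (the table name, the columns x, y and z, and the impossible predicate values below are placeholders - substitute the columns your users actually filter on and values guaranteed to match no rows):

    ```sql
    -- Hypothetical warm-up query: a full scan that returns zero rows,
    -- so Big Data SQL can build storage indexes on x, y and z for every
    -- HDFS block it scans. Subsequent queries filtering on these columns
    -- can then skip blocks whose min/max ranges exclude the predicate.
    SELECT COUNT(*)
      FROM the_table
     WHERE x = -999999999
       AND y = -999999999
       AND z = 'NO_SUCH_VALUE';
    ```

    Run it once after loading or refreshing the underlying Hive data; the slow first scan pays for faster block elimination afterwards.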
