This discussion is archived
20 Replies | Latest reply: Oct 9, 2009 5:33 AM by Jonathan Lewis

Lots of 'checkpoint incomplete' waits when populating a hash cluster

chris_here Newbie
Hello there,

I'm trying to populate a hash cluster table in parallel under 11.1.0.7, but only one process is active and it spends half of its time waiting on 'log file switch (checkpoint incomplete)'. I have already increased the log file space to 5 groups of 256M each, which is amply sufficient except for this particular request (I also use NOLOGGING a lot; this is a development system). Does anyone have an idea why I see so many of these waits, and what to do about it? Should I simply increase the log space even more? It feels like something else must be wrong here, although I'm unable to find out exactly what.

Here are some screenshots of OEM "SQL Monitoring" view which should make things clearer:
http://img84.imageshack.us/img84/1807/activity.png
http://img30.imageshack.us/img30/9782/planh.png

Here is the code:
CREATE OR REPLACE PROCEDURE upload_profiles(day IN varchar2) AS
BEGIN

  BEGIN
    EXECUTE IMMEDIATE 'DROP CLUSTER profile_cluster '                       ||
                      'INCLUDING TABLES '                                   ||
                      'CASCADE CONSTRAINTS';
    EXCEPTION
      WHEN OTHERS THEN
          NULL;
  END;

  EXECUTE IMMEDIATE 'CREATE CLUSTER profile_cluster (msisdn varchar2(15)) ' ||
                    'SIZE 512 '                                             ||
                    'SINGLE TABLE '                                         ||
                    'HASH IS cast(msisdn as integer) '                      ||
                    'HASHKEYS 1600000 '                                     ||
                    'TABLESPACE TICKETS '                                   ||
                    'PCTFREE 0';

  EXECUTE IMMEDIATE 'CREATE TABLE profiles ( '                              ||
                    '  msisdn    varchar2(15) PRIMARY KEY, '                ||
                    '  profile   integer NOT NULL) '                        ||
                    'CLUSTER profile_cluster(msisdn)';


  EXECUTE IMMEDIATE 'ALTER TABLE profiles DISABLE PRIMARY KEY';

  INSERT INTO profiles (msisdn, profile)
    SELECT /*+ FULL(sub) PARALLEL(sub, 4) */
           umsisdn, max(get_profile(contract, usubscriptionstring))
      FROM subscribers sub
     WHERE versiondate = to_date(day, 'YYYY-MM-DD')
     GROUP BY umsisdn;

  EXECUTE IMMEDIATE 'ALTER TABLE profiles ENABLE PRIMARY KEY';
END;
/
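(For completeness, I call it like this - the date below is just a placeholder:)
exec upload_profiles('2009-09-28')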
Thanks a lot for your help!
Chris
  • 1. Re: Lots of 'checkpoint incomplete' waits when populating a hash cluster
    mbobak Oracle ACE
    Chris,

At the 256M log size, how frequent are your log switches? I like to size my log files so I get a switch no more than every 20-30 minutes, even at peak.

Then, to make sure I don't go hours and hours without a switch during non-peak hours, I set archive_lag_target to force a switch every 30 minutes, even if the log file isn't full.
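For example, something like this (a sketch - adjust the bucket size and target to taste) shows the switch rate and sets a 30-minute lag target:

-- log switches per hour, most recent first
select to_char(first_time, 'YYYY-MM-DD HH24') as hour,
       count(*)                               as switches
  from v$log_history
 group by to_char(first_time, 'YYYY-MM-DD HH24')
 order by 1 desc;

-- force a switch at least every 30 minutes (value is in seconds)
alter system set archive_lag_target = 1800;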

    Hope that helps,

    -Mark

    PS You mentioned you're loading a hash cluster. In case you're not aware of it, you can't do nologging loads into a hash cluster.
  • 2. Re: Lots of 'checkpoint incomplete' waits when populating a hash cluster
    Jonathan Lewis Oracle ACE Director
    How much data are you creating ?

You have declared 1.6M hashkeys and defined a unique key on the hashkey, so it looks as if you've only got a maximum of 1.6M rows to insert. But your rows are only about 32 bytes each, where you've declared the hash size at 512 bytes - so there's a contradiction there.

Each row should give you about 200 bytes of redo - which means about 1.25M rows per redo log file. If you are inserting far more rows than the code suggests, do these numbers start to fit your observations?
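(As a sanity check on the arithmetic: 256MB is 268,435,456 bytes, and 268,435,456 / 200 is roughly 1.34M - so, allowing a little for other activity and incomplete use of the file, call it about 1.25M rows per log file.)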

As Mark pointed out, you can't do nologging with a cluster because Oracle has to find the correct block for each row. Also, your code doesn't include anything to suggest you have enabled parallel DML or are running the insert in parallel (although the select should run in parallel); this may explain why you only see one process active at any instant.
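For reference, the usual pattern for parallel DML on a (heap) table would be something like the following - a sketch only, adapted from your procedure:

alter session enable parallel dml;

insert /*+ append parallel(profiles, 4) */ into profiles (msisdn, profile)
select /*+ full(sub) parallel(sub, 4) */
       umsisdn, max(get_profile(contract, usubscriptionstring))
  from subscribers sub
 where versiondate = to_date(:day, 'YYYY-MM-DD')
 group by umsisdn;

commit;  -- a parallel (direct path) DML statement must be committed
         -- before the session can query the table again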


    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com
    http://www.jlcomp.demon.co.uk

To post code, statspack/AWR reports, execution plans or trace files, start and end the section with the tag {noformat}{noformat} (lowercase, curly brackets, no spaces) so that the text appears in fixed format.

"Science is more than a body of knowledge; it is a way of thinking"
Carl Sagan
  • 3. Re: Lots of 'checkpoint incomplete' waits when populating a hash cluster
    PavanKumar Guru
Hi Jonathan Sir,

Small doubt - I hope you can elaborate a little more:
But your rows are only about 32 bytes each, where you've declared the hash size at 512 bytes - so there's a contradiction there
Since you have raised this "contradiction", can you elaborate on it? Are you saying that the extra bytes he allocated inappropriately are wastage? (I agree on that.)
Each row should give you about 200 bytes of redo - which means about 1.25M rows per redo log file.
Can you explain how each row rises to "200" bytes of redo, when it is originally only 32 bytes? I would like to go into this in more depth if you can throw some light on the path... we can walk on it, Sir. Please.

    - Pavan Kumar N
  • 4. Re: Lots of 'checkpoint incomplete' waits when populating a hash cluster
    Jonathan Lewis Oracle ACE Director
Pavan Kumar wrote:
Hi Jonathan Sir,

Can you explain how each row rises to "200" bytes of redo, when it is originally only 32 bytes? I would like to go into this in more depth if you can throw some light on the path... we can walk on it, Sir. Please.
The amount of data for a row, including the hash-cluster key entry, is about 32 bytes; but the redo change vector describing the row, where it goes, the change to the ITL entry for the block and so on takes about another 80 bytes; the change vector for the undo generated by the insert has a basic overhead of about another 80 bytes; and the redo record header is about another 12 bytes. Total: ca. 200 bytes (32 + 80 + 80 + 12).

    Details vary across versions, and there are various optimisations to reduce the overheads in special cases, but this is a reasonably sensible approximation in many cases.
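If you want to check the arithmetic on your own system, compare the session statistic 'redo size' before and after a test insert:

-- redo generated by the current session so far
select n.name, s.value
  from v$mystat s, v$statname n
 where s.statistic# = n.statistic#
   and n.name = 'redo size';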

    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com
    http://www.jlcomp.demon.co.uk

  • 5. Re: Lots of 'checkpoint incomplete' waits when populating a hash cluster
    mbobak Oracle ACE
    Hi Jonathan,

I said you can't do nologging into a hash cluster, which is true, but I think it's more precise to say that you can't do a direct load, correct? Because when you create the hash cluster, it effectively pre-creates all the blocks, so there's no concept of loading data above the high water mark, correct?

    And, parallel is really "parallel direct load", but if there can be no direct load on a cluster, then there can be no parallel direct load, correct? Or am I missing something?

    I guess my point is, even if he enabled parallel dml in his session, and added a parallel hint to the insert statement, he still wouldn't get parallel insert, correct?

    -Mark
  • 6. Re: Lots of 'checkpoint incomplete' waits when populating a hash cluster
    680087 Pro
    Mark,

You are correct - parallel DML on clustered tables [is not supported|http://download.oracle.com/docs/cd/E11882_01/server.112/e10837/parallel.htm#VLDBG1455].
  • 7. Re: Lots of 'checkpoint incomplete' waits when populating a hash cluster
    mbobak Oracle ACE
    Hi Timur,

    Thanks for the documentation reference. I was pretty sure that was true, based on my understanding of how direct load works, but it's good to know that Oracle agrees. :-)

    -Mark
  • 8. Re: Lots of 'checkpoint incomplete' waits when populating a hash cluster
    chris_here Newbie
    Hi there,

    thanks for all the feedback. Here are some elements regarding the situation:

* Regarding the data size, I have about 28M rows to insert into the cluster. Since each row is about 15 bytes long (value given by the DB stats) I can fit 32 in a 512-byte bucket, so I need at least 28M/32 = 875k hashkeys to fit everything and minimize collisions. The more the cluster is oversized, the lower the probability of collision, I guess; here I use 1.6M, which is too much - 1M would probably be fine. Determining precisely the optimal values for these figures is not easy as far as I can tell; there's already been a discussion on this topic that you can read here, although that is a whole different issue.

* About the NOLOGGING clause: indeed it's not supported for cluster tables, although I have absolutely no idea why. To my understanding, log information is used exclusively at startup when doing recovery; it's never used during normal DB operation (except for standby DB replication, which is another matter). So why can I disable it for any other table but not for a hash cluster? There must be a reason, although I can't fathom it.

* About parallel insert: it's understandable that direct load is not possible when populating a hash cluster, precisely because all the blocks are filled simultaneously if the hash function scatters the data correctly, so the filling itself cannot be done in parallel. The annoying thing, however, was that the "load table" process was such a bottleneck, whereas for other, similar computations the producer threads are usually the limiting factor - which is why I still see several parallel instances running simultaneously for these "normal" tables.

* Strangely enough, some other runs of the same query showed lots of "free buffer waits" instead of "log file switch". There must be something going on with the disk bandwidth; I'm using RAID 5, so write performance is probably poor.

* Finally, regarding the amount of redo generated: for whatever reason, the problem seems to be fixed when I use a CREATE TABLE AS SELECT statement instead of a CREATE followed by an INSERT. Again I have no idea what's going on here, but you can clearly see in [this picture|http://img225.imageshack.us/img225/738/activitynew.png] that the result is now much more satisfying (the insert process is now totally CPU-bound - no more log switch, free buffer waits, etc.).

    The code now looks like this:
    CREATE OR REPLACE PROCEDURE upload_profiles(day IN varchar2) AS
    BEGIN
    
      BEGIN
        EXECUTE IMMEDIATE 'DROP CLUSTER profile_cluster '                       ||
                          'INCLUDING TABLES '                                   ||
                          'CASCADE CONSTRAINTS';
        EXCEPTION
          WHEN OTHERS THEN
              NULL;
      END;
    
      EXECUTE IMMEDIATE 'CREATE CLUSTER profile_cluster (msisdn varchar2(15)) ' ||
                        'SIZE 512 '                                             ||
                        'SINGLE TABLE '                                         ||
                        'HASH IS cast(msisdn as integer) '                      ||
                        'HASHKEYS 1600000 '                                     ||
                        'TABLESPACE TICKETS '                                   ||
                        'PCTFREE 0';
    
      EXECUTE IMMEDIATE 'CREATE TABLE profiles ( '                              ||
                        '  msisdn    PRIMARY KEY DISABLE, '                     ||
                        '  profile   NOT NULL) '                                ||
                        'CLUSTER profile_cluster(msisdn) '                      ||
                        'AS SELECT /*+ FULL(sub) PARALLEL(sub, 4) */ '          ||
                        'cast(umsisdn AS varchar2(15)), '                       ||
                        'max(get_profile(contract, usubscriptionstring)) '      ||
                        'FROM subscribers sub '                                 ||
                        'WHERE versiondate = '                                  ||
                        '          to_date(''' || day || ''', ''YYYY-MM-DD'') ' ||
                        'GROUP BY umsisdn';
    
      EXECUTE IMMEDIATE 'ALTER TABLE profiles ENABLE PRIMARY KEY';
    END;
    /
    To summarize: there must be some logic behind all this but it kind of eludes me now :-)

    Thanks to all for your help!
    Chris
  • 9. Re: Lots of 'checkpoint incomplete' waits when populating a hash cluster
    Jonathan Lewis Oracle ACE Director
    mbobak wrote:
    Hi Jonathan,

I said you can't do nologging into a hash cluster, which is true, but I think it's more precise to say that you can't do a direct load, correct? Because when you create the hash cluster, it effectively pre-creates all the blocks, so there's no concept of loading data above the high water mark, correct?
    Correct.

    >
    And, parallel is really "parallel direct load", but if there can be no direct load on a cluster, then there can be no parallel direct load, correct? Or am I missing something?

    I guess my point is, even if he enabled parallel dml in his session, and added a parallel hint to the insert statement, he still wouldn't get parallel insert, correct?
The argument sounds perfectly reasonable - however, sometimes things don't make it into the manuals.

We can do parallel DML on non-partitioned tables, including updates and deletes, not just inserts; and we do tend to assume that parallel insert has to be direct path and above the HWM; but in principle there's no reason why Oracle couldn't have implemented a parallel insert that doesn't use direct path.

Even if the manuals said that (normal) parallel DML didn't include inserts, I would still be inclined to run up a little test - it would only take about 20 minutes - to check, if I thought it would be really helpful. (If I did, I'd also make sure I set a suitable value of initrans on the table/cluster before I started.)
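The test itself need be nothing more than something like this (a sketch; the table names are invented):

alter session enable parallel dml;

insert /*+ parallel(t, 4) */ into clustered_table t
select * from source_table;

-- then check whether any PX slaves actually participated in the DML
select statistic, last_query from v$pq_sesstat;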

    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com
    http://www.jlcomp.demon.co.uk

"Your manuscript is both good and original; but the part that is
good is not original and the part that is original is not good."
Samuel Johnson
  • 10. Re: Lots of 'checkpoint incomplete' waits when populating a hash cluster
    Jonathan Lewis Oracle ACE Director
    chris3110 wrote:
    Hi there,

* Regarding the data size, I have about 28M rows to insert into the cluster. Since each row is about 15 bytes long (value given by the DB stats) I can fit 32 in a 512-byte bucket, so I need at least 28M/32 = 875k hashkeys to fit everything and minimize collisions. The more the cluster is oversized, the lower the probability of collision, I guess; here I use 1.6M, which is too much - 1M would probably be fine. Determining precisely the optimal values for these figures is not easy as far as I can tell; there's already been a discussion on this topic that you can read here, although that is a whole different issue.
The thing about the row length as given by dbms_stats is that it doesn't allow for all the relevant overheads - and in the case of hash clusters it's a long way off. If you take your average row length of 15 bytes, I think you need to add another 17 bytes for the row overhead and cluster key overhead - for a total of 32 bytes - which conveniently bypasses your problem of the rows being too small for the 8KB block.

* About the NOLOGGING clause: indeed it's not supported for cluster tables, although I have absolutely no idea why. To my understanding, log information is used exclusively at startup when doing recovery; it's never used during normal DB operation (except for standby DB replication, which is another matter). So why can I disable it for any other table but not for a hash cluster? There must be a reason, although I can't fathom it.
    You can't disable it for indexes when inserting data either - for the same reason: index entries have to go in the right place, cluster entries have to go in the right place. Heap tables allow you to "hide" the data until you commit.


    Single table hash clusters with primary key give you the fastest possible access into the data - Oracle recognises the primary key, but bypasses it and does a direct, single row access into the cluster using a very specific code path. But if there are any hash collisions in the block, then the code has to switch to a MUCH more CPU-intensive code path.

In the past, this is why Oracle used single table hash clusters in its TPC-C benchmarks, with a 2KB block size for the data. This helps to reduce the cost of unique access, the probability of collision, and the cost of searching the block when a collision occurs.


    If you want to stick with hash clusters, you could do something like the following:
create cluster profile_cluster (msisdn varchar2(15))
    single table
    hash is cast(msisdn as integer)
    hashkeys 30000000
    size 32
    tablespace ts_8k
    pctfree 0
;
This probably describes your data nearly perfectly - and gets the best possible packing, with some spare space to allow for collisions. But the technology of single table hash clusters is one of the reasons why you might choose to have a tablespace with a non-standard block size - so you might set up a db_2k_cache_size and a 2KB tablespace for this hash cluster.
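Setting that up would be something like the following (a sketch - the datafile name and sizes are placeholders you'd choose to suit your system):

-- a buffer cache for the non-standard block size must exist first
alter system set db_2k_cache_size = 64m;

create tablespace ts_2k
    blocksize 2k
    datafile '/u01/oradata/ts_2k_01.dbf' size 1g;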

However, given the probability of hash collisions, and the extra space you might have to allocate to minimise them, I would seriously consider creating this table as an index-organized table. Loading it from empty with a "select order by" can be very fast, and the cost of access on the primary key would be flat and fast - not as fast as the optimum access into a single table hash cluster, but without the penalty from a collision.
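As a sketch of what that might look like (untested, reusing the column definitions from earlier in the thread):

create table profiles (
    msisdn   varchar2(15),
    profile  integer not null,
    constraint profiles_pk primary key (msisdn)
)
organization index;

insert into profiles
select umsisdn, max(get_profile(contract, usubscriptionstring))
  from subscribers
 where versiondate = to_date(:day, 'YYYY-MM-DD')
 group by umsisdn
 order by umsisdn;   -- the "select order by" load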

    Regards
    Jonathan Lewis
  • 11. Re: Lots of 'checkpoint incomplete' waits when populating a hash cluster
    635471 Expert
    Jonathan Lewis wrote:
    ... Loading it from empty with a "select order by" can be very fast, and the cost of access on the primary key would be flat and fast - not as fast as the optimum access into a single table hash cluster, but without the penalty from a collision.
    Hmmm, I wonder if there would be a benefit to loading bulk data into a hash cluster after ordering it by the hash value of the key column? I suppose that DBMS_Utility.Get_Hash_Value() is used. Obviously any benefit would accrue in a different way to that gained from ordering the insert into an IOT, presumably through finishing the load to each block before moving to the next, thus not revisiting the block later. Not as much benefit I'd think ... but maybe some.

    It's almost a shame to test something like that without pondering it for a while.
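As a sketch of the kind of load I have in mind (untested, and guessing that the cluster's hash for a HASH IS expression is essentially a mod by a value close to HASHKEYS - the 1600000 below comes from the original definition):

insert into profiles
select umsisdn, max(get_profile(contract, usubscriptionstring))
  from subscribers
 where versiondate = to_date(:day, 'YYYY-MM-DD')
 group by umsisdn
 order by mod(cast(umsisdn as number), 1600000);  -- order by (approximate) hash value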

    Edited by: David_Aldridge on Oct 6, 2009 9:09 AM
  • 12. Re: Lots of 'checkpoint incomplete' waits when populating a hash cluster
    chris_here Newbie
    Hi Jon,

thanks a lot for the tip about the hash cluster config, I've applied it and it seems to work fine. Indeed this technology looks very promising to me, especially for things like data warehousing where we can end up doing billions of lookups per day (imagine, for instance, calculating the customer profile associated with each mobile phone call placed in a day in a whole country). However it's scarcely documented, and in my opinion it sorely lacks tools to help choose adequate values for critical parameters like size, hashkeys, block size, etc. Also there doesn't seem to be any statistic available about collision rate, hash value distribution, etc., which is too bad.

Regarding the hash cluster creation: you say that redo generation cannot be disabled when populating the hash table, yet using a CTAS statement seems to fix the problem of too much log being generated; at the very least the behavior is clearly different from that of a CREATE followed by an INSERT, so what gives?

    Thanks a lot for your insight,
    Chris
  • 13. Re: Lots of 'checkpoint incomplete' waits when populating a hash cluster
    Jonathan Lewis Oracle ACE Director
    chris3110 wrote:
However it's scarcely documented, and in my opinion it sorely lacks tools to help choose adequate values for critical parameters like size, hashkeys, block size, etc.
    Unfortunately this often happens with Oracle features - the manuals say "this is an example of the command", but rarely explain why it might be useful and what the pros and cons are.
Also there doesn't seem to be any statistic available about collision rate, hash value distribution, etc., which is too bad.
There isn't really any cost-effective way to analyze this accurately at all. Not only do you have to worry about two values colliding on the same hash value, you also have to worry about all the values that share the same block as the collision, because the performance of lookups on their values will be affected.

    In your case (with the HASH IS construct) you may be able to do something with a query like:
select mod(cast(msisdn as number), np(hashkeys)), count(*)
  from profiles
 group by mod(cast(msisdn as number), np(hashkeys))
having count(*) > 1;
where np(hashkeys) stands for the next prime greater than your hashkeys.
I'd have to experiment with this to make sure the idea works first, though - I think it's the right idea, but it might need some refining.


    >
Regarding the hash cluster creation: you say that redo generation cannot be disabled when populating the hash table, yet using a CTAS statement seems to fix the problem of too much log being generated; at the very least the behavior is clearly different from that of a CREATE followed by an INSERT, so what gives?
I wonder if David's comment is relevant. If the parallel query did a hash group by, then the hashing algorithm for the hash grouping may have distributed the data across the parallel slaves in a way that grouped data for the same block into clumps - giving you the "many rows per block = one redo record" reduction in redo overheads. Again, something I'd have to check. If you want to investigate further, an extended trace (event 10046 at level 8) may give you some clues.
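For reference, enabling and disabling the trace looks like:

alter session set events '10046 trace name context forever, level 8';

-- run the CTAS / insert here, then:

alter session set events '10046 trace name context off';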


    (Update: If you want to pursue the hash cluster for maximum speed, I would look at setting the hashkeys to something like double the number you want, and consider using a special 2KB block size tablespace to minimize the cost of collisions and isolate the lookup table).

    Regards
    Jonathan Lewis
    http://jonathanlewis.wordpress.com
    http://www.jlcomp.demon.co.uk

"There are three classes of people. Those who see; those who see when they are shown;
those who do not see."
Leonardo da Vinci

Edited by: Jonathan Lewis on Oct 6, 2009 9:06 PM
  • 14. Re: Lots of 'checkpoint incomplete' waits when populating a hash cluster
    680087 Pro
    Hi, Chris,
    chris3110 wrote:
(imagine, for instance, calculating the customer profile associated with each mobile phone call placed in a day in a whole country).
so, is it your real issue? Because if it is, it might be solved in a different way - by not doing the lookups so frequently. Please tell us about your main goal.