12 Replies Latest reply: Oct 10, 2013 6:15 AM by Billy~Verreynne Branched to a new discussion. RSS

    Affect on query performance by daily partitioning

    Moazzam

      We are using Oracle 10g R2 on Linux platform.

       

      We have a cash balance table containing more than 500 million records, Rang-List partitioned. We have to merge 2 million records daily into this table. Previously we had 6 monthly partitions (two partitions in a year), the merge statement used to merge data from staging into this cash balance table took hours to load it. Now we have decided to make this table daily partitioned. As a result, the merge statement is taking only minutes to load data.

       

      Some people are saying that by daily partition, the performance of select statement against this table shall be slow. The select statements that want to get balance for last 6 months shall be slow because query executor shall have to scan a lot of partitions. I am currently setting up environment to test it. Can anybody currently tell me from experience about affect on Select queries after this change.

       

      Thanks.

        • 1. Re: Affect on query performance by daily partitioning
          SomeoneElse

          > Can anybody currently tell me from experience about affect on Select queries after this change.

           

          Depends on the query.  Ultimately, you'll have to test your queries against your data.

          • 2. Re: Affect on query performance by daily partitioning
            rp0428
            We are using Oracle 10g R2 on Linux platform.

             

            We have a cash balance table containing more than 500 million records, Rang-List partitioned. We have to merge 2 million records daily into this table. Previously we had 6 monthly partitions (two partitions in a year), the merge statement used to merge data from staging into this cash balance table took hours to load it. Now we have decided to make this table daily partitioned. As a result, the merge statement is taking only minutes to load data.

             

             

            If you have already 'decided to make this table daily partitioned' then you must have ALREADY conducted testing that showed what the impact of that would be for your particular use cases. If you did NOT conduct such testing then someone there made an incompetent decision.

             

            The time to ask such questions is BEFORE a decision is made, not afterward.

            Some people are saying that by daily partition, the performance of select statement against this table shall be slow. The select statements that want to get balance for last 6 months shall be slow because query executor shall have to scan a lot of partitions.

            Well, as Homer Simpson would say, DOH! It should be obvious that if the data you want is in 6 months worth of daily partitions then Oracle will have to scan 'a lot of partitions' to find the data. Again, that is something that should have be easily anticipated, and tested for, BEFORE the daily partition decision was approved.

             

            We can't help you. You have already made the decision. In addition, you haven't posted ANY information about any of your use cases.

             

            What kind of 'merge' needs to happen?

            What does the 'daily' data represent: is each day's data 100% distinct from all other data, or does it need to be integrated with some existing data?

            What are the different use cases that USE that data? How much data do they typically use? How many users issue the queries?

            How long is the data kept online?

            Is yesterday's data (or last months) EVER modified? If not then you don't need to keep older data in daily paritions; you could aggregate the data weekly or monthly during a quiet period. That would provide far fewer partitions that need to be queried but still provide for a 'daily' partition for quick loading.

            • 3. Re: Affect on query performance by daily partitioning
              Moazzam
              We can't help you. You have already made the decision.

               

              But the decision is not final and we can revert it as we have not made any changes in the production related to it.

               

              What kind of 'merge' needs to happen?

               

              Merge means merge statement containing update and insert statements, fetching data from staging table and inserting into the destination table.

               

              What does the 'daily' data represent: is each day's data 100% distinct from all other data, or does it need to be integrated with some existing data?

               

              Cash balances of our customers are calculated daily and updated in the database. Yes 100% data is distinct from all other data and hence there are only inserts in the balances table although our merge statement contains WHEN MATCHED section as well.

               

              What are the different use cases that USE that data? How much data do they typically use? How many users issue the queries?

               

              Typically it is used in reporting to show balances to our clients. About last 6-months data is used. About 50 users/terminals can issue the queries for reporting.

               

              How long is the data kept online?

               

              About last 6-months

               

              Is yesterday's data (or last months) EVER modified?

               

              No.

               

              If not then you don't need to keep older data in daily paritions; you could aggregate the data weekly or monthly during a quiet period. That would provide far fewer partitions that need to be queried but still provide for a 'daily' partition for quick loading.

              Great suggestion. We'll implement it. Thanks.

              • 4. Re: Affect on query performance by daily partitioning
                Nikolay Savvinov

                Hi,

                 

                if you leave your indexes global non-partitioned, queries using it wouldn't even notice that the table is partitioned.

                Of course, that would cost you some of the benefits that partitioning offers (e.g. any partition maintenance operations

                would invalidate the global indexes).

                 

                Best regards,

                Nikolay

                • 5. Re: Affect on query performance by daily partitioning
                  Billy~Verreynne

                  Moazzam wrote:

                   

                  Some people are saying that by daily partition, the performance of select statement against this table shall be slow. The select statements that want to get balance for last 6 months shall be slow because query executor shall have to scan a lot of partitions.

                  Sounds like basic bs to me.

                   

                  Why?

                   

                  Because the AMOUNT of I/O to do does not significantly increase because 6x31 partitions are processed instead of a single partition. The amount of rows needed to be read, remains the same. The indexing (assuming local partition indexes), are the same structure and provides the same alternative I/O paths for finding rows. So where do all the vast amounts of extra I/O come from, that now turn this from a previously fast query, into a very slow query?

                   

                  The benefits of having a finer grained range partitioning is likely better and faster queries, and better data management, as daily boundaries (or even monthly boundaries) for partitioning, is by far more natural than 6 month boundaries. To be honest, range partitioning schemas that make no sense to me are weekly, 6-monthly/bi-monthly and so on. There is nothing natural about such boundaries in data modelling.

                   

                  Also, having 6x31 physical structures as oppose to a single 1, makes for easier and more flexible parallel processing (be that via PQ or self-rolled).

                  • 6. Re: Affect on query performance by daily partitioning
                    Paul  Horth

                    Agreed, I was just reading this and  couldn't figure out why people were thinking reading the same amount of data would necessarily be slower if it was daily partitioned (given the provisos you gave above).

                     

                    As RP said - this should all have been thoroughly tested before going live anyway.

                    • 7. Re: Affect on query performance by daily partitioning
                      Moazzam

                      To see the impact of daily partitioning on the Select query containing partitioning column in the where clause which is of date type, following test has been conducted:

                       

                      In cash_balances table, daily partitions were created from 1st July, 2013 to 31st December, 2013.

                      1. For July and August, each daily partition contained 100,000 rows.
                      2. For September and October, each partition contains 1 million rows.
                      3. The partitions for last two months (November and December) were empty.
                      4. WHERE clause in the query was: calculation_date BETWEEN '01-JUL-2013' AND '31-dec-2013'

                       

                      With the above scenario, the select query took averagely 27 msec for execution while with single 6 month partition, it took 19 msec on average. This difference of 8msec does not increase/decrease with change in data as long as total number of partitions remain same because Oracle uses the data dictionary tables when deciding partitions for scanning.

                       

                      Message was edited by: Moazzam

                      • 8. Re: Affect on query performance by daily partitioning
                        Paul  Horth

                        Difficult to tell what is causing the difference in reading the same amount of data without further information like execution plans and traces.

                         

                        The principle still stands - it should take roughly the same amount of time - so something else is affecting this.

                         

                        Also, you say 'on average'. What was the standard deviation?

                        • 9. Re: Affect on query performance by daily partitioning
                          Moazzam
                          Because the AMOUNT of I/O to do does not significantly increase because 6x31 partitions are processed instead of a single partition. The amount of rows needed to be read, remains the same. The indexing (assuming local partition indexes), are the same structure and provides the same alternative I/O paths for finding rows. So where do all the vast amounts of extra I/O come from, that now turn this from a previously fast query, into a very slow query?

                           

                          With 6x31 partitions, we have the same number of index partitions i.e. the same number of B-Trees to scan for finding data. With a single 6 month partition, we have a big single B-Tree. Suppose it is a 6 level B tree. For daily partition, the B tree shall be smaller than that of 6 month partition. Suppose it is a 3 level B Tree.

                           

                          Suppose I want to search data from table from 01-Jan-2013 to 30-Apr-2013. For 6 month partition, to reach to leaf block, 6 read operations are required while in case of daily partition, we shall have to scan 4x31 partitions causing 3x4x31 read operations. Thus I/O shall be increased in case of daily partition impacting the query performance.

                          • 10. Re: Affect on query performance by daily partitioning
                            Billy~Verreynne

                            Then using that argument, why use partitioning at all then?

                            • 11. Re: Affect on query performance by daily partitioning
                              Moazzam
                              Then using that argument, why use partitioning at all then?

                              Partition pruning. If table contains last five year's data, then instead of scanning the entire table, we'll be scanning only the 6 monthly data if we have partitioned it on 6 monthly basis.

                               

                              The daily partitioning shall perform best if the SELECT query accesses data for 15 to 20 days. If we are accessing data for one to two months, then we should go for monthly partitions instead. It's the behaviour of the user accessing the table defines which partitioning level is best for it.

                              • 12. Re: Affect on query performance by daily partitioning
                                Billy~Verreynne

                                But you have the exact same problem if the SQL spans 2 years and 4 partitions. So if 2 years were queried at a time, the partitioning now needs to be year-based in order to limit the number of local indexes to hit? How is this a sensible approach?

                                 

                                You cannot claim that because a single query hits 2 partitions only (for a year), it is better than the query hitting 365 partitions (for a year) - as then it can be claimed that a single yearly partition will only require a single index hit, and as such is 50% better than the 6 month partitioned table requiring 2 index hits. Heck, why then use partitions at all if we need to hit multiple pesky local indexes??

                                 

                                Yes, what you said about additional indexes is correct. But partitioning is not guided by how many local indexes may need to be by a specific query (and there are global indexes too). Instead, there are a wide range of considerations, such as partition pruning that you have mentioned.

                                 

                                Partition design and usage are beyond a simplistic view of "oh, I'm going to hit too many local indexes with my query, so we need to change partitioning to a coarse grain".

                                 

                                The major issue I have with 6 month partitions is that it does not make sense from data modelling and business usage perspective. It is not a natural boundary for business queries. It is not a natural boundary for managing data. And justifying 6 month boundaries because of the number of local indexes required to be processed for a SQL spanning a calendar year, does not make sense to me.

                                 

                                Daily partitions does. Monthly partitions does. But daily is very much a more frequent boundary than monthly. Which is why I do not see any performance fubars over choosing a daily partitioning range scheme over an unnatural and inflexible 6-monthly partitioning scheme. The latter needs some serious justification beyond that potentially only 2 local partition indexes will be hit.