19 Replies. Latest reply on Jul 20, 2013 9:35 AM by Etbin

    Group by problem from huge table

    user2488234

      Hi,

       

      I have a huge CDR table (main_cdr_table) containing 700 million records. There is no partitioning, only an index on the gsm_no column.

      And I have another table (gsm_temp) with the following columns: gsm_no, cdr_count, last_digit.

       

      What I want to do is to get a sum from main_cdr_table for each gsm_no and store it in another table, like this:

       

      insert into my_temp
      select gsm_no, sum(price_amount)
      from main_cdr_table
      group by gsm_no;
      

       

      As you can see, this approach is not efficient. What can you suggest to perform this operation more efficiently?

       

      Thanks...

        • 1. Re: Group by problem from huge table
          kendenny

          If main_cdr_table has an index on gsm_no, make sure your stats are up to date and then what you have there is as efficient as you're likely to be able to get.

          You're not going to process 700 million rows in 2 or 3 seconds no matter what you do.
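
          For example, a hedged sketch of refreshing the statistics (the schema name, degree and sampling choices are placeholders, not anything from the original post):

          -- Hypothetical example: refresh optimizer statistics on the big table.
          -- CDR_OWNER is a placeholder schema; tune degree/sampling to your system.
          begin
            dbms_stats.gather_table_stats(
              ownname          => 'CDR_OWNER',
              tabname          => 'MAIN_CDR_TABLE',
              estimate_percent => dbms_stats.auto_sample_size,
              degree           => 4,
              cascade          => true);   -- also gathers stats on the gsm_no index
          end;
          /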

          • 2. Re: Group by problem from huge table
            Pacmann

            I doubt this index will be used in this case, no matter the statistics.

             

            Maybe adding the price_amount column to the index would help...
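
            A minimal sketch of that idea (the index name is made up, and building such an index on 700 million rows is itself a heavy operation):

            -- With both columns in the index, the optimizer can satisfy the GROUP BY
            -- with an index fast full scan instead of a full table scan.
            create index main_cdr_gsm_price_ix
              on main_cdr_table (gsm_no, price_amount);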

            • 3. Re: Group by problem from huge table
              Karthick2003


              Looks like a good candidate for a materialized view. Just out of curiosity, I would like to know why you are creating a temp table?

               

              Please read Overview of Materialized View
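
              A minimal sketch of the materialized-view approach (the view name and refresh options are assumptions, not the poster's actual setup):

              create materialized view mv_gsm_price_sum
                build immediate
                refresh complete on demand
              as
              select gsm_no, sum(price_amount) as total_price
              from main_cdr_table
              group by gsm_no;

              With a materialized view log on main_cdr_table holding gsm_no and price_amount, a fast refresh could then avoid re-scanning all 700 million rows each time.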

              • 4. Re: Group by problem from huge table
                Rahul_India

                What do you mean by stats up to date?

                How does it help in fast retrieval of data?

                • 6. Re: Group by problem from huge table
                  user2488234

                  The temp table is already created; don't get hung up on the temp table.

                  Maybe joining with the temp table would help to improve performance.

                  • 7. Re: Group by problem from huge table
                    Etbin

                    Seems not much can be done just querying the table (/*+ append */ should make the insert faster).
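
                    A sketch of the direct-path insert (this is just the statement from the original post with the hint added):

                    insert /*+ append */ into my_temp
                    select gsm_no, sum(price_amount)
                    from main_cdr_table
                    group by gsm_no;
                    commit;   -- needed before the session can query my_temp after a direct-path insert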

                    By chance I have a table of the kind you describe: 92,969,221 rows of average length 476, with fresh statistics.

                    Submitting a query like yours (a sum over all rows, grouped by an indexed column) shows Pacmann is right: there is no reason to use the index, and the plan does a hash group by.
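
                    A quick way to check what the optimizer chooses (a sketch using the table and columns from the original post):

                    explain plan for
                    select gsm_no, sum(price_amount)
                    from main_cdr_table
                    group by gsm_no;

                    select * from table(dbms_xplan.display);   -- expect HASH GROUP BY over a full scan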

                    Thus 204,009 rows are produced in 47 seconds; summing those sums to produce an overall total, both steps take less than a minute.

                    But I'm on Exadata using version 11.2.0.2.0.

                     

                    Regards

                     

                    Etbin

                    • 8. Re: Group by problem from huge table
                      Mike Kutz

                      Is that with or without parallel DML??

                      I know parallel DML can drastically speed things up.

                      On the downside, you won't be able to use a transaction-level GTT, as you must call COMMIT after the INSERT ... SELECT.
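
                      A sketch of what that could look like (the degree of 8 is a placeholder; parallel DML also needs Enterprise Edition and free parallel servers):

                      alter session enable parallel dml;

                      insert /*+ append parallel(my_temp, 8) */ into my_temp
                      select /*+ parallel(main_cdr_table, 8) */ gsm_no, sum(price_amount)
                      from main_cdr_table
                      group by gsm_no;

                      commit;   -- mandatory right after parallel/direct-path DML, hence no transaction-level GTT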

                       

                      As others have said, this sounds like a good candidate for a materialized view.

                      • 9. Re: Group by problem from huge table
                        APC

                        Etbin wrote:

                         

                        But I'm on Exadata using version 11.2.0.2.0.

                         

                         

                        Gloat about it, why don't you?

                         

                        But seriously, smashing through vast amounts of data in a highly parallel fashion is exactly what Exadata does.  But that finding is irrelevant to anybody working with a regular database on a normal server, without access to dozens of compute nodes and a layer of cell servers implementing SmartScan.


                        Cheers, APC

                        • 10. Re: Group by problem from huge table
                          APC

                          People have made some suggestions, but they are all guesses.  You haven't explained your business requirement, so we cannot give you any proper advice.  If you want us to help, you must provide answers to these questions:

                           

                          #1.  Why are you inserting into a new table?

                          #2. Is this a one-off exercise or something you want to run on a regular basis?

                          #3.  By "temp table" do you mean a GLOBAL TEMPORARY TABLE ?  Or just a regular table which you zap when you've finished with it?

                          #4. Which database version and Edition are you using? How many CPUs?

                          #5. Do you have a specific performance problem (i.e. users are complaining about the elapsed time) or is this idle curiosity?

                           

                          Cheers, APC

                          • 11. Re: Group by problem from huge table
                            Oracle Maniac

                            Telecom databases are among the most heavily loaded databases. Apparently you are on the DWH side (away from OLTP, i.e. the Intelligent Network where these CDRs are generated). The server is probably a high-configuration machine (30+ processors and 60 GB of RAM); experts, this is just a guess and I know I might not be correct, but I have worked in this domain. Use parallel processing and the APPEND hint. Increase the size of the initial and next extents of the table. Decrease the PCTFREE. Parallel can make a drastic improvement, but there are some prerequisites for the parallel hint to work: optimizer stats should be up to date.

                             

                             

                            Thanks

                            • 12. Re: Group by problem from huge table
                              Etbin

                              MikeKutz wrote:

                              Is that with or without parallel DML??

                              Looking at the plan it was without parallel DML, but I forgot (sorry for that) to mention that it was a storage full scan.

                              With such timing I didn't even think of trying parallel, as I had been considering it a last resort until now.

                               

                              Regards

                               

                              Etbin

                              • 13. Re: Group by problem from huge table
                                Mike Kutz

                                One other "trick" I'm aware of is to compress the source table.

                                 For the original poster: look into the Advanced Compression add-on and read the whitepapers.
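
                                 A hedged sketch of that idea (ALTER ... MOVE rewrites all 700 million rows and leaves indexes unusable, so it is not a casual operation; the index name below is a placeholder):

                                 alter table main_cdr_table move compress;             -- basic compression, only helps direct-path loads
                                 -- or, with the Advanced Compression option:
                                 -- alter table main_cdr_table move compress for oltp;
                                 alter index cdr_gsm_no_ix rebuild;                    -- placeholder name: indexes go UNUSABLE after MOVE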

                                 

                                 For Etbin: I'm wondering how many of the "parallel" and "compressed table" tricks Exadata is doing automagically behind the scenes.

                                • 14. Re: Group by problem from huge table
                                  Etbin

                                  Alas, I'm a rather underprivileged user (some thirteenth pig, as we use to say).

                                  Nevertheless I could try to find out something useful (reporting to investigate big-table compression(s) should look serious enough in these vacation times).

                                  As far as I can remember, maybe a year ago, I was investigating various table compressions, but using only a 3M-row table and collecting 100% sample statistics to check the results.

                                  There was quite a difference when comparing the number of blocks, but a rather small one when comparing select timings. A 3M-row table seems too small for the task.

                                  A 92M-row table might reveal new facts.

                                  At the moment I know collecting 100% sample statistics took 4.5 hours.

                                  Regarding automagical things, I cannot check how things get done (I can only read about that in the documentation), but the storage is definitely data-aware (usually you don't need an index on each and every low-cardinality column; a storage full scan seems to be smart enough to almost always fetch just the blocks where the required rows are located).

                                  I'll let you know about the findings.


                                  Regards


                                  Etbin


                                  Sorry, no further findings possible for now - ORA-01652: unable to extend temp segment by 8192 in tablespace ...

                                   

                                  Message was edited by: Etbin
