12 Replies Latest reply: Aug 25, 2014 8:23 AM by Gaff

    How to create life-like test data?

    954696

      Hi!

       

      Are there any general hints for creating life-like (production-like) data?

       

      I don't mean the values, but things like clustering, fragmentation etc.

       

      When creating test data, typically all or most of the rows of a table are inserted in a loop, or even with a single INSERT statement, in one transaction, while in production they are inserted one by one over a longer period of time, across multiple transactions, mixed with inserts into other tables, etc.
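
      To illustrate what I mean (just a rough sketch; LOG_EVENTS and its columns are made-up names), my test data is currently loaded more or less like this, all in one shot:

      -- all rows generated by a single INSERT ... SELECT in one transaction
      insert into log_events (event_time, event_text)
      select sysdate - level / 1440,
             dbms_random.string('l', 20)
        from dual
      connect by level <= 100000;
      commit;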

       

      Any suggestions welcome!

      (I am focused on a log-like table: just inserts and selects, no deletes or updates.)

       

      Regards,

      David

        • 2. Re: How to create life-like test data?
          rp0428
          Are there any general hints for creating life-like (production-like) data?

           

          I don't mean the values, but things like clustering, fragmentation etc.

           

          When creating test data, typically all or most of the rows of a table are inserted in a loop, or even with a single INSERT statement, in one transaction, while in production they are inserted one by one over a longer period of time, across multiple transactions, mixed with inserts into other tables, etc.


          (I am focused on a log-like table: just inserts and selects, no deletes or updates.)

          Your question doesn't make any sense.

           

          You talk about 'most of the rows' for test data being created in 'one transaction' while in production it takes longer and is mixed with other DML.

           

          But then you say your 'focus' is 'on a log like table' that just has inserts.

           

          For a table that just has inserts, what difference does it make if those inserts are in 'one transaction' or not? The only effect other transactions might have is if they use extents from the same tablespace, and that effect will be pretty minor.
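
          If you really want to see whether segments ended up with interleaved extents, something along these lines will show it (just a sketch; it needs access to DBA_EXTENTS, and the tablespace name is a placeholder):

          select segment_name, extent_id, file_id, block_id, blocks
            from dba_extents
           where tablespace_name = 'USERS'   -- substitute your tablespace
           order by file_id, block_id;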

          Are there any general hints for creating life-like (production-like) data?

          Yes - the same ones that apply to ANY project: define the requirements and write a requirements document.

          I don't mean the values, but things like clustering, fragmentation etc.

          Which is EXACTLY why you need a requirements document.

           

          Terms like 'clustering' and 'fragmentation' are too vague to have any specific meaning. In 25+ years, including creating large amounts of data, I have NEVER had a requirement to try to produce specific 'fragmentation' or 'clustering'. The only thing that comes close is a requirement to produce data that meets certain demographic distributions. That is, something like: 30% of state values should be California, 2% Wyoming, and so on, rather than just using the same state value for all rows.
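
          Purely as an illustration (the column and the distribution are made up), that kind of skew can be generated along these lines:

          -- roughly 30% 'CA', 2% 'WY', and the rest spread over a few other states
          select case
                   when r < 0.30 then 'CA'
                   when r < 0.32 then 'WY'
                   when r < 0.60 then 'TX'
                   when r < 0.85 then 'NY'
                   else 'FL'
                 end as state
            from (select dbms_random.value(0, 1) r
                    from dual
                  connect by level <= 1000);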

           

          Without knowing YOUR specific requirements we can't really offer any specific solutions.

          • 3. Re: How to create life-like test data?
            Gaff

            Your question is somewhat unclear.  Is this "live" test data to test application performance and you are concerned about query performance as new data is being added?  If not, your concerns about "clustering" etc. make no sense because you say that you are not concerned about "values".  Well, the clustering is going to depend on how you are partitioning your data and on the values of that data.  How and when that data got into the table doesn't really matter unless the partitioning key is the log date.  If your log data is partitioned by log date, then that's something that the data generation scripts can easily handle.

            • 4. Re: How to create life-like test data?
              Etbin

              Maybe

               

              select dbms_random.string('l',3 + to_number(substr(to_char(rnd),1,1))) some_string,
                     100 + round(30 * dbms_random.normal,2) some_amount,
                     trunc(sysdate - 90 * rnd / 10000) some_date,
                     case when rnd < 4500 then 'USD'
                          when rnd < 7000 then 'CAD'
                          when rnd < 9000 then 'AUD'
                          when rnd < 9500 then 'NZD'
                          when rnd < 9700 then 'HKD'
                          else 'SGD'
                     end dollars
                from (select dbms_random.value(1,10000) rnd
                        from dual
                      connect by level <= 30
                     )

               

              SOME_STRING   SOME_AMOUNT  SOME_DATE   DOLLARS
              qhywam              52.28  07/26/2014  USD
              oqbatjztopx         83.28  06/10/2014  AUD
              zdsphmrhi           71.44  06/27/2014  CAD
              zldubuzjdobw        31.72  05/27/2014  SGD
              phgboqttc           95.88  08/18/2014  USD
              tvdnqkxjlfaz       108.76  05/27/2014  SGD
              cyojaa              96.97  07/23/2014  USD
              lsjezc              74.45  07/24/2014  USD
              mxplamhb            64.76  07/04/2014  CAD
              bweogao             70.49  07/12/2014  CAD
              xpmdxu              67.95  07/20/2014  USD
              ypycuqtseqa         92.22  06/11/2014  AUD
              vnmmy                67.3  08/01/2014  USD
              vyalosdxjldq           42  06/02/2014  NZD
              efxlndkzpkso       122.72  05/30/2014  HKD
              nznijxqnbe         120.55  06/13/2014  AUD
              urskjyni           185.82  07/04/2014  CAD
              fhrxswkq           132.29  07/03/2014  CAD
              orhba               83.22  08/05/2014  USD
              vcnhb              153.07  07/29/2014  USD
              eosqbui             34.07  08/19/2014  USD
              tijzmkjfn            84.7  08/18/2014  USD
              tcnjfdtay           50.38  08/18/2014  USD
              btwpnmnahhnx       141.16  06/04/2014  NZD
              bcxqgagycj          88.27  06/18/2014  AUD
              gktjawrbgc          62.46  06/16/2014  AUD
              ayyfwxoqplya        90.79  05/27/2014  SGD
              gukaqnufqz          64.62  06/16/2014  AUD
              qhzf               119.63  08/07/2014  USD
              uxluq              116.82  08/02/2014  USD

               

              Regards

               

              Etbin

              • 5. Re: How to create life-like test data?
                xerces8

                Gaff wrote:

                 

                Your question is somewhat unclear.  Is this "live" test data to test application performance and you are concerned about query performance as new data is being added?  If not, your concerns about "clustering" etc. make no sense because you say that you are not concerned about "values".  Well, the clustering is going to depend on how you are partitioning your data and on the values of that data.  How and when that data got into the table doesn't really matter unless the partitioning key is the log date.  If your log data is partitioned by log date, then that's something that the data generation scripts can easily handle.

                Yes, it is to test application performance. I already have good test data values. I did not mean to say they don't matter, but that I already have that covered.

                The table is not partitioned. The reason I am asking is that queries on production run much slower, even though the data is (almost) the same. The execution plan is the same, too. So I suspect the physical data location is at play.
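
                A cheap way to compare the physical row ordering between test and production (a rough sketch; 'MY_LOG_TABLE' is a placeholder, and it assumes reasonably current optimizer statistics):

                -- 'MY_LOG_TABLE' is a placeholder for the real table name
                select i.index_name,
                       i.clustering_factor,
                       i.num_rows,
                       t.blocks
                  from user_indexes i
                  join user_tables  t on t.table_name = i.table_name
                 where i.table_name = 'MY_LOG_TABLE';

                A clustering factor close to BLOCKS means the rows are physically ordered much like the index; one close to NUM_ROWS means they are scattered, so a big difference between the two databases would support that suspicion.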

                • 6. Re: How to create life-like test data?
                  Paulie

                  954696 wrote:

                   

                  Are there any general hints for creating life-like (production-like) data?

                  I don't mean the values, but things like clustering, fragmentation etc.

                   

                  When creating test data, typically all or most of the rows of a table are inserted in a loop, or even with a single INSERT statement, in one transaction, while in production they are inserted one by one over a longer period of time, across multiple transactions, mixed with inserts into other tables, etc.

                   

                  Any suggestions welcome!

                  (I am focused on a log-like table: just inserts and selects, no deletes or updates.)

                   

                  I'm unclear as to your ultimate goal. If it's for testing with realistic loads (which is my take on your request), try HammerDB - it allows you to replay your own loads, which will match your usage pattern. Disclaimer: I've never used it for that, but I have found it useful for general work.

                  • 7. Re: How to create life-like test data?
                    Gaff

                    OK.  That's kind of what I thought.  In addition to something like HammerDB, suggested in another post, Oracle itself has a replay option that would seem to be exactly what you are looking for.  I myself have never had the need to use it, but the brochure looks nice.

                     

                    What version of Oracle are you using? 

                    Is the Oracle version (and hardware) the same between test and prod?

                     

                    The main thing I can think of where you would be impacted is that you are not partitioning by log time, so your queries are trying to read the same parts of the disk that are being written to.  It's not necessarily an Oracle thing (readers don't block writers, etc., and I think you said you aren't doing updates anyway) but an I/O thing.  But this should only happen if you are logging heavily, I would think.
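
                    If partitioning by log time ever becomes an option, the usual shape is roughly this (just a sketch; APP_LOG and its columns are made-up names, and interval partitioning needs 11g or later):

                    create table app_log (
                      log_time  date           not null,
                      log_text  varchar2(4000)
                    )
                    partition by range (log_time)
                    interval (numtodsinterval(1, 'DAY'))
                    (partition p_start values less than (date '2014-01-01'));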

                     

                    If you can look at Enterprise Manager (Grid control) you can get a good idea of where all of the time is being spent in a nice graphical display.

                    • 8. Re: How to create life-like test data?
                      rp0428

                      xerces8 wrote:

                       

                      Gaff wrote:

                       

                      Your question is somewhat unclear.  Is this "live" test data to test application performance and you are concerned about query performance as new data is being added?  If not, your concerns about "clustering" etc. make no sense because you say that you are not concerned about "values".  Well, the clustering is going to depend on how you are partitioning your data and on the values of that data.  How and when that data got into the table doesn't really matter unless the partitioning key is the log date.  If your log data is partitioned by log date, then that's something that the data generation scripts can easily handle.

                      Yes, it is to test application performance. I already have good test data values. I did not mean to say they don't matter, but that I already have that covered.

                      The table is not partitioned. The reason I am asking is that queries on production run much slower, even though the data is (almost) the same. The execution plan is the same, too. So I suspect the physical data location is at play.

                      Why are you answering questions Gaff posed to OP?

                       

                      Are you working with OP on this issue?

                       

                      If you have performance issues with production queries the FIRST step is to determine the cause of those issues. People on the forum can help you with that but ONLY if you provide the relevant information.

                       

                      To 'suspect' that physical data location is an issue is fine as a hypothesis, but I think it is premature to start trying to replicate any such data distribution.

                       

                      1. When did production queries become slower?

                      2. What does 'slower' mean? 1 second to 1.01 seconds? 1 minute to three days?

                      3. What changed on the production server between the time queries were fine and the time they started running slow?

                       

                      IMHO you haven't adequately identified the cause of the problem. Until you do it is rather pointless and a waste of time, effort and resources to start implementing solutions.
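
                      One way to start on that is to run one of the slow statements with rowsource statistics and pull the actual plan (a sketch; it needs the GATHER_PLAN_STATISTICS hint, or STATISTICS_LEVEL set to ALL, in the session):

                      -- run the slow statement once, e.g.
                      -- select /*+ gather_plan_statistics */ ... ;
                      -- then, in the same session, show the actual row counts per plan step:
                      select *
                        from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));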

                      • 9. Re: How to create life-like test data?
                        954696

                        Gaff wrote:

                         

                        OK.  That's kind of what I thought.  In addition to something like HammerDB, suggested in another post, Oracle itself has a replay option that would seem to be exactly what you are looking for.  I myself have never had the need to use it, but the brochure looks nice.

                         

                        What version of Oracle are you using?

                        Is the Oracle version (and hardware) the same between test and prod?

                         

                        The main thing I can think of where you would be impacted is that you are not partitioning by log time, so your queries are trying to read the same parts of the disk that are being written to.  It's not necessarily an Oracle thing (readers don't block writers, etc., and I think you said you aren't doing updates anyway) but an I/O thing.  But this should only happen if you are logging heavily, I would think.

                         

                        If you can look at Enterprise Manager (Grid control) you can get a good idea of where all of the time is being spent in a nice graphical display.

                        Oracle version is 11.2.0.2.0 and 11.2.0.3.0 on test systems that I work with and 11.2.0.4.0 on other test and production. Those will be updated ASAP.

                        Hardware is different.

                         

                        There is not a lot of writing activity.

                         

                        Time is spent in "CPU + CPU Wait" (about 80% of it).
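
                        For what it's worth, a breakdown like that can also be pulled straight from ASH (just a sketch; V$ACTIVE_SESSION_HISTORY requires the Diagnostics Pack license):

                        select session_state,
                               nvl(event, 'ON CPU') as event,
                               count(*) as samples
                          from v$active_session_history
                         where sample_time > sysdate - 1/24
                         group by session_state, nvl(event, 'ON CPU')
                         order by samples desc;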

                        • 10. Re: How to create life-like test data?
                          954696

                          It seems I have two accounts. I submitted a request to merge them.

                           

                          1.) Not sure. The customer is shy with feedback.

                          2.) Minutes, from a couple to 60 (depending on the query parameters used).

                          3.) See item 1.

                          • 11. Re: How to create life-like test data?
                            954696

                            954696 wrote:

                            Oracle version is 11.2.0.2.0 and 11.2.0.3.0 on test systems that I work with and 11.2.0.4.0 on other test and production. Those will be updated ASAP.

                            This is wrong, sorry.

                            The versions are all 11.2.0.3.0.

                            • 12. Re: How to create life-like test data?
                              Gaff

                              Are you concerned because the two systems used to have similar performance and now suddenly do not?

                               

                              You say it's not an I/O issue but 80% of the time is in "CPU + CPU WAIT".  What do you think the most common thing to wait for is?  Disk I/O.

                               

                              Sorry.  Don't really have much to go on.

                               

                              This may help you pin things down.

                               

                              http://docs.oracle.com/cd/E24628_01/server.121/e17635/tdppt_realtime.htm