3 Replies Latest reply: Feb 6, 2013 6:00 PM by rp0428 RSS

    table partition strategy for  data loading performance & data retreval

    user3084749
      Hi there, need your suggestions.
      we are using the Oracle version: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

      We have a HR data approximately of about 270 mill for 100,000 persons in an SAP module, & we need to load this data into an oracle table.
      This is an accumulated data for the past 2 yrs starting from Jan 2011. & with the assumptions that the data will be incrementally loaded every day is about 350,000.
      The data granularity is available at the fraction of the day known as TIME_TYPE for a given day.
      for example an person can have multiple records on a given day, depending upon the TIME_TYP.

      sample data:
      Pers_ID     Payroll_date     Time_Type Day_Hrs
      1960     1-Jan-11     Maximum Vacation     4
      1960     1-Jan-11     Vacation Quota Maximum     2
      1960     1-Jan-11     Maximum Sick     2
      1960 2-Jan-11     Paid Eligible OT Hrs     3
      1960     2-Jan-11     Paid Hours     3
      1960     2-Jan-11     WW Eligible OT Hrs     2
      1960     5-Jan-11     Daily Overtime Hours     2
      1960     5-Jan-11     Weekly Additional Hours     2
      1960 5-Jan-11     Personal Quota Balance     2
      1960     5-Jan-11     Total Overtime Hours     2

      The above data is of an individual person, his time spent on a particular day over a 3 day period.

      My question is how best I can design the table (partitioned table), so that the data loading process can be fast (for the initial load as well as the daily incremental load)
      also we have lot of reporting on this table, few reports such as total no. of hrs utilized by a particular employee, whats the most time_typ used by a group of employees, & so on.

      please let me know your suggestions & thoughts.

      Edited by: user3084749 on Feb 7, 2013 9:56 AM

      Edited by: user3084749 on Feb 7, 2013 10:00 AM

      Edited by: user3084749 on Feb 7, 2013 10:01 AM
        • 1. Re: table partition strategy for  data loading performance & data retreval
          rp0428
          Thanks for opening your own thread. That will make it easier to get the help you want and for people to focus on the particulars of your problem.
          >
          Hi there, I have similar kind of a situation, need your suggestions.
          we are using the Oracle version: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

          We have a HR data approximately of about 270 mill for 100,000 persons in an SAP module, & we need to load this data into an oracle table.
          This is an accumulated data for the past 2 yrs starting from Jan 2011. & the assumpitions are the data that will be incrementally loaded every day is about 600,000.
          The data granularity is available at the fraction of the day known as TIME_TYPE for a given day.
          for example an person can have multiple records on a given day, depending upon the TIME_TYP.

          sample data:
          Pers_ID Payroll_date Time_Type Day_Hrs
          1960 1-Jan-11 Maximum Vacation 4
          1960 1-Jan-11 Vacation Quota Maximum 2
          1960 1-Jan-11 Maximum Sick 2
          1960 2-Jan-11 Paid Eligible OT Hrs 3
          1960 2-Jan-11 Paid Hours 3
          1960 2-Jan-11 WW Eligible OT Hrs 2
          1960 5-Jan-11 Daily Overtime Hours 2
          1960 5-Jan-11 Weekly Additional Hours 2
          1960 5-Jan-11 Personal Quota Balance 2
          1960 5-Jan-11 Total Overtime Hours 2

          The above data is of an individual person, his time spent on a particular day over a 3 day period.

          My question is how best I can design the table (partitioned table), so that the data loading process can be fast (for the initial load as well as the daily incremental load)
          also we have lot of reporting on this table, few reports such as total no. of hrs utilized by a particular employee, whats the most time_typ used by a group of employees, & so on.
          >
          Partitioning is usually done to improve data management (roll off old data), improve performance or both.

          Loading of original or incremental data won't necessarily be affected by whether the table is partitioned. The same INSERT options (e.g. parallel, append hint, NOLOGGING) are available whether the table is partitioned or not.

          A main performance benefit for partitioned tables is when it allows Oracle to do full 'partition' scans instead of full table scans.

          So if you query for all data for a particular month on a table with no indexes Oracle has to do a full table scan. The same query on a table partitioned by month could be done by a full 'partition' scan of the partition for that month. So only a fraction of the data needs to be scanned.

          A query of all records for one employee won't really be affected by partitioning if those records are scattered throughout the table. An index can find them either way if appropriate and if an index cannot be used then a full scan of the table, or all partitions, will be needed.

          A typical use case of partitioning for data management for use cases such as yours would be to partition the data by month (load month or the equivalent) and perhaps even use a different tablespace for each month.

          Then you can deal with old data by dropping old partitions or even using transportable tablespace to move those partitions to another database or history table. Each month you might add a new partition for the newly loaded data and 'roll off' the oldest partition to maintain 2-3 years of online data.

          For performance improvement you would partition on a column, or colulmns, that would allow Oracle to 'prune' partitions that are not needed by the queries you run most often or that have the biggest performance issues.

          None of the sample data you provided above really shows any indication that partitioning might be useful.

          Do you currently have any queries that you think have performance issues and that might benefit from partitioning? Can you post them?

          You might want to read through the 'Partitioning for Availablility, Manageability, and Performance' chapter of the VLDB and Partitioning Guide to get a better sense of the value and use of partitioning.
          http://docs.oracle.com/cd/B28359_01/server.111/b32024/part_avail.htm#BJEIEDIA
          • 2. Re: table partition strategy for  data loading performance & data retreval
            user3084749
            Thank you for the detailed explanation and the link you provided.

            actually I don't have any queries as such but the data, the reporting team is looking or querying for is the aggregates, counts or avg of the hours of the employees utilizing their time_types which I have mentioned below. few time_types(Vacation, Sick, Comp Time, over time etc.) and the reports may ran for a particular person, or a group of persons(team or department) on a weekly, monthly, quarterly and yearly basis.

            Thanks
            - D
            • 3. Re: table partition strategy for  data loading performance & data retreval
              rp0428
              >
              actually I don't have any queries as such but the data, the reporting team is looking or querying for is the aggregates, counts or avg of the hours of the employees utilizing their time_types which I have mentioned below. few time_types(Vacation, Sick, Comp Time, over time etc.) and the reports may ran for a particular person, or a group of persons(team or department) on a weekly, monthly, quarterly and yearly basis.
              >
              Then, assuming partitioning is even needed, I would suggest exploring the use of a partitioning scheme using a date range; for example, monthly based on an appropriate date.

              Another issue is what you want to happen if an insert/update is attempted with data with an invalid date (e.g. the year 4513). If you have a normal range partitioned table with a MAXVALUE partition the data will get put into that partition.,

              If you opt for interval partition (i.e. let Oracle create new partitions as needed) there is no MAXVALUE partition but Oracle will create a partition for that invalid data. Oracle could create hundreds or thousands of unnecessary partitions if you tried to insert data for many months in the future and the dates are not really valid.

              You could also start with a table that only has one partition, a MAXVALUE partition. Then later when you actually need new partitions the main table will already be partitioned and you can perform SPLIT PARTITION commands to split the data up.