5 Replies Latest reply: Jan 25, 2013 3:36 AM by Hesh RSS

    Dimensional modeling and year ago values

    Hesh
      Hi,

      I have been in this process for the last six months to model a warehouse to my client. I have a requirement where my client want to compare the data from one year to another year ( this is what generally warehouse is build for :) ).

      Biggest challenge in my requirement is to get previous year-weeks Articles comparing current year-week Articles, as new Articles can be added during the Current year and few old Articles will no more exist in the current year-week.
      Art_no     Sold_amt     Tim_id     year-week
      10001      20.5           700001     201101
      10002      10.3           700001     201101
      
      10001      30.5           800001     201201
      20001      50.2           800001     201201
      Art_no      Sold_amt Tim_id          year-week      Prev_Sold_Amt
      10001      30.5      800001          201201           20.5
      10002      null      null            201201           10.3
      20001      50.2      800001          201201          null
                                                             I decided to do them in ETL, it worked for me now, but yes it is not scalable to get previous-previous years data and has redundant data. I wonder how generally this is achieved in warehouses? I know we can do an attempt to calculate on the fly using an full-outer joint, but it can be result null values in the time dimension column of fact table.

      This was also discussed in my previous therad

      MV as core table


      Thanks,
      Hesh
        • 1. Re: Dimensional modeling and year ago values
          rp0428
          >
          I decided to do them in ETL, it worked for me now, but yes it is not scalable to get previous-previous years data and has redundant data. I wonder how generally this is achieved in warehouses? I know we can do an attempt to calculate on the fly using an full-outer joint, but it can be result null values in the time dimension column of fact table.
          >
          I'm not sure I understand what your question is.

          Are you talking about needing to provide groups for dates or ranges that you do not actually have data for. For example to show 1st quarter data and show 'something' for February even if you have no data for February?

          If so the term for that is Data Densification and there is a discussion of it with examples in the Oracle Data Warehousing Guide
          http://docs.oracle.com/cd/E14072_01/server.112/e10810/analysis.htm#i1014934
          >
          Data Densification for Reporting

          Data is normally stored in sparse form. That is, if no value exists for a given combination of dimension values, no row exists in the fact table. However, you may want to view the data in dense form, with rows for all combination of dimension values displayed even when no fact data exist for them. For example, if a product did not sell during a particular time period, you may still want to see the product for that time period with zero sales value next to it. Moreover, time series calculations can be performed most easily when data is dense along the time dimension.

          This is because dense data will fill a consistent number of rows for each period, which in turn makes it simple to use the analytic windowing functions with physical offsets. Data densification is the process of converting sparse data into dense form.To overcome the problem of sparsity, you can use a partitioned outer join to fill the gaps in a time series or any other dimension. Such a join extends the conventional outer join syntax by applying the outer join to each logical partition defined in a query. Oracle logically partitions the rows in your query based on the expression you specify in the PARTITION BY clause. The result of a partitioned outer join is a UNION of the outer joins of each of the partitions in the logically partitioned table with the table on the other side of the join.

          Note that you can use this type of join to fill the gaps in any dimension, not just the time dimension. Most of the examples here focus on the time dimension because it is the dimension most frequently used as a basis for comparisons.
          >
          Also see my reply in this recent thread.
          Re: Outer join
          • 2. Re: Dimensional modeling and year ago values
            Hesh
            Thank you for the valuable post on Dense data, but in my warehouse we have around two hundred thousand records in Article dimension and also for 350 stores ! means around 70 million records per week, huge one !

            Regards
            Hesh
            • 3. Re: Dimensional modeling and year ago values
              rp0428
              >
              Thank you for the valuable post on Dense data, but in my warehouse we have around two hundred thousand records in Article dimension and also for 350 stores ! means around 70 million records per week, huge one !
              >
              I'm not sure I understand the point you are trying to make with that statement.

              I've had some special cases where it helped to create a work table that had the dummy records for each reporting data in it. Then instead of using a dynamic query to generate the 'dense' list of dates we would just join the dummy table to the query like the 'outer join' example in the doc link I posted.
              • 4. Re: Dimensional modeling and year ago values
                Hesh
                sorry for replying late, I was trying different things actually...

                work table in this scenario may not work I guess,, because, as said, huge number of articles needs to be put into this work table along with various combinations of store numbers and time dimension values.

                here is a sample table along with insert scripts which can help to understand this better...


                Following is the query used to get the expected results, you can see the values for YW columns are nulls, YW is nothing but time dimension. We can get results using some sub-query, but my requirement is to get the expected result without complex queries, doing joins only,as these joins will be performed at RPD level, we can only use outer/fullout join if required.
                select b.art_d,a.yw,a.qty,a.geog_d 
                from
                (
                select * 
                from tmp_sfact 
                where yw=201201
                )a, tst_def b
                where a.art_d(+)=b.art_d
                order by 1
                create table tmp_sfact(
                ART_D     number,
                YW     number,
                QTY     number,
                GEOG_D number);
                
                create table tmp_artd(ART_D number);
                
                Insert into TMP_SFACT (ART_D,YW,QTY,GEOG_D) values (10001,201101,10.1,91100);
                Insert into TMP_SFACT (ART_D,YW,QTY,GEOG_D) values (10002,201101,12.3,91100);
                Insert into TMP_SFACT (ART_D,YW,QTY,GEOG_D) values (10001,201201,13.4,91100);
                Insert into TMP_SFACT (ART_D,YW,QTY,GEOG_D) values (10003,201201,14.9,91100);
                
                
                Insert into TST_DEF (ART_D) values (10001);
                Insert into TST_DEF (ART_D) values (10002);
                Insert into TST_DEF (ART_D) values (10003);
                Insert into TST_DEF (ART_D) values (10004);
                It is expected to have a clear data model at physical level(not complex views with sub-quires to get results).

                Thanks,
                Hesh

                Thanks,
                Hesh
                • 5. Re: Dimensional modeling and year ago values
                  Hesh
                  Surprisingly, reporting environment it self doing dense format of the result. This may be a big reason why reporting servers/tools are being used for BI systems.Further I find that the type of joining they do is, they do not join two facts directly, if we fetch fact values from two different fact tables at same granular level, then they execute each query separately and 'somehow' stich the output in dense format.

                  Thanks,
                  Hesh

                  PS: we are using OBIEE for reporting