3 Replies Latest reply: Mar 28, 2013 1:33 PM by Cortanamo RSS

    Y/N flags.  Junk dimension or attributes on the fact table?

    997611
      Hi folks. I know this is not a OWB question. Its more of a general DW design question but I believe that a key design decision is performance using your tool and our tool is Oracle DW and Cognos for reporting...So here it goes.

      I have about 40 Y/N flags attributes that I need in my current DW subject area. These flags come in three groups - client type flags, case type flags and outcome type flags.

      Solution 1:
      Based on my modelling background I default to putting these type of flags into a junk dimension which is really a dimension with all distinct combinations of Y/N for a set of indicators. If you had 10 Y/N flags there would be 2**10 or 1024 combinations/rows in you dimension table. That's a nice small dimension to link to my fact table. The advantages of a junk dim is that its easy to extend and you can group flags together. Its also a small table to query for BI tools lookups (ie much smaller than the fact table!) that are populated by querying the domain of an attribute ( though this is only an advantage if it "really" improves performance).

      Solution 2:
      Leave these Y/N flags on the fact table and bitmap index them. The advantage with this design is that you dont have to join to a dimension table to get at your flags.

      Test:
      I tested out solution 1 versus solution 2 on a 10 million row fact table. I had two tables for the test. MyFact_F which has an FK to MyDim_D called junk_sid. MyFact_F has 10 Y/N attributes -> attribute_1 through attribute_10. MyDim_D also has 10 attributes, attribute_1 to attribute_10.

      For solution 1, I got better performance with a bitmap index on the junk_sid dimension primary key and also bitmap indexes on the Y/N flags in the dimension. For solution 2 (attributes on fact), i just bitmap indexed the Y/F flags right there on the fact. On a 10 million row fact table, solution 2 is alot faster. Again, my training and past experience suggest that a better design is using a junk dimension but when performance is alot better using attributes on the fact table, i really makes me question which solution is "better". Any thoughts would be appreciated!
        • 1. Re: Y/N flags.  Junk dimension or attributes on the fact table?
          Cortanamo
          It's interesting to hear about your original "junk" dimension approach. I'd call it normalization, as you save storage space by referencing which row in the "junk" dimension fits which row in the fact, but it'll cost you a lookup from fact to the junk dimension EVERY time you acccess a row in the fact. This can't be good for performance.

          I'd call these Y/N columns in fact degenerate dimensions and leave them in the fact as char(1). The tablespace cost will not be significant, but you'll get better performance than using a junk dimension.

          Another potential problem is: what happens if some of your dimensions are not only Y/N but something like Y/N/null/unknown/not yet defined/TBD/won't say/probably not/maybe Y/... You can still model this, but it would skyrocket the size of your junk dimension. Junk is not really a dimension but a collective of multiple dimensions. Never did it, won't do it.
          • 2. Re: Y/N flags.  Junk dimension or attributes on the fact table?
            Kurt Geens
            I wouldn't go as far as calling the junk dimension approach 'original' - it was described already in the data warehouse toolkit book by Kimball more than a decade ago. The choice pro/contra is a personal one however, mostly centered around whether you want to keep multiple degenerate dimensions in your fact table (mostly implemented as a string with a recognisable keyword) or move these columns to a dimension and keep only 1 column in the fact table - in essence, one weighs the extra space consumption of the fact table against the performance loss of using an extra join. This choice has nothing to do with normalization however, otherwise one could argue the same for all other dimensions - keep them, attributes and all, in the fact table!

            I have recently found myself in the position that I had to convert a fact table from using DD's (amongst other things, an indicator sales/retour) to a junk dimension, because OBIEE didn't handle the DD correctly. Perhaps I didn't use the right approach though, but defining a logical dimension and a logical fact both using the same physical source table messed up the results of my queries, regardless of using the dimension in the report. Once I converted to using a junk dimension that issue dissapeared and the query results were corrected. However, having said that, I do definately prefer the DD approach (due to performance benefits), and keep junk dimensions in mind as a fallback plan.
            • 3. Re: Y/N flags.  Junk dimension or attributes on the fact table?
              Cortanamo
              It's not classical normalization but this junk dimension approach is very similar to normalization: you replace the data with a relation to another table where only unique values are stored. The difference is that you didn't store contents from only one column but from several columns combined together.