This content has been marked as final. Show 3 replies
It's interesting to hear about your original "junk" dimension approach. I'd call it normalization, as you save storage space by referencing which row in the "junk" dimension fits which row in the fact, but it'll cost you a lookup from fact to the junk dimension EVERY time you acccess a row in the fact. This can't be good for performance.
I'd call these Y/N columns in fact degenerate dimensions and leave them in the fact as char(1). The tablespace cost will not be significant, but you'll get better performance than using a junk dimension.
Another potential problem is: what happens if some of your dimensions are not only Y/N but something like Y/N/null/unknown/not yet defined/TBD/won't say/probably not/maybe Y/... You can still model this, but it would skyrocket the size of your junk dimension. Junk is not really a dimension but a collective of multiple dimensions. Never did it, won't do it.
I wouldn't go as far as calling the junk dimension approach 'original' - it was described already in the data warehouse toolkit book by Kimball more than a decade ago. The choice pro/contra is a personal one however, mostly centered around whether you want to keep multiple degenerate dimensions in your fact table (mostly implemented as a string with a recognisable keyword) or move these columns to a dimension and keep only 1 column in the fact table - in essence, one weighs the extra space consumption of the fact table against the performance loss of using an extra join. This choice has nothing to do with normalization however, otherwise one could argue the same for all other dimensions - keep them, attributes and all, in the fact table!
I have recently found myself in the position that I had to convert a fact table from using DD's (amongst other things, an indicator sales/retour) to a junk dimension, because OBIEE didn't handle the DD correctly. Perhaps I didn't use the right approach though, but defining a logical dimension and a logical fact both using the same physical source table messed up the results of my queries, regardless of using the dimension in the report. Once I converted to using a junk dimension that issue dissapeared and the query results were corrected. However, having said that, I do definately prefer the DD approach (due to performance benefits), and keep junk dimensions in mind as a fallback plan.
It's not classical normalization but this junk dimension approach is very similar to normalization: you replace the data with a relation to another table where only unique values are stored. The difference is that you didn't store contents from only one column but from several columns combined together.