2 Replies Latest reply on Jun 14, 2016 4:13 PM by SPowell42

    How to handle missing information in designing periodic snapshot fact table?


      We are about to design a data warehouse in which the computed measures in fact tables are updated monthly, but often there is no transaction for most customers.

      Assume that our model has one Fact table and two Dimensions, say d1 and d2 that ...

      • d1 contains customers information and has 9,000,000 members
      • d2 represents transactions information and has 1,000 members

      In real-world of the business that we are trying to model; d2 is very sparse for customers, i.e. most customers may have done just one or even no transaction by month.


      I want to know that for months that a customer had not done any transaction, if I should insert a row for each transaction types for that customer with zero value for the specified measure in fact table?

      • If yeah, just a few portion of the about 9E9 rows have meaningful and useful information and it seems that the fact table would have been bursting!
      • If nay, how should I handle information of a user that had no transaction for some months in a year in the analyses that I will have done "Group by" on month and filter or prompt on customers, years and transaction types? In other words, in this case how can I handle not existing (logically zero) values to be interpreted as zero in my analyses?
        • 1. Re: How to handle missing information in designing periodic snapshot fact table?
          Gianni Ceresa


          I would say that it depends, it's not really a YES or NO answer ...

          If you want to make analysis on your full population of customers all the time (and not miss those without transactions) creating FACT rows for all of them will make your life easier.


          If on the other hand you mainly do analysis on customers with transactions and you just want to know, for example, how many customers don't have a single transaction for the month XX then you can also stay without the "empty" rows in your FACT table and work with outer joins to get the list of customers without transactions.


          That's why it's difficult to answer with a YES or NO, everything is possible as you can model things in a way to get your information (for example think about making your D1 Customers a fact table as well with an outer join on the Fact transaction table and adding a condition where transaction is null and you get all your customers without transactions ....

          • 2. Re: How to handle missing information in designing periodic snapshot fact table?

            I would implement a fact table that just contains the transactions that happen for customers (i.e. your couple of thousand of transactions per month vs. your 9 million "customers"). Then you can create metrics which show the transactions by month, or you can create a "life to date" type metric that would sum over time. That way, if you only want to see the transactions that happened in a month it's super easy, but if you want to see total spend by customer over their lifetimes you could still see that also (without having to insert tons of "empty" fact records).


            Hope this helps,