Hi, I have an issue modelling the time dimension for one of our facts (employee counts). The problem is this - when a user says "I want to see the count of employees for a given time period" - what they mean is that they want to see the count of employees as of the last day in that time period. i.e. "June employee counts" means employee counts as of 6/30, "Year 2000" employee counts means as of 12/31/2000.
This is very similar to modelling a "ending balance" inventory type problem in OBIEE.
I've solved this (poorly) in OBIEE by changing the aggregation to use "LAST" for the time dimension and "count(distinct(emplid))" for everything else. This works....but....the performance is horrible. If I ask for employee counts for year 2000, it appears to bring down employee counts for all 365 days in the year, then sorting them by date, then throwing away all except the last date. In other words, it brings back 364 days worth of records and then throws them away. This makes the query perform absolutely terrible. A 2 or 3 second query ends up running for minutes.
Does anyone have a different way to solve this issue?
I assume your model is something below
You got columns Date_wid and emp_wid with number type in Fact
In this case count(distinct emp_wid) should work for you with no issues other than datatype.
Whats the datatype for the column emplid?
select date, count(distinct emp_wid)
day joins fact
group by date
should fetch 365 or 366 rows only with no issues.
How BI generating the physical query?
the performance problem isn't the "throwing away the unnecessary 364 rows once they are retrieved"...it's that querying for all those rows in the first place takes forever.
a) Select count(distinct emplid)) from fact inner join dim_date where dim_date.date = '12/31/2012' takes about 4 seconds
b) Select dim_date.date, count(distinct emplid) from fact inner join dim_date where dim_date.date between '1/1/2012' and '12/31/2012' group by date --- this takes about 220 seconds
OBIEE SQL is running a query like the 2nd one - and then adds extra SQL to sort and partition the results, then only keep where partition = 1. But the underlying problem is that, completely forgetting the sorting and partitioning, the base query takes more than 50x longer.
I don't understand why OBIEE does this - querying for 365 times too much data and then throwing away 364 of it is just not efficient.
well, that's what "Last" means (and hence implies): find out the values which exist and use the last one.
- materialize the measure(s) (ETL it/them)
- create multiple fact LTSs which are filtered to only ever retrieve the last day-of-month, day-of-quarter, day-of-year and use those to source your logical fact column.
- use a cube ;-)
Edited by: Christian Berg on Mar 20, 2013 10:19 AM
Hi Christian - agree completely, I just wish OBIEE were smart enough to do the "last" bit on the 2000 row time dimension instead of the mutli-million rows fact table...
I've thought about trying to materialize the measures, but I have a hidden "twist" on my data. In our fact table, I do NOT write a record for each day for each employee. Instead, I only write a record when something on the employee fact changes. So instead of having a hard "date" column, I instead have a "start date" and "end date", and then need to join to the reporting date dimension using a BETWEEN. I do this because most records stay unchanged for long periods of time, so instead of 365 records per employee per year, I average under 3 records per employee per year.
I may give the multiple fact LTS a try, but truthfully I'm a big believer in simplicity (mainly because I'm lazy....) so was trying to find something easier.
And honestly, I'd use a cube if I could, I'm a (very) old Express wiennie.
Ah yes...the bane of "bewtween joins" ;)
TBH I think a bit of LTS magic will be the simplest (if you can use this word here) solution and save you from creating a bespoke aggregate table containing monthly aggregates...which in fact would be yet another option: creating a monthly aggregate table that refreshes the current month once per day (to have a progressing LAST) and freezes at month-end.
The way I usually build a date dimension (time dimension for some of you) is to include columns year_ending_day, qtr_ending_day and period_ending_day. These three columns contain the last day of the year, quarter or period/month. With the filter day=year_ending_day you are specifying the last day of the year, when filtering day=qtr_ending_day you are specifying the last day of the quarter and day=period_ending_day means the last day of the period/month.
You create aliases in the RPD to specify whether you want the year_end, quarter_end or period_end and into each of these you put the appropriate filter.
When the dashboard prompt fills in the period | quarter| year the resulting SQL finds the one day you want and only returns these rows.