This content has been marked as final. Show 7 replies
I assume your model is something below
You got columns Date_wid and emp_wid with number type in Fact
In this case count(distinct emp_wid) should work for you with no issues other than datatype.
Whats the datatype for the column emplid?
select date, count(distinct emp_wid)
day joins fact
group by date
should fetch 365 or 366 rows only with no issues.
How BI generating the physical query?
the performance problem isn't the "throwing away the unnecessary 364 rows once they are retrieved"...it's that querying for all those rows in the first place takes forever.
a) Select count(distinct emplid)) from fact inner join dim_date where dim_date.date = '12/31/2012' takes about 4 seconds
b) Select dim_date.date, count(distinct emplid) from fact inner join dim_date where dim_date.date between '1/1/2012' and '12/31/2012' group by date --- this takes about 220 seconds
OBIEE SQL is running a query like the 2nd one - and then adds extra SQL to sort and partition the results, then only keep where partition = 1. But the underlying problem is that, completely forgetting the sorting and partitioning, the base query takes more than 50x longer.
I don't understand why OBIEE does this - querying for 365 times too much data and then throwing away 364 of it is just not efficient.
well, that's what "Last" means (and hence implies): find out the values which exist and use the last one.
- materialize the measure(s) (ETL it/them)
- create multiple fact LTSs which are filtered to only ever retrieve the last day-of-month, day-of-quarter, day-of-year and use those to source your logical fact column.
- use a cube ;-)
Edited by: Christian Berg on Mar 20, 2013 10:19 AM
Hi Christian - agree completely, I just wish OBIEE were smart enough to do the "last" bit on the 2000 row time dimension instead of the mutli-million rows fact table...
I've thought about trying to materialize the measures, but I have a hidden "twist" on my data. In our fact table, I do NOT write a record for each day for each employee. Instead, I only write a record when something on the employee fact changes. So instead of having a hard "date" column, I instead have a "start date" and "end date", and then need to join to the reporting date dimension using a BETWEEN. I do this because most records stay unchanged for long periods of time, so instead of 365 records per employee per year, I average under 3 records per employee per year.
I may give the multiple fact LTS a try, but truthfully I'm a big believer in simplicity (mainly because I'm lazy....) so was trying to find something easier.
And honestly, I'd use a cube if I could, I'm a (very) old Express wiennie.
Ah yes...the bane of "bewtween joins" ;)
TBH I think a bit of LTS magic will be the simplest (if you can use this word here) solution and save you from creating a bespoke aggregate table containing monthly aggregates...which in fact would be yet another option: creating a monthly aggregate table that refreshes the current month once per day (to have a progressing LAST) and freezes at month-end.
We're giving the "build a LTS at every level joining to a summarized date dim that has the last date of the period". I'm sure it'll work, just hate complicating the RPD. Thanks for the advice!
The way I usually build a date dimension (time dimension for some of you) is to include columns year_ending_day, qtr_ending_day and period_ending_day. These three columns contain the last day of the year, quarter or period/month. With the filter day=year_ending_day you are specifying the last day of the year, when filtering day=qtr_ending_day you are specifying the last day of the quarter and day=period_ending_day means the last day of the period/month.
You create aliases in the RPD to specify whether you want the year_end, quarter_end or period_end and into each of these you put the appropriate filter.
When the dashboard prompt fills in the period | quarter| year the resulting SQL finds the one day you want and only returns these rows.