11 Replies Latest reply on Aug 31, 2011 4:23 PM by 851147

# A measure to calculate distinct count

Hi

I saw a couple of threads on count distinct for example Count of Weeks where a certain measure is > 0

I thought it would be better to start a new thread as I guess my problem is slightly different

I have a cube which has 5 dimensions in the fact table

namely commodity, organization,store,supplier,time

I want to count the distinct suppliers across the other 4 dimensions, is it possible ?

If so any ideas on this ?
• ###### 1. Re: A measure to calculate distinct count
I think this is exactly the same problem that was discussed on the thread you mentioned, so you should be able to use the same approach. Specifically

(1) Define a new cube, COUNT_CUBE say, with dimensions commodity, organization, store, supplier, and time.

(2) Define a numeric measure, SUPPLIER_SALES_COUNT say, in the new cube.

(3) Map SUPPLIER_SALES_COUNT to a query that has the value "1" in any fact row for which sales is greater than zero and is null elsewhere. For example you can load it from a view defined as
SELECT
CASE WHEN sales > 0 THEN 1 ELSE NULL END supplier_sales_count,
commodity,
organization,
store,
supplier,
time
FROM fact
WHERE sales > 0
(4) Define the aggregation rules COUNT_CUBE to be MAX over commodity, organization, store, and time, and SUM over supplier. Note that SUM should be the final aggregation.
• ###### 2. Re: A measure to calculate distinct count
Hi David

I acutally want a distinct count of suppliers across these three dimensions
• ###### 3. Re: A measure to calculate distinct count
Can you give an example of what you mean by "across these three dimensions"?
• ###### 4. Re: A measure to calculate distinct count
Hi David

My fact table is a transaction table and it has the following fields

commodity_id organization_id store_id supplier_id time_id

Now my customer wants to view the count of distinct supplier_ids and he wants to be able to filter this count by commodity_id, organzation_id

store_id and time_id.

Moreover all these fields have got hierarchies including the supplier_id.

So what i meant by "across these 3 dimensions" is that I must be able to filter this count measure by these dimensions.

I actually was planning to create a count_fact table as

create table count_fact as
(
select count(distinct supplier_id),commodity_id, organization_id, store_id, time_id
from transaction_table
group by cube (commodity_id, organization_id, store_id, time_id)
)

by using the "group by cube" and then using the count_fact table for cube generation.

But I want to know if there is a way to achieve this in oracle olap itself
• ###### 5. Re: A measure to calculate distinct count
I may be missing something, but what you want sounds like the standard distinct count discussed in the other threads. The only difference from what I outlined above is that you don't want to use a specific measure (e.g. sales), so your fact mapping would be

SELECT
1 supplier_count,
commodity_id,
organization_id,
store_id,
supplier_id,
time_id
FROM transaction_table

You can create a view like this or simply map the SUPPLIER_COUNT measure to the constant 1. Other than that the MAX over all other dimensions and SUM over supplier should do the trick.
• ###### 6. Re: A measure to calculate distinct count
Hi David

Well I have tried out what you said but it doesn't seem to work for me.

My fact table is as follows

Supplier_ID commodity_id organization_id store_id time_id
s1 c1 b1 sc1 2010
s1 c2 b1 sc1 2010
s1 c2 b2 sc1 2010
s1 c2 b2 sc2 2010
s1 c2 b2 sc2 2011

So you can see that the supplier_id is getting repeated due to the various combinations of the other dimensions

I have applied max over all dimensions and sum over supplier in that order

Can you tell me where I have gone wrong ?

PS: I have partitioned the cube by time and I am using parallel processes to maintain the cube
• ###### 7. Re: A measure to calculate distinct count
Hi David

I read your post at Count of Weeks where a certain measure is > 0

which says

You may have hit

BUG 9509395 - MAX AND SUM MIXED OPERATORS IN CC PRODUCE THE WRONG RESULTS

This is fixed in the 11.2.0.1 OLAP B Patch (#10130392)

What version are you using? How exactly are the results wrong? (e.g. numbers too high or too low)

Well what is the meaning of CC in first place and in our case the numbers are too high !!
• ###### 8. Re: A measure to calculate distinct count
CC means "Compressed Composite"

When you create a cube if you select 'Use compression' option on the storage tab AWM will create Compressed cube and if you select any of your dimensions as sparse AWM creates composite.

Edited by: Dileep on Jun 10, 2011 10:20 AM
• ###### 9. Re: A measure to calculate distinct count
Hi Dileep

Thanks for explaining what CC is.
• ###### 10. Re: A measure to calculate distinct count
With the approach suggested by David and after applying the patch the count distinct cube works fine,
thanks a lot everyone
• ###### 11. Re: A measure to calculate distinct count
Hi

I have another query regarding the distinct count.

Now currently I have the distinct count implemented correctly for my supplier dimension.

So my supplier dimension hierarchy has 2 levels in the ascending order as

1. Supplier_Level (this has the actual supplier ids)
2. All_Supplier (this is a level to which all the supplier counts aggregate to)

Now my requirement is that I need to get the distinct supplier count aggregated to the all_supplier level...

what i mean here is that when i query the cube I always state where dim_supplier = 'ALL_SUPPLIER' and hence do not need the details at the supplier_level.

One possible solution is that I could create a view from the cube view which has data only at the All_supplier level.

But is there something that I could do during cube maintainence itself so that the cube view stores the distinct coount at the all_supplier level only ?