11 Replies Latest reply: Aug 31, 2011 11:23 AM by 851147 RSS

    A measure to calculate distinct count

    851147
      Hi

      I saw a couple of threads on count distinct for example Count of Weeks where a certain measure is > 0

      I thought it would be better to start a new thread as I guess my problem is slightly different

      I have a cube which has 5 dimensions in the fact table

      namely commodity, organization,store,supplier,time

      I want to count the distinct suppliers across the other 4 dimensions, is it possible ?

      If so any ideas on this ?
        • 1. Re: A measure to calculate distinct count
          DavidGreenfield
          I think this is exactly the same problem that was discussed on the thread you mentioned, so you should be able to use the same approach. Specifically

          (1) Define a new cube, COUNT_CUBE say, with dimensions commodity, organization, store, supplier, and time.

          (2) Define a numeric measure, SUPPLIER_SALES_COUNT say, in the new cube.

          (3) Map SUPPLIER_SALES_COUNT to a query that has the value "1" in any fact row for which sales is greater than zero and is null elsewhere. For example you can load it from a view defined as
          SELECT
            CASE WHEN sales > 0 THEN 1 ELSE NULL END supplier_sales_count,
            commodity, 
            organization, 
            store, 
            supplier, 
            time
          FROM fact
          WHERE sales > 0
          (4) Define the aggregation rules COUNT_CUBE to be MAX over commodity, organization, store, and time, and SUM over supplier. Note that SUM should be the final aggregation.
          • 2. Re: A measure to calculate distinct count
            851147
            Hi David

            I acutally want a distinct count of suppliers across these three dimensions
            • 3. Re: A measure to calculate distinct count
              DavidGreenfield
              Can you give an example of what you mean by "across these three dimensions"?
              • 4. Re: A measure to calculate distinct count
                851147
                Hi David

                My fact table is a transaction table and it has the following fields

                commodity_id organization_id store_id supplier_id time_id

                Now my customer wants to view the count of distinct supplier_ids and he wants to be able to filter this count by commodity_id, organzation_id

                store_id and time_id.

                Moreover all these fields have got hierarchies including the supplier_id.

                So what i meant by "across these 3 dimensions" is that I must be able to filter this count measure by these dimensions.

                I actually was planning to create a count_fact table as

                create table count_fact as
                (
                select count(distinct supplier_id),commodity_id, organization_id, store_id, time_id
                from transaction_table
                group by cube (commodity_id, organization_id, store_id, time_id)
                )


                by using the "group by cube" and then using the count_fact table for cube generation.

                But I want to know if there is a way to achieve this in oracle olap itself
                • 5. Re: A measure to calculate distinct count
                  DavidGreenfield
                  I may be missing something, but what you want sounds like the standard distinct count discussed in the other threads. The only difference from what I outlined above is that you don't want to use a specific measure (e.g. sales), so your fact mapping would be

                  SELECT
                  1 supplier_count,
                  commodity_id,
                  organization_id,
                  store_id,
                  supplier_id,
                  time_id
                  FROM transaction_table

                  You can create a view like this or simply map the SUPPLIER_COUNT measure to the constant 1. Other than that the MAX over all other dimensions and SUM over supplier should do the trick.
                  • 6. Re: A measure to calculate distinct count
                    851147
                    Hi David

                    Well I have tried out what you said but it doesn't seem to work for me.

                    My fact table is as follows

                    Supplier_ID commodity_id organization_id store_id time_id
                    s1 c1 b1 sc1 2010
                    s1 c2 b1 sc1 2010
                    s1 c2 b2 sc1 2010
                    s1 c2 b2 sc2 2010
                    s1 c2 b2 sc2 2011

                    So you can see that the supplier_id is getting repeated due to the various combinations of the other dimensions

                    I have applied max over all dimensions and sum over supplier in that order

                    Can you tell me where I have gone wrong ?

                    PS: I have partitioned the cube by time and I am using parallel processes to maintain the cube
                    • 7. Re: A measure to calculate distinct count
                      851147
                      Hi David

                      I read your post at Count of Weeks where a certain measure is > 0

                      which says

                      You may have hit

                      BUG 9509395 - MAX AND SUM MIXED OPERATORS IN CC PRODUCE THE WRONG RESULTS

                      This is fixed in the 11.2.0.1 OLAP B Patch (#10130392)

                      What version are you using? How exactly are the results wrong? (e.g. numbers too high or too low)


                      Well what is the meaning of CC in first place and in our case the numbers are too high !!
                      • 8. Re: A measure to calculate distinct count
                        Dileep
                        CC means "Compressed Composite"

                        When you create a cube if you select 'Use compression' option on the storage tab AWM will create Compressed cube and if you select any of your dimensions as sparse AWM creates composite.

                        Edited by: Dileep on Jun 10, 2011 10:20 AM
                        • 9. Re: A measure to calculate distinct count
                          851147
                          Hi Dileep

                          Thanks for explaining what CC is.
                          • 10. Re: A measure to calculate distinct count
                            851147
                            With the approach suggested by David and after applying the patch the count distinct cube works fine,
                            thanks a lot everyone
                            • 11. Re: A measure to calculate distinct count
                              851147
                              Hi

                              I have another query regarding the distinct count.

                              Now currently I have the distinct count implemented correctly for my supplier dimension.

                              So my supplier dimension hierarchy has 2 levels in the ascending order as

                              1. Supplier_Level (this has the actual supplier ids)
                              2. All_Supplier (this is a level to which all the supplier counts aggregate to)

                              Now my requirement is that I need to get the distinct supplier count aggregated to the all_supplier level...

                              what i mean here is that when i query the cube I always state where dim_supplier = 'ALL_SUPPLIER' and hence do not need the details at the supplier_level.

                              One possible solution is that I could create a view from the cube view which has data only at the All_supplier level.

                              But is there something that I could do during cube maintainence itself so that the cube view stores the distinct coount at the all_supplier level only ?