Oracle Analytics Cloud and Server

Welcome to the Oracle Analytics Community: Please complete your User Profile and upload your Profile Picture

Relative filter on Date - Performance

Received Response
235
Views
16
Comments

Hello,

It seems that the usage of relative filter on date (last 3 days) is resulting in a sql like this one :

This result in poor performance or even row limit exceeds on very large tables like any Fact table (I suppose that the filter will be applied afterwards in the BI Server).

Does anyone know a trick to avoid that issue ?

Regards.

Jean-Pierre

Welcome!

It looks like you're new here. Sign in or register to get started.
«1

Answers

  • Rank 7 - Analytics Coach
    edited November 2023

    Hi @Jean-Pierre Hoedenaeken ,

    The Relative Time filter does NOT create new queries, but it adds where clause conditions to the logical/physical queries generated from each visualization.

    Where did you grab that physical SQL query?

    By looking at it, I suppose that you are using a dataset with multiple tables from a database connection. The physical queries generated from these objects are not the same as the physical queries generated from a proper subject area/semantic model: the BI Server always retrieve all rows from required tables (eventually with filters applied), and THEN join them together (I suppose that this happens because in a dataset you can potentially join tables from different databases, files, etc.)

    This is the physical query generated from a dataset with 3 tables (FACT_ORDER_LINE, DIM_PRODUCTS, and DIM_DATE) when I visualize a Quantity by Category bar chart for latest 8 years (using the Relative Time filter):

    WITH 
    SAWITH0 AS (select T1000005.QUANTITY as c1,
         T1000005.SALE_ORDER_DATE_KEY as c2,
         T1000005.PRODUCT_KEY as c3
    from 
         GCBC_PEF.FACT_ORDER_LINE T1000005),
    SAWITH1 AS (select T1000008.DIM_DATE_KEY as c1
    from 
         GCBC_PEF.DIM_DATE T1000008
    where  ( ADD_MONTHS(TO_TIMESTAMP ('2023-11-22 16:12:58.974', 'YYYY-MM-DD HH24:MI:SS.FF'), -8 * 12 ) < T1000008.DIM_DATE_KEY and T1000008.DIM_DATE_KEY <= TO_TIMESTAMP ('2023-11-22 16:12:58.974', 'YYYY-MM-DD HH24:MI:SS.FF') ) ),
    SAWITH2 AS (select T1000011.PRODUCT_CATEGORY as c1,
         T1000011.PRODUCT_KEY as c2
    from 
         GCBC_PEF.DIM_PRODUCTS T1000011),
    SAWITH3 AS (select D3.c1 as c1,
         D1.c1 as c3
    from 
         (
              SAWITH0 D1 inner join SAWITH1 D2 On D1.c2 = D2.c1) inner join SAWITH2 D3 On D1.c3 = D3.c2),
    SAWITH4 AS (select D1000001.c1 as c1,
         sum(D1000001.c3) as c2
    from 
         SAWITH3 D1000001
    group by D1000001.c1)
    select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4 from ( select 0 as c1,
         cast(NULL as  NUMBER  ) as c2,
         D115.c1 as c3,
         D115.c2 as c4
    from 
         SAWITH4 D115 ) D1 where rownum <= 500001
    


    How can you solve the issue? By modeling your data in the RPD/Semantic Modeler! This is the physical query generated from the same chart, but based on a proper subject area:

    WITH 
    SAWITH0 AS (select sum(T43.QUANTITY) as c1,
         T20.PRODUCT_CATEGORY as c2
    from 
         GCBC_PEF.DIM_DATE T40 /* Dim Date */ ,
         GCBC_PEF.DIM_PRODUCTS T20 /* Dim Products */ ,
         GCBC_PEF.FACT_ORDER_LINE T43 /* Fact Order Line */ 
    where  ( T20.PRODUCT_KEY = T43.PRODUCT_KEY and T40.DIM_DATE_KEY = T43.SALE_ORDER_DATE_KEY and  cast(ADD_MONTHS(TO_DATE('2023-11-22' , 'YYYY-MM-DD'), -8 * 12 ) as DATE ) < T40.DIM_DATE_KEY and  cast(ADD_MONTHS(TO_DATE('2023-11-22' , 'YYYY-MM-DD'), -8 * 12 ) as DATE ) < T43.SALE_ORDER_DATE_KEY and T40.DIM_DATE_KEY <= TO_DATE('2023-11-22' , 'YYYY-MM-DD') and T43.SALE_ORDER_DATE_KEY <= TO_DATE('2023-11-22' , 'YYYY-MM-DD') ) 
    group by T20.PRODUCT_CATEGORY)
    select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4 from ( select 0 as c1,
         D1.c2 as c2,
         cast(NULL as  DOUBLE PRECISION  ) as c3,
         D1.c1 as c4
    from 
         SAWITH0 D1 ) D1 where rownum <= 500001
    
  • Rank 3 - Community Apprentice

    Jean-Pierre,

    Your example SQL looks like it is using a dimension table (dim_date), but your performance issue refers to scenarios when using a Fact table. It would be useful to see the SQL where the performance is bad.

    If the SQL provided is such an example, I would start off with running Explain Plan on the SQL statement. It doesn't look like it should be having bad performance, unless your index statistics need updating or was disabled or dropped.

  • Rank 7 - Analytics Coach

    Hi @Jean-Pierre Hoedenaeken ,

    The Relative Time filter does NOT generate new queries, but adds where clause conditions to the logical/physical queries generated from each visualization.

    Where did you grab that SQL code?

    By looking at it, I suppose that you are using a dataset with multiple tables from a database connection. Am I right? These objects are not the same as subject areas/semantic models, and generate queries in a different way:

    1. The BI Server issues one query for each dataset table involved in a visualization to retrieve all rows. These queries includes filters as where clause conditions.
    2. The results are joined together. If the tables come from the same data source, then the join is executed at physical level, otherwise the BI Server does it.

    I created a simple Quantity by Category bar chart visualization and applied a Relative Time filter (last 8 years). This is the physical query generated when using a dataset with 3 tables (FACT_ORDER_LINE, DIM_PRODUCTS and DIM_DATE):

    WITH 
    SAWITH0 AS (select T1000005.QUANTITY as c1,
         T1000005.SALE_ORDER_DATE_KEY as c2,
         T1000005.PRODUCT_KEY as c3
    from 
         GCBC_PEF.FACT_ORDER_LINE T1000005),
    SAWITH1 AS (select T1000008.DIM_DATE_KEY as c1
    from 
         GCBC_PEF.DIM_DATE T1000008
    where  ( ADD_MONTHS(TO_TIMESTAMP ('2023-11-22 17:07:37.787', 'YYYY-MM-DD HH24:MI:SS.FF'), -8 * 12 ) < T1000008.DIM_DATE_KEY and T1000008.DIM_DATE_KEY <= TO_TIMESTAMP ('2023-11-22 17:07:37.787', 'YYYY-MM-DD HH24:MI:SS.FF') ) ),
    SAWITH2 AS (select T1000011.PRODUCT_CATEGORY as c1,
         T1000011.PRODUCT_KEY as c2
    from 
         GCBC_PEF.DIM_PRODUCTS T1000011),
    SAWITH3 AS (select D3.c1 as c1,
         D1.c1 as c3
    from 
         (
              SAWITH0 D1 inner join SAWITH1 D2 On D1.c2 = D2.c1) inner join SAWITH2 D3 On D1.c3 = D3.c2),
    SAWITH4 AS (select D1000001.c1 as c1,
         sum(D1000001.c3) as c2
    from 
         SAWITH3 D1000001
    group by D1000001.c1)
    select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4 from ( select 0 as c1,
         cast(NULL as  NUMBER  ) as c2,
         D115.c1 as c3,
         D115.c2 as c4
    from 
         SAWITH4 D115 ) D1 where rownum <= 500001
    


    This is the physical SQL for the same visualization, but created from a subject area:

    WITH 
    SAWITH0 AS (select sum(T43.QUANTITY) as c1,
         T20.PRODUCT_CATEGORY as c2
    from 
         GCBC_PEF.DIM_DATE T40 /* Dim Date */ ,
         GCBC_PEF.DIM_PRODUCTS T20 /* Dim Products */ ,
         GCBC_PEF.FACT_ORDER_LINE T43 /* Fact Order Line */ 
    where  ( T20.PRODUCT_KEY = T43.PRODUCT_KEY and T40.DIM_DATE_KEY = T43.SALE_ORDER_DATE_KEY and  cast(ADD_MONTHS(TO_DATE('2023-11-22' , 'YYYY-MM-DD'), -8 * 12 ) as DATE ) < T40.DIM_DATE_KEY and  cast(ADD_MONTHS(TO_DATE('2023-11-22' , 'YYYY-MM-DD'), -8 * 12 ) as DATE ) < T43.SALE_ORDER_DATE_KEY and T40.DIM_DATE_KEY <= TO_DATE('2023-11-22' , 'YYYY-MM-DD') and T43.SALE_ORDER_DATE_KEY <= TO_DATE('2023-11-22' , 'YYYY-MM-DD') ) 
    group by T20.PRODUCT_CATEGORY)
    select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4 from ( select 0 as c1,
         D1.c2 as c2,
         cast(NULL as  DOUBLE PRECISION  ) as c3,
         D1.c1 as c4
    from 
         SAWITH0 D1 ) D1 where rownum <= 500001
    


    How can you avoid the issue? Model your data using the Model Administration tool/Semantic modeler instead of creating datasets with multiple tables!

  • Rank 6 - Analytics Lead

    Hello,

    Please find in attachment th detail of the queries generated :

    1) without any filter (so giving the complete year of 2023);

    2) with a explicit filter on 3 days (from 20/11 to 22/11);

    3) with a relative filter of 4 days to get the days from 20/11 to 22/11).


    The last one give us an error, number of rows exceeded.

    Regards.

    Jean-Pierre

  • Rank 6 - Analytics Lead

    Hello,

    Does anyone have some explanations since we are using a semantic model and not a dataset ?

    Thanks.

    Jean-Pierre

  • Rank 7 - Analytics Coach

    Hi @Jean-Pierre Hoedenaeken ,

    I believe that nobody is willing to open a docx file shared on a forum. I suggest to share a PDF file instead, or screenshots

  • Rank 6 - Analytics Lead

    Hi Federico,

    I Apologize for that, here is the PDF.

    Best regards.

    Jean-Pierre

  • @Jean-Pierre Hoedenaeken ,

    In the picture you posted at the beginning, is that the LSQL for something where you selected to see only the "last 3 days" ?

    Because to me, if the 2023-11-22 you select last 3 days I would expect a filter being "date" > 2023-11-19 (19 being 3 days before 22) and not that kind of filter. Because that query is just loading everything you have in your table as long as it isn't in the future (which for most fact tables will not filter a single thing out, only in case of forecasted numbers you could have facts in the future already).

    Where / how are you defining that filter "last 3 days" in your analysis?

    Looking at your queries in the file, that's exactly the behavior: the "between" query push down to the database the exact range of dates you look for, while the "last 4 days" is asking the database to return every record up to to 2023-11-23 (and I guess it's just the date when you did run the example, and not something you wrote yourself).

    In this context, stay away from that "last 4 days" filter, because it isn't working correctly at all! Any date filter should be pushed down to the database, otherwise you will move all your data from the database to the BI Server first, and then kill the BI Server after by making it behave like a database (and that's something you will pay with high CPU and memory usage on your BI Server, plus huge temp files on disk probably...).

  • Rank 6 - Analytics Lead

    Hello Gianni,

    You will find the screenshots in th following attachment.

    Just to be clear, i'am using a semantic model for tthat visualization.

    Regard.

    Jean-Pierre

  • Rank 7 - Analytics Coach

    Hi @Jean-Pierre Hoedenaeken ,

    Which version of OAC or OAS are you currently using?

    I tried to replicate the same on my OAC instance (September 2023 Update), and the filters are applied as expected. It smells like a bug to me

Welcome!

It looks like you're new here. Sign in or register to get started.