Relative filter on Date - Performance
Hello,
It seems that using a relative filter on a date (last 3 days) results in SQL like the one in the attached screenshot:
This results in poor performance, or even a "row limit exceeded" error, on very large tables such as fact tables (I suppose the filter is applied afterwards in the BI Server).
Does anyone know a trick to avoid this issue?
Regards.
Jean-Pierre
Answers
-
The Relative Time filter does NOT create new queries, but it adds where clause conditions to the logical/physical queries generated from each visualization.
Where did you grab that physical SQL query?
By looking at it, I suppose that you are using a dataset with multiple tables from a database connection. The physical queries generated from these objects are not the same as the physical queries generated from a proper subject area/semantic model: the BI Server always retrieves all rows from the required tables (possibly with filters applied), and THEN joins them together (I suppose this happens because in a dataset you can potentially join tables from different databases, files, etc.).
This is the physical query generated from a dataset with 3 tables (FACT_ORDER_LINE, DIM_PRODUCTS, and DIM_DATE) when I visualize a Quantity by Category bar chart for the last 8 years (using the Relative Time filter):
WITH
SAWITH0 AS (select T1000005.QUANTITY as c1,
  T1000005.SALE_ORDER_DATE_KEY as c2,
  T1000005.PRODUCT_KEY as c3
 from GCBC_PEF.FACT_ORDER_LINE T1000005),
SAWITH1 AS (select T1000008.DIM_DATE_KEY as c1
 from GCBC_PEF.DIM_DATE T1000008
 where ( ADD_MONTHS(TO_TIMESTAMP ('2023-11-22 16:12:58.974', 'YYYY-MM-DD HH24:MI:SS.FF'), -8 * 12 ) < T1000008.DIM_DATE_KEY
   and T1000008.DIM_DATE_KEY <= TO_TIMESTAMP ('2023-11-22 16:12:58.974', 'YYYY-MM-DD HH24:MI:SS.FF') ) ),
SAWITH2 AS (select T1000011.PRODUCT_CATEGORY as c1,
  T1000011.PRODUCT_KEY as c2
 from GCBC_PEF.DIM_PRODUCTS T1000011),
SAWITH3 AS (select D3.c1 as c1,
  D1.c1 as c3
 from ( SAWITH0 D1 inner join SAWITH1 D2 On D1.c2 = D2.c1) inner join SAWITH2 D3 On D1.c3 = D3.c2),
SAWITH4 AS (select D1000001.c1 as c1,
  sum(D1000001.c3) as c2
 from SAWITH3 D1000001
 group by D1000001.c1)
select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4
from ( select 0 as c1, cast(NULL as NUMBER ) as c2, D115.c1 as c3, D115.c2 as c4
       from SAWITH4 D115 ) D1
where rownum <= 500001
How can you solve the issue? By modeling your data in the RPD/Semantic Modeler! This is the physical query generated from the same chart, but based on a proper subject area:
WITH
SAWITH0 AS (select sum(T43.QUANTITY) as c1,
  T20.PRODUCT_CATEGORY as c2
 from GCBC_PEF.DIM_DATE T40 /* Dim Date */ ,
  GCBC_PEF.DIM_PRODUCTS T20 /* Dim Products */ ,
  GCBC_PEF.FACT_ORDER_LINE T43 /* Fact Order Line */
 where ( T20.PRODUCT_KEY = T43.PRODUCT_KEY
   and T40.DIM_DATE_KEY = T43.SALE_ORDER_DATE_KEY
   and cast(ADD_MONTHS(TO_DATE('2023-11-22' , 'YYYY-MM-DD'), -8 * 12 ) as DATE ) < T40.DIM_DATE_KEY
   and cast(ADD_MONTHS(TO_DATE('2023-11-22' , 'YYYY-MM-DD'), -8 * 12 ) as DATE ) < T43.SALE_ORDER_DATE_KEY
   and T40.DIM_DATE_KEY <= TO_DATE('2023-11-22' , 'YYYY-MM-DD')
   and T43.SALE_ORDER_DATE_KEY <= TO_DATE('2023-11-22' , 'YYYY-MM-DD') )
 group by T20.PRODUCT_CATEGORY)
select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4
from ( select 0 as c1, D1.c2 as c2, cast(NULL as DOUBLE PRECISION ) as c3, D1.c1 as c4
       from SAWITH0 D1 ) D1
where rownum <= 500001
0 -
Jean-Pierre,
Your example SQL looks like it is using a dimension table (dim_date), but your performance issue refers to scenarios when using a Fact table. It would be useful to see the SQL where the performance is bad.
If the SQL provided is such an example, I would start off by running Explain Plan on the SQL statement. It doesn't look like it should perform badly, unless your index statistics need updating, or an index was disabled or dropped.
0 -
The Relative Time filter does NOT generate new queries, but adds where clause conditions to the logical/physical queries generated from each visualization.
Where did you grab that SQL code?
By looking at it, I suppose that you are using a dataset with multiple tables from a database connection. Am I right? These objects are not the same as subject areas/semantic models, and generate queries in a different way:
- The BI Server issues one query for each dataset table involved in a visualization to retrieve all rows. These queries include filters as where clause conditions.
- The results are joined together. If the tables come from the same data source, the join is executed at the physical level; otherwise the BI Server does it.
I created a simple Quantity by Category bar chart visualization and applied a Relative Time filter (last 8 years). This is the physical query generated when using a dataset with 3 tables (FACT_ORDER_LINE, DIM_PRODUCTS and DIM_DATE):
WITH
SAWITH0 AS (select T1000005.QUANTITY as c1,
  T1000005.SALE_ORDER_DATE_KEY as c2,
  T1000005.PRODUCT_KEY as c3
 from GCBC_PEF.FACT_ORDER_LINE T1000005),
SAWITH1 AS (select T1000008.DIM_DATE_KEY as c1
 from GCBC_PEF.DIM_DATE T1000008
 where ( ADD_MONTHS(TO_TIMESTAMP ('2023-11-22 17:07:37.787', 'YYYY-MM-DD HH24:MI:SS.FF'), -8 * 12 ) < T1000008.DIM_DATE_KEY
   and T1000008.DIM_DATE_KEY <= TO_TIMESTAMP ('2023-11-22 17:07:37.787', 'YYYY-MM-DD HH24:MI:SS.FF') ) ),
SAWITH2 AS (select T1000011.PRODUCT_CATEGORY as c1,
  T1000011.PRODUCT_KEY as c2
 from GCBC_PEF.DIM_PRODUCTS T1000011),
SAWITH3 AS (select D3.c1 as c1,
  D1.c1 as c3
 from ( SAWITH0 D1 inner join SAWITH1 D2 On D1.c2 = D2.c1) inner join SAWITH2 D3 On D1.c3 = D3.c2),
SAWITH4 AS (select D1000001.c1 as c1,
  sum(D1000001.c3) as c2
 from SAWITH3 D1000001
 group by D1000001.c1)
select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4
from ( select 0 as c1, cast(NULL as NUMBER ) as c2, D115.c1 as c3, D115.c2 as c4
       from SAWITH4 D115 ) D1
where rownum <= 500001
This is the physical SQL for the same visualization, but created from a subject area:
WITH
SAWITH0 AS (select sum(T43.QUANTITY) as c1,
  T20.PRODUCT_CATEGORY as c2
 from GCBC_PEF.DIM_DATE T40 /* Dim Date */ ,
  GCBC_PEF.DIM_PRODUCTS T20 /* Dim Products */ ,
  GCBC_PEF.FACT_ORDER_LINE T43 /* Fact Order Line */
 where ( T20.PRODUCT_KEY = T43.PRODUCT_KEY
   and T40.DIM_DATE_KEY = T43.SALE_ORDER_DATE_KEY
   and cast(ADD_MONTHS(TO_DATE('2023-11-22' , 'YYYY-MM-DD'), -8 * 12 ) as DATE ) < T40.DIM_DATE_KEY
   and cast(ADD_MONTHS(TO_DATE('2023-11-22' , 'YYYY-MM-DD'), -8 * 12 ) as DATE ) < T43.SALE_ORDER_DATE_KEY
   and T40.DIM_DATE_KEY <= TO_DATE('2023-11-22' , 'YYYY-MM-DD')
   and T43.SALE_ORDER_DATE_KEY <= TO_DATE('2023-11-22' , 'YYYY-MM-DD') )
 group by T20.PRODUCT_CATEGORY)
select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4
from ( select 0 as c1, D1.c2 as c2, cast(NULL as DOUBLE PRECISION ) as c3, D1.c1 as c4
       from SAWITH0 D1 ) D1
where rownum <= 500001
How can you avoid the issue? Model your data using the Model Administration Tool/Semantic Modeler instead of creating datasets with multiple tables!
0 -
Hello,
Please find attached the details of the queries generated:
1) without any filter (so giving the complete year 2023);
2) with an explicit filter on 3 days (from 20/11 to 22/11);
3) with a relative filter of 4 days to get the days from 20/11 to 22/11.
The last one gives us an error: number of rows exceeded.
Regards.
Jean-Pierre
0 -
Hello,
Does anyone have an explanation, since we are using a semantic model and not a dataset?
Thanks.
Jean-Pierre
0 -
I believe that nobody is willing to open a docx file shared on a forum. I suggest sharing a PDF file instead, or screenshots.
0 -
Hi Federico,
I apologize for that; here is the PDF.
Best regards.
Jean-Pierre
0 -
In the picture you posted at the beginning, is that the LSQL for something where you selected to see only the "last 3 days"?
Because to me, if on 2023-11-22 you select the last 3 days, I would expect a filter like
"date" > 2023-11-19
(19 being 3 days before 22), and not that kind of filter. That query is just loading everything you have in your table as long as it isn't in the future (which for most fact tables will not filter out a single row; only with forecasted numbers could you already have facts in the future). Where / how are you defining that "last 3 days" filter in your analysis?
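To make it concrete, here is a minimal sketch of the pushdown I would expect for "last 3 days" as of 2023-11-22 (MY_FACT_TABLE and its columns are placeholders, not your actual schema):
select T."date", T.QUANTITY
from MY_FACT_TABLE T -- placeholder table name
where T."date" > TO_DATE('2023-11-19', 'YYYY-MM-DD')   -- lower bound: 3 days before the 22nd
  and T."date" <= TO_DATE('2023-11-22', 'YYYY-MM-DD')  -- upper bound: the current date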
Looking at your queries in the file, that's exactly the behavior: the "between" query pushes down to the database the exact range of dates you look for, while the "last 4 days" one asks the database to return every record up to 2023-11-23 (and I guess that's just the date when you ran the example, not something you wrote yourself).
In this context, stay away from that "last 4 days" filter, because it isn't working correctly at all! Any date filter should be pushed down to the database; otherwise you move all your data from the database to the BI Server first, and then kill the BI Server by making it behave like a database (something you will pay for with high CPU and memory usage on the BI Server, plus probably huge temp files on disk...).
0 -
Hello Gianni,
You will find the screenshots in the following attachment.
Just to be clear, I am using a semantic model for that visualization.
Regards.
Jean-Pierre
0 -
Which version of OAC or OAS are you currently using?
I tried to replicate the same on my OAC instance (September 2023 Update), and the filters are applied as expected. It smells like a bug to me.
0 -
Hi Federico,
We are using the November release.
Regards.
Jean-Pierre
0 -
I repeated the test on a newly created OAC instance with the November update, and again the filter is working as expected.
I noticed that you are using the ODBC 3.5 connection type to access your physical data sources. I highly recommend selecting a native driver instead if possible (e.g. OCI if your data is in an Oracle database), and then repeating the test. Native drivers are expected to generate better physical queries than generic ones like ODBC. The SQL generated when you apply the Relative Time filter does not look right, so the two things could be related.
Another test that you can do is to replace the Relative Time filter with an Expression filter such as
DateColumn BETWEEN TIMESTAMPADD(SQL_TSI_DAY, -4, CURRENT_DATE) AND CURRENT_DATE
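If the expression is translated correctly, the physical query should carry both bounds in its where clause, roughly like this sketch (illustrative only; MY_TABLE and DateColumn are placeholders, and the exact date functions depend on the target database):
select *
from MY_TABLE -- placeholder table name
where DateColumn >= CURRENT_DATE - 4  -- from TIMESTAMPADD(SQL_TSI_DAY, -4, CURRENT_DATE)
  and DateColumn <= CURRENT_DATE      -- upper bound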
0 -
Hello Federico,
I have done some new tests from scratch with a new semantic model based on our Redshift connection.
The issue is still the same:
- the Redshift connection seems to use the ODBC 3.5 connection type!
- the relative filter only generates the following where clause: T5."date" <= TO_DATE('2023-11-27', 'YYYY-MM-DD')
- using an expression filter BETWEEN TIMESTAMPADD(SQL_TSI_DAY, -4, CURRENT_DATE) AND CURRENT_DATE generates the same where clause (see the comparison below).
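To make the difference explicit, here is the observed clause next to what I would expect a correct translation to generate (the expected version is my assumption, reusing the dates of this test):
-- observed: only the upper bound reaches Redshift
where T5."date" <= TO_DATE('2023-11-27', 'YYYY-MM-DD')
-- expected: both bounds, with the lower one 4 days earlier
where T5."date" >= TO_DATE('2023-11-23', 'YYYY-MM-DD')
  and T5."date" <= TO_DATE('2023-11-27', 'YYYY-MM-DD')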
You will find the screenshots and logs in the PDF.
Regards.
Jean-Pierre
0 -
I would try replacing the "BETWEEN" expression filter with 2 separate expressions:
- DateColumn >= TIMESTAMPADD(SQL_TSI_DAY, -4, CURRENT_DATE)
- DateColumn <= CURRENT_DATE
However, that behaviour is really weird and smells like a bug to me. I would raise an SR.
0 -
Jean-Pierre,
I would raise an SR reporting a bug.
You are using a feature that other users will probably use, and it leads to suboptimal (to say the least) query generation.
It could be an issue with the database features, but if you didn't change the defaults, it should already work at its best with the default parameters.
You aren't in charge of generating the query; the tool is. Raise the point in an SR with Oracle, and they will hopefully fix it in a coming release.
1 -
Thank you Federico and Gianni, I have raised a Service Request.
Best regards.
Jean-Pierre
0