Fetching CLOB column Faster

User_OCZ1T
User_OCZ1T Member Posts: 1,923 Gold Badge
edited Aug 5, 2019 11:44AM in General Database Discussions

Hi, we are using Oracle 11.2.0.4 on Exadata. The query below is executed from Informatica, and it is just a "SELECT * FROM TAB1" query. It was taking ~1 hour to fetch ~135k rows, and the SQL Monitor report showed that all of the time was spent on the client fetching data. We then found that one CLOB column (C10) is causing the issue: if we comment it out, the fetch finishes in a few seconds. As an alternative we used the SUBSTR option below to fetch C10, which let the query finish in a few seconds. But the query suddenly started failing with "ORA-06502: PL/SQL: numeric or value error: character string buffer too small", and we found that this happens because a few of the values in C10 are longer than 4000 bytes. Is there an alternative way to fetch the CLOB column quickly that does not fail for large values (>4000 bytes)?

Query:-

SELECT c1,c2,c3...c39 FROM TAB1;

Alternate option to fetch column C10:-

DBMS_LOB.SUBSTR (C10,(SELECT MAX (LENGTH (C10)) FROM TAB1)) C10

Error:-

ORA-06502: PL/SQL: numeric or value error: character string buffer too small

ORA-06512: at line 1

Below is the SQL Monitor report for one of the slow executions we used to see when the CLOB column was fetched in full:-

Query:- SELECT c1,c2,c3...c39 FROM TAB1;

Global Information
------------------------------
 Status              :  EXECUTING
 Instance ID         :  4
 SQL Execution ID    :  67108864
 Execution Started   :  04/09/2018 06:02:49
 First Refresh Time  :  04/09/2018 06:02:49
 Last Refresh Time   :  04/09/2018 06:40:45
 Duration            :  2277s
 Module/Action       :  SQL*Plus/-
 Program             :  sqlplus.exe
 Fetch Calls         :  26415

Global Stats
=================================================
| Elapsed |   Cpu   | Cluster  | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|    0.69 |    0.69 |     0.01 | 26415 |  27031 |
=================================================

SQL Plan Monitoring Details (Plan Hash Value=2531190874)
============================================================================================================================================================
| Id   |          Operation          |        Name        |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Activity | Activity Detail | Progress |
|      |                             |                    | (Estim) |      | Active(s) | Active |       | (Actual) |   (%)    |   (# samples)   |          |
============================================================================================================================================================
| -> 0 | SELECT STATEMENT            |                    |         |      |      2278 |     +0 |     1 |    26417 |          |                 |          |
| -> 1 |   TABLE ACCESS STORAGE FULL | TAB1               |    135K | 7212 |      2278 |     +0 |     1 |    26417 |          |                 |       6% |
============================================================================================================================================================

Best Answer

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Jul 31, 2019 8:24AM Accepted Answer
    933257 wrote: Initially I thought you were suggesting a UNION, but then I realized that the SUBSTR would be a VARCHAR data type while the other part of the query would be a CLOB, so it has to be fetched twice by Informatica. Correct me if I'm wrong. I also tested the query with high values for longchunksize and long, but that didn't affect the query performance. The other thing I was thinking of is testing the CACHE option for the CLOB storage; would that benefit us?

    Exactly, unioning will need to use the same data type so won't work too well. Although if you can make it project the clob as null for the short lines it might be okay - would probably confuse informatica though!

    I meant run:

    select ..substr(clobcol,1,4000) my_clob from my_table where dbms_lob.getLength(clobcol)<=4000

    and

    select ..clobcol my_clob from my_table where dbms_lob.getLength(clobcol)>4000

    Assuming 1 byte characters.

    I don't think cache will help, the problem is in the network round trips from what I can tell
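The "assuming 1 byte characters" caveat matters because dbms_lob.getLength counts characters for a CLOB, while the 4000 limit on a VARCHAR2 in SQL is in bytes. A minimal Python sketch of the difference, assuming a UTF-8 style multibyte character set (the thread does not state the database character set, and the helper name `fits_in_varchar2` is made up for illustration):

```python
def fits_in_varchar2(text: str, limit_bytes: int = 4000, encoding: str = "utf-8") -> bool:
    """Return True if the encoded value is no longer than the byte limit.

    The encoding is an assumption standing in for the database character set.
    """
    return len(text.encode(encoding)) <= limit_bytes


# 4000 single-byte characters: 4000 characters and 4000 bytes.
ascii_value = "a" * 4000

# 4000 accented characters: still 4000 characters, but 2 bytes each in UTF-8.
accented_value = "\u00e9" * 4000

print(len(ascii_value), fits_in_varchar2(ascii_value))
print(len(accented_value), fits_in_varchar2(accented_value))
```

With multibyte data, a filter on character length alone could therefore still route a value that is too large into the VARCHAR2 branch.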


Answers

  • Dom Brooks
    Dom Brooks Member Posts: 5,552 Silver Crown
    edited Jul 30, 2019 7:02AM
    if we comment the CLOB column(C10) , the data fetch is finishing in few seconds.

    Is this surprising?

    A CLOB is a large object.

    Exactly how long it takes to fetch will depend on how large the object is. The larger the object, the longer it will take.

    And it will depend on client/network settings - you will basically be roundtripping on the CLOB.

    If you traced this you would see exactly how many round trips etc.

    Some client software has different options in how to deal with CLOBs.

    There also might be some influence on whether internally CLOB on table is set to CACHE / NOCACHE etc.

    So as an alternative we were using below SUBSTR option to fetch the column C10 and it was helping us to finish the query in few seconds

    Really? Why would this be a valid way of returning an entire CLOB?

    If it really is effective in limited circumstances, it would only work for CLOBs < 4000 characters, because you are then returning a VARCHAR2. That probably means the client software fetches the data differently from the way it deals with large objects, which is via LOB locators, and that is all you can do once the value actually is "large". It will predictably fail for anything > 4000 characters.

    Investigate what options informatica has for dealing with clobs and what network settings you are currently using.

    Do some tracing with current setup so you can see exactly what the problem is (e.g. network round tripping mainly with some physical IO probably) and then you can see what effect any change you make has.

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,786 Gold Crown
    edited Jul 30, 2019 11:50AM

    One of the features of selecting LOBs is that the client code probably has to fetch the LOB Locator and then fetch the LOB data one item at a time.  Then there's probably a fixed size buffer the client uses for fetching the LOB so a single LOB may require multiple round-trips before the whole thing is acquired.

    At some point in the past SQL*Plus used to fetch 80 bytes of a LOB at a time, and if the LOB was declared "nocache" each 80 byte chunk would require the block to be re-read.  The "workaround" to this was to "set long" and "set longchunksize" in SQL*Plus to larger values - possibly there's a similar configuration issue that you need to review in Informatica.

    As a test of hypothesis you could start the query running and check how many "SQL*Net message to client" and "SQL*Net more data to client" it takes to move a small subset of your larger LOBs.

    Regards

    Jonathan Lewis

  • User_OCZ1T
    User_OCZ1T Member Posts: 1,923 Gold Badge
    edited Jul 30, 2019 2:53PM

    As I see it, the CLOB column definition says " ENABLE STORAGE IN ROW  CHUNK  8192  RETENTION  NOCACHE", so in the table it is defined as NOCACHE.

    In the last SQL Monitor report posted, the number of fetch calls was 26415 for 26417 rows in the Rows (Actual) column.

    I counted the rows whose C10 value is longer than 4000 bytes, and there are very few (i.e. 81 out of 124k).

    SELECT max(length(C10)), count(*) FROM TAB1 where LENGTH (C10)<4000

    MAX(LENGTH(C10))    COUNT(*)

    3823                     124876

    SELECT max(length(C10)), count(*) FROM TAB1

    MAX(LENGTH(C10))    COUNT(*)

    18799                    124957

    Comparing a fetch of the CLOB column with a fetch as VARCHAR2(4000), I tested the performance for rows under 4000 bytes in length. The CLOB fetch is still significantly slower. In the two scenarios below we fetch the same amount of data from the CLOB column, so why is the CLOB fetch so much slower? I see the same effect when I run it from the SQL*Plus client.

    Fetching CLOB column C10 as it is:-

    SQL Text
    ------------------------------
    SELECT /*+monitor*/c1,c2,c3...TAB1.C10 FROM TAB1 where LENGTH (C10)<4000

    Global Information
    ------------------------------
     Status              :  EXECUTING
     Instance ID         :  4
     SQL ID              :  bwyw0v1h3pgxq
     SQL Execution ID    :  67108864
     Execution Started   :  07/30/2019 12:45:54
     First Refresh Time  :  07/30/2019 12:45:54
     Last Refresh Time   :  07/30/2019 12:57:25
     Duration            :  691s
     Module/Action       :  SQL*Plus/-
     Program             :  sqlplus.exe
     Fetch Calls         :  8526

    Global Stats
    =================================================
    | Elapsed |   Cpu   |  Other   | Fetch | Buffer |
    | Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
    =================================================
    |    0.30 |    0.03 |     0.27 |  8526 |   8794 |
    =================================================

    SQL Plan Monitoring Details (Plan Hash Value=2531190874)
    ============================================================================================================================================================
    | Id   |          Operation          |        Name        |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Activity | Activity Detail | Progress |
    |      |                             |                    | (Estim) |      | Active(s) | Active |       | (Actual) |   (%)    |   (# samples)   |          |
    ============================================================================================================================================================
    | -> 0 | SELECT STATEMENT            |                    |         |      |       693 |     +0 |     1 |     8534 |          |                 |          |
    | -> 1 |   TABLE ACCESS STORAGE FULL | TAB1               |    6248 | 8594 |       693 |     +0 |     1 |     8534 |          |                 |       2% |
    ============================================================================================================================================================

    Fetching CLOB column C10 using SUBSTR function:-

    SQL Monitoring Report

    SQL Text
    ------------------------------
    SELECT /*+monitor*/c1,c2,..DBMS_LOB.SUBSTR (C10,(SELECT MAX (LENGTH (C10)) FROM TAB1)) C10 FROM TAB1 where LENGTH (C10)<4000

    Global Information
    ------------------------------
     Status              :  DONE (ALL ROWS)
     Instance ID         :  4
     SQL ID              :  g0ya736m3sgdu
     SQL Execution ID    :  67108864
     Execution Started   :  07/30/2019 12:53:38
     First Refresh Time  :  07/30/2019 12:53:38
     Last Refresh Time   :  07/30/2019 12:53:45
     Duration            :  7s
     Module/Action       :  SQL*Plus/-
     Program             :  sqlplus.exe
     Fetch Calls         :  26

    Global Stats
    ===================================================================================================
    | Elapsed |   Cpu   |    IO    | Concurrency | PL/SQL  |  Other   | Fetch | Buffer | Read | Read  |
    | Time(s) | Time(s) | Waits(s) |  Waits(s)   | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
    ===================================================================================================
    |    5.11 |    1.58 |     0.78 |        0.00 |    1.22 |     2.76 |    26 |  81960 |  151 |   1MB |
    ===================================================================================================

    SQL Plan Monitoring Details (Plan Hash Value=3132348012)
    ================================================================================================================================================================
    | Id |          Operation           |        Name        |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Activity |         Activity Detail         |
    |    |                              |                    | (Estim) |      | Active(s) | Active |       | (Actual) |   (%)    |           (# samples)           |
    ================================================================================================================================================================
    |  0 | SELECT STATEMENT             |                    |         |      |         6 |     +1 |     1 |        1 |    83.33 | Cpu (1)                         |
    |    |                              |                    |         |      |           |        |       |          |          | SQL*Net more data to client (3) |
    |    |                              |                    |         |      |           |        |       |          |          | direct path read (1)            |
    |  1 |   SORT AGGREGATE             |                    |       1 |      |         1 |     +1 |     1 |        1 |          |                                 |
    |  2 |    TABLE ACCESS STORAGE FULL | TAB1               |    125K | 7212 |         1 |     +1 |     1 |     125K |          |                                 |
    |  3 |   TABLE ACCESS STORAGE FULL  | TAB1               |    6248 | 8594 |         7 |     +1 |     1 |     125K |    16.67 | Cpu (1)                         |
    ================================================================================================================================================================

    Predicate Information (identified by operation id):
    ---------------------------------------------------
       3 - filter(LENGTH("C10")<4000)

    Statistics
    ----------------------------------------------------------
             47  recursive calls
              0  db block gets
          81999  consistent gets
            163  physical reads
            572  redo size
       32115212  bytes sent via SQL*Net to client
            736  bytes received via SQL*Net from client
             26  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
         124876  rows processed

  • User_OCZ1T
    User_OCZ1T Member Posts: 1,923 Gold Badge
    edited Jul 30, 2019 3:26PM

    One thing I noticed: the fetch takes the same amount of time irrespective of the size of the LOB value. If I run the select for the ~81 rows that hold the larger CLOB values (length > 4000 bytes) and compare it with the select for another ~81 rows with length < 4000 bytes, both take the same amount of time.

    I tried running the query for ~1000 rows with length < 4000 bytes after "set longchunksize 200000" and "set long 200000", but the timing is similar to the run below (i.e. without either of these set in SQL*Plus).

    I fetched the CLOB column as it is for ~1000 rows with an arraysize of 5000 in SQL*Plus; below is the SQL Monitor report for that run.

    Edited:- I am not sure whether changing the storage option to CACHE will help us here, but I will try to test it on Dev.

    Global Information
    ------------------------------
     Status              :  DONE (ALL ROWS)
     Instance ID         :  4
     SQL ID              :  5mr5nx1x81dfg
     SQL Execution ID    :  67108864
     Execution Started   :  07/30/2019 14:57:58
     First Refresh Time  :  07/30/2019 14:57:58
     Last Refresh Time   :  07/30/2019 14:59:23
     Duration            :  85s
     Module/Action       :  SQL*Plus/-
     Program             :  sqlplus.exe
     Fetch Calls         :  1000

    Global Stats
    =================================================
    | Elapsed |   Cpu   |  Other   | Fetch | Buffer |
    | Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
    =================================================
    |    0.03 |    0.00 |     0.03 |  1000 |   1065 |
    =================================================

    SQL Plan Monitoring Details (Plan Hash Value=4129443724)
    ===========================================================================================================================================================
    | Id |                Operation                |        Name        |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Activity | Activity Detail |
    |    |                                         |                    | (Estim) |      | Active(s) | Active |       | (Actual) |   (%)    |   (# samples)   |
    ===========================================================================================================================================================
    |  0 | SELECT STATEMENT                        |                    |         |      |        86 |     +0 |     1 |      999 |          |                 |
    |  1 |   COUNT STOPKEY                         |                    |         |      |        86 |     +0 |     1 |      999 |          |                 |
    |  2 |    TABLE ACCESS STORAGE FULL FIRST ROWS | TAB1               |     999 | 2534 |        86 |     +0 |     1 |      999 |          |                 |
    ===========================================================================================================================================================

    Predicate Information (identified by operation id):
    ---------------------------------------------------
       1 - filter(ROWNUM<1000)
       2 - filter(LENGTH("C10")<4000)

    Statistics
    ----------------------------------------------------------
              1  recursive calls
              0  db block gets
           1068  consistent gets
              3  physical reads
              0  redo size
         775428  bytes sent via SQL*Net to client
         306930  bytes received via SQL*Net from client
           2000  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
            999  rows processed

  • mtefft
    mtefft Member Posts: 836 Gold Badge
    edited Jul 30, 2019 4:12PM

    We have run into this before, with JDBC and also with SSIS.

    If a column is CLOB/NCLOB and it is not null in a particular row, then (as Jonathan pointed out) there will be multiple trips to fetch the column value for that row. You can't avoid that.

    If the CLOB/NCLOB column is null in a particular row, then this does not happen.

    On the other hand, if a column is VARCHAR/NVARCHAR, you will fetch many rows in a single trip.

    You can see this behavior in a SQL trace.

    What to do? If you have a rather small number of rows with the actual CLOB data > 4000 bytes (i.e. too large for VARCHAR) then you can do this:

    SELECT

    CASE WHEN LENGTH(MY_CLOB_COLUMN) <= 4000 then CAST(MY_CLOB_COLUMN AS VARCHAR2(4000)) ELSE NULL END AS MY_CLOB_COLUMN_AS_VARCHAR2,

    CASE WHEN LENGTH(MY_CLOB_COLUMN) <= 4000 then NULL ELSE MY_CLOB_COLUMN END AS MY_CLOB_COLUMN_AS_CLOB,

    (note: if NCLOB then use 2000 instead of 4000)

    And the client application (Informatica) would have to marry these two columns (MY_CLOB_COLUMN_AS_VARCHAR2 and MY_CLOB_COLUMN_AS_CLOB) together.

    So the result is that the vast majority of your rows would have nulls in their CLOB columns, so the multi-trip behavior drops dramatically.

    But now your client has to deal with the two columns to get the single value you want. Don't try to put them back together within a SQL statement, you will just be putting yourself back where you started.

    If most of your CLOBs are too big to fit in a VARCHAR - don't bother.
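A minimal Python sketch of the client-side "marry these two columns" step described above. It assumes each fetched row has already been reduced to a pair (MY_CLOB_COLUMN_AS_VARCHAR2, MY_CLOB_COLUMN_AS_CLOB) of Python values. The function name and sample rows are hypothetical, and how Informatica would express this is not covered here.

```python
from typing import Optional


def merge_clob_columns(as_varchar2: Optional[str], as_clob: Optional[str]) -> Optional[str]:
    """Return whichever of the two projected columns carries the value.

    By construction of the CASE expressions at most one of them is non-null;
    if both are null the original CLOB was null.
    """
    if as_varchar2 is not None:
        return as_varchar2
    return as_clob


# Hypothetical sample rows: (MY_CLOB_COLUMN_AS_VARCHAR2, MY_CLOB_COLUMN_AS_CLOB)
rows = [
    ("short text", None),   # value fitted in VARCHAR2, CLOB projected as NULL
    (None, "x" * 5000),     # value too large, only the CLOB column is populated
    (None, None),           # original CLOB was NULL
]

merged = [merge_clob_columns(v, c) for v, c in rows]
print([None if m is None else len(m) for m in merged])
```

The merge happens on the client rather than in SQL because, as noted above, recombining the columns inside the statement would bring back a CLOB in every row.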

  • Jonathan Lewis
    Jonathan Lewis Member Posts: 9,786 Gold Crown
    edited Jul 30, 2019 6:22PM

    Just to clarify one important detail:

    If you're running this hacked select from SQL*Plus then, whatever you set the arraysize to be, Oracle will use single row processing because the query includes a CLOB column in the select list.  The reduction in round trips occurs because it always takes (at least) two round trips to fetch a row with a non-null CLOB: one to get the row with the LOB Locator and one to fetch the actual LOB content. The second trip is not needed if the locator is null (or, maybe, says that the LOB is empty, but I'd have to check that).

    There may be further roundtrips in SQL*Plus if you've done

    set long N

    set longchunksize M

    where N and M are numeric (the defaults are 80) and N is larger than M and the size of some of your LOBs is larger than M.

    Moreover, if your lobs are stored out of line and NOCACHE and it takes several chunks to read and display one lob then each of those reads will be a direct path read. There may be something about this on my blog (or old website, or in Practical Oracle 8i somewhere), but if not I'll write something up in the next few days.
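As a rough illustration of the round-trip reasoning above, the following Python sketch counts one trip per row plus one trip per chunk of each non-null LOB. It is a simplified model rather than a description of what the Oracle client actually does: the function name is made up, `chunk_size` stands in for the longchunksize setting, and an empty LOB is assumed to cost one extra trip, which is exactly the point left unchecked above.

```python
import math
from typing import Iterable, Optional


def estimated_round_trips(lob_lengths: Iterable[Optional[int]], chunk_size: int) -> int:
    """Estimate round trips for single-row fetches of rows holding one LOB each.

    lob_lengths: LOB length per row, or None where the LOB is null.
    chunk_size:  how much LOB data the client pulls per trip (assumed).
    """
    trips = 0
    for length in lob_lengths:
        trips += 1  # one trip for the row itself, including the LOB locator
        if length is not None:
            # at least one further trip for the content, more if it spans chunks
            trips += max(1, math.ceil(length / chunk_size))
    return trips


# 999 rows of short LOBs with a large chunk size: two trips per row.
print(estimated_round_trips([3000] * 999, chunk_size=200000))

# The same rows with an 80-byte chunk: many more trips per LOB.
print(estimated_round_trips([3000] * 999, chunk_size=80))

# Rows whose LOB is null need only the row fetch.
print(estimated_round_trips([None] * 999, chunk_size=80))
```

Under this model the 999-row test with "set longchunksize 200000" comes to 1998 trips, which is close to the 2000 "SQL*Net roundtrips to/from client" reported for that run earlier in the thread.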

    Regards

    Jonathan Lewis

  • Mustafa_KALAYCI
    Mustafa_KALAYCI Member Posts: 3,368 Bronze Crown
    edited Jul 31, 2019 1:35AM

    I would like to add something not really related to the OP's question. A CLOB of size 4000 will be stored in the LOB segment, not in the table. To be stored in the row it must be less than 4000 bytes (3964 to be precise). If your data length is 4000 it will be stored in the LOB segment and will require an additional fetch, as Jonathan said.

    Here are some examples of this: CLOB size matters!

  • AndrewSayer
    AndrewSayer Member Posts: 12,998 Gold Crown
    edited Jul 31, 2019 3:13AM
    Jonathan Lewis wrote: Just to clarify one important detail: If you're running this hacked select from SQL*Plus then, whatever you set the arraysize to be, Oracle will use single row processing because the query includes a CLOB column in the select list. The reduction in round trips occurs because it always takes (at least) two round trips to fetch a row with a non-null CLOB: one to get the row with the LOB Locator and one to fetch the actual LOB content. The second trip is not needed if the locator is null (or, maybe, says that the LOB is empty, but I'd have to check that). There may be further roundtrips in SQL*Plus if you've done "set long N" and "set longchunksize M" where N and M are numeric (the defaults are 80) and N is larger than M and the size of some of your LOBs is larger than M. Moreover, if your lobs are stored out of line and NOCACHE and it takes several chunks to read and display one lob then each of those reads will be a direct path read. There may be something about this on my blog (or old website, or in Practical Oracle 8i somewhere), but if not I'll write something up in the next few days. Regards, Jonathan Lewis

    I thought that the new lobPrefetch option in sql*plus would prevent the extra fetches by including lob values in the standard fetch but it doesn’t seem to behave like that for me - it’s probably for chunks.

    If you use odp.net, then you can use the initialLobFetchSize parameter to decide how much of a lob to pull through in the same fetch as the rest of the rows https://docs.oracle.com/en/database/oracle/oracle-database/12.2/odpnt/DataReaderInitialLOBFetchSize.html#GUID-E015AF50-1…

    I've used it in my own big fetches and it makes a huge difference.

    I believe Informatica uses the ODBC driver which doesn’t have such an option unfortunately. Do you have to use Informatica for this?

  • User_OCZ1T
    User_OCZ1T Member Posts: 1,923 Gold Badge
    edited Jul 31, 2019 5:54AM

    I tried modifying the query as you suggested, but I am not seeing any improvement. Below are the results for fetching ~1000 rows of LOBs with length < 4000 bytes, using both the CASE expression and SUBSTR.

    Using the CASE expression, taking the same time as fetching the whole LOB:-

    SQL Monitoring Report

    SQL Text
    ------------------------------
    SELECT c1,c2...,CASE WHEN LENGTH(C10) <= 4000 then CAST(C10 AS VARCHAR2(4000)) ELSE NULL END AS MY_CLOB_COLUMN_AS_VARCHAR2, CASE WHEN LENGTH(C10) > 4000 then NULL ELSE C10 END AS MY_CLOB_COLUMN_AS_CLOB FROM TAB1 where LENGTH (C10)<4000 and rownum<1000

    Global Information
    ------------------------------
     Status              :  DONE (ALL ROWS)
     Instance ID         :  4
     SQL ID              :  3m0j31tb3rd8x
     SQL Execution ID    :  67108864
     Execution Started   :  07/31/2019 05:41:21
     First Refresh Time  :  07/31/2019 05:41:21
     Last Refresh Time   :  07/31/2019 05:42:46
     Duration            :  85s
     Module/Action       :  SQL*Plus/-
     Program             :  sqlplus.exe
     Fetch Calls         :  1000

    Global Stats
    =================================================
    | Elapsed |   Cpu   |  Other   | Fetch | Buffer |
    | Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
    =================================================
    |    0.02 |    0.00 |     0.02 |  1000 |   1065 |
    =================================================

    SQL Plan Monitoring Details (Plan Hash Value=4129443724)
    ===========================================================================================================================================================
    | Id |                Operation                |        Name        |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Activity | Activity Detail |
    |    |                                         |                    | (Estim) |      | Active(s) | Active |       | (Actual) |   (%)    |   (# samples)   |
    ===========================================================================================================================================================
    |  0 | SELECT STATEMENT                        |                    |         |      |        86 |     +0 |     1 |      999 |          |                 |
    |  1 |   COUNT STOPKEY                         |                    |         |      |        86 |     +0 |     1 |      999 |          |                 |
    |  2 |    TABLE ACCESS STORAGE FULL FIRST ROWS | TAB1               |     999 | 2335 |        86 |     +0 |     1 |      999 |          |                 |
    ===========================================================================================================================================================

    Predicate Information (identified by operation id):
    ---------------------------------------------------
       1 - filter(ROWNUM<1000)
       2 - filter(LENGTH("C10")<4000)

    Statistics
    ----------------------------------------------------------
              1  recursive calls
              0  db block gets
           1068  consistent gets
              3  physical reads
              0  redo size
         608700  bytes sent via SQL*Net to client
         306929  bytes received via SQL*Net from client
           2000  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
            999  rows processed

    Using the SUBSTR function, finishing in 3 seconds:-

    SELECT c1,c2....,DBMS_LOB.SUBSTR (C10,(SELECT MAX (LENGTH (C10)) FROM TAB1)) C10 FROM TAB1 where LENGTH (C10)<4000 and rownum<1000

    999 rows selected.

    Elapsed: 00:00:03.59

    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 2117026829

    ------------------------------------------------------------------------------------------------------------
    | Id  | Operation                             | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                      |                    |   999 |   195K|  2335   (1)| 00:00:29 |
    |   1 |  SORT AGGREGATE                       |                    |     1 |   164 |            |          |
    |   2 |   TABLE ACCESS STORAGE FULL           | TAB1               |   151K|    23M|  7212   (1)| 00:01:27 |
    |*  3 |  COUNT STOPKEY                        |                    |       |       |            |          |
    |*  4 |   TABLE ACCESS STORAGE FULL FIRST ROWS| TAB1               |   999 |   195K|  2335   (1)| 00:00:29 |
    ------------------------------------------------------------------------------------------------------------

    Predicate Information (identified by operation id):
    ---------------------------------------------------
       3 - filter(ROWNUM<1000)
       4 - filter(LENGTH("C10")<4000)

    Statistics
    ----------------------------------------------------------
             47  recursive calls
              0  db block gets
          41131  consistent gets
             14  physical reads
            572  redo size
          25154  bytes sent via SQL*Net to client
           1197  bytes received via SQL*Net from client
             68  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
            999  rows processed
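To put the two 999-row runs side by side, here is a small Python calculation over figures copied from the reports in this thread (the 85s duration and 2000 round trips for the CLOB fetch, and the 3.59s elapsed time and 68 round trips for the DBMS_LOB.SUBSTR fetch). The dictionary layout and names are purely illustrative, and wall-clock time per round trip is only a crude indicator because it includes server work as well as network latency.

```python
# Figures copied from the SQL Monitor / autotrace output posted in the thread.
runs = {
    "clob_as_is": {"rows": 999, "round_trips": 2000, "elapsed_s": 85.0},
    "dbms_lob_substr": {"rows": 999, "round_trips": 68, "elapsed_s": 3.59},
}

summary = {}
for name, run in runs.items():
    summary[name] = {
        # how many client/server round trips each row cost on average
        "trips_per_row": run["round_trips"] / run["rows"],
        # average wall-clock milliseconds per round trip
        "ms_per_trip": 1000 * run["elapsed_s"] / run["round_trips"],
    }

for name, stats in summary.items():
    print(f"{name}: {stats['trips_per_row']:.3f} trips/row, "
          f"{stats['ms_per_trip']:.1f} ms/trip")
```

The per-trip cost is of the same order in both runs, while the CLOB fetch needs about two trips per row against a small fraction of a trip per row for the VARCHAR2 result, which is consistent with the view expressed above that the time goes on network round trips.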