Forum Stats

  • 3,874,734 Users
  • 2,266,768 Discussions
  • 7,911,961 Comments

Discussions

New PL/SQL datatype to deal with strings larger than 32K

user4377404
user4377404 Member Posts: 6 Green Ribbon
edited Nov 1, 2018 5:48AM in Database Ideas - Ideas

Currently, when there's a need to deal with texts, which are longer than 32K, you must use the CLOB.

What I'd like to have is datatype, the same as VARCHAR2, but without max size limits (just like String in C++).

Why not just use CLOB? Because it's just so much slower than VARCHAR2 and (I believe) internally is converted to VARCHAR2 in a lot of cases.

- You would not be able to have it as a database type (there's CLOB for this already)

- You'd be able to assign varchar2/CLOB to it

- You'd be able to assign it to varchar2/CLOB (error about 32K varchar2 would be OK)

- If it becomes too big and session runs out of the memory, that's fine. We'll deal with it.

- Make sure that Replace, Instr, etc. would work with it without 32K varchar2 limits.

- There would be possibility to pre-allocate size for it - for the performace reasons.

- There would be possibility to access individual chars like in the array - for the performace reasons.

So in the code it would look like:

declare

  dummy STRING;

  v     VARCHAR2(100);

  c     CLOB;

begin 

  v := 'bla bla';

  c := 'bla bla';

  dummy := 'Hello world'; -- it would auto allocate itself

  dummy := dummy || v; -- this would work

  dummy := dummy || c; -- this would work as well

  dummy[1] := 'A'; -- change H to A

  set_size(dummy, 1024 * 1024, ' '); -- set it's size to 1M, pre-populate with spaces

  dummy[33333] := 'x'; -- set 33333rd element to x

  dummy[11133333] := 'x'; --  this would throw an out of bounds error

  c := dummy; -- this should always work

  v := dummy; -- this would throw error about 32K limit, which is fine

end;

-- this would work

type t_string_tab is table of STRING index by binary_integer;

-- this would work

type t_rec is record (

s1 string

s2 string);

-- this would work

type t_string_tab is table of t_rec index by binary_integer;

user4377404
11 votes

Active · Last Updated

Comments

  • Mike Kutz
    Mike Kutz Member Posts: 6,254 Gold Crown

    Shhh.... It's a Secret:

    Extended Type VARCHAR2 strings (ie those with length > 4000) are actually CLOBs.  (At least, those that are stored in Tables)

    I see no purpose for a 3rd string-based data type.

  • Sven W.
    Sven W. Member Posts: 10,559 Gold Crown
    edited Nov 1, 2018 10:16AM

    So your STRING type would behave almost excatly as a CLOB, with the only difference, that you must set the maximum size? Correct?

    I just don't understand where the advantage comes into place.

    working example

    set serveroutput on size unlimiteddeclare  v_clob clob;begin   v_clob := 'Hello world';  v_clob := v_clob || ' bla bla';  -- change H to A    DBMS_LOB.WRITE(v_clob,amount => 1, offset => 1, buffer=>'A');      dbms_output.put_line(v_clob);end;/

    Aello world bla bla

    PL/SQL procedure successfully completed.

  • user4377404
    user4377404 Member Posts: 6 Green Ribbon

    You, guys, really think I don't know what CLOB is

    Take a look, this is what I'm talking about.

    This is, of course, very simplified, but here you can see that VARCHAR (and possibly STRING) version is at least 3-4 times faster.

    Performance matters to me.

    DECLARE

      TYPE t_v_tab IS TABLE OF VARCHAR2(32767) INDEX BY BINARY_INTEGER;

      l_v_tab t_v_tab;

      TYPE t_clob_tab IS TABLE OF CLOB INDEX BY BINARY_INTEGER;

      l_clob_tab t_clob_tab;

      l_start   NUMBER;

    BEGIN

      l_start := dbms_utility.get_time;

      FOR nn IN 1 .. 100000 LOOP

        l_clob_tab(nn) := 'aaa';

        l_clob_tab(nn) := 'b' || substr(l_clob_tab(nn), 2);

      END LOOP;

      dbms_output.put_line('CLOBS: Elapsed miliseconds=' || to_char(dbms_utility.get_time - l_start));

      l_start := dbms_utility.get_time;

      FOR nn IN 1 .. 100000 LOOP

        l_v_tab(nn) := 'aaa';

        l_v_tab(nn) := 'b' || substr(l_v_tab(nn), 2);

      END LOOP;

      dbms_output.put_line('VARCHARS: Elapsed miliseconds=' || to_char(dbms_utility.get_time - l_start));

    END;

  • Mike Kutz
    Mike Kutz Member Posts: 6,254 Gold Crown

    You, guys, really think I don't know what CLOB is

    Take a look, this is what I'm talking about.

    This is, of course, very simplified, but here you can see that VARCHAR (and possibly STRING) version is at least 3-4 times faster.

    Performance matters to me.

    DECLARE

      TYPE t_v_tab IS TABLE OF VARCHAR2(32767) INDEX BY BINARY_INTEGER;

      l_v_tab t_v_tab;

      TYPE t_clob_tab IS TABLE OF CLOB INDEX BY BINARY_INTEGER;

      l_clob_tab t_clob_tab;

      l_start   NUMBER;

    BEGIN

      l_start := dbms_utility.get_time;

      FOR nn IN 1 .. 100000 LOOP

        l_clob_tab(nn) := 'aaa';

        l_clob_tab(nn) := 'b' || substr(l_clob_tab(nn), 2);

      END LOOP;

      dbms_output.put_line('CLOBS: Elapsed miliseconds=' || to_char(dbms_utility.get_time - l_start));

      l_start := dbms_utility.get_time;

      FOR nn IN 1 .. 100000 LOOP

        l_v_tab(nn) := 'aaa';

        l_v_tab(nn) := 'b' || substr(l_v_tab(nn), 2);

      END LOOP;

      dbms_output.put_line('VARCHARS: Elapsed miliseconds=' || to_char(dbms_utility.get_time - l_start));

    END;

    Use the right tool for the job

    If you need to do complex string manipulation, on large string (>32k), in an efficient manner:  Use Java.

    And since 18c XE now includes Java ... you have no excuse.

    My $0.02

    MK

  • Sven W.
    Sven W. Member Posts: 10,559 Gold Crown
    edited Nov 1, 2018 2:44PM

    You, guys, really think I don't know what CLOB is

    Take a look, this is what I'm talking about.

    This is, of course, very simplified, but here you can see that VARCHAR (and possibly STRING) version is at least 3-4 times faster.

    Performance matters to me.

    DECLARE

      TYPE t_v_tab IS TABLE OF VARCHAR2(32767) INDEX BY BINARY_INTEGER;

      l_v_tab t_v_tab;

      TYPE t_clob_tab IS TABLE OF CLOB INDEX BY BINARY_INTEGER;

      l_clob_tab t_clob_tab;

      l_start   NUMBER;

    BEGIN

      l_start := dbms_utility.get_time;

      FOR nn IN 1 .. 100000 LOOP

        l_clob_tab(nn) := 'aaa';

        l_clob_tab(nn) := 'b' || substr(l_clob_tab(nn), 2);

      END LOOP;

      dbms_output.put_line('CLOBS: Elapsed miliseconds=' || to_char(dbms_utility.get_time - l_start));

      l_start := dbms_utility.get_time;

      FOR nn IN 1 .. 100000 LOOP

        l_v_tab(nn) := 'aaa';

        l_v_tab(nn) := 'b' || substr(l_v_tab(nn), 2);

      END LOOP;

      dbms_output.put_line('VARCHARS: Elapsed miliseconds=' || to_char(dbms_utility.get_time - l_start));

    END;

    user4377404 wrote:You, guys, really think I don't know what CLOB is Take a look, this is what I'm talking about. This is, of course, very simplified, but here you can see that VARCHAR (and possibly STRING) version is at least 3-4 times faster.Performance matters to me. ...

    Thanks for providing some sample code. This is certainly helpful.

    I think you know what a CLOB is. However I don't think you know how to use CLOBs in the most optimal way.

    I'm not saying I know the most optimal way to work with clobs. However if I have an performance issue, then I try to find out where I loose the performance instead of suggesting a new datatype. And in your example I see several potential candidates to improve performance.

    For example the SUBSTR function can accept and return a CLOB, In my experience one must use DBMS_LOB.SUBSTR if you want a fast clob operation, or even better DBMS_LOB.writeappend.
    In your code the main culprit is that using SUBSTR (even if it would be DBMS_LOB.SUBSTR) is way to much work. You are just replacing the first letter. Instead of doing that, you are creating a new clob out of two parts and then use that to overwrite the existing clob. One other issue is that in some cases a temp clob should be used. Your example code is such a case. However it might be different for the real code. Also I doubt why you want a table of CLOBs in expensive plsql memory. For a real table performance might differ greatly because of this.

    I did run some tests with code variations. Here are the results:

    The original "benchmark":

    CLOBS: Elapsed miliseconds=4296

    VARCHARS: Elapsed miliseconds=7

      l_buffer :=  'aaa';  FOR nn IN 1 .. 100000 LOOP    l_clob_tab(nn) := l_buffer;    dbms_lob.write(l_clob_tab(nn),1,1, 'b');  END LOOP;

    dbms_lob.CLOBS: Elapsed miliseconds=2633

    VARCHARS: Elapsed miliseconds=7

      dbms_lob.createtemporary(l_clob,true);  l_buffer :=  'aaa';  FOR nn IN 1 .. 100000 LOOP    l_clob := l_buffer;    dbms_lob.write(l_clob,1,1, 'b');  END LOOP;

    single temp CLOB: Elapsed miliseconds=513

    VARCHARS: Elapsed miliseconds=7

      dbms_lob.createtemporary(l_clob,true);  l_buffer :=  'aaa';  FOR nn IN 1 .. 100000 LOOP    dbms_lob.write(l_clob,length(l_buffer),1, l_buffer);    dbms_lob.write(l_clob,1,1, 'b');  END LOOP;

    reusing single temp CLOB: Elapsed miliseconds=218

    VARCHARS: Elapsed miliseconds=8

    I'm not saying all those modifications can be used in your real code. but it gives an indication where most of the time is spend. And a large part is the initialization of the 10000 clobs and moving them to a collection in expensive PGA.

    However instead of suggesting a new datatype, why not suggesting to improve the performance of CLOB operations. If you backup that suggestion by some benchmark code example (as you did already) then you might find that many more users would vote for it and not against it.

    Billy Verreynne
  • Billy Verreynne
    Billy Verreynne Software Engineer Member Posts: 28,945 Red Diamond

    You, guys, really think I don't know what CLOB is

    Take a look, this is what I'm talking about.

    This is, of course, very simplified, but here you can see that VARCHAR (and possibly STRING) version is at least 3-4 times faster.

    Performance matters to me.

    DECLARE

      TYPE t_v_tab IS TABLE OF VARCHAR2(32767) INDEX BY BINARY_INTEGER;

      l_v_tab t_v_tab;

      TYPE t_clob_tab IS TABLE OF CLOB INDEX BY BINARY_INTEGER;

      l_clob_tab t_clob_tab;

      l_start   NUMBER;

    BEGIN

      l_start := dbms_utility.get_time;

      FOR nn IN 1 .. 100000 LOOP

        l_clob_tab(nn) := 'aaa';

        l_clob_tab(nn) := 'b' || substr(l_clob_tab(nn), 2);

      END LOOP;

      dbms_output.put_line('CLOBS: Elapsed miliseconds=' || to_char(dbms_utility.get_time - l_start));

      l_start := dbms_utility.get_time;

      FOR nn IN 1 .. 100000 LOOP

        l_v_tab(nn) := 'aaa';

        l_v_tab(nn) := 'b' || substr(l_v_tab(nn), 2);

      END LOOP;

      dbms_output.put_line('VARCHARS: Elapsed miliseconds=' || to_char(dbms_utility.get_time - l_start));

    END;

    user4377404 wrote:You, guys, really think I don't know what CLOB is 

    Sure sounds like it - as a CLOB is and never was like a string in the PGA.

    A flat file can contain structured data. So too does a database. Trying to create a false equivalence between them, as with strings and LOBs, is silly and ignorant.