3 Replies Latest reply on Aug 18, 2009 5:41 PM by William Robertson

    i don't won't to tokenize in plsql

    423413
      10g.

      i have a table

      create table test (id integer, identifiers varchar2(200))

      with the data

      insert into test values (1, 'Bob,Jon,Liz');
      insert into test values (2, 'Luc,Sue');

      what's the best way to query the table to get the resultset

      1 Bob
      1 Jon
      1 Liz
      2 Luc
      2 Sue

      thanks in advance
        • 1. Re: i don't won't to tokenize in plsql
          Frank Kulash
          Hi,

          Welcome to the forum!

          Here's how you can do that in SQL.
          In PL/SQL, you can simplify parts of it. For example, the "counter table" cntr is needed because SQL does not have LOOPs, but PL/SQL does.
          /*     How to Split a Delimited String
          
          This shows how to take a single row with a delimited string, such as
          
               Animal     amoeba,bat,cedusa,dodo
          
          and transform it into multiple rows:
          
               Animal     1     amoeba
               Animal     2     bat
               Animal     3     cedusa
               Animal     4     dodo
          
          */
          
          PROMPT     ==========  -1. sep_char parameter  ==========
          
          VARIABLE     sep_char     VARCHAR2 (10)
          
          EXECUTE     :sep_char := ',';
          
          SELECT     :sep_char     AS sep_char
          FROM     dual;
          
          
          PROMPT     ==========  0. string_test table  ==========
          
          DROP TABLE     string_test;
          
          CREATE TABLE     string_test
          (     grp_name     VARCHAR2 (10)
          ,     list_txt     VARCHAR2 (50)
          );
          
          INSERT INTO string_test (grp_name, list_txt) VALUES ('Animal',     'amoeba,bat,cedusa,dodo');
          INSERT INTO string_test (grp_name, list_txt) VALUES ('Date',     '15-Oct-1582,16-Oct-2008');
          INSERT INTO string_test (grp_name, list_txt) VALUES ('Nothing',     NULL);
          INSERT INTO string_test (grp_name, list_txt) VALUES ('Place',     'New York');
          INSERT INTO string_test (grp_name, list_txt) VALUES ('Skip',     'Hop,,Jump');
          
          
          SELECT     *
          FROM     string_test
          ORDER BY     grp_name;
          
          
          PROMPT     ==========  Q1.  Oracle 11 Query  ==========
          
          WITH     cntr     AS          -- Requires Oracle 9
          (     -- Begin sub-query cntr, to generate n (1, 2, 3, ...)
               SELECT     LEVEL     AS n     -- Requires Oracle 9
               FROM     dual
               CONNECT BY     LEVEL     <= 1 +     (
                                   SELECT     MAX ( REGEXP_COUNT (list_txt, :sep_char) )     -- Requires Oracle 11
                                   FROM     string_test
                                   )                                        
          )     -- End sub-query cntr, to generate n (1, 2, 3, ...)
          SELECT     grp_name
          ,     n
          ,     REGEXP_SUBSTR     ( list_txt     -- Requires Oracle 10
                         , '[^' || :sep_char || ']'     -- Anything except sep_char ...
                              || '+'               -- ... one or more times
                         , 1
                         , n
                         )     AS item_txt
          FROM     string_test
          JOIN     cntr                                   -- Requires Oracle 9
          ON     n     <= 1 + REGEXP_COUNT (list_txt, :sep_char)     -- Requires Oracle 11
          ORDER BY     grp_name
          ,          n;
          
          /*     Notes:
          
               REGEXP_SUBSTR (s, '[^,]+', 1, n)
          
          returns the n-th item in a comma-delimited list s.
          If there are fewer than n items, it returns NULL.
          One or more consecutive characters other than comma make an item, so 
          'Hop,,Jump' has two items, the second one being 'Jump'.
          
          The sub-query cntr produces a list of integers 1, 2, 3, ..., w
          where w is the worst-case (the largest number of items in any list).
          This actually counts separators, not items, (e.g., it counts both
          commas in 'Hop,,Jump', even though), so the w it produces may be
          larger than is really necessary.  No real harm is done.
          */
          
          PROMPT     ==========  Q2. Possible Problems Fixed  ==========
          
          WITH     cntr     AS
          (     -- Begin sub-query cntr, to generate n (1, 2, 3, ...)
               SELECT     LEVEL     AS n
               FROM     dual
               CONNECT BY     LEVEL     <= 1 +     (
                                   SELECT     MAX ( REGEXP_COUNT (list_txt, :sep_char) )
                                   FROM     string_test
                                   )                                        
          )     -- End sub-query cntr, to generate n (1, 2, 3, ...)
          SELECT     grp_name
          ,     n
          ,     REGEXP_SUBSTR     ( list_txt
                         , '[^' || :sep_char || ']'     -- Anything except sep_char ...
                              || '+'               -- ... one or more times
                         , 1
                         , n
                         )     AS item_txt
          FROM     string_test
          JOIN     cntr          ON n     <= 1 + NVL     ( REGEXP_COUNT (list_txt, :sep_char)     -- Problem (1)
                                        , 0
                                        )
          WHERE     REGEXP_SUBSTR     ( list_txt     -- Problem (2)
                         , '[^' || :sep_char || ']'     -- Anything except sep_char ...
                              || '+'               -- ... one or more times
                         , 1
                         , n
                         )     IS NOT NULL
          OR     list_txt     IS NULL          -- Problems (1) and (2) together
          ORDER BY     grp_name
          ,          n;
          
          /*
               (Possible) Problems and Fixes
          
          (1) If list_txt IS NULL, then REGEXP_COUNT (list_txt, :sep_char)
               returns NULL, the join condition fails, and the output
               contains nothing corresponding to the row from string_test.
               If you want a NULL item to appear in the results, use
               NVL to make sure the expression returns 0 instead of NULL.
          (2) If list_txt contains multiple consecutive sep_chars (or if it
               begins or ends with sep_char, then the original query
               will return NULL items.  To suppress these, add a WHERE
               clause to test that the item_txt to be displayed IS NOT NULL.
          */
          
          PROMPT     ==========  Q3. Oracle 8.1 Query  ===========
          
          SELECT     grp_name
          ,     n
          ,     SUBSTR     ( list_txt
                    , begin_pos
                    , end_pos - begin_pos
                    )     AS item_txt
          FROM     (     -- Begin sub-query to compute begin_pos and end_pos
               SELECT     grp_name
               ,     n
               ,     list_txt
               ,     INSTR     ( :sep_char || list_txt
                         , :sep_char
                         , 1
                         , n
                         )     AS begin_pos
               ,     INSTR     ( list_txt || :sep_char
                         , :sep_char
                         , 1
                         , n
                         )     AS end_pos
               FROM     string_test
               ,     (     -- Begin sub-query cntr, to generate n (1, 2, 3, ...)
                    SELECT     ROWNUM     AS n
                    FROM     all_objects
                    WHERE     ROWNUM     <= 1 +     (
                                   SELECT     MAX     ( LENGTH (list_txt)
                                             - LENGTH (REPLACE (list_txt, :sep_char))
                                             )
                                   FROM     string_test
                                   )                                        
                    )     -- End sub-query cntr, to generate n (1, 2, 3, ...)
                    cntr
               WHERE     n     <= 1 +     ( LENGTH (list_txt)
                              - LENGTH (REPLACE (list_txt, :sep_char))
                              )
               )     -- End sub-query to compute begin_pos and end_pos
          ORDER BY     grp_name
          ,          n;
          
          /*     Version-Dependent Features and Work-Arounds
          
          The code above, Q3, runs in Oracle 8.1.
          The following changes were made to Q1:
          
          (11) REGEXP_COUNT was introduced in Oracle 11.
               In earlier versions, to find the number of sep_chars in list_txt,
               see how much the LENGTH changes when sep_chars are removed.
          (10) REGEXP_SUBSTR was introduced in Oracle 10.
               In earlier versions, use INSTR to find where the sep_chars are,
               and use SUBSTR to get the sub-strings between them.
               (Using this technique, 'Hop,,Jump' still contains three items,
               but now item 2 IS NULL and item 3 is 'Jump'.)
          (9.a) The WITH-clause was introduced in Oracle 9
               In earlier versions, use in-line views.
          (9.b) "CONNECT BY LEVEL < constant" doesn't work in Oracle 8.
               Use ROWNUM from any sufficiently large table or view instead.
          (9.c) ANSII join notation (JOIN table_name ON ...) was introduced in Oracle 9
               In earlier versions, join condition go in a WHERE-clause.
          */
          • 2. Re: i don't won't to tokenize in plsql
            GVR
            with  test as(
            select 1 id , 'Bob,Jon,Liz' name_list from dual union all
            select 2, 'Luc,Sue' from dual union all
            select 3 , 'aaa,bbb,ccc' from dual union all
            select 4, 'dd,ee' from dual)
            SELECT id, name_list
              FROM (SELECT id, REGEXP_SUBSTR (xyz, '[^,]+', 1, LEVEL)  name_list,LEVEL l1,
                               LAG (LEVEL, 1, 0) OVER (PARTITION BY id ORDER BY LEVEL) l2
                          FROM (SELECT id, ',' || name_list xyz FROM test)
                    CONNECT BY REGEXP_SUBSTR (xyz, '[^,]+', 1, LEVEL) IS NOT NULL)
             WHERE l1 != l2;
            Out put
            ID     NAME_LIST
            1     Bob
            1     Jon
            1     Liz
            2     Luc
            2     Sue
            3     aaa
            3     bbb
            3     ccc
            4     dd
            4     ee
            • 3. Re: i don't won't to tokenize in plsql
              William Robertson
              You could also avoid string tokenising by storing the data in a normalised form to begin with ;)