3 Replies Latest reply on Aug 2, 2020 8:03 PM by Frank Kulash

    Column split in to rows with delimeter "|" value have null values

    user7324681

      Hi

       

      Currently am using oracle 12c version

       

      my current date have below format

       

        

      IDSAMPLE
      1|||aa|bb||ccc

       

      select '1' as id,'|||aa|bb||ccc' as sample from dual

       

      expected output like .i have used regular expression its skipping the null value

       

         

      IDABCDEFGH
      1 aabb cc
        • 1. Re: Column split in to rows with delimeter "|" value have null values
          Frank Kulash

          Hi,

           

          Whenever you have a question, please post a little sample data (CREATE TABLE and INSERT statements, relevant columns only) for all the tables involved, and the exact results you want from that data, so that the people who want to help you can re-create the problem and test their ideas.

          Explain, using specific examples, how you get those results from that data.

          There is no Oracle version 12b or 12d, so it's rather silly to say you're using 12c.  Always say exactly what version of Oracle you're using (e.g. 12.2.0.2.0).

          See the forum FAQ: Re: 2. How do I ask a question on the forums?

           

          user7324681 wrote:

           

          Hi

           

          Currently am using oracle 12c version

           

          my current date have below format

           

           

          IDSAMPLE
          1|||aa|bb||ccc

           

          select '1' as id,'|||aa|bb||ccc' as sample from dual

           

          expected output like .i have used regular expression its skipping the null value

           

           

          IDABCDEFGH
          1aabbcc

          The subject line you chose for this thread was "Column split in to rows ...".  Do you really want the column slit into rows, or do you wqnt it split into multiple culumns, as shown?

          DO you really want two blank columns (or rows) between 'bb' and 'cc'?

           

          To get the N-th item (possibly NULL) from a '|'-delimited list, you can use

          REGEXP_SUBSTR ( sample

                        , '([^|]*)(\||$)'

                        , 1

                        , N    -- an integer > 0

                        , NULL

                        , 1

                        )

          The 6th argument to REGEXP_SUBSTR is like a back-reference.  1 means "Don't repeat the whole sub-string that matches the 2nd argument; return only what matches the part inside the 1st set of parentheses."

          1 person found this helpful
          • 2. Re: Column split in to rows with delimeter "|" value have null values
            mathguy

            Your problem can be broken up into two parts. (That doesn't necessarily mean that the solution must also break into two parts - the one I show below does everything in a single pass.)

             

            First you need to split the input (or inputs) into component tokens. That part is relatively easy. Then, you want to pivot the result, so that the tokens are shown in a single row, in separate columns. This may be a problem. A table (at least in a relational database) has a fixed number of columns - meaning that every row must have the same number of column values. That is fine if in your input all "sample" strings have the same number of pipe separators. But what will you ask us to do if some inputs have seven pipes, some have five, and some have none at all?

             

            Moreover, with "standard" SQL the number of columns in the output must be known when you write the query, it can't depend on the data. You can't say "all the sample input strings will have the same number of delimiters, but we don't know ahead of time what that number will be." If you need the result in an unknown number of columns, depending on the data, you will need to use dynamic SQL, or - better - a reporting tool (interface) that can do the same under the hood.

             

            So, back to your problem. Here is a (relatively) efficient to do it in a single pass, assuming that the output must have exactly nine columns - one for the ID plus eight more for the tokens. In the inputs I chose for testing, I included a "sample" that is too short (in that case, the remaining tokens are filled with NULL) or too long (in that case, only the first eight tokens are present in the output, any remaining tokens are simply discarded). I said "relatively" efficient because I used REGEXP_*; if speed becomes an issue, this can be rewritten to use standard INSTR and SUBSTR functions, which should do the same work faster (at the cost of slightly more complicated code).

             

            with

              inputs (id, sample) as (

                select 1, '|||aa|bb||ccc' from dual union all

                select 2, '|||||||'       from dual union all

                select 3, 'ab|cd|e'       from dual union all

                select 4, 'a|||b|||c|d|e' from dual

              )

            select id

                 , regexp_substr(sample, '([^|]*)(\||$)', 1, 1, null, 1) as a

                 , regexp_substr(sample, '([^|]*)(\||$)', 1, 2, null, 1) as b

                 , regexp_substr(sample, '([^|]*)(\||$)', 1, 3, null, 1) as c

                 , regexp_substr(sample, '([^|]*)(\||$)', 1, 4, null, 1) as d

                 , regexp_substr(sample, '([^|]*)(\||$)', 1, 5, null, 1) as e

                 , regexp_substr(sample, '([^|]*)(\||$)', 1, 6, null, 1) as f

                 , regexp_substr(sample, '([^|]*)(\||$)', 1, 7, null, 1) as g

                 , regexp_substr(sample, '([^|]*)(\||$)', 1, 8, null, 1) as h

            from   inputs

            ;

             

             

               ID A    B    C    D    E    F    G    H

            ----- ---- ---- ---- ---- ---- ---- ---- ----

                1                aa   bb        ccc    

                2                                      

                3 ab   cd   e                          

                4 a              b              c    d

            • 3. Re: Column split in to rows with delimeter "|" value have null values
              Frank Kulash

              Hi,

               

              If performance is important, avoid REGEXP_SUBSTR.  You can use SUBSTR instead.

              To get each sub-string on a separate row, you can do this:

              SELECT    i.id

              ,         c.n

              ,         SUBSTR (i.sample, c.p1, c.p2 - c.p1)  AS sub_str

              FROM      inputs  i

              CROSS APPLY  (

                               SELECT  LEVEL                                   AS n

                               ,       INSTR ('|' || i.sample, '|', 1, LEVEL)  AS p1

                               ,       INSTR (i.sample || '|', '|', 1, LEVEL)  AS p2

                               FROM    dual

                               CONNECT BY  LEVEL <= 1 + REGEXP_COUNT (i.sample, '\|')

                           )    c

              ORDER BY  i.id, c.n

              ;

               

              To get the sub-strings as separate columns in a single row, you can start with the exact same query (only without an ORDER BY clause), and then pivot the results:

              WITH    sub_strs_on_rows    AS

              (

                  SELECT    i.id

                  ,         c.n

                  ,         SUBSTR (i.sample, c.p1, c.p2 - c.p1)  AS sub_str

                  FROM      inputs  i

                  CROSS APPLY  (

                                   SELECT  LEVEL                                   AS n

                                   ,       INSTR ('|' || i.sample, '|', 1, LEVEL)  AS p1

                                   ,       INSTR (i.sample || '|', '|', 1, LEVEL)  AS p2

                                   FROM    dual

                                   CONNECT BY  LEVEL <= 1 + REGEXP_COUNT (i.sample, '\|')

                               )    c

              )

              SELECT    *

              FROM      sub_strs_on_rows

              PIVOT     (    MIN (sub_str)

                        FOR  n  IN ( 1  AS a

                                   , 2  AS b

                                   , 3  AS c

                                   , 4  AS d

                                   , 5  AS e

                                   , 6  AS f

                                   , 7  AS g

                                   , 8  AS h

                                   )

                        )

              ORDER BY  id

              ;

              CROSS APPLY was a new feature in Oracle 12.1.