Forum Stats

  • 3,824,981 Users
  • 2,260,447 Discussions


Split 2 comma delimited columns and create rows

BeefStu Member Posts: 326 Blue Ribbon

I'm trying to produce the following output

col1    col2
aaa    qq
bbb    ww
ccc    ee

As you can see from my attempt I have hardcoded VALUES, which only produces one row.

I'm looking to get a count from both columns and iterate through the data. If the counts aren't the same for both columns then I want to use a NULL where the count is less and match it with a value.

Below is my simple attempt that is incomplete. Any help would be greatly appreciated.

select regexp_substr('aaa,bbb,ccc', '[^,]+', 1, 1) as col1, 
       regexp_substr('qq, ww, ee', '[^,]+', 1, 1) as col2 from dual



  • mathguy
    mathguy Member Posts: 10,538 Blue Diamond
    select regexp_substr('aaa,bbb,ccc', '[^,]+', 1, level) as col1,
           regexp_substr('qq, ww, ee', '[^,]+', 1, level) as col2
    from dual
    connect by level <= greatest(
      regexp_count('aaa,bbb,ccc', '[^,]+'),
      regexp_count('qq, ww, ee', '[^,]+')

    Your inputs are inconsistent: the first string has just comma as delimiter, but the second has comma + space. If that's also your real-life data, the solution shown above will preserve the leading spaces; you may trim them in the select list if needed.

    Will all the tokens be non-empty? Or can the first string look like 'xxx,,zzz' (indicating that the middle token is the empty string, the same as NULL in Oracle)? If there can be empty-string (NULL) tokens, your approach will not include them in the output - and perhaps mess up the ordering (then zzz would be paired with the second token from the other comma-separated string, instead of the last token). If that is a concern, you must change the + quantifier to * in all places.

    I didn't want to assume the two comma-separated list have the same number of tokens; if they always do, you can simplify the CONNECT BY condition, you don't need the GREATEST function there.

    This will work if you have just two "matched" comma-separated strings as input. If you have a table with several rows, and in each row you have two such comma-separated strings, and you need a query that will split all those comma-separated strings keeping track of the matching, the query will be a bit more complicated (but not by much). Anyway, I am leaving that out for now, since you didn't say anything about that.