13 Replies Latest reply: Aug 10, 2011 5:38 PM by 881677 RSS

    REGEXP_SUBSTR question

    Clearance 6`- 8``
      I have a csv data like follows
      '3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US'
      I am able to get the individual values like
      Connected to:
      Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
      With the Partitioning, OLAP, Data Mining and Real Application Testing options
      
      
      SQL> SELECT REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US'
        2                      ,'[^,]+'
        3                      ,1
        4                      ,1
        5                      )
        6    FROM DUAL;
      
      REGEXP_SUBSTR('3
      ----------------
      3221245112343434
      
      1 row selected.
      
      Elapsed: 00:00:00.00
      SQL> SELECT REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US'
        2                      ,'[^,]+'
        3                      ,1
        4                      ,2
        5                      )
        6    FROM DUAL;
      
      REGEXP_SUBSTR(
      --------------
      3116 N 12TH PL
      
      1 row selected.
      Which is good so far. But the problem is when I try to get the 3rd value. When I am expecting a null value or an empty string, I am getting the following value which should be coming for 8th value.
      SQL> SELECT REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US'
        2                      ,'[^,]+'
        3                      ,1
        4                      ,3
        5                      )
        6    FROM DUAL;
      
      REGEXP_SUB
      ----------
      MILWAUKEE
      I can get what I want using a non-regex solution. But I am looking for a regex way for doing this.
        • 1. Re: REGEXP_SUBSTR question
          Clearance 6`- 8``
          Seems I got a workaround. But, if you have a better solution feel free to post.
          SQL> SELECT TRIM(REGEXP_SUBSTR(REPLACE('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US',',',', ')
            2                      ,'[^,]+'
            3                      ,1
            4                      ,3
            5                      ))
            6    FROM DUAL;
          
          T
          -
          
          
          1 row selected.
          
          Elapsed: 00:00:00.01
          SQL> SELECT TRIM(REGEXP_SUBSTR(REPLACE('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US',',',', ')
            2                      ,'[^,]+'
            3                      ,1
            4                      ,8
            5                      ))
            6    FROM DUAL;
          
          TRIM(REGE
          ---------
          MILWAUKEE
          • 2. Re: REGEXP_SUBSTR question
            AlanWms
            One way, you can use TRIM to remove the trailing commas from this result, I left the TRIM off to make it a little less muddled.
            SELECT
            REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US' || ',','[^,]*,',1,1) str1,
            REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US' || ',','[^,]*,',1,2) str2,
            REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US' || ',','[^,]*,',1,3) str3,
            REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US' || ',','[^,]*,',1,4) str4,
            REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US' || ',','[^,]*,',1,5) str5,
            REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US' || ',','[^,]*,',1,6) str6,
            REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US' || ',','[^,]*,',1,7) str7,
            REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US' || ',','[^,]*,',1,8) str8,
            REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US' || ',','[^,]*,',1,9) str9,
            REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US' || ',','[^,]*,',1,10) str10,
            REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US' || ',','[^,]*,',1,11) str11,
            REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US' || ',','[^,]*,',1,12) str12
            FROM DUAL;
            I concatenate a comma at the end of your string then look for 0 or more non-commas followed by a comma. The concatenated comma is so that there is a match of 'US,' at the end.

            Edited by: AlanWms on Apr 19, 2011 2:40 PM
            Typo in explanation.
            • 3. Re: REGEXP_SUBSTR question
              Frank Kulash
              Hi,

              Here's how to get the 3rd (possibly NULL) item:
              SELECT  SUBSTR ( REGEXP_SUBSTR ( ',' || '3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US'
                                                , ',[^,]*'
                                                , 1
                                                , 3     -- or whatever
                                                )
                          , 2
                          )     AS sub_txt
              FROM    dual;
              You were originally looking for
              '[^,]+'
              which means 1 or more consecutive characters from the set consisting of everything except commas. By definition, that will never match NULL.
              I changed that to:
              ',[^,]*'
              meaning a comma followed by 0 or more non-commas. I concatenated an extra '.' to the beginning of the string, in order to make this get the 1st item. Of course, it looks silly to concatenate two literals like that, but I imagine in your real query you'll have a column (like txt in the example below) or some other expression there, not a literal, so you'll say something like:
              SELECT  SUBSTR ( REGEXP_SUBSTR ( ',' || txt
                                                , ',[^,]*'
                                                , 1
                                                , n     -- item number, 1 or higher
                                                )
                          , 2
                          )     AS sub_txt    ...
              REGEXP_SUBSTR will bring back the comma that delimits the n-th item as well as the n-th item itself. I used SUBSTR to remove the comma, leaving only the (possibly NULL) item itself.

              Edited by: Frank Kulash on Apr 19, 2011 2:40 PM
              Added explanation
              • 4. Re: REGEXP_SUBSTR question
                Ganesh Srivatsav
                Hi,

                Try this,
                SQL> SELECT REGEXP_SUBSTR (REPLACE('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US',',,',', , '),
                  2                        '[^,]+',
                  3                        1,
                  4                        3) col
                  5    FROM DUAL;
                
                COL
                ----
                
                
                SQL> SELECT REGEXP_SUBSTR (REPLACE('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US',',,',', , '),
                  2                        '[^,]+',
                  3                        1,
                  4                        1) col
                  5    FROM DUAL;
                
                COL
                ----------------
                3221245112343434
                
                SQL> SELECT REGEXP_SUBSTR (REPLACE('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US',',,',', , '),
                  2                        '[^,]+',
                  3                        1,
                  4                        8) col
                  5    FROM DUAL;
                
                COL
                -----------
                 MILWAUKEE
                
                SQL> 
                G.
                • 5. Re: REGEXP_SUBSTR question
                  Ganesh Srivatsav
                  Sorry didnt see the starting comma in franks post.
                  G.

                  Edited by: Ganesh Srivatsav on Apr 19, 2011 2:43 PM
                  • 6. Re: REGEXP_SUBSTR question
                    Solomon Yakobson
                    '[^,]+' is non-comma repeated one or more times. Therefore it is not looking for empty substrings - non-comma repeated zero times. And since Oracle regexp doesn't support look-ahead, '[^,]*' will not help you. Since you are on 10g, use:
                    SQL> SELECT  RTRIM(REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','[^,]*(,|$)',1,&occurrence),',') element
                      2    FROM  DUAL
                      3  /
                    Enter value for occurrence: 1
                    old   1: SELECT  RTRIM(REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','[^,]*(,|$)',1,&occurrence),',') element
                    new   1: SELECT  RTRIM(REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','[^,]*(,|$)',1,1),',') element
                    
                    ELEMENT
                    ----------------
                    3221245112343434
                    
                    SQL> /
                    Enter value for occurrence: 2
                    old   1: SELECT  RTRIM(REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','[^,]*(,|$)',1,&occurrence),',') element
                    new   1: SELECT  RTRIM(REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','[^,]*(,|$)',1,2),',') element
                    
                    ELEMENT
                    --------------
                    3116 N 12TH PL
                    
                    SQL> /
                    Enter value for occurrence: 3
                    old   1: SELECT  RTRIM(REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','[^,]*(,|$)',1,&occurrence),',') element
                    new   1: SELECT  RTRIM(REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','[^,]*(,|$)',1,3),',') element
                    
                    E
                    -
                    
                    
                    SQL> /
                    Enter value for occurrence: 8
                    old   1: SELECT  RTRIM(REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','[^,]*(,|$)',1,&occurrence),',') element
                    new   1: SELECT  RTRIM(REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','[^,]*(,|$)',1,8),',') element
                    
                    ELEMENT
                    ----------
                    MILWAUKEE
                    
                    SQL> 
                    For 11g:
                    SQL> SELECT  REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','([^,]*)(,|$)',1,&occurrence,null,1) element
                      2    FROM  DUAL
                      3  /
                    Enter value for occurrence: 1
                    old   1: SELECT  REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','([^,]*)(,|$)',1,&occurrence,null,1) element
                    new   1: SELECT  REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','([^,]*)(,|$)',1,1,null,1) element
                    
                    ELEMENT
                    ----------------
                    3221245112343434
                    
                    SQL> /
                    Enter value for occurrence: 2
                    old   1: SELECT  REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','([^,]*)(,|$)',1,&occurrence,null,1) element
                    new   1: SELECT  REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','([^,]*)(,|$)',1,2,null,1) element
                    
                    ELEMENT
                    --------------
                    3116 N 12TH PL
                    
                    SQL> /
                    Enter value for occurrence: 3
                    old   1: SELECT  REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','([^,]*)(,|$)',1,&occurrence,null,1) element
                    new   1: SELECT  REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','([^,]*)(,|$)',1,3,null,1) element
                    
                    E
                    -
                    
                    
                    SQL> /
                    Enter value for occurrence: 8
                    old   1: SELECT  REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','([^,]*)(,|$)',1,&occurrence,null,1) element
                    new   1: SELECT  REGEXP_SUBSTR('3221245112343434,3116 N 12TH PL,,,,,,MILWAUKEE ,WI ,53226,,US','([^,]*)(,|$)',1,8,null,1) element
                    
                    ELEMENT
                    ----------
                    MILWAUKEE
                    
                    SQL> 
                    SY.
                    • 7. Re: REGEXP_SUBSTR question
                      Clearance 6`- 8``
                      Thanks Alan, Frank, Ganesh and Solomon for the solutions as well as the explanations.
                      • 8. Re: REGEXP_SUBSTR question
                        Clearance 6`- 8``
                        I have one more small question. See this,
                        SQL> DECLARE
                          2     l_file_handle   UTL_FILE.file_type;
                          3     l_file_name     VARCHAR2(150)      := 'abc.csv';
                          4     l_line          VARCHAR2(32000);
                          5     var             NUMBER(20);
                          6  BEGIN
                          7     l_file_handle := UTL_FILE.fopen('INPUT_DIRECTORY'
                          8                                    ,l_file_name
                          9                                    ,'R'
                         10                                    ,32767
                         11                                    );
                         12
                         13     BEGIN
                         14        LOOP
                         15           UTL_FILE.get_line(l_file_handle, l_line);
                         16           l_line := ',' || l_line;
                         17           DBMS_OUTPUT.put('The value is :: ');
                         18           DBMS_OUTPUT.put_line(TRIM(LEADING ',' FROM REGEXP_SUBSTR(l_line
                         19                                                  ,',[^,]*'
                         20                                                  ,1
                         21                                                  ,5
                         22                                                  )));
                         23           var := TRIM(LEADING ',' FROM REGEXP_SUBSTR(l_line
                         24                                    ,',[^,]*'
                         25                                    ,1
                         26                                    ,5
                         27                                    ));
                         28        END LOOP;
                         29     EXCEPTION
                         30        WHEN NO_DATA_FOUND
                         31        THEN
                         32           NULL;
                         33     END;
                         34  END;
                         35  /
                        The value is :: 13377
                        DECLARE
                        *
                        ERROR at line 1:
                        ORA-06502: PL/SQL: numeric or value error: character to number conversion error
                        ORA-06512: at line 23
                        
                        
                        Elapsed: 00:00:00.03
                        SQL>
                        I realized that the issue is being caused because of "CARRIAGE RETURN". I am overcoming this situation like follows. If there is any better way to overcome this, please feel free to post it.
                        SQL> DECLARE
                          2     l_file_handle   UTL_FILE.file_type;
                          3     l_file_name     VARCHAR2(150)      := 'abc.csv';
                          4     l_line          VARCHAR2(32000);
                          5     var             NUMBER(20);
                          6  BEGIN
                          7     l_file_handle := UTL_FILE.fopen('INPUT_DIRECTORY'
                          8                                    ,l_file_name
                          9                                    ,'R'
                         10                                    ,32767
                         11                                    );
                         12
                         13     BEGIN
                         14        LOOP
                         15           UTL_FILE.get_line(l_file_handle, l_line);
                         16           l_line := ',' || l_line;
                         17           DBMS_OUTPUT.put('The value is :: ');
                         18           DBMS_OUTPUT.put_line(TRIM(LEADING ',' FROM REGEXP_SUBSTR(l_line
                         19                                                  ,',[^,]*'
                         20                                                  ,1
                         21                                                  ,5
                         22                                                  )));
                         23           var := REPLACE(TRIM(LEADING ',' FROM REGEXP_SUBSTR(l_line
                         24                                    ,',[^,]*'
                         25                                    ,1
                         26                                    ,5
                         27                                    )),CHR(13));
                         28        END LOOP;
                         29     EXCEPTION
                         30        WHEN NO_DATA_FOUND
                         31        THEN
                         32           NULL;
                         33     END;
                         34  END;
                         35  /
                        The value is :: 13377
                        The value is :: 0
                        The value is :: 4170
                        The value is :: 8236
                        
                        PL/SQL procedure successfully completed.
                        csv data is
                        23548,2022,1022,13377,13377
                        23548,1233,5894,42667,0
                        23548,1233,8917,42667,4170
                        23548,2000,1257,223370,8236
                        Edited by: Clearance 6`- 8`` on Apr 19, 2011 3:32 PM
                        added csv data
                        • 9. Re: REGEXP_SUBSTR question
                          Ganesh Srivatsav
                          Try this,
                             var   :=
                                TRIM (LEADING ',' FROM REGEXP_SUBSTR (l_line,
                                                                      ',[^,' || CHR (13) || ']*',
                                                                      1,
                                                                      5));
                          G.
                          • 10. Re: REGEXP_SUBSTR question
                            Clearance 6`- 8``
                            That's a good alternative. I am trying to see what's the standard or more used way that is used in regular expressions while reading data using utl_file when trying to read the last value in the line.
                            • 11. Re: REGEXP_SUBSTR question
                              Frank Kulash
                              Clearance 6`- 8`` wrote:
                              ... I am overcoming this situation like follows. If there is any better way to overcome this, please feel free to post it.
                              ...
                              23           var := REPLACE(TRIM(LEADING ',' FROM REGEXP_SUBSTR(l_line
                              24                                    ,',[^,]*'
                              25                                    ,1
                              26                                    ,5
                              27                                    )),CHR(13));
                              That's basically the best that I can think of. It might be a little cleaner to remove the newline character as soon as you set l_line, or when you add the extra ',', but you might have reasons for leaving it in l_line.

                              You can simplify it a tiny bit by using TRANSLATE to remove commas and newlines at the same time, like this:
                              ...
                               23           var := TO_NUMBER ( TRANSLATE ( REGEXP_SUBSTR ( l_line
                               24                                                             , ',[^,]*'
                               25                                                             , 1
                               26                                                             , 5
                               27                                                             )
                               ...                                        , 'X,' || CHR(13)
                               ...                                   , 'X'
                               ...                            )
                               ...                      );
                              This is one place where an implict conversion probably won't hurt you, but explicitly converting the string to a NUMBER will definitely not hurt, either, and, if only for the princliple, I want to avoid implicit conversions. If I don't put an explicit TO_NUMBER there, I'm liable to waste some time in the future, when I've forgotten all about this code and have to debug it. I'm likely to see that the code is assigning a string value to a NUMBER variable, and then check to see if that's causing any problems, which, of course, it isn't.
                              • 12. Re: REGEXP_SUBSTR question
                                Clearance 6`- 8``
                                Thanks for your insights Frank.
                                • 13. Re: REGEXP_SUBSTR question
                                  881677
                                  Solomon,

                                  Thanks for your post on this thread.
                                  I had a similar challenge except the delimiter was a period (.) which as you might expect, presented it's own unique challenges.
                                  Your solution was the perfect answer to work in my situation.
                                  Thanks! :-)

                                  Side note: For those of you familiar with the MUMPS or "M" language, this is similar in concept to the $PIECE function where you simply identify the string, the delimiter and the substring "piece" number that you want.

                                  Some day when I fully understand the POSIX Regular Expressions and ERE syntax, I'll better understand how and why the pattern you chose works.

                                  Thanks again Solomon! :-)