8 Replies Latest reply: Jun 18, 2012 6:08 AM by Frank Kulash RSS

    One for the Tekkies: How to get this output using REGULAR EXPRESSIONS?

    user12240205
      How to get the below output using REGULAR EXPRESSIONS??
      SQL> ed
      Wrote file afiedt.buf
      
        1* CREATE TABLE cus___addresses    (full_address                   VARCHAR2(200 BYTE))
      SQL> /
      
      Table created.
      
      SQL> PROMPT Address Format is: House #/Housename,  street,  City, Zip Code, COUNTRY
      House #/Housename,  street,  City, Zip Code, COUNTRY
      SQL> INSERT INTO cus___addresses VALUES('1, 3rd street, Lansing, MI 49001, USA');
      
      1 row created.
      
      SQL> INSERT INTO cus___addresses VALUES('3B, fifth street, Clinton, OK 74103, USA');
      
      1 row created.
      
      SQL> INSERT INTO cus___addresses VALUES('Rose Villa, Stanton Grove, Murray, TN 37183, USA');
      
      1 row created.
      
      SQL> SELECT * FROM cus___addresses;
      
      FULL_ADDRESS
      ----------------------------------------------------------------------------------------------------
      1, 3rd street, Lansing, MI 49001, USA
      3B, fifth street, Clinton, OK 74103, USA
      Rose Villa, Stanton Grove, Murray, TN 37183, USA
      
      SQL> The REG EXP query shouLd output the ZIP codes: i.e. 49001, 74103, 37183 in 3 rows.
      Edited by: user12240205 on Jun 18, 2012 3:19 AM
        • 1. Re: One for the Tekkies: How to get this output using REGULAR EXPRESSIONS?
          indra budiantho
          /* Formatted on 2012/06/18 17:25 (Formatter Plus v4.8.8) */
          SELECT REGEXP_SUBSTR ((REGEXP_SUBSTR (full_address, '[^,]+', 1, 4)), '[[:digit:]]+') RESULT
            FROM cus___addresses
          • 2. Re: One for the Tekkies: How to get this output using REGULAR EXPRESSIONS?
            908002
            select replace(substr(full_address,-10,10),', USA') zip from cus___addresses;
            • 3. Re: One for the Tekkies: How to get this output using REGULAR EXPRESSIONS?
              Frank Kulash
              Hi,

              Here's one way:
              SELECT     REGEXP_REPLACE ( full_address
                               , '.*(\d{5})\D*'
                               , '\1'
                               )     AS zip_code
              FROM     cus___addresses
              ;
              This assumes that ZIP codes are always 5 digits, and that country (if it is present) never contains a digit.
              You can eaisly modify this to accept either 5- or 9-digit ZIP codes, such as '49001-0012'.

              &mbsp;

              Do you really want 3 '_'s in a row in the table name? Won't that be confusing for every person who has to look at this code? Why not use just one '_', as in cus_addresses?
              • 4. Re: One for the Tekkies: How to get this output using REGULAR EXPRESSIONS?
                Saubhik
                May be regexp_substr(full_address,'[[:digit:]]{5}')
                • 5. Re: One for the Tekkies: How to get this output using REGULAR EXPRESSIONS?
                  user12240205
                  Kiran wrote:
                  select replace(substr(full_address,-10,10),', USA') zip from cus___addresses;
                  Problem with this one is, we don't know it will be exactly 10 characters from the end. Cos, countries could be other than USA also.
                  • 6. Re: One for the Tekkies: How to get this output using REGULAR EXPRESSIONS?
                    user12240205
                    Frank Kulash wrote:
                    Hi,

                    Here's one way:
                    SELECT     REGEXP_REPLACE ( full_address
                                     , '.*(\d{5})\D*'
                                     , '\1'
                                     )     AS zip_code
                    FROM     cus___addresses
                    ;
                    This assumes that ZIP codes are always 5 digits, and that country (if it is present) never contains a digit.
                    You can eaisly modify this to accept either 5- or 9-digit ZIP codes, such as '49001-0012'.
                    Frank, ʃʃp's method, I understand. But your method, although correct, I find it difficult to understand.

                    Could you explain how you did this?? What does '.*(\d{5})\D*' and '\1' mean???
                    Your method is better because it uses only ONE reg expression function. ʃʃp's uses 2.
                    • 7. Re: One for the Tekkies: How to get this output using REGULAR EXPRESSIONS?
                      user12240205
                      Frank Kulash wrote:
                      Hi,

                      Here's one way:
                      SELECT     REGEXP_REPLACE ( full_address
                                       , '.*(\d{5})\D*'
                                       , '\1'
                                       )     AS zip_code
                      FROM     cus___addresses
                      ;
                      This assumes that ZIP codes are always 5 digits, and that country (if it is present) never contains a digit.
                      You can eaisly modify this to accept either 5- or 9-digit ZIP codes, such as '49001-0012'.

                      &mbsp;

                      Do you really want 3 '_'s in a row in the table name? Won't that be confusing for every person who has to look at this code? Why not use just one '_', as in cus_addresses?
                      Thank you Frank. That table name I put 3 underscores so that when you create it in your DB it wont clash with a table name in your DB. that is why. Normally, we also use only 1 underscore.

                      Edited by: user12240205 on Jun 18, 2012 3:42 AM
                      • 8. Re: One for the Tekkies: How to get this output using REGULAR EXPRESSIONS?
                        Frank Kulash
                        Hi,
                        user12240205 wrote:
                        ... Frank, ʃʃp's method, I understand. But your method, although correct, I find it difficult to understand.

                        Could you explain how you did this?? What does '.*(\d{5})\D*' and '\1' mean???
                        Your method is better because it uses only ONE reg expression function. ʃʃp's uses 2.
                        In Oracle 10.2 (I believe) and higher, '\d' is equivalent to '[[:digit:]]', and '\D' is equivalent to '[^[:digit:]]'. I find '\d' and '\D' easier to type, but there's nothing wrong with using '[[:digit:]]' and '[^[:digit:]]'.

                        '.*' means "0 or more of any character".
                        '\D*' means "0 or more non-digits".
                        The whole expression, '.*(\d{5})\D*' means:
                        a. 0 or more characters (any characters)
                        b. 5 digits
                        c. 0 or more non-digits.

                        '\1' is a Backreference . It means the sub-string that matched the pattern after the 1st '(', up to (but not including) its matching ')'. In this case, that means the sub-string that matched '\d{5}', or b. using the explanation immediately above.

                        So the entire REGEXP_REPLACE call means "When you see a sub-string consisting of a., follwed immediately by b., followed immedately by c., replace that sub-string with b. alone."