1 2 Previous Next 21 Replies Latest reply: Oct 23, 2013 2:02 AM by ranit B RSS

    Reg: REGEXP_REPLACE issue -

    ranit B

      Hi Experts,

       

      I'm facing an issue working with REGEXP_REPLACE.

       

      Issue:

      I've a "comment" field where data has been inserted manually. I want to fetch only the date part but the Regex doesn't seem to parse the Newline character properly.

       

      WITH table_x AS(

        SELECT 'aaa

      ccc- 12/23/2048

      bbb' txt FROM dual

      )

      SELECT

        Trim(REGEXP_REPLACE(txt, '(.*)(\d{2}/\d{2}/\d{4})(.*)','\2'))

      FROM table_x;

       

      Expected output - '12/23/2048'

       

      There are few more 'bigger' concerns regarding this which I'll explain after this gets resolved... (trying to go step-by-step).

       

      Could you please help me with this? Help much appreciated.

       

      Thanks and Regards,

      -- Ranit

      (on Oracle 11.2.0.3.0)

        • 1. Re: Reg: REGEXP_REPLACE issue -
          Hoek
          WITH table_x AS(
            SELECT 'aaa
          ccc- 12/23/2048
          bbb' txt FROM dual
          )
          SELECT
            Trim(REGEXP_REPLACE(replace(txt, chr(10)), '(.*)(\d{2}/\d{2}/\d{4})(.*)','\2'))
          FROM table_x;
          • 2. Re: Reg: REGEXP_REPLACE issue -
            chris227

                WITH table_x AS(

             

                  SELECT 'aaa

             

                ccc- 12/23/2048

             

                bbb' txt FROM dual

             

                )

             

                SELECT

             

                  REGEXP_SUBSTR(txt, '\d{2}/\d{2}/\d{4}')

             

                FROM table_x;

            • 3. Re: Reg: REGEXP_REPLACE issue -
              Ramin Hashimzadeh

              if i understand you correctly you need to extract date from string....

               

              check this:

               

              WITH table_x AS(

                SELECT 'aaa

              ccc- 12/23/2048

              bbb' txt FROM dual

              )

              SELECT  to_date((REGEXP_REPLACE(txt, '[^[:digit:]]','')),'mmddyyyy')

              FROM table_x;

               

              ----

              Ramin Hashimzade

              • 4. Re: Reg: REGEXP_REPLACE issue -
                chris227

                Or

                 

                 

                 

                    WITH table_x AS(

                 

                      SELECT 'aaa

                 

                    ccc- 12/23/2048

                 

                    bbb' txt FROM dual

                 

                    )

                 

                    SELECT

                 

                      REGEXP_REPLACE(txt, '(.*)(\d{2}/\d{2}/\d{4})(.*)','\2', 1, 1, 'n')

                 

                    FROM table_x;

                • 5. Re: Reg: REGEXP_REPLACE issue -
                  ranit B

                  Thanks Hoek.

                   

                  But is that how REGEXP_REPLACE behaves in case of Newlines? and we need to explicitly remove using REPLACE?

                  • 6. Re: Reg: REGEXP_REPLACE issue -
                    BluShadow

                    You can use the "n" match parameter of regexp_replace (see the documentation) to make it treat newline characters as other characters when the pattern is a "." (be default they are treated as non-matching).

                     

                    SQL> ed
                    Wrote file afiedt.buf

                      1  WITH table_x AS(
                      2    SELECT 'aaa
                      3  ccc- 12/23/2048
                      4  bbb' txt FROM dual
                      5  )
                      6  SELECT
                      7    Trim(REGEXP_REPLACE(txt, '(.*)(\d{2}/\d{2}/\d{4})(.*)','\2',1,1,'n'))
                      8* FROM table_x
                    SQL> /

                     

                    TRIM(REGEX
                    ----------
                    12/23/2048

                     

                     

                    REGEXP_REPLACE

                    • 7. Re: Reg: REGEXP_REPLACE issue -
                      chris227

                      Consider

                       

                       

                       

                          WITH table_x AS(

                       

                            SELECT 'aaa

                       

                          ccc- 12/23/2048

                       

                          bbb' txt FROM dual

                       

                          )

                       

                          SELECT

                       

                            REGEXP_REPLACE(txt, '([^[:digit:]]*)(\d{2}/\d{2}/\d{4})([^[:digit:]]*)','\2')

                       

                          FROM table_x;

                       

                      This is one more reason for me always to propagate to prefer negated characterclasses instead of the .* or .*? constructs.

                      The dot doesnt match the new line by default. you can use the fifth parameter with 'n' as value to change this behaviour as shown above.

                      • 8. Re: Reg: REGEXP_REPLACE issue -
                        Hoek

                        Hi Ranit,

                         

                        You probably noticed the other, more elegant solutions already.

                        Blu's example shows a feature I wasn't yet aware of.

                        And, as Ramin did, if you want to make it a date then TO_DATE the result of the regexp.

                        • 9. Re: Reg: REGEXP_REPLACE issue -
                          chris227

                          We had this one minute before already, but it seems people do "like" yours more

                          • 10. Re: Reg: REGEXP_REPLACE issue -
                            ranit B

                            Thanks all.

                             

                            Now let me explain the tougher part of my requirement:

                            I have a column 'Comments' where user enters data manually. I know 'Manually = "Prone to error"

                            So, they might enter the data in "any" format they wish to.

                             

                            Sample data:

                            WITH table_x AS

                            (

                              SELECT 'aaa

                            9/25/2013

                            ccc' txt FROM dual UNION ALL

                              SELECT 'aaa

                            September 25, 2013

                            ccc' txt FROM dual UNION ALL

                              SELECT 'aaa

                            09/25/13

                            bbb' txt FROM dual UNION ALL

                              SELECT 'aaa

                            9-25-2013

                            bbb' txt FROM dual UNION ALL

                              SELECT 'aaa

                            2.25.13

                            bbb' txt FROM dual UNION ALL

                              SELECT 'aaa

                            Sept 25,2013

                            bbb' txt FROM dual

                            )

                            SELECT *

                            FROM table_x;

                             

                            I want to extract only the DATE parts... Is this feasible?

                             

                            -- Ranit

                            • 11. Re: Reg: REGEXP_REPLACE issue -
                              Etbin
                              So, they might enter the data in "any" format they wish to.

                              Why ?

                              Even if the date part is always on separate line (IMHO) you're looking for troubles.

                              It's just not worth the effort.

                              Once Frank has shown a function capable of dealing with a great number of date formats ... (might be still somewhere in this forum)

                               

                              Regards

                               

                              Etbin

                              • 12. Re: Reg: REGEXP_REPLACE issue -
                                Hoek

                                A few follow up questions:

                                - what purpose does the date part serve?

                                - What should happen if a user accidentally enters a date that is an invalid date (for example: 01-0-20013)  and you run into it when extracting?

                                is this feasible?

                                No, you'll never get it 100% nailed shut I'm afraid. Dates simply aren't strings. Extracting dates from strings is a cumbersome process.

                                Best would be to introduce a separate COMMENT_DATE (or whatever name suits) column in the application, that way you'll be much much better off...

                                 

                                Message was edited by: Hoek

                                • 13. Re: Reg: REGEXP_REPLACE issue -
                                  BluShadow

                                  chris227 wrote:

                                   

                                  We had this one minute before already, but it seems people do "like" yours more

                                   

                                  Well, when I was constructing my post, your response wasn't there....

                                  • 14. Re: Reg: REGEXP_REPLACE issue -
                                    BluShadow

                                    ranitB wrote:

                                     

                                    Thanks all.

                                     

                                    Now let me explain the tougher part of my requirement:

                                    I have a column 'Comments' where user enters data manually. I know 'Manually = "Prone to error"

                                    So, they might enter the data in "any" format they wish to.

                                     

                                    Sample data:

                                    WITH table_x AS

                                    (

                                      SELECT 'aaa

                                    9/25/2013

                                    ccc' txt FROM dual UNION ALL

                                      SELECT 'aaa

                                    September 25, 2013

                                    ccc' txt FROM dual UNION ALL

                                      SELECT 'aaa

                                    09/25/13

                                    bbb' txt FROM dual UNION ALL

                                      SELECT 'aaa

                                    9-25-2013

                                    bbb' txt FROM dual UNION ALL

                                      SELECT 'aaa

                                    2.25.13

                                    bbb' txt FROM dual UNION ALL

                                      SELECT 'aaa

                                    Sept 25,2013

                                    bbb' txt FROM dual

                                    )

                                    SELECT *

                                    FROM table_x;

                                     

                                    I want to extract only the DATE parts... Is this feasible?

                                     

                                    -- Ranit

                                     

                                    Easy answer... if users are being required to store structured data that you need in the future, then get them to create it in structured fields which are stored in the correct datatypes on the table.  Do not allow them to create it how they like as free text.

                                     

                                    For example, what date is "11/12/13"  ?  11th December 2013?  12th November 2013? 13th December 2011? etc.

                                     

                                    Stupid data in = stupid data out.

                                    1 2 Previous Next