12 Replies Latest reply: Jun 21, 2013 10:19 AM by 0110ee23-757d-464e-a2bd-4f474710d866 RSS

    regexp_replace function

    0110ee23-757d-464e-a2bd-4f474710d866

      Hi,

       

      First of all, thanks for viewing and I appreciate your great help.

       

      I am new to 11g and trying explore whether this is doable or not using regexp_replace function.

       

      Here is my requirement.

       

      I have a column, which has 300 character fixed-width text, so i know 1 to 9 characters are SSN, 11 to 20 characters are Date of Birth, 21 to 45 are belongs to an ID (The ID contains upto 25 characters, Some time it will contain only 10 characters, some times 20. It depends, So if it contains only 10, the remaining 15 will be spaces ) etc..

       

      To retrieve ID, I use TRIM(SUBSTR(col, 21, 25)) ID from Table

       

      The requirement is, we run some validations against the SSN in other databases and update the ID in this table if necessary.

       


      In the older versions, I

       

      can achieve the result in the following way.

                        UPDATE table set col = substr(col, 1, 20) || AddSpacesIfLessthan25Characters(newID) || substr(col, 46) where SSN=123456789

       

      In 11g, how can i achieve the above result, Is there any simpler way? Does regexp_replace function work for me?


      Thanks

        • 1. Re: regexp_replace function
          Frank Kulash

          Hi,

           

          Let me make sure I understand the question: You want to know if there's a regular expression that can replace:

           

          substr(col, 1, 20) || AddSpacesIfLessthan25Characters(newID) || substr(col, 46)

           

          That depends on what AddSpacesIfLessthan25Characters does.

          If it does what it says (that is, adds just enough spaces to the argument to make it 25 characters), then you could use the built-in function RPAD (or LPAD).  It will be faster than regular expressions, and much faster than calling a user-defined function.

          At any rate, this doesn't really sound like a job for regular expressions.

           

          I hope this answers your question.

          If not, post some sample data (CREATE TABLE and INSERT statements) and the results you want from that sample data.

          Simplify the problem.  For example, say id is padded to 5 or 10 characters, not 25; it will make the sample data and results easier to write and read.  It will be easy to adjust the answer for your real problem.

          • 2. Re: regexp_replace function
            Karthick_Arp

            May I ask why are you storing multiple values in a single column? RDBMS is not meant for that purpose. Your DB design is in violation of 1NF. If you receive input as a fixed length file the proper way would be to split the values and load them into a table and not load them as such.

            • 3. Re: regexp_replace function
              Purvesh K

              You already have a solution that is working for you. So, why do you intend to change the code that ain't broken?

               

              My advice is, if it ain't broken, don't fix it. You will/might end up breaking it.

               

              Moreover, usually or almost always, normal functions will perform better when compared to Regular Expressions, as Regex's tend to be computationally intensive. And your requirement does not search for a pattern, as you already know the character positions. Hence, use of SUBSTR or INSTR functions will be much more helpful then opting for Regex counterparts.

              • 4. Re: regexp_replace function
                0110ee23-757d-464e-a2bd-4f474710d866

                AddSpacesIfLessthan25Characters will add additional spaces to the ID incase if it is less than 25, because this is 300 character fixed-width column. Based on position and length, we know which field it is belongs to.

                Example: if the ID is 20 chracters, the AddSpacesIfLessthan25Characters function will add additional 5 spaces at the end to make it 25 characters.


                CREATE TABLE T1 (
                Seq NUMBER NOT NULL,
                Data VARCHAR2(300)
                )


                INSERT INTO (Seq, Data) VALUES (1,'123456789 01-01-1980TWENTYFIVECHARACTERSTWENTWENEEDTOREPLACETWENTYFIVECHRACTERS WITHANOTHERVALUEWENEEDTOREPLACETWENTYFIVECHRACTERS WITHANOTHERVALUEWENEEDTOREPLACETWENTYFIVECHRACTERS WITHANOTHERVALUEWENEEDTOREPLACETWENTYFIVECHRACTERS WITHANOTHERVALUEWENEEDTOREPLACETWENTYFIVECHRACTERS ABCDEFGHIJKLMNOPQRST1');


                Now to retrieve ID, I use Select TRIM(SUBSTR(Data, 21, 25)) ID from T1

                Result is: TWENTYFIVECHARACTERSTWENT


                The requirement is, we get the ID and run some validations in other databases and update this ID in this table if necessary.

                 

                In the older versions, I can update the ID, replace characters 21 to 45in the following way.

                                  UPDATE table set Data = substr(Data, 1, 20) || AddSpacesIfLessthan25Characters('THISISTWENTYFOURCHRACERS ') || substr(Data, 46) where SSN=123456789

                SELECT Data from T1;

                The result will be '123456789 01-01-1980THISISTWENTYFOURCHRACERS WENEEDTOREPLACETWENTYFIVECHRACTERS WITHANOTHERVALUEWENEEDTOREPLACETWENTYFIVECHRACTERS WITHANOTHERVALUEWENEEDTOREPLACETWENTYFIVECHRACTERS WITHANOTHERVALUEWENEEDTOREPLACETWENTYFIVECHRACTERS WITHANOTHERVALUEWENEEDTOREPLACETWENTYFIVECHRACTERS ABCDEFGHIJKLMNOPQRST1'

                Select TRIM(SUBSTR(Data, 21, 25)) ID from T1;

                The result will be 'THISISTWENTYFOURCHRACERS '


                I can stick to older way, But since we are using 11g, I want to make sure whether there are any better ways to do this.

                 

                Hope this is clear.

                • 5. Re: regexp_replace function
                  0110ee23-757d-464e-a2bd-4f474710d866

                  Agree with you. But I am new here. I don't know, this is how the table was structured and created 10 years ago and using in this fashion for such a long time.

                  • 6. Re: regexp_replace function
                    0110ee23-757d-464e-a2bd-4f474710d866

                    This is kind of new requirement and to update the same table for different position and length. Just checking whether we can achieve in a better way than using substr.

                    • 7. Re: regexp_replace function
                      Frank Kulash

                      Hi,

                       

                      If the AddSpacesIfLessthan25Characters function does the same thing as RPAD, only slower, then I would forget about AddSpacesIfLessthan25Characters and use RPAD instead.

                       

                      Other than that, I don't see how regular expressions, or any other Oracle features, can improve on what you're already doing.  You could use REGEXP_REPLACE, but it's debatable if it would be any simpler, and it would definitely be less efficient, than what you have now.

                       

                      UPDATE  table_x

                      SET     data = REGEXP_REPLACE ( data

                                                    , '^(.{20}).{25}'

                                                    , '\1' || RPAD ('THISISTWENTYFOURCHRACERS ', 25)

                                                    )

                      WHERE   ssn  = 123456789

                      ;

                       

                      This assumes that the new id does not contain any backslash ( \ ) characters, and maybe some other punctuation characters that have special meanings in regular expressions.

                      • 8. Re: regexp_replace function
                        Etbin

                        Having known fixed positions stick to that as Frank suggested

                         

                        update table

                           set col = substr(col,1,20) || id || rpad(' ',25 - length(id),' ') || substr(col,46)

                        where ssn = 123456789

                         

                        or even - not allowing lengths over 25 but having id's justified

                         

                        update table

                           set col = substr(col,1,20) || trim(id) || rpad(' ',25 - length(trim(id)),' ') || substr(col,46)

                        where ssn = 123456789


                        update table

                           set col = substr(col,1,20) || lpad(' ',25 - length(trim(id)),' ') || trim(id) || substr(col,46)

                        where ssn = 123456789


                        Regards

                         

                        Etbin

                        • 9. Re: regexp_replace function
                          0110ee23-757d-464e-a2bd-4f474710d866

                          Thanks FrankKulash. How do we write this using regexp_replace incase if we want avoid substr?

                          • 10. Re: regexp_replace function
                            Frank Kulash

                            Hi,

                             

                            If you really, really must use regular expressions, then one way is:

                             

                            UPDATE  table_x

                            SET     data = REGEXP_REPLACE ( data

                                                          , '^(.{20}).{25}'

                                                          , '\1' || RPAD ('THISISTWENTYFOURCHRACERS ', 25)

                                                          )

                            WHERE   ssn  = 123456789

                            ;


                            Again, this will be slower than using SUBSTR.  Personally, I find this harder to read, understand, debug and maintain than SUBSTR, even though I'm pretty familiar will regular expressions.  I would use SUBSTR, like you do already.

                            • 11. Re: regexp_replace function
                              0110ee23-757d-464e-a2bd-4f474710d866

                              Thank you. It works. Onething i noticed is i executed both options

                               

                              The one Etbin and yours. I don't see any difference in the performance. On average, the query execution results are as follows

                               

                              Etbin traditional solution: 179 msecs

                              Your Regexp_Replace solution: 182 msecs

                               

                              I am running the query against a table which has millions of rows.

                               


                              • 12. Re: regexp_replace function
                                0110ee23-757d-464e-a2bd-4f474710d866

                                Yes, I completely agree. It is difficult to readable and understand, if we revisit this query after 6 months or one year.

                                 

                                Thank you so much and I appreciate your great help.