8 Replies Latest reply on Mar 18, 2019 4:17 PM by user9098259

    Substracting two strings in a query

    user9098259

      Hello, i'd like to do the following in Oracle 10g:

       

      I am trying to clean some data in a client address table which in some cases contains repeated patterns to the right side of the string, for example:

       

      "anything 11241 address xyz 154 address xyz 154"

      After cleaning this the result should be: "anything 11241 address xyz 154"

       

      At least i have access to the repeated pattern (always at the right side of the string), so what i have to do is to try to subtract one string from the other like a RTRIM, but only one match:

       

      select 'anything 11241 address xyz 154 address xyz 154'  --> column A

                , 'address xyz 154'     --> column B

                , ??--> column C should be column A - column B, in this case: anything 11241 address xyz 154

      from dual;

       

      Thanks in advanced.

       

      Mensaje editado por: user9098259

        • 1. Re: Substracting two strings in a query
          Frank Kulash

          Hi,

          user9098259 wrote:

           

          Hello, i'd like to do the following in Oracle 10g:

           

          I am trying to clean some data in a client address table which in some cases contains repeated patterns to the right side of the string, for example:

           

          "anything 11241 address xyz 154 address xyz 154"

          After cleaning this the result should be: "anything 11241 address xyz 154"

           

          At least i have access to the repeated pattern (always at the right side of the string), so what i have to do is to try to subtract one string from the other like a RTRIM, but only one match:

           

          select 'anything 11241 address xyz 154 address xyz 154'

          , 'address xyz 154'

          , /*third column should give as result the second column substracted from the first one, resulting in: anything 11241 address xyz 154*/

          from dual;

           

          Thanks in advanced.

          Whenever you have a problem, please post a little sample data (CREATE TABLE and INSERT statements, relevant columns only) from all tables involved, so that the people who want to help you can re-create the problem and test their ideas.
          Also post the exact results you want from that data, and an explanation of how you get those results from that data, with specific examples.  Include any special cases you want to handle, such as '8-digit phone number is 1234 1234' or 'Address: 1 Park Avenue New York New York'.

          Always say which version of Oracle you're using (for example, 12.2.0.1.0).
          See the Forum FAQ: Re: 2. How do I ask a question on the forums?

          • 2. Re: Substracting two strings in a query
            user9098259

            Hello,

            The oracle version is: Oracle Database 10g Release 10.2.0.5.0 - 64bit Production

             

            And about the sample data is limited to this query:

             

            select 'anything 11241 address xyz 154 address xyz 154'  --> column A

                      , 'address xyz 154'     --> column B

                      , ??--> column C should be column A - column B, in this case: anything 11241 address xyz 154

            from dual;

            • 3. Re: Substracting two strings in a query
              mathguy

              I believe this is doable in principle... but you need to define your problem clearly.

               

              How many such messy addresses do you expect to see in your data? If it is a small number, it may be best to identify them for "you" (you being any qualified human being) and let "you" make the final decision.

               

              The delicate part is to specify clearly what constitutes a "duplicate fragment". You don't want to, for example, change the end of an address string from 'Apartment 1313'  to 'Apartment 13' simply because the substring '13' is repeated at the end. Do you want to require, for example, a minimum LENGTH of the "repeated substring" before you consider removing it?

               

              For example, in the query below I find the longest repeated substring at the end of an address and remove it, but only if it has length three or more. I show the original and the modified address in the output, allowing a human to inspect them and decide what to do. (In particular, I wrote a SELECT query, not an UPDATE).

               

              with

                test_data (address) as (

                  select 'anything 11241 address xyz 154 address xyz 154' from dual union all

                  select 'another address ending in Apartment 1313'       from dual union all

                  select 'address with no repeated substring'             from dual

                )

              select address, regexp_replace(address, '(.{3,})\1$', '\1') clean_address

              from   test_data

              ;

               

              ADDRESS                                          CLEAN_ADDRESS                          

              ------------------------------------------------ ------------------------------------------

              anything 11241 address xyz 154 address xyz 154   anything 11241 address xyz 154          

              another address ending in Apartment 1313         another address ending in Apartment 1313

              address with no repeated substring               address with no repeated substring

               

              Of course, it would be easy to add a column for the repeated fragment, and/or to exclude from the output the addresses that don't have a repeated fragment at the end. These are all minor adjustments.

              • 4. Re: Substracting two strings in a query
                user9098259

                Thank you Mathguy,

                 

                I actually know beforehand the repetead pattern and should only appear once at the end of the evaluated string, so actually is much more simple what I am trying to do and is limited to this query only:

                 

                select 'anything 11241 address xyz 154 address xyz 154'  --> column A

                          , 'address xyz 154'     --> column B

                          , ??--> column C should be column A - column B, in this case: anything 11241 address xyz 154

                from dual;

                • 5. Re: Substracting two strings in a query
                  mathguy

                  If you already have columns A and B populated with whatever strings, you could do something trivial to get column C, like this:

                  SUBSTR (A, 1, LENGTH(A) - LENGTH(B) )

                   

                  or if B may be NULL,  use NVL(LENGTH(B), 0)  instead of LENGTH(B)

                  • 6. Re: Substracting two strings in a query
                    Cookiemonster76

                    Here's one way:

                    SQL> WITH data AS (select 'anything 11241 address xyz 154 address xyz 154' A
                      2            , 'address xyz 154' b
                      3  from dual)
                      4  SELECT a,b, CASE WHEN INSTR(a,b,1,2) > 0 --check b appears twice in a
                      5              AND a LIKE '%'||b THEN --check b comes at end of a
                      6              SUBSTR(a, 1 , LENGTH(a) - LENGTH(b)) --chop off length of b
                      7              END AS c FROM DATA;
                    
                    
                    A                                              B               C
                    ---------------------------------------------- --------------- --------------------------------------------------------------------------------
                    anything 11241 address xyz 154 address xyz 154 address xyz 154 anything 11241 address xyz 154
                    

                     

                    EDIT: So it's Mathguy's solution with added checks in case b isn't always duplicated in a.

                    • 7. Re: Substracting two strings in a query
                      mathguy

                      Note, however, that in your example, Column B is not the LONGEST repeated substring (it does not include a leading space). That may be OK - perhaps the repeated fragment is the ENTIRE address, in which case the FIRST occurrence of the fragment is not preceded by a space. In any case, keep this in mind - perhaps you need a RTRIM also, to remove such superfluous spaces at the end of the result.

                      • 8. Re: Substracting two strings in a query
                        user9098259

                        Thanks a lot Cookiemonster76, this seems like is working pretty well

                         

                        Thanks to mathguy also.