Forum Stats

  • 3,827,386 Users
  • 2,260,768 Discussions
  • 7,897,223 Comments

Discussions

Regexp Challenge

572471
572471 Member Posts: 984 Green Ribbon
edited May 13, 2007 3:38AM in SQL & PL/SQL
Hello, everyone.

I think it's impossible, but still maybe there would be some thoughts.
Well, the question is:
I have a sentence.
Is it possible to replace particular words with the others?
E.g. sentence 'I want milk'
What I want is: to replace 'I' with 'He',
'want' with 'wants' and 'milk' with 'sugar'.

The main idea is to use only one level of nesting in "replace" function.
I have an example, the first column is some kinda "prototype". It has the logic which I want to implement, but it's not working properly.
The two other columns are working variants, but they are not what I want.
SQL> with t as (select 'I want milk' str from dual)
  2  --
  3  select regexp_replace(str, '(I|want|milk)', decode('\1', 'I', 'He', 'want', 'wants', 'milk', 'sugar', '\1')) smth_like_this,
  4         replace(replace(replace(str,'I', 'He'),'want', 'wants'), 'milk', 'sugar') what_is_needed,
  5         (select str from dual
  6          model
  7           reference r
  8            on  (select 1 rn, 'I' from_str, 'He' to_str from dual union all
  9                select 2, 'want', 'wants' from dual union all
 10                select 3, 'milk', 'sugar' from dual)
 11            dimension by (rn)
 12            measures (from_str f, to_str t)
 13            main m
 14             dimension by (0 dim)
 15             measures (cast(str AS VARCHAR2(4000)) str)
 16             rules iterate (1000) until (presentv(f[iteration_number+1],1,0)=0)
 17              (str[0] = replace(str[0], r.f[iteration_number+1], r.t[iteration_number+1]))
 18           ) from t
 19  /

SMTH_LIKE_THIS WHAT_IS_NEEDED (SELECTSTRFROMDUALMODELREFEREN
-------------- -------------- --------------------------------------------------------------------------------
I want milk    He wants sugar He wants sugar
«1

Comments

  • cd_2
    cd_2 Member Posts: 5,021
    edited May 11, 2007 2:48PM
    I don't see a pure regex solution since there is no appropriate loop construct, this is what I came up with:
    WITH t AS (SELECT 'I want milk' col1
                 FROM dual)
       , u AS (SELECT 'I' orig, 'He' repl
                 FROM dual
                UNION 
               SELECT 'want', 'wants'
                 FROM dual
                UNION
               SELECT 'milk', 'sugar'
                 FROM dual
              )   
    SELECT MAX(SYS_CONNECT_BY_PATH(NVL(u.repl, t.word), ' '))
      FROM (SELECT REGEXP_SUBSTR(t.col1, '[^ ]+', 1, LEVEL) word 
                 , level lvl
              FROM t
              CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(t.col1, '([^ ]+) *', '1'))
           ) t
         , u
     WHERE t.word(+) = u.orig    
     START WITH t.lvl = 1
     CONNECT BY PRIOR t.lvl = t.lvl - 1
      ;
    C.
  • John Spencer
    John Spencer Member Posts: 8,567 Bronze Crown
    What I find interesting is that the DECODE seems to recognize the back reference in the else bit, but not as in expression bit.

    This leaves the string unchanged:
    SQL> SELECT REGEXP_REPLACE(str, '(I|want|milk)',
    2 DECODE('\1', 'I', 'He',
    3 'want', 'wants',
    4 'milk', 'sugar', '\1')) smth_like_this
    5 FROM (SELECT 'I want milk please' str FROM dual);

    SMTH_LIKE_THIS
    ------------------
    I want milk please
    But if we replace the back reference in the else part with fixed text, we get:
    SQL> SELECT REGEXP_REPLACE(str, '(I|want|milk)',
    DECODE('\1', 'I', 'He',
    'want', 'wants',
    'milk', 'sugar', 'fred')) smth_like_this
    FROM (SELECT 'I want milk please' str FROM dual); 2 3 4 5

    SMTH_LIKE_THIS
    ---------------------
    fred fred fred please
    John
  • cd_2
    cd_2 Member Posts: 5,021
    edited May 11, 2007 3:55PM
    I don't think that DECODE recognizes the backreference, all it does is select the ELSE part of it's statement since there is no match with the literal '\1'. You'll get the same result with this:
    SELECT REGEXP_REPLACE(str, '(I|want|milk)', 'fred')
      FROM (SELECT 'I want milk please' str 
              FROM dual);
    C.
  • 572471
    572471 Member Posts: 984 Green Ribbon
    2 John Spencer
    What I find interesting is that the DECODE seems to
    recognize the back reference in the else bit, but not
    as in expression bit.
    Fully agree with cd
    decode takes '\1' as just a string '\1' but not a back reference.
    I used it just trying to illustrate what I wanted.

    2 cd
    You made a smal mistake: (+) in the wrong place.
    But the idea is clear. Thanks.
    SQL> WITH t AS (SELECT 'I want please milk' col1
      2               FROM dual)
      3     , u AS (SELECT 'I' orig, 'He' repl
      4               FROM dual
      5              UNION
      6             SELECT 'want', 'wants'
      7               FROM dual
      8              UNION
      9             SELECT 'milk', 'sugar'
     10               FROM dual
     11            )
     12  SELECT MAX(SYS_CONNECT_BY_PATH(NVL(u.repl, t.word), ' '))
     13    FROM (SELECT REGEXP_SUBSTR(t.col1, '[^ ]+', 1, LEVEL) word
     14               , level lvl
     15            FROM t
     16            CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(t.col1, '([^ ]+) *', '1'))
     17         ) t
     18       , u
     19   WHERE t.word = u.orig<strong>(+)</strong>
     20   START WITH t.lvl = 1
     21   CONNECT BY PRIOR t.lvl = t.lvl - 1
     22  /
    
    MAX(SYS_CONNECT_BY_PATH(NVL(U.
    --------------------------------------------------------------------------------
     He wants please sugar
    
    SQL>
    well, it's a pity there's no such a functionality.
  • 572471
    572471 Member Posts: 984 Green Ribbon
    edited May 11, 2007 4:44PM
    cd
    if you have more than one string in table t
    you should change the query that way:
    SQL> WITH t AS (SELECT 'I want please milk' col1
    2 FROM dual union all
    3 select 'He wants milk' from dual)
    4 , u AS (SELECT 'I' orig, 'He' repl
    5 FROM dual
    6 UNION
    7 SELECT 'want', 'wants'
    8 FROM dual
    9 UNION
    10 SELECT 'milk', 'sugar'
    11 FROM dual
    12 )
    13 --
    14 SELECT max(SYS_CONNECT_BY_PATH(NVL(u.repl, t.word), ' '))
    15 FROM (SELECT rn, REGEXP_SUBSTR(t.col1, '[^ ]+', 1, LEVEL) word
    16 , level lvl
    17 FROM (select rownum rn, t.* from t)t
    18 CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(t.col1, '([^ ]+) *', '1'))
    19 and prior rn = rn
    20 and prior dbms_random.value is not null
    21 ) t
    22 , u
    23 WHERE t.word = u.orig(+)
    24 START WITH t.lvl = 1
    25 CONNECT BY PRIOR t.lvl = t.lvl - 1
    26 and prior rn = rn
    27 group by rn
    28 /

    MAX(SYS_CONNECT_BY_PATH(NVL(U.
    --------------------------------------------------------------------------------
    He wants please sugar
    He wants sugar
    but you can see a small drawback.
    in the second sentence word "wants" hasn't change although it includes word "want".
    I think model is better here:
    SQL> WITH t AS (SELECT 'I want please milk' col
    2 FROM dual union all
    3 select 'He wants milk' from dual)
    4 , u AS (SELECT 'I' orig, 'He' repl
    5 FROM dual
    6 UNION
    7 SELECT 'want', 'wants'
    8 FROM dual
    9 UNION
    10 SELECT 'milk', 'sugar'
    11 FROM dual
    12 )
    13 --
    14 select col from (select rownum rn, t.* from t)
    15 model
    16 reference r
    17 on (select * from (select rownum rn, u.* from u))
    18 dimension by (rn)
    19 measures(orig, repl)
    20 main m
    21 dimension by (rn)
    22 measures(cast(col as varchar2(4000)) col)
    23 rules iterate (1000) until (presentv(orig[iteration_number+1],1,0)=0)
    24 (col[ANY] = replace(col[CV()], r.orig[iteration_number+1], r.repl[iteration_number+1]))
    25 /

    COL
    --------------------------------------------------------------------------------
    He wants please sugar
    He wantss sugar
  • John Spencer
    John Spencer Member Posts: 8,567 Bronze Crown
    cd:

    While I acknowledge your rexexp expertise, the DECODE clearly does recognize the back reference, at least in part. I agree that it is comparing the literal string \1 with I and want and milk. Certainly the literal string \1 does not match any of those strings, so the else part is evaluated. The \1 in the else part appears to evaluate to the back reference. If it was again being interpreted as a literal string, I would expect to get the result:

    \1 \1 \1 please

    instead of getting the original string back.

    John
  • 572471
    572471 Member Posts: 984 Green Ribbon
    If it was again being interpreted as a
    literal string, I would expect to get the result:

    \1 \1 \1 please
    yep, sounds reasonable.
    really strange behaviour.
    So, somehow it recognizes backreferences, but only partially in the else part of decode or case.
    SQL> SELECT REGEXP_REPLACE(str, '(I|want|milk)', case '\1' when 'I' then 'He' else '\1' end),
      2         REGEXP_REPLACE(str, '(I|want|milk)', case '\1' when '\1' then 'He' else '\1' end)
      3    FROM (SELECT 'I want milk please' str
      4            FROM dual);
    
    REGEXP_REPLACE(STR,'(I|WANT|MI    REGEXP_REPLACE(STR,'(I|WANT|MI
    --------------------------------- ------------------------------------
    I want milk please                He He He please
    
    SQL> 
  • cd_2
    cd_2 Member Posts: 5,021
    In which case? In your first example, the result string of the decode should be '\1' and regexp can use that literal to return the backreference. All DECODE does is prepare a literal that can be used as result pattern.

    C.
  • MichaelS
    MichaelS Member Posts: 8,424 Bronze Crown
    XML is able to do this as well: ;-)
    michaels>  with t as (
     select 'I want please milk' col from dual union all
     select 'He wants milk' from dual
    ), 
    u as (
     select 'I' orig, 'He' repl from dual union  
     select 'want', 'wants'     from dual union 
     select 'milk', 'sugar'     from dual 
    )
    select col, 
           value(t) repl
      from t,
           xmltable('declare function local:replace_all($str, $orig, $repl)
                     {
                       if($orig[1]) then 
                         local:replace_all(ora:replace($str, $orig[1], $repl[1]), subsequence($orig, 2), subsequence($repl, 2))
                       else ($str)
                     };
                         
                     local:replace_all(/col, $u/ROWSET//ORIG//text(), $u/ROWSET//REPL//text())
                     ' passing xmltype('<col>'  || col  || '</col>'),
                               xmltype(cursor(select * from u)) as "u") t
    
    COL                REPL                     
    ------------------ -------------------------
    I want please milk He wants please sugar    
    He wants milk      He wantss sugar          
    
    
    2 rows selected.
  • 572471
    572471 Member Posts: 984 Green Ribbon
    thanks, michaels.
This discussion has been closed.