Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Regexp Challenge

572471May 11 2007 — edited May 13 2007
Hello, everyone.

I think it's impossible, but still maybe there would be some thoughts.
Well, the question is:
I have a sentence.
Is it possible to replace particular words with the others?
E.g. sentence 'I want milk'
What I want is: to replace 'I' with 'He',
'want' with 'wants' and 'milk' with 'sugar'.

The main idea is to use only one level of nesting in "replace" function.
I have an example, the first column is some kinda "prototype". It has the logic which I want to implement, but it's not working properly.
The two other columns are working variants, but they are not what I want.
SQL> with t as (select 'I want milk' str from dual)
  2  --
  3  select regexp_replace(str, '(I|want|milk)', decode('\1', 'I', 'He', 'want', 'wants', 'milk', 'sugar', '\1')) smth_like_this,
  4         replace(replace(replace(str,'I', 'He'),'want', 'wants'), 'milk', 'sugar') what_is_needed,
  5         (select str from dual
  6          model
  7           reference r
  8            on  (select 1 rn, 'I' from_str, 'He' to_str from dual union all
  9                select 2, 'want', 'wants' from dual union all
 10                select 3, 'milk', 'sugar' from dual)
 11            dimension by (rn)
 12            measures (from_str f, to_str t)
 13            main m
 14             dimension by (0 dim)
 15             measures (cast(str AS VARCHAR2(4000)) str)
 16             rules iterate (1000) until (presentv(f[iteration_number+1],1,0)=0)
 17              (str[0] = replace(str[0], r.f[iteration_number+1], r.t[iteration_number+1]))
 18           ) from t
 19  /

SMTH_LIKE_THIS WHAT_IS_NEEDED (SELECTSTRFROMDUALMODELREFEREN
-------------- -------------- --------------------------------------------------------------------------------
I want milk    He wants sugar He wants sugar

Comments

cd_2
I don't see a pure regex solution since there is no appropriate loop construct, this is what I came up with:
WITH t AS (SELECT 'I want milk' col1
             FROM dual)
   , u AS (SELECT 'I' orig, 'He' repl
             FROM dual
            UNION 
           SELECT 'want', 'wants'
             FROM dual
            UNION
           SELECT 'milk', 'sugar'
             FROM dual
          )   
SELECT MAX(SYS_CONNECT_BY_PATH(NVL(u.repl, t.word), ' '))
  FROM (SELECT REGEXP_SUBSTR(t.col1, '[^ ]+', 1, LEVEL) word 
             , level lvl
          FROM t
          CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(t.col1, '([^ ]+) *', '1'))
       ) t
     , u
 WHERE t.word(+) = u.orig    
 START WITH t.lvl = 1
 CONNECT BY PRIOR t.lvl = t.lvl - 1
  ;
C.
John Spencer

What I find interesting is that the DECODE seems to recognize the back reference in the else bit, but not as in expression bit.

This leaves the string unchanged:

SQL> SELECT REGEXP_REPLACE(str, '(I|want|milk)',
  2                        DECODE('\1', 'I', 'He',
  3                                     'want', 'wants',
  4                                     'milk', 'sugar', '\1')) smth_like_this
  5  FROM (SELECT 'I want milk please' str FROM dual);
 
SMTH_LIKE_THIS
------------------
I want milk please

But if we replace the back reference in the else part with fixed text, we get:

SQL> SELECT REGEXP_REPLACE(str, '(I|want|milk)',
                      DECODE('\1', 'I', 'He',
                                   'want', 'wants',
                                   'milk', 'sugar', 'fred')) smth_like_this
FROM (SELECT 'I want milk please' str FROM dual);  2    3    4    5

SMTH_LIKE_THIS
---------------------
fred fred fred please

John

cd_2
I don't think that DECODE recognizes the backreference, all it does is select the ELSE part of it's statement since there is no match with the literal '\1'. You'll get the same result with this:
SELECT REGEXP_REPLACE(str, '(I|want|milk)', 'fred')
  FROM (SELECT 'I want milk please' str 
          FROM dual);
C.
572471
2 John Spencer
What I find interesting is that the DECODE seems to
recognize the back reference in the else bit, but not
as in expression bit.
Fully agree with cd
decode takes '\1' as just a string '\1' but not a back reference.
I used it just trying to illustrate what I wanted.

2 cd
You made a smal mistake: (+) in the wrong place.
But the idea is clear. Thanks.
SQL> WITH t AS (SELECT 'I want please milk' col1
  2               FROM dual)
  3     , u AS (SELECT 'I' orig, 'He' repl
  4               FROM dual
  5              UNION
  6             SELECT 'want', 'wants'
  7               FROM dual
  8              UNION
  9             SELECT 'milk', 'sugar'
 10               FROM dual
 11            )
 12  SELECT MAX(SYS_CONNECT_BY_PATH(NVL(u.repl, t.word), ' '))
 13    FROM (SELECT REGEXP_SUBSTR(t.col1, '[^ ]+', 1, LEVEL) word
 14               , level lvl
 15            FROM t
 16            CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(t.col1, '([^ ]+) *', '1'))
 17         ) t
 18       , u
 19   WHERE t.word = u.orig(+)
 20   START WITH t.lvl = 1
 21   CONNECT BY PRIOR t.lvl = t.lvl - 1
 22  /

MAX(SYS_CONNECT_BY_PATH(NVL(U.
--------------------------------------------------------------------------------
 He wants please sugar

SQL>
well, it's a pity there's no such a functionality.
572471

cd
if you have more than one string in table t
you should change the query that way:

SQL> WITH t AS (SELECT 'I want please milk' col1
  2               FROM dual union all
  3             select 'He wants milk' from dual)
  4     , u AS (SELECT 'I' orig, 'He' repl
  5               FROM dual
  6              UNION
  7             SELECT 'want', 'wants'
  8               FROM dual
  9              UNION
 10             SELECT 'milk', 'sugar'
 11               FROM dual
 12            )
 13           --
 14  SELECT max(SYS_CONNECT_BY_PATH(NVL(u.repl, t.word), ' '))
 15    FROM (SELECT rn, REGEXP_SUBSTR(t.col1, '[^ ]+', 1, LEVEL) word
 16               , level lvl
 17            FROM (select rownum rn, t.* from t)t
 18            CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(t.col1, '([^ ]+) *', '1'))
 19             and prior rn = rn
 20             and prior dbms_random.value is not null
 21         ) t
 22       , u
 23   WHERE t.word = u.orig(+)
 24   START WITH t.lvl = 1
 25   CONNECT BY PRIOR t.lvl = t.lvl - 1
 26     and prior rn = rn
 27     group by rn
 28  /

MAX(SYS_CONNECT_BY_PATH(NVL(U.
--------------------------------------------------------------------------------
 He wants please sugar
 He wants sugar

but you can see a small drawback.
in the second sentence word "wants" hasn't change although it includes word "want".
I think model is better here:

SQL> WITH t AS (SELECT 'I want please milk' col
  2               FROM dual union all
  3             select 'He wants milk' from dual)
  4     , u AS (SELECT 'I' orig, 'He' repl
  5               FROM dual
  6              UNION
  7             SELECT 'want', 'wants'
  8               FROM dual
  9              UNION
 10             SELECT 'milk', 'sugar'
 11               FROM dual
 12            )
 13           --
 14  select col from (select rownum rn, t.* from t)
 15   model
 16    reference r
 17     on (select * from (select rownum rn, u.* from u))
 18      dimension by (rn)
 19      measures(orig, repl)
 20    main m
 21     dimension by (rn)
 22     measures(cast(col as varchar2(4000)) col)
 23      rules iterate (1000) until (presentv(orig[iteration_number+1],1,0)=0)
 24       (col[ANY] = replace(col[CV()], r.orig[iteration_number+1], r.repl[iteration_number+1]))
 25  /

COL
--------------------------------------------------------------------------------
He wants please sugar
He wantss sugar
John Spencer
cd:

While I acknowledge your rexexp expertise, the DECODE clearly does recognize the back reference, at least in part. I agree that it is comparing the literal string \1 with I and want and milk. Certainly the literal string \1 does not match any of those strings, so the else part is evaluated. The \1 in the else part appears to evaluate to the back reference. If it was again being interpreted as a literal string, I would expect to get the result:

\1 \1 \1 please

instead of getting the original string back.

John
572471
If it was again being interpreted as a
literal string, I would expect to get the result:

\1 \1 \1 please
yep, sounds reasonable.
really strange behaviour.
So, somehow it recognizes backreferences, but only partially in the else part of decode or case.
SQL> SELECT REGEXP_REPLACE(str, '(I|want|milk)', case '\1' when 'I' then 'He' else '\1' end),
  2         REGEXP_REPLACE(str, '(I|want|milk)', case '\1' when '\1' then 'He' else '\1' end)
  3    FROM (SELECT 'I want milk please' str
  4            FROM dual);

REGEXP_REPLACE(STR,'(I|WANT|MI    REGEXP_REPLACE(STR,'(I|WANT|MI
--------------------------------- ------------------------------------
I want milk please                He He He please

SQL> 
cd_2
In which case? In your first example, the result string of the decode should be '\1' and regexp can use that literal to return the backreference. All DECODE does is prepare a literal that can be used as result pattern.

C.
MichaelS
XML is able to do this as well: ;-)
michaels>  with t as (
 select 'I want please milk' col from dual union all
 select 'He wants milk' from dual
), 
u as (
 select 'I' orig, 'He' repl from dual union  
 select 'want', 'wants'     from dual union 
 select 'milk', 'sugar'     from dual 
)
select col, 
       value(t) repl
  from t,
       xmltable('declare function local:replace_all($str, $orig, $repl)
                 {
                   if($orig[1]) then 
                     local:replace_all(ora:replace($str, $orig[1], $repl[1]), subsequence($orig, 2), subsequence($repl, 2))
                   else ($str)
                 };
                     
                 local:replace_all(/col, $u/ROWSET//ORIG//text(), $u/ROWSET//REPL//text())
                 ' passing xmltype('<col>'  || col  || '</col>'),
                           xmltype(cursor(select * from u)) as "u") t

COL                REPL                     
------------------ -------------------------
I want please milk He wants please sugar    
He wants milk      He wantss sugar          


2 rows selected.
572471
thanks, michaels.
cd_2
I'm impressed. Any chance that you're going to write an Introduction to XML functions? ;-)

C.
MichaelS
Any chance that you're going to write an Introduction to XML functions?
You think it is neccessary ;) ? I think the MODEL clause is much more harder to grasp, while those XML functions are at least as powerful (I think even more powerful) but easier to read (personal opinion).
Additionally model is proprietary to oracle, while xml/xquery is (to a high degree at least) open standard and useful in other areas as well.

Would they only be a little (much) more perfomant, I would never look into model I guess ;)
546595
True Cd,
Michaels Is XML lover and he has some nice solutions with it.
He should surely write some artical the way you did it for Regexp.
I thnkm also Rob and Volder can write artical on Model, Or amy be someone else.
Great work guys , Keep it up.

:-)
cd_2
You think it is neccessary ;) ?
Introductory articles do have their merits. Just asking, since you like to work with XML. ;-) Same goes for the MODEL experts.

C.
572471
I thnkm also Rob and Volder can write artical on Model, Or amy be someone else.
Rob said he's working on a piece about model clause. So it would be nice if he provides it for forum members also.

Joe Fuda has already written some kinda article or intoduction to the model clause.
If you have read some documentation or articles about model, e.g. given by Rob

it would be very useful if you go through survey provided by Joe Fuda.
1 - 15
Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on Jun 10 2007
Added on May 11 2007
15 comments
1,080 views