1 2 3 Previous Next 37 Replies Latest reply: Jan 8, 2013 10:13 PM by 887479 Go to original post RSS
      • 15. Re: Remove repeating letters
        BrendanP
        Ugly, but it meets requirement of no row generation, and also has no iteration. Probably not very fast. I would try a PL/SQL solution.
        SQL> r
          1  with data as (
          2      select ' abcdaeebfghiejklc' str from dual
          3      union all
          4      select ' abbbbcdaeebfghiiiieeejklc' from dual
          5  )
          6  SELECT str,
          7          Regexp_Replace (
          8          Regexp_Replace (
          9          Regexp_Replace (
         10          Regexp_Replace (
         11          Regexp_Replace (
         12          Regexp_Replace (
         13          Regexp_Replace (
         14          Regexp_Replace (
         15          Regexp_Replace (
         16          Regexp_Replace (
         17          Regexp_Replace (
         18          Regexp_Replace (
         19          Regexp_Replace (
         20          Regexp_Replace (
         21          Regexp_Replace (
         22          Regexp_Replace (
         23          Regexp_Replace (
         24          Regexp_Replace (
         25          Regexp_Replace (
         26          Regexp_Replace (
         27          Regexp_Replace (
         28          Regexp_Replace (
         29          Regexp_Replace (
         30          Regexp_Replace (
         31          Regexp_Replace (
         32          Regexp_Replace (
         33            str,
         34          'aa+', 'a'),
         35          'bb+', 'b'),
         36          'cc+', 'c'),
         37          'dd+', 'd'),
         38          'ee+', 'e'),
         39          'ff+', 'f'),
         40          'gg+', 'g'),
         41          'hh+', 'h'),
         42          'ii+', 'i'),
         43          'jj+', 'j'),
         44          'kk+', 'k'),
         45          'll+', 'l'),
         46          'mm+', 'm'),
         47          'nn+', 'n'),
         48          'oo+', 'o'),
         49          'pp+', 'p'),
         50          'qq+', 'q'),
         51          'rr+', 'r'),
         52          'ss+', 's'),
         53          'tt+', 't'),
         54          'uu+', 'u'),
         55          'vv+', 'v'),
         56          'ww+', 'w'),
         57          'xx+', 'x'),
         58          'yy+', 'y'),
         59          'zz+', 'z')
         60          new_str
         61*   FROM data
        
        STR                        NEW_STR
        -------------------------- ------------------------------
         abcdaeebfghiejklc          abcdaebfghiejklc
         abbbbcdaeebfghiiiieeejklc  abcdaebfghiejklc
        
        SQL>
        Edited by: BrendanP on 08-Jan-2013 06:34
        Obviously, I'm only removing consecutive repeats here, which may not be sufficient...
        • 16. Re: Remove repeating letters
          887479
          Repeatation can be non-consecutive also..
          :(
          • 17. Re: Remove repeating letters
            Solomon Yakobson
            BrendanP wrote:
            Ugly, but it meets requirement of no row generation, and also has no iteration. Probably not very fast. I would try a PL/SQL solution.
            Obviously, I'm only removing consecutive repeats here, which may not be sufficient...
            Yes, very ugly. And to remove consecutive repeats all you need is:
            with data as (
                          select ' abcdaeebfghiejklc' str from dual union all
                          select ' abbbbcdaeebfghiiiieeejklc' from dual
                         )
            select  str,
                    regexp_replace(str,'(.)\1+','\1') new_str
              from  data
            /
            
            STR                        NEW_STR
            -------------------------- --------------------
             abcdaeebfghiejklc          abcdaebfghiejklc
             abbbbcdaeebfghiiiieeejklc  abcdaebfghiejklc
            
            SQL> 
            SY.
            • 18. Re: Remove repeating letters
              chris227
              884476 wrote:
              So why restrict yourself with imaginary problems of row generation?
              With 11.2 i would just give recursive subquery a try.
              As already posted, it is already tried..
              But you didnt mentioned the problem with it.
              You dont know the difference of regexp, xbd, model working on this internally, do you?
              So i dont think that this theoratical row generating stuff does really matters here. All solutions doing cpu cycles at first, i guess, not materializing any rows.

              No xml solution will be faster then recursive subquery with good old simple functions imho.

              Regexp wil only slow down it.

              I didnt use model in this case, because i didnt see any advantage to rec. subquery and the latter is more accepted in this place in my experience ...

              May be Solomons is quite faster than mine. However, i am interested in the problems you faced with it.
              • 19. Re: Remove repeating letters
                stefan nebesnak
                Try this:
                SQL> select regexp_replace('abcdaeebfghiejklc','(\w)\1*','\1') as val from dual;
                 
                VAL
                ----------------
                abcdaebfghiejklc
                • 20. Re: Remove repeating letters
                  padders
                  Output should be abcdefghijkl
                  Can you clarify the requirement please, is it merely coincidence that the output is in ascii order?
                  • 21. Re: Remove repeating letters
                    Solomon Yakobson
                    stefan nebesnak wrote:
                    Try this:
                    SQL> select regexp_replace('abcdaeebfghiejklc','(\w)\1*','\1') as val from dual;
                    This is almost same as one I posted:
                    select  str,
                            regexp_replace(str,'(.)\1+','\1') new_str
                      from  data
                    / 
                    but will perform a bit slower. Why? Your code is replacing even non-repeating characters with itself. Pattern (\w)\1* means word character followed by itself repeated 0 or more times comparing to pattern (\w)\1+ which means word character followed by itself one or more times. Look how namy replacements your code does:
                    select regexp_replace('abcdaeebfghiejklc','(\w)\1*','[\1]') as val from dual
                    /
                    
                    VAL
                    ------------------------------------------------
                    [a][c][d][a][e][b][f][g][h][e][j][k][l][c]

                    SQL>
                    And look what happens when we change * to +:
                    SQL> select regexp_replace('abcdaeebfghiejklc','(\w)\1+','[\1]') as val from dual
                    2 /

                    VAL
                    ------------------
                    abcda[e]bfghiejklc

                    SQL>
                    SY.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                    • 22. Re: Remove repeating letters
                      chris227
                      with data as (
                                    select ' abcdaeebfghiejklc' str from dual union all
                                    select ' abbbbcdaeebfghiiiieeejklc' from dual
                                   )
                      select  str,
                          regexp_replace(
                            regexp_replace(        
                              str
                             ,'([[:alpha:]])\1+'
                             ,'\1'
                            )
                           ,'([[:alpha:]])(.+)\1+'
                           ,'\1\2'
                          ) new_str
                      from  data
                      
                      STR NEW_STR 
                      abcdaeebfghiejklc abcdebfghijklc 
                      abbbbcdaeebfghiiiieeejklc abcdebfghijklc 
                      Doesnt work on something like ababababa :-(
                      • 23. Re: Remove repeating letters
                        bencol
                        Using translate and replace
                        with t as (select 'abcdaeebfghiejklc' str, 'abcdefghijklmonpqrstuvwxyz' alpha  from dual)
                        select str, replace(translate(alpha,replace(translate(alpha,str,'@'),'@',null),'@'),'@',null) newstr
                        from t;
                        
                        STR               NEWSTR
                        _________________ ____________________
                        abcdaeebfghiejklc abcdefghijkl
                        • 24. Re: Remove repeating letters
                          BrendanP
                          It's not difficult to do a basic performance test on this. I added 10,000 records, all the same except for a leading row number, first with a lot of repetition, then again with less. I tested an XML solution, an RSF one, both copied from the thread, and added two PL/SQL solutions I did myself. The results were quite similar for both data sets, so I'll just give the first set:

                          Repetitive Summary
                          Method          Seconds
                          XML             88
                          RSF             79
                          PLS-1           79
                          PLS-2           73
                          Repetitive Listing
                          BEGIN
                          
                            FOR i IN 1..10000 LOOP
                          
                              INSERT INTO temp_values VALUES (i || 'abbbbcdaeebfghiiiieeejklc');
                          
                            END LOOP;
                          
                          END;
                          
                          SQL> SET TIMING ON
                          SQL> SET AUTOTRACE TRACEONLY
                          SQL> select  x.newstr
                            2    from  temp_values t,
                            3          xmltable(
                            4                   'string-join(distinct-values(ora:tokenize($str,",")),"")'
                            5                   passing regexp_replace(t.str,'(.)',',\1') as "str"
                            6                   columns newstr varchar2(2000) path '.'
                            7                  ) x
                            8  /
                          
                          10000 rows selected.
                          
                          Elapsed: 00:01:27.66
                          
                          Execution Plan
                          ----------------------------------------------------------
                          Plan hash value: 1189847430
                          
                          --------------------------------------------------------------------------------------------------------------
                          | Id  | Operation                            | Name                  | Rows  | Bytes | Cost (%CPU)| Time     |
                          --------------------------------------------------------------------------------------------------------------
                          |   0 | SELECT STATEMENT                     |                       | 10050 |   294K|   242K (10)| 00:50:21 |
                          |   1 |  SORT AGGREGATE                      |                       |     1 |  2002 |            |          |
                          |   2 |   VIEW                               |                       | 16360 |    31M|    24   (9)| 00:00:01 |
                          |   3 |    SORT UNIQUE                       |                       | 16360 | 32720 |    24   (9)| 00:00:01 |
                          |   4 |     COLLECTION ITERATOR PICKLER FETCH| XQSEQUENCEFROMXMLTYPE | 16360 | 32720 |    22   (0)| 00:00:01 |
                          |*  5 |  FILTER                              |                       |       |       |            |          |
                          |   6 |   TABLE ACCESS FULL                  | TEMP_VALUES           | 10050 |   294K|     6   (0)| 00:00:01 |
                          --------------------------------------------------------------------------------------------------------------
                          
                          Predicate Information (identified by operation id):
                          ---------------------------------------------------
                          
                             5 - filter(SYS_XQSTRJOIN( (SELECT "SYS_IXQAGG"(SYS_XQ_PKSQL2XML("QMXQRW$DIST_V_0"."COLUMN_VALUE",1,
                                        2,0)) FROM  (SELECT DISTINCT SYS_XQ_UPKXML2SQL(VALUE(KOKBF$),2,1,0) "COLUMN_VALUE" FROM TABLE()
                                        "KOKBF$") "QMXQRW$DIST_V_0"),NULL) IS NOT NULL)
                          
                          Note
                          -----
                             - SQL plan baseline "SQL_PLAN_5dyg130da9sjn31769e17" used for this statement
                          
                          
                          Statistics
                          ----------------------------------------------------------
                                    0  recursive calls
                                    0  db block gets
                                30693  consistent gets
                                    0  physical reads
                                    0  redo size
                               204254  bytes sent via SQL*Net to client
                                 5021  bytes received via SQL*Net from client
                                  668  SQL*Net roundtrips to/from client
                                10000  sorts (memory)
                                    0  sorts (disk)
                                10000  rows processed
                          
                          SQL> SELECT  Multi.Rem_Dups (str) f_str
                            2    FROM  temp_values
                            3  /
                          
                          10000 rows selected.
                          
                          Elapsed: 00:01:19.23
                          
                          Execution Plan
                          ----------------------------------------------------------
                          Plan hash value: 2739914897
                          
                          ---------------------------------------------------------------------------------
                          | Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
                          ---------------------------------------------------------------------------------
                          |   0 | SELECT STATEMENT  |             | 10050 |   294K|     6   (0)| 00:00:01 |
                          |   1 |  TABLE ACCESS FULL| TEMP_VALUES | 10050 |   294K|     6   (0)| 00:00:01 |
                          ---------------------------------------------------------------------------------
                          
                          Note
                          -----
                             - SQL plan baseline "SQL_PLAN_f8jn1pqbmggpz308f0074" used for this statement
                          
                          
                          Statistics
                          ----------------------------------------------------------
                                    0  recursive calls
                                    0  db block gets
                                  693  consistent gets
                                    0  physical reads
                                    0  redo size
                               209146  bytes sent via SQL*Net to client
                                 5021  bytes received via SQL*Net from client
                                  668  SQL*Net roundtrips to/from client
                                    0  sorts (memory)
                                    0  sorts (disk)
                                10000  rows processed
                          
                          SQL> SELECT  Multi.Rem_Dups_2 (str) f_str
                            2    FROM  temp_values
                            3  /
                          
                          10000 rows selected.
                          
                          Elapsed: 00:01:13.39
                          
                          Execution Plan
                          ----------------------------------------------------------
                          Plan hash value: 2739914897
                          
                          ---------------------------------------------------------------------------------
                          | Id  | Operation         | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
                          ---------------------------------------------------------------------------------
                          |   0 | SELECT STATEMENT  |             | 10050 |   294K|     6   (0)| 00:00:01 |
                          |   1 |  TABLE ACCESS FULL| TEMP_VALUES | 10050 |   294K|     6   (0)| 00:00:01 |
                          ---------------------------------------------------------------------------------
                          
                          Note
                          -----
                             - SQL plan baseline "SQL_PLAN_ga88vm0npmsky308f0074" used for this statement
                          
                          
                          Statistics
                          ----------------------------------------------------------
                                   18  recursive calls
                                   66  db block gets
                                  722  consistent gets
                                   10  physical reads
                                21936  redo size
                               209146  bytes sent via SQL*Net to client
                                 5021  bytes received via SQL*Net from client
                                  668  SQL*Net roundtrips to/from client
                                    0  sorts (memory)
                                    0  sorts (disk)
                                10000  rows processed
                          
                          SQL> with r(
                            2         str,
                            3         new_str,
                            4         l
                            5        ) as (
                            6               select  str,
                            7                       cast(substr(str,1,1) as varchar2(4000)) new_str,
                            8                       1 l
                            9                 from  temp_values
                           10              union all
                           11               select  str,
                           12                       case instr(new_str,substr(str,l + 1,1))
                           13                         when 0 then new_str || substr(str,l + 1,1)
                           14                         else new_str
                           15                       end new_str,
                           16                       l + 1 l
                           17                 from  r
                           18    where l < length(str)
                           19             )
                           20  select new_str
                           21    from  r
                           22    where l = length(str)
                           23  /
                          
                          10000 rows selected.
                          
                          Elapsed: 00:01:19.41
                          
                          Execution Plan
                          ----------------------------------------------------------
                          Plan hash value: 138163077
                          
                          ---------------------------------------------------------------------------------------------------------
                          | Id  | Operation                                 | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
                          ---------------------------------------------------------------------------------------------------------
                          |   0 | SELECT STATEMENT                          |             | 20100 |    77M|    12   (0)| 00:00:01 |
                          |*  1 |  VIEW                                     |             | 20100 |    77M|    12   (0)| 00:00:01 |
                          |   2 |   UNION ALL (RECURSIVE WITH) BREADTH FIRST|             |       |       |            |          |
                          |   3 |    TABLE ACCESS FULL                      | TEMP_VALUES | 10050 |   294K|     6   (0)| 00:00:01 |
                          |*  4 |    RECURSIVE WITH PUMP                    |             |       |       |            |          |
                          ---------------------------------------------------------------------------------------------------------
                          
                          Predicate Information (identified by operation id):
                          ---------------------------------------------------
                          
                             1 - filter("L"=LENGTH("STR"))
                             4 - filter("L"<LENGTH("STR"))
                          
                          Note
                          -----
                             - SQL plan baseline "SQL_PLAN_cua31bqy4pf1q1f42731d" used for this statement
                          
                          
                          Statistics
                          ----------------------------------------------------------
                                    4  recursive calls
                               574517  db block gets
                                   27  consistent gets
                                    0  physical reads
                                    0  redo size
                               209148  bytes sent via SQL*Net to client
                                 5021  bytes received via SQL*Net from client
                                  668  SQL*Net roundtrips to/from client
                                   31  sorts (memory)
                                    0  sorts (disk)
                                10000  rows processed
                          • 25. Re: Remove repeating letters
                            Frank Kulash
                            Hi,

                            Here's a recursive WITH solution:
                            WITH    no_dups (orig_str, str, replace_char)     AS
                            (
                                 SELECT     str          AS orig_str
                                 ,     str          AS str
                                 ,     'a'          AS replace_char
                                 FROM     data
                                UNION ALL
                                    SELECT     orig_str
                                 ,     SUBSTR ( str
                                             , 1
                                             , INSTR ( str || replace_char
                                                         , replace_char
                                                  )
                                             ) || REPLACE ( SUBSTR ( str
                                                                           , INSTR ( str || replace_char
                                                                      , replace_char
                                                               ) + 1
                                                          )
                                                    , replace_char
                                                    )          AS str
                                 ,       CHR (1 + ASCII (replace_char))     AS relace_char
                                 FROM     no_dups
                                 WHERE     replace_char     <= 'z'
                            )
                            SELECT    orig_str
                            ,       str
                            FROM       no_dups
                            WHERE       replace_char     = '{'     -- next character after 'z'
                            ORDER BY  orig_str
                            ;
                            This is kind of like Peter's solution, but instead of nesting 26 function calls, this recurses 26 times.
                            I used REPLACE rather than REGEXP_REPLACE, since REGEXP functions tend to be slow. This is one case where less powerful functions can do the same thing, but you have to call 5 powerful functions to do the same thing as REGEXP_REPLACE, so it may be faster to stick with REGEXP_REPLACE.
                            • 26. Re: Remove repeating letters
                              bencol
                              Brendan,

                              How does that compare with my solution - it seems much quicker to me, but obviously on a different set-ups. For the basic timing data mine ran in 0.64s compare to 3.57s (where yours was 88s) for your xml solution.

                              Ben
                              • 27. Re: Remove repeating letters
                                chris227
                                Frank Kulash wrote:
                                This is kind of like Peter's solution, but instead of nesting 26 function calls, this recurses 26 times.
                                I used REPLACE rather than REGEXP_REPLACE, since REGEXP functions tend to be slow. This is one case where less powerful functions can do the same thing, but you have to call 5 powerful functions to do the same thing as REGEXP_REPLACE, so it may be faster to stick with REGEXP_REPLACE.
                                But in what kind this is better than the solution given by me?
                                I just need number of distinct letters recursions. Mine is also not using regexp.

                                Edited by: chris227 on 08.01.2013 09:29
                                correction, solomons solutions needs lengt of string recursions, if i see correctly
                                • 28. Re: Remove repeating letters
                                  chris227
                                  Yours solution is the one, regarding the requirements a-z characters only and the order of the result should be alphabetical.
                                  Hats off!
                                  • 29. Re: Remove repeating letters
                                    odie_63
                                    On which database did you run the test?

                                    I get quite different results on 11g XE (11.2.0.2) running on my local machine :
                                    Connected to:
                                    Oracle Database 11g Express Edition Release 11.2.0.2.0 - Production
                                    
                                    SQL> set timing on
                                    SQL> set autrotrace traceonly statistics
                                    SP2-0158: unknown SET option "autrotrace"
                                    SQL> set autotrace traceonly statistics
                                    SQL> select x.newstr
                                      2  from temp_values t,
                                      3       xmltable(
                                      4        'string-join(distinct-values(ora:tokenize($str,",")),"")'
                                      5        passing regexp_replace(t.str,'(.)',',\1') as "str"
                                      6        columns newstr varchar2(2000) path '.'
                                      7    ) x
                                      8  ;
                                    
                                    10000 rows selected.
                                    
                                    Elapsed: 00:00:02.09
                                    
                                    Statistics
                                    ----------------------------------------------------------
                                             23  recursive calls
                                              0  db block gets
                                           1186  consistent gets
                                              0  physical reads
                                              0  redo size
                                         260658  bytes sent via SQL*Net to client
                                           7746  bytes received via SQL*Net from client
                                            668  SQL*Net roundtrips to/from client
                                          10000  sorts (memory)
                                              0  sorts (disk)
                                          10000  rows processed
                                    
                                    SQL>    with r(
                                      2         str,
                                      3         new_str,
                                      4         l
                                      5        ) as (
                                      6               select  str,
                                      7                       cast(substr(str,1,1) as varchar2(4000)) new_str,
                                      8                       1 l
                                      9                 from  temp_values
                                     10              union all
                                     11               select  str,
                                     12                       case instr(new_str,substr(str,l + 1,1))
                                     13                         when 0 then new_str || substr(str,l + 1,1)
                                     14                         else new_str
                                     15                       end new_str,
                                     16                       l + 1 l
                                     17                 from  r
                                     18    where l < length(str)
                                     19             )
                                     20  select  new_str
                                     21    from  r
                                     22    where l = length(str)
                                     23  ;
                                    
                                    10000 rows selected.
                                    
                                    Elapsed: 00:00:02.07
                                    
                                    Statistics
                                    ----------------------------------------------------------
                                             10  recursive calls
                                         557572  db block gets
                                            109  consistent gets
                                              0  physical reads
                                             96  redo size
                                         260659  bytes sent via SQL*Net to client
                                           7746  bytes received via SQL*Net from client
                                            668  SQL*Net roundtrips to/from client
                                             31  sorts (memory)
                                              0  sorts (disk)
                                          10000  rows processed