Forum Stats

  • 3,760,128 Users
  • 2,251,647 Discussions
  • 7,870,992 Comments

Discussions

Equivalents To SQL Server's STUFF() Function

2

Answers

  • CarlosDLG
    CarlosDLG Member Posts: 1,361 Silver Trophy
    edited Jul 28, 2016 2:03PM
    Is anyone willing to contribute the regex solution?
    

    For your own learning, you should try to build the regexp version.  You can then post it here to get feedback.

    John StegemanRay Winkelman
  • BluShadow
    BluShadow Member, Moderator Posts: 41,383 Red Diamond
    edited Jul 28, 2016 2:19PM
    John Stegeman wrote:
    
    regexp will NOT be the one you want. Guaranteed.
    
    

    Maybe... maybe not.

    We know regexp functions are more CPU intensive in general, but the payoff between a single fairly simple regexp compared to several regular string functions may be ok... it depends on what you're doing.  Certainly if it's just for a small set of data, the chances are you won't notice any real difference.

  • Ray Winkelman
    Ray Winkelman Member Posts: 75 Green Ribbon
    edited Jul 28, 2016 2:22PM

    Provided I thought that the solution will be a long one, I would do that. 

    It's because I know that the solution will be a short one, I know that critiquing my efforts will almost certainly require a greater collective effort than someone who's better versed in the regex library posting a proficient solution.

    Trust me. I'm just a monkey with a keyboard.

    On another note, I trust John entirely (And my own intuitions) when he says:

    regexp will NOT be the one you want. Guaranteed.
    

    So it's time to mark this sucker as answered!

    I only asked for the regex example because it was entertained earlier in the thread, and if provided, we would have exhausted all (the discussed) possible solutions to my problem.

  • Etbin
    Etbin Member Posts: 8,968 Gold Crown
    edited Jul 28, 2016 2:29PM

    More fuction calls might provide more not null results.

    Replacing nvl(p_start,0) with case when p_start is null then 0 else p_start end and accordingly might be faster.


    with

    data as

    (select '123456789' p_expr,3 p_start,5 p_length,'***' p_stuff from dual union all

    select null,null,null,null from dual union all

    select null,null,null,'123456789' from dual union all

    select null,0,0,'123456789' from dual union all

    select null,0,3,'123456789' from dual union all

    select null,3,0,'123456789' from dual union all

    select '+~+~+~+~',0,0,'123456789' from dual union all

    select '+~+~+~+~',3,0,'123456789' from dual union all

    select '+~+~+~+~',0,3,'123456789' from dual union all

    select 'ABCDEFGH',3,11,'abcdefgh' from dual union all

    select 'ABCDEFGH',3,3,'abcdefgh' from dual union all

    select 'ABCDEFGH',3,3,null from dual union all

    select 'ABCDEFGHIJK',-5,3,'!!!' from dual union all

    select 'ABCDEFGHIJK',-5,-3,'!!!' from dual union all

    select 'ABCDEFGHIJK',5,-3,'!!!' from dual union all

    select '987654321',null,null,'123456789' from dual union all

    select '987654321',null,null,null from dual

    )

    select p_expr,p_start,p_length,p_stuff,

           case when p_start is not null

                 and p_length is not null

                 and p_start > 0

                 and p_start < length(p_expr)

                then substr(p_expr,1,p_start - 1)||p_stuff||substr(p_expr,p_start + p_length)

               else null

             end result,

             substr(p_expr,1,nvl(p_start,0) - 1)||p_stuff||substr(p_expr,nvl(p_start,0) + nvl(p_length,0)) variant

      from data

    P_EXPRP_STARTP_LENGTHP_STUFFRESULTVARIANT
    12345678935***12***8912***89
    ------
    ---123456789-123456789
    -00123456789-123456789
    -03123456789-123456789
    -30123456789-123456789
    +~+~+~+~00123456789-123456789+~+~+~+~
    +~+~+~+~30123456789+~123456789+~+~+~+~123456789+~+~+~
    +~+~+~+~03123456789-123456789+~+~+~
    ABCDEFGH311abcdefghABabcdefghABabcdefgh
    ABCDEFGH33abcdefghABabcdefghFGHABabcdefghFGH
    ABCDEFGH33-ABFGHABFGH
    ABCDEFGHIJK-53!!!-!!!JK
    ABCDEFGHIJK-5-3!!!-!!!DEFGHIJK
    ABCDEFGHIJK5-3!!!ABCD!!!BCDEFGHIJKABCD!!!BCDEFGHIJK
    987654321--123456789-123456789987654321
    987654321----987654321

    Regards

    Etbin

    Ray WinkelmanRay WinkelmanJonathan Hult
  • John Stegeman
    John Stegeman Member Posts: 24,269 Blue Diamond
    edited Jul 28, 2016 2:25PM

    Never trust, always verify

    At any rate, substr and concatenation are pretty simple operations, so I still believe it will be faster than regexp, Blu's point notwithstanding

    @BluShadow what about the overhead of having to parse/compile the regular expression itself? My "gut feel" is that just that part of the work will take more resource than the simple substr/||

    Ray WinkelmanAndrewSayer
  • Paulzip
    Paulzip Member Posts: 8,460 Blue Diamond
    edited Jul 28, 2016 2:28PM

    A regular expression solution will be slower, I can pretty much guarantee that. I forgot to add deterministic clause to my function which would allow result caching, so would be faster if the same inputs occurred.

    I'd be very surprised if you could create something faster, as any solution would still need to use substr.

    EtbinRay Winkelman
  • John Stegeman
    John Stegeman Member Posts: 24,269 Blue Diamond
    edited Jul 28, 2016 2:28PM

    and FYI: Google found the same function

  • Ray Winkelman
    Ray Winkelman Member Posts: 75 Green Ribbon
    edited Jul 28, 2016 2:45PM

    I love when an actually smart person contributes useful information in a place you would expect smart people to contribute useful information.

    You rock, Etbin.

  • BluShadow
    BluShadow Member, Moderator Posts: 41,383 Red Diamond
    edited Jul 28, 2016 3:30PM

    regexp is certainly likely to be slower in most cases, but for the purpose of illustrating (I don't have a database to hand to test at the minute)...

    It would be something like:

    regexp_replace(source_string, '^(.{4}).{3}(.*)$', '\1STUFFSTRING\2')


    So, in this example, the "4" is the number of characters to keep on the left

    The "3" is the number of characters to discard

    and the "STUFFSTRING" is the string to put in place of the 3 characters removed.

    Whilst regexp are generally considered CPU intensive, this pattern is relatively straightforward.

    The first 4 character are easy to parse,  The next 3 characters are easy to parse and the remainder of the string are easy to parse.

    The replacement is just including the first 4 character and the remainder of the string around the string you want to include in place of the 3 that are removed.

    The use of "^" to indicate the start of the string and the "$" to indicate the end of the string will help with parsing as it provides definite anchor points for the parsing to consider.

    Certainly, other regular expression patterns can be more complex and more CPU intensive, especially if they include "|" (or), and/or variable string length patterns etc.

    The best way to determine what's more performant between regexp and regular string functions would be to test it on a good volume of data.

    Ray WinkelmanRay Winkelman
  • BluShadow
    BluShadow Member, Moderator Posts: 41,383 Red Diamond
    edited Jul 28, 2016 3:35PM
    John Stegeman wrote:
    
     BluShadow what about the overhead of having to parse/compile the regular expression itself? My "gut feel" is that just that part of the work will take more resource than the simple substr/||
    

    Overhead of parsing, whether it's code or string parsing, always depends on the complexity of what you're parsing.  Some things are simple and parsed quickly, other things are complex and slow.

    In this 'challenge' the obtaining of a fixed set of characters from the left, followed by a fixed set of characters to discard, followed by the remaining characters is not complex.  Sure, SUBSTR can use relative byte positioning rather than having to do much in the way of parsing, but the difference I suspect will be negligible unless we're processing large volumes of data.

This discussion has been closed.