Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Equivalents To SQL Server's STUFF() Function

Ray WinkelmanJul 28 2016 — edited Jul 28 2016

Hey Guys!

Let's have some fun with strings today.

So, I've got some T-SQL code I'm translating into PL-SQL, and I've been encountering some uses of SQL Server's STUFF() function. Microsoft's documentation on the STUFF() function states:

The STUFF function inserts a string into another string. It deletes a specified length of characters in the first string at the start position and then inserts the second string into the first string at the start position.

The header of the STUFF() function accepts the following arguments:

STUFF ( character_expression , start , length , replaceWith_expression )


So, considering the following code as our example, how can we manipulate a string in PL-SQL with as few calls to system functions as possible? Let's aim for the most performant solution - as always.

select stuff(

(

    select '123456789'

    from   table_name for clob path('')

), 1, 4, ''

) as example

from dual;

And, GO!

This post has been answered by Paulzip on Jul 28 2016
Jump to Answer

Comments

John Stegeman

INSTR() and SUBSTR() would work just fine edit: DOH, what was I thinking?!?! Saw Frank's reply and realized that I typed without thinking. @"Frank Kulash", we also need ||

How it would work is left as an exercise for the reader.

Frank Kulash

Hi,

To get the same results in Oracle, SUBSTR is the only function you need.  In general, you'll have to call that function twice: once to get the part of the original string before start, and once to get the part after (start + length).

If you really, really wanted to minimize the number of function calls, you can do it with a single call to REGEXP_REPLACE, but it would require more || operations.  Since the || operrator does the same thing as the CONCAT function, I assume you'd want to minimize them, too.

Paulzip
Answer

create or replace function Stuff(pExpr varchar2,

                                 pStart integer,

                                 pLength integer,

                                 pReplace varchar2) return varchar2 is

begin

  return case

           when pStart is not null and pLength is not null and

                pStart > 0 and pStart < length(pExpr) then

             substr(pExpr, 1, pStart - 1)||pReplace||substr(pExpr, pStart + pLength)

           else null

         end; 

end;

Marked as Answer by Ray Winkelman · Sep 27 2020
BluShadow

Or it could be done using a single regexp_replace function, but again, if this isn't a question about an issue, I'll leave it up to those who want to have the exercise.

Frank Kulash

Hi,

John Stegeman wrote:

Frank Kulash, we also need ||

...

|| is an operator, not a function.

As OP phrased it, this sounds like a question to test how clever you are and how well you know Oracle SQL, regardless of any practical uses.

Of course, OP may have phrased the question poorly, and may actually want a simple and/or efficient way, regardless of what tools and techniques are used.

Ray Winkelman

Hey guys.

To clear up some ambiguity here: I'm looking for the least processing intensive drop-in replacement for the STUFF() method. Paulzip provided exactly what I was looking for, bravo!

Is anyone willing to contribute the regex solution? I'd like to compare the explain plan of both routines. Please wrap the regex solution in a function called STUFF() like Paulzip did.

This may seem like a trivial question to some of you veterans, but it's important to me that this function is optimal, as I'll be nesting calls to this in queries expected to return thousands of rows.

Here's Paulzip's plan, explained:

-----------------------------------------------------------------

| Id  | Operation        | Name | Rows  | Cost (%CPU)| Time     |

-----------------------------------------------------------------

|   0 | SELECT STATEMENT |      |     1 |     2   (0)| 00:00:01 |

|   1 |  FAST DUAL       |      |     1 |     2   (0)| 00:00:01 |

-----------------------------------------------------------------

John Stegeman

regexp will NOT be the one you want. Guaranteed.

Oh, and by the way - your execution plan has nothing to do with the function. I guess you did "select stuf(....) from dual" You'll get the same execution plan with a stuff() function written to use regexp.

Ray Winkelman

Yeah?

Although it wasn't a verbosely explained plan, I thought that the statistics were inclusive of the function call. Am I wrong?

Should I execute plan the statement instead?

Frank Kulash

Hi,

John Stegeman wrote:

... your execution plan has nothing to do with the function. I guess you did "select stuf(....) from dual" You'll get the same execution plan with a stuff() function written to use regexp.

Good point!

OP: the execution plan describes how rows are fetched.  In most cases, how the rows are fetched has much more to do with efficiency than what you do with the rows after you get them.  However, what you do with the rows after you get them can be significant.

Imagine you want to travel to another country, and have lunch after you get there.  If you're concerned about cost, you'll focus on how you get to the other country: a first-class plane ticket might cost a lot more than a second-class train ticket.  Where you have lunch and what you eat will make a difference, too, but probably not nearly as much.

John Stegeman

Explain plan is not going to be useful in evaluating the performance of a function (as long as the function isn't itself doing a bunch of SQL).

If you wanted to measure performance, I'd be doing something like using v$sesstat snapshots before and after executing the function a few (dozens, hundreds, thousands, depending on the function) times and seeing how much CPU is used and also measuring the elapsed times.

CarlosDLG

Is anyone willing to contribute the regex solution?

For your own learning, you should try to build the regexp version.  You can then post it here to get feedback.

BluShadow

John Stegeman wrote:

regexp will NOT be the one you want. Guaranteed.

Maybe... maybe not.

We know regexp functions are more CPU intensive in general, but the payoff between a single fairly simple regexp compared to several regular string functions may be ok... it depends on what you're doing.  Certainly if it's just for a small set of data, the chances are you won't notice any real difference.

Ray Winkelman

Provided I thought that the solution will be a long one, I would do that. 

It's because I know that the solution will be a short one, I know that critiquing my efforts will almost certainly require a greater collective effort than someone who's better versed in the regex library posting a proficient solution.

Trust me. I'm just a monkey with a keyboard.

On another note, I trust John entirely (And my own intuitions) when he says:

regexp will NOT be the one you want. Guaranteed.

So it's time to mark this sucker as answered!

I only asked for the regex example because it was entertained earlier in the thread, and if provided, we would have exhausted all (the discussed) possible solutions to my problem.

Etbin

More fuction calls might provide more not null results.

Replacing nvl(p_start,0) with case when p_start is null then 0 else p_start end and accordingly might be faster.


with

data as

(select '123456789' p_expr,3 p_start,5 p_length,'***' p_stuff from dual union all

select null,null,null,null from dual union all

select null,null,null,'123456789' from dual union all

select null,0,0,'123456789' from dual union all

select null,0,3,'123456789' from dual union all

select null,3,0,'123456789' from dual union all

select '+~+~+~+~',0,0,'123456789' from dual union all

select '+~+~+~+~',3,0,'123456789' from dual union all

select '+~+~+~+~',0,3,'123456789' from dual union all

select 'ABCDEFGH',3,11,'abcdefgh' from dual union all

select 'ABCDEFGH',3,3,'abcdefgh' from dual union all

select 'ABCDEFGH',3,3,null from dual union all

select 'ABCDEFGHIJK',-5,3,'!!!' from dual union all

select 'ABCDEFGHIJK',-5,-3,'!!!' from dual union all

select 'ABCDEFGHIJK',5,-3,'!!!' from dual union all

select '987654321',null,null,'123456789' from dual union all

select '987654321',null,null,null from dual

)

select p_expr,p_start,p_length,p_stuff,

       case when p_start is not null

             and p_length is not null

             and p_start > 0

             and p_start < length(p_expr)

            then substr(p_expr,1,p_start - 1)||p_stuff||substr(p_expr,p_start + p_length)

           else null

         end result,

         substr(p_expr,1,nvl(p_start,0) - 1)||p_stuff||substr(p_expr,nvl(p_start,0) + nvl(p_length,0)) variant

  from data

P_EXPRP_STARTP_LENGTHP_STUFFRESULTVARIANT
12345678935***12***8912***89
------
---123456789-123456789
-00123456789-123456789
-03123456789-123456789
-30123456789-123456789
+~+~+~+~00123456789-123456789+~+~+~+~
+~+~+~+~30123456789+~123456789+~+~+~+~123456789+~+~+~
+~+~+~+~03123456789-123456789+~+~+~
ABCDEFGH311abcdefghABabcdefghABabcdefgh
ABCDEFGH33abcdefghABabcdefghFGHABabcdefghFGH
ABCDEFGH33-ABFGHABFGH
ABCDEFGHIJK-53!!!-!!!JK
ABCDEFGHIJK-5-3!!!-!!!DEFGHIJK
2 people found this helpful
John Stegeman

Never trust, always verify

At any rate, substr and concatenation are pretty simple operations, so I still believe it will be faster than regexp, Blu's point notwithstanding

@"BluShadow" what about the overhead of having to parse/compile the regular expression itself? My "gut feel" is that just that part of the work will take more resource than the simple substr/||

Paulzip

A regular expression solution will be slower, I can pretty much guarantee that. I forgot to add deterministic clause to my function which would allow result caching, so would be faster if the same inputs occurred.

I'd be very surprised if you could create something faster, as any solution would still need to use substr.

John Stegeman

and FYI: Google found the same function

Ray Winkelman

I love when an actually smart person contributes useful information in a place you would expect smart people to contribute useful information.

You rock, Etbin.

BluShadow

regexp is certainly likely to be slower in most cases, but for the purpose of illustrating (I don't have a database to hand to test at the minute)...

It would be something like:

regexp_replace(source_string, '^(.{4}).{3}(.*)$', '\1STUFFSTRING\2')


So, in this example, the "4" is the number of characters to keep on the left

The "3" is the number of characters to discard

and the "STUFFSTRING" is the string to put in place of the 3 characters removed.

Whilst regexp are generally considered CPU intensive, this pattern is relatively straightforward.

The first 4 character are easy to parse,  The next 3 characters are easy to parse and the remainder of the string are easy to parse.

The replacement is just including the first 4 character and the remainder of the string around the string you want to include in place of the 3 that are removed.

The use of "^" to indicate the start of the string and the "$" to indicate the end of the string will help with parsing as it provides definite anchor points for the parsing to consider.

Certainly, other regular expression patterns can be more complex and more CPU intensive, especially if they include "|" (or), and/or variable string length patterns etc.

The best way to determine what's more performant between regexp and regular string functions would be to test it on a good volume of data.

BluShadow

John Stegeman wrote:

BluShadow what about the overhead of having to parse/compile the regular expression itself? My "gut feel" is that just that part of the work will take more resource than the simple substr/||

Overhead of parsing, whether it's code or string parsing, always depends on the complexity of what you're parsing.  Some things are simple and parsed quickly, other things are complex and slow.

In this 'challenge' the obtaining of a fixed set of characters from the left, followed by a fixed set of characters to discard, followed by the remaining characters is not complex.  Sure, SUBSTR can use relative byte positioning rather than having to do much in the way of parsing, but the difference I suspect will be negligible unless we're processing large volumes of data.

Paulzip

Etbin wrote:

More fuction calls might provide more not null results.

Replacing nvl(p_start,0) with case when p_start is null then 0 else p_start end and accordingly might be faster.

Regards

Etbin

Mine might be more nulls, but it matches the SQL Server notes on the operation and parameters of Stuff function!

Etbin

Until now I was only aware SQL Server exists and most probably such Stuff function parameters specifications (actually limitations) and notes on operation are just another reason to let the before mentioned awareness remain the same.

Might not be irrelevant to the OP though, but at least in principle, migrating from SQL Server to Oracle requiring Oracle to work according to SQL Server specifications sounds weird (at least to me).

A kind of modify-everything-without-changing-anything.

Regards

Etbin

1 - 22
Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on Aug 25 2016
Added on Jul 28 2016
22 comments
17,477 views