Skip to Main Content

SQL & PL/SQL

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Regular Expression / Mask

781721Dec 16 2010 — edited Dec 17 2010
Hi there

As input I have

A data string:
"1AZJ56U7"
And a mask string: (It does not really matter how this looks -> This is just an example of what I'd like to have as a mask string)
"-..-.-.."
The output should be a masked string:
"-AZ-5-U7"
What would be the best way to do this?

Thanks in advance,
Peter
This post has been answered by fsitja on Dec 16 2010
Jump to Answer

Comments

fsitja
Answer
You can use something like this:
select str, regexp_replace(str, '.(..).(.).(..)', '-\1-\2-\3') masked_str
  from t;
Sample execution:
SQL> with t as (
  2  select '1AZJ56U7' str from dual
  3  ) --
  4  select str, regexp_replace(str, '.(..).(.).(..)', '-\1-\2-\3') masked_str
  5    from t;
 
STR      MASKED_STR
-------- --------------------------------------------------------------------------------
1AZJ56U7 -AZ-5-U7
Marked as Answer by 781721 · Sep 27 2020
Rob van Wijk
user3764335 wrote:
What would be the best way to do this?
Peter,

The best way depends on your own requirements. You can code it as a PL/SQL function or you can do it in SQL.

If you are on 10g or higher, then you can use the SQL model clause to calculate this. Here is an example:
SQL> with t as
  2  ( select '1AZJ56U7' data_string
  3         , '-..-.-..' mask_string
  4      from dual
  5  )
  6  , constants as
  7  ( select '-' mask_character
  8         , '.' non_mask_character
  9      from dual
 10  )
 11  select data_string
 12       , mask_string
 13       , new_string
 14    from t
 15       , constants
 16   where length(data_string) = length(mask_string)
 17   model
 18         partition by (mask_character,non_mask_character)
 19         dimension by (0 i)
 20         measures (data_string,mask_string,cast(null as varchar2(1000)) new_string)
 21         rules iterate (1000) until iteration_number = length(data_string[0]) - 1
 22         ( new_string[0] = new_string[0] ||
 23           case substr(mask_string[0], iteration_number + 1, 1)
 24           when cv(mask_character) then
 25             cv(mask_character)
 26           when cv(non_mask_character) then
 27             substr(data_string[0], iteration_number + 1, 1)
 28           end
 29         )
 30  /

DATA_STRING          MASK_STRING          NEW_STRING
-------------------- -------------------- --------------------
1AZJ56U7             -..-.-..             -AZ-5-U7

1 row selected.
Regards,
Rob.

Edited by: Rob van Wijk on 16-dec-2010 14:59

If your "mask_string" is fixed, then fsitja's answer is of course much simpler, but if it is not then my answer still stands.
Frank Kulash
Hi,

Assuming you know some substring (not necessarily a single character: I used '~?~' below) that will never be mistaken for your text, you can do the job in pure SQL like this:
with t as
  ( select '1AZJ56U7' data_string
         , '-..-.-..' mask_string
      from dual
   )
SELECT	REPLACE ( SYS_CONNECT_BY_PATH ( CASE
					    WHEN  SUBSTR ( mask_string
						     	 , LEVEL
							 , 1
							 ) = '.'
					    THEN  SUBSTR ( data_string
						     	 , LEVEL
							 , 1
							 )
					    ELSE  '-'
			       	 	END
			       	      , '~?~'
				      )
		, '~?~'
		)	AS masked
FROM    t
WHERE	CONNECT_BY_ISLEAF	= 1
CONNECT BY	LEVEL <= LENGTH (data_string)
;
However, a user-defined function would be much more convenient and versatile.
Rob van Wijk
Hi Frank,

I was about to write that your query doesn't work for two or more rows, when I discovered that mine also doesn't work in that case.
Here is my modified version that does work with more rows:
SQL> with t as
  2  ( select '1AZJ56U7' data_string, '-..-.-..' mask_string from dual union all
  3    select '1AZJ56U8', '-.-.-.-.' from dual
  4  )
  5  , constants as
  6  ( select '-' mask_character
  7         , '.' non_mask_character
  8      from dual
  9  )
 10  select data_string
 11       , mask_string
 12       , new_string
 13    from t
 14       , constants
 15   where length(data_string) = length(mask_string)
 16   model
 17         partition by (mask_character,non_mask_character,data_string)
 18         dimension by (0 i)
 19         measures (mask_string,cast(null as varchar2(1000)) new_string)
 20         rules iterate (1000) until iteration_number = length(mask_string[0]) - 1
 21         ( new_string[0] = new_string[0] ||
 22           case substr(mask_string[0], iteration_number + 1, 1)
 23           when cv(mask_character) then
 24             cv(mask_character)
 25           when cv(non_mask_character) then
 26             substr(cv(data_string), iteration_number + 1, 1)
 27           end
 28         )
 29  /

DATA_STRING          MASK_STRING          NEW_STRING
-------------------- -------------------- --------------------
1AZJ56U7             -..-.-..             -AZ-5-U7
1AZJ56U8             -.-.-.-.             -A-J-6-8

2 rows selected.
And here is your current version:
SQL> with t as
  2  ( select '1AZJ56U7' data_string, '-..-.-..' mask_string from dual union all
  3    select '1AZJ56U8', '-.-.-.-.' from dual
  4  )
  5  SELECT  REPLACE ( SYS_CONNECT_BY_PATH ( CASE
  6                                              WHEN  SUBSTR ( mask_string
  7                                                           , LEVEL
  8                                                           , 1
  9                                                           ) = '.'
 10                                              THEN  SUBSTR ( data_string
 11                                                           , LEVEL
 12                                                           , 1
 13                                                           )
 14                                              ELSE  '-'
 15                                          END
 16                                        , '~?~'
 17                                        )
 18                  , '~?~'
 19                  )       AS masked
 20  FROM    t
 21  WHERE   CONNECT_BY_ISLEAF       = 1
 22  CONNECT BY      LEVEL <= LENGTH (data_string)
 23  /

MASKED
----------------------------------------------------------------------------------------------------
-AZ-5-U7
-AZ-5-U8
-AZ-5--7
-AZ-5--8
[removed lots of rows here]
-A-J-6-7
-A-J-6-8

256 rows selected.
Regards,
Rob.
Frank Kulash
Hi,
Rob van Wijk wrote:
... your query doesn't work for two or more rows...
That's right. The same basic idea can be made to work with any number of rows, but it's (even) more tedious.
MichaelS
Or with 11g's recursive WITH clause:
SQL> with t as
(
 select '1AZJ56U7' data_string, '-..-.-..' mask_string from dual union all  
 select '1AZJ56U8', '-.-.-.-.' from dual
),
t2 (p,p1,p2,i,c) as
(
  select data_string, data_string, mask_string, 1, 1 from t union all
  select p, regexp_replace(p1, '.', '-',1,i), p2, instr(p2,'-',1,c), c + 1 from t2 where i > 0 and  c <= length(p2)
)

select p data_string, p2 mask_string, p1 new_string from t2 where i=0
/
DATA_STRING  MASK_STRING  NEW_STRING
------------ ------------ ----------
1AZJ56U7     -..-.-..     -AZ-5-U7  
1AZJ56U8     -.-.-.-.     -A-J-6-8  

2 rows selected.
Hoek
Hi Rob,
query doesn't work for two or more rows
wow, a 'model that fails', if you'd aliased here as Doutzen then this would never have happened..oh, never mind, I'm too late as well I now see ;)

So: we're in fact talking about table unnesting here, is that the correct 'SQL/PL-SQL'-terminology from your point of view?
Especially since the model-clause is/seems so powerful (but hard to get familiar with), I would like to know if you have any 'donkey-bridges' ;)


Regards,

Martijn
Rob van Wijk
hoek wrote:
wow, a 'model that fails', if you'd aliased here as Doutzen then this would never have happened..oh, never mind, I'm too late as well I now see ;)

So: we're in fact talking about table unnesting here, is that the correct 'SQL/PL-SQL'-terminology from your point of view?
Especially since the model-clause is/seems so powerful (but hard to get familiar with), I would like to know if you have any 'donkey-bridges' ;)
Hi Martijn,

I understood the term 'donkey-bridges', and after I while I understood what the Doutzen reference was about, but unfortunately I was not able to understand your question. Can you please try to rephrase it? Or email it to me in Dutch?

My model clause query failed initially because I forgot to adhere to the rule that the partition and dimension columns should be unique. This is checked at runtime and not at compile time. And since 1 row is always unique, the error did not pop out the first time.

Regards,
Rob.
635471
user3764335 wrote:
What would be the best way to do this?
Looking at this at a high-level, I think that with the complexity of doing this in a generalised way in SQL (ie. one that would take account of an arbitrary number of masks) I would look at encapsulating this logic in a PL/SQL function. Some of these techniques are very clever stuff but if you saw them in code would you think, "Ah, a method for masking some elements of a string literal!"? Probably not.

You have to think about maintainability as well as performance ... and I doubt whether some of these techniques are even very performant. sometime I think that the doctrine of "do as much as possible in SQL" can be taken too far.
Aketi Jyuuzou
I like recursive with clause B-)
with t(str,mask) as(
select '1AZJ56U7','-..-.-..' from dual union all
select '1AZJ56U8','-.-.-.-.' from dual),
rec(str,mask,LV,extStr) as(
select str,mask,0,cast(null as varchar2(20)) from t
union all
select str,mask,LV+1,
extStr || case when substr(mask,LV+1,1) = '.'
               then substr(str ,LV+1,1)
               else substr(mask,LV+1,1) end
from rec
where LV+1 <= length(mask))
select * from rec
 where LV = length(mask);

STR       mask      LV  extStr
--------  --------  --  --------
1AZJ56U7  -..-.-..   8  -AZ-5-U7
1AZJ56U8  -.-.-.-.   8  -A-J-6-8
635471
Aketi Jyuuzou wrote:
I like recursive with clause B-)
Well, maybe you should marry it then. I wouldn't blame anyone for not marrying something so ugly, though.

joke!


This looks nice ...
create or replace
function
  character_mask(
    input_string   varchar2,
    masking_string varchar2,
    nomask_char    varchar2 default '.')
return varchar2
  deterministic
  parallel_enable
is
  output_string varchar2(30);
begin
  for x in 1 .. length(input_string)
  loop
    output_string := output_string||
                            case
                              when substr(masking_string,x,1) = nomask_char
                              then substr(input_string,x,1)
                              else substr(masking_string,x,1)
                             end;
  end loop;
  return output_string;
end;
Add in some error checking (lengths of input strings, null values), remove whatever bugs I've put in there, and ...
select character_mask('1AZJ56U7','-..-.-..','.')
from   dual;
You could even try native compilation if performance is a huge concern, but i bet this would give SQL-based methods a run for their money.

I think I could see myself spending the rest of my life with a tasty bit of code like that.
Rob van Wijk
David_Aldridge wrote:
This looks nice ...
create or replace
function
character_mask(
input_string   varchar2,
masking_string varchar2,
nomask_char    varchar2 default '.')
return varchar2
deterministic
parallel_enable
is
output_string varchar2(30);
begin
for x in 1 .. length(input_string)
loop
output_string := output_string||
case
when substr(masking_string,x,1) = nomask_char
then substr(input_string,x,1)
else substr(masking_string,x,1)
end;
end loop;
return output_string;
end;
Hey, that almost looks exactly like my model clause query ... :-)

But you are right about performance (I just tested it): we shouldn't be so worried about context switches in this case. A PL/SQL function does the job faster here.

Regards,
Rob.
Hoek
Hi Rob,
Can you please try to rephrase it? Or email it to me in Dutch?
Already got it, just read Aketi's, Davids and your follow-ups after a 4 hour (normally 45 minutes) drive through a bizarre snow traffic chaos
It appears I was mixing up two totally different things.
I'll need to put some more time in understanding the MODEL-clause ;)
Thanks!


Regards,
Martijn
635471
Rob van Wijk wrote:
... we shouldn't be so worried about context switches in this case. A PL/SQL function does the job faster here.
It's sometimes forgotten that development is not all about performance. If it were then we'd never have gone beyond assembly code.

I try to use the principle that if you have to scratch your head too hard trying to work out what something does then there is probably a better way of doing it. This forum sometimes looks like an Obfuscated SQL competition -- I don't want to support obfuscated code.
1 - 14
Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on Jan 14 2011
Added on Dec 16 2010
14 comments
9,118 views