Forum Stats

  • 3,734,034 Users
  • 2,246,862 Discussions
  • 7,857,003 Comments

Discussions

Regexp_substr unexpected behaviour

Nicolette
Nicolette Member Posts: 526 Silver Badge
edited November 2007 in SQL & PL/SQL
While trying to come up with a constraint to check for correct RGB color codes
(RvalueGvalueBvalue) where the value is between 0 and 255
I saw the following:

database used: Oracle Database 10g Express Edition Release 10.2.0.1.0
CREATE TABLE test
(color varchar2(12)
,status varchar2(1)
)

insert into test
values ('R2G2B2','G')

insert into test
values ('R22G22B22','G')

insert into test
values ('R222G222B222','G')

insert into test
values ('R300G2B2','F')

insert into test
values ('R300G256B2','F')

insert into test
values ('R300G256B256','F')

select t.status
, t.color
, REGEXP_SUBSTR(t.color,'^R(\d{1,2}|1\d{2}|2[0-4]\d{1}|25[0-5])') red
, REGEXP_SUBSTR(t.color,'G(\d{1,2}|1\d{2}|2[0-4]\d{1}|25[0-5])') green
, REGEXP_SUBSTR(t.color,'B(\d{1,2}|1\d{2}|2[0-4]\d{1}|25[0-5])$') blue
, REGEXP_SUBSTR(t.color,'^R(\d{1,2}|1\d{2}|2[0-4]\d{1}|25[0-5])G(\d{1,2}|1\d{2}|2[0-4]\d{1}|25[0-5])B(\d{1,2}|1\d{2}|2[0-4]\d{1}|25[0-5])$','1','1','i') total
from test t
order by red
output
status   color        red    green    blue  total
G R2G2B2 R2 G2 B2 R2G2B2
G R22G22B22 R22 G22 B22 R22G22B22
G R222G222B222 R22 G22 B222 R222G222B222
F R300G2B2 R30 G2 B2
F R300G300B2 R30 G30 B2
F R300G300B300 R30 G30
Expected output
status   color        red    green    blue  total
G R2G2B2 R2 G2 B2 R2G2B2
G R22G22B22 R22 G22 B22 R22G22B22
G R222G222B222 R222 G222 B222 R222G222B222
F R300G2B2 G2 B2
F R300G300B2 B2
F R300G300B300
Both the total and blue colums have the output as expected
but the red and the green give a output with a maximum of 2 digits instead of 3
even when the string should not match (300 > 255).

I have read the information about regular expressions in the Application Developer's Guide - Fundamentals and SQL Reference but neither explain above behaviour

Can someone explain why this is happening?
Or give a hint of how to correct the select statement in such a way that results are like the expected output?

Comments

  • Rob van Wijk
    Rob van Wijk Member Posts: 5,856 Bronze Trophy
    edited November 2007
    Why go through all this trouble, where you have a design flaw?

    The color should be represented by three numeric columns instead of 1. Add 3 simple check constraints like "check (red between 0 and 255)" and you are done!

    Regards,
    Rob.
  • Nicolette
    Nicolette Member Posts: 526 Silver Badge
    The constraint of the complete RGB is not the problem as is shown in the total column.

    The question remains why
    does the regular expression for the red return a maximum of 2 digits instead of 3
    and the expression for the blue returns a maximum of 3 like I expected.

    The digit part of of the expression is the same thanks to copy paste
    just the letters and the anchoring are different.

    Then why the difference in the maximum number of digits.

    This is for my understanding of regular expression.
    The color example is just where I found something I didn't expected and I want to understand how this comes.

    Nicolette
  • 576633
    576633 Member Posts: 235
    It has to do with the ordering of your alternations, I think.. basically, it reads left -> right , finds the shorter pattern and we're done, that's it :-)

    Try (untested) : REGEXP_SUBSTR (color, 'R(25[0-5]|2[0-4]\d{1}|1\d{2}|\d{1,2})') red,

    or something like that.

    a
  • Aketi Jyuuzou
    Aketi Jyuuzou Member Posts: 1,072 Bronze Badge
    This is interesting question.
    col red   for a10
    col green for a10
    col blue for a10
    col total for a20
    set null null
    select status,color,
    rtrim(red,'G') as red,
    rtrim(green,'B') as green,
    blue,
    case when greatest(red,green,blue) is not null
    then rtrim(red,'G') || rtrim(green,'B') || blue end as total
    from (select status,color,
    RegExp_Substr(color,'R([0-9]{1,2}|[0-1][0-9]{2}|2[0-4][0-9]|25[0-5])G') as red,
    RegExp_Substr(color,'G([0-9]{1,2}|[0-1][0-9]{2}|2[0-4][0-9]|25[0-5])B') as green,
    RegExp_Substr(color,'B([0-9]{1,2}|[0-1][0-9]{2}|2[0-4][0-9]|25[0-5])$') as blue
    from test);
    S  COLOR         RED   GREEN  BLUE  TOTAL
    - ------------ ---- ----- ---- ------------
    G R2G2B2 R2 G2 B2 R2G2B2
    G R22G22B22 R22 G22 B22 R22G22B22
    G R222G222B222 R222 G222 B222 R222G222B222
    F R300G2B2 null G2 B2 null
    F R300G256B2 null null B2 null
    F R300G256B256 null null null null
    We can use below solution,too.
    However, I think that we should use above solution.
    In other words , "total" should be derived using "red" and "green" and "blue".
    select status,color,
    rtrim(red,'G') as red,
    rtrim(green,'B') as green,
    blue,total
    from (select status,color,
    RegExp_Substr(color,'R([0-9]{1,2}|[0-1][0-9]{2}|2[0-4][0-9]|25[0-5])G') as red,
    RegExp_Substr(color,'G([0-9]{1,2}|[0-1][0-9]{2}|2[0-4][0-9]|25[0-5])B') as green,
    RegExp_Substr(color,'B([0-9]{1,2}|[0-1][0-9]{2}|2[0-4][0-9]|25[0-5])$') as blue,
    RegExp_Substr(color,'R([0-9]{1,2}|[0-1][0-9]{2}|2[0-4][0-9]|25[0-5])'
    || 'G([0-9]{1,2}|[0-1][0-9]{2}|2[0-4][0-9]|25[0-5])'
    || 'B([0-9]{1,2}|[0-1][0-9]{2}|2[0-4][0-9]|25[0-5])$') as total
    from test);
  • Sven W.
    Sven W. Member Posts: 10,511 Gold Crown
    edited November 2007
    REGEXP_SUBSTR(t.color,'B(\d{1,2}|1\d{2}|2[0-4]\d{1}|25[0-5])$') blue
    This expression is returning everything starting with B up until the end of the line ($).
    If you add 20 'x' to your string it will return them also. Of cause the first chars after the "B" must match the REGEXP.

    R222G222B222xxxxxxxxxxxxxxxxxxxx => B222xxxxxxxxxxxxxxxxxxxx

    I didn't test it, but this is as I understand the REGEXP.
    The expressions for Red and Green do not return this, because they do not include any variable part.
  • Nicolette
    Nicolette Member Posts: 526 Silver Badge
    First thanks everybody for their response.

    I was wrong in assuming that R300G0B0 with above mentioned regular expression for red should return null instate of R30 because 30 is on the range 0-255 I didn't say that the expression should take all the digits into account.
    Akeeti Jyuuzou's solutions is doing just that with anchoring the red to the green and the green to the blue and the blue to the end of the line. In the blow example I have anchored the red to any non digit character.
    SQL> select t.kleur color
    2 , REGEXP_SUBSTR(t.kleur,'^R(25[0-5]|2[0-4]\d{1}|1\d{2}|\d{1,2})\D') red
    3 , REGEXP_SUBSTR(t.kleur,'G(\d{1,2}|1\d{2}|2[0-4]\d{1}|25[0-5])') green
    4 , REGEXP_SUBSTR(t.kleur,'B(\d{1,2}|1\d{2}|2[0-4]\d{1}|25[0-5])$') blue
    5 , REGEXP_SUBSTR(t.kleur,'^R(\d{1,2}|1\d{2}|2[0-4]\d{1}|25[0-5])G(\d{1,2}|1\d{2}|2[0-4]\d{1}|25[0-5])B(\d{1,2}|1\d{2}|2[0-4]\d{1}|25[0-5])$','1','1','i') total
    6 from test t
    7 /

    COLOR RED GREEN BLUE TOTAL
    --------------------------------------------------------------------------------
    R300G0B300 G0
    R0G0B20X R0G G0

    SQL
    Slow moe is indeed right that the alterations are read form left to right and is stopped by the first correct pattern as is shown below.

    In below example I have turned around the alterations for the red but left the green to see the difference
    SQL> select t.kleur color
    2 , REGEXP_SUBSTR(t.kleur,'^R(25[012345]|2[01234]\d{1}|1\d{2}|\d{1,2})') red
    3 , REGEXP_SUBSTR(t.kleur,'G(\d{1,2}|1\d{2}|2[01234]\d{1}|25[012345])') green
    4 , REGEXP_SUBSTR(t.kleur,'B(\d{1,2}|1\d{2}|2[01234]\d{1}|25[012345])$') blue
    5 , REGEXP_SUBSTR(t.kleur,'^R(\d{1,2}|1\d{2}|2[01234]\d{1}|25[012345])G(\d{1,2}|1\d{2}|2[01234]\d{1}|25[012345])B(\d{1,2}|1\d{2}|2[01234]\d{1}|25[012345])$','1','1','i') total
    6 from test t
    7 order by id
    8 /

    COLOR RED GREEN BLUE TOTAL
    -------------------------------------------------------------------------------------
    R222G222B222 R222 G22 B222 R222G222B222
This discussion has been closed.