Forum Stats

  • 3,840,015 Users
  • 2,262,558 Discussions
  • 7,901,123 Comments

Discussions

How this regex is working ?

AS08
AS08 Member Posts: 19 Green Ribbon

Hi All,

Please refer below examples -

When i am doing

1) select regexp_substr('P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW','(^|/)([^/]*)+(\s) (\S*).*') from dual;

This is the correct O/p that i am getting - LATE TO LAND OR NO SHOW

Now, when i am doing same for the below text it's giving below output.

2) select regexp_substr('H/12/JHS/DSRP/UNKOWN/UNKOWN', '(^|/)([^/]*)+(\s)(\S*).*') from dual;

O/p - null; -- The output i want is this /DSRP/UNKOWN/UNKOWN


Why the expression is not working for the 2 example as it is working for the 1 example ?

Also, Can anyone also explain this expression - '(^|/)([^/]*)+(\s) (\S*).*)



Please advise !


Thanks

Best Answers

  • Paulzip
    Paulzip Member Posts: 8,723 Blue Diamond
    Answer ✓

    You don't need regex, if the text after the last '/' is required, you can substr and instr which is considerably faster:

    with 
      data(col) as (
        select 'P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW' from dual union all
        select 'H/12/JHS/DSRP/UNKOWN/UNKOWN' from dual union all
        select 'H' from dual
      )
    select substr(col, nullif(instr(col, '/', -1), 0) + 1) reason
    from data
    /
    
    
    REASON
    ----------------------------------------
    LATE TO LAND OR NO SHOW
    UNKOWN
    {null}
    
  • BluShadow
    BluShadow Member, Moderator Posts: 42,159 Red Diamond
    Answer ✓

    As Paul says, your requirement is a simple case for SUBSTR/INSTR and will be more performant (especially if you have a lot of data to process).

    It can still be done with regular expressions if you want...

    SQL> ed
    Wrote file afiedt.buf
    
      1  with
      2    data(col) as (
      3      select 'P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW' from dual union all
      4      select 'H/12/JHS/DSRP/UNKOWN/UNKOWN' from dual union all
      5      select 'H' from dual
      6    )
      7  select regexp_substr(col, '[^/]+$') as res
      8        ,regexp_replace(col, '.*\/(.*)$|.*', '\1') as res2
      9* from   data
    SQL> /
    
    RES                            RES2
    ------------------------------ ------------------------------
    LATE TO LAND OR NO SHOW        LATE TO LAND OR NO SHOW
    UNKOWN                         UNKOWN
    H
    

    In this case the regexp_substr is getting any characters that are non-"/" at the end of the string (so that includes the string that has no "/" in it at all); and the regexp_replace is getting the non-"/" characters at the end of the string, but only if it is preceded by a "/", so the string with no "/" results is null.

Answers

  • cormaco
    cormaco Member Posts: 1,961 Silver Crown

    Also, Can anyone also explain this expression

    Here is a helpful website that can do this for you:

    regex101: build, test, and debug regex

  • BluShadow
    BluShadow Member, Moderator Posts: 42,159 Red Diamond
    select regexp_substr('P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW','(^|/)([^/]*)+(\s) (\S*).*') from dual
                                                                                      ^
                                                                                      |
    I Assume this space in the expression is wrong, otherwise you'll get nothing -----/
    So, let's remove that...
    
    select regexp_substr('P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW','(^|/)([^/]*)+(\s)(\S*).*') from dual
    
    which gives a result of...
    
    /LATE TO LAND OR NO SHOW
    
    Expression says...
    
    Find a string that matches...
    (^|/)    ... start of string or "/" followed by
    ([^/]*)+ ... any number of non-"/" characters, 1 or more times (the "+" in this case is kind of pointless), followed by
    (\s)     ... a whitespace, followed by
    (\S*)    ... any number of non-whitespace characters, followed by
    .*       ... any characters
    
    
    


    Essentially the expression is a mess.

    It'll pick out whatever string comes first that has a whitespace character in it.


    You then say...

    select regexp_substr('H/12/JHS/DSRP/UNKOWN/UNKOWN', '(^|/)([^/]*)+(\s)(\S*).*') from dual;


    O/p - null; -- The output i want is this /DSRP/UNKOWN/UNKOWN


    But your expression is still saying to find a string that starts with a "/" and is followed by 0 or more non-'"/" characters, followed by a whitespace, followed by 0 or more non-whitespace characters, followed by 0 or more of any characters. As your string has no spaces in it, then no part of the string matches that.

    If you want that output then it looks like your expression should be saying "look for the last 3 occurrences of a "/" followed by non-"/" characters.

    SQL> select regexp_substr('H/12/JHS/DSRP/UNKOWN/UNKOWN', '((^|/)([^/]+)){3}$') from dual;
    
    REGEXP_SUBSTR('H/12
    -------------------
    /DSRP/UNKOWN/UNKOWN
    


    Pattern here is "/" followed by 1 or more non-"/" characters, all repeated 3 times finishing at the end of the string ($)

  • mathguy
    mathguy Member Posts: 10,604 Blue Diamond

    You gave us two examples, both the input and the output you want from each input. You didn't explain the logic that must be followed to get the "desired output" from each input, so we can't say why that particular solution works fir the first example but not for the second.

    As BluShadow explained already, among other things, the regexp you used will return NULL if the input string doesn't have at least a space somewhere it it. So, if that is not in your problem description (it can't be, based on your second example), then the solution you applied is clearly wrong, even ignoring everything else.

    The regexp you used is wrong in several ways; where did you find it, and why are you looking to use it? Best to ignore it and start fresh. WHAT IS THE TASK HERE? That's the most important thing you need to tell us, and so far you didn't.

  • AS08
    AS08 Member Posts: 19 Green Ribbon

    Hi @BluShadow

    Thanks for your explanation !

    Also i have a requirement that we need to extract the remark from the messages, message can differ

    for example - P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW - we need to extract the remark part which is LATE TO LAND OR NO SHOW

    2nd instance - H/12/JHS/DSRP/UNKOWN/UNKOWN - the query should extract remark - /DSRP/UNKOWN/UNKOWN 
    
    Like this we will have different message with different remarks - So according to you what regular expression will be used ?
    
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 42,265 Red Diamond

    Hi, @AS08

    i have a requirement that we need to extract the remark from the messages, message can differ

    for example - P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW - we need to extract the remark part which is LATE TO LAND OR NO SHOW

    2nd instance - H/12/JHS/DSRP/UNKOWN/UNKOWN - the query should extract remark - /DSRP/UNKOWN/UNKOWN

    Which part of the message is the remark? For example, how do we know that the remark in the first example is not '/KWE/08/LATE TO LAND OR NO SHOW'? How do we know that the remark in the second example is not 'UNKNOWN'? Give the general requirements, for example "I need to get everything that comes after the last '/' (not including the '/' itself) if that part includes spaces, but if that part does not include spaces, then the I want everything after the third '/' (including the third '/' itself).

  • Paulzip
    Paulzip Member Posts: 8,723 Blue Diamond
    edited May 24, 2022 6:48PM

    @AS08

    Seeing as you seem incapable of explaining the rules to what part of your metadata is a comment and why, how do you expect us to know the answer and come up with a regex solution?

    The fatuitousness of some of the questions on this forum, never ceases to amaze me.

  • AS08
    AS08 Member Posts: 19 Green Ribbon

    Hi @Frank Kulash

    Client will send us the message

    for example in this message after the last / is the reason - P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW - we need to extract - /LATE TO LAND OR NO SHOW

    and if the client send's us other message ' H/12/JHS/DSRP/UNKOWN/UNKOWN' then we need to extract the reason which is at the last  /UNKOWN
    


    So my question is there any regular expression which can be used to extract the reason, when the format of the messsage is -  P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW  and other format is H/12/JHS/DSRP/UNKOWN/UNKOWN
    As per my understanding is there any regex which can extract the reason from the messages , whether the reason contains spaces[LATE TO LAND OR NO SHOW] or no spaces [UNKOWN]- but still will extract the reason part ? 
    


  • Paulzip
    Paulzip Member Posts: 8,723 Blue Diamond
    Answer ✓

    You don't need regex, if the text after the last '/' is required, you can substr and instr which is considerably faster:

    with 
      data(col) as (
        select 'P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW' from dual union all
        select 'H/12/JHS/DSRP/UNKOWN/UNKOWN' from dual union all
        select 'H' from dual
      )
    select substr(col, nullif(instr(col, '/', -1), 0) + 1) reason
    from data
    /
    
    
    REASON
    ----------------------------------------
    LATE TO LAND OR NO SHOW
    UNKOWN
    {null}
    
  • BluShadow
    BluShadow Member, Moderator Posts: 42,159 Red Diamond
    Answer ✓

    As Paul says, your requirement is a simple case for SUBSTR/INSTR and will be more performant (especially if you have a lot of data to process).

    It can still be done with regular expressions if you want...

    SQL> ed
    Wrote file afiedt.buf
    
      1  with
      2    data(col) as (
      3      select 'P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW' from dual union all
      4      select 'H/12/JHS/DSRP/UNKOWN/UNKOWN' from dual union all
      5      select 'H' from dual
      6    )
      7  select regexp_substr(col, '[^/]+$') as res
      8        ,regexp_replace(col, '.*\/(.*)$|.*', '\1') as res2
      9* from   data
    SQL> /
    
    RES                            RES2
    ------------------------------ ------------------------------
    LATE TO LAND OR NO SHOW        LATE TO LAND OR NO SHOW
    UNKOWN                         UNKOWN
    H
    

    In this case the regexp_substr is getting any characters that are non-"/" at the end of the string (so that includes the string that has no "/" in it at all); and the regexp_replace is getting the non-"/" characters at the end of the string, but only if it is preceded by a "/", so the string with no "/" results is null.