Discussions
Categories
- 197K All Categories
- 2.5K Data
- 546 Big Data Appliance
- 1.9K Data Science
- 450.8K Databases
- 221.9K General Database Discussions
- 3.8K Java and JavaScript in the Database
- 31 Multilingual Engine
- 552 MySQL Community Space
- 479 NoSQL Database
- 7.9K Oracle Database Express Edition (XE)
- 3.1K ORDS, SODA & JSON in the Database
- 555 SQLcl
- 4K SQL Developer Data Modeler
- 187.2K SQL & PL/SQL
- 21.4K SQL Developer
- 296.3K Development
- 17 Developer Projects
- 139 Programming Languages
- 293K Development Tools
- 110 DevOps
- 3.1K QA/Testing
- 646.1K Java
- 28 Java Learning Subscription
- 37K Database Connectivity
- 158 Java Community Process
- 105 Java 25
- 22.1K Java APIs
- 138.2K Java Development Tools
- 165.3K Java EE (Java Enterprise Edition)
- 19 Java Essentials
- 162 Java 8 Questions
- 86K Java Programming
- 81 Java Puzzle Ball
- 65.1K New To Java
- 1.7K Training / Learning / Certification
- 13.8K Java HotSpot Virtual Machine
- 94.3K Java SE
- 13.8K Java Security
- 205 Java User Groups
- 24 JavaScript - Nashorn
- Programs
- 468 LiveLabs
- 39 Workshops
- 10.2K Software
- 6.7K Berkeley DB Family
- 3.5K JHeadstart
- 5.7K Other Languages
- 2.3K Chinese
- 175 Deutsche Oracle Community
- 1.1K Español
- 1.9K Japanese
- 233 Portuguese
How this regex is working ?

Hi All,
Please refer below examples -
When i am doing
1) select regexp_substr('P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW','(^|/)([^/]*)+(\s) (\S*).*') from dual;
This is the correct O/p that i am getting - LATE TO LAND OR NO SHOW
Now, when i am doing same for the below text it's giving below output.
2) select regexp_substr('H/12/JHS/DSRP/UNKOWN/UNKOWN', '(^|/)([^/]*)+(\s)(\S*).*') from dual;
O/p - null; -- The output i want is this /DSRP/UNKOWN/UNKOWN
Why the expression is not working for the 2 example as it is working for the 1 example ?
Also, Can anyone also explain this expression - '(^|/)([^/]*)+(\s) (\S*).*)
Please advise !
Thanks
Best Answers
-
You don't need regex, if the text after the last '/' is required, you can substr and instr which is considerably faster:
with data(col) as ( select 'P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW' from dual union all select 'H/12/JHS/DSRP/UNKOWN/UNKOWN' from dual union all select 'H' from dual ) select substr(col, nullif(instr(col, '/', -1), 0) + 1) reason from data / REASON ---------------------------------------- LATE TO LAND OR NO SHOW UNKOWN {null}
-
As Paul says, your requirement is a simple case for SUBSTR/INSTR and will be more performant (especially if you have a lot of data to process).
It can still be done with regular expressions if you want...
SQL> ed Wrote file afiedt.buf 1 with 2 data(col) as ( 3 select 'P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW' from dual union all 4 select 'H/12/JHS/DSRP/UNKOWN/UNKOWN' from dual union all 5 select 'H' from dual 6 ) 7 select regexp_substr(col, '[^/]+$') as res 8 ,regexp_replace(col, '.*\/(.*)$|.*', '\1') as res2 9* from data SQL> / RES RES2 ------------------------------ ------------------------------ LATE TO LAND OR NO SHOW LATE TO LAND OR NO SHOW UNKOWN UNKOWN H
In this case the regexp_substr is getting any characters that are non-"/" at the end of the string (so that includes the string that has no "/" in it at all); and the regexp_replace is getting the non-"/" characters at the end of the string, but only if it is preceded by a "/", so the string with no "/" results is null.
Answers
-
Also, Can anyone also explain this expression
Here is a helpful website that can do this for you:
-
select regexp_substr('P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW','(^|/)([^/]*)+(\s) (\S*).*') from dual ^ | I Assume this space in the expression is wrong, otherwise you'll get nothing -----/ So, let's remove that... select regexp_substr('P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW','(^|/)([^/]*)+(\s)(\S*).*') from dual which gives a result of... /LATE TO LAND OR NO SHOW Expression says... Find a string that matches... (^|/) ... start of string or "/" followed by ([^/]*)+ ... any number of non-"/" characters, 1 or more times (the "+" in this case is kind of pointless), followed by (\s) ... a whitespace, followed by (\S*) ... any number of non-whitespace characters, followed by .* ... any characters
Essentially the expression is a mess.
It'll pick out whatever string comes first that has a whitespace character in it.
You then say...
select regexp_substr('H/12/JHS/DSRP/UNKOWN/UNKOWN', '(^|/)([^/]*)+(\s)(\S*).*') from dual;
O/p - null; -- The output i want is this /DSRP/UNKOWN/UNKOWN
But your expression is still saying to find a string that starts with a "/" and is followed by 0 or more non-'"/" characters, followed by a whitespace, followed by 0 or more non-whitespace characters, followed by 0 or more of any characters. As your string has no spaces in it, then no part of the string matches that.
If you want that output then it looks like your expression should be saying "look for the last 3 occurrences of a "/" followed by non-"/" characters.
SQL> select regexp_substr('H/12/JHS/DSRP/UNKOWN/UNKOWN', '((^|/)([^/]+)){3}$') from dual; REGEXP_SUBSTR('H/12 ------------------- /DSRP/UNKOWN/UNKOWN
Pattern here is "/" followed by 1 or more non-"/" characters, all repeated 3 times finishing at the end of the string ($)
-
You gave us two examples, both the input and the output you want from each input. You didn't explain the logic that must be followed to get the "desired output" from each input, so we can't say why that particular solution works fir the first example but not for the second.
As BluShadow explained already, among other things, the regexp you used will return NULL if the input string doesn't have at least a space somewhere it it. So, if that is not in your problem description (it can't be, based on your second example), then the solution you applied is clearly wrong, even ignoring everything else.
The regexp you used is wrong in several ways; where did you find it, and why are you looking to use it? Best to ignore it and start fresh. WHAT IS THE TASK HERE? That's the most important thing you need to tell us, and so far you didn't.
-
Hi @BluShadow
Thanks for your explanation !
Also i have a requirement that we need to extract the remark from the messages, message can differ
for example - P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW - we need to extract the remark part which is LATE TO LAND OR NO SHOW
2nd instance - H/12/JHS/DSRP/UNKOWN/UNKOWN - the query should extract remark - /DSRP/UNKOWN/UNKOWN Like this we will have different message with different remarks - So according to you what regular expression will be used ?
-
Hi, @AS08
i have a requirement that we need to extract the remark from the messages, message can differ
for example - P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW - we need to extract the remark part which is LATE TO LAND OR NO SHOW
2nd instance - H/12/JHS/DSRP/UNKOWN/UNKOWN - the query should extract remark - /DSRP/UNKOWN/UNKOWN
Which part of the message is the remark? For example, how do we know that the remark in the first example is not '/KWE/08/LATE TO LAND OR NO SHOW'? How do we know that the remark in the second example is not 'UNKNOWN'? Give the general requirements, for example "I need to get everything that comes after the last '/' (not including the '/' itself) if that part includes spaces, but if that part does not include spaces, then the I want everything after the third '/' (including the third '/' itself).
-
Seeing as you seem incapable of explaining the rules to what part of your metadata is a comment and why, how do you expect us to know the answer and come up with a regex solution?
The fatuitousness of some of the questions on this forum, never ceases to amaze me.
-
Client will send us the message
for example in this message after the last / is the reason - P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW - we need to extract - /LATE TO LAND OR NO SHOW
and if the client send's us other message ' H/12/JHS/DSRP/UNKOWN/UNKOWN' then we need to extract the reason which is at the last /UNKOWN
So my question is there any regular expression which can be used to extract the reason, when the format of the messsage is - P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW and other format is H/12/JHS/DSRP/UNKOWN/UNKOWN As per my understanding is there any regex which can extract the reason from the messages , whether the reason contains spaces[LATE TO LAND OR NO SHOW] or no spaces [UNKOWN]- but still will extract the reason part ?
-
You don't need regex, if the text after the last '/' is required, you can substr and instr which is considerably faster:
with data(col) as ( select 'P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW' from dual union all select 'H/12/JHS/DSRP/UNKOWN/UNKOWN' from dual union all select 'H' from dual ) select substr(col, nullif(instr(col, '/', -1), 0) + 1) reason from data / REASON ---------------------------------------- LATE TO LAND OR NO SHOW UNKOWN {null}
-
As Paul says, your requirement is a simple case for SUBSTR/INSTR and will be more performant (especially if you have a lot of data to process).
It can still be done with regular expressions if you want...
SQL> ed Wrote file afiedt.buf 1 with 2 data(col) as ( 3 select 'P/HOO/282/KWE/08/LATE TO LAND OR NO SHOW' from dual union all 4 select 'H/12/JHS/DSRP/UNKOWN/UNKOWN' from dual union all 5 select 'H' from dual 6 ) 7 select regexp_substr(col, '[^/]+$') as res 8 ,regexp_replace(col, '.*\/(.*)$|.*', '\1') as res2 9* from data SQL> / RES RES2 ------------------------------ ------------------------------ LATE TO LAND OR NO SHOW LATE TO LAND OR NO SHOW UNKOWN UNKOWN H
In this case the regexp_substr is getting any characters that are non-"/" at the end of the string (so that includes the string that has no "/" in it at all); and the regexp_replace is getting the non-"/" characters at the end of the string, but only if it is preceded by a "/", so the string with no "/" results is null.