Discussions
Categories
- 197K All Categories
- 2.5K Data
- 546 Big Data Appliance
- 1.9K Data Science
- 450.8K Databases
- 221.9K General Database Discussions
- 3.8K Java and JavaScript in the Database
- 31 Multilingual Engine
- 552 MySQL Community Space
- 479 NoSQL Database
- 7.9K Oracle Database Express Edition (XE)
- 3.1K ORDS, SODA & JSON in the Database
- 556 SQLcl
- 4K SQL Developer Data Modeler
- 187.2K SQL & PL/SQL
- 21.4K SQL Developer
- 296.4K Development
- 17 Developer Projects
- 139 Programming Languages
- 293K Development Tools
- 110 DevOps
- 3.1K QA/Testing
- 646.1K Java
- 28 Java Learning Subscription
- 37K Database Connectivity
- 158 Java Community Process
- 105 Java 25
- 22.1K Java APIs
- 138.2K Java Development Tools
- 165.3K Java EE (Java Enterprise Edition)
- 19 Java Essentials
- 162 Java 8 Questions
- 86K Java Programming
- 81 Java Puzzle Ball
- 65.1K New To Java
- 1.7K Training / Learning / Certification
- 13.8K Java HotSpot Virtual Machine
- 94.3K Java SE
- 13.8K Java Security
- 205 Java User Groups
- 24 JavaScript - Nashorn
- Programs
- 468 LiveLabs
- 39 Workshops
- 10.2K Software
- 6.7K Berkeley DB Family
- 3.5K JHeadstart
- 5.7K Other Languages
- 2.3K Chinese
- 175 Deutsche Oracle Community
- 1.1K Español
- 1.9K Japanese
- 233 Portuguese
RETURN Nth word and Nth character from a string

I have a query that can get the Nth word, second word (separated by spaces) 'Two'. How can I get the first character of the Nth word in my example I want the 'T' from the word 'Two' also to be returned
Select REGEXP_SUBSTR('one Two three four','\w+',1,2) wordN from dual Expected output: Two T
Best Answer
-
Hi, @BeefStu
Select REGEXP_SUBSTR('one Two three four','\w+',1,2) wordN from dual
You're close! You said you want the words separated by spaces, but \w' will get words separated by other chracters (like commas) as well. '\S' (with a capital S) means any non-space character. To get the x-th character, use SUBSTR.
If table_x contains a string column called str, you can get the n-th word and the x-th character of that word like this:
WITH got_wordn AS ( SELECT REGEXP_SUBSTR ( str , '\S+' , 1 , &word_num ) AS wordn FROM table_x ) SELECT wordn , SUBSTR (wordn, &char_num, 1) AS charx FROM got_wordn ;
Of course, you don't need to use substitution variables like &word_num and &char_num. You can use any kind of expression in those places.
Answers
-
Hi, @BeefStu
Select REGEXP_SUBSTR('one Two three four','\w+',1,2) wordN from dual
You're close! You said you want the words separated by spaces, but \w' will get words separated by other chracters (like commas) as well. '\S' (with a capital S) means any non-space character. To get the x-th character, use SUBSTR.
If table_x contains a string column called str, you can get the n-th word and the x-th character of that word like this:
WITH got_wordn AS ( SELECT REGEXP_SUBSTR ( str , '\S+' , 1 , &word_num ) AS wordn FROM table_x ) SELECT wordn , SUBSTR (wordn, &char_num, 1) AS charx FROM got_wordn ;
Of course, you don't need to use substitution variables like &word_num and &char_num. You can use any kind of expression in those places.
-
@Frank Kulash perfect thanks. To satisfy my curiosity could my original test CASE be expanded to include the substr too? For example, find the Nth word and from that word find the Nth character in one statement? I just couldn't get that too work so I posted the question. Thanks for your help and expertise its greatly appreciated
-
Hi, @BeefStu
So, you want a single regular expression that returns a given character from a given word, is that right?
Here's one way:
SELECT REGEXP_SUBSTR ( str , '(\S+\s+){' || TO_CHAR (&word_num - 1) || '}.{' || TO_CHAR (&char_num - 1) || '}(.)' , 1 , 1 , NULL , 2 ) AS charx FROM table_x ;
If &word_num = 1 and &char_num = 2, then the second argument to REGEXP_SUBSTR is
'(\S+\s+){0}.{1}(.)'
-
The solution proposed by Mr. Kulash for this last request is in the correct general direction, but it has one significant mistake. After reading past the first n - 1 words (defined as non-empty substrings separated by whitespace), it starts counting characters regardless of whether they are "word" characters or whitespace. This causes incorrect results when the n'th word is shorter than m characters - the query will return the m'th character counting from the first character of the n'th word, regardless of whether that m'th character is still in the same word. It may be a whitespace character, or a word character from a different word further down (to the right) in the input string.
This is easy to fix. Instead of "dot" use \S in all places. Or, if you really meant \w for "word characters" (meaning that words may also be delimited by dash, comma, etc. - not just by whitespace), you can rewrite the whole thing using \w and \W instead of \S and \s. Something like this (using bind variables for n and m, instead of substitution variables):
select str, regexp_substr(str, '\w+', 1, :n) as wordN, regexp_substr(str, '^\W*(\w+\W+){' || to_char(:n - 1) || '}\w{' || to_char(:m - 1) || '}(\w)' , 1, 1, null, 2) as wordN_charM from (select 'one Two three four' as str from dual);
Note also that I anchored the regexp at the beginning of the input string (so I had to allow for 0 or more non-word characters at the beginning, too). If n is larger than the number of words in the string, the regexp matching starting from the beginning of the string will fail. If we don't anchor at the beginning of the string (with the ^ anchor), the regexp engine will try again, attempting to match from the second character, then from the third, ... - wasting a huge amount of time for no reason. The regexp engine doesn't use logic (like we do), to figure out that the additional attempts will also automatically fail - it will just blindly try them all.
For testing, try all combinations where m may be 1, or 2, or the word count, or GREATER THAN the word count, and n may be 1, or 2, or the length of the n'th word, or GREATER THAN the length of the n'th word. Make sure you get the correct answer in all cases (including NULL when n is too great or m is too great).