Forum Stats

  • 3,768,548 Users
  • 2,252,809 Discussions
  • 7,874,617 Comments

Discussions

processing string with REGEXP_LIKE

Gor_Mahia
Gor_Mahia Member Posts: 1,108 Bronze Badge
edited Oct 16, 2021 8:17AM in SQL & PL/SQL

Hi,

i am forced to ask a question related to this api i tried a couple of formats but dont seem to work.

question: iam getting various strings but i want to return only if the patter is like

starting with "KEN" then 2 digits then 4 letters,

examples: "KEN07YURW" ; "KEN14HSRW" ; "KEN20TWRQ" etc

SELECT ...FROM SUPPLIERS WHERE REGEXP_LIKE (NAME, '^KEN[:digit:]{2}[A-Z]{4}$'); 

any suggestion is appreciated.

thanks.

Best Answer

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,208 Red Diamond
    edited Oct 15, 2021 9:16PM Accepted Answer

    Hi, @Gor_Mahia

    SELECT ...FROM SUPPLIERS WHERE REGEXP_LIKE (NAME, '^KEN[:digit:]{2}[A-Z]{4}$'); 

    You're very close! Classifiers like [:digit:] only work within another set of square brackets. Try

    WHERE REGEXP_LIKE (NAME, '^KEN[[:digit:]]{2}[A-Z]{4}$'); 
    

    I assume the 4 letters at the end of the pattern must be upper case, and that name must end immediately after those four letters.

    I hope this answers your question. If not, post a little sample data (CREATE TABLE and INSERT statements) and the exact results you want from that data. Point out where the expression above is not doing what you want, and explain, in different words, what you're looking for.

    Gor_Mahia

Answers

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,208 Red Diamond
    edited Oct 15, 2021 9:16PM Accepted Answer

    Hi, @Gor_Mahia

    SELECT ...FROM SUPPLIERS WHERE REGEXP_LIKE (NAME, '^KEN[:digit:]{2}[A-Z]{4}$'); 

    You're very close! Classifiers like [:digit:] only work within another set of square brackets. Try

    WHERE REGEXP_LIKE (NAME, '^KEN[[:digit:]]{2}[A-Z]{4}$'); 
    

    I assume the 4 letters at the end of the pattern must be upper case, and that name must end immediately after those four letters.

    I hope this answers your question. If not, post a little sample data (CREATE TABLE and INSERT statements) and the exact results you want from that data. Point out where the expression above is not doing what you want, and explain, in different words, what you're looking for.

    Gor_Mahia
  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,208 Red Diamond

    Hi,

    If performance is very important, then try not to use regular expressions. You can do what you requested using less powerful functions like this:

    WHERE  SUBSTR (name, 1, 3) = 'KEN'
    AND    TRANSLATE ( SUBSTR (name, 4)
        		 , '012345678ABCDEFGHIJKLMNOPQRSTUVWXY'
        	   	 , '999999999ZZZZZZZZZZZZZZZZZZZZZZZZZ
           	         ) = '999ZZZZ'
    

    If you'd care to post some sample data, then I could test it.

    Gor_Mahia
  • mathguy
    mathguy Member Posts: 10,155 Blue Diamond

    In your regular expression, please be aware that the meaning of [A-Z] depends on the session's language characteristics (specifically, the nls_sort parameter). For example, if nls_sort='French', then lower-case letters from a to y (but not z!) are "between A and Z", because in French the order is A, a, B, b, C, c, ... , Z, z, unlike English where it is A, B, C, ..., Z, a, b, c, ... , z. The meaning of "ranges" in a regular expression is determined by the current collating sequence, as encoded in nls_sort.

    For this reason, especially if you work in a "global" environment (or a "local" but not English-centric one), it is safer to use the [:upper:] character class (wrapped within another pair of brackets, like Mr. Kulash pointed out for the other character class in your query).

    Gor_Mahia