5 Replies Latest reply: Aug 5, 2011 8:45 AM by marco RSS

    locate an occurrence of symbols combination

    marco
      Hi all,

      My need is to locate an occurrence of symbols starting from "s." (non-capital letter), following by word (with any capital letter at the beginning) and ending with ", " (comma and space symbols). It works fine, my code returns wanted symbols combination "s Ueklghje" :
      with t1 as 
      (select 'jeklghje, s.Glkgje, u.slgjwek, 904869' as tt from dual union all 
      select 'Ueklghje, s.Glkgje, u.slgjwek, 904869' as tt from dual union all 
      select 's Ueklghje, s.Glkgje, u.slgjwek, 904869' as tt from dual union all 
      select 's Uek-lghje, s.Glkgje, u.slgjwek, 904869' as tt from dual union all 
      select 'Uek88lghje, s.Glkgje, u.slgjwek, 904869' as tt from dual union all 
      select '397348, Ueklghje, s.Glkgje, u.slgjwek, 904869' as tt from dual union all 
      select 's.Tklgj, u.slgjwek, 23578, elslgjs' as tt from dual union all 
      select 's.klgj, u.ekgjes, 238573, dlsjkgj' as tt from dual)
      
      select 
      
      regexp_substr(tt,'^s [[:upper:]][[:alpha:]]*, ') 
      
      from t1 
      Now I need to find combination of symbols using the same rule as above with adding symbols like hyphen "-" and apostrophe "'".
      So in this test case combination "s Uek-lghje" should be found.

      I tried my best to google it before asking here, so I'll appreciate if you also tell me where to find detailed explanation on how regexp_substr does work.
        • 1. Re: locate an occurrence of symbols combination
          Solomon Yakobson
          with t1 as 
          (select 'jeklghje, s.Glkgje, u.slgjwek, 904869' as tt from dual union all 
          select 'Ueklghje, s.Glkgje, u.slgjwek, 904869' as tt from dual union all 
          select 's Ueklghje, s.Glkgje, u.slgjwek, 904869' as tt from dual union all 
          select 's Uek-lghje, s.Glkgje, u.slgjwek, 904869' as tt from dual union all 
          select 'Uek88lghje, s.Glkgje, u.slgjwek, 904869' as tt from dual union all 
          select '397348, Ueklghje, s.Glkgje, u.slgjwek, 904869' as tt from dual union all 
          select 's.Tklgj, u.slgjwek, 23578, elslgjs' as tt from dual union all 
          select 's.klgj, u.ekgjes, 238573, dlsjkgj' as tt from dual
          )
          select regexp_substr(tt,'^s [[:upper:]][-''[:alpha:]]*, ') 
          from t1
          /
          
          REGEXP_SUBSTR(TT,'^S[[:UPPER:]][-''[:ALPHA:]]*,
          ---------------------------------------------
          
          
          s Ueklghje,
          s Uek-lghje,
          
          
          
          
          
          8 rows selected.
          
          SQL> 
          SY.

          Edited by: Solomon Yakobson on Aug 5, 2011 8:50 AM
          • 2. Re: locate an occurrence of symbols combination
            Frank Kulash
            Hi,
            marco wrote:
            ... Now I need to find combination of symbols using the same rule as above with adding symbols like hyphen "-" and apostrophe "'".
            So in this test case combination "s Uek-lghje" should be found.
            So you want to allow a couple of additional characters, along with the alphabetic characters, between the upper-case letter and the last comma.
            Try this:
            SELECT  REGEXP_SUBSTR ( tt
                                  , '^s [[:upper:]][[:alpha:]''-]*, '
                                  )                              AS s_tt
            FROM    t1
            ;
            The last set of square brackets denotes the set of characters that can come between the upper-case letter and the last comma. Just change it from
            [[:alpha:]]
            (which means just the alphabetic characters) to
            [[:alpha:]''-]
            The apostrophe has to be doubled, because it's inside a string literal.
            The hyphen must come either at the very beginning of the set, or at the very end (where I put it); otherwise, it will be taken as indicating a range of characters.
            I tried my best to google it before asking here, so I'll appreciate if you also tell me where to find detailed explanation on how regexp_substr does work.
            Have you seen these?
            http://www.sqlsnippets.com/en/topic-10759.html
            http://www.dba-oracle.com/t_regular_expressions.htm

            Edited by: Frank Kulash on Aug 5, 2011 9:03 AM
            • 3. Re: locate an occurrence of symbols combination
              marco
              Thanks Solomon Yakobson and Frank Kulash,

              I examined links, but still need assistance.

              How do I add space character between the upper-case letter and the last comma?
              I tried
              regexp_substr(tt,'^с [[:upper:]][[:alpha:]''chr(32)-]*, ')
              but it seems doesn't work.

              Edited by: marco on 5/8/2011 6:21
              • 4. Re: locate an occurrence of symbols combination
                Frank Kulash
                marco wrote:
                Thanks Solomon Yakobson and Frank Kulash,

                I examined links, but still need assistance.

                How do I add space character between the upper-case letter and the last comma?
                You said it: add a space character
                SELECT  REGEXP_SUBSTR ( tt
                                      , '^s [[:upper:]][[:alpha:] ''-]*, '
                --                            space character:  ^
                                      )                              AS s_tt
                FROM    t1
                ;
                By default, the space character has no special meaning in regular expressions.
                I tried
                regexp_substr(tt,'^с [[:upper:]][[:alpha:]''chr(32)-]*, ')
                but it seems doesn't work.
                The characters c, h, r, 3 and 2 don't have any special meaning in regular expressions.
                The characters ( and ) don't have any special meaning inside square brackets in regular expressions, so what you posted was simply adding (, 3, 2 and ) to the list of allowable characters. (c, h and r were already in the set.)
                Perhaps you were thinking of
                regexp_substr ( tt
                           , '^с [[:upper:]][[:alpha:]''' || chr (32)
                                                        || '-]*, '
                           )
                that would work, but there's no reason to go to that trouble.
                • 5. Re: locate an occurrence of symbols combination
                  marco
                  Thanks again, it all works now.