Forum Stats

  • 3,759,022 Users
  • 2,251,493 Discussions
  • 7,870,471 Comments

Discussions

Is there a way to convert a function that uses recursive regexp into "plain" SQL?

JonWat
JonWat Member Posts: 536 Silver Badge

I have a function that selects out the first words in a given string if they are all caps (plus a couple of punctuation marks). I now need to do the same thing in an environment where I cannot use functions, and where CTE functions cannot be used (a Direct Database Request in OBIEE 11.1.1.7.).

Below is the contents of the function which works fine given a single row of input, but which cannot handle a set of data (i.e. the two rows in the full testdata). Is there any way to get the same result using just SQL?

Thanks.


with testdata as
 --(select 'END ORGNAME Begin address' p_address from dual union all
 (select 'NEWFOUNDLAND SOCIETY INC 24 High Street' p_address from dual)
select listagg(apart,' ') within group (order by lv) trans_string,length(listagg(apart,' ') within group (order by lv)) trans_string_length
 from 
  (SELECT regexp_substr(p_address,'[^ ]+',1,level) apart,level lv
   from testdata
   connect by level <= REGEXP_COUNT (p_address,'[^ ]+')
  )t  
 where t.lv <= 
      ( select nvl(min(case when isvalid = 0 then lv end) - 1, count(*)) 
        from
          (select lv,ispo,case when not (ispo = 1 or rv= 0) then 1 else 0 end isvalid
           from
             (select lv,apart,regexp_instr(apart,'^[A-Z&,.]*$') rv,case when apart in ('PO','P.O.') then 1 else 0 end ispo
              from 
                (SELECT regexp_substr(p_address,'[^ ]+',1,level) apart,level lv
                 from testdata
                 connect by level <= REGEXP_COUNT (p_address,'[^ ]+')
             )
           )   
        ));

Thanks in advance.

Tagged:

Best Answer

  • Solomon Yakobson
    Solomon Yakobson Member Posts: 18,739 Black Diamond
    edited Jun 11, 2021 10:42AM Accepted Answer

    There is no need to split it into words and then assemble it back:

    with testdata as (
                      select 'END ORGNAME Begin address' p_address from dual union all
                      select 'NEWFOUNDLAND SOCIETY INC 24 High Street' p_address from dual
                     )
    select  p_address,
            regexp_replace(
                           rtrim(
                                 regexp_substr(
                                               regexp_substr(
                                                             p_address,
                                                             '(^.*)(P\.?O\.?)?(( +.*)|$)',
                                                             1,
                                                             1,
                                                             null,
                                                             1
                                                            ),
                                               '^ *(([A-Z&,.]+( +|$))+)',
                                               1,
                                               1,
                                               null,
                                               1
                                              )
                                ),
                           ' +',
                           ' '
                          ) trans_string,
            length(
                   regexp_replace(
                                  rtrim(
                                        regexp_substr(
                                                      regexp_substr(
                                                                    p_address,
                                                                    '(^.*)(P\.?O\.?)?(( +.*)|$)',
                                                                    1,
                                                                    1,
                                                                    null,
                                                                    1
                                                                   ),
                                                      '^ *(([A-Z&,.]+( +|$))+)',
                                                      1,
                                                      1,
                                                      null,
                                                      1
                                                     )
                                       ),
                                  ' +',
                                  ' '
                                 )
                  ) trans_string_length
      from  testdata
    /
    
    P_ADDRESS                               TRANS_STRING             TRANS_STRING_LENGTH
    --------------------------------------- ------------------------ -------------------
    END ORGNAME Begin address               END ORGNAME                               11
    NEWFOUNDLAND SOCIETY INC 24 High Street NEWFOUNDLAND SOCIETY INC                  24
    
    SQL>
    

    And you can remove REGEXP_REPLACE if words are separated by a single space and can't be separated by multiple spaces.

    SY.

Answers

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,060 Red Diamond
    edited Jun 11, 2021 12:47AM

    Hi, @JonWat

    Whenever you have a problem, please post the exact results you want from the given sample data, and a complete explanation of how you get those results from that sample data. Always say which version of Oracle you're using (e.g. 12.2.0.1.0).

    I have a function that selects out the first words in a given string if they are all caps (plus a couple of punctuation marks). 

    Let me make sure I understand what you want. You want to get the longest possible substring of P-address that

    • starts at the beginning of p_address
    • consists of whole words only
    • includes only upper-case letters, the spaces between the words, and a couple of punctuation marks

    Is that right?

    What are those special punctuation marks? (It looks like there are three of them: '&,'', that is, ampersand, comma and dot.) Are there any limitations on how they can be used (e.g. "Dots are only allowed in the word 'P.0.').

  • Solomon Yakobson
    Solomon Yakobson Member Posts: 18,739 Black Diamond
    edited Jun 11, 2021 10:42AM Accepted Answer

    There is no need to split it into words and then assemble it back:

    with testdata as (
                      select 'END ORGNAME Begin address' p_address from dual union all
                      select 'NEWFOUNDLAND SOCIETY INC 24 High Street' p_address from dual
                     )
    select  p_address,
            regexp_replace(
                           rtrim(
                                 regexp_substr(
                                               regexp_substr(
                                                             p_address,
                                                             '(^.*)(P\.?O\.?)?(( +.*)|$)',
                                                             1,
                                                             1,
                                                             null,
                                                             1
                                                            ),
                                               '^ *(([A-Z&,.]+( +|$))+)',
                                               1,
                                               1,
                                               null,
                                               1
                                              )
                                ),
                           ' +',
                           ' '
                          ) trans_string,
            length(
                   regexp_replace(
                                  rtrim(
                                        regexp_substr(
                                                      regexp_substr(
                                                                    p_address,
                                                                    '(^.*)(P\.?O\.?)?(( +.*)|$)',
                                                                    1,
                                                                    1,
                                                                    null,
                                                                    1
                                                                   ),
                                                      '^ *(([A-Z&,.]+( +|$))+)',
                                                      1,
                                                      1,
                                                      null,
                                                      1
                                                     )
                                       ),
                                  ' +',
                                  ' '
                                 )
                  ) trans_string_length
      from  testdata
    /
    
    P_ADDRESS                               TRANS_STRING             TRANS_STRING_LENGTH
    --------------------------------------- ------------------------ -------------------
    END ORGNAME Begin address               END ORGNAME                               11
    NEWFOUNDLAND SOCIETY INC 24 High Street NEWFOUNDLAND SOCIETY INC                  24
    
    SQL>
    

    And you can remove REGEXP_REPLACE if words are separated by a single space and can't be separated by multiple spaces.

    SY.

  • Frank Kulash
    Frank Kulash Member, Moderator Posts: 41,060 Red Diamond
    edited Jun 11, 2021 2:01AM

    Hi, @JonWat

    Below is the contents of the function which works fine given a single row of input, but which cannot handle a set of data (i.e. the two rows in the full testdata). 

    Why can't it handle more than one row? Is it because of how you're using CONNECT BY?

    it's true that when using CONNECT BY, then you (normally) must either

    • use PRIOR in the CONNECT BY clause, or
    • use a table with only one row (such as dual)

    If you want to use CONNECT BY to split p_address into words, and you might have more than one row in testdata, then you do it like this:

    WITH  got_apart  AS
    (
    	SELECT  p_address
    	,	REGEXP_SUBSTR (t.p_address, '[^ ]+', 1, c.lv) AS apart
    	,	c.lv
    	FROM	testdata t
    	CROSS APPLY (
    			SELECT LEVEL AS lv
    			FROM	dual
    			CONNECT BY LEVEL <= REGEXP_COUNT (t.p_address, '[^ ]+')
    	   	    )	 c
    )
    SELECT   *
    FROM	 got_apart
    ORDER BY p_address, lv
    ;
    

    Notice that the query above uses CONNECT BY on a table with only one row, namely dual. There is also another way to do it, using PRIOR.

    However, as Solomon said, you don't need to split p_address into words in this problem, and it's simpler if you don't.

  • JonWat
    JonWat Member Posts: 536 Silver Badge

    Thanks Frank and Solomon,

    Looks like Solomon's query will do what I need. I'm not sure whether CROSS APPLY is available; I suspect that only Oracle 11 syntax is supported in the OBIEE version we have.

    The "PO" is an exception. The only real distinguishing feature to know whether the text at the beginning of the address line really should belong at the end of the organization name is that it is all caps, because only caps and punctuation are allowed in the organization name. The addresses are mixed case, but a common first part of the address is "PO Box" or 'P.O. Box' and that addition is to stop the "PO" being added to the end of the organization name.