This site is currently read-only as we are migrating to Oracle Forums for an improved community experience. You will not be able to initiate activity until January 30th, when you will be able to use this site as normal.

    Forum Stats

  • 3,889,900 Users
  • 2,269,775 Discussions
  • 7,916,823 Comments

Discussions

Extract text from column between two words

User_Z3XHF
User_Z3XHF Member Posts: 8 Green Ribbon

Hello. I need to extract text from table in column with where clausule. I found solution in T-SQL but i dont know how to translate for SQL Oracle


CREATE FUNCTION dbo.str_between(@col varchar(max), @start varchar(50), @end varchar(50))  
  RETURNS varchar(max)  
  WITH EXECUTE AS CALLER  
AS  
BEGIN  
  RETURN substring(@col, charindex(@start, @col) + len(@start), 
         isnull(nullif(charindex(@end, stuff(@col, 1, charindex(@start, @col)-1, '')),0),
         len(stuff(@col, 1, charindex(@start, @col)-1, ''))+1) - len(@start)-1);
END;  

its look like this but need to declare a text to search 

DECLARE @a VARCHAR(MAX) = 'All I knew was that the dog had been very bad and required harsh punishment immediately regardless of what anyone else thought.'
SELECT dbo.str_between(@a, 'the dog', 'immediately')
-- Yields' had been very bad and required harsh punishment '

how to translate to SQL orcl and extract data from table with column 'examplecolum' not delcared varchar like above and add a where clausule to it its possible?


Best Answer

  • mathguy
    mathguy Member Posts: 11,041 Black Diamond

    Sinking deeper and deeper...

    You don't seem too familiar with XML. Perhaps learning about it a bit more, rather than jumping straight to production problems, would help.

    There are several trivial mistakes in what you posted, but also a more serious one, which suggests that you can't take something proposed in one context and adapt it (in a very simple way) to a different setup. Which means, at the very least, that you should stop making up fictitious examples, and instead show us the exact structure you are working with.

    First let me start with the trivial mistakes; but the bigger point is later, when I talk about the bigger one.

    Already pointed out: for some reason, the INSERT statement you posted uses calligraphic double-quotes, not straight ones, in a few places (but, oddly, not everywhere).

    Let's say you fix that - it's easy. Then, also pointed out: you reference a namespace, wydawca, which is not defined in the document. Let's say you remove that reference (or even, the entire node where it appears), since your example query doesn't have anything to do with that element anyway.

    Let's note also that in this new test case, you created the "xml" column as VARCHAR2. You said your real-life case has NVARCHAR2. Best to use the same data type in the test case; you don't want to end up with a solution that works for VARCHAR2 but not for NVARCHAR2. In this case it won't make a difference; but as a general principle, test on data as close to your real-life one as possible.

    Then the table name: you call it TABELAXML in the CREATE TABLE and INSERT statements; but in the query you call it XMLTABELA. You say this produces the error message about the double-quotes. Not possible - unless you also have another table called XMLTABELA (not TABELAXML), not the one you included in your reply (with CREATE TABLE and INSERT). Or are you changing the query before posting it here? If so, that's bad too; copy and paste exactly what you are running, without any changes.

    As a general matter - create a "sandbox" (a play account/user/schema in your development database) in which you don't keep anything. Whatever you want to post here or on similar sites, first test in that sandbox - CREATE TABLE, INSERT, and the query. You would immediately find the "table or view does not exist" error, which is the first error I found when I used your statements (before the next error, after fixing this one, which was the error about enclosing attribute values in double-quotes).

    OK, now to the serious part. Let's say we fixed all the errors so far. But then, in your query, you give the following XQuery string as argument: '/kurs-java/autor'

    Please inspect your own very simple XML document. The top-level element is kurs-java, fine. Does it have a child tagged autor? I don't see it. I see artykul, and below that, autor. Did you mean to write '/kurs-java//autor'? I did mention something like that in my first answer - there are ways to go around not knowing the exact structure of your document. However, if you DO know the structure, don't take shotcuts: don't use //. Instead, search for '/kurs-java/artykul/autor'.

    To me, this last mistake is not trivial. A very quick look at your XML document and the XQuery string you used should tell you something is not going to work, even without actually running the query.

    Anyway - make these changes, try the corrected query, and then let us know if you have further questions.

«1

Answers

  • BluShadow
    BluShadow Mr UKMember, Moderator Posts: 42,706 Red Diamond

    You could use regular expressions...

    SQL> with t(txt, strfrom, strto) as (
      2    select 'All I knew was that the dog had been very bad and required harsh punishment immediately regardless of what anyone else thought.'
      3          ,'the dog'
      4          ,'immediately'
      5    from   dual
      6    )
      7  --
      8  -- end of test data
      9  --
     10  select regexp_replace(txt, '.*'||strfrom||'(.*)'||strto||'.*', '\1') as result
     11  from   t
     12  /
    
    RESULT
    -------------------------------------------------
     had been very bad and required harsh punishment
    
    

    Or, normal string functions (which may be quicker if processing large volumes of data)...

    SQL> with t(txt, strfrom, strto) as (
      2    select 'All I knew was that the dog had been very bad and required harsh punishment immediately regardless of what anyone else thought.'
      3          ,'the dog'
      4          ,'immediately'
      5    from   dual
      6    )
      7  --
      8  -- end of test data
      9  --
     10  select substr(txt, instr(txt,strfrom)+length(strfrom), instr(txt,strto)-(instr(txt,strfrom)+length(strfrom))) as result
     11  from   t
     12  /
    
    RESULT
    -------------------------------------------------
     had been very bad and required harsh punishment
    


    User_Z3XHF
  • User_Z3XHF
    User_Z3XHF Member Posts: 8 Green Ribbon

    Thanks for the answer but i dont need text from above its just example i meaning

    something like this


    with t(col, strfrom, strto) as (

    select dlugi_tekst

           ,'<artykul>'

           ,'</artykul>'

    from  xmltabela

    )

    but it doesnt work dlugi_tekst is name of column and xmltabela is name of table.

    Also i thinking about function when i -select first_word,second word from function where tabelaxml like '%txt%' and its show text between. Is this possible?

  • Frank Kulash
    Frank Kulash Boston, USAMember, Moderator Posts: 43,002 Red Diamond
    edited Sep 13, 2022 9:11PM

    Hi, @User_Z3XHF

    Sorry, it's not clear what you want.

    Whenever you have a question, please post a little sample data (CREATE TABLE and INSERT statements for all tables involved, relevant columns only) so the people who want to help you can re-create the problem and test their ideas. Also post the exact results you want from that data, and explain why you want those results from that data. Include examples of any special cases you need to handle, such as multiple pairs of starting and ending words in the same string. Always post your complete Oracle version (e.g. 18.4.0.0.0).

  • mathguy
    mathguy Member Posts: 11,041 Black Diamond

    The "solution" you found in T-SQL is completely unprofessional (at least as it would apply to translating it to Oracle).

    The string you must work on is not any kind of string - it seems to be part of an XML document. I don't know what T-SQL is (nor do I want to know), and I don't know what capabilities it has with regard to XML; but in Oracle there has been extensive support since at least five or six versions ago, and you should use XML tools on XML documents if you don't want things to crash or blow up at some point.

    First, your starting point should be a column of XMLType data type - not any kind of string. (However, if it is just CLOB or similar, you can do the same thing I show below - use the XMLType constructor applied to your string column first, very similar to how I use it in the INSERT statements). Below I show how to create a table with a proper XMLType column, how to populate data in it, and then how to extract the text you need. (In one case the node has empty content - the extracted text will be null.)


    Table creation and inserts:

    create table xmltabela(id number primary key, slugi_tekst xmltype);
    
    insert into xmltabela(id, slugi_tekst) values(2038,
      xmltype('<note>
                 <to>You</to>
                 <from>Mathguy</from>
                 <about/>
                 <artykul>Text to extract goes here</artykul>
               </note>')
              );
              
    insert into xmltabela(id, slugi_tekst) values(2349,
      xmltype('<note>
                 <to>All</to>
                 <from>Oracle</from>
                 <about>Marketing</about>
                 <artykul/>
               </note>')
              );
    
    commit;
    


    Extracting text from the <artykul> node as you requested:

    select id, artykul
    from   xmltabela,
           xmltable('/note/artykul' 
                    passing xmltabela.slugi_tekst
                      columns
                        artykul varchar2(100) path '/')
    ;
    
       ID ARTYKUL                       
    ----- ------------------------------
     2038 Text to extract goes here     
     2349      
    


    In the above query, if your column data type is CLOB or VARCHAR2 instead of XMLTYPE, you need to replace the PASSING clause with

    passing xmltype(xmltabela.slugi_tekst)
    


    If you don't know where the <artykul> node is in the document, first, that's bad (the structure of the document should be known to you without having to see the actual data); second, you can find out - just inspect the document before writing a query against it; and third, if all else fails, you can use wildcards in the path - but expect this to cause slow processing on long documents, or on a large number of input rows.

    In no case do you need to extract text from an XML document using plain-text string functions.

    User_Z3XHF
  • User_Z3XHF
    User_Z3XHF Member Posts: 8 Green Ribbon

    Thanks for the answer. But the catch is that it's not xmltype but nvarchar, someone doing wrong or for the reason, i Got personal info off ppl in this nvarchar column which is xmlcode and need to extract some data

  • mathguy
    mathguy Member Posts: 11,041 Black Diamond

    I already explained in my answer the minor change you need to make if the column is of some string type.

  • BluShadow
    BluShadow Mr UKMember, Moderator Posts: 42,706 Red Diamond


    So why did you waste our time by asking for something you don't actually want?

    If your question is about extracting values or nodes from an XML document, then make that clear when you ask your question, as that is most certainly not the same as extracting some text from between two other bits of text in a string.

    I see mathguy has had the patience to give you a correct solution.

  • User_Z3XHF
    User_Z3XHF Member Posts: 8 Green Ribbon

    Thanks for patience but its not working i said its not xmltype object. I cant change table structure.

    Create table xmltabela (id number, dlugi_tekst nvarchar(1000)) values (1, 'xmltext') its like this

    Thats why iam talking about string

  • Jason_(A_Non)
    Jason_(A_Non) United StatesMember Posts: 2,184 Gold Trophy

    Did you understand the sentence where mathguy said

    In the above query, if your column data type is CLOB or VARCHAR2 instead of XMLTYPE, you need to replace the PASSING clause with

    passing xmltype(xmltabela.slugi_tekst)
    


  • User_Z3XHF
    User_Z3XHF Member Posts: 8 Green Ribbon

    select id, artykul

    from xmltabela,

    xmltable('/note/artykul'

    passing xmltype(xmltabela.slugi_tekst)

    columns

    artykul varchar2(100) path '/')

    ;


    ID ARTYKUL

    ----- ------------------------------

    2038 Text to extract goes here

    2349



    You mean that ? Thats not work