Forum Stats

  • 3,825,196 Users
  • 2,260,479 Discussions
  • 7,896,435 Comments

Discussions

Meaning and when to use the ABOUT operator?

Laury
Laury Member Posts: 1,665 Silver Badge
edited Oct 19, 2017 10:29AM in Text

Hi,

With Oracle Text, the definition of the ABOUT operator:

"Use the ABOUT operator in English or French to query on a concept. The query string is usually a concept or theme that represents the idea to be searched on. Oracle Text returns the documents that contain the theme."

The two queries below return the same result (doc column is BLOB):

select

    score(1) score,

    id,

    name

from  

    my_docs

where 

    contains(doc, 'RMAN', 1) > 0

order by

    score(1) desc;

select

    score(1) score,

    id,

    name

from  

    my_docs

where 

    contains(doc, 'about(RMAN)', 1) > 0

order by

    score(1) desc;

Can someone explain me what is the use aof ABOUT? and what means "to query on a concept"?

Kind Regards

Laury

Best Answer

  • Barbara Boehmer
    Barbara Boehmer Member Posts: 4,625 Gold Trophy
    edited Oct 11, 2017 9:59AM Answer ✓

    Sometimes examples are clearer than explanations.  Here is an example that shows differences between ABOUT and RT.  The query using ABOUT only returns the exact match, but the query using RT returns all related terms.  In the query using RT, I specified the default thesaurus, but you could specify a different user-defined thesaurus instead.  In the query using ABOUT, you cannot specify a user-defined thesaurus.

    [email protected]_12.1.0.2.0> create table my_docs
      2    (id    number,
      3   name  varchar2(5),
      4   doc   blob)
      5  /

    Table created.

    [email protected]_12.1.0.2.0> insert all
      2  into my_docs values (1, 'name1', utl_raw.cast_to_raw ('mammals'))
      3  into my_docs values (2, 'name2', utl_raw.cast_to_raw ('pets'))
      4  into my_docs values (3, 'name3', utl_raw.cast_to_raw ('canines'))
      5  into my_docs values (4, 'name4', utl_raw.cast_to_raw ('felines'))
      6  into my_docs values (5, 'name5', utl_raw.cast_to_raw ('some other stuff'))
      7  select * from dual
      8  /

    5 rows created.

    [email protected]_12.1.0.2.0> create index doc_idx on my_docs (doc) indextype is ctxsys.context
      2  /

    Index created.

    [email protected]_12.1.0.2.0> select score(1) score, id, name, utl_raw.cast_to_varchar2(doc) doc
      2  from   my_docs
      3  where  contains(doc, 'ABOUT (pets)', 1) > 0
      4  order  by score(1) desc
      5  /

         SCORE         ID NAME  DOC
    ---------- ---------- ----- ------------------------------------------------------------
             5          2 name2 pets

    1 row selected.

    [email protected]_12.1.0.2.0> select score(1) score, id, name, utl_raw.cast_to_varchar2(doc) doc
      2  from   my_docs
      3  where  contains(doc, 'RT (pets, default)', 1) > 0
      4  order  by score(1) desc
      5  /

         SCORE         ID NAME  DOC
    ---------- ---------- ----- ------------------------------------------------------------
             5          1 name1 mammals
             5          4 name4 felines
             5          3 name3 canines
             5          2 name2 pets

    4 rows selected.

    LauryLaury
«13

Answers

  • Bud Light
    Bud Light Member Posts: 70 Blue Ribbon
    edited Sep 22, 2017 8:23AM

    I would start in the Text Application Developers Guide:

    Querying with Oracle Text

    4.1.6 ABOUT Queries and Themes

    An ABOUT query is a query on a document theme. A document theme is a concept that is sufficiently developed in the text. For example, an ABOUT query on US politics might return documents containing information about US presidential elections and US foreign policy. Documents need not contain the exact phrase US politics to be returned.During indexing, document themes are derived from the knowledge base, which is a hierarchical list of categories and concepts that represents a view of the world. Some examples of themes in the knowledge catalog are concrete concepts such as jazz music, football, or Nelson Mandela. Themes can also be abstract concepts such as happiness or honesty.During indexing, the system can also identify and index document themes that are sufficiently developed in the document, but do not exist in the knowledge base.You can augment the knowledge base to define concepts and terms specific to your industry or query application. When you do so, ABOUT queries are more precise for the added concepts.ABOUT queries perform best when you create a theme component in your index. Theme components are created by default for English and French.

    From my experience, for it to be really effective, you need a pretty discrete setup in your knowledgebase.  If you have several tightly related topics with a lot of overlap the ABOUT query really doesn't work well.

    About the question on when to use it, like any other tool:  You use it when it makes sense to use it?

    LauryLaury
  • Barbara Boehmer
    Barbara Boehmer Member Posts: 4,625 Gold Trophy
    edited Sep 23, 2017 5:23AM

    The example below demonstrates some of what Bud Light said, using things that are already in the knowledge base.  If you want it to work for things like RMAN, then you should create your own thesaurus and add it to the knowledge base using the Oracle Text utilities ( http://docs.oracle.com/database/122/CCREF/oracle-text-utilities.htm#CCREF2180 ) to do so.

    [email protected]_12.1.0.2.0> -- table:
    [email protected]_12.1.0.2.0> create table my_docs
      2    (id    number,
      3  name  varchar2(5),
      4  doc  blob)
      5  /

    Table created.

    [email protected]_12.1.0.2.0> -- test data:
    [email protected]_12.1.0.2.0> insert all
      2  into my_docs values (1, 'name1', utl_raw.cast_to_raw ('US politics'))
      3  into my_docs values (2, 'name2', utl_raw.cast_to_raw ('US presidential elections'))
      4  into my_docs values (3, 'name3', utl_raw.cast_to_raw ('US foreign policy'))
      5  into my_docs values (4, 'name4', utl_raw.cast_to_raw ('US presidential elections US foreign policy'))
      6  into my_docs values (5, 'name5', utl_raw.cast_to_raw ('some other stuff'))
      7  select * from dual
      8  /

    5 rows created.

    [email protected]_12.1.0.2.0> column doc format a45 word_wrapped
    [email protected]_12.1.0.2.0> select id, name, utl_raw.cast_to_varchar2(doc) doc from my_docs
      2  /

            ID NAME  DOC
    ---------- ----- ---------------------------------------------
            1 name1 US politics
            2 name2 US presidential elections
            3 name3 US foreign policy
            4 name4 US presidential elections US foreign policy
            5 name5 some other stuff

    5 rows selected.

    [email protected]_12.1.0.2.0> -- optional lexer with index_themes that increases accuracy of about queries:
    [email protected]_12.1.0.2.0> begin
      2    ctx_ddl.create_preference ('test_lex', 'basic_lexer');
      3    ctx_ddl.set_attribute ('test_lex', 'index_themes', 'yes');
      4  end;
      5  /

    PL/SQL procedure successfully completed.

    [email protected]_12.1.0.2.0> -- text index using optional lexer with index_themes:
    [email protected]_12.1.0.2.0> create index doc_idx on my_docs (doc) indextype is ctxsys.context
      2  parameters ('lexer  test_lex')
      3  /

    Index created.

    [email protected]_12.1.0.2.0> -- what is indexed, due to usage of lexer with index_themes, drawing from knowledge base:
    [email protected]_12.1.0.2.0> select token_type, token_text from dr$doc_idx$i
      2  /

    TOKEN_TYPE TOKEN_TEXT
    ---------- ----------------------------------------------------------------
            0 ELECTIONS
            0 FOREIGN
            0 OTHER
            0 POLICY
            0 POLITICS
            0 PRESIDENTIAL
            0 STUFF
            0 US
            1 elections
            1 elections and campaigns
            1 foreign policy
            1 government
            1 government and military
            1 international relations
            1 politics
            1 presidents

    16 rows selected.

    [email protected]_12.1.0.2.0> -- query without about returns only US politics:
    [email protected]_12.1.0.2.0> select score(1) score, id, name, utl_raw.cast_to_varchar2(doc) doc
      2  from  my_docs
      3  where  contains(doc, 'US politics', 1) > 0
      4  order  by score(1) desc
      5  /

        SCORE        ID NAME  DOC
    ---------- ---------- ----- ---------------------------------------------
            5          1 name1 US politics

    1 row selected.

    [email protected]_12.1.0.2.0> -- query with about returns rows with sufficient elements of the theme:
    [email protected]_12.1.0.2.0> select score(1) score, id, name, utl_raw.cast_to_varchar2(doc) doc
      2  from  my_docs
      3  where  contains(doc, 'about(US politics)', 1) > 0
      4  order  by score(1) desc
      5  /

        SCORE        ID NAME  DOC
    ---------- ---------- ----- ---------------------------------------------
            47          1 name1 US politics
            40          4 name4 US presidential elections US foreign policy

    2 rows selected.

    LauryLaury
  • Laury
    Laury Member Posts: 1,665 Silver Badge
    edited Sep 25, 2017 9:25AM

    Hi Barbara,

    Thansk for providing an example.

    But, what means "BASIC_LEXER"? What thye other optons? I do not find it in the documentation.

    More generally, why do you need to define a preference and the set thwo attributes for this preference?

    Furthermore, why 'INDEX_TEXT', 'YES' and INDEX_THEMES', 'NO'?

    After these steps, you create an iindex (which makes sense) with parameters ('lexer  test_lex').

    What does this mean?

    Thanks for giving some explanations.

    Kind Regards

  • Laury
    Laury Member Posts: 1,665 Silver Badge
    edited Sep 25, 2017 9:29AM

    Barbara,

    You wrote "The example below demonstrates some of what Bud Light said, using things that are already in the knowledge base".

    what is that exactly? what is th "default knowledge base", and do I need to extend it?

    Kind Regards

  • Bud Light
    Bud Light Member Posts: 70 Blue Ribbon
    edited Sep 25, 2017 11:44AM

    I'm not Barbara but...

    >>But, what means "BASIC_LEXER"? What thye other optons? I do not find it in the documentation.

    The Lexer Types are in the Oracle Text Reference:
    http://docs.oracle.com/database/122/CCREF/oracle-text-indexing-elements.htm#GUID-3C8A85E5-C346-4B18-874A-669D93F64DE8

    >>why do you need to define a preference and the set thwo attributes for this preference?

    From the docs:

    index_themes
    Specify YES to index theme information in English or French. This makes ABOUT queries more precise. The index_themes and index_text attributes cannot both be NO. The default is NO.


    >>After these steps, you create an iindex (which makes sense) with parameters ('lexer  test_lex').  What does this mean?

    You are telling Text to use the Lexer you created that use the options you specified.


    >>what is that exactly? what is th "default knowledge base",

    Again, this is talked about in the documentation.

    http://docs.oracle.com/database/122/CCAPP/working-with-thesaurus-in-oracle-text.htm#GUID-F35BEFA4-494B-4DD4-AD5E-836C5C1E29B5

    >>and do I need to extend it?

    We cannot answer that for you.  You would need to extend it to set up relationships specific to your data that do not come in the default knowledge base.

    For a completely made up example:  If you are interested documents about different colors and wanted to find ones that are "about('green')", you would need to teach the Kndoelge Base to distigsuish different colors.

    LauryLaury
  • Barbara Boehmer
    Barbara Boehmer Member Posts: 4,625 Gold Trophy
    edited Sep 25, 2017 3:54PM

    I agree with Bud Light's answers.  Here is another example that shows that, even without creating a lexer with index_themes, using that lexer in the text index, or expanding the knowledge base, Oracle Text is still able to identify some themes.  The other things just increase the accuracy.  So, it is better to use them, but, if you are just trying to start simple, then you can test without them first, then add them one at a time, re-testing as you go, to see the differences.  You will find, for example, that it will not recognize the abbreviation RMAN as meaning Recovery Manager, but if you use the full name, then it can recognize those words and may be able to identify a theme.

    [email protected]_12.1.0.2.0> -- table:
    [email protected]_12.1.0.2.0> create table my_docs
      2    (id    number,
      3  name  varchar2(5),
      4  doc  blob)
      5  /

    Table created.

    [email protected]_12.1.0.2.0> -- test data:
    [email protected]_12.1.0.2.0> insert all
      2  into my_docs values (1, 'name1', utl_raw.cast_to_raw ('Recovery Manager'))
      3  into my_docs values (2, 'name2', utl_raw.cast_to_raw ('An Oracle Database utility for backing up, restoration, and recovery of Oracle databases.'))
      4  into my_docs values (5, 'name5', utl_raw.cast_to_raw ('some other stuff'))
      5  select * from dual
      6  /

    3 rows created.

    [email protected]_12.1.0.2.0> column doc format a45 word_wrapped
    [email protected]_12.1.0.2.0> select id, name, utl_raw.cast_to_varchar2(doc) doc from my_docs
      2  /

            ID NAME  DOC
    ---------- ----- ---------------------------------------------
            1 name1 Recovery Manager
            2 name2 An Oracle Database utility for backing up,
                    restoration, and recovery of Oracle
                    databases.

            5 name5 some other stuff

    3 rows selected.

    [email protected]_12.1.0.2.0> -- text index:
    [email protected]_12.1.0.2.0> create index doc_idx on my_docs (doc) indextype is ctxsys.context
      2  /

    Index created.

    [email protected]_12.1.0.2.0> -- query without about:
    [email protected]_12.1.0.2.0> select score(1) score, id, name, utl_raw.cast_to_varchar2(doc) doc
      2  from  my_docs
      3  where  contains(doc, 'Recovery Manager', 1) > 0
      4  order  by score(1) desc
      5  /

        SCORE        ID NAME  DOC
    ---------- ---------- ----- ---------------------------------------------
            4          1 name1 Recovery Manager

    1 row selected.

    [email protected]_12.1.0.2.0> -- query with about:
    [email protected]_12.1.0.2.0> select score(1) score, id, name, utl_raw.cast_to_varchar2(doc) doc
      2  from  my_docs
      3  where  contains(doc, 'about(Recovery Manager)', 1) > 0
      4  order  by score(1) desc
      5  /

        SCORE        ID NAME  DOC
    ---------- ---------- ----- ---------------------------------------------
            89          1 name1 Recovery Manager
            33          2 name2 An Oracle Database utility for backing up,
                                restoration, and recovery of Oracle
                                databases.


    2 rows selected.

    LauryLaury
  • Laury
    Laury Member Posts: 1,665 Silver Badge
    edited Oct 4, 2017 3:52AM

    Thanks for the reactions. Sorry for my late reaction. I will come back to that thread as soon as possible...

  • Laury
    Laury Member Posts: 1,665 Silver Badge
    edited Oct 9, 2017 4:29AM

    Hi Barbara, Bud,

    Some more questions.

    when creating a user-made thesaurus, what are the best practices?

    Should this thesaurus be "merged" with the Knowledge Base or not? If yes, how to achieve tis?

    The last query the uses the ABOUT opartor due to the index tht uses a lexer, returns 2 rows.

    I would expect it would return also 'US presidential elections', US foreign policy'. Both belongs to the theme "US politics", correct?

    Why this does not appear on the list?

    Kind Regards

  • Barbara Boehmer
    Barbara Boehmer Member Posts: 4,625 Gold Trophy
    edited Oct 9, 2017 4:49AM

    The thesaurus should be merged with the Knowledge Base.  The following section of the online documentation contains instructions for how to do that.

    http://docs.oracle.com/database/122/CCREF/oracle-text-utilities.htm#GUID-C652982F-B2AB-4DF3-9CF1-ADF00EF7C528

    The rows containing just 'US presidential elections' or just 'US foreign policy' were insufficient by themselves to establish a theme.  The two of them together were sufficient to establish a theme.  That is why I posted those separately and together, to demonstrate this.

    LauryLaury
  • Laury
    Laury Member Posts: 1,665 Silver Badge
    edited Oct 9, 2017 7:59AM

    Hi,

    I still do not understand why the rows containing just 'US presidential elections' or just 'US foreign policy' were not sufficient to create a theme?

    Kind Regards

This discussion has been closed.