Discussions
Categories
- 17.9K All Categories
- 3.4K Industry Applications
- 3.3K Intelligent Advisor
- 62 Insurance
- 536K On-Premises Infrastructure
- 138.2K Analytics Software
- 38.6K Application Development Software
- 5.7K Cloud Platform
- 109.4K Database Software
- 17.5K Enterprise Manager
- 8.8K Hardware
- 71.1K Infrastructure Software
- 105.2K Integration
- 41.5K Security Software
Meaning and when to use the ABOUT operator?
Hi,
With Oracle Text, the definition of the ABOUT operator:
"Use the ABOUT operator in English or French to query on a concept. The query string is usually a concept or theme that represents the idea to be searched on. Oracle Text returns the documents that contain the theme."
The two queries below return the same result (doc column is BLOB):
select
score(1) score,
id,
name
from
my_docs
where
contains(doc, 'RMAN', 1) > 0
order by
score(1) desc;
select
score(1) score,
id,
name
from
my_docs
where
contains(doc, 'about(RMAN)', 1) > 0
order by
score(1) desc;
Can someone explain me what is the use aof ABOUT? and what means "to query on a concept"?
Kind Regards
Best Answer
-
Sometimes examples are clearer than explanations. Here is an example that shows differences between ABOUT and RT. The query using ABOUT only returns the exact match, but the query using RT returns all related terms. In the query using RT, I specified the default thesaurus, but you could specify a different user-defined thesaurus instead. In the query using ABOUT, you cannot specify a user-defined thesaurus.
[email protected]_12.1.0.2.0> create table my_docs
2 (id number,
3 name varchar2(5),
4 doc blob)
5 /Table created.
[email protected]_12.1.0.2.0> insert all
2 into my_docs values (1, 'name1', utl_raw.cast_to_raw ('mammals'))
3 into my_docs values (2, 'name2', utl_raw.cast_to_raw ('pets'))
4 into my_docs values (3, 'name3', utl_raw.cast_to_raw ('canines'))
5 into my_docs values (4, 'name4', utl_raw.cast_to_raw ('felines'))
6 into my_docs values (5, 'name5', utl_raw.cast_to_raw ('some other stuff'))
7 select * from dual
8 /5 rows created.
[email protected]_12.1.0.2.0> create index doc_idx on my_docs (doc) indextype is ctxsys.context
2 /Index created.
[email protected]_12.1.0.2.0> select score(1) score, id, name, utl_raw.cast_to_varchar2(doc) doc
2 from my_docs
3 where contains(doc, 'ABOUT (pets)', 1) > 0
4 order by score(1) desc
5 /SCORE ID NAME DOC
---------- ---------- ----- ------------------------------------------------------------
5 2 name2 pets1 row selected.
[email protected]_12.1.0.2.0> select score(1) score, id, name, utl_raw.cast_to_varchar2(doc) doc
2 from my_docs
3 where contains(doc, 'RT (pets, default)', 1) > 0
4 order by score(1) desc
5 /SCORE ID NAME DOC
---------- ---------- ----- ------------------------------------------------------------
5 1 name1 mammals
5 4 name4 felines
5 3 name3 canines
5 2 name2 pets4 rows selected.
Answers
-
I would start in the Text Application Developers Guide:
4.1.6 ABOUT Queries and Themes
An
ABOUT
query is a query on a document theme. A document theme is a concept that is sufficiently developed in the text. For example, anABOUT
query on US politics might return documents containing information about US presidential elections and US foreign policy. Documents need not contain the exact phrase US politics to be returned.During indexing, document themes are derived from the knowledge base, which is a hierarchical list of categories and concepts that represents a view of the world. Some examples of themes in the knowledge catalog are concrete concepts such as jazz music, football, or Nelson Mandela. Themes can also be abstract concepts such as happiness or honesty.During indexing, the system can also identify and index document themes that are sufficiently developed in the document, but do not exist in the knowledge base.You can augment the knowledge base to define concepts and terms specific to your industry or query application. When you do so,ABOUT
queries are more precise for the added concepts.ABOUT
queries perform best when you create a theme component in your index. Theme components are created by default for English and French.From my experience, for it to be really effective, you need a pretty discrete setup in your knowledgebase. If you have several tightly related topics with a lot of overlap the ABOUT query really doesn't work well.
About the question on when to use it, like any other tool: You use it when it makes sense to use it?
-
The example below demonstrates some of what Bud Light said, using things that are already in the knowledge base. If you want it to work for things like RMAN, then you should create your own thesaurus and add it to the knowledge base using the Oracle Text utilities ( http://docs.oracle.com/database/122/CCREF/oracle-text-utilities.htm#CCREF2180 ) to do so.
[email protected]_12.1.0.2.0> -- table:
[email protected]_12.1.0.2.0> create table my_docs
2 (id number,
3 name varchar2(5),
4 doc blob)
5 /Table created.
[email protected]_12.1.0.2.0> -- test data:
[email protected]_12.1.0.2.0> insert all
2 into my_docs values (1, 'name1', utl_raw.cast_to_raw ('US politics'))
3 into my_docs values (2, 'name2', utl_raw.cast_to_raw ('US presidential elections'))
4 into my_docs values (3, 'name3', utl_raw.cast_to_raw ('US foreign policy'))
5 into my_docs values (4, 'name4', utl_raw.cast_to_raw ('US presidential elections US foreign policy'))
6 into my_docs values (5, 'name5', utl_raw.cast_to_raw ('some other stuff'))
7 select * from dual
8 /5 rows created.
[email protected]_12.1.0.2.0> column doc format a45 word_wrapped
[email protected]_12.1.0.2.0> select id, name, utl_raw.cast_to_varchar2(doc) doc from my_docs
2 /ID NAME DOC
---------- ----- ---------------------------------------------
1 name1 US politics
2 name2 US presidential elections
3 name3 US foreign policy
4 name4 US presidential elections US foreign policy
5 name5 some other stuff5 rows selected.
[email protected]_12.1.0.2.0> -- optional lexer with index_themes that increases accuracy of about queries:
[email protected]_12.1.0.2.0> begin
2 ctx_ddl.create_preference ('test_lex', 'basic_lexer');
3 ctx_ddl.set_attribute ('test_lex', 'index_themes', 'yes');
4 end;
5 /PL/SQL procedure successfully completed.
[email protected]_12.1.0.2.0> -- text index using optional lexer with index_themes:
[email protected]_12.1.0.2.0> create index doc_idx on my_docs (doc) indextype is ctxsys.context
2 parameters ('lexer test_lex')
3 /Index created.
[email protected]_12.1.0.2.0> -- what is indexed, due to usage of lexer with index_themes, drawing from knowledge base:
[email protected]_12.1.0.2.0> select token_type, token_text from dr$doc_idx$i
2 /TOKEN_TYPE TOKEN_TEXT
---------- ----------------------------------------------------------------
0 ELECTIONS
0 FOREIGN
0 OTHER
0 POLICY
0 POLITICS
0 PRESIDENTIAL
0 STUFF
0 US
1 elections
1 elections and campaigns
1 foreign policy
1 government
1 government and military
1 international relations
1 politics
1 presidents16 rows selected.
[email protected]_12.1.0.2.0> -- query without about returns only US politics:
[email protected]_12.1.0.2.0> select score(1) score, id, name, utl_raw.cast_to_varchar2(doc) doc
2 from my_docs
3 where contains(doc, 'US politics', 1) > 0
4 order by score(1) desc
5 /SCORE ID NAME DOC
---------- ---------- ----- ---------------------------------------------
5 1 name1 US politics1 row selected.
[email protected]_12.1.0.2.0> -- query with about returns rows with sufficient elements of the theme:
[email protected]_12.1.0.2.0> select score(1) score, id, name, utl_raw.cast_to_varchar2(doc) doc
2 from my_docs
3 where contains(doc, 'about(US politics)', 1) > 0
4 order by score(1) desc
5 /SCORE ID NAME DOC
---------- ---------- ----- ---------------------------------------------
47 1 name1 US politics
40 4 name4 US presidential elections US foreign policy2 rows selected.
-
Hi Barbara,
Thansk for providing an example.
But, what means "BASIC_LEXER"? What thye other optons? I do not find it in the documentation.
More generally, why do you need to define a preference and the set thwo attributes for this preference?
Furthermore, why 'INDEX_TEXT', 'YES' and INDEX_THEMES', 'NO'?
After these steps, you create an iindex (which makes sense) with parameters ('lexer test_lex').
What does this mean?
Thanks for giving some explanations.
Kind Regards
-
Barbara,
You wrote "The example below demonstrates some of what Bud Light said, using things that are already in the knowledge base".
what is that exactly? what is th "default knowledge base", and do I need to extend it?
Kind Regards
-
I'm not Barbara but...
>>But, what means "BASIC_LEXER"? What thye other optons? I do not find it in the documentation.
The Lexer Types are in the Oracle Text Reference:
http://docs.oracle.com/database/122/CCREF/oracle-text-indexing-elements.htm#GUID-3C8A85E5-C346-4B18-874A-669D93F64DE8>>why do you need to define a preference and the set thwo attributes for this preference?
From the docs:
index_themes
Specify YES to index theme information in English or French. This makes ABOUT queries more precise. The index_themes and index_text attributes cannot both be NO. The default is NO.
>>After these steps, you create an iindex (which makes sense) with parameters ('lexer test_lex'). What does this mean?You are telling Text to use the Lexer you created that use the options you specified.
>>what is that exactly? what is th "default knowledge base",Again, this is talked about in the documentation.
>>and do I need to extend it?
We cannot answer that for you. You would need to extend it to set up relationships specific to your data that do not come in the default knowledge base.
For a completely made up example: If you are interested documents about different colors and wanted to find ones that are "about('green')", you would need to teach the Kndoelge Base to distigsuish different colors.
-
I agree with Bud Light's answers. Here is another example that shows that, even without creating a lexer with index_themes, using that lexer in the text index, or expanding the knowledge base, Oracle Text is still able to identify some themes. The other things just increase the accuracy. So, it is better to use them, but, if you are just trying to start simple, then you can test without them first, then add them one at a time, re-testing as you go, to see the differences. You will find, for example, that it will not recognize the abbreviation RMAN as meaning Recovery Manager, but if you use the full name, then it can recognize those words and may be able to identify a theme.
[email protected]_12.1.0.2.0> -- table:
[email protected]_12.1.0.2.0> create table my_docs
2 (id number,
3 name varchar2(5),
4 doc blob)
5 /Table created.
[email protected]_12.1.0.2.0> -- test data:
[email protected]_12.1.0.2.0> insert all
2 into my_docs values (1, 'name1', utl_raw.cast_to_raw ('Recovery Manager'))
3 into my_docs values (2, 'name2', utl_raw.cast_to_raw ('An Oracle Database utility for backing up, restoration, and recovery of Oracle databases.'))
4 into my_docs values (5, 'name5', utl_raw.cast_to_raw ('some other stuff'))
5 select * from dual
6 /3 rows created.
[email protected]_12.1.0.2.0> column doc format a45 word_wrapped
[email protected]_12.1.0.2.0> select id, name, utl_raw.cast_to_varchar2(doc) doc from my_docs
2 /ID NAME DOC
---------- ----- ---------------------------------------------
1 name1 Recovery Manager
2 name2 An Oracle Database utility for backing up,
restoration, and recovery of Oracle
databases.5 name5 some other stuff
3 rows selected.
[email protected]_12.1.0.2.0> -- text index:
[email protected]_12.1.0.2.0> create index doc_idx on my_docs (doc) indextype is ctxsys.context
2 /Index created.
[email protected]_12.1.0.2.0> -- query without about:
[email protected]_12.1.0.2.0> select score(1) score, id, name, utl_raw.cast_to_varchar2(doc) doc
2 from my_docs
3 where contains(doc, 'Recovery Manager', 1) > 0
4 order by score(1) desc
5 /SCORE ID NAME DOC
---------- ---------- ----- ---------------------------------------------
4 1 name1 Recovery Manager1 row selected.
[email protected]_12.1.0.2.0> -- query with about:
[email protected]_12.1.0.2.0> select score(1) score, id, name, utl_raw.cast_to_varchar2(doc) doc
2 from my_docs
3 where contains(doc, 'about(Recovery Manager)', 1) > 0
4 order by score(1) desc
5 /SCORE ID NAME DOC
---------- ---------- ----- ---------------------------------------------
89 1 name1 Recovery Manager
33 2 name2 An Oracle Database utility for backing up,
restoration, and recovery of Oracle
databases.
2 rows selected. -
Thanks for the reactions. Sorry for my late reaction. I will come back to that thread as soon as possible...
-
Hi Barbara, Bud,
Some more questions.
when creating a user-made thesaurus, what are the best practices?
Should this thesaurus be "merged" with the Knowledge Base or not? If yes, how to achieve tis?
The last query the uses the ABOUT opartor due to the index tht uses a lexer, returns 2 rows.
I would expect it would return also 'US presidential elections', US foreign policy'. Both belongs to the theme "US politics", correct?
Why this does not appear on the list?
Kind Regards
-
The thesaurus should be merged with the Knowledge Base. The following section of the online documentation contains instructions for how to do that.
The rows containing just 'US presidential elections' or just 'US foreign policy' were insufficient by themselves to establish a theme. The two of them together were sufficient to establish a theme. That is why I posted those separately and together, to demonstrate this.
-
Hi,
I still do not understand why the rows containing just 'US presidential elections' or just 'US foreign policy' were not sufficient to create a theme?
Kind Regards