6 Replies Latest reply: Dec 20, 2012 12:09 AM by 979937 RSS

    Suggestion for Using Oracle Text Index

    979937
      Hi
      I have a table which has say .. city names and names of people from that city and somw other columns having details of that person.
      Ex : City Name SSID
      Newyork James 01
      Newyork Allen 02
      Washington Obama 03
      Newyork James 04

      (Note : names can be repeating)

      I have a particular use case with the following requirements :
      1. You can search by city
      2. You can do a partial search on name.

      Ex : Get all rows with city=Newyork and name starting with J

      The number of rows in the table is in the order of 100 million, and I want to use oracle text for indexing.

      Can you suggest which type of index is the best? (Given that the search query always has full city name and partial person name).

      PS : I thought of using ctxcat index with city as the indexing column and name as sub-index , but I got an error since the size of index set columns should not exceed 30 bytes.

      Edited by: 976934 on Dec 17, 2012 2:34 AM
        • 1. Re: Suggestion for Using Oracle Text Index
          Roger Ford-Oracle
          Need more info:
          Is "Name" always one word, or can it be multiple words?
          What do you mean by "partial search" - will you always be looking for letters at the beginning of the name, or will you be looking for substrings (such as "ame", "lle", "bam")
          Do you need inexact search on the name field - eg should "Cathy" match "Katherine"?
          • 2. Re: Suggestion for Using Oracle Text Index
            979937
            1. Name can be multiple words
            2. It could be substrings also.
            3. No, I don't need inexact search
            • 3. Re: Suggestion for Using Oracle Text Index
              Roger Ford-Oracle
              I would suggest using a CONTEXT index, and storing the city in an MDATA field - which is designed to handle full, exact string matches within an Oracle Text index.

              Using a MULTI_COLUMN_DATASTORE will allow you to fetch the name for conventional indexing, and the city for use in an MDATA field.

              As you want substring searching you'll need to use the SUBSTRING_INDEX option.

              See the worked example below:
              drop table my_table
              /
              create table my_table( city varchar2(30), name varchar2(30), ssid number )
              /
              insert into my_table values( 'New York', 'James', 1);
              insert into my_table values( 'New York', 'Allen', 2);
              insert into my_table values( 'Washington', 'Obama', 3);
              insert into my_table values( 'New York', 'James', 4);
              
              exec ctx_ddl.drop_preference  ( 'my_ds' )
              exec ctx_ddl.create_preference( 'my_ds', 'MULTI_COLUMN_DATASTORE' )
              exec ctx_ddl.set_attribute    ( 'my_ds', 'COLUMNS', 'city, name' )
              
              exec ctx_ddl.drop_preference  ( 'my_wl' )
              exec ctx_ddl.create_preference( 'my_wl', 'BASIC_WORDLIST' )
              exec ctx_ddl.set_attribute    ( 'my_wl', 'SUBSTRING_INDEX', 'true' )
              
              exec ctx_ddl.drop_section_group  ( 'my_sg' )
              exec ctx_ddl.create_section_group( 'my_sg', 'BASIC_SECTION_GROUP' )
              exec ctx_ddl.add_mdata_section   ( 'my_sg', 'city', 'city' )
              
              create index my_index on my_table( name )
              indextype is ctxsys.context
              parameters( 'datastore my_ds wordlist my_wl section group my_sg' )
              /
              select * from my_table where contains( name, '%bam% and mdata(city, Washington)' ) > 0
              /
              select * from my_table where contains( name, '%J% and mdata(city, New York)' ) > 0
              /
              The output from the two queries at the end of this is:
              SQL> select * from my_table where contains( name, '%bam% and mdata(city, Washington)' ) > 0
                2  /
              
              CITY                      NAME                        SSID
              ------------------------------ ------------------------------ ----------
              Washington                 Obama                           3
              
              SQL> select * from my_table where contains( name, '%J% and mdata(city, New York)' ) > 0
                2  /
              
              CITY                      NAME                        SSID
              ------------------------------ ------------------------------ ----------
              New York                 James                           1
              New York                 James                           4
              • 4. Re: Suggestion for Using Oracle Text Index
                979937
                I am trying to create the index as you mentioned. But I see that it is not yet complete even after 30 hrs and I can't see anything written onto the disk.
                Am I doing something wrong?
                • 5. Re: Suggestion for Using Oracle Text Index
                  Roger Ford-Oracle
                  Not necessarily. The SUBSTRING_INDEX option makes indexing talke five or six times longer, but is essential if you want to do "double truncation" wild card searches.

                  It would probably have been a good idea to test it on a small proportion of the data first - like 1 million rows - so you got an idea how long it is going to take.

                  How much memory, and how many CPU cores does the machine have? If it's a big machine, adding a MEMORY clause to the parameters, and increasing the parallelism should help it run faster:
                  exec ctx_output.start_log( 'my_index.log' )
                  
                  create index my_index on my_table( name )
                  indextype is ctxsys.context
                  parameters( 'datastore my_ds wordlist my_wl section group my_sg memory 500M' )
                  parallel 8
                  /
                  
                  exec ctx_output.end_log
                  The "start_log" part will also generate a log file in $ORACLE_HOME/ctx/log/mu_index.log telling you how far it has got.
                  • 6. Re: Suggestion for Using Oracle Text Index
                    979937
                    I was unable to create index on the existing table, but I created a duplicate table and I could create the index on it. I still dont know what the problem is.

                    And the other thing is that, search using context index doesn't give better performance than a normal search with b-tree index on city.