11 Replies Latest reply: Dec 27, 2003 11:16 PM by 807582 RSS

    Seachring word from text file

    807582
      Hi...There..

      I h'va wrirtten Search application which search words from Simple text files.
      My file contains list of words separated by "\n"(new line).
      i am using java.io.BufferedReader for reading file.
      i'want to search word from file within few milliseconds, but when my file containo more then some 2lake words(200000) my process of readind comsumes more then 5 sec. time to search.

      pl. suggest me effective method to search word from file so i can make it rapid search.

      Actually i 've to provide search on "TEXT VALUE CHANGED EVENT" so even if my process takes more then one seconds it is not physible for me.

      Thanks in Advance.
      Timir Patel.
        • 1. Re: Seachring word from text file
          807582
          i'want to search word from file within few
          milliseconds, but when my file containo more then some
          2lake words(200000) my process of readind comsumes
          more then 5 sec. time to search.

          pl. suggest me effective method to search word from
          file so i can make it rapid search.
          Stick your words into a HashSet. Here is an example:
              BufferedReader in = new BufferedReader(new FileReader(filename));
              Set data = new HashSet();
              String line = in.readLine();
              while (line != null) {
                  data.add(line);
                  line = in.readLine();
              }
          If you want to see if a word is in the set then use something like this:
              data.contains(word);
          Hope that helps.
          • 2. Re: Seachring word from text file
            807582
            Thanks for Quick Replay.

            Again time factor remain same with above code.

            Even i need to Find byte position of word within file for other formation purpose.
            Pl. suggest me relavent way.
            Thanks again.

            Timir Patel
            • 3. Re: Seachring word from text file
              807582
              Again time factor remain same with above code.
              Hopefully I'm just misunderstanding your statement,
              because there is no way a HashSet lookup (even with
              200000 strings) is going to take 5 seconds.
              Even i need to Find byte position of word within file
              for other formation purpose.
              Pl. suggest me relavent way.
              It would help if I knew exactly what you had to do.
              Otherwise any advice I give could be wrong given
              the parts of the problem you haven't mentioned yet.
              • 4. Re: Seachring word from text file
                807582
                Actually i 've to provide search on "TEXT VALUE CHANGED EVENT" so even if my process takes more then one seconds it is not physible for me.
                If the AWT-EventQueue thread is executing your search, your application will be unresponsize. Spawn a new thread to do the search.

                Show some code, too, the bottleneck could be in a poorly implemented search algorithm.
                • 5. Re: Seachring word from text file
                  807582
                  Thanks Again...to getting continue replay....

                  My Picture is as below...

                  Actually I am developing high scale Search engine which will lookup words from wordlist file which i've maintioned here.
                  ------------------------------------------------------------------
                  my requirement is as below..
                  --------------------------------------------------------------------
                  - It is a Project to convert existing Search application written in Microsoft VF
                  - it is a search engine which will display word count as quich as key strokes finished on textFiled of search.
                  - Any how my search is not suppose to be relay on any database or other java collection.
                  - Same wordlist files are presently used by existing application which is doing the same search task very fast.
                  - Simply i want to count OR Search words from wordlist file which contains more then 2,00,,000 words as text separated by new Line("\n").
                  ----------------------------------------------------------------------------------------
                  What i am doing in present is summerized here as below.......
                  --------------------------------------------------------------------------------------------
                  - I am using simple text file as a wordlist to make search
                  - my wordlist file will contain more then 4,00,000 words as in present i am testing with 2,00,000 words.
                  - Every words are separated by "\n" Character.
                  - I am using BufferedReader to read file by every line using readLine() Function.
                  - but when my file contains more then 2,00,000 words...my process of file reading takes more time then i 'm expecting.

                  • 6. Re: Seachring word from text file
                    807582
                    If you need it to be done very fast I would suggest implementing binary search. At first you should either load your word list to memory (I don't recommend - 200'000 words ~ 5 Megs RAM, but if you need it extremally fast do it). Then you need to sort this word list. Sorted lists can be searched very fast, even faster than Hashtable does (Hashtable is best for unsortable data), using binary search algorithm. It should find your word using ln(N)/ln(2) where N is number of word comparations (200'000-> 18 comparations). The Collections class from java.util contains generic algorithm for doing that.

                    Since the number of comparations is relatively small you may save your memory by creating sorted index file - this file should contain sorted by word pointers to words in your vocabulary file. You may then load this faile and use it to provide data to your search algorithm. It should save you about 90% of memory (8bytes per word instead of word dependent size (not less than 24 bytes) if you use long[ ] array to load index file).

                    Ofcourse you will never outrun good C application but you should be close to that, especially if you implement your binary search by yourself (each non-final, non-static or non-private method invocation is at last 4 times slower in theory than static, final or private)
                    • 7. Re: Seachring word from text file
                      807582
                      Yes...it is a really Great Solution for Search...
                      we can definatly Go Through....but Sill One more thing happen here when to doing same ...
                      I am Doing as follow
                      <code>
                      BufferedReader lineReader = new BufferedReader(new FileReader(myFileName));
                      String fileData="";
                      String nextLine="";
                      while((nextLine=lineReader.readLine() != null)
                      {
                      finalData+=nextLine;
                      }
                      </code>
                      After Finishind some Number's Of Word it is Showing me java.lang.OutOfMemoryException
                      i solved it by increasing value of Xms and Xmx but that is not a scalable option beacause i've data in different 27 files(a-z and other) .

                      but' i haven't tried out same with other Collection Object....so i will try for that as above.....

                      is there any other option to load all lines in single OR multiple Collection Object...???

                      • 8. Re: Seachring word from text file
                        807582
                        Try this:
                        import java.io.*;
                        import java.util.*;
                        
                        public class searcher
                        {
                                  private static long [] indexes;
                        
                             private static class temp_data
                             {
                                  public final String text;
                                  public final long starts_at;
                        
                                  public temp_data(String t, long l)
                                  {
                                       text = t;
                                       starts_at = l;
                                  };
                             };
                             private static class temp_cmp implements Comparator
                             {
                                  public int compare(Object o1,Object o2)
                                  {
                                       return ((temp_data)o1).text.compareTo(
                                                 ((temp_data)o2).text);
                                  };
                             };
                             /** creats index table. You should do it once, and rather store index
                             table in file then. This method has high peak memory usage but it is
                               easy to optimize it.*/
                             private static void buildIndex(RandomAccessFile file)throws Exception
                             {
                                  List temp = new LinkedList();
                                  String st;
                                  long p = file.getFilePointer();
                                  while((st = file.readLine())!=null)
                                  {
                                       temp.add(
                                            new temp_data(st,p)
                                            );
                                       p = file.getFilePointer();
                                  };
                                  Collections.sort(temp,new temp_cmp());
                                  indexes = new long[temp.size()];
                                  int i=0;
                                  for(Iterator I=temp.iterator();I.hasNext();i++)
                                  {
                                  temp_data tt = ((temp_data)I.next());
                                   System.out.println("indexing :"+tt.text+" ["+tt.starts_at+"]");
                                       indexes=tt.starts_at;
                                  };
                             };
                             /** returns position at which text starts or -1 if not found */
                             public static long find(String text,RandomAccessFile file)throws Exception
                             {
                                  int ncp = indexes.length/2;
                                  int n = 2;
                                  int cp;
                                  do{
                                  cp = ncp;
                                       file.seek(indexes[cp]);
                                       String tt = file.readLine();
                                  System.out.println("comparing with "+tt);
                                       int cmpr = text.compareTo(tt);
                                       if (cmpr==0)
                                            return indexes[cp];
                                       else
                                       if (cmpr>0)
                                            ncp = cp+(indexes.length / (1<<n));
                                       else
                                            ncp = cp-(indexes.length / (1<<n));
                                       n++;
                                  }while(ncp!=cp);
                                  return -1;
                             };

                             public static void main(String args [] )throws Exception
                             {
                                  RandomAccessFile f = new RandomAccessFile(args[0],"r");
                                  buildIndex(f);
                                  for(int i=1;i<args.length;i++)
                                  {
                                  System.out.println("searching for \""+args[i]+"\"");
                                  System.out.println("found at:"+find(args[i],f));
                                  }
                                  f.close();
                             };
                        };

                        It should work, however I gave it less than five minutes testing.
                        • 9. Re: Seachring word from text file
                          807582
                          Hey...Thanks....it's...working Excellance....

                          Great performance....Thaks a lot again....

                          Now ...pl. tell me only one thing....
                          User will search as follow...

                          1. Start Application
                          2.Load Search Form
                          3.Type Word in TextField .....and Search Engine Parally Display Result....

                          I've done as above with my application....but since Search Process is working on your Idea
                          Also Recommand Sequence Diagram....for making application mantioned as above....

                          Thanks....
                          Timir Patel.

                          • 10. Re: Seachring word from text file
                            807582
                            Your schema should work.

                            However, since the dictionary will propably grow continously and even this algorithm will reach its response time limits I would try to move searching to separate thread. This thread will receive request to find word and will start to search it. However, during search it will monitor if new search request arrived. If it did, it would restart search process. At the end of search it should send SwingUtilities.invokeLater with request to show result to user.

                            In this approach user will never see a slow down while typing - only moment when he/she will see a delay related to search process will be the time from the moment he/she stopped typing to the moment when search result appears on screen. This way it will work smooth on my old 486DX2-66 too.

                            You may also consider creating index for vocabulary in same search thread - the GUI should appear to user to be operable faster then.

                            P.S. Change LinkedList to ArrayList(20000) in my example - the process of building index should be much faster then.
                            • 11. Re: Seachring word from text file
                              807582
                              Heyyy...Thank you very much for continious Support.....
                              All things finished up at my level....and result are also.....very fast...

                              I got great support from you....
                              i will try my levelbest to help you.....later as required....
                              pl.....mail...me if required....on timirpatel@rediffmail.com

                              Thankig you,
                              Timir.