6 Replies Latest reply: Jan 2, 2007 10:43 PM by 807599

    Lucene Support to Russian!

    807599
      I am creating an index with the Lucene search engine. I am adding Russian characters to the index, but special characters are being stored instead. I am unable to figure out the problem. I am using the Russian analyzer as specified in the Lucene API, with lucene-1.4.3.jar.
        • 1. Re: Lucene Support to Russian!
          800282
          I am creating an index with the Lucene search engine. I
          am adding Russian characters to the index, but special
          characters are being stored instead. I am unable to figure
          out the problem...
          From Lucene In Action, 4.8.1 Unicode and encodings:
          Internally, Lucene stores all characters in the standard UTF-8 encoding.
          ...
          You, however, are responsible for getting external text into Java and Lucene.
          If you're indexing files on a file system, you need to know what encoding the
          files were saved as in order to read them properly.
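A minimal sketch of what "getting external text into Java" can look like, assuming the file's encoding is already known. The class name `ReadWithEncoding` and the command-line arguments are made up for illustration; `windows-1251` is only one of several encodings commonly used for Russian text. The point is that the charset is named explicitly instead of relying on the platform default:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper, not part of Lucene.
class ReadWithEncoding {

    // Reads a whole file, decoding its bytes with the charset named by the caller.
    static String readFile(File file, String charsetName) throws IOException {
        StringBuilder text = new StringBuilder();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charsetName));
        try {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } finally {
            reader.close();
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: java ReadWithEncoding russian.txt windows-1251
        String text = readFile(new File(args[0]), args[1]);
        System.out.println(text.length() + " characters read");
    }
}
```

Once the text has been decoded into a Java String this way, it is ordinary Unicode, and that String is what should be handed to the analyzer and the index.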
          • 2. Re: Lucene Support to Russian!
            807599
            Can you explain briefly how to do that? How do I find out what encoding the files are saved in? I am confused. Please help me out. Thanks for your earlier reply.
            • 3. Re: Lucene Support to Russian!
              800282
              Can you explain briefly how to do that?
              No.

              How do I find out what encoding the files are saved in?
              I am confused. Please help me out. Thanks for your
              earlier reply.
              If you let Lucene write an index to disk, it's always encoded in UTF-8.
              If you're reading files from disk, you have to know up front (or discover) what encoding they're in in order to read them properly.
              I suggest reading a bit about Unicode and encoding schemes, and maybe getting hold of a decent Lucene book.
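
One rough way to "discover" an encoding, sketched under assumptions: decode the raw bytes strictly and see whether the decoder reports an error. The class and method names below are invented for this example. The check is only a heuristic. It can reliably say whether bytes are valid UTF-8, but single-byte encodings such as windows-1251 or KOI8-R accept almost any byte sequence, so a clean decode there proves very little.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, not part of Lucene.
class EncodingCheck {

    // Returns true if the bytes are a valid sequence in the named charset.
    static boolean decodesCleanly(byte[] bytes, String charsetName) {
        // REPORT makes the decoder throw instead of silently
        // substituting replacement characters for bad input.
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

A plausible use is to test for UTF-8 first and fall back to whichever legacy encoding the files are known to come from only when that test fails.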

              Useful links:

              Jakarta-lucene Wiki (Lucene FAQs, tutorials, etc.):
              http://wiki.apache.org/jakarta-lucene/FrontPage?action=show&redirect=FrontPageEN

              Lucene's mailing lists (which are pretty active, btw):
              http://lucene.apache.org/java/docs/mailinglists.html
              • 4. Re: Lucene Support to Russian!
                807599
                It's working now. When I read the index through Java code and displayed the contents, the result was Russian characters. So indexing and searching are working. Thanks.
                Do you know about SecondString20030401.jar? It compares two strings and calculates a score based on the algorithm you choose. Now the problem is whether SecondString supports Russian characters or not. Thanks for replying earlier, dude.
                • 5. Re: Lucene Support to Russian!
                  800282
                  Good to hear you've got things working.

                  ...
                  Do you know about SecondString20030401.jar? It
                  compares two strings and calculates a score
                  based on the algorithm you choose.
                  If you mean this project: http://secondstring.sourceforge.net/
                  then no; I've never used it. But browsing through the API I see a lot of classic string algorithms (Monge-Elkan, Levenshtein, Smith-Waterman, etc.), so it's not just one algorithm.

                  Now the problem is whether SecondString supports
                  Russian characters or not.
                  Can't help you there either. You'll have to ask the authors, or just try it.
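
As an illustration of what "just try it" could look like, here is a hand-written Levenshtein distance in plain Java. It does not use SecondString and says nothing about how that library is implemented; the class name is invented for this sketch. It only shows that an algorithm comparing Java `char` values treats Cyrillic letters like any other letters, provided the strings were decoded correctly in the first place.

```java
// Hypothetical stand-alone example, unrelated to SecondString's own classes.
class LevenshteinDemo {

    // Classic dynamic-programming edit distance over Java chars,
    // keeping only two rows of the table in memory.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Unicode escapes for the Russian words "kot" and "kit",
        // which differ in one letter.
        String kot = "\u043a\u043e\u0442";
        String kit = "\u043a\u0438\u0442";
        System.out.println(distance(kot, kit)); // prints 1
    }
}
```

Running a similar two-word test against the library's own distance classes would answer the question for SecondString itself.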

                  Thanks for replying earlier
                  dude.
                  You're welcome... dude.
                  • 6. Re: Lucene Support to Russian!
                    807599
                    OK. Can you tell me one thing: is Lucene fully compatible with Russian characters or not?