I am creating an index file using the Lucene search engine. I am adding Russian characters to the index file, but it is inserting special characters. I am unable to figure out the problem...
From Lucene in Action, 4.8.1 Unicode and encodings:
Internally, Lucene stores all characters in the standard UTF-8 encoding.
You, however, are responsible for getting external text into Java and Lucene.
If you're indexing files on a file system, you need to know what encoding the
files were saved as in order to read them properly.
Can you explain in brief how to do that...
No.
How do I know what encoding the files are saved in.. I am confused.. Please help me out.. For your earlier reply
If you let Lucene write an index to disk, it's always encoded in UTF-8.
If you're reading files from disk, you have to know up front (or discover) what encoding they're in in order to read them properly.
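To make that concrete, here is a minimal sketch in plain Java (no Lucene involved) of reading a file with an explicit charset. The class name `ReadWithEncoding`, the helper `readAll` and the temporary sample file are made up for illustration, and the sample is assumed to be saved as UTF-8. Decoding the same bytes with the wrong charset is what typically produces the "special characters" described above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWithEncoding {

    /** Reads the whole file, decoding its bytes with the given charset. */
    public static String readAll(Path file, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // "Privet" in Cyrillic, written with Unicode escapes so the source
        // compiles the same way regardless of the compiler's source encoding.
        String russian = "\u041F\u0440\u0438\u0432\u0435\u0442";

        // Hypothetical sample file; we write it as UTF-8 ourselves, so we
        // know for certain which encoding it was saved in.
        Path file = Files.createTempFile("russian", ".txt");
        Files.write(file, russian.getBytes(StandardCharsets.UTF_8));

        // Correct charset: the text survives the round trip.
        String right = readAll(file, StandardCharsets.UTF_8);
        // Wrong charset: every byte becomes its own Latin-1 character.
        String wrong = readAll(file, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(russian)); // true
        System.out.println(wrong.equals(russian)); // false
        Files.delete(file);
    }
}
```

The string returned by `readAll` with the right charset is what you would then hand to Lucene. If your files were saved in some other encoding, pass that charset instead (for example via `Charset.forName(...)`, provided your JVM supports it).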
I suggest reading a bit about Unicode and encoding schemes, and maybe getting hold of a decent Lucene book.
Jakarta-lucene Wiki (Lucene FAQs, tutorials, etc.):
Lucene's mailing lists (which are pretty active, btw):
It's working now... When I read the index through Java code and displayed the results, they were Russian characters... So indexing and searching are working.. Thanks.
Do you know about SecondString20030401.jar? What it does is compare two strings and, based on the algorithm you choose, calculate a score.. Now the problem is whether SecondString supports Russian characters or not.. Thanks for replying earlier, dude.
Good to hear you've got things working.
Do you know about SecondString20030401.jar? What it does is compare two strings and, based on the algorithm you choose, it will calculate...
If you mean this project: http://secondstring.sourceforge.net/ then no; I've never used it. But browsing through the API I see a lot of classic string algorithms (Monge-Elkan, Levenshtein, Smith-Waterman, etc.), so it's not just one algorithm.
Now the problem is whether SecondString supports Russian characters or not..
Can't help you there either. You'll have to ask the authors, or just try it.
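As a rough idea of what "just try it" could look like, here is a hand-written Levenshtein distance in plain Java run on two Russian words. This is not SecondString's code or API; the class name `CyrillicEditDistance` is invented for the sketch. It only shows that a char-based algorithm over Java strings treats Cyrillic letters like any other characters once the text has been decoded correctly. Whether SecondString itself behaves the same way (for example in its tokenizing or normalizing steps) would still need to be checked against the library.

```java
public class CyrillicEditDistance {

    /** Classic dynamic-programming Levenshtein distance over Java chars. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or match
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Two Russian words ("moloko" and "moloka") that differ only in the
        // last letter, written as Unicode escapes to avoid source-encoding issues.
        String moloko = "\u043C\u043E\u043B\u043E\u043A\u043E";
        String moloka = "\u043C\u043E\u043B\u043E\u043A\u0430";

        System.out.println(levenshtein(moloko, moloka)); // 1
        System.out.println(levenshtein(moloko, moloko)); // 0
    }
}
```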
Thanks for replying earlier
You're welcome... dude.