While implementing sorting in Endeca search, we came across a scenario where the sort order has to handle diacritic characters. Taking Polish as an example, some product names start with *"A"* and others with *"Ą"*. When we sort by product name (A-Z), the records starting with *"Ą"* are returned at the very end, whereas the expected behavior is that they come directly after the names starting with *"A"*. Please let me know if anyone has come across similar behavior and how they fixed it.
The Endeca version being used is 6.1.3, and the language is Polish.
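To make the symptom concrete, here is a minimal Python sketch. It assumes the 6.1.3 sort behaves like a plain Unicode code-point comparison. "Ą" is U+0104 while "Z" is U+005A, so under that assumption every name starting with "Ą" sorts after every name starting with a basic Latin letter. The product names and the `polish_key` helper are hypothetical, and `polish_key` is only a crude stand-in for ICU collation: it ranks letters by their position in the Polish alphabet and ignores case.

```python
# Sketch: why "Ą" lands after "Z" under code-point ordering, and what a
# (much simplified) Polish alphabetical order looks like instead.
# Assumption: the pre-6.2 sort behaves like a plain code-point sort.
# polish_key is a hypothetical helper, NOT ICU collation.

# Polish alphabet order (q, v, x included for loanwords).
POLISH_ALPHABET = "aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźż"
RANK = {ch: i for i, ch in enumerate(POLISH_ALPHABET)}


def polish_key(word: str) -> list[tuple[int, str]]:
    """Rank each character by its Polish alphabet position (case-insensitive).

    Characters not in the table (digits, punctuation, other scripts) rank
    after every letter and are then compared by code point.
    """
    fallback = len(POLISH_ALPHABET)
    return [(RANK.get(ch, fallback), ch) for ch in word.lower()]


# Hypothetical product names.
products = ["Zegar", "Bateria", "Ąkant", "Agrafka", "Ćma"]

# Code-point order: "Ą" (U+0104) and "Ć" (U+0106) come after "Z" (U+005A).
print(sorted(products))
# ['Agrafka', 'Bateria', 'Zegar', 'Ąkant', 'Ćma']

# Alphabet-aware order: "Ą" follows "A", "Ć" follows "C".
print(sorted(products, key=polish_key))
# ['Agrafka', 'Ąkant', 'Bateria', 'Ćma', 'Zegar']
```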
You need to add `--lang pl-u-co-standard` to the dgidx and dgraph components (in ./config/script/AppConfig.xml). By default, Endeca sorts using endeca collation, which "sorts text with lower case before upper case and does not account for character accents and punctuation." Standard collation sorts data "according to the International Components for Unicode (ICU) standard for the language you specify". See http://docs.oracle.com/cd/E35641_01/MDEX.621/pdf/AdvDevGuide.pdf, chapter "Using Internationalized Data", for further details.
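The sketch below shows the shape of that change, applied with Python's `xml.etree.ElementTree` to a trimmed, hypothetical AppConfig.xml fragment. It assumes the Deployment Template layout, where `<dgidx>` and `<dgraph-defaults>` each contain an `<args>` element holding one `<arg>` per command-line token; the namespace URI in `SAMPLE` is also an assumption. Tags are matched by local name so the code works with or without a namespace, and the helper skips any component that already has `--lang`.

```python
# Sketch: append "--lang pl-u-co-standard" to the Dgidx and Dgraph arguments.
# Assumption: <dgidx> and <dgraph-defaults> each hold an <args> element with
# one <arg> per token. SAMPLE is a trimmed, hypothetical fragment and its
# namespace URI is assumed, not copied from a real AppConfig.xml.
import xml.etree.ElementTree as ET

LANG_ARGS = ["--lang", "pl-u-co-standard"]
TARGETS = {"dgidx", "dgraph-defaults"}

SAMPLE = """<app xmlns="http://www.endeca.com/schema/eacToolkit">
  <dgidx id="Dgidx">
    <args>
      <arg>-v</arg>
    </args>
  </dgidx>
  <dgraph-defaults>
    <args>
      <arg>--threads</arg>
      <arg>2</arg>
    </args>
  </dgraph-defaults>
</app>"""


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so tags match with or without a namespace."""
    return tag.rsplit("}", 1)[-1]


def add_lang_args(root: ET.Element) -> None:
    """Append LANG_ARGS to each targeted component's <args>, at most once."""
    # Collect targets first so we don't modify the tree while iterating it.
    components = [el for el in root.iter() if local_name(el.tag) in TARGETS]
    for component in components:
        for args in component:
            if local_name(args.tag) != "args":
                continue
            if "--lang" in [arg.text for arg in args]:
                continue  # already configured; don't add a second --lang
            # Reuse the <args> element's namespace for the new <arg> elements.
            arg_tag = args.tag[: -len("args")] + "arg"
            for token in LANG_ARGS:
                ET.SubElement(args, arg_tag).text = token


root = ET.fromstring(SAMPLE)
add_lang_args(root)

ns = {"eac": "http://www.endeca.com/schema/eacToolkit"}
for name in ("dgidx", "dgraph-defaults"):
    args = root.find(f"eac:{name}/eac:args", ns)
    print(name, [arg.text for arg in args])
# dgidx ['-v', '--lang', 'pl-u-co-standard']
# dgraph-defaults ['--threads', '2', '--lang', 'pl-u-co-standard']
```

For a real file, adding the two `<arg>` lines by hand is probably simpler: ElementTree's default parser discards XML comments, and writing the tree back replaces unregistered namespace prefixes with generated ones such as `ns0`.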
If by 6.1.3 you mean MDEX 6.1.3 (as opposed to Platform Services), I'm not sure this sorting option was available then; you would need to check the chapter listed above in the MDEX 6.1.3 documentation.
Thanks a lot for providing the information. The changes work fine with version 6.2.2.
The MDEX Advanced Development Guide for 6.1.3 is not available online. I made the same changes in the AppConfig.xml file for 6.1.3 too, but they have no effect.
It looks like this standard collation feature might have been introduced in version 6.2.0.
I can confirm that locale-specific sorting was introduced in 6.2.0; previous releases only supported Unicode sorting. If you upgraded to 6.2, it's worth going all the way to 6.4: the upgrade steps are the same, and 6.4 has a number of linguistic enhancements for language support in the MDEX.