0 Replies. Latest reply: Feb 19, 2014 7:38 AM by user2037288

    Does anyone have experience supporting transliteration between Latin and other scripts, like Greek, Cyrillic, etc.?




      We are supporting a customer master database (in AL32UTF8) that serves as the 'heart' of other systems that need customer profile/address data, something like Oracle Customer Hub. This application stores master customer data for a wide variety of countries, including the Netherlands, Great Britain and Germany, but also Greece, Russia, the Scandinavian countries, Israel, etc. Especially with the Israeli, Greek and Russian data, we now need to support both ways of writing the names of customers from these countries: Latin versus Cyrillic, Latin versus Modern Greek, and Latin versus Hebrew. In short: transliteration. (The word transliteration comes from Latin transliteratus (trans- "across" + littera "letter"). Transliteration is the method of representing the letters or words of one alphabet in the characters of another alphabet or script.)
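      As a rough illustration of what letter-by-letter transliteration looks like, the Java sketch below maps lowercase Russian Cyrillic to Latin through a lookup table. The class name CyrillicToLatin and the table itself are hypothetical: the mapping is loosely modeled on common English-language conventions and is not an exact implementation of any standard such as GOST 7.79 or BGN/PCGN. A tested library (ICU4J's com.ibm.icu.text.Transliterator, for example, which ships a "Cyrillic-Latin" transform) would likely be a more realistic starting point than a hand-written table.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: simplified, letter-by-letter Russian Cyrillic -> Latin table.
// Not an exact implementation of GOST 7.79, BGN/PCGN or any other standard.
final class CyrillicToLatin {
    private static final String CYRILLIC = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя";
    private static final String[] LATIN = {
        "a", "b", "v", "g", "d", "e", "e", "zh", "z", "i", "y", "k", "l", "m", "n", "o",
        "p", "r", "s", "t", "u", "f", "kh", "ts", "ch", "sh", "shch", "", "y", "", "e", "yu", "ya"
    };
    private static final Map<Character, String> TABLE = new HashMap<>();

    static {
        // Pair each Cyrillic letter with its Latin replacement (same index in both).
        for (int i = 0; i < CYRILLIC.length(); i++) {
            TABLE.put(CYRILLIC.charAt(i), LATIN[i]);
        }
    }

    static String transliterate(String input) {
        StringBuilder out = new StringBuilder();
        for (char c : input.toLowerCase(Locale.ROOT).toCharArray()) {
            // Characters without a table entry (spaces, digits, Latin letters) pass through.
            out.append(TABLE.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(transliterate("Иванов")); // ivanov
        System.out.println(transliterate("Щукин"));  // shchukin
    }
}
```

      Even this "easy" direction loses information: the output is lowercased, ё collapses into e, and the hard and soft signs (ъ, ь) are simply dropped.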


      We know there are transliteration rules to 'translate' between the two scripts (though maybe not for every pair), so in principle we could support a search using a Modern Greek search value and still find the customer even if the name is physically stored in our database in Latin characters (or the other way around). We are looking for ways to support this.
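      One way to read this search requirement is to reduce both the stored value and the search value to a common Latin "search key" and compare keys. The sketch below assumes Greek input and uses a hypothetical class GreekSearchKey with a simplified letter table (not ELOT 743 or ISO 843, which add many context rules), accent stripping via java.text.Normalizer, and a single example context rule for the digraph ου. Latin text passes through unchanged, so a Latin-stored name and its Greek spelling can end up with the same key.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: map Greek and Latin spellings of a name to one Latin search key.
final class GreekSearchKey {
    private static final String GREEK = "αβγδεζηθικλμνξοπρσςτυφχψω";
    private static final String[] LATIN = {
        "a", "v", "g", "d", "e", "z", "i", "th", "i", "k", "l", "m", "n",
        "x", "o", "p", "r", "s", "s", "t", "y", "f", "ch", "ps", "o"
    };
    private static final Map<Character, String> TABLE = new HashMap<>();

    static {
        for (int i = 0; i < GREEK.length(); i++) {
            TABLE.put(GREEK.charAt(i), LATIN[i]);
        }
    }

    /** Lowercase, strip accents, then map Greek letters to Latin; other characters pass through. */
    static String key(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        // NFD splits accented letters (e.g. ό) into base letter + combining mark; drop the marks.
        String base = Normalizer.normalize(lower, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < base.length(); i++) {
            char c = base.charAt(i);
            // Example context rule: the Greek digraph ου is usually romanized as "ou".
            if (c == 'ο' && i + 1 < base.length() && base.charAt(i + 1) == 'υ') {
                out.append("ou");
                i++; // skip the υ as well
                continue;
            }
            out.append(TABLE.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }

    /** Returns the stored names whose search key equals the key of the query. */
    static List<String> search(List<String> storedNames, String query) {
        String q = key(query);
        return storedNames.stream().filter(n -> key(n).equals(q)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> customers = List.of("Papadopoulos", "Georgiou", "Athina");
        System.out.println(search(customers, "Παπαδόπουλος")); // [Papadopoulos]
    }
}
```

      A Greek query such as Γεωργίου yields the key "georgiou", the same as the Latin-stored Georgiou. A real rule set would also need to handle digraphs such as αυ, ευ and μπ, which this sketch transliterates letter by letter.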


      Our questions:

      * Will we even be able to transparently convert between two different scripts? (Do existing, published algorithms convert back and forth well enough? If so, does anyone know of available PL/SQL or Java classes we can try out?)

      * Would you store the data in both scripts, or store it in only one (for example, always Latin) and transliterate on the fly (using function-based indexes if needed for performance)? Or are the algorithms not deterministic enough to derive one script from the other on the fly?

      * Does Oracle Text support this kind of intelligence?

      * Or does this always require a third-party application with all the cross-script intelligence built in, possibly including stored lists such as thesauri?
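      On the determinism question, a small experiment may help. In a simplified model, the Greek-to-Latin direction is a function (each Greek spelling gives exactly one key), while Latin-to-Greek is one-to-many, because Latin i can stand for ι or η and Latin o for ο or ω. The hypothetical LatinToGreekCandidates class below enumerates the Greek spellings a Latin name could come from, using only single-letter rules (no th, ch, ps, ...). If the same holds for the real rule sets, it would argue for storing the name in its original script and deriving a Latin key from it, rather than deriving Greek from stored Latin. In Oracle, a user-defined function used in a function-based index must be declared DETERMINISTIC, which fits the Greek-to-Latin direction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: reverse (Latin -> Greek) transliteration is one-to-many.
final class LatinToGreekCandidates {
    // Simplified reverse table covering single Latin letters only.
    private static final Map<Character, List<String>> REVERSE = Map.ofEntries(
        Map.entry('a', List.of("α")), Map.entry('d', List.of("δ")), Map.entry('e', List.of("ε")),
        Map.entry('g', List.of("γ")), Map.entry('i', List.of("ι", "η")), Map.entry('k', List.of("κ")),
        Map.entry('l', List.of("λ")), Map.entry('m', List.of("μ")), Map.entry('n', List.of("ν")),
        Map.entry('o', List.of("ο", "ω")), Map.entry('p', List.of("π")), Map.entry('r', List.of("ρ")),
        Map.entry('s', List.of("σ")), Map.entry('t', List.of("τ")), Map.entry('v', List.of("β")));

    /** Every Greek spelling (lowercase, unaccented) that this table could map back from. */
    static List<String> candidates(String latin) {
        String word = latin.toLowerCase(Locale.ROOT);
        List<String> results = new ArrayList<>(List.of(""));
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            List<String> options = REVERSE.get(c);
            if (options == null) {
                throw new IllegalArgumentException("unmapped letter: " + c);
            }
            boolean last = i == word.length() - 1;
            List<String> next = new ArrayList<>();
            for (String prefix : results) {
                for (String g : options) {
                    // Greek writes sigma as ς at the end of a word.
                    next.add(prefix + (last && g.equals("σ") ? "ς" : g));
                }
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> c = candidates("Konstantinos");
        System.out.println(c.size());                  // 8
        System.out.println(c.contains("κωνσταντινος")); // true
    }
}
```

      For Konstantinos the sketch yields 8 candidates, only one of which (κωνσταντινος) matches the usual spelling Κωνσταντίνος, and nothing in the Latin form tells you which one it is.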


      I have seen that Oracle Enterprise Data Quality can do things like what we want (http://www.oracle.com/webfolder/technetwork/data-quality/edqhelp/Content/processor_library/transformation/transliterate.htm), but the entire product may be overkill for what we want to start with. First of all, we are wondering whether anyone has ever implemented such functionality, and we are looking for good ideas (including whether a do-it-yourself approach is even a path we should take).


      Any constructive comment is welcome.