13 Replies Latest reply: Feb 29, 2012 10:14 AM by Sergiusz Wolicki-Oracle

    Multiple Language Support

    449681
      Hi,

      Currently we support only English, but we have been given a requirement to support English, German, French, Spanish, Italian, and Japanese.

      Please share what will be needed to make this change, from a database perspective only.

      The current database settings are as follows.
      =======================

      NLS_LANGUAGE AMERICAN
      NLS_TERRITORY AMERICA
      NLS_CURRENCY $
      NLS_ISO_CURRENCY AMERICA
      NLS_NUMERIC_CHARACTERS .,
      NLS_CHARACTERSET AL32UTF8
      NLS_CALENDAR GREGORIAN
      NLS_DATE_FORMAT DD-MON-RR
      NLS_DATE_LANGUAGE AMERICAN
      NLS_SORT BINARY
      NLS_TIME_FORMAT HH.MI.SSXFF AM
      NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
      NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
      NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
      NLS_DUAL_CURRENCY $
      NLS_COMP BINARY
      NLS_LENGTH_SEMANTICS BYTE
      NLS_NCHAR_CONV_EXCP FALSE
      NLS_NCHAR_CHARACTERSET AL16UTF16
      NLS_RDBMS_VERSION 11.2.0.1.0

      Thanks,
      Ankit.
        • 1. Re: Multiple Language Support
          Sergiusz Wolicki-Oracle
          Nothing. The database character set is AL32UTF8 and this is enough. All other considerations are application/schema design issues, not DBA issues.

          One more requirement may exist if applications want ORA-xxxxx error messages in translated form. In that case, you need to make sure that the relevant translations are installed in your database (and client) Oracle Homes. But note that I do not recommend having database messages in translated form. Database error messages should never be shown to end users. They should be intercepted by applications and reported in some user-friendly manner. The message itself should only go to a log file, and standardizing on English there is better from a support perspective (when global, English-speaking support personnel at the customer site or at Oracle need to take care of the issue).
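
          For illustration, a minimal sketch (assuming the translated message files are installed in the Oracle Home; the table name is hypothetical):

          -- Minimal sketch: the session's NLS_LANGUAGE selects the message translation
          ALTER SESSION SET NLS_LANGUAGE = 'GERMAN';
          SELECT * FROM no_such_table;   -- hypothetical missing table
          -- ORA-00942 is now reported with German message text;
          -- with NLS_LANGUAGE = 'AMERICAN' the same error comes back in English.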


          -- Sergiusz
          • 2. Re: Multiple Language Support
            Srini Chavali-Oracle
            Please also review these MOS Docs:

            AL32UTF8 / UTF8 (Unicode) Database Character Set Implications          (Doc ID 788156.1)
            What languages are supported in an Unicode (UTF8/AL32UTF8) database?          (Doc ID 1051824.6)
            Master Note - RDBMS NLS / Globalization (character set, dates, timestamps, etc...)          (Doc ID 1259364.1)

            HTH
            Srini
            • 3. Re: Multiple Language Support
              orafad
              And, of course, the relevant docs, starting with chapter 6 and an Overview of Unicode:

              http://download.oracle.com/docs/cd/E11882_01/server.112/e10729/ch6unicode.htm
              • 4. Re: Multiple Language Support
                834077
                Thanks everyone for the reply.

                I was told that NLS_LENGTH_SEMANTICS should be changed from BYTE to CHAR to support Latin languages. It might be an application requirement, but it needs to be done at the database level. Is that true?

                Thanks,
                Ankit.
                • 5. Re: Multiple Language Support
                  Sergiusz Wolicki-Oracle
                  Do not change NLS_LENGTH_SEMANTICS to CHAR at INIT.ORA level. This is a very bad idea. Applications should not require it. They should specify the length semantics in CREATE TABLE statements or in an ALTER SESSION statement.
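
                  For example, a minimal sketch (the table and columns are hypothetical):

                  -- Hypothetical table: state the semantics explicitly per column
                  CREATE TABLE customers (
                    name         VARCHAR2(100 CHAR),   -- up to 100 characters
                    country_code VARCHAR2(2 BYTE)      -- fixed single-byte codes
                  );

                  -- Or set the default only for the DDL that follows in this session:
                  ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR;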


                  -- Sergiusz
                  • 6. Re: Multiple Language Support
                    JustinCave
                    S. Wolicki, Oracle wrote:
                    Do not change NLS_LENGTH_SEMANTICS to CHAR at INIT.ORA level. This is a very bad idea. Applications should not require it. They should specify the length semantics in CREATE TABLE statements or in an ALTER SESSION statement.
                    Sergiusz,

                    Why do you say that it is a bad idea to set NLS_LENGTH_SEMANTICS to CHAR in the parameter file?

                    From a data modeling perspective, if I know that I am always going to want character semantics (because I have never come across a business entity that was described with a length in bytes), why wouldn't I change the default semantics to CHAR? Otherwise, I always have to remember to specify character length semantics, which means that I inevitably miss a handful of places. And any time a developer uses a GUI or a script that isn't sensitive to the CHAR/BYTE semantics to reverse engineer some DDL, they have to make sure that they re-apply any CHAR qualifiers that have been lost.

                    The first thing I look at when I start working on a system that is going to require a variable length character set is what the NLS_LENGTH_SEMANTICS is set to. And the first request I make is almost always to change that to CHAR. I've never come across a situation where I regretted that so I'm very curious what I'm missing.

                    Justin
                    • 7. Re: Multiple Language Support
                      Sergiusz Wolicki-Oracle
                      Justin,

                      The issue is compatibility. Most application schema installation scripts, including Oracle's own scripts, do not take care of setting the semantics as they need/expect it. At the same time, most applications still expect byte semantics. Therefore, if you run such a "legacy" script in a multibyte database, you may create columns that can store more bytes than some statically allocated application buffers can hold, possibly introducing issues such as unexpected "value too long" errors, silent truncations or even buffer overflows. You do not usually encounter problems, because length semantics is never an issue in single-byte databases, and even in multibyte databases the issues show up only if larger multibyte data is entered outside of the main application.
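
                      To illustrate the difference, a minimal sketch in an AL32UTF8 database (the tables are hypothetical):

                      CREATE TABLE t_byte (c VARCHAR2(5 BYTE));
                      CREATE TABLE t_char (c VARCHAR2(5 CHAR));

                      INSERT INTO t_char VALUES ('äöüäö');  -- 5 characters, 10 bytes: fits
                      INSERT INTO t_byte VALUES ('äöüäö');  -- ORA-12899: value too large for column (actual: 10, maximum: 5)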

                      The reverse engineering argument is not convincing because length semantics is always mixed in a database that has at least one character-semantics column (for example, because most data dictionary tables, as well as Oracle and third-party application schemas, still use byte semantics). Therefore, whatever the default semantics is, some reverse-engineered DDL may still be wrong.

                      My strong recommendation is to leave the initialization parameter as BYTE, to play it safe with respect to "legacy" software, and to use explicit length qualifiers in CREATE TABLE statements. Explicit qualifiers properly document the expected semantics.


                      -- Sergiusz
                      • 8. Re: Multiple Language Support
                        JustinCave
                        Sergiusz,

                        OK. So your comment is in reference to packaged applications rather than custom applications, right?

                        If we're talking about a packaged application, then I'm in agreement that you'd need to consult with the application vendor to see how the vendor recommends configuring the environment to support multilingual data.

                        If we're talking about a custom-developed application, on the other hand, that was developed to support English and is now being extended to support additional languages, I find it easier to convert the entire application schema to CHAR length semantics and to set the database NLS_LENGTH_SEMANTICS to CHAR. Front-end applications are likely going to need attention anyway because they inevitably have a few lines that assume that 1 character = 1 byte, or don't support a Unicode character set internally. But I'd still rather deal with those buffer length issues (which have to be tested anyway) than try to get business buy-off on field length limitations expressed in bytes.

                        I could see that, if the custom application was being minimally retrofitted to handle Unicode -- i.e. there was only one table that needed to store non-English data -- it would make sense to leave NLS_LENGTH_SEMANTICS at BYTE and make sure that developers understand the CHAR/BYTE semantics distinction for that one table. But I'd much rather be consistent about it, at least within the application schema. Most apps aren't doing a lot of data dictionary queries, and that data is going to be English anyway, so the fact that the application has to handle mixed semantics is relatively immaterial.

                        Justin
                        • 9. Re: Multiple Language Support
                          Sergiusz Wolicki-Oracle
                          Justin,

                          My comment is in reference to a database. You do not necessarily have a guarantee about what applications, ad-hoc tools, etc., may be installed in the database -- if not today, then maybe in half a year.

                          As far as custom applications that you want to internationalize are concerned -- independently of whether you want to use byte semantics or character semantics (this is not really that relevant) -- a quick conversion to character semantics can be achieved by adding an ALTER SESSION statement at the beginning of each of the scripts (or in the associated installer code, if the scripts happen to be in a custom format and hidden among other installation files). This has the benefit of associating the expected semantics with the scripts (effectively, the DDL statements) and not with some database configuration where they happen to be executed (new, old, test, cloned, just consolidated, whatever...).
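
                          For example, a minimal sketch of such a script header (the table is hypothetical):

                          -- Hypothetical installation script: the semantics travel with the script
                          ALTER SESSION SET NLS_LENGTH_SEMANTICS = CHAR;

                          CREATE TABLE app_messages (
                            msg_code VARCHAR2(30),     -- 30 characters, regardless of the instance default
                            msg_text VARCHAR2(2000)
                          );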


                          -- Sergiusz
                          • 10. Re: Multiple Language Support
                            user2907418
                            Sergiusz,

                            Sorry, this is an old thread, but...

                            I appreciate your thoughts on this, and I understand you had Oracle change the wording in the docs to advise users to stay with byte length semantics.

                            However, many people, including me, are moving (via export/import to a new database) from a single-byte to a multibyte character set and following the Oracle docs to do this. They suggest you import with ROWS=N (metadata only), change all byte-semantics character columns to character-length semantics, then import the data.

                            What you then have is a new database instance with all your application data using character-length semantics.
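
                            For instance, a hypothetical sketch of that conversion step, run in the application schema (not as SYS):

                            -- Generate ALTER statements to switch the imported
                            -- byte-semantics columns to character semantics
                            SELECT 'ALTER TABLE ' || table_name ||
                                   ' MODIFY (' || column_name || ' ' ||
                                   data_type || '(' || char_length || ' CHAR));'
                            FROM   user_tab_columns
                            WHERE  data_type IN ('VARCHAR2', 'CHAR')
                            AND    char_used = 'B';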

                            When developers start working on that database again, if you don't set the database global semantics param to char, there WILL be times they forget to do so and you'll end up with a mix of byte and char and some very confused developers.

                            Surely in this case it's best to set the global parameter to character-length semantics and change the session semantics to byte for legacy code?

                            Thanks.

                            • 11. Re: Multiple Language Support
                              Sergiusz Wolicki-Oracle
                              ## When developers start working on that database again, if you don't set the database global semantics param to char, there WILL be times
                              ## they forget to do so and you'll end up with a mix of byte and char and some very confused developers.

                              When developers create new tables, they should get into the habit of defining the semantics directly in the CREATE TABLE statements in any new scripts they prepare. Semantics can usually be corrected easily with ALTER TABLE MODIFY, so even if developers happen to forget to set the semantics correctly, they can quickly repair the schema.
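
                              For example (a hypothetical column created with the wrong semantics):

                              -- Re-declare the column with character semantics
                              ALTER TABLE customers MODIFY (name VARCHAR2(100 CHAR));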

                              Anyway, the parameter exists and it is supported, so as long as you know what you are doing and you remember to change the parameter appropriately before installing byte-semantics or character-semantics schemas (including Oracle installation scripts), you can set the parameter in init.ora. I still consider it a bad idea.


                              -- Sergiusz
                              • 12. Re: Multiple Language Support
                                user2907418
                                Good points. I still think it makes sense to set the default to the option you want and would choose 99% of the time, which is usually CHAR for a UTF8 database -- provided you know, as you say, that some old application schemas don't know about character-length semantics and would need the session set to BYTE before use.

                                However, it would be a concern if there are up-to-date Oracle scripts, supported for use on 11g, that don't work with character-length semantics. If that were the case, you could argue that Oracle doesn't support the global character-length parameter. That would be a compelling reason not to use character-length semantics, and it is the key to the decision for me personally.

                                Thanks.

                                • 13. Re: Multiple Language Support
                                  Sergiusz Wolicki-Oracle
                                  I am pretty sure that there are quite a few Oracle scripts for use in 11g that will create incorrect tables if run with NLS_LENGTH_SEMANTICS=CHAR in init.ora. Most data dictionary scripts are fine because the SYS owner does not obey NLS_LENGTH_SEMANTICS at all, while some other scripts were modified to include an explicit ALTER SESSION SET NLS_LENGTH_SEMANTICS=BYTE. But there are still many scripts that were not updated. Each such script you find means a bug that you can file, but this does not change the recommendation.
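
                                  As a sanity check after running such scripts, a query along these lines (the schema name is hypothetical) shows which semantics each column actually got:

                                  -- CHAR_USED: 'B' = byte semantics, 'C' = character semantics
                                  SELECT owner, table_name, column_name, char_used
                                  FROM   dba_tab_columns
                                  WHERE  owner = 'APP_SCHEMA'          -- hypothetical schema name
                                  AND    data_type IN ('VARCHAR2', 'CHAR');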


                                  -- Sergiusz