10 Replies Latest reply on Apr 2, 2007 12:35 PM by P.Forstmann

    export in one character set and import in different

    430060
      Hi,

      I have one database where NLS_LANG is set to American_America.WE8ISO8859P1. If I export it and import into a different database where NLS_LANG is American_America.UTF8, the import throws errors for some rows.

      Please note that the problem occurs only for rows where the whole length is occupied, e.g. if I have CHAR(10) and all 10 bytes are occupied, then I get this problem. But if at least 1 byte is free, then the import is successful.
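      The failure mode can be reproduced outside the database; a minimal sketch in Python (the string is invented for illustration; the 10-byte width matches the CHAR(10) example):

```python
# A CHAR(10) column under byte semantics stores at most 10 bytes.
s = "Zürich AG!"  # 10 characters, one of them accented

print(len(s))                       # 10 characters
print(len(s.encode("iso-8859-1")))  # 10 bytes in WE8ISO8859P1 - fits exactly
print(len(s.encode("utf-8")))       # 11 bytes in UTF8 - overflows on import
```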

      I can't change schema definition to increase the column length because of application constraints.

      Is there any workaround for this problem?

      Thanks

      Puneet

        • 1. Re: export in one character set and import in different
          358102
          What Oracle versions are you working with? What are the charactersets associated with your databases?
          • 2. Re: export in one character set and import in different
            JustinCave
            If you declare a column VARCHAR2(10), by default Oracle allocates 10 bytes of storage. In the ISO 8859-1 character set, each character requires 1 byte of storage. In the UTF8 character set, however, characters require between 1 and 3 bytes of storage (up to 4 bytes in the AL32UTF8 character set). If you set NLS_LENGTH_SEMANTICS to CHAR for the database, you instruct Oracle that VARCHAR2(10) allocates 10 characters of storage. Alternatively, you can declare your columns as VARCHAR2(10 CHAR).
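            The difference between the two length semantics can be modeled outside Oracle; a hypothetical Python sketch for a UTF8 database (fits() is not an Oracle API, just an illustration):

```python
def fits(value: str, limit: int, semantics: str = "BYTE") -> bool:
    """Model whether a value fits VARCHAR2(limit) under BYTE or CHAR
    length semantics in a UTF8 database (illustrative helper only)."""
    if semantics == "BYTE":
        return len(value.encode("utf-8")) <= limit  # storage bytes
    return len(value) <= limit                      # characters

word = "über-wide!"  # 10 characters, 11 bytes in UTF-8
print(fits(word, 10, "BYTE"))  # False: 11 bytes overflow VARCHAR2(10 BYTE)
print(fits(word, 10, "CHAR"))  # True: 10 characters fit VARCHAR2(10 CHAR)
```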

            Justin
            Distributed Database Consulting, Inc.
            http://www.ddbcinc.com/askDDBC
            • 3. Re: export in one character set and import in different
              8046
              Just to add to this: by using character semantics, i.e. VARCHAR2(10 CHAR), Oracle will automatically expand the column storage for you. In UTF8, this is the same as redefining your column as VARCHAR2(30). Although this means you will no longer get data truncation during import, it will very likely cause problems for your existing application.

              Regards

              Nat
              • 4. Re: export in one character set and import in different
                JustinCave
                Maybe I've been lucky, but I've never seen an application break when changing NLS_LENGTH_SEMANTICS. What sort of problems have you seen this cause?

                Justin
                Distributed Database Consulting, Inc.
                http://www.ddbcinc.com/askDDBC
                • 5. Re: export in one character set and import in different
                  8046
                  As mentioned previously, switching from byte semantics to character semantics is just a way of increasing the storage size of your existing column. Changing a VARCHAR2(10) column from BYTE to CHAR in a UTF8 database is the same as altering the underlying column definition from VARCHAR2(10) to VARCHAR2(30).

                  Since most applications have some built-in assumption on the size of the column, increasing the underlying column definition may cause problems with the existing application logic. Using the example above, here are two potential problems:

                  Truncation during data retrieval - If the buffer size in the application layer is hardcoded, it will need to be increased from 10 bytes to 30 bytes.
                  Character overflow during data insertion - With the expansion to 30 bytes, it is now possible that a string (which is within the byte size boundary) may contain more than 10 characters (say 12 ASCII char), and this will cause constraint failure during insertion.
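                  The second case can be checked mechanically; a hypothetical Python pre-check using the numbers from the example above (10 characters, 30 bytes):

```python
def fits_column(value: str, char_limit: int = 10, byte_limit: int = 30) -> bool:
    """Model a VARCHAR2(10 CHAR) column in a UTF8 database: the value must
    satisfy both the 10-character limit and the 30-byte storage cap."""
    return len(value) <= char_limit and len(value.encode("utf-8")) <= byte_limit

print(fits_column("abcdefghij"))    # True: 10 ASCII characters, 10 bytes
print(fits_column("abcdefghijkl"))  # False: 12 bytes fit in 30, but
                                    # 12 characters exceed the 10-char limit
```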

                  Resolving them requires making application changes. Based on Puneet's earlier statement - "I can't change schema definition to increase the column length because of application constraints" - I am assuming that this is a no-go.

                  Nat
                  • 6. Re: export in one character set and import in different
                    100225
                    I read in Note 260192.1 Changing WE8ISO8859P1 to UTF8 with ALTER DATABASE CHARACTERSET :
                    2) if you have Truncation entries : you can
                    A) update these rows so that they contain less data
                    B) update the table definition so that it can contain longer data.
                    Based on the preceding discussion, solution 2B) is not possible. So the last choice is solution 2A). But is it a real alternative? I don't think so.
                    Is there another solution?
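                    Before deciding on option 2A), the affected rows can at least be counted; a hypothetical Python pre-check over exported values (the column width and sample data are invented):

```python
def would_truncate(values, byte_limit=10):
    """Return values that fit the byte limit in ISO-8859-1 but would
    overflow it after conversion to UTF-8 (hypothetical pre-migration check)."""
    return [v for v in values
            if len(v.encode("iso-8859-1")) <= byte_limit
            and len(v.encode("utf-8")) > byte_limit]

rows = ["plain text", "Müller Gmb", "café corne"]
print(would_truncate(rows))  # the two accented values expand to 11 bytes
```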

                    Hung
                    • 7. Re: export in one character set and import in different
                      rmhardma
                      Nat - I don't agree with your characterization that nls_length_semantics=char has Oracle essentially expanding storage to 30 for a varchar2(10 char). Although I can understand where this comes from (the standard practice of tripling precision when working with UTF8), I don't believe it to be quite accurate, and if not, it plays against the warning you mentioned. Take the following examples:

                      create table g11n_test1(col1 varchar2(10 char));

                      --10 ascii
                      insert into g11n_test1
                      values('abcdefghij');

                      --11 ascii
                      insert into g11n_test1
                      values('abcdefghijk');

                      /*
                      * This results in an error:
                      * values('abcdefghijk')
                      * *
                      *
                      * ERROR at line 2:
                      * ORA-12899: value too large for
                      * column "PLSQL"."G11N_TEST1"."COL1"
                      * (actual: 11, maximum: 10)
                      *
                      * The storage is limited to 10 characters regardless
                      * of byte size. This single-byte string totals 11
                      * bytes, not 30, and it still fails to be inserted.
                      */

                      -- 10 Japanese characters succeeds
                      insert into g11n_test1
                      values('アルファベットの意味');

                      -- 11 Japanese characters fails
                      insert into g11n_test1
                      values('アルファベットの意味論');

                      /*
                      * This results in an error:
                      * values('アルファベットの意味論')
                      * *
                      *
                      * ERROR at line 2:
                      * ORA-12899: value too large for
                      * column "PLSQL"."G11N_TEST1"."COL1"
                      * (actual: 11, maximum: 10)
                      *
                      * This is where the character limitation differs from
                      * the byte-size limitation... the next example puts a
                      * 4-byte character into a varchar2(1 char), which
                      * violates the assumption that storage simply triples.
                      */

                      --4 byte character in a varchar2(1 char)
                      Create table g11n_test2(col1 varchar2(1 char));

                      Insert into g11n_test2
                      Values (unistr('\D84C\DEAB'));

                      Select lengthb(col1)
                      From g11n_test2;

                      -- this results in 4 bytes inserted into
                      -- a varchar2(1 char)

                      As was previously mentioned, I've never seen NLS_LENGTH_SEMANTICS break an application. It actually brings Oracle closer in line with storage in databases like SQL Server, which use character-like semantics by default.

                      If you believe my examples are in error, please do respond as this is a very interesting discussion to follow.

                      -Ron
                      • 8. Re: export in one character set and import in different
                        dinky
                        So what is the solution/workaround to this issue. I am facing the same problem in 10g datapump expdp/impdp.

                        Environment: Oracle 10g, Windows 2003

                        When I import data from

                        Source: 10.1.0.2.0 WE8MSWIN1252 AMERICAN_AMERICA database
                        with nls_length_semantics BYTE
                        Target: 10.2.0.1.0 AL32UTF8 AMERICAN_AMERICA database
                        with nls_length_semantics BYTE

                        I get errors for some rows and those rows do not get imported; everything else goes fine.

                        ORA-02374: conversion error loading table "ABC"
                        ORA-12899: value too large for column MYFIELD (actual: 263, maximum: 255)
                        ORA-02372: data for row: MYFIELD : 'per e-mail received 09oct06 ..
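                        The arithmetic in that ORA-12899 (actual: 263, maximum: 255) can be reconstructed; a Python sketch, assuming (hypothetically) that the row held 255 Windows-1252 characters of which 8 were accented and expand to 2 bytes each in UTF-8 - the real data is unknown, only the byte counts from the error message are:

```python
# Hypothetical reconstruction of the failing row's byte arithmetic.
value = "per e-mail received 09oct06 " + "é" * 8 + "x" * 219

print(len(value))                  # 255 characters - fits VARCHAR2(255 BYTE)
                                   # in the single-byte source database
print(len(value.encode("utf-8")))  # 263 bytes -> "actual: 263, maximum: 255"
```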

                        I am not sure whether changing the source table column length will cause an issue due to application constraints.

                        I have tried setting the target DB's nls_length_semantics to CHAR but that didn't help - I still get the error. Changing the target column's length causes other issues with triggers and doesn't import any rows, so I can't use that option. Also, due to the many different DB migrations we need to do, I don't think field-level changes will be a good option.

                        I am not sure how safe it is to set nls_length_semantics to CHAR in the source DB and then try.

                        Is there any workaround/option/advice I can use?

                        -D
                        • 9. Re: export in one character set and import in different
                          Sergiusz Wolicki-Oracle
                          I will answer a few questions in this thread:

                          (i) The solution to the data expansion issue is either to shorten the strings before export or to make the columns longer. There is no other solution. Redefining a column from VARCHAR2(x BYTE) to VARCHAR2(x CHAR), in the context of Export/Import, is equivalent to making the column n times longer, where n is the maximum character width in bytes of the DB character set (3 for UTF8, 4 for AL32UTF8). Of course, 4000 bytes is still the upper limit. This is what natlangs meant, I guess.

                          (ii) In a wider context, redefining a column col from VARCHAR2(x BYTE) to VARCHAR2(x CHAR) is equivalent to making the column n times longer plus adding a constraint CHECK(LENGTH(col)<=x).

                          (iii) Export/Import and Data Pump preserve the original length semantics of exported table columns. Therefore, setting NLS_LENGTH_SEMANTICS before import does not change anything. One has to precreate the tables using CHAR length semantics and import data only.

                          (iv) Changing column length semantics may break applications. There is certainly a class of applications that will still work well after the change, there is a class of applications that may even work better. But there is also a class of applications that will stop functioning correctly - either explicitly, reporting an error, or implicitly, silently truncating data. Note, if working with ASCII data only, or with data that is not close to column length limits, you may not notice any problems for a long time.

                          I have recently (yesterday) diagnosed a similar problem in one of our internal systems, where conversion between WE8MSWIN1252 and AL32UTF8 caused a value error across a DB link. SQL*Plus silently truncated a value that exceeded 4000 bytes, but a PL/SQL cursor loop reported ORA-6502. The error appeared after months of error-free functioning because somebody had entered text that was 4000 characters long and contained two "smart" quotes - characters which are single-byte in WE8MSWIN1252 but multibyte in AL32UTF8.
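                          The "smart quote" expansion described here is easy to verify; a sketch in Python (cp1252 is Python's name for the Windows-1252 / WE8MSWIN1252 encoding):

```python
# U+2019 RIGHT SINGLE QUOTATION MARK - a typical "smart" quote
quote = "\u2019"

print(len(quote.encode("cp1252")))  # 1 byte in Windows-1252 (WE8MSWIN1252)
print(len(quote.encode("utf-8")))   # 3 bytes in UTF-8 (AL32UTF8)
# Two such quotes in a 4000-character value add 4 extra bytes in UTF-8,
# pushing the value past the 4000-byte VARCHAR2 limit.
```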

                          Therefore, be prepared to review and test your applications thoroughly before changing the length semantics.


                          -- Sergiusz
                          • 10. Re: export in one character set and import in different
                            P.Forstmann
                            Have a look at the following thread, which describes a possible way to fix your problem:
                            Re: charset conversion from WE8ISO8859P1 (8.1.7.0) to AL32UTF8(9.0.1.1)