10 Replies Latest reply on Jan 29, 2017 8:48 AM by jr-phDBA

    Import data from different nls characterset

    jr-phDBA

      Hi,

       

      I've created a new database using AR8ISO8859P6 and used Data Pump to import a table that had an issue with Arabic characters. I tried to fix the issue by using the DMU. The NLS character set was altered and became AR8MSWIN1256, which resolved the garbled Arabic characters.

       

      When importing another table from the source (AR8ISO8859P6) into the target (the new database converted to AR8MSWIN1256), the Arabic characters become garbled again for this particular table.

       

      Can I use Data Pump to convert the data between these two character sets (AR8ISO8859P6 to AR8MSWIN1256)?

        • 1. Re: Import data from different nls characterset
          Sergiusz Wolicki-Oracle

          No, you cannot. You need to copy all tables into the new AR8ISO8859P6 database first and then fix the declaration with the DMU. If you copy a table from the pass-through AR8ISO8859P6 database into a correct AR8MSWIN1256 database, then the character codes, which are really already in AR8MSWIN1256, will be converted from AR8ISO8859P6 to AR8MSWIN1256, yielding garbage.

           

          Of course, if you copy a table from a correctly encoded AR8ISO8859P6 database, then you can import safely, but then no DMU is needed.
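
           

          A quick way to check what is really stored, independent of any client-side conversion, is to dump the internal byte codes of a sample of the affected column (the table and column names here are hypothetical):

```sql
-- Inspect the raw bytes Oracle has stored, bypassing client conversion.
-- Format 1016 = byte values in hex, plus the declared character set name.
SELECT arabic_col,
       DUMP(arabic_col, 1016) AS stored_bytes
  FROM my_arabic_table
 WHERE ROWNUM <= 5;
```

          If the hex codes correspond to AR8MSWIN1256 code points rather than AR8ISO8859P6 ones, the data is already pass-through AR8MSWIN1256, and converting it again will corrupt it.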

           

          Thanks,

          Sergiusz

          • 2. Re: Import data from different nls characterset
            jr-phDBA


            Hi Sir,

             

            Again, thank you for responding on this issue. Is there any other way to correct the declaration in the new database I've created, the one converted from AR8ISO8859P6 to AR8MSWIN1256?

             

            We just have a few tables that need to be used in the Oracle BI Cloud Service, and altering the whole database needs a lot of work at the database and application level, including the client side. That's the reality, but for now we're looking for a workaround for this issue.

            • 3. Re: Import data from different nls characterset
              Sergiusz Wolicki-Oracle

              If you need to move a few incorrectly encoded tables (AR8MSWIN1256 data in an AR8ISO8859P6 database) into a correctly encoded database (AR8MSWIN1256), then some possibilities are:

               

              1. Create a temporary database in AR8ISO8859P6, export and import the tables to this database, fix the character set declaration with DMU/CSREPAIR, export the tables and import them into the target database. This is a good solution for one-time migration.
              2. Create scripts to spool the content of each of the tables into a text file, e.g. with fixed field lengths. Run the scripts in SQL*Plus after setting NLS_LANG in the environment to .AR8ISO8859P6. Then, load the files into the target database with SQL*Loader after setting NLS_LANG to .AR8MSWIN1256. This way, you can prevent conversion, which is unavoidable with Data Pump.
              3. In the source database, for each table, create a view to select all columns, except that each character column char_columnN is replaced with the expression:

               

              utl_raw.cast_to_raw(char_columnN) char_columnN_raw

               

              (This works for columns up to 2000 bytes. Longer columns can be transformed to RAW only in Oracle Database 12c with parameter max_string_size set to extended -- extended data types switched on). Then, in the target database, create another view to select from the source database view via a database link. This target view should transform RAW columns back to VARCHAR2:

               

              utl_raw.cast_to_varchar2(char_columnN_raw) char_columnN

               

              Now, you can query the target view and see the source content unconverted and thus correct. This method has the advantage of being able to query any dynamically changing original tables, without making a static copy.

              With some CAST calls, you can also copy CHAR columns this way.
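
               

              As a minimal sketch of method 3, assuming a hypothetical table t with a key column id and a single VARCHAR2 column txt, and a database link src_link from the target back to the source:

```sql
-- In the SOURCE (pass-through AR8ISO8859P6) database:
-- expose the stored bytes as RAW so no character set conversion occurs.
CREATE OR REPLACE VIEW t_raw_v AS
  SELECT id,
         utl_raw.cast_to_raw(txt) AS txt_raw
    FROM t;

-- In the TARGET (AR8MSWIN1256) database:
-- pull the RAW bytes over the link, unchanged, and reinterpret them as
-- VARCHAR2, which the target now correctly declares as AR8MSWIN1256.
CREATE OR REPLACE VIEW t_v AS
  SELECT id,
         utl_raw.cast_to_varchar2(txt_raw) AS txt
    FROM t_raw_v@src_link;
```

              The key point is that RAW values travel over the database link without character set conversion, so the original byte codes arrive intact in the target.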

               

              Only the first method works well with complex types, such as CLOBs, objects, ANYDATA, VARRAYs, etc. Other methods would require much more work.

               

              Thanks,
              Sergiusz

              • 4. Re: Import data from different nls characterset
                jr-phDBA


                 Thank you so much, Sir. I might go with the first method, but I have a question regarding CSREPAIR. Does this script alter the NLS_CHARACTERSET after running?

                 

                 After running the said script in my test DB, it became AR8MSWIN1256:

                SQL> select * from nls_database_parameters where parameter='NLS_CHARACTERSET';

                PARAMETER                      VALUE
                ------------------------------ ----------------------------------------
                NLS_CHARACTERSET               AR8MSWIN1256

                • 5. Re: Import data from different nls characterset
                  jr-phDBA

                  Apologies Sir, just one more question...

                   

                  In case migrating to Unicode is not immediately feasible due to business or technical constraints, it would be desirable to at least correct the database character set declaration to match with the database contents

                   

                  Upon reading the document/link you provided in my previous thread again: how can the statement above be done? Do I need to use the same script, CSREPAIR?

                  • 6. Re: Import data from different nls characterset
                    Sergiusz Wolicki-Oracle

                    The whole point of the CSREPAIR script is to change NLS_CHARACTERSET, i.e. the database character set declaration. This "repairs" the declaration that has not been describing the database content correctly.

                     

                    Thanks,
                    Sergiusz

                    • 7. Re: Import data from different nls characterset
                      Sergiusz Wolicki-Oracle

                      I do not quite understand your question. CSREPAIR does only this -- corrects the database character set declaration to match with the database contents, without touching the contents. (As opposed to normal Unicode migration, which is to change the declaration to AL32UTF8 and convert the database content.)
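
                       

                      For reference, a typical CSREPAIR run is executed in SQL*Plus only after a complete DMU scan has classified the data; the script location below is an assumption, so adjust it to your DMU installation:

```sql
-- Run as SYSDBA, after a full DMU scan of the database.
-- The csrepair script ships with the DMU (path here is an assumption).
CONNECT / AS SYSDBA
@/path/to/dmu/admin/csrepair.plb

-- Bounce the instance afterwards so that all cached copies of the
-- database character set are re-initialized:
SHUTDOWN IMMEDIATE
STARTUP
```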

                       

                       

                      Thanks,

                      Sergiusz

                      • 8. Re: Import data from different nls characterset
                        jr-phDBA


                        On this link Advanced Topics in the DMU

                         

                        How can we correct the database character set declaration to match the database contents without downtime, since running CSREPAIR requires a bounce?

                         

                        Apologies, Sir, I just got confused.

                        • 9. Re: Import data from different nls characterset
                          Sergiusz Wolicki-Oracle

                          Theoretically, the CSREPAIR script re-initializes internal structures to use the new database character set without a database bounce. However, Oracle Database is an extremely complex piece of software and it is very difficult to ensure that all possible places caching the database character set are properly re-initialized. Therefore, we strongly recommend bouncing the database instance.

                           

                          It is pretty difficult to change such a basic parameter as the database character set without downtime. You would need a non-trivial setup with two databases, GoldenGate (or custom replication), and some custom application failover process to achieve this.

                           

                          Note that unless you use a lot of character length semantics columns, the CSREPAIR script runs very fast. (You do not need downtime to run the DMU scan.) Therefore, the downtime is minimal. A more difficult problem would be to synchronize the character set change with the required change to the NLS_LANG on all clients, unless your application is 3-tier and the NLS_LANG is set only on the application servers.

                           

                          Thanks,
                          Sergiusz

                          • 10. Re: Import data from different nls characterset
                            jr-phDBA


                            Thank you, Sir. This makes it clear: after scanning the database the characters were still the same (garbled), but after the bounce it was fixed.

                             

                            A more difficult problem would be to synchronize the character set change with the required change to the NLS_LANG on all clients, unless your application is 3-tier and the NLS_LANG is set only on the application servers.

                            You're correct, this is one of the concerns here, since most of the clients set NLS_LANG on their own local machines.