6 Replies Latest reply: Dec 3, 2013 5:27 PM by Sergiusz Wolicki-Oracle RSS

    SQLLOAD with various Charactersets




      I am working on an Oracle EBS project (version 11i) with Oracle DB 11G hosted on Oracle Linux.

      Character set is UTF8 on the database since we have some Polish users.


      Our ERP is logically interfaced with many systems from which we receive ASCII datafiles that we need to upload in our DB using SQLLOAD utility.


      The problem we have is that depending on the sending system, the characterset of a datafile can vary among following values :

      - EE8MSWIN1250 => files sent by our Polish Subsidiary

      - WE8MSWIN1252 => files sent by WINDOWS systems

      - WE8ISO8859P1 => files sent by some Unix Systems


      We have developped a specific Linux shell that submits SQLLOAD for datafiles with the appropriate control file "CHARACTERSET" option.

      The problem is that until now I was not able to detect precisely the character set of a given datafile.

      - the Linux command "file -i" returns "text/plain; charset=iso-8859-1" even for a windows file encoded with WINDOWS-1252 or WINDOWS-1250

      - I also tried the linux command iconv to convert the file to UTF8 but this command is successfull whatever the "from" characterset we specify (ISO-8859-1 / WINDOWS-1252 / WINDOWS-1250)


      My Question :

      How can I determine precisely the characterset of a given ASCII datafile in order to set correctly the CHARACTERSET option of SQLLOAD control file ?

      (in batch mode on Linux)

      Browsers as IE, Chrome or Firefox are able to do that (detect the character set of a web page to display it correctly) so I suppose that a tool or command should exist for that purpose.


      Thanks in advance for helping and sharing experience.

      Karim Helali

      Toshiba France

        • 1. Re: SQLLOAD with various Charactersets
          Sergiusz Wolicki-Oracle



          There is no fully reliable way to detect the character set of a file.  You can check how well the task is performed by the lcsscan utility, which you will find in your database Oracle Home. This utility uses statistical analysis to guess the character set and language of text but results are not always good. See documentation here: http://docs.oracle.com/cd/E16655_01/server.121/e17750/ch11charsetmig.htm#NLSPG982


          My recommendation is to imply the character set of the file based on its source. You should have a protocol (a contract) with providers of those files to get the files in prescribed character sets. Then you can use lcsscan as a quality verification tool to warn you about possible violations of the protocol.





          • 2. Re: SQLLOAD with various Charactersets

            Thank you Sergiusz : the lcsscan tools gives quite good results and it may be a solution for us.

            The only issue is that lcsscan is only available on last oracle DB releases (10 and 11 ).

            Although our database server is on release 11G, the EBS applications server is on oracle 8i due to Oracle Forms restrictions.

            As the SQLLOAD is run from the applications server, I have to run the lcsscan tool by SSH on the DB server


            So I let the question open a few days again in case someone knows a Linux command or tool that does the same control as lcsscan

            Note: We are also considering the other solution you mention eg to assign to each sending system an agreed characterset .

            Thank you again and best regards

            Karim Helali

            • 3. Re: SQLLOAD with various Charactersets
              Srini Chavali-Oracle

              If you can NFS mount the database binaries on the Apps server, you can use the 11g version of sql*loader. See these MOS Docs






              • 4. Re: SQLLOAD with various Charactersets
                Sergiusz Wolicki-Oracle

                You should double-check it with your account manager (sales person) from Oracle but I am pretty convinced you can install an additional Oracle Client matching your database server version (or even a newer version, if you have a Support contract) on the application server host. Then, set (and export) ORACLE_HOME to the path of the new Oracle Client home in a shell script and run lcsscan. An Oracle Client installation is not small but you have a GB or two spare, don't you? ;-)


                As I said, I strongly recommend a protocol + lcsscan. Neither of the two methods alone is foolproof, though a protocol is usually more reliable, if file generation is done automatically at source.




                • 5. Re: SQLLOAD with various Charactersets

                  Our issue is now solved by applying following logic:

                  If character set returned by Linux command file -i is utf-8

                        then the file is uploaded with oracle charset UTF8

                  else if the file is sent by our Polish company  (very few files easy to recognize by the filename prefix)

                        then the file is uploaded with oracle charset EE8MSWIN1250


                       the file is uploaded with oracle charset WE8MSWIN1252


                  So now we use WE8MSWIN1252 as default characterset rather than WE8ISO8859P1.

                  We have much less issues because, af far as I understand, WE8MSWIN1252 is a superset of WE8ISO8859P1.

                  Thank you Sergiusz and Srini for your help.

                  Best Regards

                  Karim Helali

                  Toshiba France

                  • 6. Re: SQLLOAD with various Charactersets
                    Sergiusz Wolicki-Oracle

                    Yes, WE8MSWIN1252 is a binary superset of WE8ISO8859P1. Whatever is a valid byte sequence in WE8ISO8859P1 is also a valid byte sequence in WE8MSWIN1252 with the same meaning.