Forum Stats

  • 3,758,595 Users
  • 2,251,417 Discussions
  • 7,870,258 Comments

Discussions

How can I load data from AL32UTF8 charset into database using WE8ISO8859P1?

User_JC4YX
User_JC4YX Member Posts: 8 Green Ribbon

I am trying to load data from a file coming from an AL32UTF8 charset source to our file server which then we have to upload to our EBS database using SQL Loader. Our EBS database is charset is set as WE8ISO8859P1 NLS_CHARACTERSET and NLS_LANGUAGE is American. I am submitting a EBS concurrent request that runs a sql loader file. When a diacritic is in the file, the record fails. It appears that a character is added when this happens. 'JOSÉ' becomes 'JOSÉ'. And no matter what we have tried, 'AMPLIACIÓN COLONIA' is converted to 'AMPLIACIÿN COLONIA'. Our DBAs say that changing the charset in either database is a not an option. To confirm this is what is causing the issue, I did edit the file and replaced the accented characters to non-accented ones. The program ran without error and loaded all of the records when there is no accents. Here is the message from the log file from the concurrent request: (The Column it is giving in the error is not the column with the accent) 

  

Record 7: Rejected - Error on table TABLE_NAME, column COLUMN. 

Invalid zoned decimal byt TABLE_NAME, column COLUMN.e. 

Record 5838: Rejected - Error on tab 

Invalid zoned decimal lebyte. 

 

Table TABLE_NAME: 

  6273 Rows successfully loaded. 

  2 Rows not loaded due to data errors. 

  0 Rows not loaded because all WHEN clauses were failed. 

  0 Rows not loaded because all fields were null. 

 

So far my team has tried the following things in the SQL LOADER file without success. 

"TRANSLATE(:column_name USING NCHAR_CS)" and "TRANSLATE(:column_name USING CHAR_CS)" 

TRANSLATE(:column, 'ÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕØÚÙÛÜ', 'AAAAAACEEEEIIIINOOOOOOUUUU'); 

convert(:column,'WE8ISO8859P1') and convert(:column,'WE8ISO8859P1','AL32UTF8') - the program ran without error, but the data was not inserted into the database correctly. It inserted ******* *******R¿ K**** instead of ******* *******RÍA K**

-REPLACE(:column, 'Ó', 'O') 

utl_raw.cast_to_varchar2((nlssort(:column, 'nls_sort=binary_ai'))) - this resulted in more errors 

 

We also tried setting NLS_LANG which resulted in program error 

-NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1 

-NLS_LANG=AMERICAN_AMERICA.AL32UTF8 

-NLS_LANG=AMERICAN_AMERICA.UTF8 

and setting CHARACTERSET which didn't work 

-CHARACTERSET WE8ISO8859P1 

-CHARACTERSET AL32UTF8 

-CHARACTERSET UTF8 

 

We have tried the following that works in TOAD, but doesn’t work when ran through EBS: 

select TRANSLATE(convert(upper('string causing error in the file'),'WE8ISO8859P1'),'ÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕØÚÙÛÜ', 'AAAAAACEEEEIIIINOOOOOOUUUU') from dual 

select TRANSLATE(convert(upper('string causing error in the file'),'US7ASCII'),'ÁÀÂÄÃÅÇÉÈÊËÍÌÎÏÑÓÒÔÖÕØÚÙÛÜ', 'AAAAAACEEEEIIIINOOOOOOUUUU') from dual 

SELECT regexp_replace(regexp_replace('string causing error in the file','[[=O=]]+','O' ),'[[=E=]]+','E' ) FROM dual; 

 

Any help on fixing this is appreciated.  

Best Answer

«1

Answers

  • KayK
    KayK Member Posts: 1,679 Bronze Crown

    Hi JC,

    "Invalid zoned decimal lebyte. " sounds like a numeric column, right ?

    Maybe there is a mismatch with decimal point and decimal comma.

    regards

    Kay

  • User_JC4YX
    User_JC4YX Member Posts: 8 Green Ribbon
    edited Sep 23, 2021 2:07PM

    The error occurs on records with an accent mark in a column before the numeric column. When the character with the accent gets converted to our database character set, an extra character is added and shifts everything after it to the right on place.

    Regardless of what we have tried(Translate, Convert, replace, etc.), EBS seems to ignore them. The SQL Loader log shows the commands we have tried each time, but the error has continued. The output shows the 2 records with an additional character added. 'JOSÉ' becomes 'JOSÉ' and 'AMPLIACIÓN COLONIA' becomes 'AMPLIACIÿN COLONIA'

  • Paulzip
    Paulzip Member Posts: 8,450 Blue Diamond
    edited Sep 24, 2021 2:10AM

    The problem you have is there are no equivalent to some AL32UTF8 accented characters in WE8ISO8859P1, because AL32UTF8 is not a subset of WE8ISO8859P1. And to be honest, pretty much every sensible company in the world now uses AL32UTF8 (Oracle's default NLS charset on installs) as it covers you for every character eventuality. Most companies invested the time converting to AL32UTF8 years ago.

    The reason JOSÉ (AL32UTF8) becomes 'JOSÉ (WE8ISO8859P1)

    É (AL32UTF8) = Multibyte = 0xC389

    0xC3 (WE8ISO8859P1) = 195 = Ã (A tilde)

    0x89 (WE8ISO8859P1) = 137 = ‰ (per mille sign)

    Convert will yield non sensible replacement characters for all situations, so won't solve your problem.

    So, off the top of my head your choices :

    • Accept problems
    • Convert your EBS to AL32UTF8
    • Import to NVarchar2 columns
    • Use a pre-processor on the file (e.g. iconv)


  • mathguy
    mathguy Member Posts: 10,081 Blue Diamond

    I don't even know what EBS stands for (of course I can Google, but that's not the point), and I only used SQL Loader a few times (and probably not very well).

    With that said: While it is true that you can't convert all Unicode characters to WE8ISO8859P1 (since the former includes many more characters than the latter), characters that are included in both sets should be converted correctly. Many if not all your characters seem to be included in WE8ISO8859P1.

    What Paulzip describes is telling: Whatever process is accepting your inputs and storing them in the database thinks that "what comes in must go out as is" (without any processing - translation, conversion, call it what you like). The bytes from the input are mirrored exactly as they are in the output.

    This, to me, suggests that "the process" (EBS, or SQL Loader, or whatever) doesn't know that the input is UTF8. A process that knows that the input is UTF8 and the output is WE8ISO8859P1 will copy ASCII characters exactly as they are, and for other characters it will convert them to the smaller charset if they are supported, and then it will do who knows what with the rest. (Either default them to some value, or throw an error, or some other behavior.) E-acute is supported by WE8ISO8859P1 so it should be translated correctly (and possibly all the other characters in your file, too). You may get more help from EBS and/or SQL Loader experts on this: I would continue to investigate the correct way to communicate to "the process" the character sets for the incoming data and for the db, so that the translation does occur. As Paulzip demonstrated, no such translation seems to be taking place right now.

    Another, less likely, reason I had thought of (but it is now almost certainly not what's happening) is that different characters sometimes have the same graphical representation. For example in the Cyrillic alphabet the letter B stands for the sound "v" (the sound that in English we represent by "v", that is). In Unicode, the Cyrillic alphabet has its own encoding, separate from ASCII. A letter may appear as "B" but if it is from the encoding of the Cyrillic alphabet, it may or may not be translated "correctly" to WE8ISO8859P1. (I put "correctly" in quotes, because the charset does not, in fact, support the cyrillic alphabet; "converting" to Latin "B" may superficially look reasonable, but it isn't.) Anyway, this is very likely not what is happening in your case.

  • Paulzip
    Paulzip Member Posts: 8,450 Blue Diamond

    Thinking about what you've tried, the correct recommended character set for SQL Loader in your scenario is :

    CHARACTERSETNAME= UTF8

    You'll need to set the client NLS character set to :

    NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1

    Try that without covert or translate and report back. That should give you the best result, but you may still hit accent conversion issues (I think some accented characters like from Turkish, Slavic etc, aren't supported in WE8ISO8859P1) as I touched upon in my previous reply.

  • cormaco
    cormaco Member Posts: 1,679 Bronze Crown

    According to the doc, setting the CHARACTERSET parameter takes precedence over NSL_LANG:

    Specifying the CHARACTERSET parameter tells SQL*Loader the character set of the input data file.


    The default character set for all data files, if the CHARACTERSET parameter is not specified, is the session character set defined by the NLS_LANG parameter. Only character data (fields in the SQL*Loader data types CHAR, VARCHAR, VARCHARC, numeric EXTERNAL, and the datetime and interval data types) is affected by the character set of the data file.

    Also the NLS_LANG setting is about the client characterset, not the database characterset.

  • User_JC4YX
    User_JC4YX Member Posts: 8 Green Ribbon
    edited Sep 24, 2021 2:09PM

    Paulzip,

    Thanks for you responses. I get an error when I try NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1 in the sql loader file. Where do I need to set that?

    This is what I tried in the sql loader:

    NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1

    load data 

    CHARACTERSET UTF8

    INFILE ...

    INTO TABLE ....


    This is the error I'm getting:

    SQL*Loader-350: Syntax error at line 16.
    Expecting keyword LOAD, found "NLS_LANG".
    NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
    

    Thanks for your help!

  • cormaco
    cormaco Member Posts: 1,679 Bronze Crown

    NLS_LANG is not a SQL Loader parameter, it is defined on the OS level.

    You can read about it here:

    3.2 Choosing a Locale with the NLS_LANG Environment Variable

  • User_JC4YX
    User_JC4YX Member Posts: 8 Green Ribbon

    The SQL*Loader is a ctl file I am running through Oracle EBS (E-Business Suite) concurrent request. Where would I set the NLS_LANG?

    Also, Paulzip mentioned importing to NVarchar2 columns or using a pre-processor on the file. How would I do this?

    Thanks.

  • Paulzip
    Paulzip Member Posts: 8,450 Blue Diamond

    NLS_LANG is an environment variable in your client operating system where you are running SQL*Loader files in from. It's nothing to do with Oracle.

    I suspect once you set that, along with the CHARACTERSET UTF8 in your ctl file, your problem will be mostly sorted.