Skip to Main Content

Database Software

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

Difference between AL32UTF8 and UTF8

1062396Feb 11 2014 — edited Feb 11 2014

Hello

Our current database version is 10g with CHARSET = UTF8 to support greek characters. However, client wants to upgrade to 11g (11.2.0.4.0) and created test database with CHARSET = AL32UTF8.

Until now to insert greek characters, we were providing script with mentioning set NLS_LANG=AMERICAN_AMERICA.UTF8 and it was inserting greek characters properly. However it is not working with new machine. If setting this parameter, script execution gives error for greek characters "quoted string not properly terminated".


I have two questions here:


1) Is there any difference between UTF8 and AL32UTF8?

2) NLS_LANG parameters works on database charset right? Which parameter i can set for NLS_LANG which allow to insert greek characters either executing script from windows or linux machine?


Thanks in advance.

This post has been answered by Dalia Sarey Eldin-Oracle on Feb 11 2014
Jump to Answer

Comments

Zoltan Kecskemethy

First this question would fit better in section.

Re 1. If you check Subsets and Supersets you can see that UTF8 is a superset of AL32UTF8. This means everything should work just fine using the new AL32UTF8 as it was working using the old UTF8. Aka AL32UTF8 has extra characters available but it has all the same as UTF8. But there is one important difference here. While UTF8 uses only 2 bytes to store data AL32UTF8 uses 2 or 4 bytes.

Re 2. Please see The correct NLS_LANG in your environment from my links below. Most importantly NLS_LANG is a client side parameter and instructs how client - server communication works. Basically defines what character encoding you use at client side so if required the encoding is converted in the communication! It is important to use the same encoding for your input data, script to get a good result. Eg. you setup UTF8 in NLS_LANG make sure your input scripts or data files are also encoded using UTF8!

I was working something similar in the past so let me give you my reference collection you may find it useful:

Oracle® Database Globalization Support Guide 10g Release 2 (10.2) - Changing the Database Character Set of an Existing Database

http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch11charsetmig.htm#sthref1476

Oracle® Database Globalization Support Guide 10g Release 2 (10.2) - Character Set Scanner Utilities

http://download.oracle.com/docs/cd/B19306_01/server.102/b14225/ch12scanner.htm#sthref1515

Character set migration best practices – An oracle white paper January 2005

http://www.oracle.com/technology/tech/globalization/pdf/TWP_Character_Set_Migration_Best_Practices_10gR2.pdf

Oracle Metalink: Changing the Database Character Set - a short overview.   Note:225912.1
https://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=225912.1

Oracle Metalink: Changing the NLS_CHARACTERSET to AL32UTF8 / UTF8 (Unicode) Doc ID: 260192.1

https://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=260192.1

Oracle Metalink: Database Character Set Healthcheck Doc ID: 225938.1 https://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=225938.1

Oracle Metalink: The correct NLS_LANG in a Windows Environment Doc ID: 179133.1 https://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=179133.1

Oracle Metalink: The correct NLS_LANG setting in Unix Environments Doc ID: 264157.1 https://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=264157.1

Oracle Metalink: Solving Convertible or Lossy data in Data Dictionary objects when changing the NLS_CHARACTERSET Doc ID: 258904.1 https://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=258904.1

Oracle Metalink: SYS.SOURCE$ marked as having Convertible or Lossy data in Csscan output Doc ID: 291858.1 https://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=291858.1

Oracle Metalink: AL32UTF8 / UTF8 (Unicode) Database Character Set Implications Doc ID: 788156.1 https://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=788156.1

Oracle Metalink: Installing and configuring Csscan in 10g and 11g (Database Character Set Scanner) Doc ID: 745809.1 https://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=745809.1

Oracle Metalink: Csscan output explained Doc ID: 444701.1 https://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=444701.1

Oracle Metalink: Examples and limits of BYTE and CHAR semantics usage (NLS_LENGTH_SEMANTICS) Doc ID: 144808.1 https://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=144808.1

Oracle Metalink: SCRIPT: Changing columns to CHAR length semantics Doc ID: 313175.1 https://metalink.oracle.com/metalink/plsql/showdoc?db=NOT&id=313175.1

Answer

Hi,

Please find below answers:

--Oracle UTF8 is Unicode revision 3.0 in 8.1.7 and up. AL32UTF8 is updated with newer Unicode versions in each major release, in Oracle RDBMS 12.1 it is updated to Unicode 6.1,

Besides the difference in Unicode version the "big difference" between UTF8 and AL32UTF8 is that AL32UTF8 has build in support for "Supplementary characters" who are encoded using "Surrogate Pairs" ( also wrongly known as "Surrogate characters") .

Oracle UTF8 (Unicode 3.0) stores supplementary characters as 2 characters, totalling 6 bytes, using "modified UTF-8" instead of the "standard UTF-8" (implemented in Oracle 9.2 and up using AL32UTF8 ) of 4 bytes for a supplementary character.

This "modified UTF-8" is also called CESU-8 .

Practically this means that in 99% of the data UTF8 and AL32UTF8 are the same for *storing* data.Only supplementary characters differ in the actual stored bytes/codes between UTF8 and AL32UTF8.


--the needed settings are:

LANG=el_GR.utf8

LC_ALL= el_GR.utf8

NLS_LANG= "GREEK_GREECE.AL32UTF8"

Marked as Answer by 1062396 · Sep 27 2020
Renae Stout-Oracle

Per another user,

One of the previous comments above is incorrect. AL32UTF8 is a superset of UTF8 according to https://docs.oracle.com/cd/B19306_01/server.102/b14225/applocaledata.htm#sthref1988 not vice versa.

1 - 3
Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on Sep 5 2024
Added on Feb 11 2014
3 comments
223,021 views