
AL32UTF8 versus US7ASCII

user2304762 Feb 17 2015 — edited Feb 20 2015

The Oracle 11g install instructions recommend using AL32UTF8 versus US7ASCII. Would applications built for US7ASCII transparently support AL32UTF8?

Comments

unknown-951199

user2304762 wrote:

The Oracle 11g install instructions recommend using AL32UTF8 versus US7ASCII. Would applications built for US7ASCII transparently support AL32UTF8?

The question above is BACKWARDS.

A database configured with the AL32UTF8 character set would transparently support an application designed for a US7ASCII database.

unknown-7404

The Oracle 11g install instructions recommend using AL32UTF8 versus US7ASCII. Would applications built for US7ASCII transparently support AL32UTF8?

You can answer that question yourself:

1. ASCII is a SINGLE BYTE character set - EVERY character takes EXACTLY one byte

2. AL32UTF8 is a MULTI-BYTE character set - a character can take 1, 2, 3, or even 4 bytes

Can a single byte transparently support characters that require 2, 3 or even 4 bytes?

Post your answer. Then read the entire doc section at:

http://docs.oracle.com/cd/B28359_01/server.111/b28298/ch2charset.htm

In single-byte character sets, the number of bytes and the number of characters in a string are the same. In multibyte character sets, a character or code point consists of one or more bytes. Calculating the number of characters based on byte lengths can be difficult in a variable-width character set. Calculating column lengths in bytes is called byte semantics, while measuring column lengths in characters is called character semantics.

Character semantics is useful for defining the storage requirements for multibyte strings of varying widths. For example, in a Unicode database (AL32UTF8), suppose that you need to define a VARCHAR2 column that can store up to five Chinese characters together with five English characters. Using byte semantics, this column requires 15 bytes for the Chinese characters, which are three bytes long, and 5 bytes for the English characters, which are one byte long, for a total of 20 bytes. Using character semantics, the column requires 10 characters.

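A minimal sketch of that example (table and column names are made up; an AL32UTF8 database is assumed):

-- Byte semantics: 5 Chinese characters (3 bytes each) plus 5 ASCII
-- characters (1 byte each) need a 20-byte column.
CREATE TABLE mixed_text (
  name_bytes VARCHAR2(20 BYTE),  -- byte length semantics
  name_chars VARCHAR2(10 CHAR)   -- character length semantics: 10 characters of any width
);

-- LENGTH counts characters, LENGTHB counts bytes:
SELECT LENGTH(name_chars)  AS char_count,
       LENGTHB(name_chars) AS byte_count
  FROM mixed_text;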

12cdb

>> The Oracle 11g install instructions recommend using AL32UTF8 versus US7ASCII. Would applications built for US7ASCII transparently support AL32UTF8?


That Oracle recommends it does not mean you must use it. If your application is supposed to work with only a single character set (e.g., English), you do not need to use AL32UTF8. By using AL32UTF8 you will increase the amount of space that is required to store the data.


Oracle recommends it in order to support multiple languages' character sets within the same database.


HTH,

Pradeep




John Stegeman

> By using AL32UTF8 you will increase the amount of space that is required to store the data.

Not true. If you use AL32UTF8 and only store single byte characters, the space is exactly the same.
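
A quick way to see this (a minimal sketch, assuming an AL32UTF8 database):

-- ASCII characters take exactly one byte each in AL32UTF8, the same
-- as in a single-byte character set; only non-ASCII characters grow.
SELECT LENGTHB('Hello') AS ascii_bytes,    -- 5 bytes
       LENGTHB('Héllo') AS accented_bytes  -- 6 bytes: the é takes 2 bytes in UTF-8
  FROM dual;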

jgarry

12cdb wrote:

>> The Oracle 11g install instructions recommend using AL32UTF8 versus US7ASCII. Would applications built for US7ASCII transparently support AL32UTF8?


That Oracle recommends it does not mean you must use it. If your application is supposed to work with only a single character set (e.g., English), you do not need to use AL32UTF8. By using AL32UTF8 you will increase the amount of space that is required to store the data.


Oracle recommends it in order to support multiple languages' character sets within the same database.


HTH,

Pradeep




But you really don't want to use US7ASCII. Some applications will stuff in 8-bit characters, and then later, when you export/import, Oracle will helpfully translate those to the wrong characters. This was a classic problem for many versions. Sometimes Oracle recommendations are worth listening to.

I personally dealt with several "we will never use anything but English" situations. Sometimes they were just plain wrong; others were more subtle, like applications using their own counters stuck into CHAR columns. Exp/imp would change the counters because the translations used 8 bits.

Of course, it could be worse.
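
A minimal sketch of how you might look for that kind of data before an export/import mangles it (my_table and some_col are assumed names; it simply inspects the byte codes actually stored):

-- DUMP shows the character set and the raw byte values stored, so 8-bit
-- codes hiding in a supposedly US7ASCII column become visible:
SELECT some_col,
       DUMP(some_col, 1016) AS charset_and_hex_bytes
  FROM my_table;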

Sergiusz Wolicki-Oracle

To answer your question and clarify some of the aspects mentioned in other answers:

  1. An application written for US7ASCII can work unchanged with a database created with the database character set AL32UTF8 provided that:
    1. The application really inserts ASCII characters only. That is, it does not run on a WE8MSWIN1252 (Windows) or WE8ISO8859P1 (Unix-like) platform and enter non-ASCII characters available on the client platform, such as smart quotes, the copyright symbol, etc. This may happen if NLS_LANG is set to .US7ASCII (or not set at all) and no client/server conversion kicks in to change the binary codes when they are sent to/from the database -- we call it a pass-through configuration. Use the CSSCAN or DMU tools to verify that the database content is really US7ASCII only.
    2. Database columns for the application use the (default) byte length semantics or they use character length semantics but the application does not use byte length describe information (in OCI, ODBC, JDBC, etc.) to dynamically control the length of input data. Declared maximum byte length quadruples in an AL32UTF8 database compared to a single-byte database when columns use character length semantics.
    3. Only the application in question may enter data into the application tables. Nobody is allowed to enter non-ASCII data outside of the application just because the database character set supports it. Such data would not be legible to the application, and the application could even corrupt it.
  2. Oracle recommends AL32UTF8 even for pure ASCII data, because it is much less costly to start with the right multilingual character set than to migrate the database later, when it has accumulated giga- or terabytes of data and additional languages have to be supported. In the Internet era, it is not wise to assume that the database will only ever have to support English.
  3. US7ASCII is a particularly bad character set, because hardly any client platform is really US7ASCII. If you insist on a single-byte character set, use WE8MSWIN1252 for the database character set.
  4. ASCII data in an AL32UTF8 database does indeed occupy more space, but only if it is stored in CLOB columns. Data in VARCHAR2/CHAR columns occupies exactly the same space. Data in CLOB columns uses double the space (as it is stored in UTF-16).

Even if an application does not fulfill condition 1.1, it can often still be made to work with an AL32UTF8 database after minimal changes.
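
A small query sketch for checking some of the configuration points above (APP_OWNER is an assumed schema name):

-- Database character set and national character set:
SELECT parameter, value
  FROM nls_database_parameters
 WHERE parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');

-- Length semantics declared for the application's character columns:
-- CHAR_USED is 'B' for byte semantics, 'C' for character semantics.
SELECT table_name, column_name, char_used
  FROM all_tab_columns
 WHERE owner = 'APP_OWNER'
   AND data_type IN ('CHAR', 'VARCHAR2');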

Thanks,

Sergiusz
