Skip to Main Content

DevOps, CI/CD and Automation

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

Issue inserting UTF8 data into Oracle in windows environment.

574567Apr 26 2007 — edited Dec 14 2007
I have a UTF8 PHP application that is writing a string containing special characters to oracle through a ODBC connection. The Oracle database is setup for UTF8 support.

Here is the issue. I have a simple string, "louis de funès". When the data manually moved correctly in UTF8 the data comes up correctly. The Oracle dump() shows:

WORKING DATA:
String: louis de funès
select keywords, dump(keywords, 17) from ame_links where keywords like '%louis de %';
Typ=1 Len=15: l,o,u,i,s, ,d,e, ,f,u,n,c3,a8,s


However, when the same string is Inserted through the PHP application the data shows up in the db like this.

NOT - WORKING:
String: louis de funès
select keywords, dump(keywords, 17) from ame_links where keywords like '%louis de %';
Typ=1 Len=17: l,o,u,i,s, ,d,e, ,f,u,n,c3,83,c2,a8,
(The è character has 4 bytes associated with it)

Windows Setup:
Windows Registry: HKEY_LOCAL_MACHINE -> SOFTWARE -> ORACLE -> HOME0 ->
NLS_LANG=AMERICAN_AMERICA.UTF8

HTTP headers are set for Content-Type:text/html; charset=UTF-8.

Anyone know why I would get 2 extra bytes (83,c2) added in the middle of the è character? Is the oracle client doing some other type of character set conversion before I insert it into the database.

I have also noticed that when I change the NLS_LANG from AMERICAN_AMERICA.UTF8 to AMERICAN_AMERICA.WE8MSWIN1252 that the 4 byte 'è' character works and the 2 byte character doesn't.

Comments

mtozawa-Oracle
"c3,83,c2,a8" is the UTF8 encoding of the cp1252 code "c3,a8".
Somehow the data your PHP application passed to ODCB API was treated as WE8MSWIN1252 (cp1252 codepage) by ODCB dirver, and converted to UTF8 which is the database character set.
From this, it looks that your ODBC driver is configured for WE8MSWIN1252, and your PHP application is feeding character data in UTF8 encoding to ODBC driver.
But I see you claim that your NLS_LANG is already set to UTF8...
-
Can you post the code snippet that calls ODBC APIs?

makoto
574567
Here is a simple snippet of PHP code that producing the problem:

$invar = $_GET['invar'];

#create connection
$connect = odbc_connect("DSN", "<username>", "<password>");

# query the users table for name and surname
$query = "insert into AME_LINKS (keywords, site_id) values ('$invar', 77)";

# perform the query
$result = odbc_exec($connect, $query);

if($result){
echo "<br>Insert successful<b>";
}else{
echo "<br>Insert Failed<b>";
}

# close the connection
odbc_close($connect);

I also did a simple insert statement through "Oracle ODBC 32Bit Test" Client tool, using the same ODBC driver (DSN name) that the PHP application is using. After I did the insert the characters were saved to the database correctly.

insert into AME_LINKS (keywords, site_id) values ('louis de funès', 77)
select keywords, dump(keywords, 17) from ame_links where site_id = 77

Result:
Typ=1 Len=15: l,o,u,i,s, ,d,e, ,f,u,n,c3,a8,s

It looks like data passed directly through the ODBC driver works correctly but when its passes through the PHP ODBC library the data gets corrupted.

I'm using Oracle ODBC Driver Version: 10.02.00.03
mtozawa-Oracle
I reproduced the porblem. Only differnce from yours is that it reproduces regardless the NLS_LANG setting.
Although I cannot be 100% sure, I think what is happening is that Microsoft ODBC Driver Manager converts the insert statement from Windows ACP to Unicode, then calls SQLExecDirectW (WideChar version of SQLExecDirect) of the Oracle ODBC Driver and passes the insert statement in Unicode. Since your PHP code passed the insert statement in UTF-8 encoding, and the Driver Manager converted it from Windows ACP to Unicode, the letter e with grave ("c3,a8" in UTF-8) was wrongly converted to U+00c3, U+00a8 in the Unicode insert statement, then converted to UTF8 ( "c3,83,c2,a8") by Oracle.
By looking at the PHP source code, apparently PHP odbc_exec() function calls ANSI SQLExecDirect().
I tried to verify that the Driver Manager calls SQLExecDirectW() of Oracle ODBC Driver by enabling the ODBC Tracing, but I couldn't make it work. It genetates an emply SQL.LOG file. So my investigation stops here.
-
IF YOUR METADATA NAMES ARE ASCII ONLY, AND NON-ASCII CHARACTERS APPEAR ONLY IN DATA, you can workaround this problem by binding data. For example,
$stmt = odbc_prepare($conn, "insert into AME_LINKS (keywords, site_id) values (?, ?)");
$rs = odbc_execute($stmt, array($invar,77));

Makoto
574567
Thanks for your help on this issue. I decided to scratch the ODBC driver all together (because of these issues) and just move to OCI, which works fine writing data to the DB through PHP
613565
Hi,

i am also facing the same problem. As you have mentioned that you have used OCI, can you please help me how you have implemented it. Its urgent. Please reply ASAP.
The user probably meant they moved to use PHP's OCI8 extension, about which there are multiple books and manuals available, and any "implementation" of which would be specific to your project.

-- cj
1 - 6
Locked Post
New comments cannot be posted to this locked post.

Post Details

Locked on Jan 11 2008
Added on Apr 26 2007
6 comments
16,384 views