Issue inserting UTF8 data into Oracle in a Windows environment
574567, Apr 26 2007 (edited Dec 14 2007)
I have a UTF8 PHP application that writes a string containing special characters to Oracle through an ODBC connection. The Oracle database is set up for UTF8 support.
Here is the issue. I have a simple string, "louis de funès". When the data is moved into the database manually as UTF8, it comes up correctly. The Oracle dump() shows:
WORKING DATA:
String: louis de funès
select keywords, dump(keywords, 17) from ame_links where keywords like '%louis de %';
Typ=1 Len=15: l,o,u,i,s, ,d,e, ,f,u,n,c3,a8,s
However, when the same string is inserted through the PHP application, the data shows up in the database like this:
NOT - WORKING:
String: louis de funès
select keywords, dump(keywords, 17) from ame_links where keywords like '%louis de %';
Typ=1 Len=17: l,o,u,i,s, ,d,e, ,f,u,n,c3,83,c2,a8,s
(The è character has 4 bytes associated with it)
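For reference, these four bytes are exactly what you get when the UTF-8 bytes of è (c3,a8) are read as Windows-1252 characters and then encoded to UTF-8 a second time. Below is a minimal Python sketch of that round trip. It assumes the misread code page is cp1252, which this post does not confirm, and it does not say which layer (PHP, ODBC driver, or Oracle client) would be doing it. The `as_dump` helper is a hypothetical formatter that only approximates the dump() output above.

```python
# Sketch: reproduce the dump() bytes, ASSUMING a cp1252 misread of UTF-8 input.
original = "louis de fun\u00e8s"                 # \u00e8 is the è character

utf8_bytes = "\u00e8".encode("utf-8")            # the correct 2-byte form
misread = utf8_bytes.decode("cp1252")            # each byte taken as one cp1252 character
double_encoded = misread.encode("utf-8")         # those 2 characters encoded to UTF-8 again


def as_dump(data: bytes) -> str:
    """Hypothetical helper: printable ASCII as characters, other bytes as hex."""
    return ",".join(chr(b) if 0x20 <= b < 0x7F else format(b, "x") for b in data)


working = as_dump(original.encode("utf-8"))
not_working = as_dump(original.encode("utf-8").decode("cp1252").encode("utf-8"))

print(working)      # 15 bytes, è stored as c3,a8
print(not_working)  # 17 bytes, è stored as c3,83,c2,a8
```

The second line carries the same two extra bytes (83,c2) as the NOT - WORKING dump, which would be consistent with a double conversion somewhere between the application and the database.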
Windows Setup:
Windows Registry: HKEY_LOCAL_MACHINE -> SOFTWARE -> ORACLE -> HOME0 ->
NLS_LANG=AMERICAN_AMERICA.UTF8
HTTP headers are set to Content-Type: text/html; charset=UTF-8.
Does anyone know why I would get 2 extra bytes (83,c2) added in the middle of the è character? Is the Oracle client doing some other type of character set conversion before the data is inserted into the database?
I have also noticed that when I change NLS_LANG from AMERICAN_AMERICA.UTF8 to AMERICAN_AMERICA.WE8MSWIN1252, the 4-byte 'è' character works and the 2-byte character doesn't.
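One possible reading of that last observation, sketched in Python. This assumes that with a WE8MSWIN1252 client setting the stored UTF8 characters are converted to Windows-1252 bytes on the way out, and that the page then treats those bytes as UTF-8 because of the Content-Type header. Both function names are hypothetical, and the model is a simplification rather than a description of what the Oracle client actually does internally.

```python
# Sketch: what a UTF-8 page would show if the client hands back cp1252 bytes.
def client_bytes(stored: bytes, client_charset: str) -> bytes:
    """Simulate converting stored UTF-8 data to the client character set on read."""
    return stored.decode("utf-8").encode(client_charset)


def shown_on_utf8_page(raw: bytes) -> str:
    """Simulate a browser decoding the returned bytes as UTF-8."""
    return raw.decode("utf-8", errors="replace")


two_byte = b"\xc3\xa8"            # è stored correctly
four_byte = b"\xc3\x83\xc2\xa8"   # è stored in the 4-byte form

from_four = shown_on_utf8_page(client_bytes(four_byte, "cp1252"))
from_two = shown_on_utf8_page(client_bytes(two_byte, "cp1252"))

print(ascii(from_four))  # '\xe8', i.e. è displays correctly
print(ascii(from_two))   # '\ufffd', i.e. a replacement character
```

Under these assumptions the 4-byte form collapses back to c3,a8 on read and looks right in the page, while the correctly stored 2-byte form comes back as the single byte e8, which is not valid UTF-8 on its own.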