21 Replies Latest reply: Apr 25, 2012 5:12 AM by 926113

    ODBC and UTF8 charset

    926113
      Win7 x32
      Oracle XE 11g

      How do I set a UTF-8 character set?

      My script is connecting via PDO ODBC: Driver={Oracle in XE};Dbq=mydb;Uid=user;Pwd=psw;

      NLS_LANG in the registry is set to AMERICAN_AMERICA.AL32UTF8.

      Windows locale is set to US.

      When I insert the word "тест", it looks like "тест" in the DB.
        • 1. Re: ODBC and UTF8 charset
          926113
          Any small hint :)
          • 2. Re: ODBC and UTF8 charset
            orafad
            Does the client input use UTF-8 encoding? Otherwise NLS_LANG is likely wrong and should instead be set to indicate the actual client character set (e.g. the Windows ANSI code page).
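A simplified model of the point above, assuming Oracle converts client bytes from the NLS_LANG character set to the database character set, and stores them unchanged when both are the same. The mapping `ORACLE_TO_PYTHON` and the function `client_to_db` are hypothetical and only illustrate why NLS_LANG must describe the bytes the client actually sends.

```python
# Hypothetical mapping from Oracle NLS character set names to Python codecs.
ORACLE_TO_PYTHON = {"AL32UTF8": "utf-8", "CL8MSWIN1251": "cp1251"}

DB_CHARSET = "AL32UTF8"  # the only database character set in 11g XE


def client_to_db(client_bytes: bytes, declared_charset: str) -> bytes:
    """Simplified model: turn bytes sent by the client into stored bytes.

    declared_charset is the character set part of NLS_LANG. It is a claim
    about the client's bytes; this model (like Oracle) cannot verify it.
    """
    if declared_charset == DB_CHARSET:
        # Pass-through: no conversion, so the bytes must already be UTF-8.
        return client_bytes
    text = client_bytes.decode(ORACLE_TO_PYTHON[declared_charset])
    return text.encode(ORACLE_TO_PYTHON[DB_CHARSET])


word = "тест"
ok_utf8 = client_to_db(word.encode("utf-8"), "AL32UTF8")        # declared matches actual
ok_1251 = client_to_db(word.encode("cp1251"), "CL8MSWIN1251")   # declared matches actual
mismatch = client_to_db(word.encode("utf-8"), "CL8MSWIN1251")   # UTF-8 bytes declared as 1251

print(ok_utf8.hex(","))    # d1,82,d0,b5,d1,81,d1,82
print(ok_1251 == ok_utf8)  # True
print(len(mismatch))       # 18: each UTF-8 byte was read as a 1251 character and re-encoded
```

Both correctly declared inputs end up as the same stored bytes; only the mismatched declaration corrupts the data.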
            • 3. Re: ODBC and UTF8 charset
              926113
              Does client input use UTF-8 encoding?
              Yes, the PHP file is saved as UTF-8 without BOM.

              When I insert from a PHP script encoded in the Windows-1251 character set, it works correctly. Does that mean the DB is not set to UTF-8?
              • 4. Re: ODBC and UTF8 charset
                orafad
                11.2 XE means the database character set is AL32UTF8 only.

                In a pass-through scenario (client and database character sets are the same), the character codes sent must already be UTF-8. Verify that the data source is encoded correctly.
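One quick way to verify the data source, sketched here with a hypothetical helper `is_valid_utf8`, is to check that the bytes the script sends decode as UTF-8. Note that this only rules out non-UTF-8 input: already garbled text can still be perfectly valid UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as UTF-8 (required for pass-through)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("тест".encode("utf-8")))   # True
print(is_valid_utf8("тест".encode("cp1251")))  # False: f2,e5,f1,f2 is not valid UTF-8
```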
                • 5. Re: ODBC and UTF8 charset
                  orafad
                  >
                  When I insert word "тест" it looks like "тест" in DB.
                  What tool did you use to look at that word?

                  Query it with Oracle SQL Developer, since it is Unicode-enabled, and in addition verify the code points stored:

                  select column, dump(column, 1016) from table where suitable_condition ...;
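In DUMP(expr, 1016), format 16 means hexadecimal and adding 1000 makes the result include the character set name. The sketch below mimics that output format offline for a character value; `dump_1016` is a hypothetical helper that assumes a VARCHAR2 column (Typ=1) in an AL32UTF8 database and does not query Oracle.

```python
def dump_1016(value: str, typ: int = 1) -> str:
    """Mimic the text DUMP(col, 1016) returns for an AL32UTF8 character value."""
    raw = value.encode("utf-8")                  # bytes as stored in AL32UTF8
    hex_bytes = ",".join(f"{b:x}" for b in raw)  # comma-separated hex bytes
    return f"Typ={typ} Len={len(raw)} CharacterSet=AL32UTF8: {hex_bytes}"


print(dump_1016("тест"))
# Typ=1 Len=8 CharacterSet=AL32UTF8: d1,82,d0,b5,d1,81,d1,82
```

Comparing this expected string with the real DUMP result is a quick way to tell correctly stored data from double-encoded data.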
                  • 6. Re: ODBC and UTF8 charset
                    926113
                    What tool did you use to look at that word?
                    Oracle SQL Developer.
                    and in addition to verify code points stored
                    It returns: Typ=1 Len=8 CharacterSet=AL32UTF8: d1,82,d0,b5,d1,81,d1,82
                    • 7. Re: ODBC and UTF8 charset
                      orafad
                      The stored codes represent Unicode code points U+0442 U+0435 U+0441 U+0442, which is correct for the word in the original post.
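The mapping from the dumped bytes back to code points can be checked directly. This short sketch decodes the hex bytes from the DUMP output as UTF-8 and prints each code point next to its byte pair.

```python
stored = bytes.fromhex("d182d0b5d181d182")  # bytes from the DUMP output
text = stored.decode("utf-8")

for ch in text:
    # Each Cyrillic letter here is one code point encoded as two UTF-8 bytes.
    print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(","))
# U+0442 d1,82
# U+0435 d0,b5
# U+0441 d1,81
# U+0442 d1,82

print(text == "тест")  # True
```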

                      Looks like you have a presentation problem. Which version of SQL Developer are you using? Verify that Unicode fonts are set up.

                      Also, what type of connection? SQL Developer generally uses JDBC, for which NLS_LANG does not matter.
                      • 8. Re: ODBC and UTF8 charset
                        926113
                        Looks like you have a presentation problem. Version of SQL Developer used?
                        I use v3.1.07, but there actually isn't a problem with SQL Developer: when I use it to view, insert or update the data, it works correctly. The problem occurs when I insert or update (SELECT is OK) from a PHP script. I update a table from a PHP script (UTF-8 encoded) using the PDO ODBC driver, but the data is saved as Windows-1251. If I save the PHP script in Windows-1251, then the data is stored as UTF-8. Is it possible that the ODBC driver converts the character set during inserts or updates?

                        PS: Before all this, I migrated all the data from a MySQL DB (which was UTF-8 encoded) to the Oracle DB using SQL Developer. I suppose it was migrated without converting the character set.
                        • 9. Re: ODBC and UTF8 charset
                          orafad
                          >
                          When I insert word "тест" it looks like "тест" in DB.
                          So, I have to repeat the question:

                          What tool did you use to look at that word?
                          • 10. Re: ODBC and UTF8 charset
                            926113
                            What tool did you use to look at that word?
                            SQL developer last version. Also by selecting this inserted word from DB in php script and outputting it in a browser (the page is set to utf8 charset).
                            • 11. Re: ODBC and UTF8 charset
                              orafad
                              923110 wrote:
                              What tool did you use to look at that word?
                              SQL developer last version. Also by selecting this inserted word from DB in php script and outputting it in a browser (the page is set to utf8 charset).
                              I'm lost. I'll try a different way of getting clarification:

                              In first post it was stated
                              ... "тест" it looks like "тест"
                              Where, exactly?

                              Assuming it happens only in the PHP script, the UTF-8 codes are output as-is but interpreted in a single-byte character set (0xD1 0x82 is "С‚" in Windows-1251, but U+0442, Cyrillic small letter TE, in UTF-8).
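The misreading described above is easy to reproduce: take the correct UTF-8 bytes for "тест" and decode them as Windows-1251, one character per byte. Because no information is lost, reversing the steps recovers the original word.

```python
utf8_bytes = "тест".encode("utf-8")   # d1,82,d0,b5,d1,81,d1,82
misread = utf8_bytes.decode("cp1251")  # each byte read as one Windows-1251 character
print(misread)                         # С‚РµСЃС‚

# Reversing the mistake restores the original text.
print(misread.encode("cp1251").decode("utf-8"))  # тест
```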
                              Is the browser's detection of encoding working?
                              • 12. Re: ODBC and UTF8 charset
                                926113
                                Where, exactly?
                                Connections -> MyConnection -> Other users -> MyUser -> Tables -> MyTable -> Data Tab
                                Is the browser's detection of encoding working?
                                Yes

                                The migrated data is displayed correctly, and data inserted/updated through SQL Developer is also displayed correctly; the only problem is with new data inserted/updated from the PHP script.
                                • 13. Re: ODBC and UTF8 charset
                                  926113
                                  I changed the Windows locale from US to Russian and NLS_LANG to russian_russia.cl8mswin1251.

                                  So now the problem is also changed:

                                  SELECT/INSERT/UPDATE of NEW data from the PHP script (UTF-8 encoded) is now OK,

                                  but the data migrated with SQL Developer is now returned as Windows-1251 and cannot be output correctly as UTF-8 by the PHP script.
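One possible reading of this new behavior, under a simplified model: with NLS_LANG set to CL8MSWIN1251, the UTF-8 bytes from the script are treated as Windows-1251 on insert and converted back on fetch, so they round-trip unchanged, while correctly stored UTF-8 data is converted to Windows-1251 on fetch. The functions below are hypothetical, the `cp1251` codec stands in for CL8MSWIN1251, and the model ignores any extra conversion in the ODBC layer.

```python
def client_to_db(client_bytes: bytes, client_codec: str) -> bytes:
    """Simplified model: convert client bytes to AL32UTF8 database bytes."""
    return client_bytes.decode(client_codec).encode("utf-8")


def db_to_client(db_bytes: bytes, client_codec: str) -> bytes:
    """Simplified model: convert AL32UTF8 database bytes to the client charset."""
    return db_bytes.decode("utf-8").encode(client_codec)


CLIENT = "cp1251"                   # assumed codec for NLS_LANG=...CL8MSWIN1251
php_bytes = "тест".encode("utf-8")  # what a UTF-8 encoded PHP script sends

# New data: both conversions use the same wrong assumption and cancel out.
stored_new = client_to_db(php_bytes, CLIENT)    # double-encoded in the DB
fetched_new = db_to_client(stored_new, CLIENT)
print(fetched_new == php_bytes)                 # True: looks fine in PHP

# Migrated data: stored correctly as UTF-8, but converted to 1251 on fetch.
stored_migrated = "тест".encode("utf-8")
fetched_migrated = db_to_client(stored_migrated, CLIENT)
print(fetched_migrated.hex(","))                # f2,e5,f1,f2: 1251 bytes on a UTF-8 page
```

If this model matches what is happening, DUMP on a newly inserted row would show more bytes than the eight seen for the migrated data.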
                                  • 14. Re: ODBC and UTF8 charset
                                    orafad
                                    >
                                    there is only problem with new inserted/updated data from the php script.
                                    OK, this is new information, and probably closer to the problem.

                                    Insert a new row with character data via PHP and apply the DUMP function to that row, e.g.:

                                    select column, dump(column, 1016) from table where suitable_condition_to_retrieve_newly_inserted_row ...;