9 Replies Latest reply: Mar 15, 2011 9:38 AM by user1045136

    BUG?? UTF-8 non-Latin database chars in IR csv export file not export right

    582691
      Hello,

      I have this issue: my database character set is UTF-8 (AL32UTF8), and a table used in an IR contains Greek (non-Latin) data. The data displays correctly in the IR, and also via a SELECT or in the Object Browser in SQL Workshop, but when I use Download as CSV, the Greek characters in the resulting CSV are not exported correctly, while the Latin ones are fine.

      The problem is the same in both IE and Firefox. The export to HTML, however, works, and I see the Greek characters there correctly!

      Is there any known issue with UTF-8 and non-Latin characters in CSV export from IRs? Can someone confirm this, or does anyone have a similar export problem with a UTF-8 DB and non-Latin characters?
      How can I solve this issue?


      TIA
        • 1. Re: Issue: UTF-8 non-Latin database chars in IR csv export file not export righ
          tfa
          Hi,

          Try setting the "Automatic CSV Encoding" to "Yes".

          To get there, go to "Shared Components", then click "Edit Definition" on the right-hand side of the page; the Automatic CSV Encoding item is under the "Globalization" tab.
          • 2. Re: Issue: UTF-8 non-Latin database chars in IR csv export file not export righ
            582691
            Hello,

            the "Automatic CSV Encoding" setting was already set to "Yes"; I didn't change anything, and the characters are still not exported correctly!

            Application Express : 4.0.2.00.07
            Database : 10.2.0.4.0
            NLS_CHARACTERSET : AL32UTF8
            Using: OHS 10g

            Has anyone tried this combination and seen whether non-Latin characters are exported correctly?
            It looks like a bug to me, and the issue is very serious!

            TIA
            • 3. Re: Issue: UTF-8 non-Latin database chars in IR csv export file not export righ
              582691
              What is also very interesting is that the export to HTML has no problems: the characters are exported correctly, as expected!

              Also:

              1. I tried to reproduce the issue on apex.oracle.com, which has the same character set, by putting some non-Latin characters in the field data, but I was not able to: there the characters are exported correctly!

              2. The settings in nls_session_parameters and nls_database_parameters are the same in my database and in the apex.oracle.com database.

              3. I checked Download as CSV in a database with a non-UTF-8 character set and got the same problem. I also checked with both the EPG 11g and OHS 10g configurations, and still the same problem.


              Can somebody please test this on a database other than the one on apex.oracle.com and check whether non-Latin characters in VARCHAR2 fields are exported (downloaded) to CSV correctly?

              It would also be very helpful if somebody from the APEX team could explain how they dealt with this issue on the apex.oracle.com database...


              Dionyssis

              Edited by: Dionyssis on Jan 17, 2011 4:55 PM
              • 4. BUG?? UTF-8 non-Latin database chars in IR csv export file not export right
                582691
                Well, I think I have located the problem. The Download as CSV process creates a text file in an ANSI encoding. When my database character set is UTF-8 and the data contains characters that belong neither to Latin nor to the character set of the Application Primary Language, they cannot be exported correctly to CSV.

                I tested this in apex.oracle.com and the issue is reproducible there.

                How can I solve this? It looks to me like a bug in APEX!

                Any suggestions ?

                This is a demonstration page in apex.oracle.com for the issue (Page 8):

                http://apex.oracle.com/pls/apex/f?p=20695:8:1977821579030462:::::


                Username: DKONTOMINAS@GMAIL.COM
                Password: mypass

                Edited by: Dionyssis on Jan 21, 2011 1:43 PM
                • 5. Re: BUG?? UTF-8 non-Latin database chars in IR csv export file not export right
                  joelkallman-Oracle
                  Hi Dionyssis,

                  Fortunately, this is not a bug in APEX.

                  Automatic CSV encoding in Application Express ultimately was implemented to work with the MS Excel localization behavior. If you're downloading Japanese characters in CSV to Excel, then the localized version of Excel expects the characters to be encoded in the local Windows character set (Shift JIS, I think). If you're downloading German characters in CSV to Excel, then the localized version of Excel expects windows-1252 encoding.

                  For Application Express, Automatic CSV Encoding is used in conjunction with the user's session language. In your case, if you go to the Globalization Attributes of your application 20695, I see that it's always English (en), which in APEX defaults to an encoding of windows-1252. I changed the application primary language to Greek.

                  Now when I download the CSV, it still looks like corrupted characters because my American Excel is expecting windows-1252, but I confirmed that the downloaded file is properly encoded in windows-1253, which should work with a Greek localized Excel.
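The encoding behavior described above can be sketched in Python. This is only an illustration, not APEX code: it assumes Python's `cp1252` and `cp1253` codecs as stand-ins for windows-1252 and windows-1253, and the helper `csv_bytes` and the sample row are hypothetical. Python raises an error for characters the target encoding cannot represent, whereas the original poster saw corrupted characters in the exported file.

```python
import csv
import io

def csv_bytes(rows, encoding):
    """Serialize rows to CSV text, then encode it with the given character set."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode(encoding)

rows = [["id", "name"], ["1", "Αθήνα"]]  # hypothetical sample data

greek = csv_bytes(rows, "cp1253")  # Greek primary language -> windows-1253
utf8 = csv_bytes(rows, "utf-8")    # what a UTF-8 export would contain

try:
    csv_bytes(rows, "cp1252")      # English primary language -> windows-1252
    latin_ok = True
except UnicodeEncodeError:
    latin_ok = False               # Greek letters do not exist in windows-1252
```

Both `greek` and `utf8` decode back to the same text with their own encodings; only the windows-1252 attempt cannot represent the Greek letters at all.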

                  Let me know if this works for you.

                  Joel
                  • 6. Re: BUG?? UTF-8 non-Latin database chars in IR csv export file not export right
                    582691
                    Hello Joel,

                    thanks for taking the time to answer my issue. This does not work in my case, because the source of the data (the database character set) is UTF-8. The data in the database that is shown in the IR on screen is UTF-8, and it is displayed correctly; you can see this in my example. The actual data in the database comes from multiple languages (English, Greek, German, Bulgarian, etc.), which is why I selected the UTF-8 character set when implementing the database, and this requirement applied to all character data. Also, Unicode is the character set Oracle suggests when you create a database that has to support data from multiple languages.

                    The requirement is that what I see in the IR (I mean on screen) must be exported to the CSV file correctly, and this is what I expect the Download as CSV feature to achieve. I understand that you had Excel in mind when implementing this feature, but a CSV (a Comma Separated Values file) is just an easy way to export data, not necessarily something to open directly in Excel. I also want to add that Excel can read the data as UTF-8 when importing from CSV, which is fine for my customer. Also, Excel 2008 and later understands a UTF-8 CSV file if you have placed the UTF-8 BOM character at the start of the file (well, it drops you into the wizard, but it's almost the same as importing).
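As a rough illustration of the BOM mentioned above, the following Python sketch prepends the UTF-8 byte order mark to an already exported file body. It runs outside APEX as a post-processing step; the function name `with_utf8_bom` and the sample export body are hypothetical, and nothing here reflects how APEX itself builds the download.

```python
import codecs

def with_utf8_bom(data: bytes) -> bytes:
    """Return data with a UTF-8 BOM prepended, unless one is already present."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data

# Hypothetical UTF-8 export body, standing in for a downloaded CSV file.
exported = "id,name\r\n1,Αθήνα\r\n".encode("utf-8")
marked = with_utf8_bom(exported)
```

The check for an existing BOM makes the function safe to apply twice, and Python's `utf-8-sig` codec strips the mark again when reading the file back.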

                    If I understood correctly, the feature you describe always creates an ANSI-encoded file, even when the database character set is UTF-8. That makes it impossible to export correctly when I have data that is neither Latin nor among the other 128 language-specific characters of the language I choose in the Globalization attributes, and this is the data that I see on screen and need to export to CSV. I believe that when the database character set is UTF-8, this feature should create a UTF-8 encoded CSV file and export correctly what I see on the screen, and I suspect that others would also expect this behaviour. Or at least you could allow or implement this behaviour when Automatic CSV Encoding is set to No. But I strongly believe, especially from the point of view of a user, that to have different things on screen and in the depicted CSV file is a bug, not a feature.

                    I would like to have comments on this from other people here too.

                    Dionyssis
                    • 7. Re: BUG?? UTF-8 non-Latin database chars in IR csv export file not export right
                      joelkallman-Oracle
                      Hi Dionyssis,

                      I modified your application, changing the Application Primary Language back to English and I set Automatic CSV Encoding to No.

                      Now when I run page 8 in your application and I download to CSV, it is properly encoded in UTF-8. Of course, when my version of Excel opens it up directly, the characters appear corrupted again because my version of Excel expects windows-1252 encoding. However, if I import the data in Excel (Data -> From Text), and I choose File Origin of 65001: Unicode (UTF-8), all of the data appears correct.
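The difference between opening the file directly and importing it with an explicit file origin can be sketched in Python. This is only an analogy for what Excel does, assuming the `cp1252` codec stands in for windows-1252; the helper `read_csv` and the sample bytes are hypothetical.

```python
import csv
import io

# Hypothetical UTF-8 export, standing in for the downloaded CSV file.
data = "id,name\r\n1,Αθήνα\r\n".encode("utf-8")

def read_csv(raw: bytes, encoding: str):
    """Decode raw CSV bytes with the given encoding and parse the rows."""
    text = raw.decode(encoding, errors="replace")
    return list(csv.reader(io.StringIO(text, newline="")))

opened_directly = read_csv(data, "cp1252")  # reader assumes windows-1252
imported = read_csv(data, "utf-8")          # reader is told the file is UTF-8
```

The bytes are identical in both calls; only the assumed encoding changes, which is why the same file looks corrupted in one case and correct in the other.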

                      >> to have different things in screen and in the depicted CSV file is a bug, not a feature.

                      I have now adjusted the settings of your application so that what is depicted in the CSV equals what is shown on screen, and thus this is not a bug in APEX.

                      Joel
                      • 8. Re: BUG?? UTF-8 non-Latin database chars in IR csv export file not export right
                        582691
                        Hello Joel,

                        thank you a lot for your response. In my case too, I managed to export correctly once I set Automatic CSV Encoding to No; the file just does not open correctly directly in Excel, but this is acceptable. By the way, it is not relevant to my case, but if you try to export to PDF in the same example I provided, it is not exported correctly.

                        Thanks again A LOT.

                        Dionyssis
                        • 9. Re: BUG?? UTF-8 non-Latin database chars in IR csv export file not export right
                          user1045136
                          Is there any way in APEX to have the BOM (Byte Order Mark) included in the Excel download? Or must I write my own custom Excel download to include the BOM?