6 Replies Latest reply on Aug 10, 2015 1:14 PM by stueckl

    How to load diacritics in non-UTF-8 csv files with SQLcl-4.2.0.15.177.0246 (Windows 2012 R2)

    stueckl

      Hi,

       

      I'm trying to load non-UTF-8 files with SQLcl, but weren't successful so far ... please advise.

      BTW: is there a manual available for SQLcl (especially as "help load" just shows a blank line )?

       

      As far as I can reproduce it, SQLcl ignores the jvm option "file.encoding" completely regarding "load"

       

      Basically I ran into four problems:

      1. "SQLcl load" only works with UTF-8 files
      2. file.encoding is also used for the SQL script (well ... THAT may be intended)
      3. CSV with  "",  as a quoted text cannot be loaded
      4. CSV with ""  as a quoted text is not correctly interpreted

       

      I created a little test case to describe the problem ...

      Please note that although the testcase DOES spool data, this is just for the test case ... the data i need to import are provided from an external source.

       

      The table I used for the testcase

      create table ZCBRMDST_OneColumn (onecolumn varchar2(20 char));
      
      

       

      The commands to enter test data into the test table (needed, because "file.encoding" is also used for loading the script - thus you need an alternate tool to fill the column):

      truncate table SpeedySearch.ZCBRMDST_OneColumn;
      insert into SpeedySearch.ZCBRMDST_OneColumn values (' - "äöü",ÄÖÜ,ß,€');
      commit;
      
      

       

      The script (diacritics.spool.sql) to spool the data from the test table (as is, without comma, without quotes, without commas and quotes):

      set feedback off
      set echo off
      spool diacritics.csv.quotecomma
      select /*csv*/ 'qc' || oneColumn oneColumn from SpeedySearch.ZCBRMDST_OneColumn;
      spool off
      spool diacritics.csv.comma
      select /*csv*/ 'c' || replace(oneColumn, '"') oneColumn from SpeedySearch.ZCBRMDST_OneColumn;
      spool off
      spool diacritics.csv.quote
      select /*csv*/ 'q' || replace(oneColumn, ',') oneColumn from SpeedySearch.ZCBRMDST_OneColumn;
      spool off
      spool diacritics.csv.pure
      select /*csv*/ 'p' || replace(replace(oneColumn, '"'), ',') oneColumn from SpeedySearch.ZCBRMDST_OneColumn;
      spool off
      exit;
      

       

      The script was run with several "file.encoding" values ... UTF-8, ISO-8859-1, ISO-8859-15, IBM850 (resp. Cp850).

      The file encodings of the four files were checked (with Notepad++, iconv from CygWin, type, cat, ...).

       

      Then the files were imported back with this script (diacritics.load.sql):

      set feedback off
      set echo off
      set sqlformat ansiconsole
      truncate table SpeedySearch.ZCBRMDST_OneColumn;
      prompt importing diacritics.csv.quotecomma
      load SpeedySearch.ZCBRMDST_OneColumn diacritics.csv.quotecomma
      select * from SpeedySearch.ZCBRMDST_OneColumn;
      prompt importing diacritics.csv.comma
      load SpeedySearch.ZCBRMDST_OneColumn diacritics.csv.comma
      select * from SpeedySearch.ZCBRMDST_OneColumn;
      prompt importing diacritics.csv.quote
      load SpeedySearch.ZCBRMDST_OneColumn diacritics.csv.quote
      select * from SpeedySearch.ZCBRMDST_OneColumn;
      prompt importing diacritics.csv.pure
      load SpeedySearch.ZCBRMDST_OneColumn diacritics.csv.pure
      select * from SpeedySearch.ZCBRMDST_OneColumn;
      exit;
      
      

       

      This script was run with the same "file.encoding" as the spool-script.

      e.g.

      java -Djava.awt.headless=true -Xss10M -Dfile.encoding=IBM850 -jar "%SQL_HOME%\lib\oracle.sqldeveloper.sqlcl.jar" <user>/<pwd>@//<server>:<port>/<instance> @diacritics.spool.sql
      java -Djava.awt.headless=true -Xss10M -Dfile.encoding=IBM850 -jar "%SQL_HOME%\lib\oracle.sqldeveloper.sqlcl.jar" <user>/<pwd>@//<server>:<port>/<instance> @diacritics.load.sql
      

       

       

      The results are as follows:

      • "diacritics.csv.quotecomma" never ever works ... it always fails to correctly interpret the quotes within quotes when followed by a comma
      • "diacritics.csv.quote" is not correctly re-imported (doubled doublequotes during export are not converted back to single doublequotes)
        ""äöü""ÄÖÜ߀  is imported as  ""äöü""ÄÖÜ߀  instead of  "äöü"ÄÖÜ߀
      • the other two work, WHEN and only when the encoding of the file is UTF-8
        Every other encoding just produces unreadable content.

       

       

      So ... I am doing something wrong or did I run into a bug (er well ... as for me that's more a  " B U G "  ;-) ).

       

      PS: before I forget ... ASCII does work too ... but that's not really a surprise, isn't it?

       

       

      Thanks in advance

      Peter

        • 1. Re: How to load diacritics in non-UTF-8 csv files with SQLcl-4.2.0.15.177.0246 (Windows 2012 R2)
          dhalek

          Well, this may be a little bit off topic in this forum, but have you tried using external tables or SQL*Loader? These are more specialized in loading data into oracle than sqlcl where I consider the load command as a convenience function.

          Besides that, I would try to use a field separator that does not occur in the data itself. That simplifies things a lot.

           

          Hope that helps,

          dhalek

          • 2. Re: How to load diacritics in non-UTF-8 csv files with SQLcl-4.2.0.15.177.0246 (Windows 2012 R2)
            stueckl

            Well ... the idea was NOT to use a fat client ... external tables are out of reach as I have no access to the physical disks (which would be needed for external tables).

             

            Using another field-separator sounds like a good idea ... if i only knew to tell "load" how to use it ...

            And in the long run it isn't guaranteed that the other separator won't occur in the data sometime.

            CSV has a solution for all that ...

             

            Besides that using SQLcl for loading files is so much advertised (and the speed is just  W O W).

            So I just expect to work it ... the files are neither special nor something else ... just a widely used CSV format and a non UTF-8 encoding ... serious ... the above test case shows, that SQLcl is unable to re-import it's own spooled files!

            And that's not because the created CSV is buggy (it's just plain beautiful ... just as Excel or a CSVWriter would like it).

             

            Peter

            • 3. Re: How to load diacritics in non-UTF-8 csv files with SQLcl-4.2.0.15.177.0246 (Windows 2012 R2)
              thatJeffSmith-Oracle

              Weird, somehow the Help text got wiped. We'll add it back for the next update.

               

              LOAD=LOAD

              ---------

               

              Loads a comma separated value (csv) file into a table.\n\

              The first row of the file must be a header row.  The columns in the header row must match the columns defined on the table.

               

              The columns must be delimited by a comma and may optionally be enclosed in double quotes.

              Lines can be terminated with standard line terminators for windows, unix or mac.

              File must be encoded UTF8.

               

              The load is processed with 50 rows per batch.

              If AUTOCOMMIT is set in SQLCL, a commit is done every 10 batches.

              The load is terminated if more than 50 errors are found.

               

              LOAD [schema.]table_name[@db_link] file_name

              1 person found this helpful
              • 4. Re: How to load diacritics in non-UTF-8 csv files with SQLcl-4.2.0.15.177.0246 (Windows 2012 R2)
                stueckl

                Thanks Jeff ... that was the info I was missing

                File must be encoded UTF8.

                besides that, everything was mentioned in your blog though .

                BTW: it's not

                The first row of the file must be a header row.

                but rather "The first row of the file containing text must ..." as blank lines are skipped.

                And please add the info, that the header row is case sensitive. My eyes almost popped, when I ran into the error message caused by mixed case field names.

                 

                As you did not mention all the other things (especially "SQLcl cannot load well-formed CSV produced by itself"), I am positive the SQLcl team is going to fix that ... right?

                 

                BTW ... you get a batch for producing an extra large "W T F !" on my face (because I never ever expected there to be problems with importing CSV ... mainly because there are really great libs around for that).

                • 5. Re: How to load diacritics in non-UTF-8 csv files with SQLcl-4.2.0.15.177.0246 (Windows 2012 R2)
                  thatJeffSmith-Oracle

                  >>As you did not mention all the other things (especially "SQLcl cannot load well-formed CSV produced by itself"), I am positive the SQLcl team is going to fix that ... right?

                  Give me a test case, I'll be happy to test it and log it.

                   

                  >>mainly because there are really great libs around for that).

                  We use a library for reading in and writing out Excel files, but everything else exporting and importing data is our code.

                   

                  The header row case-sensitive stuff...that's probably b/c Java and Oracle are case-sensitive. If the headers are quoted by default then it's probably expected behavior.

                  1 person found this helpful
                  • 6. Re: How to load diacritics in non-UTF-8 csv files with SQLcl-4.2.0.15.177.0246 (Windows 2012 R2)
                    stueckl

                    Hi,

                     

                    the headers are quoted when using spool ... and in that case case-sensitiveness is fine (it's just the Oracle standard behaviour) ... but when I define the header mixed-case WITHOUT quotes I'd expect SQLcl to behave "Oracle-standard-alike" too (that is ... forget the case, look for the column name case-insensitive) ...

                     

                    Exporting CSV really works great ... just as you would like expect it to work (besides the ever so often seen US-only format with a comma as separator).

                    Thus just didn't expect any problems when reimporting the spooled CSV ... but ... please see above.

                     

                     

                    And for the test case ...

                    >>As you did not mention all the other things (especially "SQLcl cannot load well-formed CSV produced by itself"), I am positive the SQLcl team is going to fix that ... right?

                    Give me a test case, I'll be happy to test it and log it.

                    May I ask you to look into my very first post? There should be everything you need (just not the SQL and console scripts).

                     

                    Would copy it here again ... but answers have limited formatting options

                     

                    • the first code snippet creates the test table
                      create table ZCBRMDST_OneColumn (onecolumn varchar2(20 char));
                    • the second code snippet inserts test data (well ... just German diacritics) ... it is NO script on purpose
                      truncate table ZCBRMDST_OneColumn;
                      insert into ZCBRMDST_OneColumn values ('"äöü",ÄÖÜ,ß,€');
                      commit;
                    • the third and the fourth code snippet do the "spool" and "load" ... as the main problem are the double-quotes for CSV spooling, you only need ...
                      let's say you save it as "spool_load_csv.sql"
                      set feedback off
                      set echo off
                      set sqlformat ansiconsole
                      prompt spooling diacritics with quoted text and comma
                      spool diacritics.csv.quotecomma

                      select /*csv*/ 'qc' || oneColumn oneColumn from ZCBRMDST_OneColumn;
                      spool off
                      prompt show content of table
                      select * from ZCBRMDST_OneColumn;
                      prompt empty (delete) test table
                      delete ZCBRMDST_OneColumn;
                      prompt importing diacritics.csv.quotecomma

                      load ZCBRMDST_OneColumn diacritics.csv.quotecomma
                      prompt show what's have been imported
                      select * from ZCBRMDST_OneColumn;
                      rollback; -- for next test run

                      exit;
                    • the last code snippet is just to invoke SQLcl (with various encodings ... let's stick to UTF-8 here)
                      java -Djava.awt.headless=true -Xss10M -Dfile.encoding=UTF-8 -jar "%SQL_HOME%\lib\oracle.sqldeveloper.sqlcl.jar" <user>/<pwd>@//<server>:<port>/<instance> @spool_load_csv.sql

                     

                    Right now SQLcl won't import anything, as it thinks (50 times) there are unbalanced quotes ...

                     

                     

                    Regards