
how to generate data with UTF8 format in an efficient way

Rajan Sw Member Posts: 344 Blue Ribbon

Hi,

I have a table, and the requirement is to spool the data into a .sql file.

When I spool the data without the UTF8 parameter, it is very quick and takes around a minute, but with the parameter in force it takes more than half an hour. The reason we are exporting in UTF8 is that the non-printable characters were getting corrupted otherwise.

The parameter is:

export NLS_LANG=AMERICAN_AMERICA.UTF8

The spool SQL is this one:



SET LINESIZE 32767
SET SERVEROUTPUT ON
SET FEEDBACK OFF
SET PAGESIZE 0
SET VERIFY OFF
SET TRIMSPOOL ON
SET TRIMOUT ON
SET TERMOUT OFF
SET HEADING OFF
SET ECHO OFF


spool vw_etl_codevalue.sql


SELECT 'INSERT INTO vw_etl_codevalue
(
category,
description,
messagecode,
messagetext,
codeorder,
deleted,
fininstkey,
txtlanguage,
codelang
)
VALUES
(
' || Decode(category,NULL,'NULL','''' || replace(category,'''','''''') || '''') || ',
' || Decode(description,NULL,'NULL','''' || replace(description,'''','''''') || '''') || ',
' || Decode(messagecode,NULL,'NULL','''' || replace(messagecode,'''','''''') || '''') || ',
' || Decode(messagetext,NULL,'NULL','''' || replace(messagetext,'''','''''') || '''') || ',
' || Decode(codeorder,NULL,'NULL','''' || replace(codeorder,'''','''''') || '''') || ',
' || Decode(deleted,NULL,'NULL','''' || replace(deleted,'''','''''') || '''') || ',
' || Decode(fininstkey,NULL,'NULL','''' || replace(fininstkey,'''','''''') || '''') || ',
' || Decode(txtlanguage,NULL,'NULL','''' || replace(txtlanguage,'''','''''') || '''') || ',
' || Decode(codelang,NULL,'NULL','''' || replace(codelang,'''','''''') || '''') || '
);'
FROM vw_etl_codevalue
/


Please share any thoughts on how we can achieve this without this OS parameter, or any other better way.

The current DB version is 12c R2.


Answers

  • Paulzip Member Posts: 8,374 Blue Diamond
    edited Apr 13, 2021 8:52AM

    Is your DB character set AL32UTF8? If not, then that would explain the speed issues as it will probably be converting everything.

    I think your query is very inefficient in the manner it builds its output.

    I would replace this pattern

    ' || Decode(category,NULL,'NULL','''' || replace(category,'''','''''') || '''') || ',

    with this:

    ' || nvl2(category, ''''||category||'''', 'null') || '),'
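    For context, a minimal sketch of that pattern applied to the query (two columns shown; the rest follow the same shape - note it drops the quote-escaping, so embedded single quotes would break the generated inserts):

    SELECT 'INSERT INTO vw_etl_codevalue (category, description) VALUES ('
           || nvl2(category,    '''' || category    || '''', 'null') || ','
           || nvl2(description, '''' || description || '''', 'null') || ');'
    FROM   vw_etl_codevalue;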

  • Rajan Sw Member Posts: 344 Blue Ribbon

    Thanks Paulzip. The DB character set is AL32UTF8, but I can still see a huge difference in the time taken with and without this parameter.

    I need to know if there is any other alternative way to achieve this.


    The replace function is there to handle any single quote in the data - it doubles it up to escape it - so I don't think we can drop that.

  • Jonathan Lewis Member Posts: 9,682 Gold Crown

    I haven't tried this with character set conversion going on, but you could try taking advantage of the 12c enhancement to the "set markup" command that allows for CSV output. The method you use for the subsequent reloading will have to change - part of your overhead will be the cost of all those calls to decode and concatenate, and then you're going to be doing single-row inserts to reload the data, when you could use an external table to reload the data at high speed from a "naked" CSV.

    Sample code: 
    select * from t1;
    
    V1         V2                 N1
    ---------- ---------- ----------
    abcdef                        99
    abc'def    abc
    abc"def                       99
    abc,def    abc
    
    Enable CSV markup
    set markup csv on delimiter ',' quote on
    
    "V1","V2","N1"
    "abcdef",,99
    "abc'def","abc",
    "abc""def",,99
    "abc,def","abc",
    
    

    I've set a comma separator and enabled quotes - the latter is strictly fixed as the double-quote mark, and any double-quote marks in character output are duplicated so that they survive being read back. The delimiter character can be anything you want.

    Try this SET option with all the other setup you've done, get rid of all the decode() and concatenation and the fixed "insert into..." text, and see what you get. Then you could check whether a simple external table definition works well for the reload - see the sketch below.
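    For instance, a minimal external table sketch for the reload - the directory object data_dir, the file name, and the column sizes are illustrative assumptions, not taken from the thread:

    CREATE TABLE vw_etl_codevalue_ext (
      category     VARCHAR2(100),
      description  VARCHAR2(400),
      messagecode  VARCHAR2(100),
      messagetext  VARCHAR2(4000),
      codeorder    VARCHAR2(100),
      deleted      VARCHAR2(10),
      fininstkey   VARCHAR2(100),
      txtlanguage  VARCHAR2(100),
      codelang     VARCHAR2(100)
    )
    ORGANIZATION EXTERNAL (
      TYPE oracle_loader
      DEFAULT DIRECTORY data_dir
      ACCESS PARAMETERS (
        RECORDS DELIMITED BY NEWLINE
        CHARACTERSET AL32UTF8
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        MISSING FIELD VALUES ARE NULL
      )
      LOCATION ('vw_etl_codevalue.csv')
    )
    REJECT LIMIT UNLIMITED;

    -- then reload with one set-based statement instead of row-by-row inserts
    INSERT INTO vw_etl_codevalue SELECT * FROM vw_etl_codevalue_ext;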


    Regards

    Jonathan Lewis

  • Rajan Sw Member Posts: 344 Blue Ribbon

    Thanks Jonathan for the reply. My requirement is not to generate CSV files, but rather the insert statements.


    My primary question is that with the parameter, Oracle takes more than half an hour, while without it the same spool takes less than 5 minutes.

    I need to know whether we can achieve the same result through other options.

  • Paulzip Member Posts: 8,374 Blue Diamond
    edited Apr 13, 2021 9:40AM

    Use q-strings then:

    nvl2(category, 'q''`'||category||'`''', 'null')
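    That wraps each value in a q-quoted literal, so embedded single quotes no longer need doubling - with the assumption that the data never contains a backtick, otherwise the literal breaks:

    -- value:   O'Brien
    -- output:  q'`O'Brien`'
    SELECT nvl2(category, 'q''`' || category || '`''', 'null') AS lit
    FROM   vw_etl_codevalue;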

  • Jonathan Lewis Member Posts: 9,682 Gold Crown

    If you've GOT to generate the insert statements, then there are a few things to test that might affect the time.

    I suspect the excess time is probably something to do with SQL*Net translating the output string from the server character set to client character set. But it's worth knowing that SQL*Net has a built-in compression feature (See: https://jonathanlewis.wordpress.com/2010/05/07/sqlnet-compression/ ) that may allow you to reduce the work done in your case.

    The "compression" is actually "de-duplication" of previous columns - and in your case you have only one column and it probably shows virtually no duplicates. However if you split your select statement into TWO columns, where the first column is always the 'insert into table(list of columns) values(" bit and the second column is all the concatenation then SQL*Net could reduce the first column to "once per SQL*Net packet" rather than once per row. I don't know WHEN SQL*Net does it's translation, but if it de-duplicates before translating than this is a winner.

    Test 1: split the output into 2 columns to see what happens.
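    A minimal sketch of that split, keeping the quote-doubling from the original query (two data columns shown; extend the same pattern to the rest):

    SELECT 'INSERT INTO vw_etl_codevalue (category, description) VALUES (' AS fixed_part,
           nvl2(category,    '''' || replace(category,    '''', '''''') || '''', 'NULL') || ',' ||
           nvl2(description, '''' || replace(description, '''', '''''') || '''', 'NULL') || ');'  AS data_part
    FROM   vw_etl_codevalue;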

    Note: the larger the packet size, the more rows you get per packet and the greater the benefit of de-duplication. So check your SQL*Net setting for SDU_SIZE (and maybe the (tcp) send_buf_size and recv_buf_size - but their defaults tend to be adequate on most platforms nowadays).
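    For reference, the SDU can be raised in the client's sqlnet.ora or per connect descriptor in tnsnames.ora (values and names illustrative; both ends of the connection have to support the size requested):

    # sqlnet.ora
    DEFAULT_SDU_SIZE = 65535

    # tnsnames.ora
    MYDB =
      (DESCRIPTION =
        (SDU = 65535)
        (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost)(PORT = 1521))
        (CONNECT_DATA = (SERVICE_NAME = mydb))
      )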

    Test 2: only select the data columns, not the constant column

    This is just to find out how much of the time relates to the stuff you have to do.

    Test 3: experiment using just the 2nd column to compare the work done by decode(), nvl2(), and coalesce().

    It's possible that one of these will be better than the others - though that may not matter much, given that the time for executing the code without the translation is adequate anyway.

    Regards

    Jonathan Lewis

  • User_H3J7U Member Posts: 259 Silver Badge

    Try to use sqlcl:

    set encoding utf-8
    set sqlformat insert
    select ....
    

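    A fuller session might look like this (file name illustrative); sqlcl's insert formatter emits ready-to-run INSERT statements:

    set encoding utf-8
    set sqlformat insert
    spool vw_etl_codevalue.sql
    select * from vw_etl_codevalue;
    spool off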

  • Sergiusz Wolicki-Oracle Member Posts: 2,019 Employee

    If your database is AL32UTF8, then set NLS_LANG to AMERICAN_AMERICA.AL32UTF8. With UTF8 (deprecated), there is still conversion, even though it should be very fast. In any case, a difference between 1 min and 30 mins is hardly explainable as cost of character set conversion. Character set conversion should add a few percent, not 30x more time.
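    That is:

    export NLS_LANG=AMERICAN_AMERICA.AL32UTF8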

    One possible cause might be a problem with the execution plan caused by different NLS settings. When you set NLS_LANG, the session runs with settings defined in the client environment, which includes NLS_LANG and other environment variables with names starting with NLS_, for example NLS_SORT. If NLS_LANG is undefined, all NLS parameters for the session (except character set, which defaults to US7ASCII) are taken from init.ora/spfile. Of course, if there is an ON LOGON trigger defined, it can override settings in both cases.

    Include the following query before or after the original one in the spooling script and compare the results for both of your setups:

    SELECT * FROM NLS_SESSION_PARAMETERS ORDER BY 1


    You can also add SET AUTOTRACE ON before your query and compare the execution plans. Post the results here if you want me to take a look.
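    Sketched together (spool file name is illustrative), run the script once with and once without NLS_LANG set, then diff the two outputs:

    SET AUTOTRACE ON
    SPOOL nls_check.lst
    SELECT * FROM NLS_SESSION_PARAMETERS ORDER BY 1;
    -- ... the original INSERT-generating query goes here ...
    SPOOL OFF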


    Thanks,

    Sergiusz

    Oracle Database Development
