This discussion is archived
13 Replies Latest reply: Oct 5, 2012 1:07 PM by jihuyao RSS

Optimize function to correct a string to be converted to number

user414983 - oracle Newbie
Currently Being Moderated
Hi,

I have imported almost 6.5 milion rows into a table (using sql loader) from a flat file (which contains the NUL caracter (ASCII 0) - not space, not NULL). To make the load easier I set the datatype as VARCHAR2 for all columns. For columns with string data I used a TRIM(REPLACE(field,' ','')) to get rid of the NUL and it works.

But for the column that must be converted to numeric I am trying to use to_number () function and it fails because the numeric data is mixed with other characters.

To solve this problem I created this function :
"
CREATE OR REPLACE
FUNCTION string_to_number
(p_string_source IN VARCHAR2 ) RETURN VARCHAR2 IS v_output_string varchar2(150);


-- This function takes a string as parameter and has 2 outputs:
-- 1. If the source string cannot be converted to number throw 'Error' as the output message to identify the line with the issue
-- 2. A string that can be converted successfully as Number;
-- The necessity of this function came up after an import from a flat file where the resulted string contained strange characters shown as spaces

--check if '-' is exists and is on the first position or if in the source string exists more than one '.'
-- ASCII codes accepted in the string :
-- 45 '-' ; 46 '.' ; from 48 (0) to 57 (9)

BEGIN
DECLARE v_minus VARCHAR2(1);
v_dot INTEGER;
BEGIN
SELECT substr(p_string_source,1,1) INTO v_minus FROM dual;
SELECT instr(p_string_source,'.',1,2) INTO v_dot FROM dual;


--check if '-' is exists and is on the first position or if in the source string exists more than one '.'
-- ASCII codes accepted in the string :
-- 45 '-' ; 46 '.' ; from 48 (0) to 57 (9)

IF v_minus NOT IN ('-','.','0','1','2','3','4','5','6','7','8','9')

-- or there are two dots '.' in the string
OR v_dot <> 0
THEN v_output_string := 'Error';
ELSE
BEGIN

-- for every character of the string we'll check if it's a number to add it to the outcome string;
-- if it's not an accepted character it will be ignored
DECLARE v_length_source int := length(p_string_source);
v_counter int :=1;
v_add_in_number VARCHAR2(1);

BEGIN
FOR v_counter IN 1..v_length_source LOOP
BEGIN
SELECT SUBSTR(p_string_source,v_counter,1) into v_add_in_number from dual;
IF v_add_in_number IN ('-','.','0','1','2','3','4','5','6','7','8','9')
THEN v_output_string := v_output_string ||v_add_in_number;
END IF;
END;
END LOOP;
END;
-- in case the string is in format '.00034' we'll add a 0 in front of the string to be accepted as argument by TO_NUMBER function
IF v_minus = '.'
THEN v_output_string := '0'||v_output_string;
ELSE
BEGIN
v_output_string := v_output_string;
END;
END IF;
END;
END IF;

END;
RETURN v_output_string;
END;
"
The main idea is to check every string (the parameter will be the value from the Amount column) for permitted characters that compose a numeric value:
1. To begins with numeric, '-' or '.'
2. To have only one '.' (as a decimal separator);
3. To compare every character of the string with the permitted ones - the non-compliant will be rejected

This way the resulting string (v_output_string) will be successfully converted to number


I admit that I don't have much experience using PL/SQL that is why I am asking your help to optimize this function to improve its performance. Could you help me on this, please ?

TIA,

JohnP

Edited by: petresion on 04-Oct-2012 01:33
  • 1. Re: URGENT - Pls help to optimize this function
    BluShadow Guru Moderator
    Currently Being Moderated
    Please read the FAQ {message:id=9360002}, especially point 2, but also the other points, and edit your post accordingly. Then you may get more effective response to your question.
  • 2. Re: URGENT - Pls help to optimize this function
    Peter Gjelstrup Guru
    Currently Being Moderated
    Hi John,

    This is what I would do.

    1. Never write a single line of PL/SQL for such task
    2. Not use sqlldr, but an external table.
    3. Create all columns for external table as varchar2
    4. Create staging table having proper data types, column sizes and check constraints
    5. Create an error log for staging table (DML Error Logging)
    6. Load into staging table using select from external table
    7. During load convert chr(0) to NULL using NULLIF and convert the varchars to their proper types using built in functions like TO_CHAR and TO_DATE.
    8. Never use the U-word in a public forum of volunteers


    Result will be

    All "good" records are in staging table
    All "bad" records are in the error log table
    The load does not fail
    You have cleaned data without coding anything, everything is declarative
    People won't get upset about the U-word

    Regards
    Peter
  • 3. Re: URGENT - Pls help to optimize this function
    KeithJamieson Expert
    Currently Being Moderated
    I would have done this differently.
    Ok all columns as varchar2 initially for staging_table_1

    Then create staging_table_2 with all the column datatypes specified correctly and insert all the values into staging_table_2 from staging_table_1 using the log_errors clause.

    All values not inserted will be in the log_errors table specified.

    ie as Peter Said (Only with more detail)

    Edited by: Keith Jamieson on Oct 4, 2012 9:34 AM
  • 4. Re: URGENT - Pls help to optimize this function
    user414983 - oracle Newbie
    Currently Being Moderated
    Done it. Sorry for inconvenience.
  • 5. Re: URGENT - Pls help to optimize this function
    user414983 - oracle Newbie
    Currently Being Moderated
    Peter,

    First of all, thank you for your response.

    Secondly, I used SQL Loader because of time constraints - I wasn't aware at that time of the NUL element issue. And another reason to choose SQL Loader is that I've done it before, without any issues and I could cleanse the data after was already in the stage table.

    I am not very familiar with external tables - I used it once or twice about an year ago and don't feel very comfortable now - also being under time pressure.

    Is there a way to make the function I created more time-efficient ?


    Thanks in advance,

    JohnP
  • 6. Re: URGENT - Pls help to optimize this function
    chris227 Guru
    Currently Being Moderated
    for example
    with data as (
    select 'asfs' s from dual union all
    select '.34' from dual union all
    select '3.5.4' from dual union all
    select 'as54s' from dual )
    
    select
     case translate (s, '.1234567890','.')
     when '.' then to_number(s)
     end r
    from data
    
    R 
    -  
    .34 
    -  
    -  
    
    or
    
    with data as (
    select 'asfs' s from dual union all
    select '.34' from dual union all
    select '3.5.4' from dual union all
    select 'as54s' from dual )
    
    select
     case translate (s, '.1234567890','.')
     when '.' then s
     else 'error'
     end r
    from data
    
    R 
    error 
    .34 
    error 
    error 
  • 7. Re: URGENT - Pls help to optimize this function
    Peter Gjelstrup Guru
    Currently Being Moderated
    Hi John,
    Is there a way to make the function I created more time-efficient ?
    Maybe something as simple as this will do:
    SQL> CREATE OR REPLACE FUNCTION safe_number (s IN VARCHAR2)
      RETURN VARCHAR2 AS
      dummy   NUMBER;
    BEGIN
      dummy := TO_NUMBER (TRIM (s));
      RETURN TRIM (s);
    EXCEPTION
      WHEN VALUE_ERROR THEN
        RETURN 'Error';
    END safe_number;
    /
    
    Function created.
    SQL> select safe_number('x') from dual;
    
    SAFE_NUMBER('X')              
    ------------------------------
    Error                         
    1 row selected.
    
    SQL> select safe_number('1') from dual;
    
    SAFE_NUMBER('1')              
    ------------------------------
    1                             
    1 row selected.
    
    SQL> select safe_number('.1') from dual;
    
    SAFE_NUMBER('.1')             
    ------------------------------
    .1                            
    1 row selected.
    
    SQL> select safe_number(null) from dual;
    
    SAFE_NUMBER(NULL)             
    ------------------------------
                                  
    1 row selected.
    
    SQL> select safe_number(chr(0)) from dual;
    
    SAFE_NUMBER(CHR(0))           
    ------------------------------
    Error                         
    1 row selected.
    Regards
    Peter
  • 8. Re: URGENT - Pls help to optimize this function
    rp0428 Guru
    Currently Being Moderated
    >
    Is there a way to make the function I created more time-efficient ?
    >
    Yes - throw everying between the BEGIN and END away and use Oracle's TO_NUMBER function to do the conversion. In the exception handler trap the exception that gets thrown if the conversion fails and handle it as you see fit.

    There is no need at all to write your own conversion function.
  • 9. Re: URGENT - Pls help to optimize this function
    Paulie Pro
    Currently Being Moderated
    >



    Hi John,

    I am not very familiar with external tables - I used it once or twice about
    an year ago and don't feel very comfortable now - also being under time pressure.
    I would urge you to take the small amount of time it takes to learn External Tables.

    Take a quick look here:
    http://www.oracle-developer.net/display.php?id=204
    http://www.oracle-base.com/articles/9i/external-tables-9i.php
    http://psoug.org/reference/externaltab.html
    http://www.adp-gmbh.ch/ora/misc/ext_table.html


    The effort you expend will be paid back many times.


    HTH,


    Paul...

    JohnP
  • 10. Re: URGENT - Pls help to optimize this function
    user414983 - oracle Newbie
    Currently Being Moderated
    @Peter,

    Initially, I've used the to_number function on the amount field values (varchar2 type), but kept getting an error - haven't tried the exception though.

    The main idea is that I want to capture the lines with error to check the string for any wrong characters - this is why I wrote that bunch of code.

    I am not interested to know if it's an error (I know it is), but where it is actually.


    Thank you once again,

    JohnP
  • 11. Re: URGENT - Pls help to optimize this function
    chris227 Guru
    Currently Being Moderated
    I really dont see what should be wrong with translate as a pure sql approach
    with data as (
    select 'asfs' s from dual union all
    select '.34' from dual union all
    select '-.34' from dual union all
    select '0.34' from dual union all
    select '-0.34' from dual union all
    select '35.' from dual union all
    select '3.5.4' from dual union all
    select 'as54s' from dual )
     
    select
     s,
     case translate (s, '-.1234567890','-.')
     when '.'  then to_number(s,'99999999D99','NLS_NUMERIC_CHARACTERS = ''.,''')
     when '-.' then to_number(s,'99999999D99','NLS_NUMERIC_CHARACTERS = ''.,''')
     end r
    from data
    
    S     R
    asfs     -
    .34     .34
    -.34     -.34
    0.34     .34
    -0.34     -.34
    35.     35
    3.5.4     -
    as54s     - 
    
    or
    
    select 
    s
    from data
    where
    translate (s, '-.1234567890','-.') != '.'
    and
    translate (s, '-.1234567890','-.') != '-.'
    
    S
    asfs
    3.5.4
    as54s
  • 12. Re: URGENT - Pls help to optimize this function
    rp0428 Guru
    Currently Being Moderated
    >
    I am not interested to know if it's an error (I know it is), but where it is actually.
    >
    Peter told you how to do that in his first reply.

    You use DML error logging. Then the records with errors are put into the log table where you can access them, fix the error and then processs the record again.

    See Inserting Data with DML Error Logging in the DBA doc
    http://docs.oracle.com/cd/B28359_01/server.111/b28310/tables004.htm
    >
    Inserting Data with DML Error Logging

    When you load a table using an INSERT statement with subquery, if an error occurs, the statement is terminated and rolled back in its entirety. This can be wasteful of time and system resources. For such INSERT statements, you can avoid this situation by using the DML error logging feature.

    To use DML error logging, you add a statement clause that specifies the name of an error logging table into which the database records errors encountered during DML operations. When you add this error logging clause to the INSERT statement, certain types of errors no longer terminate and roll back the statement. Instead, each error is logged and the statement continues. You then take corrective action on the erroneous rows at a later time.

    DML error logging works with INSERT, UPDATE, MERGE, and DELETE statements. This section focuses on INSERT statements.
    >
    If you use the recommended EXTERNAL TABLE (as simple as creating the sql loader control file) you can use the SELECT on the external table as the source for the INSERT into the target table that has the dml error log on it.
  • 13. Re: URGENT - Pls help to optimize this function
    jihuyao Journeyer
    Currently Being Moderated
    Perform a function here on all 6.5 million of rows will never be efficient.

    I would modify Peter's approach a little bit,

    1. Load directly into staging t1 without any checking (or create an external table)
    2. Transfer from t1 to a list partitioned t2 with list values in ('yes', 'no','null','other')
    --simple check using translate()
    3. Apply your special function only on rows in the 'other' partition (hopefully much fewer rows left)
    --other checks                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points