13 Replies Latest reply: Oct 5, 2012 3:07 PM by jihuyao RSS

    Optimize function to correct a string to be converted to number

    user414983 - oracle
      Hi,

      I have imported almost 6.5 milion rows into a table (using sql loader) from a flat file (which contains the NUL caracter (ASCII 0) - not space, not NULL). To make the load easier I set the datatype as VARCHAR2 for all columns. For columns with string data I used a TRIM(REPLACE(field,' ','')) to get rid of the NUL and it works.

      But for the column that must be converted to numeric I am trying to use to_number () function and it fails because the numeric data is mixed with other characters.

      To solve this problem I created this function :
      "
      CREATE OR REPLACE
      FUNCTION string_to_number
      (p_string_source IN VARCHAR2 ) RETURN VARCHAR2 IS v_output_string varchar2(150);


      -- This function takes a string as parameter and has 2 outputs:
      -- 1. If the source string cannot be converted to number throw 'Error' as the output message to identify the line with the issue
      -- 2. A string that can be converted successfully as Number;
      -- The necessity of this function came up after an import from a flat file where the resulted string contained strange characters shown as spaces

      --check if '-' is exists and is on the first position or if in the source string exists more than one '.'
      -- ASCII codes accepted in the string :
      -- 45 '-' ; 46 '.' ; from 48 (0) to 57 (9)

      BEGIN
      DECLARE v_minus VARCHAR2(1);
      v_dot INTEGER;
      BEGIN
      SELECT substr(p_string_source,1,1) INTO v_minus FROM dual;
      SELECT instr(p_string_source,'.',1,2) INTO v_dot FROM dual;


      --check if '-' is exists and is on the first position or if in the source string exists more than one '.'
      -- ASCII codes accepted in the string :
      -- 45 '-' ; 46 '.' ; from 48 (0) to 57 (9)

      IF v_minus NOT IN ('-','.','0','1','2','3','4','5','6','7','8','9')

      -- or there are two dots '.' in the string
      OR v_dot <> 0
      THEN v_output_string := 'Error';
      ELSE
      BEGIN

      -- for every character of the string we'll check if it's a number to add it to the outcome string;
      -- if it's not an accepted character it will be ignored
      DECLARE v_length_source int := length(p_string_source);
      v_counter int :=1;
      v_add_in_number VARCHAR2(1);

      BEGIN
      FOR v_counter IN 1..v_length_source LOOP
      BEGIN
      SELECT SUBSTR(p_string_source,v_counter,1) into v_add_in_number from dual;
      IF v_add_in_number IN ('-','.','0','1','2','3','4','5','6','7','8','9')
      THEN v_output_string := v_output_string ||v_add_in_number;
      END IF;
      END;
      END LOOP;
      END;
      -- in case the string is in format '.00034' we'll add a 0 in front of the string to be accepted as argument by TO_NUMBER function
      IF v_minus = '.'
      THEN v_output_string := '0'||v_output_string;
      ELSE
      BEGIN
      v_output_string := v_output_string;
      END;
      END IF;
      END;
      END IF;

      END;
      RETURN v_output_string;
      END;
      "
      The main idea is to check every string (the parameter will be the value from the Amount column) for permitted characters that compose a numeric value:
      1. To begins with numeric, '-' or '.'
      2. To have only one '.' (as a decimal separator);
      3. To compare every character of the string with the permitted ones - the non-compliant will be rejected

      This way the resulting string (v_output_string) will be successfully converted to number


      I admit that I don't have much experience using PL/SQL that is why I am asking your help to optimize this function to improve its performance. Could you help me on this, please ?

      TIA,

      JohnP

      Edited by: petresion on 04-Oct-2012 01:33
        • 1. Re: URGENT - Pls help to optimize this function
          BluShadow
          Please read the FAQ {message:id=9360002}, especially point 2, but also the other points, and edit your post accordingly. Then you may get more effective response to your question.
          • 2. Re: URGENT - Pls help to optimize this function
            Peter Gjelstrup
            Hi John,

            This is what I would do.

            1. Never write a single line of PL/SQL for such task
            2. Not use sqlldr, but an external table.
            3. Create all columns for external table as varchar2
            4. Create staging table having proper data types, column sizes and check constraints
            5. Create an error log for staging table (DML Error Logging)
            6. Load into staging table using select from external table
            7. During load convert chr(0) to NULL using NULLIF and convert the varchars to their proper types using built in functions like TO_CHAR and TO_DATE.
            8. Never use the U-word in a public forum of volunteers


            Result will be

            All "good" records are in staging table
            All "bad" records are in the error log table
            The load does not fail
            You have cleaned data without coding anything, everything is declarative
            People won't get upset about the U-word

            Regards
            Peter
            • 3. Re: URGENT - Pls help to optimize this function
              Keith Jamieson
              I would have done this differently.
              Ok all columns as varchar2 initially for staging_table_1

              Then create staging_table_2 with all the column datatypes specified correctly and insert all the values into staging_table_2 from staging_table_1 using the log_errors clause.

              All values not inserted will be in the log_errors table specified.

              ie as Peter Said (Only with more detail)

              Edited by: Keith Jamieson on Oct 4, 2012 9:34 AM
              • 4. Re: URGENT - Pls help to optimize this function
                user414983 - oracle
                Done it. Sorry for inconvenience.
                • 5. Re: URGENT - Pls help to optimize this function
                  user414983 - oracle
                  Peter,

                  First of all, thank you for your response.

                  Secondly, I used SQL Loader because of time constraints - I wasn't aware at that time of the NUL element issue. And another reason to choose SQL Loader is that I've done it before, without any issues and I could cleanse the data after was already in the stage table.

                  I am not very familiar with external tables - I used it once or twice about an year ago and don't feel very comfortable now - also being under time pressure.

                  Is there a way to make the function I created more time-efficient ?


                  Thanks in advance,

                  JohnP
                  • 6. Re: URGENT - Pls help to optimize this function
                    chris227
                    for example
                    with data as (
                    select 'asfs' s from dual union all
                    select '.34' from dual union all
                    select '3.5.4' from dual union all
                    select 'as54s' from dual )
                    
                    select
                     case translate (s, '.1234567890','.')
                     when '.' then to_number(s)
                     end r
                    from data
                    
                    R 
                    -  
                    .34 
                    -  
                    -  
                    
                    or
                    
                    with data as (
                    select 'asfs' s from dual union all
                    select '.34' from dual union all
                    select '3.5.4' from dual union all
                    select 'as54s' from dual )
                    
                    select
                     case translate (s, '.1234567890','.')
                     when '.' then s
                     else 'error'
                     end r
                    from data
                    
                    R 
                    error 
                    .34 
                    error 
                    error 
                    • 7. Re: URGENT - Pls help to optimize this function
                      Peter Gjelstrup
                      Hi John,
                      Is there a way to make the function I created more time-efficient ?
                      Maybe something as simple as this will do:
                      SQL> CREATE OR REPLACE FUNCTION safe_number (s IN VARCHAR2)
                        RETURN VARCHAR2 AS
                        dummy   NUMBER;
                      BEGIN
                        dummy := TO_NUMBER (TRIM (s));
                        RETURN TRIM (s);
                      EXCEPTION
                        WHEN VALUE_ERROR THEN
                          RETURN 'Error';
                      END safe_number;
                      /
                      
                      Function created.
                      SQL> select safe_number('x') from dual;
                      
                      SAFE_NUMBER('X')              
                      ------------------------------
                      Error                         
                      1 row selected.
                      
                      SQL> select safe_number('1') from dual;
                      
                      SAFE_NUMBER('1')              
                      ------------------------------
                      1                             
                      1 row selected.
                      
                      SQL> select safe_number('.1') from dual;
                      
                      SAFE_NUMBER('.1')             
                      ------------------------------
                      .1                            
                      1 row selected.
                      
                      SQL> select safe_number(null) from dual;
                      
                      SAFE_NUMBER(NULL)             
                      ------------------------------
                                                    
                      1 row selected.
                      
                      SQL> select safe_number(chr(0)) from dual;
                      
                      SAFE_NUMBER(CHR(0))           
                      ------------------------------
                      Error                         
                      1 row selected.
                      Regards
                      Peter
                      • 8. Re: URGENT - Pls help to optimize this function
                        rp0428
                        >
                        Is there a way to make the function I created more time-efficient ?
                        >
                        Yes - throw everying between the BEGIN and END away and use Oracle's TO_NUMBER function to do the conversion. In the exception handler trap the exception that gets thrown if the conversion fails and handle it as you see fit.

                        There is no need at all to write your own conversion function.
                        • 9. Re: URGENT - Pls help to optimize this function
                          Paulie
                          >



                          Hi John,

                          I am not very familiar with external tables - I used it once or twice about
                          an year ago and don't feel very comfortable now - also being under time pressure.
                          I would urge you to take the small amount of time it takes to learn External Tables.

                          Take a quick look here:
                          http://www.oracle-developer.net/display.php?id=204
                          http://www.oracle-base.com/articles/9i/external-tables-9i.php
                          http://psoug.org/reference/externaltab.html
                          http://www.adp-gmbh.ch/ora/misc/ext_table.html


                          The effort you expend will be paid back many times.


                          HTH,


                          Paul...

                          JohnP
                          • 10. Re: URGENT - Pls help to optimize this function
                            user414983 - oracle
                            @Peter,

                            Initially, I've used the to_number function on the amount field values (varchar2 type), but kept getting an error - haven't tried the exception though.

                            The main idea is that I want to capture the lines with error to check the string for any wrong characters - this is why I wrote that bunch of code.

                            I am not interested to know if it's an error (I know it is), but where it is actually.


                            Thank you once again,

                            JohnP
                            • 11. Re: URGENT - Pls help to optimize this function
                              chris227
                              I really dont see what should be wrong with translate as a pure sql approach
                              with data as (
                              select 'asfs' s from dual union all
                              select '.34' from dual union all
                              select '-.34' from dual union all
                              select '0.34' from dual union all
                              select '-0.34' from dual union all
                              select '35.' from dual union all
                              select '3.5.4' from dual union all
                              select 'as54s' from dual )
                               
                              select
                               s,
                               case translate (s, '-.1234567890','-.')
                               when '.'  then to_number(s,'99999999D99','NLS_NUMERIC_CHARACTERS = ''.,''')
                               when '-.' then to_number(s,'99999999D99','NLS_NUMERIC_CHARACTERS = ''.,''')
                               end r
                              from data
                              
                              S     R
                              asfs     -
                              .34     .34
                              -.34     -.34
                              0.34     .34
                              -0.34     -.34
                              35.     35
                              3.5.4     -
                              as54s     - 
                              
                              or
                              
                              select 
                              s
                              from data
                              where
                              translate (s, '-.1234567890','-.') != '.'
                              and
                              translate (s, '-.1234567890','-.') != '-.'
                              
                              S
                              asfs
                              3.5.4
                              as54s
                              • 12. Re: URGENT - Pls help to optimize this function
                                rp0428
                                >
                                I am not interested to know if it's an error (I know it is), but where it is actually.
                                >
                                Peter told you how to do that in his first reply.

                                You use DML error logging. Then the records with errors are put into the log table where you can access them, fix the error and then processs the record again.

                                See Inserting Data with DML Error Logging in the DBA doc
                                http://docs.oracle.com/cd/B28359_01/server.111/b28310/tables004.htm
                                >
                                Inserting Data with DML Error Logging

                                When you load a table using an INSERT statement with subquery, if an error occurs, the statement is terminated and rolled back in its entirety. This can be wasteful of time and system resources. For such INSERT statements, you can avoid this situation by using the DML error logging feature.

                                To use DML error logging, you add a statement clause that specifies the name of an error logging table into which the database records errors encountered during DML operations. When you add this error logging clause to the INSERT statement, certain types of errors no longer terminate and roll back the statement. Instead, each error is logged and the statement continues. You then take corrective action on the erroneous rows at a later time.

                                DML error logging works with INSERT, UPDATE, MERGE, and DELETE statements. This section focuses on INSERT statements.
                                >
                                If you use the recommended EXTERNAL TABLE (as simple as creating the sql loader control file) you can use the SELECT on the external table as the source for the INSERT into the target table that has the dml error log on it.
                                • 13. Re: URGENT - Pls help to optimize this function
                                  jihuyao
                                  Perform a function here on all 6.5 million of rows will never be efficient.

                                  I would modify Peter's approach a little bit,

                                  1. Load directly into staging t1 without any checking (or create an external table)
                                  2. Transfer from t1 to a list partitioned t2 with list values in ('yes', 'no','null','other')
                                  --simple check using translate()
                                  3. Apply your special function only on rows in the 'other' partition (hopefully much fewer rows left)
                                  --other checks