This discussion is archived
10 Replies Latest reply: Oct 15, 2012 9:58 AM by fac586 RSS

Replace or delete all single NewLines in a Blob

Johnny Be Good Newbie
Currently Being Moderated
Hi there,

I would like to delete/replace all single New Lines (NOT the Carriage Return / New Lines) within a blob.

I tried to convert the blob in a clob and then replace all occurences of single New Lines with the replace function (e.g. replace(clob, chr(10)) ),
I even cut the clob into small chunks of varchar2 and tried to apply the replace function etc, but all without success!

I would be glad if someone could give me a hint how I could tackle this problem.

I am using
Application Express 4.1.0.00.32 on
Oracle 10g (10.2.0.4.0 - 64bit)

Thanks,
Johnny
  • 2. Re: Replace or delete all single NewLines in a Blob
    fac586 Guru
    Currently Being Moderated
    Johnny Be Good wrote:
    Hi there,

    I would like to delete/replace all single New Lines (NOT the Carriage Return / New Lines) within a blob.

    I tried to convert the blob in a clob and then replace all occurences of single New Lines with the replace function (e.g. replace(clob, chr(10)) ),
    I even cut the clob into small chunks of varchar2 and tried to apply the replace function etc, but all without success!

    I would be glad if someone could give me a hint how I could tackle this problem.
    Why are you storing character data in a BLOB instead of a CLOB?

    What does "without success" mean?

    Post what you've actually been doing, according to the guidelines in +{message:id=9360002}+.
  • 3. Re: Replace or delete all single NewLines in a Blob
    fac586 Guru
    Currently Being Moderated
    It's trivial to do this in DML with a CLOB column:
    SQL> create table t_blob (b blob, c clob);
    table T_BLOB created.
    
    SQL> insert into t_blob
      (b, c)
    values
      (   utl_raw.cast_to_raw('Fee' || chr(10) || 'Fi' || chr(13) || chr(10) || 'Fo' || chr(10) || 'Fum' || chr(13) || chr(10))
        , 'Fee' || chr(10) || 'Fo' || chr(13) || chr(10) || 'Fi' || chr(10) || 'Fum' || chr(13) || chr(10));
    1 rows inserted.
    
    SQL> select c s0, dump(cast(c as varchar2(20))) d0 from t_blob;
    
    S0                   D0                                                                             
    -------------------- --------------------------------------------------------------------------------
    Fee                  Typ=1 Len=16: 70,101,101,10,70,111,13,10,70,105,10,70,117,109,13,10              
    Fo                                                                                                     
    Fi                                                                                                    
    Fum                                                                                                    
                                                                                                          
    SQL> update t_blob set c = regexp_replace(c, '([^' || chr(13) || '])(' || chr(10) || ')', '\1');
    1 rows updated.
    
    SQL> select c s0, dump(cast(c as varchar2(20))) d0 from t_blob;
    
    S0                   D0                                                                             
    -------------------- --------------------------------------------------------------------------------
    FeeFo                Typ=1 Len=14: 70,101,101,70,111,13,10,70,105,70,117,109,13,10                    
    FiFum 
    If the BLOB is less than 32K in length, then it can also be done this way, by converting the BLOB to a VARCHAR2:
    SQL> select utl_raw.cast_to_varchar2(b) s1, dump(utl_raw.cast_to_varchar2(b)) d1 from t_blob;
    
    S1                   D1                                                                             
    -------------------- --------------------------------------------------------------------------------
    Fee                  Typ=1 Len=16: 70,101,101,10,70,111,13,10,70,105,10,70,117,109,13,10              
    Fi                                                                                                     
    Fo                                                                                                    
    Fum                                                                                                    
    
    SQL> update t_blob set b =  utl_raw.cast_to_raw(regexp_replace(utl_raw.cast_to_varchar2(b), '([^' || chr(13) || '])(' || chr(10) || ')', '\1'));
    1 rows updated.
    
    SQL> select utl_raw.cast_to_varchar2(b) s1, dump(utl_raw.cast_to_varchar2(b)) d1 from t_blob;
    
    S1                   D1                                                                             
    -------------------- --------------------------------------------------------------------------------
    FeeFi                Typ=1 Len=14: 70,101,101,70,111,13,10,70,105,70,117,109,13,10                    
    FoFum   
    To do it with a BLOB that may be longer than 32K requires using PL/SQL:
    SQL> select utl_raw.cast_to_varchar2(b) s1, dump(utl_raw.cast_to_varchar2(b)) d1 from t_blob;
    
    S1                   D1                                                                             
    -------------------- --------------------------------------------------------------------------------
    Fee                  Typ=1 Len=16: 70,101,101,10,70,111,13,10,70,105,10,70,117,109,13,10              
    Fi                                                                                                     
    Fo                                                                                                    
    Fum                                                                                                    
    
    SQL> declare
    
      bc          blob;
      c           clob;
      srcOffset   number := 1;
      dstOffset   number := 1;
      warning     number;
      langContext number := dbms_lob.default_lang_ctx;
    
    begin
    
      dbms_lob.createTemporary(c, true);
    
      select b into bc from t_blob;
      
      dbms_lob.convertToClob(
          dest_lob => c,
          src_blob => bc,
          amount => dbms_lob.getLength(bc),
          dest_offset => dstOffset,
          src_offset => srcOffset,
          blob_csid => dbms_lob.default_csid,
          lang_context => langContext,
          warning => warning);
    
      c := regexp_replace(c, '([^' || chr(13) || '])(' || chr(10) || ')', '\1');
    
      srcOffset := 1;
      dstoffset := 1;
      warning := null;
    
      dbms_lob.convertToBlob(
          dest_lob => bc,
          src_clob => c,
          amount => dbms_lob.getLength(c),
          dest_offset => dstOffset,
          src_offset => srcOffset,
          blob_csid => dbms_lob.default_csid,
          lang_context => langContext,
          warning => warning);
    
      update t_blob set b = bc;
      
    end;
    /
    
    anonymous block completed
    
    SQL> select utl_raw.cast_to_varchar2(b) s1, dump(utl_raw.cast_to_varchar2(b)) d1 from t_blob;
    
    S1                   D1                                                                             
    -------------------- --------------------------------------------------------------------------------
    FeeFi                Typ=1 Len=16: 70,101,101,70,111,13,10,70,105,70,117,109,13,10,13,10              
    FoFum                                                                                                 
    Clearly it is much more sensible (and more efficient) to store character data using a CLOB rather than a BLOB.
  • 4. Re: Replace or delete all single NewLines in a Blob
    Johnny Be Good Newbie
    Currently Being Moderated
    Hi Paul,

    the answer for your first question is simple. I use a blob instead of a clob because I upload a file (csv file) using a "File Browse" item.
    The File is then stored in FLOWS_FILES.WWV_FLOW_FILE_OBJECTS$ (wwv_flow_files) which stores the file in a blob column (blob_content).

    What I do next is to store the blob content in a file within a directory in order to use it as an external file.
    After that I query the external file and merge the content into a normal table within the database.

    The problem that I have now, is that sometimes uploaded csv files have "broken data sets" in the way that a row (one dataset) has been broken into
    two or multiple rows (with a single newline at the end of each row instead of a carriagereturn/newline for normal rows).
    This happens sometimes, when the corresponding rows/datasets contain a very large text (e.g. 3000 characters) in one column and MS excel (the tool used to save the excel origin file into csv) can't handle that long texts in one row of a column and breaks up the content into multiple rows in one cell.

    However, I need to "rebreak" the broken data sets within the uploaded blob by deleting (or replacing) the single newlines, which are an indication of a split data set.

    I tried to do that by converting the blob into a clob, e.g with this function:
    create or replace
    FUNCTION blob_to_clob (blob_in IN BLOB) RETURN CLOB AS
        v_clob CLOB;
        v_varchar VARCHAR2(32767);
        v_start PLS_INTEGER := 1;
        v_buffer PLS_INTEGER := 32767;
      BEGIN
        DBMS_LOB.CREATETEMPORARY(v_clob, TRUE);
        FOR i IN 1..CEIL(DBMS_LOB.GETLENGTH(blob_in) / v_buffer)
        LOOP
           v_varchar := UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(blob_in, v_buffer, v_start));
           DBMS_LOB.WRITEAPPEND(v_clob, LENGTH(v_varchar), v_varchar);
           v_start := v_start + v_buffer;
        END LOOP;
       RETURN v_clob;
      END blob_to_clob;
    And after the conversion, trying to replace all occurences of single newlines within that clob, e.g with this code:

    l_clob := replaceClob(l_clob,chr(10),' ');
    and the function:
    FUNCTION replaceClob (
    srcClob IN CLOB, 
    replaceStr IN VARCHAR2,
    replaceWith IN VARCHAR2) 
    RETURN CLOB 
    IS
    
    vBuffer    VARCHAR2 (32767);
    l_amount   BINARY_INTEGER := 32767;
    l_pos      INTEGER;
    l_clob_len INTEGER;
    newClob    CLOB := EMPTY_CLOB;
        
    BEGIN
    l_pos := 1;
      -- initalize the new clob
      --dbms_lob.createtemporary(newClob,TRUE);
       DBMS_LOB.CREATETEMPORARY(
              lob_loc => newClob
            , cache   => true
            , dur     => dbms_lob.session
        );
    
        
      l_clob_len := dbms_lob.getlength(srcClob);
    
      WHILE l_pos <= l_clob_len
      LOOP
        dbms_lob.read(srcClob, l_amount, l_pos, vBuffer);
    
        IF vBuffer IS NOT NULL THEN
          -- replace the text
          vBuffer := replace(vBuffer, replaceStr, replaceWith);
          -- write it to the new clob
          dbms_lob.writeappend(newClob, LENGTH(vBuffer), vBuffer);
        END IF;
        l_pos := l_pos + l_amount;
      END LOOP;
        
      RETURN newClob; 
    EXCEPTION
      WHEN OTHERS THEN
        RAISE;
    END;
    "Without sucess" means, that the newlines have not been replaced. I tried several ways like deleting (replace without a third parameter and the corresponding function), replacing (e.g. with a blank) etc.!

    I even tried to apply the standard replace function for varchars on a clob as I read somewhere that this is being supported in 10g and apex 4.1 but with the same result.

    I hope I could explain the issue a little better now and would be grateful if you had a starting point for me.
  • 5. Re: Replace or delete all single NewLines in a Blob
    fac586 Guru
    Currently Being Moderated
    Johnny Be Good wrote:
    Hi Paul,

    the answer for your first question is simple. I use a blob instead of a clob because I upload a file (csv file) using a "File Browse" item.
    The File is then stored in FLOWS_FILES.WWV_FLOW_FILE_OBJECTS$ (wwv_flow_files) which stores the file in a blob column (blob_content).

    What I do next is to store the blob content in a file within a directory in order to use it as an external file.
    After that I query the external file and merge the content into a normal table within the database.

    The problem that I have now, is that sometimes uploaded csv files have "broken data sets" in the way that a row (one dataset) has been broken into
    two or multiple rows (with a single newline at the end of each row instead of a carriagereturn/newline for normal rows).
    This happens sometimes, when the corresponding rows/datasets contain a very large text (e.g. 3000 characters) in one column and MS excel (the tool used to save the excel origin file into csv) can't handle that long texts in one row of a column and breaks up the content into multiple rows in one cell.

    However, I need to "rebreak" the broken data sets within the uploaded blob by deleting (or replacing) the single newlines, which are an indication of a split data set.
    Given that process. I'd convert from BLOB to CLOB when the file is extracted from <tt>APEX_APPLICATION_FILES</tt>, remove the problem LFs, then write it to the filesystem as a CLOB.
    I tried to do that by converting the blob into a clob, e.g with this function:
    A function like that is reinventing the wheel. 10g comes with <tt>dbms_lob.convertToClob</tt> (see above).
    And after the conversion, trying to replace all occurences of single newlines within that clob, e.g with this code:
    l_clob := replaceClob(l_clob,chr(10),' ');
    That will replace all LFs, whether they're in CR/LF pairs or not. Use <tt>regexp_replace</tt>/regular expressions to exclude LFs that follow a CR (see above).
    "Without sucess" means, that the newlines have not been replaced. I tried several ways like deleting (replace without a third parameter and the corresponding function), replacing (e.g. with a blank) etc.!

    I even tried to apply the standard replace function for varchars on a clob as I read somewhere that this is being supported in 10g and apex 4.1 but with the same result.
    I think you've been making life difficult for yourself. All that seems to involve writing more (inefficient) code than is necessary. Everything you need is built-in to 10g (it looks like you might have been using out-of-date references?)
    I hope I could explain the issue a little better now and would be grateful if you had a starting point for me.
    Should have what you need from the examples above.
  • 6. Re: Replace or delete all single NewLines in a Blob
    Johnny Be Good Newbie
    Currently Being Moderated
    Dear Paul,

    thank you very much for your efforts.
    Your answer helped me a lot.

    Kind regards
    Johnny
  • 7. Re: Replace or delete all single NewLines in a Blob
    Johnny Be Good Newbie
    Currently Being Moderated
    Hello Paul,

    I noticed another specific case in the above mentioned problem!
    But as I do not have too much experience with regular expression I still need some assistance.

    I have cases, where there is only a NewLine in a row or maybe some preceding blank spaces with a finishing NewLine!

    Could you please assist withe a regular expression to delete those NewLines, too!?
    I tried to solve that problem myself by investigating the Oracle SQL Reference, but without success yet!

    Thanks in advance for your help.

    Regards and a nice weekend
    Johnny
  • 8. Re: Replace or delete all single NewLines in a Blob
    fac586 Guru
    Currently Being Moderated
    Needs to be done in a separate replace step as there is no initial match to be retained:
    ...
    c := regexp_replace(c, '^ *$');
    c := regexp_replace(c, '([^' || chr(13) || '])(' || chr(10) || ')', '\1');
    ...
  • 9. Re: Replace or delete all single NewLines in a Blob
    Johnny Be Good Newbie
    Currently Being Moderated
    Hi Paul,

    thank you very much for you efforts!

    However, I am afraid this is not yet the solution to my problem! :-(
    I applied your proposal, but it is not working as expected.

    Let me try to explain it with the following example:
    LF = linefeed (non printable character)
    CR = carriage return (non printable character)

    This is the source text:

    This is row 4    LF
    This is row 5LF
    This is row 6    LF
    This is row 7CRLF
      LF
    This is row 9          LF
    LF
    This is row 11   CRLF
             LF
                                LF
                       bla                                LF
    This is row 16;;;;;;;;;;;;;;;;;;;;;;;;;;;CRLF
    And I want to delete ALL single 'LF' (linefeeds), so that the result should look like this:
    This is row 4    This is row 5This is row 6    This is row 7CRLF
      This is row 9          This is row 11   CRLF
                                                            bla                                This is row 16;;;;;;;;;;;;;;;;;;;;;;;;;;;CRLF
    or even better like this (deleting even preceding spaces, except one, before all single linefeeds):
    This is row 4 This is row 5 This is row 6 This is row 7CRLF
    This is row 9 This is row 11   CRLF
    bla This is row 16;;;;;;;;;;;;;;;;;;;;;;;;;;;CRLF
    Thanks again.
    I really appreciate your help !! :-)

    Regards
    Johnny
  • 10. Re: Replace or delete all single NewLines in a Blob
    fac586 Guru
    Currently Being Moderated
    Johnny Be Good wrote:

    I really appreciate your help !! :-)
    If you want further help then you'll need to create a table with sample data in a workspace on apex.oracle.com and post credentials for a guest developer account with SQL Workshop access. I only have an iPad with me this week.

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points