This content has been marked as final. Show 19 replies
ascheffer wrote:Can't see that website, it's blocked from our work.
BluShadow wrote:Yes it does. Deflate itself uses Huffman and LZ compression. See also http://technology.amis.nl/blog/7626/utl_compress-gzip-and-zlib
ascheffer wrote:Does it? It doesn't mention that in the documentation.
Oracle is using the deflate algorithm in utl_compress.
The documentation however doesn't mention deflate and it doesn't mention huffman encoding. All it does say about compatibility is:
The output of the UTL_COMPRESS compressed data is compatible with gzip(with -n option)/gunzip on a single file.So, on the point of a zip file containing multiple files/folders, it's really not suitable as far as I can see
Kevin CK wrote:That's probably the route I'd go down, especially if you have multiple files in the zip file.
So my best bet would be to unzip the file using 7-zip(free) and load the files seperately myself.
The gzip format, see RFC 1952, is just some header and trailers surrounding a deflated stream, see RFC 1951. utl_compress uses the gzip format.
hi blu, here is whats on that web site
Utl_compress, gzip and zlib Oracle has a a supplied package utl_compress, which can be used to compress and decompress data with PL/SQL. According to the documentation it uses the “Lempel-Ziv compression algorithm”, and “The output of the UTL_COMPRESS compressed data is compatible with gzip”. That means it’s following the RFC 1952 specs, RFC 1951. And that may be very useful (but I have never used it), but I need compression (I’m working on a PDF-generator in PL/SQL) and decompression (unwrapping wrapped PL/SQL!) in the zlib-format, RFC 1950. Both formats use the same algorithm, RFC 1951, but have different headers and trailers. So can utl_compress be used to for compressing/decompressing data according to the zlib-specs. Yes! In fact all you need is how to calculate the Adler32-checksum, which is used as the trailer of the zlib-format. function adler32( p_src in blob ) return varchar2 is s1 pls_integer := 1; s2 pls_integer := 0; begin for i in 1 .. dbms_lob.getlength( p_src ) loop s1 := mod( s1 + to_number( rawtohex( dbms_lob.substr( p_src, 1, i ) ), 'XX' ), 65521 ); s2 := mod( s2 + s1, 65521); end loop; return to_char( s2, 'fm0XXX' ) || to_char( s1, 'fm0XXX' ); end; With this Adler32 function compressing it zlib-format is nothing more as a simple wrapper for utl_compress.lz_compress function zlib_compress( p_src in blob ) return blob is t_tmp blob; t_cpr blob; begin t_tmp := utl_compress.lz_compress( p_src ); dbms_lob.createtemporary( t_cpr, false ); t_cpr := hextoraw( '789C' ); -- zlib header dbms_lob.copy( t_cpr, t_tmp, dbms_lob.getlength( t_tmp ) - 10 - 8, 3, 11 ); dbms_lob.append( t_cpr, hextoraw( adler32( p_src ) ) ); -- zlib trailer dbms_lob.freetemporary( t_tmp ); return t_cpr; end; Unwrapping the zilb-format with utl_compress takes just a little more effort. Utl_compress.lz_decompress only works if the input has a correct trailer, which includes a crc32-checksum of the uncompressed data. And that’s something we don’t have and can’t calculate with the data we have. But utl_compress has another procedure lz_uncompress_extract, which can be used to decompress the input byte for byte. And that works without the correct trailer. But only until the last byte of the uncompressed data. If we try to get the last decompressed byte an exception ORA-29294: A data error occurred during compression or uncompression, is raised. But the value of this last byte can derived from the Adler32 checksum we have! function zlib_decompress( p_src in blob ) return blob is t_out blob; t_tmp blob; t_buffer raw(1); t_hdl binary_integer; t_s1 pls_integer; -- s1 part of adler32 checksum t_last_chr pls_integer; begin dbms_lob.createtemporary( t_out, false ); dbms_lob.createtemporary( t_tmp, false ); t_tmp := hextoraw( '1F8B0800000000000003' ); -- gzip header dbms_lob.copy( t_tmp, p_src, dbms_lob.getlength( p_src ) - 2 - 4, 11, 3 ); dbms_lob.append( t_tmp, hextoraw( '0000000000000000' ) ); -- add a fake trailer t_hdl := utl_compress.lz_uncompress_open( t_tmp ); t_s1 := 1; loop begin utl_compress.lz_uncompress_extract( t_hdl, t_buffer ); exception when others then exit; end; dbms_lob.append( t_out, t_buffer ); t_s1 := mod( t_s1 + to_number( rawtohex( t_buffer ), 'xx' ), 65521 ); end loop; t_last_chr := to_number( dbms_lob.substr( p_src, 2, dbms_lob.getlength( p_src ) - 1 ), '0XXX') - t_s1; if t_last_chr < 0 then t_last_chr := t_last_chr + 65521; end if; dbms_lob.append( t_out, hextoraw( to_char( t_last_chr, 'fm0X' ) ) ); if utl_compress.isopen( t_hdl ) then utl_compress.lz_uncompress_close( t_hdl ); end if; dbms_lob.freetemporary( t_tmp ); return t_out; end;
I've written a small package which you can use to unzip a zipfile, see http://technology.amis.nl/blog/8090/parsing-a-microsoft-word-docx-and-unzip-zipfiles-with-plsql