Skip navigation

Aggregate Function to Calculate a Hash Code for a Whole Table

score 10
You have not voted. Active

Currently is is possible to calculate a hash of column value (e.g. using function standard_hash).

What is not possible out of the box is to calculate the hash code for all rows of a table column

or even for a whole table.

This is interesting as operating systems supports hash calculation for a whole files, which enables

a simple file comparison. To compare two tables set operation and / or grouping is required, which

is costly (O(N * log(N)). An aggregated hash function (that uses XOR or SUM to combine the row results)

would perform with complexity O(N).

Of course it is possible to implement a user defined aggregate function (which I did - see http://www.db-nemec.com/MD5/CompareTablesUsingMD5Hash.html

the problem is that the context switch required make it unusable for nontrivial tables. So only native Oracle

implementation would profit from the linear algorithm.

 

Simple

 

select MD5_XOR(txt) from foo

 

or

 

select MD5_XOR(col1 ||'.'|| col2 ||'.'|| col3 ) from foo

 

would give a hash value for a table columns or a whole a table for quick compare with an other table.

Unit tests, environment compare etc. would profit.

 

Alternative syntax to calculate a hash value of a whole table or for a list of columns would be (analog to COALESCE function):

 

select MD5_XOR(col1,col2,col3 ) from foo

 

This would perform better and also it would avoid the problem with string length over 4000 character.

 

Actually there is a very similar feature in Oracle already implemented - DBMS_SQLHASH.GETHASH function.

The main idea here is to enable such functionality as an 1) aggregate function and by using of the 2) XOR combination of the row hashes to get rid of the order dependence.

Note that by using the XOR combination of the row hash codes - any ordering of the data produce an identical result.

Comments

Vote history