Database Software

1 error has occurred

Your session has timed out.

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Interested in getting your voice heard by members of the Developer Marketing team at Oracle? Check out this post for AppDev or this post for AI focus group information.

Aggregate Function to Calculate a Hash Code for a Whole Table

nemecjDec 17 2017 — edited Jun 10 2018

Currently is is possible to calculate a hash of column value (e.g. using function standard_hash).

What is not possible out of the box is to calculate the hash code for all rows of a table column

or even for a whole table.

This is interesting as operating systems supports hash calculation for a whole files, which enables

a simple file comparison. To compare two tables set operation and / or grouping is required, which

is costly (O(N * log(N)). An aggregated hash function (that uses XOR or SUM to combine the row results)

would perform with complexity O(N).

Of course it is possible to implement a user defined aggregate function (which I did - see http://www.db-nemec.com/MD5/CompareTablesUsingMD5Hash.html

the problem is that the context switch required make it unusable for nontrivial tables. So only native Oracle

implementation would profit from the linear algorithm.

Simple

select MD5_XOR(txt) from foo

select MD5_XOR(col1 ||'.'|| col2 ||'.'|| col3 ) from foo

would give a hash value for a table columns or a whole a table for quick compare with an other table.

Unit tests, environment compare etc. would profit.

Alternative syntax to calculate a hash value of a whole table or for a list of columns would be (analog to COALESCE function):

select MD5_XOR(col1,col2,col3 ) from foo

This would perform better and also it would avoid the problem with string length over 4000 character.

Actually there is a very similar feature in Oracle already implemented - DBMS_SQLHASH.GETHASH function.

The main idea here is to enable such functionality as an 1) aggregate function and by using of the 2) XOR combination of the row hashes to get rid of the order dependence.

Note that by using the XOR combination of the row hash codes - any ordering of the data produce an identical result.

Processing

Added on Dec 17 2017

#active, #database-ideas, #database-ideas-ideas

9 comments

5,443 views

Database Software

Aggregate Function to Calculate a Hash Code for a Whole Table

Comments

Post Details