This content has been marked as final. Show 7 replies
Due to the number of permutations of any given name, even using SOUNDEX is not going to be 100% fool-proof. There are also "mis-spellings" of names that while sound the same are, in fact different. Take Smith, Smyth, etc... while "mis-spelled" can be different people. You should investigate using some other method of identifying individuals.1 person found this helpful
UTL_MATCH (either the Jaro-Winkler or the edit distance functions) would generally be preferred. SOUNDEX is a less sophisticated algorithm.1 person found this helpful
In the general case, however, doing this sort of thing yourself is extremely difficult. There are commercial products out there that just help you do fuzzy matching on names. If you're going to build something yourself, you're likely going to spend a large amount of time trying to fine-tune the algorithm to try to balance type 1 (false positive, you match names that you shouldn't) and type 2 errors (false negative, you fail to match names that you should). To do it well will require a rather large number of meetings with users trying to figure out the appropriate balance of errors in your particular environment.
thank you for your answer
there is unfortunately no other way then using names (no other identifiers are available for these people)
i was thinking about giving it a 'weight' on how likley they are the same people or not;
other than that i dont know how this can be done better
Due to the number of permutations of any given name, even using SOUNDEX is not going to be 100% fool-proof. There are also "mis-spellings" of names that while sound the same are, in fact different. Take Smith, Smyth, etc... while "mis-spelled" can be different people. You should investigate using some other method of identifying individuals.
UserMB wrote:There is no way you can do this with just names. Names simply aren't a UID. You need other factors for certainty.
can you advise what the best approach would be to uniquely identify the same people?
so far i have only found this functions :utl_match.jaro_winkler_similarity and SOUNDEXThose are the two generic functions that Oracle offers out of the box. Unless you're on 11g in which case you should have a look at the Name Search functionality which comes with Oracle Text. [url http://docs.oracle.com/cd/E11882_01/text.112/e24435/search.htm#CCAPP9534]Find out more.
What about exactly the opposite situation ... two distinctly different people who happen to have exactly the same name?
Take a look at http://names.mongabay.com/most_common_surnames.htm
You could play with combined frequencies for first names (follow the Female First Names and Male First Names links therein) and surnames (like the Optimizer does with the Database statistical data) to get the expected cardinalities of how many different persons can have the same name (misspelled or not).
There are no exceptions around here too: http://en.wikipedia.org/wiki/List_of_the_most_common_surnames_in_Europe
Anyway not as rare as two children of the same mother having the same birth date and not being twins ;)