1 2 Previous Next 20 Replies Latest reply: Aug 31, 2008 12:13 PM by 574056 RSS

    Using jaro_winkler_similarity - Possible error

    532672
      I am using the utl_match function jaro_winkler_similarity on a table in a Oracle 10g database installation in order to find possible duplicates.
      In short, the sql statement is:
      select
      p1.id, p1.codi, p2.id, p2.codi
      from
      empreses p1, empreses p2
      where
      p1.id<>p2.id
      and
      utl_match.jaro_winkler_similarity(p1.codi,p2.codi)>=90;

      There are 10000 tuples in this table which definition is:
      describe empreses;
      Name Null? Type
      ----------------------------------------- -------- ----------------------------
      RELEVANCIA FLOAT(126)
      ID NOT NULL NUMBER
      DATA_ALTA DATE
      CODI VARCHAR2(200)

      The query ALWAYS ends with an error:
      ERROR:
      ORA-03113: end-of-file on communication channel

      How could I trace this error? Is there any other function effectively enough to search duplicates?
        • 1. Re: Using jaro_winkler_similarity - Possible error
          Billy~Verreynne
          The ORA-03113: end-of-file on communication channel question is a FAQ - you should have done a little research first.

          This error is a client driver error - that is the OCI saying "hey, the server closed my network connection to it".

          The reason for the server doing this is usually because of an internal error in the Oracle server process servicing that session. It terminates, writes dumps and traces, and tears down any open network connection it owns. Typically this is a result of an ORA-0600 error.

          Thus you will need to look at the Oracle instance's alert log for details of the trace file created by this error, and in the trace file for details on the actual error itself.
          • 2. Re: Using jaro_winkler_similarity - Possible error
            247823
            Hi,

            Let me know

            1) What is utl_match? (package name or procedure name or function name)
            2) What is jaro_winkler_similarity? (package name or procedure name or function name)

            Regards,
            Sailaja
            • 3. Re: Using jaro_winkler_similarity - Possible error
              532672
              UTL_MATCH is an oracle package. jaro_winkler_similarity is a function from that package which measure the difference between two strings. I suppose this function uses a variant of Levenhstein function and I think it's quite good to find similarities between strings.

              There are some problem regarded to that function, but I don't know how to trace it.
              • 4. Re: Using jaro_winkler_similarity - Possible error
                94799
                UTL_MATCH package contains functions for comparing the similarity of strings. The following link provides some description however Oracle does not appear to document it which perhaps brings into question whether it's use is supported.

                http://www.psoug.org/reference/utl_match.html
                • 5. Re: Using jaro_winkler_similarity - Possible error
                  94799
                  Incidentally I failed to recreate your error on 10.2.0.2. What version are you on? Can you post a full example? Also it is possible your error is data related - does query work with any subset of data?
                  Oracle Database 10g Enterprise Edition Release 10.2.0.2.0 - Production
                  With the Partitioning, OLAP and Data Mining options
                  
                  SQL> SELECT a.ename, b.ename,
                    2         UTL_MATCH.JARO_WINKLER_SIMILARITY (
                    3            a.ename, b.ename) jws
                    4  FROM  (SELECT ename, ROWNUM empno
                    5         FROM  (SELECT   ename
                    6                FROM     emp
                    7                 ORDER BY DBMS_RANDOM.VALUE)) a,
                    8        (SELECT ename, ROWNUM empno
                    9         FROM  (SELECT   ename
                   10                FROM     emp
                   11                ORDER BY DBMS_RANDOM.VALUE)) b
                   12  WHERE  b.empno = a.empno;
                  
                  ENAME      ENAME             JWS
                  ---------- ---------- ----------
                  ADAMS      MILLER              0
                  JAMES      ALLEN              60
                  WARD       BLAKE              48
                  CLARK      JAMES              46
                  TURNER     SMITH               0
                  SCOTT      CLARK              46
                  KING       ADAMS               0
                  JONES      FORD               48
                  MILLER     TURNER             55
                  SMITH      SCOTT              64
                  BLAKE      MARTIN             45
                  MARTIN     JONES               0
                  FORD       KING                0
                  ALLEN      WARD               48
                  
                  14 rows selected.
                  
                  SQL> 
                  • 6. Re: Using jaro_winkler_similarity - Possible error
                    94799
                    How could I trace this error?
                    Missed this question - when ORA-03113 occurs typically the process writes a trace file to (to the directory defined by 'user_dump_dest' parameter if I recall). A glance at this trace file may be useful to identified the cause of the error.
                    • 7. Re: Using jaro_winkler_similarity - Possible error
                      532672
                      I think the jaro_winkler_similarity function doesn't work propertly with some characters. I use a UTF8 database, but maybe there are some incorrect conversions from ISO-8859-1 words due to the automatic process responsible for filling the table from different sources.
                      Is it possible that the jaro_winkler function crash with some characters in unicode?
                      • 8. Re: Using jaro_winkler_similarity - Possible error
                        532672
                        The trace file is about 2MB, but maybe the beginning is the most hepful part:

                        *** 2006-09-08 12:38:19.005
                        ksedmp: internal or fatal error
                        ORA-07445: exception encountered: core dump [pi_jaro_winkler_int()+340] [SIGSEGV] [Invalid permissions for mapped object] [0x40FC5000] [] []
                        • 9. Re: Using jaro_winkler_similarity - Possible error
                          532672
                          just a note: the trace shows a "Invalid permissions for mapped object" but i'm logged as sys (sysdba)
                          Another part from the trace file:
                          Exception signal: 11 (SIGSEGV), code: 2 (Invalid permissions for mapped object), addr: 0x40fc5000, PC: [0xb7e761a, pi_jaro_winkler_int()+340]
                          Registers:
                          %eax: 0x00000008 %ebx: 0x40fc4ff8 %ecx: 0x0000004f
                          %edx: 0x00000000 %edi: 0x40ea4844 %esi: 0x0000001c
                          %esp: 0xbfffb210 %ebp: 0xbfffb268 %eip: 0x0b7e761a
                          %efl: 0x00210293
                          pi_jaro_winkler_int()+328 (0xb7e760e) mov 0xffffffd4(%ebp),%ebx
                          pi_jaro_winkler_int()+331 (0xb7e7611) mov %esi,0xffffffd0(%ebp)
                          pi_jaro_winkler_int()+334 (0xb7e7614) mov 0x1c(%ebp),%esi
                          pi_jaro_winkler_int()+337 (0xb7e7617) mov %edi,0xffffffc0(%ebp)
                          pi_jaro_winkler_int()+340 (0xb7e761a) movzb (%ebx,%eax),%edx
                          pi_jaro_winkler_int()+344 (0xb7e761e) add $1,%eax
                          pi_jaro_winkler_int()+347 (0xb7e7621) cmp $49,%edx
                          pi_jaro_winkler_int()+350 (0xb7e7624) jz 0xb7e7748
                          pi_jaro_winkler_int()+356 (0xb7e762a) cmp %esi,%eax
                          • 10. Re: Using jaro_winkler_similarity - Possible error
                            532672
                            By the way, the exact version of my oracle installation is:
                            Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
                            • 11. Re: Using jaro_winkler_similarity - Possible error
                              Billy~Verreynne
                              The error code "pi_jaro_winkler_int()+340" does not turn up anything from the ORA-00600/ORA-07445 Lookup Tool (https://metalink.oracle.com/metalink/plsql/f?p=130:14:2411138155659266148::::)

                              Which could mean that this is an undocumented or unpublished bug. It can be very frustrating and time consuming to attempt to resolve such a bug by oneself. I suggest that you download RDA, run it on your instance, and open an iTar with Oracle support, attaching the trace and RDA files.
                              • 12. Re: Using jaro_winkler_similarity - Possible error
                                94799
                                but i'm logged as sys (sysdba)
                                Well you shouldn't be, but I don't think this has anything to do with Oracle permissions.

                                ORA-07445 is the generic OS equivalent error to ORA-00600, the generic Oracle internal error. As I understand it SIGSEV (aka segmentation fault) is raised by the OS when an invalid memory reference is made, which rather implies a problem with the C library referenced by this function.

                                Does it have anything to do with UNICODE? No idea but it may be interesting to narrow down this to specific data examples which provide evidence for your suggestion.

                                Of course Jaro Winkler is a very cool name for a function and we should all remember to drop it into conversation in development meetings to impress our fellow developers.
                                • 13. Re: Using jaro_winkler_similarity - Possible error
                                  APC
                                  This is quite a neat looking piece of functionality. I note that UTL_MATCH is not documented: does anybody know whether it's supported? Because if it is I might be able to junk an unnecessary EXTPROC from my system by upgrading to 10g.

                                  Cheers, APC
                                  • 14. Re: Using jaro_winkler_similarity - Possible error
                                    32685
                                    Hello

                                    I haven't found anything that explicitly states it is supported, but there seems to be a fair bit of information about it, both electronic and paper publications:

                                    http://www.psoug.org/reference/utl_match.html

                                    http://books.mcgraw-hill.com/downloads/products/0072261943/0072261943_ch01.pdf#search=%22utl_match%22

                                    Re: double metaphone

                                    HTH

                                    David
                                    1 2 Previous Next