This discussion is archived
4 Replies Latest reply: May 19, 2011 9:42 AM by Ganesh Srivatsav RSS

Splitting Name into Last, First, MI and Suffix

Seyed_G Newbie
Currently Being Moderated
All,
I am working on loading the content of a DB2 view into an Oracle 10g table. I am having problems with splitting the employee name field into last name, first name and middle initial. Because the input field for storing name on the Mainframe is a free form field, there are a variety of combinations of names. The only thing that I believe is consistent is that Last name is always first and then first name, middle or middle initial or suffix after that. So far, I have identified 11 different combinations of last name, first name, middle initial and suffix. What is the best way to split the name into its respective last name, first name, middle initial and suffix columns? I have attempted several coding changes to split the names, but I believe the result at the best is %85-90 accurate. Here are some of the combinations of names that I have found so far:


DOE,JOHN D

DOE JOHN DAVID

DOE, JOHN D.

DOE, JOHN JR

DOE, JOHN D JR

DOE, JOHN D. JR

DOE JR, JOHN D

DOE, JOHN D.JR

DOE, JOHN, JR.


Does anyone have any suggestions how to split the name? If it is not possible to foresee every possible scenario and code for it, then I will load the name into a single name column instead of splitting it.


Thank you,

Seyed
  • 1. Re: Splitting Name into Last, First, MI and Suffix
    Justin Cave Oracle ACE
    Currently Being Moderated
    In general, splitting a name is a hard problem. Getting more than 90% accurate is quite challenging.

    - People have multiple middle names (I have two)
    - People have no middle name
    - People have spaces in their first and last names (John de la Cruz, for example)

    And that's before we start to deal with different ways that someone might have chosen to format the data.

    If you don't have the data stored in separate columns anywhere in the source system, you're generally better off not trying to parse it out.

    Justin
  • 2. Re: Splitting Name into Last, First, MI and Suffix
    Frank Kulash Guru
    Currently Being Moderated
    Hi,

    Here's one way:
    WITH     got_first_and_middle_name     AS
    (
         SELECT     REGEXP_SUBSTR (employee_name, '[^,]+', 1, 1)     AS last_name
         ,     REGEXP_SUBSTR (employee_name, '[^,]+', 1, 2)     AS first_and_middle_name
         ,     REGEXP_SUBSTR (employee_name, '[^,]+', 1, 3)     AS suffix
         FROM     table_x
    )
    SELECT     last_name
    ,     REGEXP_SUBSTR  ( first_and_middle_name, '^[^ ]*')     AS first_name
    ,     REGEXP_REPLACE ( first_and_middle_name || ' '
                     , '^[^ ]* +(.*)'
                     , '\1'
                     )                         AS middle_name
    ,     suffix
    FROM     got_first_and_middle_name
    ;
  • 3. Re: Splitting Name into Last, First, MI and Suffix
    Ganesh Srivatsav Guru
    Currently Being Moderated
    If these are your scenarios, you can do it like this, You should work on case to case basis if other scenarios exists.
    SQL> WITH T AS (SELECT 'DOE,JOHN D' str FROM dual UNION ALL
      2  SELECT 'DOE JOHN DAVID'  FROM dual UNION ALL
      3  SELECT 'DOE, JOHN D.'    FROM dual UNION ALL
      4  SELECT 'DOE, JOHN JR'    FROM dual UNION ALL
      5  SELECT 'DOE, JOHN D JR'  FROM dual UNION ALL
      6  SELECT 'DOE, JOHN D. SR' FROM dual UNION ALL
      7  SELECT 'DOE JR, JOHN D'  FROM dual UNION ALL
      8  SELECT 'DOE, JOHN D.JR'  FROM dual UNION ALL
      9  SELECT 'DOE, JOHN, MR.'  FROM dual )
     10  ,t1 AS (SELECT REGEXP_REPLACE(str,'( |\.|,)(JR|MR|MS|SR)(\.|,|$)') str1,str FROM T)
     11  SELECT REGEXP_SUBSTR(str1,'[^, .]+',1,1) last_name,
     12         REGEXP_SUBSTR(str1,'[^, .]+',1,2) first_name,
     13         REGEXP_SUBSTR(str1,'[^, .]+',1,3) middle_name,
     14         TRANSLATE(REGEXP_SUBSTR(str,'( |\.|,)(JR|MR|MS|SR)(\.|,|$)',1,1),'A ,.','A') suffix
     15    FROM T1;
    
    LAST_NAME       FIRST_NAME      MIDDLE_NAME     SUFFIX
    --------------- --------------- --------------- ---------------
    DOE             JOHN            D
    DOE             JOHN            DAVID
    DOE             JOHN            D
    DOE             JOHN                            JR
    DOE             JOHN            D               JR
    DOE             JOHN            D               SR
    DOE             JOHN            D               JR
    DOE             JOHN            D               JR
    DOE             JOHN                            MR
    
    9 rows selected.
    
    SQL> 
    SQL> 
  • 4. Re: Splitting Name into Last, First, MI and Suffix
    pollywog Expert
    Currently Being Moderated
    /* Formatted on 5/19/2011 12:46:45 PM (QP5 v5.149.1003.31008) */
    SELECT t.COLUMN_VALUE nm,
           REGEXP_SUBSTR (replace(t.COLUMN_VALUE,'JR'),
                          '[^[:punct:][:blank:]]+',
                          1,
                          1)
              last_name,
                    REGEXP_SUBSTR (replace(t.COLUMN_VALUE,'JR'),
                          '[^[:punct:][:blank:]]+',
                          1,
                          2) first_name,
                          REGEXP_SUBSTR (replace(t.COLUMN_VALUE,'JR'),
                          '[^[:punct:][:blank:]]+',
                          1,
                          3) middle_name
      FROM TABLE (sys.odcivarchar2list ('DOE,JOHN D',
                                        'DOE JOHN DAVID',
                                        'DOE, JOHN D.',
                                        'DOE, JOHN JR',
                                        'DOE, JOHN D JR',
                                        'DOE, JOHN D. JR',
                                        'DOE JR, JOHN D',
                                        'DOE, JOHN D.JR',
                                        'DOE, JOHN, JR.')) t;
    NM     LAST_NAME     FIRST_NAME     MIDDLE_NAME
    DOE,JOHN D     DOE     JOHN     D
    DOE JOHN DAVID     DOE     JOHN     DAVID
    DOE, JOHN D.     DOE     JOHN     D
    DOE, JOHN JR     DOE     JOHN     
    DOE, JOHN D JR     DOE     JOHN     D
    DOE, JOHN D. JR     DOE     JOHN     D
    DOE JR, JOHN D     DOE     JOHN     D
    DOE, JOHN D.JR     DOE     JOHN     D
    DOE, JOHN, JR.     DOE     JOHN     

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points