4 Replies Latest reply: May 19, 2011 4:52 PM by pollywog RSS

    Splitting Name into Last, First, MI and Suffix

      I am working on loading the content of a DB2 view into an Oracle 10g table. I am having problems with splitting the employee name field into last name, first name and middle initial. Because the input field for storing name on the Mainframe is a free form field, there are a variety of combinations of names. The only thing that I believe is consistent is that Last name is always first and then first name, middle or middle initial or suffix after that. So far, I have identified 11 different combinations of last name, first name, middle initial and suffix. What is the best way to split the name into its respective last name, first name, middle initial and suffix columns? I have attempted several coding changes to split the names, but I believe the result at the best is %85-90 accurate. Here are some of the combinations of names that I have found so far:

      DOE,JOHN D


      DOE, JOHN D.

      DOE, JOHN JR

      DOE, JOHN D JR

      DOE, JOHN D. JR

      DOE JR, JOHN D

      DOE, JOHN D.JR

      DOE, JOHN, JR.

      Does anyone have any suggestions how to split the name? If it is not possible to foresee every possible scenario and code for it, then I will load the name into a single name column instead of splitting it.

      Thank you,

        • 1. Re: Splitting Name into Last, First, MI and Suffix
          In general, splitting a name is a hard problem. Getting more than 90% accurate is quite challenging.

          - People have multiple middle names (I have two)
          - People have no middle name
          - People have spaces in their first and last names (John de la Cruz, for example)

          And that's before we start to deal with different ways that someone might have chosen to format the data.

          If you don't have the data stored in separate columns anywhere in the source system, you're generally better off not trying to parse it out.

          • 2. Re: Splitting Name into Last, First, MI and Suffix
            Frank Kulash

            Here's one way:
            WITH     got_first_and_middle_name     AS
                 SELECT     REGEXP_SUBSTR (employee_name, '[^,]+', 1, 1)     AS last_name
                 ,     REGEXP_SUBSTR (employee_name, '[^,]+', 1, 2)     AS first_and_middle_name
                 ,     REGEXP_SUBSTR (employee_name, '[^,]+', 1, 3)     AS suffix
                 FROM     table_x
            SELECT     last_name
            ,     REGEXP_SUBSTR  ( first_and_middle_name, '^[^ ]*')     AS first_name
            ,     REGEXP_REPLACE ( first_and_middle_name || ' '
                             , '^[^ ]* +(.*)'
                             , '\1'
                             )                         AS middle_name
            ,     suffix
            FROM     got_first_and_middle_name
            • 3. Re: Splitting Name into Last, First, MI and Suffix
              Ganesh Srivatsav
              If these are your scenarios, you can do it like this, You should work on case to case basis if other scenarios exists.
              SQL> WITH T AS (SELECT 'DOE,JOHN D' str FROM dual UNION ALL
                2  SELECT 'DOE JOHN DAVID'  FROM dual UNION ALL
                3  SELECT 'DOE, JOHN D.'    FROM dual UNION ALL
                4  SELECT 'DOE, JOHN JR'    FROM dual UNION ALL
                5  SELECT 'DOE, JOHN D JR'  FROM dual UNION ALL
                6  SELECT 'DOE, JOHN D. SR' FROM dual UNION ALL
                7  SELECT 'DOE JR, JOHN D'  FROM dual UNION ALL
                8  SELECT 'DOE, JOHN D.JR'  FROM dual UNION ALL
                9  SELECT 'DOE, JOHN, MR.'  FROM dual )
               10  ,t1 AS (SELECT REGEXP_REPLACE(str,'( |\.|,)(JR|MR|MS|SR)(\.|,|$)') str1,str FROM T)
               11  SELECT REGEXP_SUBSTR(str1,'[^, .]+',1,1) last_name,
               12         REGEXP_SUBSTR(str1,'[^, .]+',1,2) first_name,
               13         REGEXP_SUBSTR(str1,'[^, .]+',1,3) middle_name,
               14         TRANSLATE(REGEXP_SUBSTR(str,'( |\.|,)(JR|MR|MS|SR)(\.|,|$)',1,1),'A ,.','A') suffix
               15    FROM T1;
              LAST_NAME       FIRST_NAME      MIDDLE_NAME     SUFFIX
              --------------- --------------- --------------- ---------------
              DOE             JOHN            D
              DOE             JOHN            DAVID
              DOE             JOHN            D
              DOE             JOHN                            JR
              DOE             JOHN            D               JR
              DOE             JOHN            D               SR
              DOE             JOHN            D               JR
              DOE             JOHN            D               JR
              DOE             JOHN                            MR
              9 rows selected.
              • 4. Re: Splitting Name into Last, First, MI and Suffix
                /* Formatted on 5/19/2011 12:46:45 PM (QP5 v5.149.1003.31008) */
                SELECT t.COLUMN_VALUE nm,
                       REGEXP_SUBSTR (replace(t.COLUMN_VALUE,'JR'),
                                REGEXP_SUBSTR (replace(t.COLUMN_VALUE,'JR'),
                                      2) first_name,
                                      REGEXP_SUBSTR (replace(t.COLUMN_VALUE,'JR'),
                                      3) middle_name
                  FROM TABLE (sys.odcivarchar2list ('DOE,JOHN D',
                                                    'DOE JOHN DAVID',
                                                    'DOE, JOHN D.',
                                                    'DOE, JOHN JR',
                                                    'DOE, JOHN D JR',
                                                    'DOE, JOHN D. JR',
                                                    'DOE JR, JOHN D',
                                                    'DOE, JOHN D.JR',
                                                    'DOE, JOHN, JR.')) t;
                NM     LAST_NAME     FIRST_NAME     MIDDLE_NAME
                DOE,JOHN D     DOE     JOHN     D
                DOE JOHN DAVID     DOE     JOHN     DAVID
                DOE, JOHN D.     DOE     JOHN     D
                DOE, JOHN JR     DOE     JOHN     
                DOE, JOHN D JR     DOE     JOHN     D
                DOE, JOHN D. JR     DOE     JOHN     D
                DOE JR, JOHN D     DOE     JOHN     D
                DOE, JOHN D.JR     DOE     JOHN     D
                DOE, JOHN, JR.     DOE     JOHN