This discussion is archived
1 2 Previous Next 20 Replies Latest reply: Jan 4, 2013 3:31 PM by Frank Kulash Go to original post RSS
  • 15. Re: Date format
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    960736 wrote:
    Month Sept & October(Oct) is not working...for all Rows which has Sept & Oct/October it just given the Year

    Example
    Expiry_date             Date_part   Expiry_num
    --------------- -------------- ----------------
    Died Sept. 9, 2007.     2007           2007
    DIED 4 JUNE 1999     JUNE 1999     19990600 
    Died 12 Oct, 1999.      1999            1999
    Died Oct. 26, 1992.       1992         1992
    Died October 2, 1998.     1998         1998
    In the list of month abbreviations, the solution I posted yesterday had a '{' between 'Sep' and 'Oct'; that should be a '|'.
    There's still a problem with all-number dates (such as '11/5/00') and abbreviations longer than 3 letters (such as 'Sept.').

    It's looking more and more like a PL/SQL solution would be best, especially since you're using Oracle 10. Is that acceptable? If so, should it look for the word 'Died' right before the date?
  • 16. Re: Date format
    963739 Newbie
    Currently Being Moderated
    Yes, It has 'Died' word for all the Rows in front of date....
  • 17. Re: Date format
    Frank Kulash Guru
    Currently Being Moderated
    Hi,

    Here's an example of a user-defined function:
    CREATE OR REPLACE FUNCTION    date_died
    (   in_str  IN      VARCHAR2
    )
    RETURN     NUMBER
    DETERMINISTIC
    IS
    --     date_died returns a NUMBER such as 20130104 representing
    --     the date found in in_str immediately after the word 'Died'
    --     (case insensitive).  The date can be in a variety of formats,
    --     and does not need to include the day of the month, or even a
    --     month.  The number returned is always 8 digits long.  It ends
    --     in 00 if the date does not include a day, and it ends in 0000
    --     if the date does not include a month.
    --     When months are spelled out, they may be abrreviated, with
    --     or without a period.  Abbreviations must start with the first
    --     3 letters of the full month name.  (This is standard practice
    --     in English, but watch out if you try to adapt this for other
    --     languages.)
    --     NULL is returned if in_str does not contain the sub-string 
    --     'Died', or if what immediately follows 'Died' can not be
    --     interpreted as a full or partial date.
    
        died_pos     PLS_INTEGER := INSTR ( UPPER (in_str)
                                        , 'DIED'
                             ); 
        return_str     VARCHAR2 (8);               -- String version of number to be returned
        start_str     VARCHAR2 (30);
        str          VARCHAR2 (4000);     
    BEGIN
        IF  died_pos > 0
        THEN      -- 'DIED' found; see if date follows
             str := LTRIM ( SUBSTR (in_str, died_pos + 4)
                          , ' :-'
                   );
         start_str := REGEXP_SUBSTR ( str
                               , '^'          || -- at the very beginning of str
                             '\d{1,2}'          || -- 1 or 2 digits
                             '([-/.])'          || -- a punctuation character (\1)
                             '(\d{1,2}\1)?'     || -- (optional) another 1 or 2 digits and delimiter
                             '\d{2,4}'             -- 2 to 4 digits (year)
                           );
         IF  start_str  IS NULL
         THEN            -- Look for 'Mon. DD, YYYY' or 'DD Mon. YYYY' date
             str := REGEXP_REPLACE ( str
                                 , '[ .,]+'     -- Change all delimiters ...
                          , ' '          -- ... to a single space
                          );
             start_str := REGEXP_SUBSTR ( str
                                          , '^(((\D+ )?(\d+ )?)|(\d+ \D+ ))?\d+'
                               );
             IF start_str  IS NOT NULL
             THEN
              start_str := REGEXP_REPLACE ( start_str
                                      , '([:[:alpha:]]{3})[[:alpha:]]+ '     -- 4 or more non-digits ...
                                 , '\1 '                         -- ... replaced by first 3
                                 );
              IF  REGEXP_LIKE ( start_str
                            , '\D+ \d+ \d+'
                        )
              THEN
                  return_str := TO_CHAR ( TO_DATE ( start_str
                                                       , 'Mon DD RRRR'
                                      )
                               , 'YYYYMMDD'
                               );
              ELSIF  REGEXP_LIKE ( start_str
                                  , '\d+ \D+ \d+'
                           )
              THEN
                  return_str := TO_CHAR ( TO_DATE ( start_str
                                                     , 'DD Mon RRRR'
                                      )
                               , 'YYYYMMDD'
                               );
              ELSIF  REGEXP_LIKE ( start_str
                                  , '\D+ \d+'
                           )
              THEN
                  return_str := TO_CHAR ( TO_DATE ( start_str
                                                       , 'Mon RRRR'
                                      )
                               , 'YYYYMM'
                               ) || '00';
              ELSE
                        return_str := '12345678';
              END IF;
             END IF;
         ELSE            -- start_str is MM/DD/YYYY date (or MM/YYYY)
             IF  REGEXP_LIKE ( start_str
                           , '\D\d+\D'
                       )
             THEN
                 return_str := TO_CHAR ( TO_DATE ( start_str
                                         , 'MM/DD/RRRR'
                                  )
                              , 'YYYYMMDD'
                              );
             ELSE        
                 return_str := TO_CHAR ( TO_DATE ( start_str
                                         , 'MM/RRRR'
                                  )
                              , 'YYYYMM'
                              ) || '00';
             END IF;
         END IF;
        END IF;
    
        RETURN  TO_NUMBER (return_str);
    EXCEPTION
        WHEN OTHERS
        THEN
            IF  SQLCODE  BETWEEN -1865 AND -1839
         THEN          -- Not a valid date
             RETURN NULL;
            ELSE          -- Real error
             RAISE;
         END IF;
    END date_died;
    /
    No doubt it's not as pretty as it could be.

    The queries where you use are look great, however:
    SELECT       expiry_date
    ,       date_died (expiry_date)
    FROM       sample_table
    ORDER BY  expiry_date
    ;
    There may be even other special cases you haven't discovered yet. To help find them, you can do something like this:
    SELECT     expiry_date
    FROM     sample_table
    WHERE     UPPER (expiry_date)     LIKE '%DIED%'
    AND     date_died (expiry_date)     IS NULL
    ;
  • 18. Re: Date format
    963739 Newbie
    Currently Being Moderated
    Thanks a Lot Frank, I thought i will do manually work for some patterns. but it has cleared all the patterns for me...Once again thanks a lot....

    As you asked me....Below are the distinct patterns which are not covered....but those patterns has very less counts ( 1%)


    SELECT distinct expiry_date
    FROM     sample_table
    WHERE     UPPER (expiry_date)     LIKE '%D%'
    AND     date_died (expiry_date)     IS NULL;
    
    
    Expiry_Date
    ------------------------------------
    Deceased Aug. 1998.
    deceased
     Deceased.                    
    Died in August of 1999
    Died.
    Died 0000.
    Deceased
    Died, 1993.
    Died in July 1, 1999
    Deceased.
     Deceased 1991.               
    Died, 1999.
    Died, 1992.
    Deceased June 3, 1994.
     Deseased 1999.               
    Died in 1998.
    Died, no date given
    Died in July of 1999
  • 19. Re: Date format
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    960736 wrote:
    Thanks a Lot Frank, I thought i will do manually work for some patterns. but it has cleared all the patterns for me...Once again thanks a lot....

    As you asked me....Below are the distinct patterns which are not covered....but those patterns has very less counts ( 1%)
    As I said, you should be able to modfy the function to handle most of the situations below; all of the others are hopeless (e.g. 'deceased' with nothing else, or 'Died 0000').
    SELECT distinct expiry_date
    FROM     sample_table
    WHERE     UPPER (expiry_date)     LIKE '%D%'
    AND     date_died (expiry_date)     IS NULL;
    
    
    Expiry_Date
    ------------------------------------
    Deceased Aug. 1998.
    deceased
    Deceased.                    
    Died in August of 1999
    Died.
    Died 0000.
    Deceased
    Died, 1993.
    Died in July 1, 1999
    Deceased.
    Deceased 1991.               
    Died, 1999.
    Died, 1992.
    Deceased June 3, 1994.
    Deseased 1999.               
    Died in 1998.
    Died, no date given
    Died in July of 1999
    I looks like all of these situations (again, excepting hopless cases like 'Died, no date given') could be handled by changing 'Deceased' (and 'Deseased') in the variable str to 'Died', ignoring the words 'in' and 'of', and checking for just a year after 'Died'. If it's worthwhile, try modifying the function. If you get stuck, post your code, and your sample data.
  • 20. Re: Date format
    963739 Newbie
    Currently Being Moderated
    Yes, I have already started working on it. I think i am almost done with other cases too...

    If i have any prob, i will try to approach you...


    Thanks
1 2 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points