1 2 Previous Next 20 Replies Latest reply: Jan 4, 2013 5:31 PM by Frank Kulash Go to original post RSS
      • 15. Re: Date format
        Frank Kulash
        Hi,
        960736 wrote:
        Month Sept & October(Oct) is not working...for all Rows which has Sept & Oct/October it just given the Year

        Example
        Expiry_date             Date_part   Expiry_num
        --------------- -------------- ----------------
        Died Sept. 9, 2007.     2007           2007
        DIED 4 JUNE 1999     JUNE 1999     19990600 
        Died 12 Oct, 1999.      1999            1999
        Died Oct. 26, 1992.       1992         1992
        Died October 2, 1998.     1998         1998
        In the list of month abbreviations, the solution I posted yesterday had a '{' between 'Sep' and 'Oct'; that should be a '|'.
        There's still a problem with all-number dates (such as '11/5/00') and abbreviations longer than 3 letters (such as 'Sept.').

        It's looking more and more like a PL/SQL solution would be best, especially since you're using Oracle 10. Is that acceptable? If so, should it look for the word 'Died' right before the date?
        • 16. Re: Date format
          963739
          Yes, It has 'Died' word for all the Rows in front of date....
          • 17. Re: Date format
            Frank Kulash
            Hi,

            Here's an example of a user-defined function:
            CREATE OR REPLACE FUNCTION    date_died
            (   in_str  IN      VARCHAR2
            )
            RETURN     NUMBER
            DETERMINISTIC
            IS
            --     date_died returns a NUMBER such as 20130104 representing
            --     the date found in in_str immediately after the word 'Died'
            --     (case insensitive).  The date can be in a variety of formats,
            --     and does not need to include the day of the month, or even a
            --     month.  The number returned is always 8 digits long.  It ends
            --     in 00 if the date does not include a day, and it ends in 0000
            --     if the date does not include a month.
            --     When months are spelled out, they may be abrreviated, with
            --     or without a period.  Abbreviations must start with the first
            --     3 letters of the full month name.  (This is standard practice
            --     in English, but watch out if you try to adapt this for other
            --     languages.)
            --     NULL is returned if in_str does not contain the sub-string 
            --     'Died', or if what immediately follows 'Died' can not be
            --     interpreted as a full or partial date.
            
                died_pos     PLS_INTEGER := INSTR ( UPPER (in_str)
                                                , 'DIED'
                                     ); 
                return_str     VARCHAR2 (8);               -- String version of number to be returned
                start_str     VARCHAR2 (30);
                str          VARCHAR2 (4000);     
            BEGIN
                IF  died_pos > 0
                THEN      -- 'DIED' found; see if date follows
                     str := LTRIM ( SUBSTR (in_str, died_pos + 4)
                                  , ' :-'
                           );
                 start_str := REGEXP_SUBSTR ( str
                                       , '^'          || -- at the very beginning of str
                                     '\d{1,2}'          || -- 1 or 2 digits
                                     '([-/.])'          || -- a punctuation character (\1)
                                     '(\d{1,2}\1)?'     || -- (optional) another 1 or 2 digits and delimiter
                                     '\d{2,4}'             -- 2 to 4 digits (year)
                                   );
                 IF  start_str  IS NULL
                 THEN            -- Look for 'Mon. DD, YYYY' or 'DD Mon. YYYY' date
                     str := REGEXP_REPLACE ( str
                                         , '[ .,]+'     -- Change all delimiters ...
                                  , ' '          -- ... to a single space
                                  );
                     start_str := REGEXP_SUBSTR ( str
                                                  , '^(((\D+ )?(\d+ )?)|(\d+ \D+ ))?\d+'
                                       );
                     IF start_str  IS NOT NULL
                     THEN
                      start_str := REGEXP_REPLACE ( start_str
                                              , '([:[:alpha:]]{3})[[:alpha:]]+ '     -- 4 or more non-digits ...
                                         , '\1 '                         -- ... replaced by first 3
                                         );
                      IF  REGEXP_LIKE ( start_str
                                    , '\D+ \d+ \d+'
                                )
                      THEN
                          return_str := TO_CHAR ( TO_DATE ( start_str
                                                               , 'Mon DD RRRR'
                                              )
                                       , 'YYYYMMDD'
                                       );
                      ELSIF  REGEXP_LIKE ( start_str
                                          , '\d+ \D+ \d+'
                                   )
                      THEN
                          return_str := TO_CHAR ( TO_DATE ( start_str
                                                             , 'DD Mon RRRR'
                                              )
                                       , 'YYYYMMDD'
                                       );
                      ELSIF  REGEXP_LIKE ( start_str
                                          , '\D+ \d+'
                                   )
                      THEN
                          return_str := TO_CHAR ( TO_DATE ( start_str
                                                               , 'Mon RRRR'
                                              )
                                       , 'YYYYMM'
                                       ) || '00';
                      ELSE
                                return_str := '12345678';
                      END IF;
                     END IF;
                 ELSE            -- start_str is MM/DD/YYYY date (or MM/YYYY)
                     IF  REGEXP_LIKE ( start_str
                                   , '\D\d+\D'
                               )
                     THEN
                         return_str := TO_CHAR ( TO_DATE ( start_str
                                                 , 'MM/DD/RRRR'
                                          )
                                      , 'YYYYMMDD'
                                      );
                     ELSE        
                         return_str := TO_CHAR ( TO_DATE ( start_str
                                                 , 'MM/RRRR'
                                          )
                                      , 'YYYYMM'
                                      ) || '00';
                     END IF;
                 END IF;
                END IF;
            
                RETURN  TO_NUMBER (return_str);
            EXCEPTION
                WHEN OTHERS
                THEN
                    IF  SQLCODE  BETWEEN -1865 AND -1839
                 THEN          -- Not a valid date
                     RETURN NULL;
                    ELSE          -- Real error
                     RAISE;
                 END IF;
            END date_died;
            /
            No doubt it's not as pretty as it could be.

            The queries where you use are look great, however:
            SELECT       expiry_date
            ,       date_died (expiry_date)
            FROM       sample_table
            ORDER BY  expiry_date
            ;
            There may be even other special cases you haven't discovered yet. To help find them, you can do something like this:
            SELECT     expiry_date
            FROM     sample_table
            WHERE     UPPER (expiry_date)     LIKE '%DIED%'
            AND     date_died (expiry_date)     IS NULL
            ;
            • 18. Re: Date format
              963739
              Thanks a Lot Frank, I thought i will do manually work for some patterns. but it has cleared all the patterns for me...Once again thanks a lot....

              As you asked me....Below are the distinct patterns which are not covered....but those patterns has very less counts ( 1%)


              SELECT distinct expiry_date
              FROM     sample_table
              WHERE     UPPER (expiry_date)     LIKE '%D%'
              AND     date_died (expiry_date)     IS NULL;
              
              
              Expiry_Date
              ------------------------------------
              Deceased Aug. 1998.
              deceased
               Deceased.                    
              Died in August of 1999
              Died.
              Died 0000.
              Deceased
              Died, 1993.
              Died in July 1, 1999
              Deceased.
               Deceased 1991.               
              Died, 1999.
              Died, 1992.
              Deceased June 3, 1994.
               Deseased 1999.               
              Died in 1998.
              Died, no date given
              Died in July of 1999
              • 19. Re: Date format
                Frank Kulash
                Hi,
                960736 wrote:
                Thanks a Lot Frank, I thought i will do manually work for some patterns. but it has cleared all the patterns for me...Once again thanks a lot....

                As you asked me....Below are the distinct patterns which are not covered....but those patterns has very less counts ( 1%)
                As I said, you should be able to modfy the function to handle most of the situations below; all of the others are hopeless (e.g. 'deceased' with nothing else, or 'Died 0000').
                SELECT distinct expiry_date
                FROM     sample_table
                WHERE     UPPER (expiry_date)     LIKE '%D%'
                AND     date_died (expiry_date)     IS NULL;
                
                
                Expiry_Date
                ------------------------------------
                Deceased Aug. 1998.
                deceased
                Deceased.                    
                Died in August of 1999
                Died.
                Died 0000.
                Deceased
                Died, 1993.
                Died in July 1, 1999
                Deceased.
                Deceased 1991.               
                Died, 1999.
                Died, 1992.
                Deceased June 3, 1994.
                Deseased 1999.               
                Died in 1998.
                Died, no date given
                Died in July of 1999
                I looks like all of these situations (again, excepting hopless cases like 'Died, no date given') could be handled by changing 'Deceased' (and 'Deseased') in the variable str to 'Died', ignoring the words 'in' and 'of', and checking for just a year after 'Died'. If it's worthwhile, try modifying the function. If you get stuck, post your code, and your sample data.
                • 20. Re: Date format
                  963739
                  Yes, I have already started working on it. I think i am almost done with other cases too...

                  If i have any prob, i will try to approach you...


                  Thanks
                  1 2 Previous Next