This discussion is archived
1 2 Previous Next 20 Replies Latest reply: Jan 4, 2013 3:31 PM by Frank Kulash RSS

Date format

963739 Newbie
Currently Being Moderated
Hi All,

Happy New Year to Everyone.....


I have column name (EXPIRY_DATE)
ExPIRY_DATE
--------------------------
Died Aug. 20, 2012.
Died July 20, 2012
Died July 28, 2012.
Died July 27, 2012.
Died Feb. 7, 1999.
Died June 27, 2004.
Died Nov. 7, 2004.
Died Nov. 12, 2001.
Died June 20, 2003.
Died Dec. 3, 2003
Died Feb. 5, 2004.
I want to Trim out Died and have the Date Format MMDDYYYY format.


Thanks
  • 1. Re: Date format
    Frank Kulash Guru
    Currently Being Moderated
    Hi,

    You can do something like:
    TO_CHAR ( TO_DATE ( RTRIM ( SUBSTR (expiry_date, 6, 3) ||
                             SUBSTR (expiry_date, 10)  
                     , '.'
                     )
                , 'MonDD, YYYY'
                )
         , 'MMDDYYYY'
         )
    If you'd care to post CREATE TABLE and INSERT statements for the sample data, and the results you want, then I could test this.

    Problems like this only occur when you store date information in something other than a DATE (or TIMESTAMP) column. The best solution is to use the right data type. You might need a separate column for the non-date information, such as 'Died '.

    Given that you have the data in a string, and given that you need a string for output, the basic approach is to call TO_DATE to convert the original string into a DATE, and then use TO_CHAR to convert it back in the proper format. In this case, you'll have problems because
    (a) There are always 5 characters you need to ignore at the beginning of the string. (The first SUBSTR takes care of this).
    (b) Some of the months are abbreviated, but the 4-character months are spelled out. (The second SUBSTR takes care of this. This only works in English, and only for June and July, because the first 3 letters of the full name are the abbreviation.)
    (c) Sometimes (not always) there is a gratuitous '.' at the end of the string. (RTRIM takes care of this.)

    Sometimes having 1 digit for the day, and sometimes having 2 digits, is not a problem for TO_DATE, but it is a problem using for getting rid of the '.' at then end. Because we don't know exactly how long the day is, we don't know exacly where the year will end, so it's hard to use SUBSTR to remove the '.'.
  • 2. Re: Date format
    ranit B Expert
    Currently Being Moderated
    please try this,...
    Ranit>> 
    Ranit>> with xx as(
      2       select 'Died Aug. 20, 2012.' col from dual UNION ALL
      3       select 'Died July 20, 2012' col from dual UNION ALL
      4       select 'Died July 28, 2012.' col from dual UNION ALL
      5       select 'Died July 27, 2012.' col from dual UNION ALL
      6       select 'Died Feb. 7, 1999.' col from dual UNION ALL
      7       select 'Died Nov. 7, 2004.' col from dual UNION ALL
      8       select 'Died June 27, 2004.' col from dual UNION ALL
      9       select 'Died Nov. 12, 2001.' col from dual UNION ALL
     10       select 'Died June 20, 2003.' col from dual UNION ALL
     11       select 'Died Dec. 3, 2003' col from dual UNION ALL
     12       select 'Died Feb. 5, 2004.' col from dual
     13  ), x1 as(
     14       select REPLACE(replace(col,'.'),'Died ') col from xx
     15  )
     16  select TO_CHAR(to_date(col,'Mon DD, YYYY'),'MMDDYYYY') from x1;
    
    TO_CHAR(                                                                                                                                                                                                                    
    --------                                                                                                                                                                                                                    
    08202012                                                                                                                                                                                                                    
    07202012                                                                                                                                                                                                                    
    07282012                                                                                                                                                                                                                    
    07272012                                                                                                                                                                                                                    
    02071999                                                                                                                                                                                                                    
    11072004                                                                                                                                                                                                                    
    06272004                                                                                                                                                                                                                    
    11122001                                                                                                                                                                                                                    
    06202003                                                                                                                                                                                                                    
    12032003                                                                                                                                                                                                                    
    02052004                                                                                                                                                                                                                    
    
    11 rows selected.
    Edited by: ranit B on Jan 3, 2013 1:54 AM
    -- o/p added
  • 3. Re: Date format
    IckyIckyChiMoon Explorer
    Currently Being Moderated
    Here's one way to do it. Notice that the first part just cleans up the date, then the to_char and to_date just reformat it as desired.
    with  
      w_data  as
        ( select  Null as dttxt from dual where 1=0
          union all select 'Died Aug. 20, 2012.' from dual
          union all select 'Died July 20, 2012'  from dual
          union all select 'Died July 28, 2012.' from dual
          union all select 'Died July 27, 2012.' from dual
          union all select 'Died Feb. 7, 1999.'  from dual
          union all select 'Died June 27, 2004.' from dual
          union all select 'Died Nov. 7, 2004.'  from dual
          union all select 'Died Nov. 12, 2001.' from dual
          union all select 'Died June 20, 2003.' from dual
          union all select 'Died Dec. 3, 2003'   from dual
          union all select 'Died Feb. 5, 2004.'  from dual
        )
    select  dttxt
         ,                   substr(dttxt,6,3) || translate(substr(dttxt,instr(dttxt,' ',1,2)),'x,.','x')   as cleaned_up
         ,  to_char(to_date( substr(dttxt,6,3) || translate(substr(dttxt,instr(dttxt,' ',1,2)),'x,.','x'), 'Mon DD YYYY'), 'MMDDYYYY')   as rslt
      from  w_data
    ;
  • 4. Re: Date format
    Solomon Yakobson Guru
    Currently Being Moderated
    TO_DATE understands "non-format" text too. You just need to enclose it in double quotes:
    with t as(
              select 'Died Aug. 20, 2012.' c from dual UNION ALL
              select 'Died July 20, 2012' from dual UNION ALL
              select 'Died July 28, 2012.' from dual UNION ALL
              select 'Died July 27, 2012.' from dual UNION ALL
              select 'Died Feb. 7, 1999.' from dual UNION ALL
              select 'Died Nov. 7, 2004.' from dual UNION ALL
              select 'Died June 27, 2004.' from dual UNION ALL
              select 'Died Nov. 12, 2001.' from dual UNION ALL
              select 'Died June 20, 2003.' from dual UNION ALL
              select 'Died Dec. 3, 2003' from dual UNION ALL
              select 'Died Feb. 5, 2004.' from dual
             )
    select  to_char(to_date(c,'"Died "Mon DD, YYYY.'),'MMDDYYYY')
      from  t
    /
    
    TO_CHAR(
    --------
    08202012
    07202012
    07282012
    07272012
    02071999
    11072004
    06272004
    11122001
    06202003
    12032003
    02052004
    
    11 rows selected.
    
    SQL> 
    SY.
  • 5. Re: Date format
    963739 Newbie
    Currently Being Moderated
    Thanks Frank,
    WITH Sample_table AS (SELECT   'Died Aug. 20, 2012.' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   ' Died May 22, 1989, buried in ' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   'Died July 28, 2012.' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   'Died July 27, 2012.' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   ' Died March, 1998. ' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   'Died Nov. 7, 2004.' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   'Died June 27, 2004.' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   'Died Nov. 12, 2001.' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   'Died June 20, 2003.' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   'Died Dec. 3, 2003' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   ' Died 1995.  ' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   ' Died Aug 27 1998.    ' EXPIRY_DATE FROM DUAL)
    The Actual EXPIRY_DATE Column is a Varchar2 data type and the result column will be a Number data-type


    I have tried to add all the Patterns which i came accross................

    Edited by: 960736 on Jan 2, 2013 1:02 PM
  • 6. Re: Date format
    ranit B Expert
    Currently Being Moderated
    Did you check the query i provided?
    Does that suit your requirement?

    Hey, this was not your initial requirement?
    I don't understand why people don't post their actual requirement.

    Edited by: ranit B on Jan 3, 2013 2:40 AM
  • 7. Re: Date format
    963739 Newbie
    Currently Being Moderated
    Thanks all,

    I got it....I have lot of patterns to work on....So i am dividing them into patterns and updating the final column....
  • 8. Re: Date format
    Frank Kulash Guru
    Currently Being Moderated
    Hi,
    960736 wrote:
    WITH Sample_table AS (SELECT   'Died Aug. 20, 2012.' EXPIRY_DATE FROM DUAL
    UNION ALL
    SELECT   ' Died 1995.  ' EXPIRY_DATE FROM DUAL
    UNION ALL
    SELECT   ' Died Aug 27 1998.    ' EXPIRY_DATE FROM DUAL)
    Thanks for posing the sample data. Don't forget to post the results you want from that data. In particular, what results do you want from 'Died 1995' without a month or day? 1995? 19950000? 19950101? NULL? Something else?
    I have tried to add all the Patterns which i came accross................
    In this sample data, it looks like the date part you want to keep always include a 4-digit year. Can we count on that? If there are 2 (or more) 4-digit numbers, is it safe to take the first one? E.g. if expiry_date='Born December 6, 1920; died Dec. 5 2012', would you want 19201206?

    What version or Oracle are you using? The new form of REGEXP_SUBSTR (introduced in Oracle 11.1) might be useful here.
  • 9. Re: Date format
    963739 Newbie
    Currently Being Moderated
    Ranit...

    I am sorry, i have initially posted with my sample data which i came across, but when i dig into it i came across lot of patterns which i need to work on and it has like 400 Million records in it...So its taking some time for me to post total requirement.


    Frank,

    For the data which has only Year, we are updating as it is....intially the final column was in date Data type....but eventually we changed to Number Data type...which will be easy for most of unusual patterns.
    In this sample data, it looks like the date part you want to keep always include a 4-digit year. Can we count on that? If there are 2 (or more) 4-digit numbers, is it safe to take the first one? E.g. if expiry_date='Born December 6, 1920; died Dec. 5 2012', would you want 19201206?
    I have checked with if i have patterns like expiry_date='Born December 6, 1920; died Dec. 5 2012', But i couldn't come across... I see all having starting word as Died...By chance if i found, then i need to use Died date itself not Birth date...
  • 10. Re: Date format
    Solomon Yakobson Guru
    Currently Being Moderated
    WITH Sample_table AS (SELECT   'Died Aug. 20, 2012.' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   ' Died May 22, 1989, buried in ' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   'Died July 28, 2012.' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   'Died July 27, 2012.' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   ' Died March, 1998. ' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   'Died Nov. 7, 2004.' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   'Died June 27, 2004.' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   'Died Nov. 12, 2001.' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   'Died June 20, 2003.' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   'Died Dec. 3, 2003' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   ' Died 1995.  ' EXPIRY_DATE FROM DUAL
                UNION ALL
                SELECT   ' Died Aug 27 1998.    ' EXPIRY_DATE FROM DUAL)
    select  to_char(
                    to_date(
                            regexp_replace(
                                           regexp_replace(
                                                          regexp_replace(
                                                                         expiry_date,
                                                                         '^ *Died (\d)','Died Jan. 1, \1'
                                                                        ),
                                                          '^( *Died [^ ]+),','\1 1,'
                                                         ),
                                           '^ *Died (...)[^ ]* (\d{1,2}),* (\d{4}).*$','\2-\1-\3'
                                          ),
                           'dd-mon-yyyy'
                          ),
                    'mmddyyyy'
                   ) expiry_date
      from  Sample_table
    /
    
    EXPIRY_D
    --------
    08202012
    05221989
    07282012
    07272012
    03011998
    11072004
    06272004
    11122001
    06202003
    12032003
    01011995
    
    EXPIRY_D
    --------
    08271998
    
    12 rows selected.
    
    SQL> 
    SY.
  • 11. Re: Date format
    Frank Kulash Guru
    Currently Being Moderated
    Hi,

    This works on the sample data you posted:
    WITH     got_date_part     AS
    (
         SELECT     REGEXP_SUBSTR ( expiry_date
                         , '('               || -- month_day is ...
                                 '(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep}Oct|Nov|Dec)'     ||
                           '[[:alpha:]]*'     || --    optional extra letters (more than 3)
                           '\.?'          || --      optional period
                           '( \d{1,2})?'     || --      optional 1 or 2 digit number
                           ' ?,? ?'          || --      space-comma-space (all optional)
                         ')?'                || -- end month_day
                         '\d{4}'             -- year
                         , 1
                         , 1
                         , 'i'
                         )                 as date_part
         ,       expiry_date
         FROM    sample_table
    )
    SELECT       expiry_date
    ,       date_part
    ,       TO_NUMBER ( CASE
                              WHEN  LENGTH (date_part) = 4
                     THEN  date_part || '0000'
                     WHEN  NOT REGEXP_LIKE ( date_part
                                        , '\d+\D+\d{4}'
                                  )
                     THEN  TO_CHAR ( TO_DATE ( date_part
                                        , 'Month, yyyy'
                                    )
                             , 'YYYYMM"00"'
                             )
                     ELSE  TO_CHAR ( TO_DATE ( date_part
                                        , 'Month dd, yyyy'
                                    )
                             , 'YYYYMMDD'
                             )
                      END
                  )     AS expiry_num
    FROM       got_date_part
    ORDER BY  expiry_date
    ;
    Output:
    EXPIRY_DATE                    DATE_PART       EXPIRY_NUM
    ------------------------------ --------------- ----------
     Died 1995.                    1995              19950000
     Died Aug 27 1998.             Aug 27 1998       19980827
     Died March, 1998.             March, 1998       19980300
     Died May 22, 1989, buried in  May 22, 1989      19890522
    Died Aug. 20, 2012.            Aug. 20, 2012     20120820
    Died Dec. 3, 2003              Dec. 3, 2003      20031203
    Died July 27, 2012.            July 27, 2012     20120727
    Died July 28, 2012.            July 28, 2012     20120728
    Died June 20, 2003.            June 20, 2003     20030620
    Died June 27, 2004.            June 27, 2004     20040627
    Died Nov. 12, 2001.            Nov. 12, 2001     20011112
    Died Nov. 7, 2004.             Nov. 7, 2004      20041107
    Expiry_num is always 8 digits (assuming the year is later than 999); if the day was not given, then expiry_num ends with 00, and if the month was not given, then expiry_num ends with 0000.

    I'm still not sure what you want when there are 2 (or more) substrings that look like dates. If you need to handle cases like that, post some new sample data and teh results you want from that data.

    I stilll don't know what version of Oracle you're using. The query above will work in version 10.1 or higher. In 10.1, you may need to use '[[:digit:]]' instead of '\d'.

    Edited by: Frank Kulash on Jan 2, 2013 8:58 PM


    Consider writing a user-defined function. It will probably be slower than pure SQL, but that may not matter much. It will be a lot easier to adapt, which may matter more, since you'll probably discover more special cases you didn't know you had.
  • 12. Re: Date format
    963739 Newbie
    Currently Being Moderated
    Thanks Solomon.....

    Frank,

    Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit

    The query you provided was perfect solution for almost 90% of data....

    I have seen some new patterns like

    EXPIRY_DATE
    ------------------------------
    DIED 11/5/00          
    DIED 4 JUNE 1999     
    Died 12 Oct, 1999.
    Died: Sept. 1993.
    Died, Oct. 21, 1989.


    Thanks,
  • 13. Re: Date format
    Frank Kulash Guru
    Currently Being Moderated
    Hi.
    960736 wrote:
    ... The query you provided was perfect solution for almost 90% of data....

    I have seen some new patterns like

    EXPIRY_DATE
    ------------------------------
    DIED 11/5/00          
    The query before assumed there was always a 4-digit year. You'll have to handle dates like '11/5/00' separately. I can't spend much time on it now, but I''ll try later today.

    What's the problem with the other 4 values you posted?
    DIED 4 JUNE 1999     
    Died 12 Oct, 1999.
    Died: Sept. 1993.
    Died, Oct. 21, 1989.
  • 14. Re: Date format
    963739 Newbie
    Currently Being Moderated
    Month Sept & October(Oct) is not working...for all Rows which has Sept & Oct/October it just given the Year

    Example
    Expiry_date             Date_part   Expiry_num
    --------------- -------------- ----------------
    Died Sept. 9, 2007.     2007           2007
    DIED 4 JUNE 1999     JUNE 1999     19990600 
    Died 12 Oct, 1999.      1999            1999
    Died Oct. 26, 1992.       1992         1992
    Died October 2, 1998.     1998         1998
    Edited by: 960736 on Jan 3, 2013 8:16 AM

    Edited by: 960736 on Jan 3, 2013 8:17 AM
1 2 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points