3 Replies Latest reply: Sep 21, 2009 5:47 PM by 807580

    SimpleDateFormat unexpected behaviour

    807580
      Hi everybody,

      I'm facing an issue that comes mainly from my not understanding the inner mechanics of date parsing in Java. I would like to parse a date from an input string with a specific pattern, but some inputs match this pattern when I expect them not to. Let me explain with an example:
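      import java.text.ParseException;
      import java.text.SimpleDateFormat;
      
      public class DateIssue {
      
           public static void main(String[] args) {
                SimpleDateFormat sdf = new SimpleDateFormat("dd/MM/yyyy");
                sdf.setLenient(false); // strict mode, yet the 5-digit year below is still accepted
                
                try {
                     // I expected a ParseException here, but this prints a date in the year 20001
                     System.out.println(sdf.parse("10/02/20001"));
                } catch (ParseException pe) {
                     pe.printStackTrace();
                }
           }
           
      }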
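To see what the parser actually did with "10/02/20001", the same non-lenient formatter can be driven through DateFormat.parse(String, ParsePosition), which reports where parsing stopped instead of throwing. The sketch below (the class name DateIssueInspection and its describe method are hypothetical, for illustration only) should show that no error index is set, all 11 characters are consumed, and the year is taken literally as 20001, so from the parser's point of view this is a complete, valid match.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

public class DateIssueInspection {

    /** Parses non-leniently with dd/MM/yyyy and reports what the parser did (hypothetical helper). */
    public static String describe(String input) {
        SimpleDateFormat sdf = new SimpleDateFormat("dd/MM/yyyy");
        sdf.setLenient(false);

        ParsePosition pos = new ParsePosition(0);
        Date date = sdf.parse(input, pos); // returns null on failure instead of throwing
        if (date == null) {
            return "failed at index " + pos.getErrorIndex();
        }

        // Read the year back using a copy of the formatter's own calendar and time zone.
        Calendar cal = (Calendar) sdf.getCalendar().clone();
        cal.setTime(date);

        // On success errorIndex stays -1 and index is the position where parsing stopped.
        return "errorIndex=" + pos.getErrorIndex()
                + " consumed=" + pos.getIndex() + "/" + input.length()
                + " year=" + cal.get(Calendar.YEAR);
    }

    public static void main(String[] args) {
        System.out.println(describe("10/02/20001"));
    }
}
```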
      In this example I expect a four-digit year, but when I pass "20001" as the year, SimpleDateFormat parses it without complaint and no exception is thrown. I checked the API documentation, where I found:

      Year:  If the formatter's Calendar is the Gregorian calendar, the following rules are applied.

      * For formatting, if the number of pattern letters is 2, the year is truncated to 2 digits; otherwise it is interpreted as a number.
      * For parsing, if the number of pattern letters is more than 2, the year is interpreted literally, regardless of the number of digits. So using the pattern "MM/dd/yyyy", "01/11/12" parses to Jan 11, 12 A.D.
      * For parsing with the abbreviated year pattern ("y" or "yy"), SimpleDateFormat must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the SimpleDateFormat instance is created. For example, using a pattern of "MM/dd/yy" and a SimpleDateFormat instance created on Jan 1, 1997, the string "01/11/12" would be interpreted as Jan 11, 2012 while the string "05/04/64" would be interpreted as May 4, 1964. During parsing, only strings consisting of exactly two digits, as defined by Character.isDigit(char), will be parsed into the default century. Any other numeric string, such as a one digit string, a three or more digit string, or a two digit string that isn't all digits (for example, "-1"), is interpreted literally. So "01/02/3" or "01/02/003" are parsed, using the same pattern, as Jan 2, 3 AD. Likewise, "01/02/-3" is parsed as Jan 2, 4 BC.

      Otherwise, calendar system specific forms are applied. For both formatting and parsing, if the number of pattern letters is 4 or more, a calendar specific long form is used. Otherwise, a calendar specific short or abbreviated form is used.
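The year rules quoted above can be checked directly. This is a sketch under a few assumptions: the class and method names are hypothetical, Locale.US is passed so that the formatter's calendar is Gregorian (the quoted rules only apply to that calendar), and set2DigitYearStart is used to pin the two-digit window to Jan 1, 1917, as if the formatter had been created on Jan 1, 1997 like the Javadoc example. It should reproduce the documented results: "yyyy" takes "12" literally, "yy" maps exactly two digits into the window, and three digits under "yy" are literal again.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class YearRules {

    /**
     * Parses text with the given pattern (Locale.US, so the calendar is Gregorian) and
     * returns the year. If centuryStart is non-null it becomes the start of the
     * 100-year window used for two-digit "yy" years.
     */
    public static int parsedYear(String pattern, String text, Date centuryStart) throws ParseException {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.US);
        if (centuryStart != null) {
            sdf.set2DigitYearStart(centuryStart);
        }
        Calendar cal = (Calendar) sdf.getCalendar().clone();
        cal.setTime(sdf.parse(text));
        return cal.get(Calendar.YEAR);
    }

    public static void main(String[] args) throws ParseException {
        // Window as if the formatter were created on Jan 1, 1997: starts 80 years earlier.
        Date start1917 = new GregorianCalendar(1917, Calendar.JANUARY, 1).getTime();

        // "yyyy": digits are taken literally.
        System.out.println(parsedYear("MM/dd/yyyy", "01/11/12", null));     // 12
        // "yy" with exactly two digits: placed in the window 1917..2017.
        System.out.println(parsedYear("MM/dd/yy", "01/11/12", start1917));  // 2012
        System.out.println(parsedYear("MM/dd/yy", "05/04/64", start1917));  // 1964
        // "yy" with three digits: interpreted literally.
        System.out.println(parsedYear("MM/dd/yy", "01/02/003", start1917)); // 3
    }
}
```

Note that the digit count only matters for the two-digit century adjustment; nothing in these rules rejects extra digits under "yyyy".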


      I suspect the following excerpt:

      For parsing, if the number of pattern letters is more than 2, the year is interpreted literally,

      to be responsible for this behaviour, but I thought that non-lenient parsing (setLenient(false)) would prevent such a "mistake" and throw an exception because of the wrong number of digits. What is the best way to check that the year is exactly four digits long? Do I have to use String.matches() or a Pattern object just for this?
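As far as I can tell, setLenient(false) only controls whether out-of-range field values (such as 31 February) are rejected; it does not cap how many digits a trailing "yyyy" field may consume. So a shape check before parsing is a reasonable answer, and a precompiled Pattern is one way to do it. The sketch below combines the two: the regex enforces dd/MM/yyyy with exactly four year digits, and the non-lenient formatter still rejects impossible dates. The names StrictDateParser, parseStrict and isValid are hypothetical, chosen for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Pattern;

public class StrictDateParser {

    // Shape check: two digits, slash, two digits, slash, exactly four digits.
    // In java.util.regex, \d matches only ASCII 0-9 unless UNICODE_CHARACTER_CLASS is set.
    private static final Pattern SHAPE = Pattern.compile("\\d{2}/\\d{2}/\\d{4}");

    /** Returns the parsed date, or throws ParseException if text is not a valid dd/MM/yyyy date. */
    public static Date parseStrict(String text) throws ParseException {
        if (!SHAPE.matcher(text).matches()) {
            throw new ParseException("Expected dd/MM/yyyy: " + text, 0);
        }
        // SimpleDateFormat is not thread-safe, so a fresh instance is created per call.
        SimpleDateFormat sdf = new SimpleDateFormat("dd/MM/yyyy");
        sdf.setLenient(false); // still needed: rejects impossible dates such as 31/02/2009
        return sdf.parse(text);
    }

    /** Convenience wrapper: true if parseStrict accepts the text. */
    public static boolean isValid(String text) {
        try {
            parseStrict(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] inputs = {"10/02/2000", "10/02/20001", "31/02/2009", "1/2/2000"};
        for (String s : inputs) {
            System.out.println(s + " -> " + (isValid(s) ? "valid" : "rejected"));
        }
    }
}
```

The regex also rejects unpadded input such as "1/2/2000", which a non-lenient "dd/MM/yyyy" formatter on its own would accept; if single-digit days or months should be allowed, the pattern would need loosening accordingly.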

      Thanks a lot

      Edited by: calvino_ind on Aug 30, 2009 1:51 PM