13 Replies Latest reply: Oct 4, 2006 2:28 AM by 807607

    Regex & java.util.Scanner

    807607
      I am trying to make a simple text parser using regular expressions, but I have run into a problem.
      The program's code is too long, so I have posted only the part that implements
      the method data_types(), which doesn't work properly: it reads only two types, (String) and (Boolean). Why doesn't the method read the rest of the data types in my data_type.txt file? If someone could help me I would be very grateful.

      Here is the code:
      import java.io.File;
      import java.util.Scanner;
      import java.util.regex.Matcher;
      import java.util.regex.Pattern;

      class SimpleScann {

          enum PARSE {
              TABLE_NAME("(\\w*)"), COLUMN_NAME("(\\w*\\Q(\\E)"), DATA_TYPE("(\\Q(\\E\\w*\\Q)\\E)");
              private String $pattern;
              PARSE(String pattern) {
                  $pattern = pattern;
              }
              public String PATTERN() {
                  return $pattern;
              }
          }

          static void data_types() throws Exception {
              File parse_file = new File("data_type.txt");
              Scanner scann_input = new Scanner(parse_file);
              int flag = Pattern.CASE_INSENSITIVE;
              Pattern pattern = Pattern.compile(PARSE.DATA_TYPE.PATTERN(), flag);
              Matcher matcher = null;

              // Read the file line by line and print the first "(Type)" match found on each line
              while (scann_input.hasNextLine()) {
                  matcher = pattern.matcher(scann_input.nextLine());
                  if (matcher.find()) {
                      System.out.printf("%s\n", matcher.group());
                  }
              }
              scann_input.close();
          }

          public static void main(String args[]) {
              try {
                  data_types();
              } catch (Exception e) {
                  e.printStackTrace();
              }
          }
      }
      And here is data_type.txt:
      <table          > Table works
      if the tags are closed     <>
      <column>
           Ime(String), Prezime(String), JMBG(Integer) ,
           Enabled(Boolean)
           
      <\column>




      best regards,
      Nikola
        • 1. Re: Regex & java.util.Scanner
          807607
          Try using a while loop instead of an if statement so that you find all matching sequences in a given line (not only the first one).
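A minimal sketch of that change, assuming the sample text is read from a String instead of data_type.txt so it runs standalone (the class and method names are hypothetical). The pattern string is the same as PARSE.DATA_TYPE; the only real difference from the original is `while (matcher.find())` in place of `if (matcher.find())`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhileFindDemo {
    // Same pattern string as PARSE.DATA_TYPE in the question
    static final Pattern DATA_TYPE =
        Pattern.compile("(\\Q(\\E\\w*\\Q)\\E)", Pattern.CASE_INSENSITIVE);

    // Collects every "(Type)" on every line, not only the first one per line
    static List<String> dataTypes(String text) {
        List<String> found = new ArrayList<>();
        Scanner in = new Scanner(text);
        while (in.hasNextLine()) {
            Matcher matcher = DATA_TYPE.matcher(in.nextLine());
            while (matcher.find()) {   // was: if (matcher.find())
                found.add(matcher.group());
            }
        }
        in.close();
        return found;
    }

    public static void main(String[] args) {
        String sample = "<column>\n"
            + "     Ime(String), Prezime(String), JMBG(Integer) ,\n"
            + "     Enabled(Boolean)\n";
        // prints (String), (String), (Integer), (Boolean), one per line
        dataTypes(sample).forEach(System.out::println);
    }
}
```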
          • 2. Re: Regex & java.util.Scanner
            800351
            Also,
            I hope you don't use a complex enum type like this when much simpler equivalent code could be written.
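One possible reading of "much simpler equivalent code" is a set of precompiled `static final Pattern` constants, sketched below under that assumption (the class name and helper are hypothetical). The patterns match the same text as the enum's strings; the outer capture groups are dropped because only `group()` (the whole match) is used, and each pattern is compiled once rather than on every call:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternConstants {
    // Plain constants instead of an enum with a constructor and an accessor
    static final Pattern TABLE_NAME  = Pattern.compile("\\w*");
    static final Pattern COLUMN_NAME = Pattern.compile("\\w*\\(");
    static final Pattern DATA_TYPE   = Pattern.compile("\\(\\w*\\)");

    // Returns the first "(Type)" on the line, or null if there is none
    static String firstDataType(String line) {
        Matcher m = DATA_TYPE.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDataType("Enabled(Boolean)"));   // prints (Boolean)
    }
}
```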
            • 3. Re: Regex & java.util.Scanner
              807607
              Also, why are you not using DOM to load the file? That would save a lot of effort.
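As posted, the sample file is not well-formed XML (`<\column>` is not a valid closing tag), so DOM would reject it. The sketch below assumes a hypothetical well-formed version and shows the division of labour a DOM approach gives you: the parser deals with the tags, and the regex only has to look at the text inside each `<column>` element. Class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class DomDemo {
    static final Pattern DATA_TYPE = Pattern.compile("\\(\\w*\\)");

    // DOM handles the tags; the regex only sees the text inside each <column> element
    static List<String> dataTypes(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {   // parser configuration, SAX or I/O errors
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        List<String> found = new ArrayList<>();
        NodeList columns = doc.getElementsByTagName("column");
        for (int i = 0; i < columns.getLength(); i++) {
            Matcher m = DATA_TYPE.matcher(columns.item(i).getTextContent());
            while (m.find()) {
                found.add(m.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical well-formed version of the sample file: </column> instead of <\column>
        String xml = "<table>\n"
            + "  <column>Ime(String), Prezime(String), JMBG(Integer),\n"
            + "          Enabled(Boolean)</column>\n"
            + "</table>";
        System.out.println(dataTypes(xml));   // prints [(String), (String), (Integer), (Boolean)]
    }
}
```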
              • 4. Re: Regex & java.util.Scanner
                807607
                No, I don't use such a complex enum type; it is only for learning purposes :)
                • 5. Re: Regex & java.util.Scanner
                  807607
                  DOM is appropriate for XML, but what I want is to practice using the Scanner class on XML,
                  just to learn how Java regular expressions work... This XML file is just an example; it could be a simple .txt file or something else.

                  thanks
                  • 6. Re: Regex & java.util.Scanner
                    807607
                    By any chance, is this the output you're seeing?
                    Ime(String)
                    JMBG(Boolean)
                    • 7. Re: Regex & java.util.Scanner
                      800351
                      By any chance, is this the output you're seeing?
                      Ime(String)
                      JMBG(Boolean)
                      No.
                      (String)
                      (Boolean)
                      His regex:
                      DATA_TYPE("(\\Q(\\E\\w*\\Q)\\E)")
                      should be simplified to:
                      DATA_TYPE("\\(\\w*\\)")
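A quick way to convince yourself the two patterns are equivalent is to run both over the same line and compare the matches. In the original, `\Q(\E` and `\Q)\E` quote single parentheses and the outer `( )` is a capture group that is never used; a plain backslash escape does the same job. The class and helper names below are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexEquivalence {
    // Original: \Q...\E quotes each parenthesis; the outer (...) is an unused capture group
    static final String ORIGINAL   = "(\\Q(\\E\\w*\\Q)\\E)";
    // Simplified: a backslash escape is enough to match a literal parenthesis
    static final String SIMPLIFIED = "\\(\\w*\\)";

    // Returns every match of regex in input, in order
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "Ime(String), Prezime(String), JMBG(Integer) ,";
        System.out.println(allMatches(ORIGINAL, line));     // [(String), (String), (Integer)]
        System.out.println(allMatches(SIMPLIFIED, line));   // [(String), (String), (Integer)]
    }
}
```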
                      • 8. Re: Regex & java.util.Scanner
                        807607
                        The reason you're only matching two items is that you're reading the file one line at a time and applying the regex once per line. As Tim said, you can fix that by using while instead of if, but the real problem is much deeper: you're trying to write a scanner in the sense of a lexical analyzer, and that isn't what java.util.Scanner is for. I strongly recommend you start over, this time using Pattern and Matcher directly, not Scanner. If you have a copy of Mastering Regular Expressions, 3rd edition (MRE 3ed), there's an example of what you're trying to do on page 400. (Unfortunately, Friedl has just moved back to Japan, and hasn't had time to update the book's web site, or I could point you to the code online.) I don't have time to go into this right now, but pay particular attention to the find(int) method and the \G anchor.
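This is not the MRE listing, just a small sketch of the find(int) + `\G` idea under our own assumptions (the token kinds, class and method names are hypothetical). One pattern lists every token kind as an alternative; `find(pos)` resets the matcher and searches from `pos`, and `\G` pins the match to that position, so the lexer reports an error instead of silently skipping characters it does not understand:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GLexer {
    // One alternative per token kind; \G anchors each match at the current position
    static final Pattern TOKEN = Pattern.compile(
        "\\G(?:(\\s+)|(<\\\\?/?\\w*\\s*>)|(\\w+)|(\\(\\w*\\))|(,))");
    // Hypothetical names for capture groups 1..5
    static final String[] KINDS = {"SPACE", "TAG", "WORD", "TYPE", "COMMA"};

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            // The start check is a belt-and-braces guard on top of \G
            if (!m.find(pos) || m.start() != pos) {
                throw new IllegalArgumentException("Unexpected input at index " + pos);
            }
            for (int g = 2; g <= KINDS.length; g++) {   // group 1 is whitespace: skipped
                if (m.group(g) != null) {
                    tokens.add(KINDS[g - 1] + ":" + m.group(g));
                }
            }
            pos = m.end();   // every alternative is non-empty, so pos always advances
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "<table          > Table radi\n<column>\n"
            + " Ime(String), Prezime(String), JMBG(Integer) ,\n Enabled(Boolean)\n<\\column>\n";
        tokenize(sample).forEach(System.out::println);
    }
}
```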
                        • 9. Re: Regex & java.util.Scanner
                          800351
                          Hi u_a, I used to vaguely assume Scanner.nextLine() is equivalent to BufferedReader.readLine(). Am I wrong?
                          • 10. Re: Regex & java.util.Scanner
                            807607
                            People mostly use nextLine() as a synonym for readLine(), or as a way to flush the rest of the current line, but it's more flexible than that. If methods like nextInt() or findInLine() have been called one or more times but the whole line hasn't been consumed, nextLine() returns the remaining characters of that line and positions the scanner at the beginning of the next line. Like readLine(), it doesn't include the line separator in the returned string.
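A small sketch of that behaviour, assuming a String source instead of a file (the class and method names are hypothetical). After nextInt() consumes only the number, the first nextLine() hands back the rest of that line without its separator, and the second returns the whole next line:

```java
import java.util.Scanner;

public class NextLineDemo {
    // Returns {number, rest of line 1, line 2} to show what nextLine() hands back after nextInt()
    static String[] readParts(String input) {
        Scanner in = new Scanner(input);
        int n = in.nextInt();          // consumes only the token "42"
        String rest = in.nextLine();   // remainder of line 1, without the line separator
        String next = in.nextLine();   // all of line 2
        in.close();
        return new String[] {String.valueOf(n), rest, next};
    }

    public static void main(String[] args) {
        String[] parts = readParts("42 apples\nsecond line\n");
        System.out.println(String.join("|", parts));   // prints 42| apples|second line
    }
}
```

With input "42\nsecond line\n" the first nextLine() returns an empty string, which is why people often call nextLine() just to flush the current line.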
                            • 11. Re: Regex & java.util.Scanner
                              800351
                              Friedl has just moved back to Japan, and hasn't had time to update the book's web site
                              He has a semi-private blog: http://regex.info/blog/
                              in which he writes a brief note on the third edition.
                              • 12. Re: Regex & java.util.Scanner
                                807607
                                uncle_alice, I am using Scanner and regex strictly in the sense of a lexical analyzer.
                                Yesterday I wrote a simple table-model generator.
                                Regular expressions are not powerful enough to handle recursive patterns, but I am not trying to make a compiler. My example is so simple that Scanner fits just fine.

                                Thank you for the recommended literature. Could you please point me to some online examples? I would like to explore regular expressions in depth.

                                best regards Nikola
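If you do want to stay with Scanner, one option is to let it do the searching itself rather than splitting into lines first. This sketch assumes a String source and hypothetical class and method names; `findWithinHorizon(pattern, 0)` searches with no horizon limit, returns the next match (or null), and advances past it, so matches are found regardless of where the line breaks fall:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class ScannerFindDemo {
    static final Pattern DATA_TYPE = Pattern.compile("\\(\\w*\\)");

    // Horizon 0 means "no limit": Scanner searches the remaining input, one match per call
    static List<String> dataTypes(String text) {
        List<String> found = new ArrayList<>();
        Scanner in = new Scanner(text);
        String match;
        while ((match = in.findWithinHorizon(DATA_TYPE, 0)) != null) {
            found.add(match);
        }
        in.close();
        return found;
    }

    public static void main(String[] args) {
        String sample = "<column>\n"
            + "     Ime(String), Prezime(String), JMBG(Integer) ,\n"
            + "     Enabled(Boolean)\n"
            + "<\\column>\n";
        System.out.println(dataTypes(sample));   // prints [(String), (String), (Integer), (Boolean)]
    }
}
```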
                                • 13. Re: Regex & java.util.Scanner
                                  807607
                                  Friedl's website has been updated now, and here's the code I was talking about:

                                  http://regex.info/dlisting.cgi?ed=3&id=36751

                                  That book is the best source of information there is when it comes to regexes, but there's also a good tutorial at http://www.regular-expressions.info/