7 Replies Latest reply: May 18, 2007 8:56 AM by JosAH

    StreamTokenizer's ugly behaviour

    JosAH
      Greetings,

      A StreamTokenizer can read numbers and 'words'. A 'word' is a 'word
      character' followed by any number of 'word characters' or characters that
      count as digits. A dot and a minus sign count as digits too, so "x-y" is a
      single 'word' while "x+y" is split into three separate tokens.
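
A minimal sketch that reproduces the behaviour described above, assuming a StreamTokenizer in its default configuration (where parseNumbers() is already in effect). The class name StreamTokenizerDemo and the helper tokens() are made up for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class StreamTokenizerDemo {

    // Collects the tokens a default-configured StreamTokenizer reports for the input.
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> result = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                    case StreamTokenizer.TT_WORD:
                        result.add(st.sval);                 // word token
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        result.add(String.valueOf(st.nval)); // numeric token
                        break;
                    default:
                        result.add(String.valueOf((char) st.ttype)); // ordinary character
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("x-y")); // [x-y]      one word
        System.out.println(tokens("x+y")); // [x, +, y]  three tokens
    }
}
```

Calling st.ordinaryChar('-') before tokenizing should stop the minus sign from being absorbed into words, at the price of no longer parsing negative numbers.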

      I find this very ugly and I'm going to check the bugbase now. I can't
      imagine that no one else has hit this ugly behaviour ...

      kind regards,

      Jos
        • 1. Re: StreamTokenizer's ugly behaviour
          798906
          If you look in the bug db, most reports against StreamTokenizer have the evaluation:

          "Due to compatibility restraints we will not further evolve this legacy class."
          • 2. Re: StreamTokenizer's ugly behaviour
            JosAH
            > If you look in the bug db most reports against
            > StreamTokenizer have the evaluation:
            >
            > "Due to compatibility restraints we will not further
            > evolve this legacy class."
            Yup, I read a lot of the relevant articles in the bugbase; Sun stopped
            fixing anything from around 1998 on. I changed my code; I use a
            Scanner now. Thank you for your reply.

            kind regards,

            Jos
            • 3. Re: StreamTokenizer's ugly behaviour
              807606
              > I changed my code, I use a Scanner now. Thank you for your reply.

              A regex or two would be better, Jos!

              Sabre (running away as fast as he can)
              • 4. Re: StreamTokenizer's ugly behaviour
                JosAH
                > > I changed my code, I use a Scanner now. Thank you for your reply.
                > A regex or two would be better, Jos!
                >
                > Sabre (running away as fast as he can)
                He said the r-word! He said the r-word! Defenestrate this rascal!
                Open up the top floor windows!

                kind regards,

                Jos ;-)
                • 5. Re: StreamTokenizer's ugly behaviour
                  JosAH
                  I've come to dislike Scanners too. I want to scan several different types
                  of tokens; think of Java tokens. The following is a nice RE for a bunch
                  of operators:
                  Pattern operators= Pattern.compile("==|!=|>=|<=");
                  When I do this:
                  if (scanner.hasNext(operators))
                     return new Token(scanner.next(operators), OPERATOR);
                  ...
                  This only works if the delimiter pattern 'eats' at least one character.
                  It fails miserably when I set the delimiter pattern to "\\s*".
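
A minimal sketch of the failure described above. A likely reason, offered as an assumption rather than a definitive diagnosis: a delimiter that can match the empty string lets the Scanner end a token after every single character, so the token it tests is "=" and never "==". The class and method names below are hypothetical.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

final class ScannerDelimiterDemo {

    static final Pattern OPERATORS = Pattern.compile("==|!=|>=|<=");

    // True if the next token, as split by the given delimiter, matches OPERATORS.
    static boolean seesOperator(String input, String delimiter) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiter);
            return scanner.hasNext(OPERATORS);
        }
    }

    // Returns the first token the Scanner produces with the given delimiter.
    static String firstToken(String input, String delimiter) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiter);
            return scanner.next();
        }
    }

    public static void main(String[] args) {
        // The delimiter must consume at least one character: "==" is one token.
        System.out.println(seesOperator("== 1", "\\s+")); // true
        // The delimiter may match nothing: tokens are cut after each character.
        System.out.println(seesOperator("==1", "\\s*"));  // false
        System.out.println(firstToken("==1", "\\s*"));    // =
    }
}
```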

                  Sabre and uncle Alice owe me an explanation. duh.

                  kind regards,

                  Jos ;-)
                  • 6. Re: StreamTokenizer's ugly behaviour
                    807606
                    Sorry Jos,

                    I don't do Scanner, so you will have to wait until uncle_alice comes online.

                    Sabre (apprentice buck passer)
                    • 7. Re: StreamTokenizer's ugly behaviour
                      JosAH
                      > Sorry Jos,
                      >
                      > I don't do Scanner, so you will have to wait until
                      > uncle_alice comes online.
                      >
                      > Sabre (apprentice buck passer)
                      Well, as long as you realize that it's all your fault you're excused ;-)

                      The question isn't that unreasonable though: I want to scan a stream
                      as if it were a bunch of, say, Java tokens; spaces aren't mandatory
                      in Java and that's the way I want it too. I have my home-brewed
                      tokenizer, but I decided to try the library classes first: a
                      StreamTokenizer (which won't do it) and, as of now, a Scanner, which
                      doesn't do it either.

                      Unless uncle Alice comes up with a couple of KB of regex, of course ;-)

                      kind regards,

                      Jos
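
For what it's worth, the "regex or two" route suggested earlier in the thread could look roughly like the sketch below. It is not anyone's actual tokenizer from this thread: the token grammar, the class name RegexTokenizerSketch and the "KIND:text" output format are all assumptions chosen for illustration. It anchors a Matcher at the current position with region() and lookingAt(), so whitespace between tokens is optional.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexTokenizerSketch {

    // Hypothetical grammar: optional whitespace, then an operator, an identifier,
    // an integer, or any other single non-space character.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(?<op>==|!=|>=|<=)|(?<id>[A-Za-z_]\\w*)|(?<num>\\d+)|(?<other>\\S))");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                break; // only trailing whitespace is left
            }
            String text;
            String kind;
            if ((text = m.group("op")) != null) {
                kind = "OPERATOR";
            } else if ((text = m.group("id")) != null) {
                kind = "IDENTIFIER";
            } else if ((text = m.group("num")) != null) {
                kind = "NUMBER";
            } else {
                text = m.group("other");
                kind = "OTHER";
            }
            tokens.add(kind + ":" + text);
            pos = m.end(); // continue right after the token just matched
        }
        return tokens;
    }

    public static void main(String[] args) {
        // [IDENTIFIER:x, OPERATOR:>=, IDENTIFIER:y, OTHER:-, NUMBER:1]
        System.out.println(tokenize("x>=y-1"));
    }
}
```

The order of the alternatives matters: the two-character operators come before the single-character fallback, otherwise ">=" would be split into ">" and "=".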