17 Replies Latest reply on Oct 11, 2009 10:46 AM by 807580
      • 15. Re: XML parsing
        Simpler than someone giving you the answer in working code?
        • 16. Re: XML parsing
          Plee, thanks for your solution. But the real problem is that a few months ago W3C decided that all Java software is "abusing"; see http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic, from which I quote:

          "We are now seeing such extreme surges in traffic that our automatic and manual methods simply cannot keep up. Increases in serving capacity are readily consumed by this traffic and our site becomes overwhelmed. As such we are taking some more drastic temporary measures which we hope to be able to back down shortly. We are sorry for the impact this is causing the community. We continue experimenting with various methods including some of those suggested by posters here.

          If you are impacted file a bug report with the developers of the library or utility you use asking them to implement a [caching] catalog solution. You may also put a caching proxy in front of your application for immediate remedy to your situation, populating the cache with a user agent we are not blocking DTD access to.

          Java based applications and libraries are presently accounting for nearly 1/4th of our DTD traffic (in the hundred of millions a day). There is also another more substantial source of traffic which the vendor is working to correct in the hopefully near future."

          So Sun/Apache, when will the caching that W3C requested be implemented? Simple code like the line below fails, and the problem is between you and W3C: the Java user agent is banned by W3C.

          DocumentBuilderFactory.newInstance().newDocumentBuilder().parse ("http://www.weststats.com/Items/right_arm/");

          Exception in thread "main" java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
               at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
               at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
               at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
               at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
               at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
               at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
               at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
               at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
               at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
               at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
               at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
               at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
               at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
               at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
               at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
               at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
               at ro.mike.thewest.WestStatsItemParser.main(WestStatsItemParser.java:15)

          And no, I wasn't "abusing". I received the 503 error from the first request. So even caching wouldn't have worked, because I wouldn't be able to construct the cache from Java.
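
One way to sidestep the DTD download entirely, rather than cache it, is to give the parser an EntityResolver that answers every external entity request with an empty stream. The following is only a minimal sketch under that assumption: the class name OfflineDtdParser and its parse helper are made up for illustration. Because the DTD is never read, entities it declares (such as &nbsp;) will not be available, so a page that uses them may still fail to parse.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of any library.
final class OfflineDtdParser {

    // Parses the given XML text without fetching any external DTD or entity.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();

            // Every external entity (including the DOCTYPE's DTD) is answered
            // with empty content, so no HTTP request to www.w3.org is made.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));

            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped only to keep the sketch short; real code may prefer
            // to declare the checked exceptions instead.
            throw new IllegalStateException("XML parsing failed", e);
        }
    }
}
```

Called with the text of an XHTML page, OfflineDtdParser.parse(...) returns the DOM Document without the parser contacting the host named in the DOCTYPE.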
          • 17. Re: XML parsing
            OK. Problem fixed. How:

            Installed Squid for Windows, from http://squid.acmeconsulting.it/download/squid-2.7.STABLE7-bin.zip to C:\squid.

            Copied cachemgr.conf.default, mime.conf.default and squid.conf.default from C:\squid\etc to cachemgr.conf, mime.conf and squid.conf.

            Modified these lines in squid.conf:

            http_access allow localnet
            http_access allow localhost

            Ran these commands at a command prompt:

            c:\squid\sbin>.\squid.exe -i
            c:\squid\sbin>.\squid.exe -z
            c:\squid\sbin>net start squid

            Modified my Java application, adding these lines:

            System.setProperty("http.proxyHost", "localhost");
            System.setProperty("http.proxyPort", "3128");
            System.setProperty("http.agent", "Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.7.2) Gecko/20040803");

            You can also set the properties from the command line, with:

            java -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128 -Dhttp.agent="Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.7.2) Gecko/20040803" yourClass

            After you successfully access the DTDs from your application and the cache is populated, you can remove the fake http.agent line, or replace it with something useful.
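
Once copies of the DTD files are on disk, the proxy can in principle be dropped by resolving them locally, which is roughly what the "caching catalog" in the W3C post amounts to. A minimal sketch, assuming the files were saved by hand under their original file names into one directory; LocalDtdCatalog, parse and dtdDir are hypothetical names, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: serves DTDs from a local directory when a copy exists.
final class LocalDtdCatalog {

    static Document parse(String xml, Path dtdDir) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();

            builder.setEntityResolver((publicId, systemId) -> {
                if (systemId == null) {
                    return null; // let the parser use its default behaviour
                }
                // Assumption: local copies keep the last path segment of the URL,
                // e.g. .../DTD/xhtml1-transitional.dtd -> xhtml1-transitional.dtd
                String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
                Path local = dtdDir.resolve(fileName);
                if (fileName.isEmpty() || !Files.isRegularFile(local)) {
                    return null; // no local copy: fall back to the network
                }
                InputSource source = new InputSource(Files.newInputStream(local));
                // Keep the original system ID so that relative references
                // inside the DTD are resolved against the original URL and
                // come back through this resolver.
                source.setSystemId(systemId);
                source.setPublicId(publicId);
                return source;
            });

            return builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Wrapped only to keep the sketch short.
            throw new IllegalStateException("XML parsing failed", e);
        }
    }
}
```

For the XHTML page above, dtdDir would need xhtml1-transitional.dtd plus the three entity files that DTD includes (xhtml-lat1.ent, xhtml-symbol.ent and xhtml-special.ent).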