Thread safety with newDocumentBuilder()

3004
3004 Member Posts: 204,171
edited April 2010 in Java Technology & XML
Hello,

I have the following class 'ParserService'. Just one instance of this class exists, and it's created in the servlet init method calling "new ParserService()".

After this, all incoming requests (threads executing the service method) call the 'parse' method:
public class ParserService
{
 private DocumentBuilderFactory dbfDom;
 
 // Constructor.
 public ParserService()
 {
  this.dbfDom = DocumentBuilderFactory.newInstance();
  this.dbfDom.setNamespaceAware(true);
  this.dbfDom.setCoalescing(true);
 }

 // ---------------- DOM PARSER ------------------------------------------------------------------------------
 public Document parse(String strDocument) throws ProxyServletException
 {
  try
  {
   return (this.dbfDom.newDocumentBuilder().parse(new InputSource(new StringReader(strDocument))));
  }
  catch (ProxyServletException pse) { throw pse; }
  catch (Exception e)   	        { throw new ProxyServletException(GenericConstants.PARSER_PARSE, e); }
 }
}
The question is whether 'newDocumentBuilder()' is thread-safe. Can I call 'newDocumentBuilder()' from different threads on the single shared factory 'this.dbfDom'?

Thanks in advance,

Joan.

Comments

  • EJP
    EJP Member Posts: 32,919
    The documentation doesn't say that it is. I usually do this:
    DocumentBuilder builder;
    synchronized (builderFactory)
    {
      builder = builderFactory.newDocumentBuilder();
    }
    builder.parse(...)
  • 3004
    3004 Member Posts: 204,171
    Hello ejp,

    Thanks for your response. I still have some questions on this subject.
    My app has a very high load (I can have 500-1000 simultaneous requests, and every request has an xml document that has to be parsed). That's why I must be very careful with synchronisation.

    To fix the way I'm parsing documents, I've thought of the following solutions:

    1. The best choice would be to have only one instance of DocumentBuilderFactory and create DocumentBuilder instances without synchronized regions. This is the code I posted above. My doubt is whether I can call newDocumentBuilder without synchronisation. I've written a test that starts 100 threads, each parsing 1000 documents with the code above. It seems to work OK (no exceptions) with the default parser that ships with Java 6 (Xerces, I think).

    2. Do the same but with the synchronised region (as you usually do) to ensure thread safety. But I'd like to find out whether this is really necessary with Xerces.

    3. Create a pool of DocumentBuilder objects. I take a builder from the pool (creating a new one if the pool is empty), parse the document, and return the builder to the pool. But everybody seems to advise against object pooling, because pooled objects live longer than necessary.

    4. Use ThreadLocal and keep one DocumentBuilder instance per thread. I've never used ThreadLocal, and I don't know whether it can help me.

    Could you give me your opinion?

    Thanks in advance,

    Joan.
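For reference, option 4 (one DocumentBuilder per thread) might look roughly like the sketch below. The class name `ThreadLocalParser` and the choice to wrap parse failures in `IllegalArgumentException` are assumptions made for illustration, and `ThreadLocal.withInitial` needs Java 8 or later, which is newer than the Java 6 runtime mentioned in the thread. It also assumes the JAXP implementation supports `DocumentBuilder.reset()`, as the JDK's built-in one does.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: each thread lazily creates and then reuses its own builder.
final class ThreadLocalParser {
    private static final ThreadLocal<DocumentBuilder> BUILDER = ThreadLocal.withInitial(() -> {
        try {
            // The factory is created and used by one thread only, so no locking is needed.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setCoalescing(true);
            return factory.newDocumentBuilder();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("cannot create DocumentBuilder", e);
        }
    });

    private ThreadLocalParser() { }

    static Document parse(String xml) {
        DocumentBuilder builder = BUILDER.get();
        builder.reset(); // restore the builder to its freshly created state before reuse
        try {
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }
}
```

In a servlet container the builders live as long as the container's worker threads, so this trades memory for speed in much the same way a pool does, and thread-local values can outlive a redeployed web application.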
  • 800387
    800387 Member Posts: 5,078
        /**
         * Creates a new instance of a {@link javax.xml.parsers.DocumentBuilder}
         * using the currently configured parameters.
         */
        public DocumentBuilder newDocumentBuilder()
            throws ParserConfigurationException 
        {
            /** Check that if a Schema has been specified that neither of the schema properties have been set. */
            if (grammar != null && attributes != null) {
                if (attributes.containsKey(JAXPConstants.JAXP_SCHEMA_LANGUAGE)) {
                    throw new ParserConfigurationException(
                            SAXMessageFormatter.formatMessage(null, 
                            "schema-already-specified", new Object[] {JAXPConstants.JAXP_SCHEMA_LANGUAGE}));
                }
                else if (attributes.containsKey(JAXPConstants.JAXP_SCHEMA_SOURCE)) {
                    throw new ParserConfigurationException(
                            SAXMessageFormatter.formatMessage(null, 
                            "schema-already-specified", new Object[] {JAXPConstants.JAXP_SCHEMA_SOURCE}));                
                }
            }
            
            try {
                return new DocumentBuilderImpl(this, attributes, features, fSecureProcess);
            } catch (SAXException se) {
                // Handles both SAXNotSupportedException, SAXNotRecognizedException
                throw new ParserConfigurationException(se.getMessage());
            }
        }
    It appears to be thread-safe. The 'attributes' and 'features' instance variables are Hashtables, which are synchronized. Unless you are modifying the DocumentBuilderFactory itself in another thread (say, changing whether it is namespace aware), I do not see any issues with a simple call to newDocumentBuilder().

    - Saish
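A concurrent smoke test of the kind described earlier in the thread (many threads calling newDocumentBuilder() on one shared factory) could be sketched as follows. The class name `SharedFactorySmokeTest`, the method `run`, and the sample document are assumptions for illustration. A clean run only shows that no failure was observed with the parser in use; it is evidence, not a guarantee of thread safety.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical test harness: hammers one shared factory from several threads.
final class SharedFactorySmokeTest {
    private SharedFactorySmokeTest() { }

    /** Returns the number of documents parsed; throws if any worker fails. */
    static int run(int threads, int docsPerThread) {
        final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setCoalescing(true);
        final String xml = "<root><item id=\"1\">text</item></root>";

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> {
                    int parsed = 0;
                    for (int i = 0; i < docsPerThread; i++) {
                        // Unsynchronized call on the shared factory: the case under test.
                        factory.newDocumentBuilder()
                               .parse(new InputSource(new StringReader(xml)));
                        parsed++;
                    }
                    return parsed;
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // rethrows any worker failure as ExecutionException
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("concurrent parse failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `SharedFactorySmokeTest.run(100, 1000)` would correspond to the 100 threads and 1000 documents per thread mentioned in the thread.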
  • 3004
    3004 Member Posts: 204,171
    Thanks Saish,

    I've developed a test with 100 threads parsing 1000 documents of about 1k with 2 different scenarios:

    1. Creating a new DocumentBuilder for every request, using one instance of DocumentBuilderFactory (the 'parse' method of the previous class).
    2. Creating a ConcurrentLinkedQueue as a pool of DocumentBuilder objects.

    The pool version is twice as fast as version 1.
    Every thread in version 1 takes about 20s; every thread in version 2 takes about 10s (on a dual-core Windows server with 4 GB).

    The pool version is much faster, but developers generally recommend against using pools.

    Should I analyse anything else to decide which version to use?

    Thanks,

    Joan.
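The pooled variant (scenario 2) is not shown in the thread. A minimal sketch of what a ConcurrentLinkedQueue-based pool might look like is given below; the class name `BuilderPool`, the `idleCount` helper, and the exception wrapping are assumptions for illustration, not the poster's actual code. The pool is unbounded: it grows to the peak number of concurrent parses and never shrinks.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.concurrent.ConcurrentLinkedQueue;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical pool: idle builders wait in a lock-free queue between parses.
final class BuilderPool {
    private final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    private final ConcurrentLinkedQueue<DocumentBuilder> idle = new ConcurrentLinkedQueue<>();

    BuilderPool() {
        factory.setNamespaceAware(true);
        factory.setCoalescing(true);
    }

    Document parse(String xml) {
        DocumentBuilder builder = idle.poll(); // null when no idle builder is available
        try {
            if (builder == null) {
                // Creation is rare once the pool is warm, so the lock costs little.
                synchronized (factory) {
                    builder = factory.newDocumentBuilder();
                }
            }
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse document", e);
        } finally {
            if (builder != null) {
                builder.reset();     // assumes the implementation supports reset(), as the JDK's does
                idle.offer(builder); // hand the builder back for the next caller
            }
        }
    }

    int idleCount() {
        return idle.size();
    }
}
```

Because each builder is held by exactly one thread between `poll` and `offer`, the builders themselves are never shared concurrently.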
  • 800387
    800387 Member Posts: 5,078
    I'm not sure why you are using that many threads. Take a read here

    - Saish
  • EJP
    EJP Member Posts: 32,919
    "I've developed a test with 100 threads parsing 1000 documents of about 1k with 2 different scenarios:"
    I would try varying those numbers for a start. Try it with 10 threads. You might get a pleasant surprise, which would relieve your concerns considerably.
  • 3004
    3004 Member Posts: 204,171
    Hello,

    I've developed the following test case: N threads, each parsing a document of about 1k 500 times, using the pool version and the instance version. These are the results:

    nr. of threads   pool version (s/thread)   instance version (s/th)
    5                1                         1.7
    10               1.3                       2.2
    25               2.5                       4
    50               4.6                       8.3
    75               7                         13.5
    100              10                        20
    250              26                        60
    500              55                        130


    The pool version always wins, and it seems to scale a bit better with more threads.

    Looking at these results, the only reason to avoid the pool version would be some problem with the garbage collector.

    What do you think?

    Thanks,
    Joan.
  • EJP
    EJP Member Posts: 32,919
    what are s/thread and s/th?
  • 3004
    3004 Member Posts: 204,171
    Sorry,

    It's seconds per thread: the time in seconds that each thread takes to finish parsing its 500 documents.

    Joan.
  • EJP
    EJP Member Posts: 32,919
    But that doesn't mean anything, does it? The more threads, the fewer documents each thread has to process, so of course each thread will finish quicker. The only interesting number is the total elapsed time from start to finish.
  • 3004
    3004 Member Posts: 204,171
    No, each thread ALWAYS processes 500 documents.
  • EJP
    EJP Member Posts: 32,919
    Well, I would suggest there's something wrong somewhere. You can't tell me that a thread will parse 500 documents more quickly if there are more threads doing the same thing. It's counter-intuitive and therefore suspect. You do need to posit a rational explanation for the observations at some point. I'd be quite prepared to find that there was some N, related probably to the number of CPUs in the processor, for which the total elapsed time was optimal, but this doesn't make any sense at all. Does it?
  • 3004
    3004 Member Posts: 204,171
    Hello ejp,

    I'm not telling you that. In fact, the results show just the opposite: as you increase the number of simultaneous threads, each thread takes longer to finish.

    For example, with the pool version, 5 threads EACH parsing 500 documents take 1 second EACH. But 500 threads EACH parsing 500 documents take 55 seconds EACH. More threads, more load on the CPUs, slower times.

    Am I right?

    Joan.
  • EJP
    EJP Member Posts: 32,919
    Well exactly. So there is some number N for which total throughput is optimal. What I am suggesting is that this might be closer to 10 than to 1000. It would be interesting to establish that, and therefore to establish exactly how much concurrency you really need, and to what extent an efficient solution to the concurrency issues is really important.
  • 3004
    3004 Member Posts: 204,171
    Ok,

    If I understand you correctly, and taking the values for 10 threads as an example, you're saying that the gain of about 1 second per thread that the pool version has over the instance version may not be enough to justify the pool strategy, because it could bring new trouble at the concurrency or garbage collector level.

    Obviously I should analyse these two situations at JVM level and decide which is better.

    Thanks for your time,

    Joan.
  • EJP
    EJP Member Posts: 32,919
    The measurement of primary interest is surely total documents parsed per second. You may find that anything from N=1 to N=1000 optimizes that. If N=1 you don't have to worry about threads or thread pools at all. If N is large, use a thread pool, but you'll still have to decide how many initial threads you need, etc, so you will still need a good idea of what N actually is, or at least its order of magnitude. At present you're assuming it's large, which it almost certainly isn't.
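Measuring total documents parsed per second for a given N could be sketched as below. The class name `ThroughputProbe`, the method `docsPerSecond`, and the sample document are assumptions for illustration. A fixed amount of work is split across N workers and the wall-clock time from start to finish is measured, so runs with different N are directly comparable. There is no JIT warm-up or repetition here, so single readings are only rough.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical probe: total throughput for a fixed workload split over N threads.
final class ThroughputProbe {
    private static final String XML = "<root><item id=\"1\">text</item></root>";

    private ThroughputProbe() { }

    static double docsPerSecond(int threads, int totalDocs) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                // Spread totalDocs as evenly as possible over the workers.
                final int share = totalDocs / threads + (t < totalDocs % threads ? 1 : 0);
                tasks.add(() -> {
                    // Each worker owns its factory and builder, so nothing is shared.
                    DocumentBuilder builder =
                            DocumentBuilderFactory.newInstance().newDocumentBuilder();
                    for (int i = 0; i < share; i++) {
                        builder.parse(new InputSource(new StringReader(XML)));
                    }
                    return share;
                });
            }
            long start = System.nanoTime();
            int parsed = 0;
            for (Future<Integer> done : pool.invokeAll(tasks)) { // blocks until all finish
                parsed += done.get();
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            return parsed / seconds;
        } catch (Exception e) {
            throw new IllegalStateException("benchmark run failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Running `docsPerSecond(n, 50000)` for n = 1, 2, 4, 8, 16 and so on, and looking for where the figure stops improving, gives an estimate of the order of magnitude of N.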