11 Replies Latest reply: Mar 9, 2011 11:07 AM by 843441

URLConnection performance problem

843441 Newbie
Ok, sorry for the provocative title, but I've been working on this for weeks and it seems the Java 6 API has handcuffed us for this particular application. I know I'm probably missing something, so I would appreciate advice and/or discussion. Here is the story:

I'm using a Socket to communicate from a standalone Java app to a Tomcat servlet container, mainly to upload files along with some meta information about the file, and receive a response object back.

I would like to use URLConnection because it nicely separates the header info from the input/output streams used to upload file bytes and also used to receive a Java Object as a response.

However URLConnection is very slow for unknown reasons. I have tweaked settings, but it is still very slow.

So instead we are forced to use Socket communications, based on an older version of the JUpload source. This uploads files 10 to 20 times faster than the URLConnection method; I wish I knew why.

However, it requires me to parse the HTTP Response that comes back in the Socket InputStream.

And that is the problem, as detailed below:

The HTTP standards indicate that the header information is parsed by reading "lines". If you read enough documents, it seems that you probably can assume that the header is always "US ASCII" character set, but it is difficult to find this stated simply anywhere.

It also seems that a "blank line" terminates the header, and the next byte starts the "entity content", which in my case is a serialized Java object.

So ideally I would like to read lines using a "line reader" (from which I can extract cookies, header info, etc.) then read in my Java Object "response object" using an ObjectInputStream.

But a Java Reader object states that it may "read ahead" an arbitrary amount. So I cannot wrap the Socket InputStream in a Reader, read some lines, then wrap the Socket InputStream in an ObjectInputStream and read in my object, because the Reader may have read ahead and be holding some bytes from the "response object" data. Similarly I can't wrap the Socket InputStream in an ObjectInputStream and wrap the ObjectInputStream in a Reader for the same reason.

The ObjectInputStream actually has the crucial "readLine()" method, but it is deprecated. However, I feel for this application, we really need it. I have not been able to find a good substitute.

For now, I'm just reading bytes and making strong assumptions about US ASCII line ends, etc. to parse through the header, but would much prefer to use Java objects that encapsulate that.
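
A minimal sketch of that byte-at-a-time approach, assuming the header is US-ASCII and that lines end in CRLF (a bare LF is also accepted). The names HeaderLineReader and readLine are made up for illustration. Because it never consumes anything past the line terminator, the same stream can be handed to an ObjectInputStream once the blank line has been read:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

final class HeaderLineReader {
    private static final Charset US_ASCII = Charset.forName("US-ASCII");

    private HeaderLineReader() {}

    /**
     * Reads one line a byte at a time, so no byte after the line terminator
     * is consumed. Accepts CRLF or a bare LF as the terminator.
     * Returns null if the stream is already at end of stream; if the stream
     * ends in the middle of a line, the partial line is returned.
     */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAny = true;
            if (b == '\n') {
                break;              // end of line, LF itself is not stored
            }
            line.write(b);
        }
        if (!sawAny) {
            return null;
        }
        byte[] bytes = line.toByteArray();
        int len = bytes.length;
        if (len > 0 && bytes[len - 1] == '\r') {
            len--;                  // drop the CR of a CRLF pair
        }
        return new String(bytes, 0, len, US_ASCII);
    }
}
```

Wrapping the socket's stream once in a BufferedInputStream and passing that same instance to both readLine and the ObjectInputStream should keep the single-byte reads cheap, since any read-ahead then stays inside the one shared buffer.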

I'm interested in how to best do this. Let me know if more details are needed.

Regards,

-SB
  • 1. Re: Java can't do this
    796440 Guru
    tl;dr

    You'll probably get better responses if you provide an SSCCE (http://sscce.org) instead of just describing your code.
  • 2. Re: Java can't do this
    796440 Guru
    840438 wrote:
    Ok, sorry for the provocative title,
    And yeah, if you want to be taken seriously, get over the whole "shock value" / "now that I have your attention" nonsense and just stick to the technical issue at hand.
  • 3. Problems with URLConnection
    tschodt Pro
    840438 wrote:
    I would like to use URLConnection because it nicely separates the header info from the input/output streams used to upload file bytes and also used to receive a Java Object as a response.

    However URLConnection is very slow for unknown reasons. I have tweaked settings, but it is still very slow.
    Typically network name lookup issues.
    So instead we are forced to use Socket communications, based on an older version of the JUpload source. This uploads files 10 to 20 times faster than the URLConnection method; I wish I knew why.

    However, it requires me to parse the HTTP Response that comes back in the Socket InputStream.

    And that is the problem, as detailed below:

    The HTTP standards indicate that the header information is parsed by reading "lines". If you read enough documents, it seems that you probably can assume that the header is always "US ASCII" character set, but it is difficult to find this stated simply anywhere.

    It also seems that a "blank line" terminates the header, and the next byte starts the "entity content", which in my case is a serialized Java object.

    So ideally I would like to read lines using a "line reader" (from which I can extract cookies, header info, etc.) then read in my Java Object "response object" using an ObjectInputStream.

    But a Java Reader object states that it may "read ahead" an arbitrary amount. So I cannot wrap the Socket InputStream in a Reader, read some lines, then wrap the Socket InputStream in an ObjectInputStream and read in my object, because the Reader may have read ahead and be holding some bytes from the "response object" data. Similarly I can't wrap the Socket InputStream in an ObjectInputStream and wrap the ObjectInputStream in a Reader for the same reason.

    The ObjectInputStream actually has the crucial "readLine()" method, but it is deprecated. However, I feel for this application, we really need it. I have not been able to find a good substitute.

    For now, I'm just reading bytes and making strong assumptions about US ASCII line ends, etc. to parse through the header, but would much prefer to use Java objects that encapsulate that.

    I'm interested in how to best do this. Let me know if more details are needed.
    BufferedInputStream(InputStream, int) (http://download.oracle.com/javase/6/docs/api/java/io/BufferedInputStream.html#BufferedInputStream%28java.io.InputStream,%20int%29) with a sufficiently large buffer ought to do.
    Start by calling mark().
    Wrap it in a Reader and read (the header lines) until you get an empty line. (*)
    Discard the Reader but remember how many bytes were read. Remember that lines are CRLF terminated. (*)
    Call reset() on the BufferedInputStream, read all the bytes of the header again (*)
    and now you can read the payload.

    (*) Another way to locate the start of the payload is to find the first CR LF CR LF sequence.
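
    The steps above can be sketched roughly as follows, using the CR LF CR LF scan from the footnote instead of a Reader to find where the header ends. HeaderSplitter, readHeader and the 16 KB header limit are assumptions for illustration, not part of any library:

```java
import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.Charset;

final class HeaderSplitter {
    /** Assumed upper bound on the header size; also used as the mark limit. */
    static final int MAX_HEADER_BYTES = 16 * 1024;

    private HeaderSplitter() {}

    /**
     * Returns the header text without the terminating blank line and leaves
     * the stream positioned at the first byte of the payload.
     */
    static String readHeader(BufferedInputStream in) throws IOException {
        in.mark(MAX_HEADER_BYTES);
        final byte[] terminator = {'\r', '\n', '\r', '\n'};
        int matched = 0;   // how many terminator bytes have matched so far
        int count = 0;     // bytes consumed since mark()
        int b;
        while (matched < 4 && count < MAX_HEADER_BYTES && (b = in.read()) != -1) {
            count++;
            if (b == terminator[matched]) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0;   // a CR may start a new match
            }
        }
        if (matched < 4) {
            throw new IOException("No CR LF CR LF found in the first " + count + " bytes");
        }
        // Go back to the mark and read the header bytes again, now that
        // their exact length is known.
        in.reset();
        byte[] header = new byte[count];
        int off = 0;
        while (off < count) {
            int n = in.read(header, off, count - off);
            if (n < 0) {
                throw new EOFException("Stream ended while re-reading the header");
            }
            off += n;
        }
        return new String(header, 0, count - 4, Charset.forName("US-ASCII"));
    }
}
```

    After readHeader returns, the same BufferedInputStream can be wrapped in an ObjectInputStream to read the payload.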
  • 4. Re: Java can't do this
    EJP Guru
    The more interesting question is why you are finding URLConnection slow. It can't be DNS as Sockets engage in that too, and in fact obviously everything you are coding with Sockets is already done inside HttpURLConnection. Post some code.

    NB I corrected your title to something meaningful.
  • 5. Re: Problems with URLConnection
    843441 Newbie
    Thanks, that is the same solution I adopted for now. I didn't know how to count the bytes read (after wrapping with a Reader without making strong character set assumptions), so after resetting the stream I scan forward reading bytes through the first 0x0D 0x0A 0x0D 0x0A (CR LF CR LF, blank line), which is also a strong assumption I suppose. Technically I should probably allow for CRCR and LFLF sequences, although Tomcat appears to use CRLF for EOL.

    Edited by: sb4 on Mar 8, 2011 5:44 PM
  • 6. Re: Problems with URLConnection
    EJP Guru
    Newlines in HTTP are \r\n only.
  • 7. Re: Problems with URLConnection
    843441 Newbie
    EJP wrote:
    Newlines in HTTP are \r\n only.
    Yes, I'm sure CR LF really is safe to assume. For the most general solution, I was thinking of the HTTP/1.1 spec (RFC 2616), section 19.3, Tolerant Applications:

    "The line terminator for message-header fields is the sequence CRLF. However, we recommend that applications, when parsing such headers, recognize a single LF as a line terminator and ignore the leading CR."
  • 8. Re: URLConnection speed problems
    843441 Newbie
    EJP wrote:
    The more interesting question is why you are finding URLConnection slow. It can't be DNS as Sockets engage in that too, and in fact obviously everything you are coding with Sockets is already done inside HttpURLConnection. Post some code.

    NB I corrected your title to something meaningful.
    Yes, I really want to use URLConnection. I ran a lot of tests and tried many variations: buffer sizes, chunked, not-chunked (had memory problems with not-chunked on big files), etc. I scanned the web and found a few other posts with similar problems. Under time pressure, we went to a direct Socket and formatting our own HTTP multi-part request (which works but is not easy for us to maintain because we're not HTTP experts).

    I will continue with experiments and tests if you think it should perform as fast as anything else.

    This is not self-contained executable code, but an example of the URLConnection upload technique we use, which we still keep active for troubleshooting because it is reliable though slow.
      public Object uploadClientFile(File afile, String strEncryptionKey, Map mapUrlParams,
        CSecurityFilter securityFilter, I.ISyncStatus syncStatus) throws Exception
      {
        String strThis = "[NetConnect.uploadClientFile] ";
    
        OutputStream outputToServlet = null;
        FileInputStream fin = null;
        ObjectInputStream inputFromServlet = null;
        String strEncoding = "UTF-8"; // to avoid deprecation. -RAN 3/2/2009
    
        String strFileName = afile.getName();
        Object objRet = null;
        long LbytesUploaded = 0L;
        long LbytesToUpload = 0L;
        long LbytesRead = 0L;
        boolean bIsEncrypt = (strEncryptionKey != null && strEncryptionKey.trim().length() > 0);
        String strResponseDataTag = null;
        byte[] arbytes = null;
        HttpURLConnection servletConnection = null;
    
        try
        {
          String servletGET = httpProtocolTag + getHost() + "/Image2000/sendData2Server?";
          StringBuffer sbuf = new StringBuffer();
          sbuf.append(this.strLoginParam + "=" + strLogin + "&" + this.strPwParam + "=" + strPw);
    
          NetConnect.addUrlParamsFromMap(sbuf, mapUrlParams, strEncoding);
    
          URL url = new URL(servletGET + sbuf.toString());
    
          servletConnection = (HttpURLConnection) url.openConnection();
          servletConnection.setRequestMethod("GET");
    
          servletConnection.setDoInput(true);
          servletConnection.setUseCaches(false);
          servletConnection.setDefaultUseCaches(false);
          servletConnection.setRequestProperty("Content-Type", "application/octet-stream");
    
          if (this.propsCookies != null && this.propsCookies.size() > 0)
            servletConnection.setRequestProperty("Cookie", CUtil.makeCookie(propsCookies));
    
          if (afile.isFile())
          {
            int iChunkBytes = 1024 * 8;
    
            // create a buffer, optimal is 16384 ???
            byte[] buffer = new byte[iChunkBytes];
            byte[] bufferout = null; // new byte[iChunkBytes * 2]; // It's probably fine to use same size as buffer.  In fact, docs say you can use buffer as both in and out buffers. -RAN 2/22/10
    
            byte[] salt = null;
            Cipher ciper = null;
            LbytesToUpload += afile.length();
    
            if (bIsEncrypt)
            {
              salt = CEncryption.initSalt();
              LbytesToUpload += salt.length;
              ciper = CEncryption.initEncryption(salt, strEncryptionKey);
              bufferout = new byte[iChunkBytes]; // It's probably fine to use same size as buffer.  In fact, docs say you can use buffer as both in and out buffers. -RAN 2/22/10
            }
    
            servletConnection.setChunkedStreamingMode(iChunkBytes); // To fix out of memory errors.
            servletConnection.setDoOutput(true);
            outputToServlet = servletConnection.getOutputStream();
    
            fin = new FileInputStream(afile);
    
            int iRead = 0;
    
            if (bIsEncrypt)
            {
              LbytesUploaded += salt.length;
              outputToServlet.write(salt, 0, salt.length);
            }
    
            long Lreads = 0;
            long LfileChunks = afile.length() / iChunkBytes; // for logging only.
            if (afile.length() % iChunkBytes != 0)
              LfileChunks++; // for logging only.
    
            while (true)
            {
              if (syncStatus.isCanceled())
                throw new Exception("file upload canceled by user");
              iRead = fin.read(buffer);
              if (iRead <= 0)
                break;
    
              Lreads++;
              LbytesRead += iRead;
              // write to output
              if (bIsEncrypt)
              {
                int iEncrypted = CEncryption.encryptData(buffer, iRead, ciper, bufferout);
                LbytesUploaded += iEncrypted;
                outputToServlet.write(bufferout, 0, iEncrypted);
    
              } else
              {
                LbytesUploaded += iRead;
                outputToServlet.write(buffer, 0, iRead);
              }
    
              syncStatus.setFileMonitorValue((int) (100 * Lreads / LfileChunks));
            }
    
            if (bIsEncrypt)
            {
              byte[] atemp = ciper.doFinal();
              LbytesUploaded += atemp.length;
              outputToServlet.write(atemp, 0, atemp.length);
            }
    
            outputToServlet.flush();
            outputToServlet.close();
          }
    
          ByteArrayOutputStream outbytes = new ByteArrayOutputStream();
          InputStream in = servletConnection.getInputStream();
          arbytes = new byte[128];
    
          while (true)
          {
            int iread = in.read(arbytes);
            if (iread > 0)
              outbytes.write(arbytes, 0, iread);
            else
              break;
          }
    
          arbytes = outbytes.toByteArray();
    
          CUtil.setCookiePropsFromHeader(servletConnection, this.propsCookies);
          strResponseDataTag = propsCookies.getProperty("response-data-tag");
    
          HTTPPostRequest.dumpBytes(strThis, arbytes, strResponseDataTag); // for debugging.
    
          ByteArrayInputStream inbytes = new ByteArrayInputStream(arbytes);
          inputFromServlet = new ObjectInputStream(inbytes);
    
          objRet = inputFromServlet.readObject();
    
          inputFromServlet.close();
        } catch (Exception e)
        {
          try
          {
            HTTPPostRequest.dumpBytes(strThis, arbytes, "ERROR-" + strResponseDataTag);
          } catch (Exception ed)
          {
            DebugLogger.printlnError(strThis, "Error calling HTTPPostRequest.dumpBytes()"
              + " in Exception handler. " + ed.toString(), ed, false);
          }
    
          String strMsg = "Error uploading file: " + afile.getCanonicalPath() + " - " + e.toString()
            + ".  " + "LbytesToUpload=" + LbytesToUpload + ", " + LbytesUploaded + " bytes uploaded ("
            + (LbytesUploaded / 1024) + "K).";
          DebugLogger.printlnError(strThis, strMsg, e, false);
          throw e;
        } finally
        {
          try
          {
            if (fin != null)
              fin.close();
          } catch (Exception ee)
          {
            DebugLogger.printlnError(strThis, ee.toString(), ee, false);
          }
    
          try
          {
            if (outputToServlet != null)
              outputToServlet.close();
          } catch (Exception ee)
          {
            DebugLogger.printlnError(strThis, ee.toString(), ee, false);
          }
    
          try
          {
            if (inputFromServlet != null)
            {
              inputFromServlet.close();
            }
          } catch (Exception ee)
          {
            DebugLogger.printlnError(strThis, ee.toString(), ee, false);
          }
    
          try
          {
            if (servletConnection != null)
              servletConnection.getInputStream().close();
          } catch (Exception ee)
          {
            DebugLogger.printlnError(strThis, ee.toString(), ee, false);
          }
    
        }
    
        return objRet;
      }
    Edited by: sb4 on Mar 8, 2011 6:24 PM
  • 9. Re: URLConnection speed problems
    EJP Guru
    ByteArrayOutputStream outbytes = new ByteArrayOutputStream();
    You start wasting time here. Instead of
    inputFromServlet = new ObjectInputStream(inbytes);
    read the object directly from the connection:
    inputFromServlet = new ObjectInputStream(servletConnection.getInputStream());

    This saves you both space and time reading the response object. The rest of the code looks OK but I would probably consider using a CipherOutputStream around a BufferedOutputStream around servletConnection.getOutputStream(), to get better buffering behaviour, instead of using the Cipher object directly. This should speed up the output phase.
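
    A rough sketch of that stream layering, assuming the Cipher has already been initialised for encryption. EncryptedCopy and copyEncrypted are illustrative names, and the 8 KB buffer size is simply the chunk size from the posted code:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;

final class EncryptedCopy {
    private EncryptedCopy() {}

    /**
     * Copies 'source' to 'sink', encrypting on the way through.
     * Returns the number of plaintext bytes read from 'source'.
     */
    static long copyEncrypted(InputStream source, OutputStream sink, Cipher cipher)
            throws IOException {
        // CipherOutputStream encrypts; BufferedOutputStream batches the
        // ciphertext into larger writes to the underlying stream.
        OutputStream out = new CipherOutputStream(
                new BufferedOutputStream(sink, 8 * 1024), cipher);
        byte[] buffer = new byte[8 * 1024];
        long total = 0;
        int n;
        while ((n = source.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        // close() writes the final cipher block, flushes the buffer and
        // closes 'sink' as well.
        out.close();
        return total;
    }
}
```

    In the posted method, 'sink' would be servletConnection.getOutputStream() and 'source' the FileInputStream; a salt written before the ciphertext would still have to be written to the raw stream first.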
  • 10. Re: URLConnection speed problems
    843441 Newbie
    EJP wrote:
    ByteArrayOutputStream outbytes = new ByteArrayOutputStream();
    You start wasting time here.
    Yes, that was just for dumping the raw bytes if there was an error reading the object. Now I would try a BufferedInputStream wrapper instead: mark it before reading, and on error reset it and dump the bytes.
    would probably consider using a CipherOutputStream around a BufferedOutputStream around
    I'll look into that, thanks.

    -------------------------

    I checked how we read the input stream after uploading with URLConnection. The problem appears to be there: inefficient read buffering. So hopefully we can use URLConnection after all and won't need the JUpload-style code, with the header / object parsing problem I originally posted.
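
    A sketch of what reading the response could look like with a buffered wrapper, keeping the ability to dump the raw bytes when deserialization fails. ResponseReader, readObjectOrDump and the 64 KB dump limit are assumptions for illustration:

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.OutputStream;

final class ResponseReader {
    /** Assumed maximum number of response bytes kept available for a dump. */
    static final int DUMP_LIMIT = 64 * 1024;

    private ResponseReader() {}

    /**
     * Reads one serialized object from 'raw'. If reading fails with an
     * IOException, up to DUMP_LIMIT bytes of the response are copied to
     * 'dumpTo' for troubleshooting, and the exception is rethrown.
     */
    static Object readObjectOrDump(InputStream raw, OutputStream dumpTo)
            throws IOException, ClassNotFoundException {
        BufferedInputStream in = new BufferedInputStream(raw, 8 * 1024);
        in.mark(DUMP_LIMIT);
        try {
            return new ObjectInputStream(in).readObject();
        } catch (IOException e) {
            try {
                in.reset();   // back to the first byte of the response
                byte[] buf = new byte[8 * 1024];
                int remaining = DUMP_LIMIT;
                int n;
                while (remaining > 0
                        && (n = in.read(buf, 0, Math.min(buf.length, remaining))) != -1) {
                    dumpTo.write(buf, 0, n);
                    remaining -= n;
                }
            } catch (IOException ignored) {
                // The mark is no longer valid (more than DUMP_LIMIT bytes
                // were read), so there is nothing to dump.
            }
            throw e;
        }
    }
}
```

    Here 'raw' would be servletConnection.getInputStream(); the 128-byte copy loop and the intermediate byte arrays are then not needed on the normal path.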

    I still feel that the deprecated ObjectInputStream.readLine() method leaves a hole in the Java I/O API that is not filled. It would be nice to have a way to alternate between reading character lines, bytes and objects, as the authors of URLConnection must do.

    A possibility would be a fixed ObjectInputStream.readLine(Charset cs) that allows you to attempt such reading or a similarly fixed DataInputStream.

    An "UnbufferedReader" is another possibility, that guarantees it won't read ahead in the underlying InputStream. It might require a "markSupported()" InputStream in its constructor, but that would be OK.

    Edited by: sb4 on Mar 9, 2011 10:33 AM
  • 11. Re: URLConnection speed problems
    843441 Newbie
    EJP wrote:
    This saves you both space and time reading the response object. The rest of the code looks OK but I would probably consider using a CipherOutputStream around a BufferedOutputStream around servletConnection.getOutputStream(), to get better buffering behaviour, instead of using the Cipher object directly. This should speed up the output phase.
    I realize another requirement is that we want to compute a checksum of the encrypted file (not shown in that example), so I suppose we must encrypt externally for now.
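
    One possibility that might keep CipherOutputStream in play is to place a checksumming stream underneath it, so the checksum is computed over the ciphertext as it is written. This is only a sketch: it assumes a CRC32 checksum is acceptable (the actual checksum type is not stated here), and ChecksummedEncryption and encryptWithChecksum are illustrative names:

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.CheckedOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;

final class ChecksummedEncryption {
    private ChecksummedEncryption() {}

    /**
     * Encrypts 'source' into 'sink' and returns the CRC32 of the encrypted
     * bytes that were written. The cipher must already be initialised.
     */
    static long encryptWithChecksum(InputStream source, OutputStream sink, Cipher cipher)
            throws IOException {
        // Layering: plaintext -> CipherOutputStream -> CheckedOutputStream
        // (sees ciphertext) -> BufferedOutputStream -> sink.
        CheckedOutputStream checked = new CheckedOutputStream(
                new BufferedOutputStream(sink, 8 * 1024), new CRC32());
        OutputStream out = new CipherOutputStream(checked, cipher);
        byte[] buffer = new byte[8 * 1024];
        int n;
        while ((n = source.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.close();   // writes the final cipher block through 'checked'
        return checked.getChecksum().getValue();
    }
}
```

    For a cryptographic hash rather than a CRC, java.security.DigestOutputStream could take the place of CheckedOutputStream in the same position.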
