This discussion is archived
9 Replies Latest reply: Mar 8, 2012 7:51 AM by 922634 RSS

Best performance method for retrieving all messages on IMAP server?

920251 Newbie
Hello all,

I am currently using the JavaMail API for retrieving e-mails on IMAP and IMAPS servers. The typical number of mails is between 10000 and 50000.

From what I have read in the FAQ, the current behavior is the following: to avoid out-of-memory errors, an IMAP FETCH command is issued each time getContent() or getInputStream() is called on a message.

Unfortunately, this behavior seems to be costly performance-wise in my case, as my application will often be asked to retrieve every mail on the server.

I have used FetchProfile items to boost performance as much as possible, but they do not seem to cover message content.

Hence my question: how can I retrieve as many message bodies as possible in one fetch? Or is there a workaround that would not involve working directly at the IMAPProtocol level?


Thank you in advance
  • 1. Re: Best performance method for retrieving all messages on IMAP server?
    bshannon Pro
    The fetch method is a way to pre-fetch some data in a batch. Since the IMAP provider doesn't
    cache message body content, there's no way to pre-fetch the content.

    Since most of the data in a message is in the message content, not the message headers,
    and JavaMail isn't caching that content, you'll need to make sure your application isn't holding
    references to that content that would prevent it from being garbage collected.

    If you think the message headers themselves are causing you to run out of memory, you
    can use the IMAPMessage.invalidate method to invalidate JavaMail's cache of that data.
  • 2. Re: Best performance method for retrieving all messages on IMAP server?
    920251 Newbie
    Thanks for your reply. However, the problem is not that I am running out of memory. I am trying to speed up the phase in which every message (and hence its content) is retrieved from the server, and I am looking for a way to avoid one fetch per getContent() call.

    For instance, if I had a limit maxContents, I would fetch message contents in one command until that maxContents limit is reached. I would repeat this until I had every message content.
    If the total size of message contents on the IMAP server was totalSize, I would then need roughly totalSize / maxContents fetches (instead of n fetches, n being the number of mails on the server).

    Am I still being unclear about my need?
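
The batching idea described above can be sketched in plain Java, independent of JavaMail. This is only an illustration under assumptions: the class name BatchPlanner and the method plan are hypothetical, and the sizes array stands in for whatever per-message sizes the client already knows. It groups consecutive messages so that each group stays under a byte budget, which gives roughly totalSize / maxContents groups.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups message sizes into batches under a byte budget. */
final class BatchPlanner {

    /**
     * Returns batches as {startIndex, endIndexInclusive} pairs over sizes.
     * A single message larger than maxBytes still gets a batch of its own.
     */
    static List<int[]> plan(long[] sizes, long maxBytes) {
        List<int[]> batches = new ArrayList<>();
        int start = 0;
        long total = 0;
        for (int i = 0; i < sizes.length; i++) {
            // Close the current batch if adding this message would exceed the budget.
            if (i > start && total + sizes[i] > maxBytes) {
                batches.add(new int[] {start, i - 1});
                start = i;
                total = 0;
            }
            total += sizes[i];
        }
        if (sizes.length > 0) {
            batches.add(new int[] {start, sizes.length - 1});
        }
        return batches;
    }
}
```

Each returned pair would then correspond to one bulk fetch instead of one fetch per message.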
  • 3. Re: Best performance method for retrieving all messages on IMAP server?
    bshannon Pro
    There's no support in JavaMail for bulk fetching the contents of more than one message.

    You could have one thread reading the messages while another thread processes them,
    allowing you to overlap the I/O with the processing, but you'll still pay for at least one
    round trip for each message.
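
The two-thread suggestion above could look something like the following sketch. It uses only java.util.concurrent, and the strings stand in for message contents; in a real client the reader thread would call getContent() on each message instead. The class name OverlappedReader and the upper-casing "processing" step are purely illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch: one thread reads messages while the caller processes them. */
final class OverlappedReader {
    // Unique sentinel instance marking the end of the stream (compared by reference).
    private static final String END = new String("END");

    static List<String> run(List<String> rawMessages) {
        // A bounded queue also caps how many unprocessed messages sit in memory.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        List<String> processed = new ArrayList<>();

        Thread reader = new Thread(() -> {
            try {
                for (String m : rawMessages) {
                    queue.put(m); // stands in for the per-message network read
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        try {
            // Process on the calling thread while the reader keeps filling the queue.
            for (String m = queue.take(); m != END; m = queue.take()) {
                processed.add(m.toUpperCase(Locale.ROOT));
            }
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for messages", e);
        }
        return processed;
    }
}
```

This overlaps I/O with processing but, as noted above, does not reduce the number of round trips.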
  • 4. Re: Best performance method for retrieving all messages on IMAP server?
    920251 Newbie
    Right; however, using the IMAPFolder.doCommand(IMAPFolder.ProtocolCommand) method, one can create one's own IMAP commands for bulk fetching the contents of multiple messages:

    // Uses Argument, Response, ProtocolException (com.sun.mail.iap), IMAPFolder (com.sun.mail.imap),
    // IMAPProtocol, IMAPResponse (com.sun.mail.imap.protocol), MimeUtility (javax.mail.internet), java.io.*
    inbox.doCommand(new IMAPFolder.ProtocolCommand() {

        @Override
        public Object doCommand(IMAPProtocol protocol) throws ProtocolException {
            // Ask for the body text of messages 1 through 11003 in a single command.
            Argument args = new Argument();
            args.writeString("1:11003");
            args.writeString("BODY[TEXT]");
            Response[] r = protocol.command("FETCH", args);
            // The last response is the tagged completion status.
            Response response = r[r.length - 1];

            System.out.println("Length of response: " + r.length);

            if (response.isOK()) {
                for (int i = 0; i < r.length; i++) {
                    if (r[i] instanceof IMAPResponse) {
                        IMAPResponse imaprep = (IMAPResponse) r[i];

                        String n = imaprep.toString();
                        try {
                            n = MimeUtility.decodeText(n);
                        } catch (UnsupportedEncodingException e1) {
                            e1.printStackTrace();
                        }
                        // Dump each raw response to its own file.
                        File file = new File("Contents_" + i);
                        try (BufferedWriter out = new BufferedWriter(new FileWriter(file))) {
                            out.write(n);
                        } catch (IOException e) {
                            e.printStackTrace();
                        }

                        // r[i] = null;
                    } else {
                        System.out.println("Other");
                    }
                }
            }
            // dispatch remaining untagged responses
            protocol.notifyResponseHandlers(r);
            protocol.handleResult(response);

            return null;
        }
    });

    This works: I can process messages at a rate of 179 messages/sec instead of the 120 I get when using pure JavaMail. My problem now is how to handle the data, since it arrives in raw IMAP format:

    * 1005 FETCH (BODY[TEXT] {1326549}

    Message contents

    )

    Also, I will now need to parse the message contents and handle the various possible structures (multipart, etc.). Any ideas on this matter?
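
To illustrate the raw format shown above: the number in braces is an IMAP literal length, meaning that exactly that many bytes of content follow the CRLF that ends the header line. The sketch below is a simplified, hypothetical parser (the class FetchLiteral is not part of JavaMail) that assumes a single BODY item per response and a header that fits on one line. JavaMail's own protocol classes already do this parsing, so this only shows the idea.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical holder for one "* n FETCH (BODY[...] {len}" response. */
final class FetchLiteral {
    final int sequenceNumber;
    final byte[] body;

    private FetchLiteral(int sequenceNumber, byte[] body) {
        this.sequenceNumber = sequenceNumber;
        this.body = body;
    }

    static FetchLiteral parse(byte[] raw) {
        // Locate the CRLF that terminates the header line.
        int eol = -1;
        for (int i = 0; i + 1 < raw.length; i++) {
            if (raw[i] == '\r' && raw[i + 1] == '\n') {
                eol = i;
                break;
            }
        }
        if (eol < 0) {
            throw new IllegalArgumentException("no CRLF after header line");
        }
        String header = new String(raw, 0, eol, StandardCharsets.US_ASCII);
        String[] parts = header.split(" ");
        if (parts.length < 3 || !parts[0].equals("*") || !parts[2].equals("FETCH")) {
            throw new IllegalArgumentException("not an untagged FETCH response");
        }
        int open = header.lastIndexOf('{');
        int close = header.lastIndexOf('}');
        if (open < 0 || close != header.length() - 1) {
            throw new IllegalArgumentException("no literal length at end of header");
        }
        int seq = Integer.parseInt(parts[1]);
        int length = Integer.parseInt(header.substring(open + 1, close));
        // The literal is bounded by its byte count, not by the closing parenthesis.
        int from = eol + 2;
        if (from + length > raw.length) {
            throw new IllegalArgumentException("truncated literal");
        }
        return new FetchLiteral(seq, Arrays.copyOfRange(raw, from, from + length));
    }
}
```

Because the body is delimited by length, content that itself contains ")" or line breaks is still extracted intact.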
  • 5. Re: Best performance method for retrieving all messages on IMAP server?
    bshannon Pro
    As you've discovered, you're now down at the level of a JavaMail implementor, not a JavaMail user.
    You might want to look at the JavaMail source code to find classes you can reuse to parse the
    IMAP response. Once you extract the raw MIME message content, you can use the MimeMessage
    constructor that takes an InputStream to parse it. Also look at SharedByteArrayInputStream.
  • 6. Re: Best performance method for retrieving all messages on IMAP server?
    920251 Newbie
    Hello everyone!

    First off, thanks to Bill for his valuable advice and tips on handling my case. After some work, I was able to do what I wanted, i.e. bulk fetch the contents of multiple mails in one command.

    The trick is to define your own IMAPFolder.ProtocolCommand. Here is the one I defined:


    import java.io.ByteArrayInputStream;
    import java.util.Properties;

    import javax.mail.MessagingException;
    import javax.mail.Session;
    import javax.mail.internet.MimeMessage;

    import com.sun.mail.iap.Argument;
    import com.sun.mail.iap.ProtocolException;
    import com.sun.mail.iap.Response;
    import com.sun.mail.imap.IMAPFolder;
    import com.sun.mail.imap.protocol.BODY;
    import com.sun.mail.imap.protocol.FetchResponse;
    import com.sun.mail.imap.protocol.IMAPProtocol;

    public class CustomProtocolCommand implements IMAPFolder.ProtocolCommand {

        /** Index on server of first mail to fetch **/
        int start;

        /** Index on server of last mail to fetch **/
        int end;

        public CustomProtocolCommand(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public Object doCommand(IMAPProtocol protocol) throws ProtocolException {
            // Fetch the full content of every message in start:end with one command.
            Argument args = new Argument();
            args.writeString(Integer.toString(start) + ":" + Integer.toString(end));
            args.writeString("BODY[]");
            Response[] r = protocol.command("FETCH", args);
            Response response = r[r.length - 1];
            if (response.isOK()) {

                Properties props = new Properties();

                // props.setProperty("mail.imaps.ssl.trust", host);
                props.setProperty("mail.store.protocol", "imap");
                props.setProperty("mail.mime.base64.ignoreerrors", "true");
                props.setProperty("mail.imap.partialfetch", "false");
                props.setProperty("mail.imaps.partialfetch", "false");
                Session session = Session.getInstance(props, null);

                // last response is only result summary: not contents
                for (int i = 0; i < r.length - 1; i++) {

                    // Only FETCH responses carry message data; skip other untagged responses.
                    if (r[i] instanceof FetchResponse) {
                        FetchResponse fetch = (FetchResponse) r[i];
                        // Assumes the BODY item comes first in the FETCH response.
                        BODY body = (BODY) fetch.getItem(0);
                        ByteArrayInputStream is = body.getByteArrayInputStream();
                        try {
                            // Parse the raw MIME bytes into a regular MimeMessage.
                            MimeMessage mm = new MimeMessage(session, is);

                            Contents.getContents(mm, i);

                        } catch (MessagingException e) {
                            e.printStackTrace();
                        }
                    }
                }
            }
            // dispatch remaining untagged responses
            protocol.notifyResponseHandlers(r);
            protocol.handleResult(response);

            return "" + (r.length - 1);
        }

    }

    The getContents(MimeMessage mm, int i) function is a typical helper that recursively writes the contents of the message to a file (many examples are available on the net).

    To avoid out-of-memory errors, I simply set a maxDoc and a maxSize limit (chosen arbitrarily; this can probably be improved!), used as follows:

    public int efficientGetContents(IMAPFolder inbox, Message[] messages) throws MessagingException {
        // Prefetch flags and envelope data for all messages in one batch.
        FetchProfile fp = new FetchProfile();
        fp.add(FetchProfile.Item.FLAGS);
        fp.add(FetchProfile.Item.ENVELOPE);
        inbox.fetch(messages, fp);
        int index = 0;
        int nbMessages = messages.length;
        final int maxDoc = 5000;
        final long maxSize = 100000000; // 100 MB

        // Message numbers limit to fetch
        int start;
        int end;

        while (index < nbMessages) {
            start = messages[index].getMessageNumber();
            int docs = 0;
            long totalSize = 0;
            boolean noskip = true; // There are no jumps in the message numbers list
            boolean notend = true;
            // Until we reach one of the limits
            while (docs < maxDoc && totalSize < maxSize && noskip && notend) {

                docs++;
                totalSize += messages[index].getSize();
                index++;
                // Assignment is intentional: remember whether any messages remain.
                if (notend = (index < nbMessages)) {
                    noskip = (messages[index - 1].getMessageNumber() + 1 == messages[index].getMessageNumber());
                }
            }

            end = messages[index - 1].getMessageNumber();
            inbox.doCommand(new CustomProtocolCommand(start, end));

            System.out.println("Fetching contents for " + start + ":" + end);
            System.out.println("Size fetched = " + (totalSize / 1000000) + " MB");

        }

        return nbMessages;

    }

    Note the noskip boolean: in my case, I do not want to fetch all messages. Since CustomProtocolCommand sends a single contiguous range such as 1:98 or 1:*, we check here whether we have reached a gap in the message numbers. If so, we issue our IMAP command FETCH start:end BODY[] for the range collected so far. (IMAP sequence sets can also combine several ranges with commas, e.g. 1:3,5:10, but this code does not build those.)
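
As a side note on ranges: the IMAP sequence-set syntax (RFC 3501) allows several ranges separated by commas, so gaps do not have to force a separate command. A minimal sketch of building such a set from sorted message numbers follows; the class SequenceSets and its method are hypothetical names for illustration, not JavaMail API, and the input is assumed to be strictly ascending.

```java
/** Hypothetical helper: compresses ascending numbers into an IMAP sequence set. */
final class SequenceSets {

    /** Expects strictly ascending positive numbers, e.g. {1, 2, 3, 5}. */
    static String toSequenceSet(int[] sortedNumbers) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < sortedNumbers.length) {
            int start = sortedNumbers[i];
            int end = start;
            // Extend the current range while the next number is consecutive.
            while (i + 1 < sortedNumbers.length && sortedNumbers[i + 1] == end + 1) {
                end = sortedNumbers[++i];
            }
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(start);
            if (end != start) {
                sb.append(':').append(end);
            }
            i++;
        }
        return sb.toString();
    }
}
```

For example, the numbers 1, 2, 3, 5, 6, 7, 8, 9, 10, 12 become 1:3,5:10,12. The maxDoc and maxSize limits would still be needed to bound how much data one command returns.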

    If anyone sees any improvements, they are more than welcome!

    To JavaMail developers: any chance of seeing this feature made more accessible in future releases?

    Hope this may help others out there. I will post performance gain measurements to make sure this work has not been useless :-)
  • 7. Re: Best performance method for retrieving all messages on IMAP server?
    bshannon Pro
    If all you're doing with this folder is bulk fetching messages, and nothing else, this will probably work.
    Otherwise, there's an additional issue to be aware of...

    Message numbers that you see through the JavaMail API may not be the same as
    the message numbers used in the IMAP protocol commands if any messages have been expunged,
    especially including if the messages were expunged by another client accessing the same folder.

    The com.sun.mail.imap.Utility.toMessageSet() method performs this mapping (and handles ranges, gaps, etc.),
    but it relies on a set of IMAPMessage objects. The MessageCache.seqnumOf() method does this mapping for
    a single message number.
  • 8. Re: Best performance method for retrieving all messages on IMAP server?
    920251 Newbie
    Well spotted. I ran into this issue further down in my processing chain.

    Since I have been using UIDs to reference emails on my client side, I simply used those when fetching:
    UID FETCH 1:... BODY[]

    Here the range values are UIDs rather than sequence numbers. I believe there is also a UIDSet class for handling those, if anyone does not wish to write their own set procedures.
  • 9. Re: Best performance method for retrieving all messages on IMAP server?
    922634 Newbie
    Thank you! This is very helpful. I ran some quick tests and saw a significant performance gain.
