3 Replies. Latest reply: Jan 6, 2013 2:02 AM by bshannon

Get content from e-mail with text/html as content type

981862 Newbie
Hey there,

I am working on an e-mail application in Java. Every few minutes the application opens a mailbox and loops through the unread e-mails looking for certain subjects. If a matching subject is found, I want to retrieve the content of that e-mail. Retrieving the content works fine for e-mails sent from Gmail, Outlook (desktop client) and Hotmail.

However, when I try to get the content of an e-mail sent from the Office 365 web client, the part has a text/html content type. I printed the content and found that it consists of HTML code, but the HTML is not well-formed:


<html dir=3D"ltr">
<head>
<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Diso-8859-=
1">
<style type=3D"text/css" id=3D"owaParaStyle"></style>
</head>
<body fpstyle=3D"1" ocsi=3D"0">
<div style=3D"direction: ltr;font-family: Tahoma;color: #000000;font-size: =
10pt;">
<div><span style=3D"font-family: 'Segoe UI', Helvetica, Arial, sans-serif;"=
>Geachte heer/mevrouw,</span><br style=3D"font-family: 'Segoe UI', Helvetic=
a, Arial, sans-serif;">
<br style=3D"font-family: 'Segoe UI', Helvetica, Arial, sans-serif;">
<span style=3D"font-family: 'Segoe UI', Helvetica, Arial, sans-serif;">Wij =
hebben uw inzending ontvangen en gecontroleerd. Hierbij het verslag van</sp=
an><br style=3D"font-family: 'Segoe UI', Helvetica, Arial, sans-serif;">
<span style=3D"font-family: 'Segoe UI', Helvetica, Arial, sans-serif;">de c=
ontrole.</span><br style=3D"font-family: 'Segoe UI', Helvetica, Arial, sans=
-serif;">

Note the = symbols.

When I get the content of e-mails sent from Gmail, Outlook or Hotmail, the content type is text/plain and I get just the text, with no HTML or stray symbols.

How can I solve this? I tried parsing the content with Jsoup, but the stray = symbols cause problems.

Any help is appreciated,

Thanks!
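
The `=3D` sequences and the `=` at the end of lines in the dump above match the quoted-printable transfer encoding (RFC 2045): `=3D` is an escaped `=`, and a trailing `=` is a soft line break that joins two lines. In JavaMail, `part.getContent()` and `part.getInputStream()` normally return the decoded content, and `MimeUtility.decode(stream, "quoted-printable")` decodes a raw stream. The following is only a minimal, library-free sketch of what that decoding does, so the effect on the sample is visible. The class name `QuotedPrintableSketch` and its `decode` method are hypothetical, and the charset is taken from the `charset=iso-8859-1` declared in the sample.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; JavaMail ships its own decoder.
public class QuotedPrintableSketch {

    /** Decodes quoted-printable text (assumed to be ASCII) using the given charset. */
    public static String decode(String encoded, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n = encoded.length();
        int i = 0;
        while (i < n) {
            char c = encoded.charAt(i);
            if (c != '=') {
                out.write(c);                       // literal character, copied unchanged
                i++;
            } else if (i + 1 < n && encoded.charAt(i + 1) == '\n') {
                i += 2;                             // soft line break "=\n": drop it
            } else if (i + 2 < n && encoded.charAt(i + 1) == '\r'
                    && encoded.charAt(i + 2) == '\n') {
                i += 3;                             // soft line break "=\r\n": drop it
            } else if (i + 2 < n && isHex(encoded.charAt(i + 1))
                    && isHex(encoded.charAt(i + 2))) {
                // "=XX" escape: two hex digits encode one byte
                out.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                out.write(c);                       // malformed escape: keep '=' as-is
                i++;
            }
        }
        return new String(out.toByteArray(), charset);
    }

    private static boolean isHex(char ch) {
        return Character.digit(ch, 16) >= 0;
    }

    public static void main(String[] args) {
        // Two fragments in the style of the dump above, joined by a soft line break.
        String sample = "<html dir=3D\"ltr\">\r\n<span>Hierbij het verslag van</sp=\r\nan>";
        System.out.println(decode(sample, StandardCharsets.ISO_8859_1));
    }
}
```

Once decoded this way, the markup is ordinary HTML, so an HTML parser such as Jsoup no longer sees the stray `=` characters.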
  • 1. Re: Get content from e-mail with text/html as content type
    bshannon Pro
    This was answered in your stackoverflow post:
    http://stackoverflow.com/questions/14066910/parse-text-html-data-with-javamail
  • 2. Re: Get content from e-mail with text/html as content type
    981862 Newbie
    bshannon wrote:
    This was answered in your stackoverflow post:
    http://stackoverflow.com/questions/14066910/parse-text-html-data-with-javamail
    I know, but isn't there another way to get the content of a part whose content type is "text/html"?

    part.writeTo(OutputStream) gives back the whole raw message, including the headers, and it also prints the line-break characters. For other parts I know you can just call part.getContent() to get the content.

    part.getInputStream() doesn't seem to work for me. I get back an empty line when printing the stream.

    I only need the HTML part of the message, so I tried to remove all headers as follows:

    // Print each header of the part, then try to remove it.
    Enumeration headers = part.getAllHeaders();
    while (headers.hasMoreElements()) {
        Header h = (Header) headers.nextElement();
        System.out.println(h.getName() + ": " + h.getValue());
        part.removeHeader(h.getName()); // this is the call that fails
    }

    But I get the following exception:

    javax.mail.IllegalWriteException: IMAPMessage is read-only

    even though I opened the folder in read-write mode. I really don't know how to continue with this.

    Help is very much appreciated!

    Edited by: 978859 on 4-jan-2013 6:59

    Edited by: 978859 on 4-jan-2013 7:04
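
If the raw output of part.writeTo(OutputStream) is the only thing available, the headers do not have to be removed from the message object. In the MIME format the headers end at the first blank line, so the body can be cut out of the raw text instead. The following is a minimal sketch of that split using only the standard library. The class name `RawPartSplitSketch` and the method `bodyOf` are hypothetical, and the sketch assumes the raw part has already been read into a String. The body it returns is still in its transfer encoding (quoted-printable in the sample above), which is why part.getContent() or part.getInputStream() remains the usual route.

```java
// Hypothetical helper for illustration; not part of JavaMail.
public class RawPartSplitSketch {

    /** Returns everything after the first blank line, i.e. the body without headers. */
    public static String bodyOf(String rawPart) {
        int crlf = rawPart.indexOf("\r\n\r\n"); // blank line with CRLF line endings
        int lf = rawPart.indexOf("\n\n");       // blank line with bare LF line endings
        if (crlf >= 0 && (lf < 0 || crlf < lf)) {
            return rawPart.substring(crlf + 4);
        }
        if (lf >= 0) {
            return rawPart.substring(lf + 2);
        }
        return "";                              // no blank line: headers only
    }

    public static void main(String[] args) {
        String raw = "Content-Type: text/html; charset=iso-8859-1\r\n"
                + "Content-Transfer-Encoding: quoted-printable\r\n"
                + "\r\n"
                + "<html dir=3D\"ltr\">";
        System.out.println(bodyOf(raw));
    }
}
```

Only the first blank line is treated as the separator, so blank lines inside the body are left untouched.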
  • 3. Re: Get content from e-mail with text/html as content type
    bshannon Pro
    If you're sure the message has data and the getInputStream method isn't returning the data, we'll need to do some debugging.
    Find the JavaMail FAQ and read the debugging section. Then post the protocol trace showing what happens when you use
    getInputStream. Also, try using the msgshow.java demo program to read the message. That will help determine whether there's
    a bug in your code, a bug in the server, or something else is wrong.
