6 Replies Latest reply: Aug 19, 2013 11:50 PM by Raunaq RSS

    Handling multibyte characters


      Hi ,


      I have created a procedure which sends e-mail using UTL_SMTP.


      The procedure has a part in which we add the attachments to e-mail.


      Now , the issue is when i am adding an attachment which contains multibyte characters , these characters are replaced with '?'.


      Can anyone provide any guidance on this?

        • 1. Re: Handling multibyte characters
          Sergiusz Wolicki-Oracle

          Have you tried UTL_MAIL instead?  UTL_SMTP is very low level and it is not easy to get an attachment added properly.  Is your attachment of text/plain type?


          UTL_STMP.WRITE_DATA documentation has this paragraph:


          "Text (VARCHAR2) data sent using WRITE_DATA is converted to US7ASCII before it is sent. If the text contains multibyte characters, each multibyte character in the text that cannot be converted to US7ASCII is replaced by a '?' character. If 8BITMIME extension is negotiated with the SMTP server using the EHLO subprogram, multibyte VARCHAR2 data can be sent by first converting the text to RAW using the UTL_RAW package, and then sending the RAW data using WRITE_RAW_DATA."


          If the server has not negotiated 8BITMIME, you have to transfer-encode the content before sending, e.g. using quoted-printable or base64 encoding (see UTL_ENCODE). Proper MIME headers need to be sent to inform the recipient of the encoding.



          -- Sergiusz

          • 2. Re: Handling multibyte characters

            The reason i am not using UTL_MAIL is because i need to support sending multiple attachments .

            But you have provided very helpful information , kind of what i was looking for.


            But , mine is the latter case i.e. the non 8BITMIME part.

            • 3. Re: Handling multibyte characters

              I have added this code


              PROCEDURE add_mail_attachment



                           IN_ATT_MIME_TYPE  in  varchar2 default 'text/plain; charset=us-ascii',

                            in_attachment     IN  NCLOB    ,

                            in_att_file_name  in  varchar2 character set any_cs default null,

                            in_mail_conn      IN  UTL_SMTP.CONNECTION



              l_mail_conn       UTL_SMTP.connection;

              l_step            PLS_INTEGER  := 24573;




                 --Adding the attachments

                  if in_att_file_name is not null then


                    UTL_SMTP.WRITE_DATA(L_MAIL_CONN, 'Content-Type: ' || IN_ATT_MIME_TYPE ||'charset="us-ascii"'||'; name="' || IN_ATT_FILE_NAME || '"' || UTL_TCP.CRLF);

                    UTL_SMTP.write_data(l_mail_conn, 'Content-Transfer-Encoding: base64' || UTL_TCP.crlf);

                    UTL_SMTP.write_data(l_mail_conn, 'Content-Disposition: attachment; filename="' || in_att_file_name || '"' || UTL_TCP.crlf || UTL_TCP.crlf);


                    for I in 0..TRUNC((DBMS_LOB.GETLENGTH(IN_ATTACHMENT) - 1 )/L_STEP) LOOP

                      UTL_SMTP.write_data(l_mail_conn, UTL_RAW.cast_to_raw(UTL_ENCODE.base64_encode(DBMS_LOB.substr(in_attachment, l_step, i * l_step + 1))));

                    END LOOP;


                    utl_smtp.write_data(l_mail_conn, utl_tcp.crlf || utl_tcp.crlf);

                  end if;



              Let me know if there is any error in this??


              I am passing text/plain in IN_ATT_MIME_TYPE parameter.

              Text_file.txt in in_att_file_name

              A NCLOB value which contains chinese characters in in_attachment .


              I have similar pocedures for CLOB and BLOB which are working fine.


              Still i am getting ? , can you suggest anything?


              Message was edited by: Raunaq

              • 4. Re: Handling multibyte characters

                Guys anymore suggestions?

                Still i am stuck with the same problem

                • 5. Re: Handling multibyte characters
                  Sergiusz Wolicki-Oracle

                  First, you should not append 'charset="us-asci"' in this line:


                    UTL_SMTP.WRITE_DATA(L_MAIL_CONN, 'Content-Type: ' || IN_ATT_MIME_TYPE ||'charset="us-ascii"'||'; name="' || IN_ATT_FILE_NAME || '"' || UTL_TCP.CRLF);


                  The default IN_ATT_MIME_TYPE has this clause already, hence you would have a duplicate. Moreover, you add it without the required preceding semicolon. Further, in the Content-Type, you should pass the original character set of the file, not "us-ascii". This character set must support characters included in the file.


                  Second, the NCLOB is not written correctly either. UTL_ENCODE.BASE64_ENCODE expects a RAW value. If you give it an NVARCHAR2 value returned by DBMS_LOB.SUBSTR, then PL/SQL will implicitly apply HEXTORAW.to the value. HEXTORAW fails, if the NCLOB content is not a valid sequence of hex digits. Treating the content of NCLOB as a string of hex digits is obviously not your goal. You should use UTL_I18N.STRING_TO_RAW to convert NVARCHAR2 from DBMS_LOB.SUBSTR to the desired target encoding (the one specified in Content-Type) and cast it to RAW at the same time. UTF-8 (i.e. AL32UTF8) is usually the best choice for the target encoding. You should then apply UTL_RAW.CAST_TO_VARCHAR2 to change the RAW representation of base64-encoded value to VARCHAR2 expected by UTL_SMTP.WRITE_DATA.


                  Of course, passing DBMS_LOB.SUBSTR result directly to UTL_ENCODE.BASE64_ENCODE would make sense for a BLOB attachment. However, even then the encoded result should be passed to UTL_RAW.CAST_TO_VARCHAR2, not UTL_RAW.CAST_TO_RAW.


                  Third, if you use UTF-8 as Content-Type encoding, you may want to prepend three bytes (0xEF 0xBB 0xBF) to the NCLOB value before base64 encoding. This three-byte character is the UTF-8 Byte Order Mark. It helps some editors, such as Notepad, to recognize the file as encoded in UTF-8.


                  Fourth, if the target encoding is UTF-8, l_step should be no more than 8191. This is to avoid intermediate values exceeding 32767 bytes.


                  Fifth, the whole procedure will not work well on EBCDIC platform. In contrary to what documentation says, UTL_SMTP.WRITE_DATA does not seem to convert data to US7ASCII before sending (unless the package is ported separately by platform vendors). I guess this is not your worry but I thought I will mention this, just in case.





                  • 6. Re: Handling multibyte characters

                    It's working fine now.


                    First , i found out whether the SMTP server supports 8BITMIME extension , in my case it did


                    Then Keeping everything the same , i first converted data into RAW and then used WRITE_RAW_DATA.