7 Replies Latest reply: Jul 15, 2014 11:07 AM by user626164 RSS

    Characterset conversion, MQ UTF-8 to AQ WE8ISO8859P1

    user626164

      Hi all. I have a sticky issue. We are receiving XML data over MQ to AQ. MQ is sending UTF-8 data, as it should. Our Linux server is set up for UTF-8. BUT our database, (11.2.0.4.0) is set up with  we8iso8859p1, so MQManager does not convert. In addition, MQ defines this paylaod as MQSTR, so AQ forces the payload into text_body. Our MQ admins say that the mqmanager converts to the OS dataset automatically. Since there are many queues on this mqmanager we can not change this setting. We also can not change the MQSTR setting on the sending application, as there are multiple consumers for this data.

       

      We need a way, on AQ's MQGET issued from the Java API behind the scense, to tell MQ to translate to the Oracle database characterset. According to the MQ documentation the MQ Java API allows for this, and according to Oracle AQ documentation this is mentioned as possible, but I can find NO examples or giudance on how to do it.

       

      Can anyone help with this? Preferably with an example.

        • 1. Re: Characterset conversion, MQ UTF-8 to AQ WE8ISO8859P1
          Top.Gun

          1st step - the documentation:

           

          Oracle® Database Globalization Support Guide

          http://docs.oracle.com/cd/E11882_01/server.112/e10729/toc.htm


          • 2. Re: Characterset conversion, MQ UTF-8 to AQ WE8ISO8859P1
            user626164

            We have been reading the AQ documentation, not this. This seems to indicate we can bypass AQ completely and connect directly to MQ from within PL/SQL.

             

            Does this mean we can forego using AQ and MGW?

             

            At present, we have been doing dequeue procedures from the AQ queue_table that stores messages retrieved from MQ.

            • 3. Re: Characterset conversion, MQ UTF-8 to AQ WE8ISO8859P1
              user626164

              We have been reading the AQ documentation, not this. This seems to indicate we can bypass AQ completely and connect directly to MQ from within PL/SQL.

               

              Does this mean we can forego using AQ and MGW?

               

              At present, we have been doing dequeue procedures from the AQ queue_table that stores messages retrieved from MQ.

              • 4. Re: Characterset conversion, MQ UTF-8 to AQ WE8ISO8859P1
                Top.Gun

                Normally, if you were using a client such as sqlplus, you setup the client environment variable to UTF-8.

                This means the database will recognise the data from the client is UTF-8, and will auto convert to we8iso8859p1.

                You'll need to try this for MQ....

                • 5. Re: Characterset conversion, MQ UTF-8 to AQ WE8ISO8859P1
                  Top.Gun

                  Alternatively, you can convert your database to AL32UTF8 (ie UTF-8), but it would be a large task.....

                  • 6. Re: Characterset conversion, MQ UTF-8 to AQ WE8ISO8859P1
                    WGabriel

                    Hello,

                     

                    to your last point: "This seems to indicate we can bypass AQ completely and connect directly to MQ from within PL/SQL."

                    Yes, that is indeed possible. In this case you neither need to create an AQ nor Message Gateway (MGW) for an inbound propagation.

                     

                    Using the MQ API it is possible get MQ data from within PL/SQL. But here you have to create a PL/SQL wrapper

                    module for a java stored procedure. In the JSP you have to import the necessary java classes from IBM

                    (i.e. import com.ibm.mq.MQEnvironment, com.ibm.mq.MQException, com.ibm.mq.MQMessage etc.).

                     

                    You can then store a MQ message in e.g. a CLOB (or XMLTYPE) object using a character conversion

                    function (here to ISO88591):

                     

                    public CLOB getMessageAsCLOB(...)

                    {

                    MQMessage msg = new MQMessage();

                    ...

                    byte[] bytes = new byte[msg.getMessageLength()];

                    msg.readFully(bytes);

                    String payloadStr = new String(bytes, "ISO-8859-1");

                     

                    clob = CLOB.createTemporary(getConnection(), false, 10);

                    clob.open(1);

                    clob.setString(1L, payloadStr);

                    clob.close();

                    ...

                    return clob;

                     

                    Besides this, if you are using the message gateway (MGW) for an inbound propagation and an AQ queue you have to

                    check if the payload data (CLOB or XMLTYPE) in UTF8 format will be transformed correctly according to the

                    database character set (ISO8859P1) after DBMS_AQ.DEQUEUE.

                     

                    Kind regards,

                    WoG

                    • 7. Re: Characterset conversion, MQ UTF-8 to AQ WE8ISO8859P1
                      user626164

                      OK, so here is what we did.

                       

                      First, the data we are receiving is XML format.

                       

                      It turns out that we had multiple character set issues in all the pieces.

                       

                      MQ was not converting messages to the OS character set of UTF-8, our admin made an assumption without checking that the MQ Manager was configured to shop standards, when it was not. Instead, it was configured to convert to 8859-1 (which is Oracle DB character set we8iso8859p1). This was done (assumption) because the original parts of the application using this are c programs that dequeued mainframe-sourced data from MQ, that had been converted by MQ from EBCDIC to ASCII, and inserted directly into Oracle without touching the OS file-system directly. So they configured MQ to do the conversion before the application received it.

                       

                      The issue is that the new source data I was getting was created on Windows server in non-UTF-8, (UCS-2?), converted to UTF-8 when enqueued to MQ, then converted to we8iso8859p1 in our incoming MQ. MQ does not care if it is XML format. Plus that meant we had invalid characters in MQ before we saw it on the dequeue, which caused issues.

                       

                      There is also an issue with this conversion sequence, where one of the substitution characters that results is invalid in we8iso8859p1. This is a x'1A', or Oracle chr(26). this is apparently a known problem that is not addressed by default in any conversion process I found. So it needs to be handled programmatically. I saw many explanations for this issue, including Windows using lower value characters as control characters, but that does not seem to be the case here.

                       

                      So we set up a separate MQ manager that conforms to shop standards and converts incoming data to the OS character set UTF-8. In this case that means no conversion, (UTF-8 to UTF-8). This meant that the conversion error would be moved out of MQ to a place where we have programmatic control over it and can handle it ourselves.

                       

                      We then created a dequeue transformation function to convert the incoming SYS.MGW_BASIC_MSG_T TEXT_BODY payload into xmltype. When this happens, if any xml message has invalid characters it is written to the message error queue and then message error queue table. Since this is rare, we handle these in a separate process where we replace the chr(26) resulting in a valid xml doc. Any other errors, if any, are handled after that in the normal error handling process.

                       

                      Oracle is also smart enough to read the xml doc and convert the characters set to we8iso8859p1, and also update the encoding="UTF-8" to encoding="ISO-8859-1" in the XML document header.

                       

                      So at this point we are up and running.

                       

                      While I and many others here have urged converting all out DB's to UTF-8, just the DB I am working on is sitting around 25TB compressed, and it is not the largest. So it is no small task and not likely to be done soon.