11 Replies Latest reply on Nov 26, 2019 7:54 AM by Martien van den Akker

    Character encoding in Oracle Service Bus

    3376635

      Hi, I am trying to get the first 2000 characters of a specific field in a form to use it for creating a new form via ODI-OSB. In ODI, the string was successfully trimmed down to VARCHAR2 (2000) and was stored in the database. However, when OSB tries to create the form it returns an error saying that the field is more than 2000 characters. Currently we are using utf-8 in the XQuery. Any suggestions?

        • 1. Re: Character encoding in Oracle Service Bus
          Martien van den Akker

          Do you mean that you try to save a string of 2000 chars from OSB into this varchar column in the database?

          If so, then check the database table, using SQL Developer. Is the column defined as varchar2(2000 char) or varchar2(2000 byte)?

           

          UTF-8 encodes the string with upto 4 bytes per character (https://en.wikipedia.org/wiki/UTF-8 ). So a UTF-8 string of 2000 characters may need more than 2000 bytes.

          If your database has UTF-8 as character set and you define the column as varchar2(2000 char), it should support the variable encoding.

           

          Regards,
          Martien

          • 2. Re: Character encoding in Oracle Service Bus
            3376635

            Thanks Martien. I have consulted with my colleague on checking the column field defined however we cannot check if it is varchar2(2000 char) or varchar2(2000 byte). Only varchar2(2000) was displayed. Do you have a suggestion on how to check this one?

             

            Also for the character set in the database, we have AL32UTF8 and AL16UTF16.

             

            Thanks.characterset.png

            • 3. Re: Character encoding in Oracle Service Bus
              Martien van den Akker

              Hi,

               

              You can see it nicely in SQLDeveloper:

              I've looked for how to see it in the database. It's in the user_tab_cols:

              select table_name, column_name, data_type, data_length, char_used

              from user_tab_cols

              where table_name = 'MY_TABLE'

               

              It's the char_used column, it could by 'B' for 'BYTES' and 'C' for 'CHAR'. (See also https://docs.oracle.com/cd/B19306_01/server.102/b14237/statviews_2094.htm )

               

              The default is BYTES, so it's likely that at your column bytes is used. But it can be changed to CHAR quite easily.

               

              Kind regards,

              Martien

              • 4. Re: Character encoding in Oracle Service Bus
                3376635

                Hi Martien,

                 

                Please see details below on the char_used -

                OSB

                Source table

                 

                Thank you.

                • 5. Re: Character encoding in Oracle Service Bus
                  Martien van den Akker

                  Hi,

                   

                  They're both Bytes apparently. That means that if there are no transformations/enrichments the source value should fit into the target. But it does not take into account the variance in number of bytes per character. So theoretically, it still could be that you have 2000 characters where the average has more than 2 bytes, it could not fit in 4000 bytes.

                   

                  You also talk about first 2000 chars of a string and this column is 4000.

                  Could you show a bit more or your code (Xquery), source value/target value and most of all: the exact exception?

                   

                  Kind regards,
                  Martien

                  • 6. Re: Character encoding in Oracle Service Bus
                    3376635

                    Hi Martien,

                     

                    For now, we can provide this information and will update you soon.

                     

                    The screenshot below is from a form and was stored in one of the source table.

                    When this was processed by OSB to create another form, the log table returned an error which states that comments field should not be more than 2000 characters.

                    As per checking on the BLOB, this happened on the comments field -

                     

                    Thank you.

                    • 7. Re: Character encoding in Oracle Service Bus
                      Martien van den Akker

                      Hi,

                       

                      I see three times 'TEST25' + 'END' in the comments field (with a chariage return ascii(13) for each line). In your comments message I see it 6 times 'TEST25', but no 'END'. These aren't the complete text I presume. Is the text in reality exactly the same, or is there some transformation code that duplicates the lines unintendly?

                      What are the exact string lengths?

                       

                      Kind regards,
                      Martien

                      • 8. Re: Character encoding in Oracle Service Bus
                        3376635

                        Hi Martien van den Akker, apologies on the very late response. Please see comments below -

                         

                        Q: I see three times 'TEST25' + 'END' in the comments field (with a chariage return ascii(13) for each line). In your comments message I see it 6 times 'TEST25', but no 'END'. These aren't the complete text I presume.

                        A: For the first screenshot which has an END, this is from the BP form and the second screenshot is from the log table wherein the nextline character has been transformed causing the bottom/last characters in the form being cut.

                         

                        Below is the sample line of codes in the XQuery, please note I have removed other variables and renamed others also -

                         

                        xquery version "1.0" encoding "utf-8";

                        (:: OracleAnnotationVersion "1.0" ::)

                        declare namespace ns2="NS2";

                        (:: import schema at "../Schema/SCHEMA2.xsd" ::)

                        declare namespace ns1="NS1";

                        (:: import schema at "../Schema/SCHEMA1.xsd" ::)

                        declare variable $variableCreateFORM as element() (:: schema-element(ns1:VOWCCollection) ::) external;

                        declare function local:func($variableCreateFORM as element() (:: schema-element(ns1:VOWCCollection) ::)) as element() (:: schema-element(ns2:createBPRecord) ::) {

                                    <ns2:createBPRecord>

                                    <shortname>{dvmtr:lookup("shortname")}</shortname>

                                    <authcode>{dvmtr:lookup("authcode")}</authcode>

                                    <BPXML>

                                    <List_wrapper>

                                    <_bp>

                                       <status>Completed</status>

                                       <Comments>{fn:data($variableCreateFORM/ns1:UpdateUnifierVowc/ns1:comments)}</Comments>

                                    </_bp></List_wrapper>

                                    </BPXML>

                                </ns2:createBPRecord>

                        };

                         

                        local:func($variableCreateFORM)

                         

                         

                         

                        Let me know if you have other questions.

                        • 9. Re: Character encoding in Oracle Service Bus
                          Martien van den Akker

                          Unfortunately it is quite unclear what happens here.
                          What I would do is try to intercept (log) the message before and after the xquery. And then try to test the xquery from JDeveloper.
                          Could you post those messages before and after?

                           

                          By the way, is 'variableCreateFORM' or the child element in fact  UpdateUnifierVowc a repetative/iterative element? Are there more occurences of that element in the input message with comments? In that case the transform to text by fn:data() might not be what you expect.

                           

                          Kind regards,
                          Martien

                          • 10. Re: Character encoding in Oracle Service Bus
                            3376635

                            Hi Martien, I have attached sample log file for this issue. Thank you.

                            • 11. Re: Character encoding in Oracle Service Bus
                              Martien van den Akker

                              Hi,

                               

                              I assume that the contents of cells A4 and B4 have to be concatenated.

                              I did, and it was enclosed with double quotes and the attributes were enclosed by double double quotes. So I concatenated them and removed the obsolete quotes.

                               

                              Then I abstracted the contents of the comments element into a new document:

                              It is 3K long. It apparently has windows line endings (CRLF). This might be because it is copied and pasted into Excel. I don't know how you got it from OSB into the excel? But since the list has 244 lines, transforming it to Linux format, would remove the CR's and that would bring the length on 2927 bytes.

                              But what is typical is that besides the line-endings because of the formatting of the xml, it also contains &#13; which is the xml character encoding of the carriage return. If I remove those, keeping only the Linux LineFeeds, I get:

                              It is 1712 characters, which would fit in the column neatly.

                               

                              What you could try is to use fn-bea:serialize(XML) on the comments element, instead of the data() function.

                              Or use fn:replace() function to replace the &#13; characters to CR's or to remove them at all. (FunctX XQuery Functions: fn:replace )

                               

                              Kind regards,
                              Martien

                              1 person found this helpful