11 Replies Latest reply on Mar 13, 2007 2:37 PM by 561826

    File Adapter skipping malformed records

    561826
      I am receiving a tab-delimited file via FTP from an automated process. Ordinarily that wouldn't be a problem, but I consistently end up with a rejectedMessage because the file has a footer record containing a line count.

      The file is tab-delimited and has 10 fields. It begins with 6 header lines and ends with 1 footer line.

      The header was easy enough to skip, but the footer is causing me no end of problems: the File Adapter rejects the entire file when it hits this malformed record.

      I tried setting uniqueMessageSeparator to ${eol}, but that didn't seem to change anything (I suspect this is because ${eol} is also the terminator for the last field on each row, but I'm not sure).

      I tried using the "choice" method of conditional processing, but it seems that I must provide a string that exactly matches nxsd:conditionValue, and there is no exact match I can provide because the data doesn't contain any specific string (each record is just a series of names and numbers). I don't seem to have the option to say "not one of these strings", which I could populate with the header/footer strings, nor do I seem to be able to use a regex match.

      I also tried the "Sequence" method of conditional processing, using nxsd:startsWith to treat any record that doesn't begin with a digit as a "NotData" message consisting of 1 string element terminated by ${eol}, and any record that begins with 8 digits (the first field on the data lines is a date in YYYYMMDD format) as a "Data" record. But again, it seems I can only create different message types when there is some specific string to test for. That's not going to work for me.

      There is no specific string in the records to identify them. The format of the line could identify them if I could find a function that supports a regex. Ideally I'd like to say that any line with 9 tabs is a "Data" line and any other line is a "NotData" line that can be ignored, but I haven't found any function in the XSD that accepts a regular expression as a processing condition.
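      For reference, the classification described above (a data line starts with an 8-digit YYYYMMDD date and has exactly 9 tabs; anything else is header or footer) is easy to express outside the adapter. This is a minimal Python sketch assuming a hypothetical pre-processing step that runs before the File Adapter picks up the file; it is not an NXSD feature, and the names DATA_LINE, is_data_line and data_lines are illustrative only.

```python
import re

# Hypothetical pre-filter (not part of NXSD): a data line starts with an
# 8-digit YYYYMMDD date followed by 9 more tab-separated fields, i.e. 9 tabs.
DATA_LINE = re.compile(r"^\d{8}(\t[^\t]*){9}$")


def is_data_line(line: str) -> bool:
    """Return True if the line looks like a data record."""
    # Strip the line ending first so a trailing CR/LF is not counted as data.
    return DATA_LINE.match(line.rstrip("\r\n")) is not None


def data_lines(text: str) -> list[str]:
    """Keep only the data records, dropping header and footer lines."""
    return [line for line in text.splitlines() if is_data_line(line)]
```

The column-header line (Date, User, Field3, ...) is rejected because it doesn't start with digits, and the LINE COUNT footer is rejected because it has only 2 tabs.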

      Further, when I deployed the "Sequence" method with a regex as the nxsd:startsWith value (just to see what would happen), the moment it picked up a file it seemed to put my BPEL process into an infinite loop consuming 100% CPU. I had to kill the app server to stop it.

      A detailed reference on how the File Adapter processes files and matches them against an XSD, and on what functions are available for ignoring/skipping bad records and/or conditional processing, would be appreciated. I've found a couple of scant references to the options in the nxsd namespace, but hardly anything I'd consider real documentation.

      The uniqueMessageSeparator seems like it could be useful to me but I can't get it to do anything meaningful.

      Any ideas on how to get this file parsed reliably would be appreciated. I would consider it rather poor if the only reliable way to process this file were to take each line as a "LineData" message and parse it inside the BPEL process itself.

      Thanks
        • 1. Re: File Adapter skipping malformed records
          Rahulgupta-Oracle
          Hi,
          Can you send a sample tab delimited file of your process.

          Thanks
          Rahul
          • 2. Re: File Adapter skipping malformed records
            561826
            I expected my thread watch to let me know if anyone had responded. Sorry about the delay. Not sure how to attach files.

            I've made a sample file with the tabs replaced by ~.
            The trailing ~ characters at the end of some lines are there because the tabs are there in the source file. There is also a blank line at the very end below LINE COUNT.

            Here is the file:
            PRODUCT:~USER DATA INDICES~
            PACKAGE:~ALL USERS~
            DATA DATES:~20070227 to 20070227~
            DATA STATUS:~FINAL~

            Date~User~Field3~Field4~Field5~Field6~Field7~Field8~Field9~Field10~
            20070227~User1~-1.0~21.0~3.01~-14.0~985.0~-6.132345~723.0~Open
            20070227~User2~2.0~22.0~3.02~34.0~325.0~6.285413~732.0~Closed
            20070227~User3~3.0~-32.0~3.03~-44.0~755.0~-6.384765~723.0~Pending
            20070227~User4~-4.0~42.0~3.04~64.0~375.0~6.493753~732.0~Working
            20070227~User5~5.0~52.0~3.05~94.0~355.0~-6.592375~723.0~On Hold
            20070227~User6~6.0~-62.0~3.06~-84.0~235.0~6.691743~732.0~Pending
            20070227~User7~7.0~72.0~3.07~54.0~855.0~-6.537594~723.0~On Hold
            20070227~User8~-8.0~82.0~3.08~14.0~655.0~6.982459~732.0~Working
            20070227~User9~9.0~-92.0~3.09~-24.0~155.0~-6.723474~723.0~Closed
            20070227~User10~10.0~102.0~3.10~54.0~415.0~6.842475~732.0~Open
            20070227~User11~-11.0~-12.0~3.02~-74.0~125.0~-6.225678~723.0~Closed
            LINE COUNT:~18~
            • 3. Re: File Adapter skipping malformed records
              Rahulgupta-Oracle
              Hi,
              Try using nxsd:style="array" for the data, with nxsd:arrayTerminatedBy="Line" to specify the end of the data part, so that the footer is avoided.
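              As a rough mental model of this suggestion (an assumption for illustration, not the adapter's actual implementation): after the header lines are skipped, each line becomes one array cell until a line beginning with the terminator string is reached, and the text after the terminator is left for the next element. The helper name split_records is hypothetical.

```python
def split_records(lines: list[str], header_lines: int, terminator: str):
    """Rough model of an array of lines ended by a terminator string.

    Skips `header_lines` lines, treats each following line as one array cell,
    and stops at the first line that starts with `terminator`. Returns
    (cells, rest), where `rest` is the text after the terminator on that line
    (what a following Footer element would see), or None if no terminator
    was found.
    """
    body = lines[header_lines:]
    cells = []
    for line in body:
        if line.startswith(terminator):
            return cells, line[len(terminator):]
        cells.append(line)
    return cells, None
```

With a terminator of "LINE COUNT:", the leftover footer text is just the tab-delimited count, which is consistent with the Footer value in the output Srimant posts later in this thread.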

              HTH
              Rahul
              • 4. Re: File Adapter skipping malformed records
                564808
                Is it possible to get a copy of your final xsd?

                Thks Craig
                • 5. Re: File Adapter skipping malformed records
                  561826
                  Again, my apologies for the delay. The "Watch Thread" function is not doing anything for me to notify me of posts and my other duties have distracted me from checking this forum daily. I really appreciate the assistance.

                  I'm sorry, I don't understand where to put the nxsd:style="array".

                  Does that mean each line is to be treated as an array? I see how that could solve my parsing problem, but once I get the records into my process, I transform them with an XSLT to a different format and spit them back out to a file. If I take each line in as an array, won't I lose my field names, making my XSLT harder to build? What will the XSLT do with the "short" record?

                  Can you give me the full XSD you think I should use?
                  I'm only familiar with the GUI tool and haven't had a chance to try this yet because I'm not sure what your intent is.

                  Thanks.
                  • 6. Re: File Adapter skipping malformed records
                    561826
                    Craig_BPEL,

                    I'll be happy to share once I have something worth sharing. :)
                    • 7. Re: File Adapter skipping malformed records
                      Rahulgupta-Oracle
                      Mike,
                      Each line of your file (except the header and footer) will be treated as a cell of an array. You can define a complex type that represents the value in each cell of the array. This complex type is essentially your line item: its elements are tab-delimited, except the last element, which is terminated by ${eol}. Please refer to the chapter on creating Native XSD (Chapter 7) in the Adapters guide at:
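                      To show why the field names are not lost, here is a Python sketch of what one array cell's complex type amounts to: nine tab-terminated elements followed by one element terminated by the end of line. The field names come from the column-header line of the sample file; FIELD_NAMES and parse_cell are hypothetical names, and a real schema would declare these as xsd:element children instead.

```python
# Field names taken from the sample file's column-header line.
FIELD_NAMES = ["Date", "User"] + [f"Field{n}" for n in range(3, 11)]


def parse_cell(line: str) -> dict[str, str]:
    """Split one data line into named fields.

    Mirrors a complex type whose first nine elements are terminated by a tab
    and whose last element is terminated by the end of line.
    """
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} fields, got {len(values)}: {line!r}"
        )
    return dict(zip(FIELD_NAMES, values))
```

A short line such as the LINE COUNT footer fails the field-count check, which is analogous to what happens when the adapter tries to read the footer as a data record.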
                      http://download-west.oracle.com/docs/cd/B31017_01/integrate.1013/b28994/toc.htm

                      If you still face any issue, please let me know.

                      HTH
                      Rahul
                      • 8. Re: File Adapter skipping malformed records
                        561826
                        OK, I've been trying several different approaches with two different data files.

                        With the first file there's a strange effect: for every line, there's an additional newline. I opened the file in a hex editor to make sure there's no garbage in it, and I can confirm that the newline characters are just the \r\n typical of a Windows text file. To compensate, I doubled my number of header records, which made the procedure happier. However, it still failed on the footer record.
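                        A hex editor works, but as a quick sketch, a few lines of Python can tally the line-ending sequences in a file, which makes a stray extra carriage return obvious. The function name line_ending_counts is hypothetical; it reads the raw bytes so no newline translation hides anything.

```python
from collections import Counter


def line_ending_counts(data: bytes) -> Counter:
    """Count CR CR LF, CR LF, bare LF and bare CR sequences in raw bytes."""
    counts = Counter()
    i = 0
    while i < len(data):
        if data.startswith(b"\r\r\n", i):    # 0D 0D 0A
            counts["CRCRLF"] += 1
            i += 3
        elif data.startswith(b"\r\n", i):    # 0D 0A
            counts["CRLF"] += 1
            i += 2
        elif data[i:i + 1] == b"\n":         # 0A
            counts["LF"] += 1
            i += 1
        elif data[i:i + 1] == b"\r":         # 0D
            counts["CR"] += 1
            i += 1
        else:
            i += 1
    return counts
```

For example, `line_ending_counts(open(path, "rb").read())` on a clean Windows file should report only CRLF entries.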

                        Here's the error I'm getting:
                        [Line=10213, Col=18] Expected "\t" for the data starting at the specified position, while trying to read the data for "element with name Field3", using "style" as "terminated" and "terminatedBy" as "\t", but not found.
                        Ensure that "\t", exists for the data starting at the specified position.

                        That line begins with "LINE COUNT:". I'm not sure what to make of that. It's looking for another data record, but it has clearly reached the array terminator of LINE COUNT:. I've tried using just LINE as the terminator, but it made no difference.


                        The second file is failing just after the first data record is read. First, this file doesn't have the doubled-newline problem, so with the header count doubled to compensate for the first file I'm actually skipping data records. I specified 10 header records, so line 11 is the first parsed record, and as the error indicates, the failure is on line 12, column 1.

                        Here's the error:
                        [Line=12, Col=1] Expected either "${eol}" or "LINE COUNT:" at the specified position in the native data, while trying to read the data for "element with name DataRecord", using "style" as "array", "cellSeparatedBy" as "${eol}", and "arrayTerminatedBy" as "LINE COUNT:", but not found.
                        Ensure that either "${eol}" or "LINE COUNT:", exists at the specified position in the native data.



                        Finally, here's a paste of the schema I'm using:

                        <?xml version="1.0" encoding="UTF-8" ?>

                        <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                                    xmlns:nxsd="http://xmlns.oracle.com/pcbpel/nxsd"
                                    targetNamespace="http://schemas.domain.dom/File_Read"
                                    xmlns:tns="http://schemas.domain.dom/File_Read"
                                    elementFormDefault="qualified"
                                    attributeFormDefault="unqualified"
                                    nxsd:encoding="ASCII"
                                    nxsd:hasHeader="true"
                                    nxsd:headerLines="10"
                                    nxsd:headerLinesTerminatedBy="${eol}"
                                    nxsd:stream="chars"
                                    nxsd:version="NXSD">
                          <xsd:element name="FileFeed">
                            <xsd:complexType>
                              <xsd:sequence>
                                <xsd:element name="DataRecord" minOccurs="1" maxOccurs="unbounded" nxsd:style="array" nxsd:cellSeparatedBy="${eol}" nxsd:arrayTerminatedBy="LINE COUNT:">
                                  <xsd:complexType>
                                    <xsd:sequence>
                                      <!-- Skip details of elements terminated by \t except last one by ${eol} -->
                                    </xsd:sequence>
                                  </xsd:complexType>
                                </xsd:element>
                                <xsd:element name="Footer" minOccurs="0" maxOccurs="1" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy="${eol}">
                                </xsd:element>
                              </xsd:sequence>
                            </xsd:complexType>
                          </xsd:element>
                        </xsd:schema>
                        • 9. Re: File Adapter skipping malformed records
                          Srimant-Oracle
                          Try this:

                          <?xml version="1.0" encoding="UTF-8" ?>

                          <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                          xmlns:nxsd="http://xmlns.oracle.com/pcbpel/nxsd"
                          targetNamespace="http://schemas.domain.dom/File_Read"
                          xmlns:tns="http://schemas.domain.dom/File_Read"
                          elementFormDefault="qualified"
                          attributeFormDefault="unqualified"
                          nxsd:encoding="ASCII"
                          nxsd:hasHeader="true"
                          nxsd:headerLines="6"
                          nxsd:headerLinesTerminatedBy="${eol}"
                          nxsd:stream="chars"
                          nxsd:version="NXSD">

                          <xsd:element name="FileFeed">
                               <xsd:complexType>
                                    <xsd:sequence>
                                         <xsd:element name="DataRecord" minOccurs="1" maxOccurs="unbounded" nxsd:style="array" nxsd:arrayTerminatedBy="LINE COUNT:">
                                              <xsd:complexType>
                                                   <xsd:sequence>
                                                        <xsd:element name="Data" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy="${eol}"/>
                                                   </xsd:sequence>
                                              </xsd:complexType>
                                         </xsd:element>
                                         <xsd:element name="Footer" minOccurs="0" maxOccurs="1" type="xsd:string" nxsd:style="terminated" nxsd:terminatedBy="${eol}"/>
                               </xsd:sequence>
                          </xsd:complexType>
                          </xsd:element>
                          </xsd:schema>

                          The key difference is that I have removed nxsd:cellSeparatedBy from the DataRecord element (I also set nxsd:headerLines to 6 to match your sample file and used a single Data element in place of the skipped field details). If you specify cellSeparatedBy as ${eol} on DataRecord, then you should not also specify ${eol} in the nested element, because the translator has already consumed the ${eol} character.
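                          To make the "declared twice" problem concrete, here is a toy Python model (an illustration under simplifying assumptions, not the translator's actual code; parse_array is a hypothetical name). Each cell reads up to and consumes its own newline; if a newline cell separator is declared as well, the parser then expects a second newline (or the array terminator) before the next record and fails at the start of that record.

```python
from typing import List, Optional


def parse_array(text: str, terminator: str, cell_sep: Optional[str] = None) -> List[str]:
    """Toy model of an array whose cells each end with their own newline.

    Each cell is read up to its newline terminator, which is consumed. If a
    cell separator is also declared, the parser then expects that separator
    (or the array terminator) before the next cell.
    """
    cells: List[str] = []
    pos = 0
    while True:
        end = text.find("\n", pos)
        if end == -1:
            raise ValueError(f"unterminated cell at offset {pos}")
        cells.append(text[pos:end])
        pos = end + 1  # the cell's own terminator consumes the newline
        if text.startswith(terminator, pos):
            return cells
        if cell_sep is not None:
            if not text.startswith(cell_sep, pos):
                raise ValueError(
                    f"expected {cell_sep!r} or {terminator!r} at offset {pos}"
                )
            pos += len(cell_sep)
```

Called without cell_sep, the model returns every data line; called with cell_sep="\n", it fails at the first character of the second record, which lines up with the "[Line=12, Col=1] Expected either "${eol}" or "LINE COUNT:"" error reported earlier in the thread.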

                          Here's the output from the schema above:

                          <FileFeed xmlns="http://schemas.domain.dom/File_Read">
                          <DataRecord>
                          <Data>20070227     User1     -1.0     21.0     3.01     -14.0     985.0     -6.132345     723.0     Open</Data>
                          </DataRecord>
                          <DataRecord>
                          <Data>20070227     User2     2.0     22.0     3.02     34.0     325.0     6.285413     732.0     Closed</Data>
                          </DataRecord>
                          <DataRecord>
                          <Data>20070227     User3     3.0     -32.0     3.03     -44.0     755.0     -6.384765     723.0     Pending</Data>
                          </DataRecord>
                          <DataRecord>
                          <Data>20070227     User4     -4.0     42.0     3.04     64.0     375.0     6.493753     732.0     Working</Data>
                          </DataRecord>
                          <DataRecord>
                          <Data>20070227     User5     5.0     52.0     3.05     94.0     355.0     -6.592375     723.0     On Hold</Data>
                          </DataRecord>
                          <DataRecord>
                          <Data>20070227     User6     6.0     -62.0     3.06     -84.0     235.0     6.691743     732.0     Pending</Data>
                          </DataRecord>
                          <DataRecord>
                          <Data>20070227     User7     7.0     72.0     3.07     54.0     855.0     -6.537594     723.0     On Hold</Data>
                          </DataRecord>
                          <DataRecord>
                          <Data>20070227     User8     -8.0     82.0     3.08     14.0     655.0     6.982459     732.0     Working</Data>
                          </DataRecord>
                          <DataRecord>
                          <Data>20070227     User9     9.0     -92.0     3.09     -24.0     155.0     -6.723474     723.0     Closed</Data>
                          </DataRecord>
                          <DataRecord>
                          <Data>20070227     User10     10.0     102.0     3.10     54.0     415.0     6.842475     732.0     Open</Data>
                          </DataRecord>
                          <DataRecord>
                          <Data>20070227     User11     -11.0     -12.0     3.02     -74.0     125.0     -6.225678     723.0     Closed</Data>
                          </DataRecord>
                          <Footer>     18     </Footer>
                          </FileFeed>
                          • 10. Re: File Adapter skipping malformed records
                            561826
                            Thanks Srimant this is good news.

                            Removing cellSeparatedBy="${eol}" seems to have fixed the file that was failing after reading the first data line.

                            I am still getting the same error from the file that is doubling up on the newlines.

                            My suspicion is that the extraneous newlines are actually a totally different problem unrelated to the schema, so unless someone here knows why that might be happening, I'm thinking I should post it as a separate topic.

                            Does anyone have any ideas why I'm getting the extra newlines?
                            • 11. Re: File Adapter skipping malformed records
                              561826
                              Not sure how I missed it the first time around, but I found the double newline problem.

                              After becoming convinced that the double newline problem had to be a line-ending encoding problem (there just weren't that many other options), I looked at the problem file again and saw that each line ending was actually 0D0D0A instead of just 0D0A. The extra carriage return was causing the problem, and after stripping it out the file processed just fine.
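                              The thread doesn't say how the extra carriage return was stripped. As one possible sketch, assuming a hypothetical pre-processing step that rewrites the file before the File Adapter polls for it, Python can collapse 0D0D0A into 0D0A on the raw bytes (strip_extra_cr and fix_file are illustrative names):

```python
from pathlib import Path


def strip_extra_cr(data: bytes) -> bytes:
    """Collapse CR CR LF (0D 0D 0A) line endings into CR LF (0D 0A)."""
    return data.replace(b"\r\r\n", b"\r\n")


def fix_file(src: Path, dst: Path) -> None:
    """Write a copy of src to dst with the doubled carriage returns removed."""
    # Work on bytes so Python's universal-newline handling doesn't interfere.
    dst.write_bytes(strip_extra_cr(src.read_bytes()))
```

Writing to a separate destination (rather than in place) avoids the adapter picking up a half-written file, assuming the destination is the directory the adapter polls.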

                              Thank you both, Rahul and Srimant, for helping me out.