5 Replies. Latest reply: May 25, 2012 3:23 PM by 938348

    Error in Add/Replace Bulk Load component - illegal character in XML

    938348
      Has anyone ever seen the bulk load component complain about an illegal character in XML? I see this error and am not sure what exactly the problem is:

      ERROR [SocketReader] - Received error message from server: Character is not legal in XML 1.0

      It's a very simple graph: it reads data from a Clover data file and ingests it straight into Endeca using the out-of-the-box bulk load component.

      Thanks for your help!

      Edited by: 935345 on May 18, 2012 11:48 AM
        • 1. Re: Error in Add/Replace Bulk Load component - illegal character in XML
          Frank
          The MDEX Engine requires all data to be valid XML characters, so the Bulk Load Interface should not allow those non-XML characters to be ingested. In 2.2, it was a known issue that such non-XML characters were allowed to be ingested, which would cause errors at query time in the MDEX. This was fixed in 2.3, with the error that you saw.
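For reference, the XML 1.0 specification limits legal characters to U+0009, U+000A, U+000D, U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF. The following is a minimal standalone Java sketch of that rule, useful for checking a suspicious value outside of the graph. The class and method names are hypothetical and are not part of any Endeca or Clover API.

```java
// Hypothetical helper for illustration; not part of Endeca or CloverETL.
final class XmlCharCheck {

    // True if the code point falls in one of the ranges XML 1.0 allows.
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the char index of the first illegal code point, or -1 if none.
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegalXml10(cp)) {
                return i;
            }
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return -1;
    }

    public static void main(String[] args) {
        // 0x0B (vertical tab) is a control character that XML 1.0 rejects.
        String dirty = "bad" + (char) 0x0B + "value";
        System.out.println(firstIllegalIndex("plain text"));
        System.out.println(firstIllegalIndex(dirty));
    }
}
```

Most offenders in practice are control characters below U+0020 other than tab, line feed, and carriage return.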
          • 2. Re: Error in Add/Replace Bulk Load component - illegal character in XML
            938430
            To remove invalid characters from the input stream, you can use the following transformation.
            //#CTL2
            
            // Transforms input record into output record.
            function integer transform() {
               string regex = "([^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]|[\\u0092\\u007F]+)";
               // Replace YourData with the name of the string field to clean.
               $out.0.YourData = replace($in.0.YourData, regex, "");
            
               return ALL;
            }
            Simply place it in a Reformat component and you are good to go.
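To see what this pattern removes before wiring it into a graph, it can be tried in plain Java. This is only a sketch: it assumes that CTL2's replace() follows java.util.regex semantics, and the class name and sample string are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical standalone check of the pattern from the transformation above.
final class StripDemo {

    // Same pattern text as in the CTL2 snippet.
    static final Pattern ILLEGAL = Pattern.compile(
        "([^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]|[\\u0092\\u007F]+)");

    // Removes every match of the pattern from the input.
    static String strip(String s) {
        return ILLEGAL.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        // Sample value with a vertical tab (0x0B) and a DEL (0x7F) in it.
        String dirty = "Price" + (char) 0x0B + " list" + (char) 0x7F;
        System.out.println(strip(dirty));
    }
}
```

Note that the pattern is slightly stricter than XML 1.0 itself: it also drops U+007F and U+0092, which XML 1.0 permits but discourages, and it drops characters above U+FFFF because that range is not listed in the character class.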
            • 3. Re: Error in Add/Replace Bulk Load component - illegal character in XML
              938348
              Thank you for providing this workaround. The issue is that I'm not sure exactly which field has this illegal character, and the logging is not detailed enough to indicate that either. Is there any way to find out so that I can apply this transform to a specific field?
              • 4. Re: Error in Add/Replace Bulk Load component - illegal character in XML
                Alex F
                Assuming you are on EID 2.3, this transformation will apply the fix to all your string fields and print on your console the fields that had non-compliant XML 1.0 characters.
                //#CTL2
                
                string[] fields;
                
                // Transforms input record into output record.
                function integer transform() {
                     $out.0.* = $in.0.*;
                
                     for(integer i = $in.0.length() - 1; i >=0 ; i--) {
                          if (getFieldType($in.0.*, i) == "string" && getFieldType($out.0.*, i) == "string") {
                               if (!isNull($in.0.*, i)) {
                                    string originalValue = getStringValue($in.0.*, i);
                                    string newValue = originalValue.replace("([^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]|[\\u0092\\u007F]+)","");
                                    if (originalValue != newValue) {
                                         fields[i] = getFieldName($in.0, i);
                                    }
                                    setStringValue($out.0.*, i, newValue);
                               }
                          }
                     }
                
                     return OK;
                }
                
                // Called during component initialization.
                // function boolean init() {}
                
                // Called during each graph run before the transform is executed. May be used to allocate and initialize resources
                // required by the transform. All resources allocated within this method should be released
                // by the postExecute() method.
                // function void preExecute() {}
                
                // Called only if transform() throws an exception.
                // function integer transformOnError(string errorMessage, string stackTrace) {}
                
                // Called during each graph run after the entire transform was executed. Should be used to free any resources
                // allocated within the preExecute() method.
                function void postExecute() {
                     printErr("Fields with non-compliant XML 1.0 characters");
                     for(integer i = 0; i < fields.length(); i++) {
                          if (fields[i] != null) {
                               // Print only this field's name, not the whole list.
                               printErr(fields[i]);
                          }
                     }
                }

                // Called to return a user-defined error message when an error occurs.
                // function string getMessage() {}
                -- Alex