5 Replies Latest reply: May 25, 2012 1:23 PM by 938348

Error in Add/Replace Bulk Load component - illegal character in XML

938348 Newbie
Has anyone ever seen the bulk load component complain about an illegal character in XML? I see this error and am not sure what exactly the problem is:

ERROR [SocketReader] - Received error message from server: Character is not legal in XML 1.0

It's a very simple graph: it reads data from a Clover data file and ingests it straight into Endeca using the out-of-the-box bulk load component.

Thanks for your help!

Edited by: 935345 on May 18, 2012 11:48 AM
  • 1. Re: Error in Add/Replace Bulk Load component - illegal character in XML
    Frank Explorer
    The MDEX Engine requires all data to consist of valid XML 1.0 characters, so the Bulk Load Interface should not allow non-XML characters to be ingested. In 2.2, it was a known issue that such characters could be ingested, which would cause errors at query time in the MDEX. This was fixed in 2.3, which now rejects such data with the error that you saw.
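    For reference, the XML 1.0 specification defines a legal character (the Char production) as tab, line feed, carriage return, or any code point in U+0020 to U+D7FF, U+E000 to U+FFFD, or U+10000 to U+10FFFF. The following is a minimal standalone Java sketch of that rule, useful for checking sample values outside the graph. It is not MDEX or Bulk Load Interface code, and the class and method names are hypothetical.

```java
final class XmlCharCheck {
    // True if the code point is allowed by the XML 1.0 "Char" production.
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index (in UTF-16 units) of the first illegal character,
    // or -1 if the whole string is legal XML 1.0 text.
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegalXml10(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstIllegalIndex("clean text"));           // -1
        System.out.println(firstIllegalIndex("bad" + (char) 1 + "byte")); // 3
    }
}
```

    Iterating by code point rather than by char keeps valid surrogate pairs intact while still flagging unpaired surrogates.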
  • 2. Re: Error in Add/Replace Bulk Load component - illegal character in XML
    938430 Newbie
    To remove invalid characters from the input stream, you can use the following transformation.
    //#CTL2
    
    // Transforms input record into output record.
    function integer transform() {
       string regex = "([^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]|[\\u0092\\u007F]+)";
       $0.YourData = replace($0.YourData,regex,"");
    
       return ALL;
    }
    Simply place it in a Reformat component and you are good to go.
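    To see what this expression removes before wiring it into a graph, here is a small Java sketch. It assumes CTL2's regex functions follow Java regular-expression syntax, which the pattern above appears to rely on; the class name XmlSanitizer and the method sanitize are hypothetical.

```java
import java.util.regex.Pattern;

final class XmlSanitizer {
    // Same expression as in the CTL2 snippet: anything outside tab, LF, CR,
    // U+0020..U+D7FF and U+E000..U+FFFD, plus runs of U+0092 and U+007F.
    static final Pattern ILLEGAL = Pattern.compile(
        "([^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]|[\\u0092\\u007F]+)");

    // Returns the value with every match removed; null stays null.
    static String sanitize(String value) {
        return value == null ? null : ILLEGAL.matcher(value).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("ab" + (char) 1 + "cd"));     // abcd
        System.out.println(sanitize("del" + (char) 0x7F + "ete")); // delete
    }
}
```

    Note that the pattern is stricter than XML 1.0 itself: U+007F and U+0092 are legal (though discouraged) in XML 1.0, and in Java the negated class also appears to match characters above U+FFFF, which XML 1.0 allows. If your data contains such characters and you want to keep them, the expression would need adjusting.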
  • 3. Re: Error in Add/Replace Bulk Load component - illegal character in XML
    938348 Newbie
    Thank you for providing this workaround. The issue is that I'm not sure exactly which field has the illegal character, and the logging is not detailed enough to indicate that either. Is there any way to find out, so that I can apply this transform to a specific field?
  • 4. Re: Error in Add/Replace Bulk Load component - illegal character in XML
    Alex F Newbie
    Assuming you are on EID 2.3, this transformation will apply the fix to all your string fields and print on your console the fields that had non-compliant XML 1.0 characters.
    //#CTL2
    
    string[] fields;
    
    // Transforms input record into output record.
    function integer transform() {
         $out.0.* = $in.0.*;
    
         for(integer i = $in.0.length() - 1; i >=0 ; i--) {
              if (getFieldType($in.0.*, i) == "string" && getFieldType($out.0.*, i) == "string") {
                   if (!isNull($in.0.*, i)) {
                        string originalValue = getStringValue($in.0.*, i);
                        string newValue = originalValue.replace("([^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]|[\\u0092\\u007F]+)","");
                        if (originalValue != newValue) {
                             fields[i] = getFieldName($in.0, i);
                        }
                        setStringValue($out.0.*, i, newValue);
                   }
              }
         }
    
         return OK;
    }
    
    // Called during component initialization.
    // function boolean init() {}
    
    // Called during each graph run before the transform is executed. May be used to allocate and initialize resources
    // required by the transform. All resources allocated within this method should be released
    // by the postExecute() method.
    // function void preExecute() {}
    
    // Called only if transform() throws an exception.
    // function integer transformOnError(string errorMessage, string stackTrace) {}
    
    // Called during each graph run after the entire transform was executed. Should be used to free any resources
    // allocated within the preExecute() method.
    function void postExecute() {
         printErr("Fields with non-compliant XML 1.0 characters");
         for(integer i = 0; i < fields.length(); i++) {
              if (fields[i] != null) {
                   printErr(fields[i]);
              }
         }
    }

    // Called to return a user-defined error message when an error occurs.
    // function string getMessage() {}
    -- Alex
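    The same per-field idea can be tried outside the graph on a few sample records. This is a rough Java sketch, not CloverETL code: a record is modeled as a hypothetical Map of field name to string value, and the names DirtyFieldReport and cleanRecord are made up for illustration. It assumes the regular expression from the transformation above behaves the same way in Java.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

final class DirtyFieldReport {
    // Expression taken from the transformation in this thread.
    static final Pattern ILLEGAL = Pattern.compile(
        "([^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]|[\\u0092\\u007F]+)");

    // Cleans every non-null value in place and returns the names of the
    // fields whose value changed, in record order.
    static Set<String> cleanRecord(Map<String, String> record) {
        Set<String> dirty = new LinkedHashSet<>();
        for (Map.Entry<String, String> e : record.entrySet()) {
            String original = e.getValue();
            if (original == null) {
                continue; // nothing to clean in a null field
            }
            String cleaned = ILLEGAL.matcher(original).replaceAll("");
            if (!cleaned.equals(original)) {
                dirty.add(e.getKey());
                e.setValue(cleaned);
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("name", "ok");
        record.put("desc", "x" + (char) 1 + "y");
        System.err.println("Fields with non-compliant XML 1.0 characters: "
            + cleanRecord(record));
    }
}
```

    Collecting names in a set means each offending field is reported once, however many records contain it.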
