This discussion is archived
5 Replies Latest reply: Sep 24, 2010 1:22 AM by Jörg

Replace all control characters

pedro.riky Newbie
Hi all, I have a file from OVMS and I want to replace all the control characters in it, because when I parse it I get this error:
An invalid XML character (Unicode: 0xc) was found in the element content of the document.


I tried

new String(blobStampe, Charset.forName("UTF-8").name())

and

new String(blobStampe, Charset.forName("UTF-8").name()).replaceAll("/xh}", "")

and

new String(blobStampe, Charset.forName("UTF-8").name()).replaceAll("\\xh}", "")

and

new String(blobStampe, Charset.forName("UTF-8").name()).replaceAll("\\{ctrl\\}", "");


but none of them works!
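For reference, none of those patterns matches a control character in Java's regex syntax: "/xh}" and "\\{ctrl\\}" are matched as literal text, and "\\xh}" is rejected by Pattern.compile with a PatternSyntaxException, because \x must be followed by two hex digits or by {...}. A minimal sketch of a working escape for the form feed (0xC) from the error message, assuming the bytes have already been decoded into a String; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexEscapeDemo {

    // Removes every form feed (U+000C) using a two-digit hex escape in the regex.
    static String stripFormFeeds(String s) {
        return s.replaceAll("\\x0C", "");
    }

    // Returns true if Pattern accepts the regex, false if it throws PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "page1\fpage2";                              // '\f' is the form feed, 0xC
        System.out.println(stripFormFeeds(text));                   // page1page2
        System.out.println(text.replaceAll("/xh}", "").length());   // 11: nothing removed
        System.out.println(compiles("\\xh}"));                      // false: 'h' is not a hex digit
        System.out.println(compiles("\\x0C"));                      // true
    }
}
```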
  • 1. Re: Replace all control characters
    843810 Newbie
    str = str.replaceAll("\\p{Cntrl}+", "");
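By default, Java's \p{Cntrl} is the POSIX class [\x00-\x1F\x7F], so this one-liner also deletes tab, line feed and carriage return, which XML allows and which usually carry the layout of a text file. That would be consistent with losing the text formatting. A minimal check of that behavior; the class and method names are hypothetical:

```java
public class CntrlDemo {

    // The one-liner from this reply: removes every run of POSIX control
    // characters, i.e. anything in [\x00-\x1F\x7F], including \t, \n and \r.
    static String stripAllControls(String s) {
        return s.replaceAll("\\p{Cntrl}+", "");
    }

    public static void main(String[] args) {
        String text = "line1\n\tline2\r\nline3\f";
        System.out.println(stripAllControls(text)); // line1line2line3
    }
}
```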
  • 2. Re: Replace all control characters
    pedro.riky Newbie
    Hi, thanks, it works, but after trying it I found it is not a good solution for me, because this way I lose all the text formatting.
    I get this error:
    (Unicode: 0x8) was found in the element content of the document

    but I don't understand which character 0x8 is.


    Any idea how to replace it?
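In Java, the '\b' escape is U+0008 (backspace) and '\f' is U+000C (form feed). To see which characters the parser is complaining about and where they sit, a small diagnostic like the following sketch may help. The class and method names are hypothetical, and it only reports C0 controls other than tab, LF and CR:

```java
import java.util.ArrayList;
import java.util.List;

public class ControlCharReport {

    // Lists each C0 control character (except tab, LF and CR) with its
    // index and hex code, e.g. "index 1: 0x8" for a backspace.
    static List<String> report(String s) {
        List<String> found = new ArrayList<String>();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                found.add("index " + i + ": 0x" + Integer.toHexString(c));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(report("a\bb\fc\n")); // [index 1: 0x8, index 3: 0xc]
    }
}
```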
  • 3. Re: Replace all control characters
    DrClap Expert
    That one is a backspace character. However, instead of trying to clean up the XML document (which is not well-formed because it contains illegal characters like that one), it's better to send it back to whoever created it and tell them about the problem, or to file a bug report against the software that produced it.

    If removing illegal characters causes the document to "lose all text format" (whatever that may mean in your application), then that suggests that somebody designed the XML format to use illegal characters to format text in some way. That person should be informed of the problem and asked to redesign the XML format to not use illegal characters.
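Fixing the producer, as suggested above, is the real fix. If a stopgap is needed in the meantime, one option is to drop only the characters that the XML 1.0 Char production forbids, rather than every control character: tab (0x9), line feed (0xA) and carriage return (0xD) survive, while backspace (0x8) and form feed (0xC) are removed. This is a sketch under that assumption (class and method names hypothetical); whatever meaning the producer attached to the removed characters is silently lost:

```java
public class XmlCharFilter {

    // True if the code point is allowed by the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Copies the input code point by code point, skipping those XML 1.0 forbids
    // (e.g. backspace 0x8 and form feed 0xC, plus unpaired surrogates).
    static String stripInvalidXmlChars(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "a\bb\fc\td\ne";
        System.out.println(stripInvalidXmlChars(text).equals("abc\td\ne")); // true
    }
}
```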
  • 4. Re: Replace all control characters
    pedro.riky Newbie
    I searched and searched but didn't find a definitive solution. I tried this:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    class UnicodeRewriter
    {
        private Pattern pattern;
        private Matcher matcher;

        /**
         * Constructs a rewriter using the given regular expression;
         * the syntax is the same as for 'Pattern.compile'.
         */
        public UnicodeRewriter(String regularExpression)
        {
            this.pattern = Pattern.compile(regularExpression);
        }

        // Default pattern: matches escape text (a backslash, 'u', then four hex digits),
        // not a control character that is already present in the string.
        public UnicodeRewriter()
        {
            this("\\\\u([0-9a-fA-F]{4})");
        }

        /**
         * Returns the input subsequence captured by the given group
         * during the previous match operation.
         */
        public String group(int i)
        {
            return matcher.group(i);
        }

        /**
         * Overridden to compute a replacement for each match. Use
         * the method 'group' to access the captured groups.
         */
        public String replacement()
        {
            // Converts the four captured hex digits into the corresponding char.
            return Character.toString((char) Integer.parseInt(group(1), 16));
        }

        /**
         * Returns the result of rewriting 'original' by invoking the method
         * 'replacement' for each match of the regular expression supplied
         * to the constructor.
         */
        public String rewrite(CharSequence original)
        {
            return rewrite(original, new StringBuffer(original.length())).toString();
        }

        /**
         * Returns the result of appending the rewritten 'original' to 'destination'.
         * We have to use StringBuffer rather than the more obvious and general
         * Appendable because of Matcher's interface (Sun bug 5066679).
         * Most users will prefer the single-argument rewrite, which supplies
         * a temporary StringBuffer itself.
         */
        public StringBuffer rewrite(CharSequence original, StringBuffer destination)
        {
            this.matcher = pattern.matcher(original);
            while (matcher.find())
            {
                // Copy the text before the match (with an empty replacement), then append
                // the computed replacement directly so '$' and '\' in it are not interpreted.
                matcher.appendReplacement(destination, "");
                destination.append(replacement());
            }
            matcher.appendTail(destination);
            return destination;
        }
    }
    but it doesn't work.

    My string is


    Any idea?
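Note that the default pattern in UnicodeRewriter, "\\\\u([0-9a-fA-F]{4})" as a Java string literal, matches the six-character escape text (backslash, u, four hex digits) and turns it into the real character. That is the opposite direction from what is needed here: it cannot find a raw backspace that is already in the string. A small check of that, reusing only the same regex (class and method names hypothetical):

```java
import java.util.regex.Pattern;

public class EscapePatternDemo {

    // Same regex as the UnicodeRewriter default constructor. It matches a
    // literal backslash followed by 'u' and four hex digits, not the character itself.
    static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static boolean containsEscapeText(String s) {
        return ESCAPE.matcher(s).find();
    }

    public static void main(String[] args) {
        String rawBackspace = "a\bb";        // contains the real 0x8 character
        String escapedText = "a\\u0008b";    // contains backslash, 'u', '0', '0', '0', '8'
        System.out.println(containsEscapeText(rawBackspace)); // false
        System.out.println(containsEscapeText(escapedText));  // true
    }
}
```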
  • 5. Re: Replace all control characters
    Jörg Explorer
    My string is
    If you check your messages before posting [Preview], you might speed up the assistance.
    Code in the form of an SSCCE has the same effect.
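Along the lines suggested here, an SSCCE for the original parse error could look like the following sketch. It assumes the JDK's built-in DOM parser (DocumentBuilderFactory) rather than whatever parser the original code used, and the class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

public class InvalidXmlCharSscce {

    // Parses a tiny document and returns false if the parser rejects it
    // as not well-formed (e.g. because of an illegal character).
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            System.out.println("Rejected: " + e.getMessage());
            return false;
        } catch (Exception e) {
            // Parser setup or I/O failure, not a well-formedness error.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<doc>page1\fpage2</doc>")); // false: 0xC is not allowed
        System.out.println(parses("<doc>page1\npage2</doc>")); // true
    }
}
```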