This discussion is archived
19 Replies Latest reply: Jul 30, 2010 7:29 PM by EJP

Detect special character from a String

800332 Newbie
Hello guys,

Some of the files on a Unix box contain special characters in their names. I need to replace those special characters in the filename with underscores. However, I do not want to replace special characters like _ - ^ . @

Pattern escaper = Pattern.compile("([^a-zA-z0-9_-^.@])");

I can't get the caret to be kept. I'd appreciate any advice. Thanks in advance!

Cheers,
Mark
  • 1. Re: Detect special character from a String
    807580 Newbie
    I don't see how that regex can do anything sensible. You need to get a better understanding of regex before going any further. In particular you need to understand character classes. See http://www.regular-expressions.info/.
  • 2. Re: Detect special character from a String
    800332 Newbie
    My function

         
    public static String removeSpecialChar(String str) throws IOException {
        Pattern pattern = Pattern.compile("([^a-zA-z0-9\\/:@._-])");
        Matcher matcher = pattern.matcher(str);
        if (!matcher.matches()) {
            String str1 = matcher.replaceAll("_");
            return str1;
        } else {
            System.out.println("no match");
            return null;
        }
    }
    When I run it, I get the following result:
    C:\workspace\TestOnly\bin>java -cp . com.test.TestClass C:\workspace\TestOnly\1^2.txt
    str1 = C:\workspace\TestOnly\12.txt
    I'd appreciate any advice. Thanks in advance!

    Cheers,
    Mark
  • 3. Re: Detect special character from a String
    807580 Newbie
    You don't need to test for a match. In fact your logic is screwy but might just happen to work. All you need to do is the replaceAll() on the matcher without the matcher.matches() test.

    If you are going to compile the pattern each time then you can just use
    return str.replaceAll("[^a-zA-z0-9\\/:@._-]","_");
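A minimal sketch of the simplification suggested in this reply, assuming the goal is simply "replace every disallowed character with an underscore". The class name `FilenameSanitizer` and the method name `sanitize` are hypothetical; the character class is copied verbatim from the reply. It also shows why the `matches()` test adds nothing: `matches()` asks whether the entire input is a single disallowed character, which is false for any ordinary file name.

```java
import java.util.regex.Pattern;

final class FilenameSanitizer {
    // Character class copied verbatim from the reply above (including its A-z range).
    // Compiled once and reused, instead of being recompiled on every call.
    private static final Pattern DISALLOWED = Pattern.compile("[^a-zA-z0-9\\/:@._-]");

    // Hypothetical helper: replaces every character outside the allowed set with '_'.
    static String sanitize(String name) {
        return DISALLOWED.matcher(name).replaceAll("_");
    }

    public static void main(String[] args) {
        // matches() tests the WHOLE input against a one-character pattern,
        // so it is false for a multi-character name.
        System.out.println(DISALLOWED.matcher("my file.txt").matches()); // false
        System.out.println(sanitize("my file.txt"));                     // my_file.txt
    }
}
```

If the pattern is only needed once, the one-line `str.replaceAll(...)` form from the reply does the same job.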
  • 4. Re: Detect special character from a String
    EJP Guru
    The period, hyphen, and ^ are all metacharacters so if you want to include them in the character set you have to escape them:
    "[^a-zA-z0-9\\/:@\._\-\^]"
    In the case of the hyphen you can move it to the front instead:
    "[^-a-zA-z0-9\\/:@\._\^]"
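As a rough illustration of the two options in this reply (escape the hyphen and caret, or move the hyphen to the front), here is a sketch using a deliberately reduced character class so the effect is easy to see. The class `EscapedClassDemo` and the method `clean` are hypothetical names. Note that in Java source each regex-level backslash has to be written as `\\`.

```java
final class EscapedClassDemo {
    // Reduced class for illustration only: lowercase letters plus a literal '-' and '^'.
    // The regex is [^a-z\-\^]; each backslash is doubled for the Java compiler.
    static final String ESCAPED = "[^a-z\\-\\^]";

    // Same set, with the hyphen placed first (right after the negating '^') instead of escaped.
    static final String HYPHEN_FIRST = "[^-a-z\\^]";

    // Hypothetical helper: replaces everything outside the class with '_'.
    static String clean(String input, String regex) {
        return input.replaceAll(regex, "_");
    }

    public static void main(String[] args) {
        System.out.println(clean("a-b^c d", ESCAPED));      // a-b^c_d
        System.out.println(clean("a-b^c d", HYPHEN_FIRST)); // a-b^c_d
    }
}
```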
  • 5. Re: Detect special character from a String
    800332 Newbie
    Hi Sabre,

    My rename function is as follows:
    public static void renameFolder(String from, String to) throws IOException {
        // File (or directory) with old name
        File file1 = new File(from);
        // File (or directory) with new name
        File file2 = new File(to);
        // Rename file (or directory)
        boolean success = file1.renameTo(file2);
        if (!success) { // File was not successfully renamed
            System.out.println("failed to rename : " + file1);
        }
    }

    public static void main(String[] args) throws IOException {
        renameFolder(args[0], args[0].replaceAll("[^a-zA-z0-9\\/:@._-]","_"));
    }
    When I run it, I get the error below:
    C:\workspace\TestOnly\bin>java -cp . com.test.TestClass C:\workspace\TestOnly\1^2.txt
    failed to rename : C:\workspace\TestOnly\12.txt
    I need the caret not to be replaced. Appreciate your further advice please. Thanks in advance!

    Cheers,
    Mark Thien
  • 6. Re: Detect special character from a String
    807580 Newbie
    kmthien wrote:
    I need the caret not to be replaced.
    Then include it in the set of characters! Note: a '^' at the start of the set is a metacharacter and means 'not', so you will need to follow ejp's advice in reply #4.
  • 7. Re: Detect special character from a String
    YoungWinston Expert
    kmthien wrote:
    When I run it, I get the error below:
    C:\workspace\TestOnly\bin>java -cp . com.test.TestClass C:\workspace\TestOnly\1^2.txt
    failed to rename : C:\workspace\TestOnly\12.txt
    Well this looks to me like a message produced by your program, possibly due to the fact that it can't do the rename rather than any problem with your replacement.
    I need the caret not to be replaced. Appreciate your further advice please. Thanks in advance!
    From the look of the message it was. However:
    1. Why are you replacing all these characters with "_"? Unless this is yet another Java regex metacharacter I'm not aware of, seems to me you'll get some weird results.
    2. What is '\V'? I can't see any reference to it in the Pattern docs.
    3. To the best of my knowledge (and again I've been caught out with Java peculiarities before, so I may be wrong) '^' is only a character class metacharacter if it is the first character after the '['. Similarly, '-' is only a metacharacter if surrounded by other characters, so it can go either at the start or the end of the '[....]' (here you're using '^' to mean "not", so it'll have to go at the end).

    Winston
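One way to separate "the replacement went wrong" from "the rename went wrong" is to print the argument exactly as the program received it, next to the computed target name. In the code posted earlier, the failure message prints `file1`, which is the source path, so the path in that message is what the program was given rather than the result of the replacement. The sketch below is only a diagnostic aid; the class `RenameDiagnostics` and the method `describe` are hypothetical names, and the regex is the one from the post being discussed.

```java
import java.io.File;

final class RenameDiagnostics {
    // Hypothetical helper: reports the received path, the computed target,
    // and whether the source exists. It does not rename anything.
    static String describe(String from, String regex) {
        String to = from.replaceAll(regex, "_");
        return "received=[" + from + "] target=[" + to + "] sourceExists="
                + new File(from).exists();
    }

    public static void main(String[] args) {
        // Regex copied from the post under discussion.
        String regex = "[^a-zA-z0-9\\/:@._-]";
        for (String arg : args) {
            System.out.println(describe(arg, regex));
        }
    }
}
```

If the `received` value already lacks the caret, the character was lost before the regex ever ran, and the place to look is how the argument reaches the program.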
  • 8. Re: Detect special character from a String
    807580 Newbie
    YoungWinston wrote:
    3. To the best of my knowledge (and again I've been caught out with Java peculiarities before, so I may be wrong) '^' is only a character class metacharacter if it is the first character after the '['.
    Correct.
  • 9. Re: Detect special character from a String
    800332 Newbie
    My objective is to replace every character with an underscore except 0-9 a-z A-Z @ ^ _ . - \ / :

    I really don't know what I am doing wrong here:
    replaceAll("[^a-zA-z0-9\\/:@._-^]","_") 
    When I put the caret at the end, as you can see above, I get the exception below:
    Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal character range near index 18
    [^a-zA-z0-9\/:@._-^]
                      ^
            at java.util.regex.Pattern.error(Unknown Source)
            at java.util.regex.Pattern.range(Unknown Source)
            at java.util.regex.Pattern.clazz(Unknown Source)
            at java.util.regex.Pattern.sequence(Unknown Source)
            at java.util.regex.Pattern.expr(Unknown Source)
            at java.util.regex.Pattern.compile(Unknown Source)
            at java.util.regex.Pattern.<init>(Unknown Source)
            at java.util.regex.Pattern.compile(Unknown Source)
            at java.lang.String.replaceAll(Unknown Source)
            at com.test.TestClass.main(TestClass.java:39)
  • 10. Re: Detect special character from a String
    807580 Newbie
    In a character class, '-' is a metacharacter that specifies a range. To include a literal '-' in the class, either escape it or place it at the beginning or the end. So make the '-' the last character, not the '^'.
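To make the point concrete: in `_-^` the hyphen sits between two characters, so it is read as a range from `_` (0x5F) down to `^` (0x5E), which is backwards and is rejected. Swapping the last two characters leaves the hyphen at the end, where it is literal. The sketch below uses a reduced character class for illustration; `HyphenPlacementDemo` and `compiles` are hypothetical names.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class HyphenPlacementDemo {
    // Hypothetical helper: reports whether a regex compiles.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hyphen between '_' and '^' forms a reversed range: rejected.
        System.out.println(compiles("[^a-z_-^]")); // false
        // Hyphen last: literal hyphen, and the non-leading '^' is a literal caret.
        System.out.println(compiles("[^a-z_^-]")); // true
        System.out.println("a-b^c_d e".replaceAll("[^a-z_^-]", "_")); // a-b^c_d_e
    }
}
```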
  • 11. Re: Detect special character from a String
    EJP Guru
    What you are doing wrong here is ignoring my advice. I even gave you the correct regex, apart from omitting the double \ the compiler needs. A rare event. Don't squander the opportunity.
  • 12. Re: Detect special character from a String
    807580 Newbie
    YoungWinston wrote:
    2. What is '\V'? I can't see any reference to it in the Pattern docs.
    I think what you're looking at is a pair of backslashes followed by a forward slash. In fact, it should be four backslashes, not two. The two backslashes become one when the Java code is compiled, and that backslash escapes the forward slash, which doesn't need it. Four backslashes will become two, which will be treated as an escaped backslash.
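The two-versus-four backslash distinction can be checked directly. The sketch below uses a reduced character class (`a-z` only) so that the backslash is not accidentally covered by some other range; `BackslashLevelsDemo` is a hypothetical name.

```java
final class BackslashLevelsDemo {
    // Java literal "\\/" is the regex \/ : a needlessly escaped forward slash.
    // A backslash is NOT part of this class.
    static final String TWO = "[^a-z\\/]";

    // Java literal "\\\\/" is the regex \\/ : a literal backslash, then a forward slash.
    static final String FOUR = "[^a-z\\\\/]";

    public static void main(String[] args) {
        String path = "a\\b/c"; // the five characters a\b/c
        System.out.println(path.replaceAll(TWO, "_"));  // a_b/c
        System.out.println(path.replaceAll(FOUR, "_")); // a\b/c
    }
}
```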

    There's also a typo in the original regex, which has been faithfully reproduced in every reply. The second range, A-z, matches all of the letters, uppercase and lowercase, plus several punctuation characters whose code points happen to lie between the two letter ranges. Here's how I believe the regex should look:
    "[^a-zA-Z0-9\\\\/:@._^-]"
    The caret and the hyphen are "escaped" by their positions, the backslash is correctly double-escaped, and none of the other characters need escaping in a character class (not even the dot).
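A short sketch of the `A-z` point, plus the corrected regex from this reply applied to a sample path. In ASCII, six characters sit between `Z` and `a`: `[`, `\`, `]`, `^`, `_` and the backtick, and the range `A-z` matches all of them. The class `RangeTypoDemo`, the method `sanitize`, and the sample path are hypothetical.

```java
final class RangeTypoDemo {
    // Regex exactly as proposed in the reply above.
    static final String FINAL_REGEX = "[^a-zA-Z0-9\\\\/:@._^-]";

    // Hypothetical helper: replaces every disallowed character with '_'.
    static String sanitize(String s) {
        return s.replaceAll(FINAL_REGEX, "_");
    }

    public static void main(String[] args) {
        // The six ASCII characters between 'Z' and 'a'.
        String between = "[\\]^_`";
        // A-z matches all six, so nothing is left (prints an empty line).
        System.out.println(between.replaceAll("[A-z]", ""));
        // A-Za-z matches none of them.
        System.out.println(between.replaceAll("[A-Za-z]", "")); // [\]^_`
        // Caret, backslash, colon and dot survive; space, '$', '[' and ']' are replaced.
        System.out.println(sanitize("C:\\dir\\1^2 $x[1].txt")); // C:\dir\1^2__x_1_.txt
    }
}
```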
  • 13. Re: Detect special character from a String
    YoungWinston Expert
    uncle_alice wrote:
    YoungWinston wrote:
    2. What is '\V'? I can't see any reference to it in the Pattern docs.
    I think what you're looking at is a pair of backslashes followed by a forward slash.
    Doh-h! Of course it is. :-)

    Progress may not be the best language ever created, but the designers did have the good sense to use '~' as its string escape character (i.e., the one recognized by its string processors). Saves so much grief when you're writing regexes.

    Winston
  • 14. Re: Detect special character from a String
    800332 Newbie
    Hi Alice,

    I tried your suggestion:
    public static void main(String[] args) throws IOException {
        rename(args[0], args[0].replaceAll("[^a-zA-Z0-9\\\\/:@._^-]","_"));
    }

    public static void rename(String from, String to) throws IOException {
        // File (or directory) with old name
        File file1 = new File(from);
        // File (or directory) with new name
        File file2 = new File(to);
        // Rename file (or directory)
        boolean success = file1.renameTo(file2);
        if (!success) { // File was not successfully renamed
            System.out.println("failed to rename : " + file1);
        }
    }
    
    C:\workspace\TestOnly\bin>java -cp . com.test.TestClass C:\Apps\mud\1~2.txt
    
    C:\workspace\TestOnly\bin>java -cp . com.test.TestClass C:\Apps\mud\1^2.txt
    failed to rename : C:\Apps\mud\12.txt
    
    C:\workspace\TestOnly\bin>java -cp . com.test.TestClass C:\Apps\mud\1@2.txt
    
    C:\workspace\TestOnly\bin>java -cp . com.test.TestClass C:\Apps\mud\1@2$-_3mn.txt
    
    C:\workspace\TestOnly\bin>java -cp . com.test.TestClass C:\Apps\mud\1@2$-_3mn.txt
    failed to rename : C:\Apps\mud\1@2$-_3mn.txt
    
    C:\workspace\TestOnly\bin>java -cp . com.test.TestClass C:\Apps\mud\1@2$-_3m'n.txt
    
    C:\workspace\TestOnly\bin>java -cp . com.test.TestClass C:\Apps\mud\1@2$-_3m'^n.txt
    failed to rename : C:\Apps\mud\1@2$-_3m'n.txt
    
    C:\workspace\TestOnly\bin>java -cp . com.test.TestClass C:\Apps\mud\1@2$-_3m'^n.txt
    However, whenever I put a caret in the filename, the rename always fails. I'd appreciate your further advice. Thanks in advance!

    Cheers,
    Mark
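`File.renameTo` only returns a boolean, so it cannot say why a rename failed. As a hedged alternative, assuming Java 7 or later is available (which postdates this thread), `java.nio.file.Files.move` throws an exception that names the cause. The class `RenameWithReason` and the method `tryRename` are hypothetical; the regex is the corrected one from reply 12.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

final class RenameWithReason {
    // Hypothetical helper: attempts the rename and reports the outcome as text.
    static String tryRename(Path from, Path to) {
        try {
            Files.move(from, to);
            return "renamed to " + to;
        } catch (NoSuchFileException e) {
            return "source does not exist: " + e.getFile();
        } catch (FileAlreadyExistsException e) {
            return "target already exists: " + e.getFile();
        } catch (IOException e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: RenameWithReason <path>");
            return;
        }
        // Corrected regex from reply 12.
        String target = args[0].replaceAll("[^a-zA-Z0-9\\\\/:@._^-]", "_");
        System.out.println(tryRename(Paths.get(args[0]), Paths.get(target)));
    }
}
```

A "source does not exist" result would indicate that the path the program received does not name an existing file, which is a different problem from the regex.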