Short answer: too many forward-slashes, not enough backslashes.
PHP takes Perl-style regex literals, which use '/' as their (default) delimiter, and reproduces them inside PHP string literals. It also retains the Perl-style modifiers, meaning the 'i', 's', 'm', etc., following the closing delimiter. In Java, you drop the regex-specific delimiters and replace the modifiers with symbolic constants like Pattern.CASE_INSENSITIVE, which are passed as the second argument to the Pattern.compile() factory method. They can also be included in the regex itself in the form of inline modifiers, like (?i).
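To make the two options concrete, here is a minimal sketch. The class name ModifierDemo, the pattern "hello" and the sample input are made up for illustration; only Pattern.compile(), Pattern.CASE_INSENSITIVE and the (?i) inline modifier come from the discussion above.

```java
import java.util.regex.Pattern;

public class ModifierDemo {
    // Roughly what PHP's '/hello/i' becomes when the modifier is passed as a flag.
    static boolean withFlag(String input) {
        return Pattern.compile("hello", Pattern.CASE_INSENSITIVE)
                      .matcher(input)
                      .find();
    }

    // The same modifier written inline at the start of the regex itself.
    static boolean withInline(String input) {
        return Pattern.compile("(?i)hello")
                      .matcher(input)
                      .find();
    }

    public static void main(String[] args) {
        // Both forms should agree on any input.
        System.out.println(withFlag("Say HELLO"));
        System.out.println(withInline("Say HELLO"));
    }
}
```

The inline form is handy when the regex is stored in a plain String (as in a config file), since the modifier travels with the pattern.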
The other big difference is that PHP is much more lenient than Java when it comes to backslashes in string literals. In a Java string literal, any backslash that isn't part of a recognized escape sequence like \t, \\ or \" is flagged as an error. That means, in order to use a regex escape sequence like \w, you have to escape the backslash to get it through the string literal.
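A small sketch of the backslash doubling, assuming a made-up PHP pattern '/\w+\s\d/' as the starting point (the class name BackslashDemo and the sample inputs are illustrative only):

```java
import java.util.regex.Pattern;

public class BackslashDemo {
    // Each regex backslash is written twice in the Java source, so the
    // regex engine receives \w+\s\d after the string literal is processed.
    static final String REGEX = "\\w+\\s\\d";

    // Writing the literal with single backslashes would not compile:
    // javac rejects it as an illegal escape character.

    static boolean matchesWhole(String input) {
        return Pattern.compile(REGEX).matcher(input).matches();
    }

    public static void main(String[] args) {
        // The string seen by the regex engine has single backslashes.
        System.out.println(REGEX);
        System.out.println(matchesWhole("abc 1"));
    }
}
```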
So, for the most part, converting PHP regexes to Java means dropping the forward slashes (or whatever other regex delimiter you were using) and doubling all the backslashes. The exception is the double-quote character; it still has to be escaped, and you only use one backslash. Here's a first cut at translating your regexes.
Acknowledgements to Jeffrey Friedl, with special thanks for adding a PHP chapter to the third edition of The Book. ^_^
String urlRegex = "a\\s+[^>]*?class=l\\s+[^>]*?href\\s?=[\\s'\"]+(.*?)['\"]+.*?>[^<]*</a>";
String mp3Regex = "(?i)a\\s+[^>]*?href\\s?=[\\s'\"]+(.*?(mp3))['\"]+.*?>[^<]*</a>";
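If it helps, this is roughly how the two patterns above could be applied with Pattern and Matcher. The class name, the helper firstGroup() and the HTML snippets are invented for the example, not taken from your code, so treat it as a sketch rather than a drop-in replacement.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexUsageDemo {
    static final String URL_REGEX =
        "a\\s+[^>]*?class=l\\s+[^>]*?href\\s?=[\\s'\"]+(.*?)['\"]+.*?>[^<]*</a>";
    static final String MP3_REGEX =
        "(?i)a\\s+[^>]*?href\\s?=[\\s'\"]+(.*?(mp3))['\"]+.*?>[^<]*</a>";

    // Hypothetical helper: returns capture group 1 of the first match, or null.
    static String firstGroup(String regex, String html) {
        Matcher m = Pattern.compile(regex).matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up sample markup, only to exercise the patterns.
        String link = "<a class=l href=\"http://example.com/\">Example</a>";
        String song = "<A HREF=\"song.mp3\">Listen</A>";

        System.out.println(firstGroup(URL_REGEX, link)); // prints http://example.com/
        System.out.println(firstGroup(MP3_REGEX, song)); // prints song.mp3
    }
}
```

Note that find() is used rather than matches(), since the patterns are meant to locate a link somewhere inside a larger page.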