
Java Programming


Regex to remove unwanted HTML tags from an HTML page

807589, Sep 30 2008 (edited Oct 5 2008)
I'm looking for a regex that will remove ALL HTML tags except a few that I'd like to keep in a list such as (P|H1|LI|<rest of list>). For every tag NOT in the list, the regex should remove the whole < -tag stuff- >. Tags in the list can include blanks, for example < P> or <LI >, and the -tag stuff- can be any legal HTML, including blanks. What I came up with works for listed tags that do NOT contain spaces: <[^(P|H1|LI)].*?>. It correctly keeps <P>, but it fails to recognize < P> (with a space) as a listed tag, so it removes it.
I tried adding \\s* to skip optional leading spaces, but this didn't work.
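A note on why the attempt misbehaves: in <[^(P|H1|LI)].*?> the brackets form a character class, so [^(P|H1|LI)] matches ONE character that is not (, P, |, H, 1, L, I or ). It never compares whole tag names. That is why the blank in < P> slips through, and also why tags like <PRE>, <HR> or <IMG> would be kept by accident. A common alternative is a negative lookahead placed right after the '<', with the optional blanks and optional '/' inside the lookahead so backtracking cannot sneak around it. The sketch below assumes tag names in the list are plain alphanumerics (so they can be joined with "|" unescaped) and that no attribute value contains a literal '>'. The class and method names are hypothetical. A regex is not an HTML parser, so for arbitrary real-world pages an HTML parser is the safer tool.

```java
import java.util.List;
import java.util.regex.Pattern;

public class TagWhitelist {

    /**
     * Builds a pattern matching any tag whose name is NOT in the whitelist.
     * After '<', the negative lookahead rejects the match if optional blanks,
     * an optional '/', optional blanks and a whitelisted name followed by a
     * word boundary come next. \b stops "P" from also matching "PRE".
     * Assumes the names are plain alphanumerics (no regex metacharacters).
     */
    static Pattern buildStripPattern(List<String> keep) {
        String names = String.join("|", keep);
        String regex = "<(?!\\s*/?\\s*(?:" + names + ")\\b)[^>]*>";
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }

    /** Removes every opening or closing tag that is not whitelisted. */
    static String stripTags(String html, List<String> keep) {
        return buildStripPattern(keep).matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "< P>Hi <b>there</b></ P><H1 >T</H1><H2>x</H2>"
                + "<LI >a</LI><PRE>c</PRE>";
        // Keeps the P, H1 and LI tags (with or without blanks), drops the rest.
        System.out.println(stripTags(html, List.of("P", "H1", "LI")));
    }
}
```

With the sample input this should print < P>Hi there</ P><H1 >T</H1>x<LI >a</LI>c. Because the blanks are matched inside the lookahead, the engine cannot satisfy the negative lookahead by stopping before the space, which is the trap a plain \s* in front of it falls into.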

Comments


Post Details

Locked on Nov 2 2008
Added on Sep 30 2008
21 comments
1,999 views