Regex to remove undesired HTML tags from an HTML page
807589 Sep 30 2008 — edited Oct 5 2008

I'm looking for a regex that will remove ALL HTML tags except for a few that I'd like to keep in a list such as: (P|H1|LI|<rest of list>). The regex would remove the < -tag stuff- > for every tag NOT in the list. Tags in the list can include blanks, for example < P> or <LI >. The -tag stuff- can include any legal HTML, including blanks. What I came up with, which worked for listed tags that did NOT contain spaces, is: <[^(P|H1|LI)].*?>. It handles <P> correctly, but it fails to recognize < P> (with a space) as a listed tag and removes it.
I tried adding \\s* to skip optional leading spaces but this didn't work.
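For anyone reading along, here is a rough sketch of one possible approach, written in Python only for illustration. One thing worth noting about the attempt above: inside square brackets, [^(P|H1|LI)] is a character class, so it matches a single character that is not one of "(", "P", "|", "H", "1", "L", "I", ")" and does not test whole tag names. A negative lookahead is one way to say "not one of these tags", and the optional whitespace can go inside that lookahead. The KEEP tuple and the function name strip_unlisted_tags are made up for this example, and the sketch assumes no ">" appears inside attribute values, comments, or scripts.

```python
import re

# Hypothetical whitelist of tag names to keep; extend as needed.
KEEP = ("P", "H1", "LI")

# <             opening angle bracket
# (?! ... )     negative lookahead: refuse to match when what follows is
#               optional whitespace, an optional "/", optional whitespace,
#               a kept tag name, and then whitespace, "/" or ">"
#               (so "PRE" is not mistaken for "P")
# [^>]*>        otherwise consume everything up to the closing ">"
TAG_RE = re.compile(
    r"<(?!\s*/?\s*(?:%s)(?=[\s/>]))[^>]*>" % "|".join(map(re.escape, KEEP)),
    re.IGNORECASE,
)


def strip_unlisted_tags(html: str) -> str:
    """Remove every tag whose name is not in KEEP; text content is left alone."""
    return TAG_RE.sub("", html)


sample = "<div>< P>Hello <b>world</b></ P><LI >item</LI><pre>x</pre></div>"
print(strip_unlisted_tags(sample))
```

The same pattern should carry over to other regex flavors that support lookahead, with backslashes doubled wherever the pattern lives in an ordinary string literal. For arbitrary real-world pages, a proper HTML parser is likely to be more robust than any single regex.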