11

I'm hoping someone has already written this:

A servlet filter that can be configured with regular expression search/replace patterns and applies them to the HTML output.

Does such a thing exist?

4
  • What exactly do you want to change? The request URL or the response body? Tuckey's UrlRewriteFilter is excellent, but it is intended to rewrite URLs (as is possible with Apache HTTPD's well-known RewriteRule). To change the response body, you'll have to be more specific about the functional requirement. No such filter comes to mind, but this smells too much like sanitizing user-controlled input to prevent XSS. In that case, regex is absolutely the wrong tool for the job.
    – BalusC
    Commented Feb 16, 2011 at 0:12
  • I'm sorry I was unclear. I've edited the question to indicate that I want to modify the HTML output. Commented Feb 17, 2011 at 13:59
  • What exactly in the HTML output? Since using regex to parse and modify HTML is an extremely poor practice, no such filter was ever written. Please clarify the functional requirement more. Why would you need a filter for this? Why not just make changes straight in the view side? Etc.
    – BalusC
    Commented Feb 17, 2011 at 15:14
  • We want to incorporate a vendor's JSP-based web application into our own through frames. We need to remove every target="_parent" from their output. They gave us only the compiled JSPs. I think the easiest way to make the change is to add a filter that modifies the output. Commented Feb 18, 2011 at 18:47

3 Answers

15

I couldn't find one, so I wrote one:

RegexFilter.java

package com.example;

import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

/**
 * Applies search and replace patterns. To initialize this filter, the
 * param-names should be "search1", "replace1", "search2", "replace2", etc.
 */
public final class RegexFilter implements Filter {
    private List<Pattern> searchPatterns;
    private List<String> replaceStrings;

    /**
     * Finds the search and replace strings in the configuration file. Looks for
     * matching searchX and replaceX parameters.
     */
    public void init(FilterConfig filterConfig) {
        Map<String, String> patternMap = new HashMap<String, String>();

        // Walk through the parameters to find those whose names start with
        // search
        Enumeration<String> names = (Enumeration<String>) filterConfig.getInitParameterNames();
        while (names.hasMoreElements()) {
            String name = names.nextElement();
            if (name.startsWith("search")) {
                patternMap.put(name.substring(6), filterConfig.getInitParameter(name));
            }
        }
        this.searchPatterns = new ArrayList<Pattern>(patternMap.size());
        this.replaceStrings = new ArrayList<String>(patternMap.size());

        // Walk through the parameters again to find the matching replace params
        names = (Enumeration<String>) filterConfig.getInitParameterNames();
        while (names.hasMoreElements()) {
            String name = names.nextElement();
            if (name.startsWith("replace")) {
                String searchString = patternMap.get(name.substring(7));
                if (searchString != null) {
                    this.searchPatterns.add(Pattern.compile(searchString));
                    this.replaceStrings.add(filterConfig.getInitParameter(name));
                }
            }
        }
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
        // Wrap the response in a wrapper so we can get at the text after calling the next filter
        CharResponseWrapper wrapper = new CharResponseWrapper((HttpServletResponse) response);
        chain.doFilter(request, wrapper);

        // Extract the text from the completed servlet and apply the regexes
        String modifiedHtml = wrapper.toString();
        for (int i = 0; i < this.searchPatterns.size(); i++) {
            modifiedHtml = this.searchPatterns.get(i).matcher(modifiedHtml).replaceAll(this.replaceStrings.get(i));
        }

        // Write our modified text to the real response. Count the bytes in the
        // response's own encoding (not the platform default) so that the
        // Content-Length agrees with what the writer actually sends.
        response.setContentLength(modifiedHtml.getBytes(response.getCharacterEncoding()).length);
        // Fetch the writer only now, after the wrapped servlet has had its chance to set the encoding
        PrintWriter out = response.getWriter();
        out.write(modifiedHtml);
        out.flush();
        out.close();
    }

    public void destroy() {
        this.searchPatterns = null;
        this.replaceStrings = null;
    }
}
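
A note on ordering: init() pairs the parameters in whatever order getInitParameterNames() hands them back, and as far as I know the servlet API does not promise any particular order. If one replacement depends on an earlier one, it may be worth sorting the pairs by their numeric suffix. The following is only a sketch of that idea in plain Java, with a Map standing in for the FilterConfig; the class name OrderedRules and the method pair are made up for illustration and are not part of the filter above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OrderedRules {
    /**
     * Pairs each searchN with its replaceN and returns the pairs sorted by N.
     * Each element is {searchPattern, replacement}. A searchN without a
     * matching replaceN is skipped, as in the filter's init().
     */
    static List<String[]> pair(Map<String, String> params) {
        // Numeric suffix -> suffix text, sorted by the numeric value
        TreeMap<Integer, String> suffixes = new TreeMap<>();
        for (String name : params.keySet()) {
            if (name.matches("search\\d{1,9}")) {
                String suffix = name.substring(6);
                suffixes.put(Integer.parseInt(suffix), suffix);
            }
        }
        List<String[]> ordered = new ArrayList<>();
        for (String suffix : suffixes.values()) {
            String replacement = params.get("replace" + suffix);
            if (replacement != null) {
                ordered.add(new String[] { params.get("search" + suffix), replacement });
            }
        }
        return ordered;
    }
}
```

With this, search2 is always applied before search10, which a plain string sort of the names would not give you.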

CharResponseWrapper.java

package com.example;

import java.io.CharArrayWriter;
import java.io.PrintWriter;

import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpServletResponseWrapper;

/**
 * Wraps the response object to capture the text written to it.
 */
public class CharResponseWrapper extends HttpServletResponseWrapper {
    private CharArrayWriter output;

    public CharResponseWrapper(HttpServletResponse response) {
        super(response);
        this.output = new CharArrayWriter();
    }

    public String toString() {
        return output.toString();
    }

    public PrintWriter getWriter() {
        return new PrintWriter(output);
    }
}

Example web.xml

<web-app>
    <filter>
      <filter-name>RegexFilter</filter-name>
      <filter-class>com.example.RegexFilter</filter-class>
      <init-param><param-name>search1</param-name><param-value><![CDATA[(<\s*a\s[^>]*)(?<=\s)target\s*=\s*(?:'_parent'|"_parent"|_parent|'_top'|"_top"|_top)]]></param-value></init-param>
      <init-param><param-name>replace1</param-name><param-value>$1</param-value></init-param>
    </filter>
    <filter-mapping>
      <filter-name>RegexFilter</filter-name>
      <url-pattern>/*</url-pattern>
    </filter-mapping>
</web-app>
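
To see what the search1/replace1 pair does without deploying anything, the same expression can be run against a couple of sample strings. This is just a quick check in plain Java; the class name TargetStripDemo and the sample HTML are made up for illustration.

```java
import java.util.regex.Pattern;

public class TargetStripDemo {
    // Same expression as the search1 init-param, escaped for a Java string literal
    static final Pattern SEARCH = Pattern.compile(
        "(<\\s*a\\s[^>]*)(?<=\\s)target\\s*=\\s*(?:'_parent'|\"_parent\"|_parent|'_top'|\"_top\"|_top)");

    // Same replacement as replace1: keep group 1, drop the target attribute
    static final String REPLACE = "$1";

    static String strip(String html) {
        return SEARCH.matcher(html).replaceAll(REPLACE);
    }

    public static void main(String[] args) {
        // prints: <a href="x.jsp" >Link</a>
        System.out.println(strip("<a href=\"x.jsp\" target=\"_parent\">Link</a>"));
        // prints: <form target="_parent">  (only <a> tags are touched)
        System.out.println(strip("<form target=\"_parent\">"));
    }
}
```

Note that the whitespace in front of the removed attribute stays behind, and that tags other than <a> are left alone because the pattern starts with <\s*a\s.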
2
  • Awesome stuff, just used this to help me solve a similar issue! Commented Jun 12, 2012 at 18:59
  • 1
    I would recommend an out.flush() before the out.close() to prevent errors like these: java.net.ProtocolException: Didn't meet stated Content-Length, wrote: '27026' bytes instead of stated: '27023' bytes.
    – rudolfv
    Commented Apr 4, 2014 at 14:02
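
One plausible cause of a mismatch like the one quoted in that comment (this is an assumption; the thread does not confirm it) is that the Content-Length was counted in one charset while the writer encoded the text in another. The byte count of the same string differs between encodings as soon as it contains a non-ASCII character. A small plain-Java illustration, with a made-up class name ContentLengthDemo:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentLengthDemo {
    // Number of bytes the body occupies once encoded with the given charset
    static int byteLength(String body, Charset charset) {
        return body.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String body = "<p>caf\u00e9</p>"; // 11 characters, one of them non-ASCII
        System.out.println(byteLength(body, StandardCharsets.ISO_8859_1)); // 11
        System.out.println(byteLength(body, StandardCharsets.UTF_8));      // 12
    }
}
```

So if the length is taken with the platform default charset and the response is written in a different one, the stated and actual sizes can drift apart by a few bytes, which is the kind of difference shown in the error message.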
5

I am not sure if this is what you are looking for, but there is a URL rewrite filter. It supports regex. Please see http://www.tuckey.org/urlrewrite/

Hope this helps.

1
2

SiteMesh is popular for this type of work.


SiteMesh has moved into a standalone Project: http://www.sitemesh.org/
