I am using Apache 2.4.59 under Debian as a reverse proxy. I can't make it rewrite links inside HTML at all, and I have tried everything I could find on various forums: SetOutputFilter, AddOutputFilter, inflate;proxy-html;deflate, specifying extra ProxyHTMLLinks, etc. Nothing makes it rewrite the links inside the HTML.

I have now created a fully self-contained MWE (apache2 config, a Makefile that runs the server, and curl to fetch the page through the proxy), here: https://github.com/eudoxos/rproxy .

The Apache config contains:

    ProxyRequests Off
    ProxyPass /proxied/ http://localhost:8080/
    ProxyPassReverse /proxied/ http://localhost:8080/
    <Location /proxied/>
        ProxyHTMLEnable On
        ProxyHTMLLinks link href
        AddOutputFilterByType inflate;proxy-html;substitute;deflate text/html
        ProxyHTMLURLMap ^/ /proxied/
        Substitute "s@Title@REPLACED TITLE@"
    </Location>

where the Substitute directive only serves to confirm that the filter machinery is engaged.

The simple index.html

    <!DOCTYPE HTML><HTML><head><meta charset="utf-8"><link rel="stylesheet" href="/style.css"><title>Main page</title></head><body><h1>Title</h1></body></HTML>

is returned with <h1>REPLACED TITLE</h1>, but <link … href="/style.css"> is left intact (it should become <link … href="/proxied/style.css">).
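To check the result without eyeballing the curl output, a small sketch with Python's standard html.parser module can list the href of every <link> element. It assumes you already have the page as a string (for example, saved from the curl call in the MWE); LinkHrefCollector and link_hrefs are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class LinkHrefCollector(HTMLParser):
    """Collect the href attribute of every <link> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <LINK HREF=...> matches too
        if tag == "link":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


def link_hrefs(html):
    """Return the list of <link href="..."> values found in an HTML string."""
    parser = LinkHrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


# The index.html from the question, i.e. what currently comes back unrewritten
page = ('<!DOCTYPE HTML><HTML><head><meta charset="utf-8">'
        '<link rel="stylesheet" href="/style.css"><title>Main page</title>'
        '</head><body><h1>Title</h1></body></HTML>')
hrefs = link_hrefs(page)
print(hrefs)                                          # → ['/style.css']
print(all(h.startswith("/proxied/") for h in hrefs))  # → False
```

Fed the page fetched through the proxy, the last check should print True once the link rewriting actually works.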

Analyzing the log output, I see the filters being run in order on the proxied index.html:

  1. inflate:

    [filter:trace4] Content-Type 'text/html' ... 
    [filter:trace4] ... matched 'text/html' 
    [filter:trace2] Content-Type condition for 'inflate' matched 
    
  2. proxy-html (but NO ACTION HAPPENS; why?):

    [xml2enc:debug] AH01430: Content-Type is text/html 
    [xml2enc:debug] AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate 
    [xml2enc:debug] AH01439: xml2enc: consuming 156 bytes from bucket 
    [xml2enc:debug] AH01441: xml2enc: converted 156/156 bytes 
    [filter:trace4] Content-Type 'text/html;charset=utf-8' ... 
    [filter:trace4] ... matched 'text/html' 
    [filter:trace2] Content-Type condition for 'proxy-html' matched 
    
  3. substitute (replaces the title via regex):

    [filter:trace4] Content-Type 'text/html;charset=utf-8' ... 
    [filter:trace4] ... matched 'text/html' 
    [filter:trace2] Content-Type condition for 'substitute' matched 
    [substitute:trace8] Line read (140 bytes): <html><head><meta charset="utf-8"><link rel="stylesheet" href="/style.css"><title>Main page</title></head><body><h1>Title</h1></body></html> 
    [substitute:trace8] Replacing regex:'Title' by 'REPLACED TITLE' 
    [substitute:trace8] Matching found 
    [substitute:trace8] Result: 'REPLACED TITLE' 
    
  4. deflate:

    [filter:trace4] Content-Type 'text/html;charset=utf-8' ... 
    [filter:trace4] ... matched 'text/html' 
    [filter:trace2] Content-Type condition for 'deflate' matched
    

You are welcome to run the test locally yourself. Any contribution or idea is appreciated.

1 Answer

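After adding trace logging to mod_proxy_html.c, I found a very simple reason: the map directive

    ProxyHTMLURLMap ^/ /proxied/ R

needs the R flag (for "regex"). Without R, mod_proxy_html treats the from-pattern as a literal string (a starts-with match in HTML links), so "^/" never matches "/style.css". With R added, everything works.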
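The difference between the two matching modes can be modeled roughly in Python. This is a simplified sketch, not mod_proxy_html's actual code: it assumes the documented behavior that a rule without R is a literal starts-with match in HTML links, while a rule with R is a regex search-and-replace. Python's re stands in for Apache's PCRE-based regex, and map_link is a hypothetical name.

```python
import re


def map_link(url, from_pattern, to_pattern, regex=False):
    """Rough model of one ProxyHTMLURLMap rule applied to a link URL.

    regex=False mimics a rule without the R flag (literal prefix match);
    regex=True mimics a rule with R (regex substitution of the first match).
    """
    if regex:
        return re.sub(from_pattern, to_pattern, url, count=1)
    # Literal mode: "^" is an ordinary character here, not an anchor
    if url.startswith(from_pattern):
        return to_pattern + url[len(from_pattern):]
    return url


print(map_link("/style.css", "^/", "/proxied/"))              # → /style.css
print(map_link("/style.css", "^/", "/proxied/", regex=True))  # → /proxied/style.css
```

In literal mode the rule would only fire on a link that literally begins with the two characters "^/", which is why the original config silently did nothing.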

PS: the pattern should actually be ^/(?!/), i.e. ProxyHTMLURLMap ^/(?!/) /proxied/ R, so that protocol-relative URLs (starting with //) are not matched.
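A quick way to see what the negative lookahead buys is to run both patterns over a few sample URLs. This sketch uses Python's re, whose (?!...) syntax matches PCRE's; the CDN hostname and the rewrite helper are made-up examples.

```python
import re

PLAIN = re.compile(r"^/")      # rewrites every URL starting with "/"
SAFE = re.compile(r"^/(?!/)")  # skips a leading "/" that is followed by another "/"


def rewrite(pattern, url, prefix="/proxied/"):
    """Replace the anchored match, if any, with the proxy prefix."""
    return pattern.sub(prefix, url, count=1)


for url in ["/style.css", "//cdn.example.com/lib.js", "https://example.com/"]:
    print(f"{url}: plain={rewrite(PLAIN, url)} safe={rewrite(SAFE, url)}")
```

With the plain pattern, the protocol-relative URL becomes /proxied//cdn.example.com/lib.js, which points at the proxy instead of the CDN; the lookahead version leaves it alone, and absolute URLs are untouched by both.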

I hope this helps anyone who runs into the same issue in the future.
