Fastest method to escape HTML tags as HTML entities?

Question

I'm writing a Chrome extension that involves doing a lot of the following job: sanitizing strings that might contain HTML tags, by converting <, > and & to &lt;, &gt; and &amp;, respectively.

(In other words, the same as PHP's htmlspecialchars(str, ENT_NOQUOTES) – I don't think there's any real need to convert double-quote characters.)

This is the fastest function I have found so far:

function safe_tags(str) {
    return str.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;') ;
}

But there's still a big lag when I have to run a few thousand strings through it in one go.

Can anyone improve on this? It's mostly for strings between 10 and 150 characters, if that makes a difference.

(One idea I had was not to bother encoding the greater-than sign – would there be any real danger with that?)

Why? In most cases where you want to do this, you want to insert the data into the DOM, in which case you should forget about escaping it and just make a textNode from it. — Quentin, Commented Mar 31, 2011 at 11:30
@David Dorward: perhaps he wanted to sanitize POST data, and the server does not round-trip the data correctly. — Lie Ryan, Commented Mar 31, 2011 at 11:35
@Lie — if so, then the solution is "For Pete's sake, fix the server as you have a big XSS hole" — Quentin, Commented Mar 31, 2011 at 13:12
@David Dorward: it is possible that he does not have control over the server. I was in such a situation recently, writing a Greasemonkey script to work around a couple of things I don't like on my university's website. I had to POST to a server I have no control over and sanitize the POST data using JavaScript (the raw data comes from a rich textbox, so it has heaps of HTML tags that do not round-trip through the server). The web admin ignored my requests to fix the website, so I had no other choice. — Lie Ryan, Commented Mar 31, 2011 at 13:40
I have a use-case where I need to display an error message in a div. The error message can contain HTML and newlines. I want to escape the HTML and replace the newlines with <br>. Then put the result into a div for display. — mozey, Commented Jul 29, 2013 at 9:09
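For the use case mozey describes, a minimal sketch, assuming the message is plain text: escape first, then convert newlines, so the only markup in the output is the <br> tags added on purpose. escapeHtml and errorMessageToHtml are hypothetical names, not from any answer below.

```javascript
// Escape the three characters that matter in element text content
function escapeHtml(str) {
  return str.replace(/[&<>]/g, function (ch) {
    return ch === '&' ? '&amp;' : ch === '<' ? '&lt;' : '&gt;';
  });
}

// Escape FIRST, then add <br>; doing it the other way round would escape the <br> too
function errorMessageToHtml(message) {
  // \r?\n also covers Windows-style line endings
  return escapeHtml(message).replace(/\r?\n/g, '<br>');
}

console.log(errorMessageToHtml('Bad tag <div>\nLine 2 & more'));
// Bad tag &lt;div&gt;<br>Line 2 &amp; more
```

The result can then be assigned to the div's innerHTML.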

Kevin Reilly · Accepted Answer · 2015-05-28 20:24:54Z

135

Here's one way you can do this:

var escape = document.createElement('textarea');
function escapeHTML(html) {
    escape.textContent = html;
    return escape.innerHTML;
}

function unescapeHTML(html) {
    escape.innerHTML = html;
    return escape.textContent;
}

Here's a demo.

edited May 28, 2015 at 20:24

Kevin Reilly

answered Feb 12, 2012 at 17:58

Web_Designer

Redesigned the demo. Here's a fullscreen version: jsfiddle.net/Daniel_Hug/qPUEX/show/light
– Web_Designer
Commented May 2, 2013 at 15:25

Not sure how/what/why - but this is genius.
– rob_james
Commented Jun 18, 2014 at 12:12

Looks like it is leveraging the TextArea element's existing code for escaping literal text. Very nice, I think this little trick is going to find another home.
– Ajax
Commented Jan 4, 2016 at 8:41

@jazkat I'm not using that function. The escape variable I use, I define myself in the example.
– Web_Designer
Commented Jul 4, 2017 at 0:08

But does this lose whitespace, etc.?
– Andrew
Commented Jan 14, 2018 at 19:41

Martijn · Accepted Answer · 2011-03-31 12:32:04Z

101

You could try passing a callback function to perform the replacement:

var tagsToReplace = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;'
};

function replaceTag(tag) {
    return tagsToReplace[tag] || tag;
}

function safe_tags_replace(str) {
    return str.replace(/[&<>]/g, replaceTag);
}

Here is a performance test: http://jsperf.com/encode-html-entities to compare with calling the replace function repeatedly, and using the DOM method proposed by Dmitrij.

Your way seems to be faster...

Why do you need it, though?
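Since the jsperf links on this page no longer work, here is a rough local harness one could run instead in Node or a browser console. It is a hypothetical sketch, not the original jsperf test: the function names and input sizes are made up for illustration, and timings vary by engine, so treat any numbers as indicative only.

```javascript
// Chained replace, as in the question
function chained(str) {
  return str.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Single regex with a replacement callback, as in this answer
var MAP = { '&': '&amp;', '<': '&lt;', '>': '&gt;' };
function callback(str) {
  return str.replace(/[&<>]/g, function (ch) { return MAP[ch]; });
}

// Short strings, roughly in the 10-150 character range from the question
function makeInputs(count) {
  var inputs = [];
  for (var i = 0; i < count; i++) {
    inputs.push('row ' + i + ': <b>bold</b> & plain text');
  }
  return inputs;
}

function timeIt(fn, inputs, rounds) {
  var start = Date.now();
  var sink = 0; // consume results so the engine is less likely to skip the work
  for (var r = 0; r < rounds; r++) {
    for (var i = 0; i < inputs.length; i++) {
      sink += fn(inputs[i]).length;
    }
  }
  return { ms: Date.now() - start, sink: sink };
}

var inputs = makeInputs(5000);
console.log('chained :', timeIt(chained, inputs, 20).ms, 'ms');
console.log('callback:', timeIt(callback, inputs, 20).ms, 'ms');
```

Both functions should produce identical output, so only the timings differ.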

edited Mar 31, 2011 at 12:32

answered Mar 31, 2011 at 12:26

Martijn

There is no need to escape >.
– user142019
Commented Mar 10, 2013 at 13:50

Actually if you put the escaped value in an html element's attribute, you need to escape the > symbol. Otherwise it would break the tag for that html element.
– Zlatin Zlatev
Commented Oct 7, 2013 at 15:42

In normal text, characters that need escaping are rare. If you care about maximum speed, it's better to call replace only when needed: if (/[<>&"]/.test(str)) { ... }
– Vitaly
Commented Oct 26, 2014 at 4:22

@callum: No. I am not interested in enumerating cases in which I think "something could go wrong" (not least because it's the unexpected/forgotten cases that'll hurt you, and when you least expect it at that). I am interested in coding to standards (so the unexpected/forgotten cases can't hurt you by definition). I can't stress how important this is. > is a special character in HTML, so escape it. Simple as that. :)
– Lightness Races in Orbit
Commented Jul 20, 2015 at 15:30

@LightnessRacesinOrbit It's relevant because the question is what is the fastest possible method. If it's possible to skip the > replacement, that would make it faster.
– callum
Commented Jul 20, 2015 at 17:37

Aram Kocharyan · Accepted Answer · 2012-11-24 04:24:21Z

31

Martijn's method as a prototype function:

String.prototype.escape = function() {
    var tagsToReplace = {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;'
    };
    return this.replace(/[&<>]/g, function(tag) {
        return tagsToReplace[tag] || tag;
    });
};

var a = "<abc>";
var b = a.escape(); // "&lt;abc&gt;"

answered Nov 24, 2012 at 4:24

Aram Kocharyan

If you add it to String like this, it should be called escapeHtml, since it's not an escaping for strings in general. That is, String.escapeHtml is correct, but String.escape raises the question, "escape for what?"
– L. Cornelius Dol
Commented Mar 13, 2014 at 3:12

Yeah good idea. I've moved away from extending the prototype these days to avoid conflicts.
– Aram Kocharyan
Commented Mar 13, 2014 at 23:34

If your browser has support for Symbol, you could use that instead to avoid polluting the string-key namespace. var escape = Symbol("escape"); String.prototype[escape] = function(){ ... }; "text"[escape]();
– Ajax
Commented Jan 4, 2016 at 8:58
plus one for the example.
– Timo
Commented Sep 30, 2020 at 18:12

Todd · Accepted Answer · 2019-03-09 20:50:53Z

22

An even quicker/shorter solution is:

escaped = new Option(html).innerHTML

This relies on an old DOM vestige: the Option element still has a constructor, new Option(text), which puts the text into a new <option> element, and reading its innerHTML returns that text escaped.

Credit to https://github.com/jasonmoo/t.js/blob/master/t.js

answered Mar 9, 2019 at 20:50

Todd

Neat one-liner but the slowest method after regex. Also, the text here can have whitespace stripped, according to the spec
– ShortFuse
Commented Jan 6, 2020 at 19:25
Note that @ShortFuse's "slowest method" link makes my system run out of RAM (with ~6GB free) and firefox seems to stop allocating just before it's out of memory so instead of killing the offending process, linux will sit there and let you do a hard power off.
– Luc
Commented Jul 11, 2020 at 9:09

Community · Accepted Answer · 2017-05-23 12:26:33Z

16

The fastest method is:

function escapeHTML(html) {
    return document.createElement('div').appendChild(document.createTextNode(html)).parentNode.innerHTML;
}

This method is about twice as fast as the methods based on replace; see http://jsperf.com/htmlencoderegex/35 .

Source: https://stackoverflow.com/a/17546215/698168

edited May 23, 2017 at 12:26

CommunityBot

answered Jun 19, 2015 at 5:38

Julien Kronegg

JSPerf shut down in 2017, unfortunately. Can you repost it to jsbench.me?
– Dai
Commented Nov 3, 2022 at 2:01
@Dai: unfortunately I can't repost it, as I'm not the benchmark author.
– Julien Kronegg
Commented Nov 3, 2022 at 8:20

Kevin Hakanson · Accepted Answer · 2015-05-31 13:28:55Z

The AngularJS source code also has a version inside of angular-sanitize.js.

var SURROGATE_PAIR_REGEXP = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g,
    // Match everything outside of normal chars and " (quote character)
    NON_ALPHANUMERIC_REGEXP = /([^\#-~| |!])/g;
/**
 * Escapes all potentially dangerous characters, so that the
 * resulting string can be safely inserted into attribute or
 * element text.
 * @param value
 * @returns {string} escaped text
 */
function encodeEntities(value) {
  return value.
    replace(/&/g, '&amp;').
    replace(SURROGATE_PAIR_REGEXP, function(value) {
      var hi = value.charCodeAt(0);
      var low = value.charCodeAt(1);
      return '&#' + (((hi - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000) + ';';
    }).
    replace(NON_ALPHANUMERIC_REGEXP, function(value) {
      return '&#' + value.charCodeAt(0) + ';';
    }).
    replace(/</g, '&lt;').
    replace(/>/g, '&gt;');
}
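The SURROGATE_PAIR_REGEXP step exists because replace() sees UTF-16 code units, so an astral character such as an emoji arrives as two halves. Below is a hedged sketch of the same arithmetic, checked against the built-in codePointAt. encodeEntitiesModern is a hypothetical, simplified variant (not AngularJS code) meant to follow the same rules; it iterates by code point with for...of, so it needs no separate surrogate-pair pass.

```javascript
// Same formula as the SURROGATE_PAIR_REGEXP callback above
function pairToCodePoint(hi, low) {
  return ((hi - 0xD800) * 0x400) + (low - 0xDC00) + 0x10000;
}

var NAMED = { '&': '&amp;', '<': '&lt;', '>': '&gt;' };

// Named entities for & < >; numeric references for the double quote,
// control characters and anything outside printable ASCII.
function encodeEntitiesModern(value) {
  var out = '';
  for (var ch of value) {            // for...of yields whole code points
    var cp = ch.codePointAt(0);
    if (NAMED[ch]) {
      out += NAMED[ch];
    } else if (cp === 0x22 || cp < 0x20 || cp > 0x7E) {
      out += '&#' + cp + ';';
    } else {
      out += ch;
    }
  }
  return out;
}

var smiley = '\uD83D\uDE00';         // U+1F600 as a UTF-16 surrogate pair
console.log(pairToCodePoint(0xD83D, 0xDE00) === smiley.codePointAt(0)); // true
console.log(encodeEntitiesModern('<' + smiley + '>')); // &lt;&#128512;&gt;
```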

Wow, that non-alphanum regex is intense. I don't think the | in the expression is needed though. — Ajax, Commented Jan 4, 2016 at 9:14

baptx · Accepted Answer · 2016-08-07 18:34:16Z

10

All-in-one script:

// HTML entities Encode/Decode

function htmlspecialchars(str) {
    var map = {
        "&": "&amp;",
        "<": "&lt;",
        ">": "&gt;",
        "\"": "&quot;",
        "'": "&#39;" // ' -> &apos; for XML only
    };
    return str.replace(/[&<>"']/g, function(m) { return map[m]; });
}
function htmlspecialchars_decode(str) {
    var map = {
        "&amp;": "&",
        "&lt;": "<",
        "&gt;": ">",
        "&quot;": "\"",
        "&#39;": "'"
    };
    return str.replace(/(&amp;|&lt;|&gt;|&quot;|&#39;)/g, function(m) { return map[m]; });
}
function htmlentities(str) {
    var textarea = document.createElement("textarea");
    textarea.innerHTML = str;
    return textarea.innerHTML;
}
function htmlentities_decode(str) {
    var textarea = document.createElement("textarea");
    textarea.innerHTML = str;
    return textarea.value;
}
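One reason htmlspecialchars_decode above uses a single regex pass is that chained decoding can unescape twice. A small sketch of the difference; decodeOnce and decodeChainedAmpFirst are hypothetical names, and the chained version is deliberately wrong to show the pitfall.

```javascript
// Single pass, same idea as htmlspecialchars_decode above
var DECODE = { '&amp;': '&', '&lt;': '<', '&gt;': '>', '&quot;': '"', '&#39;': "'" };
function decodeOnce(str) {
  return str.replace(/&(?:amp|lt|gt|quot|#39);/g, function (m) { return DECODE[m]; });
}

// Chained decode that handles &amp; FIRST: "&amp;lt;" becomes "&lt;", then "<"
function decodeChainedAmpFirst(str) {
  return str.replace(/&amp;/g, '&').replace(/&lt;/g, '<').replace(/&gt;/g, '>');
}

console.log(decodeOnce('&amp;lt;b&amp;gt;'));            // &lt;b&gt;
console.log(decodeChainedAmpFirst('&amp;lt;b&amp;gt;')); // <b>
```

When chaining is unavoidable, &amp; goes first when encoding and last when decoding.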

http://pastebin.com/JGCVs0Ts

edited Aug 7, 2016 at 18:34

answered Jun 29, 2012 at 22:39

baptx

I didn't downvote, but all regex style replace will fail to encode unicode... So, anyone using a foreign language is going to be disappointed. The <textarea> trick mentioned above is really cool and handles everything quickly and securely.
– Ajax
Commented Jan 4, 2016 at 8:59

The regex works fine for me with a number of non-Latin Unicode characters. I wouldn't expect anything else. How do you think this wouldn't work? Are you thinking of single-byte codepages that require HTML entities? That's what the 3rd and 4th functions are for, and explicitly not the 1st and 2nd. I like the differentiation.
– ygoe
Commented Feb 29, 2016 at 17:30
@LonelyPixel I don't think he will see your comment if you don't mention him ("Only one additional user can be notified; the post owner will always be notified")
– baptx
Commented Feb 29, 2016 at 19:31
I didn't know targeted notifications exist at all. @Ajax please see my comment above.
– ygoe
Commented Mar 1, 2016 at 8:00
@LonelyPixel I see now. For some reason I didn't think there was a textarea style replacement in this answer. I was, indeed, thinking of double codepoint big unicode values, like Mandarin. I mean, it would be possible to make a regex smart enough, but when you look at the shortcuts that browser vendors can take, I would feel pretty good betting that textarea will be much faster (than a completely competent regex). Did someone post a benchmark on this answer? I swore I had seen one.
– Ajax
Commented Mar 2, 2016 at 2:41

Dave Brown · Accepted Answer · 2015-07-26 13:33:15Z

4

function encode(r) {
  // Single pass: each matched character becomes a decimal numeric reference
  return r.replace(/[\x26\x0A\x3c\x3e\x22\x27]/g, function(r) {
    return "&#" + r.charCodeAt(0) + ";";
  });
}

test.value=encode('How to encode\nonly html tags &<>\'" nice & fast!');

/*
 \x26 is & (ampersand; in a single-pass character class the order does not matter),
 \x0A is newline,
 \x22 is ",
 \x27 is ',
 \x3c is <,
 \x3e is >
*/

<textarea id=test rows=11 cols=55>www.WHAK.com</textarea>

answered Jul 26, 2015 at 13:33

Dave Brown

ShortFuse · Accepted Answer · 2020-01-05 21:18:42Z

I'll add XMLSerializer to the pile. It provides the fastest result without using any object caching (not on the serializer, nor on the Text node).

function serializeTextNode(text) {
  return new XMLSerializer().serializeToString(document.createTextNode(text));
}

The added bonus is that it also supports attributes, which are serialized differently from text nodes:

function serializeAttributeValue(value) {
  const attr = document.createAttribute('a');
  attr.value = value;
  return new XMLSerializer().serializeToString(attr);
}

You can see what it's actually replacing by checking the spec, both for text nodes and for attribute values. The full documentation has more node types, but the concept is the same.

As for performance, it's the fastest when nothing is cached. When you do allow caching, calling innerHTML on an HTMLElement with a child Text node is fastest. Regex would be slowest (as other comments show). Of course, XMLSerializer could be faster on other browsers, but in my (limited) testing, innerHTML is fastest.

Fastest single line:

new XMLSerializer().serializeToString(document.createTextNode(text));

Fastest with caching:

const cachedElementParent = document.createElement('div');
const cachedChildTextNode = document.createTextNode('');
cachedElementParent.appendChild(cachedChildTextNode);

function serializeTextNode(text) {
  cachedChildTextNode.nodeValue = text;
  return cachedElementParent.innerHTML;
}

https://jsperf.com/htmlentityencode/1

iman · Accepted Answer · 2014-11-02 07:50:16Z

2

Martijn's method as a single function that also handles the " character (for use in JavaScript):

function escapeHTML(html) {
    // Map is built once per call, not once per matched character
    var charsToReplace = {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;',
        '"': '&#34;'
    };
    return html.replace(/[&<>"]/g, function(tag) {
        return charsToReplace[tag] || tag;
    });
}

answered Nov 2, 2014 at 7:50

iman

I have also found this solution in the Vue framework: github.com/vuejs/vue/blob/…
– Luckylooke
Commented Feb 16, 2021 at 18:12

gilmatic · Accepted Answer · 2018-11-07 20:51:51Z

2

I'm not entirely sure about speed, but if you are looking for simplicity I would suggest using the lodash/underscore escape function.

answered Nov 7, 2018 at 20:51

gilmatic

robertf · Accepted Answer · 2024-01-10 15:05:12Z

1

Based on the comment by @Vitaly, and the fact the OP indicated the text "might contain HTML tags", the following helps significantly with text that rarely needs to be escaped.

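class HtmlUtil { // placeholder class name; the answer shows only the method
  static escapeHtml(str) {
    if (!/[<>&"']/.test(str)) return str; // fast path: nothing to escape
    return str.replaceAll('&', '&amp;').replaceAll('<', '&lt;')
              .replaceAll('>', '&gt;').replaceAll('"', '&quot;')
              .replaceAll("'", '&#39;');
  }
}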

answered Jan 10, 2024 at 15:05

robertf

suncat100 · Accepted Answer · 2018-03-20 19:14:41Z

-5

A bit late to the show, but what's wrong with using encodeURIComponent() and decodeURIComponent()?

answered Mar 20, 2018 at 19:14

suncat100

Those do something completely unrelated
– callum
Commented Apr 4, 2018 at 16:22

Perhaps the biggest abuse of the word "completely" I have ever heard. For example, in relation to the main topic question, it could be used to decode a html string (obviously for some kinda storage reason), regardless of html tags, and then easily encode it back to html again when and if required.
– suncat100
Commented Apr 5, 2018 at 17:27

@callum is correct: the question asks about html entities, and you answer about uri components, which are completely different.
– Derek Henderson
Commented Aug 4, 2021 at 14:24

13 answers · Tagged: javascript, html, regex, performance, string

Linked

Hot Network Questions

Collectives™ on Stack Overflow

13 Answers 13

Not the answer you're looking for? Browse other questions tagged javascripthtmlregexperformancestring or ask your own question.

Linked

Related

Not the answer you're looking for? Browse other questions tagged
javascript
html
regex
performance
string
or ask your own question.