35

Given the text

<b>This is some text</b>

I want to write it to my page so that it shows up like this:

<b>This is some text</b>

and not like this

This is some text

Using escape("<b>This is some text</b>") gives me this lovely gem in Firefox:

%3Cb%3EThis%20is%20some%20text%3C/b%3E

Not exactly what I'm after. Any ideas?

3
  • are you trying to do this with POJS or would you consider using a framework? Commented Mar 9, 2011 at 20:11
  • 1
    I discovered using JQuery.text() instead of JQuery.html() does the trick.
    – Micah
    Commented Mar 9, 2011 at 20:24
  • Does this answer your question? Can I escape HTML special chars in JavaScript?
    – Flimm
    Commented Jan 26, 2022 at 20:01

8 Answers

62

This should work for you: http://blog.nickburwell.com/2011/02/escape-html-tags-in-javascript.html

function escapeHTML( string )
{
    var pre = document.createElement('pre');
    var text = document.createTextNode( string );
    pre.appendChild(text);
    return pre.innerHTML;
}

Security Warning

The function doesn't escape single and double quotes, which if used in the wrong context, may still lead to XSS. For example:

 var userWebsite = '" onmouseover="alert(\'gotcha\')" "';
 var profileLink = '<a href="' + escapeHTML(userWebsite) + '">Bob</a>';
 var div = document.getElementById('target');
 div.innerHTML = profileLink;
 // <a href="" onmouseover="alert('gotcha')" "">Bob</a>

Thanks to buffer for pointing out this case. Snippet taken out of this blog post.
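
One way to close that gap, sketched here on the assumption that the value ends up inside a quoted attribute, is to escape both quote characters along with &, < and >. The name escapeAttribute is hypothetical and not part of the original answer:

```javascript
// Hypothetical helper (not from the original answer): escapes the five
// characters that matter in HTML text and in quoted attribute values.
function escapeAttribute(value) {
    var map = {
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;',
        '"': '&quot;',
        "'": '&#39;'
    };
    // A single pass over the string, so '&' is never escaped twice.
    return String(value).replace(/[&<>"']/g, function (ch) {
        return map[ch];
    });
}

var userWebsite = '" onmouseover="alert(\'gotcha\')" "';
var profileLink = '<a href="' + escapeAttribute(userWebsite) + '">Bob</a>';
console.log(profileLink);
// <a href="&quot; onmouseover=&quot;alert(&#39;gotcha&#39;)&quot; &quot;">Bob</a>
```

This only deals with quoting. It does not validate the URL itself, so a javascript: URL would still pass through unchanged.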

6
  • 2
    wow, great solution, people should notice this one and vote up more!
    – darma
    Commented Nov 11, 2011 at 20:07
  • 3
    It is a great solution, although it does have a dependency on the DOM. If you are using JavaScript outside of a browser, you will need one of the other solutions below. Commented Feb 11, 2014 at 22:25
  • 3
    It does not escape quotes and you might incorrectly assume that it's safe to insert the content as HTML. Example: benv.ca/2012/10/2/you-are-probably-misusing-DOM-text-methods
    – user
    Commented Sep 23, 2014 at 6:50
  • 2
    limc, PLEASE UPDATE THIS WITH A SECURE SOLUTION. I downvoted it for now as it's scary people out there may be implementing this -- I WILL TAKE OFF DOWNVOTE AND THEN UPVOTE WHEN I SEE YOU'VE UPDATED YOUR ANSWER. Thx!
    – Cody
    Commented Oct 2, 2015 at 16:54
  • @user Your link is broken.
    – Flimm
    Commented Jan 26, 2022 at 19:53
42

I like @limc's answer for situations where the HTML DOM document is available.

I like @Michele Bosi's and @Paolo's answers for non HTML DOM document environment such as Node.js.

@Michele Bosi's answer can be optimized by replacing the 4 calls to replace with a single call that uses a replacer function:

function escape(s) {
    let lookup = {
        '&': "&amp;",
        '"': "&quot;",
        '\'': "&apos;",
        '<': "&lt;",
        '>': "&gt;"
    };
    return s.replace( /[&"'<>]/g, c => lookup[c] );
}
console.log(escape("<b>This is 'some' text.</b>"));

@Paolo's range test can be optimized with a well chosen regex and the for loop can be eliminated by using a replacer function:

function escape(s) {
    return s.replace(
        /[^0-9A-Za-z ]/g,
        c => "&#" + c.charCodeAt(0) + ";"
    );
}
console.log(escape("<b>This is 'some' text</b>"));

As @Paolo indicated, this strategy will work for more scenarios.
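
One caveat with charCodeAt: it returns UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, for example) is emitted as two surrogate references, which are not valid character references. A minimal sketch of a code-point-aware variant follows, assuming an ES2015+ engine for the u flag and codePointAt. The name escapeByCodePoint is hypothetical:

```javascript
// Hypothetical variant: with the "u" flag the regex matches whole code
// points, and codePointAt(0) returns the full code point value.
function escapeByCodePoint(s) {
    return String(s).replace(
        /[^0-9A-Za-z ]/gu,
        c => "&#" + c.codePointAt(0) + ";"
    );
}
console.log(escapeByCodePoint("<b>\u{1F600}</b>"));
// &#60;b&#62;&#128512;&#60;&#47;b&#62;
```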

6
  • 1
    Stephan, this is the most elegant solution I've seen yet -- really appreciate your answer! [ upvoted ].
    – Cody
    Commented Oct 2, 2015 at 16:58
  • 2
    Guys, if you want a complete solution, move over to: github.com/janl/mustache.js/blob/master/mustache.js#L55 It includes all characters!! Thanks @Error for pointing out the article that led me to that method Commented Sep 20, 2016 at 20:29
  • 2
    The second one is perfect for NodeJS where there is no DOM; however, I would expand it to not include many other common characters. It is also best compatible with any new special additions to the HTML spec by encoding everything that is not in the regex list of characters to skip. Commented Aug 7, 2019 at 22:36
  • 1
    You should also escape apostrophe (') because it can be used instead of quotation mark in HTML to wrap attribute values. You can replace it with &apos;.
    – Finesse
    Commented Sep 6, 2021 at 7:51
  • 1
    @JoãoAntunes some years later, I found your comment still valuable and I actually used that bit of code from mustache.js. I'd say your comment should become an actual answer, and maybe point to a specific file revision, so to show the exact line, e.g. as of today
    – superjos
    Commented Apr 27, 2023 at 11:17
27

I ended up doing this:

function escapeHTML(s) { 
    return s.replace(/&/g, '&amp;')
            .replace(/"/g, '&quot;')
            .replace(/</g, '&lt;')
            .replace(/>/g, '&gt;');
}
2
  • 6
    This is identical to kapa/Headshota's answer posted more than a year before yours, -1 for copying their answer. (Adding indentation should have been an edit instead of taking the karma for yourself.)
    – Luc
    Commented May 31, 2020 at 9:13
  • You should also escape apostrophe (') because it can be used instead of quotation mark in HTML. You can replace it with &apos;.
    – Finesse
    Commented Sep 6, 2021 at 7:52
7

Try this htmlentities equivalent for JavaScript:

function htmlEntities(str) {
    return String(str).replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}
1
  • Fine for PHP, but question's tags indicate JavaScript as the desired solution language. Commented Feb 11, 2014 at 22:27
4

Traditional Escaping

If you're using XHTML, you'll need to use a CDATA section. You can use these in HTML, too, but HTML isn't as strict.

I split up the string constants so that this code will work inline on XHTML within CDATA blocks. If you are sourcing your JavaScript as separate files, then you don't need to bother with that. Note that if you are using XHTML with inline JavaScript, then you need to enclose your code in a CDATA block, or some of this will not work. You will run into odd, subtle errors.

function htmlentities(text) {
    var escaped = text.replace(/\]\]>/g, ']]' + '>]]&gt;<' + '![CDATA[');
    return '<' + '![CDATA[' + escaped + ']]' + '>';
}
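
To make the splitting concrete, here is a small check of what the function produces when the input itself contains the ]]> terminator. The function is repeated under the hypothetical name wrapInCdata so that the snippet runs on its own:

```javascript
// Same logic as htmlentities above, renamed for this standalone sketch.
function wrapInCdata(text) {
    // Close the section before each "]]>", emit the terminator as
    // "]]&gt;" outside CDATA, then open a new section for the rest.
    var escaped = String(text).replace(/\]\]>/g, ']]' + '>]]&gt;<' + '![CDATA[');
    return '<' + '![CDATA[' + escaped + ']]' + '>';
}

console.log(wrapInCdata('a]]>b'));
// <![CDATA[a]]>]]&gt;<![CDATA[b]]>
```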

DOM Text Node

The "proper" way to escape text is to use the DOM function document.createTextNode. This doesn't actually escape the text; it just tells the browser to create a text node, which is inherently unparsed. You have to be willing to use the DOM for this method to work, however: that is, you have to use methods such as appendChild, as opposed to the innerHTML property and similar. This would fill an element with ID an-element with text, which would not be parsed as (X)HTML:

var textNode = document.createTextNode("<strong>This won't be bold.  The tags " +
    "will be visible.</strong>");
document.getElementById('an-element').appendChild(textNode);
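
A related sketch, offered as an alternative rather than as part of the original answer: assigning to the standard textContent property also stores the string as text, but it replaces any existing children instead of appending to them. The helper name setPlainText is hypothetical, and the DOM call is guarded so the snippet is harmless outside a browser:

```javascript
// Hypothetical helper: stores the string as plain text on the element.
function setPlainText(element, text) {
    // textContent never parses its value as markup.
    element.textContent = String(text);
    return element;
}

// Only touch the DOM when one is available (e.g. not under Node.js).
if (typeof document !== 'undefined') {
    var target = document.getElementById('an-element');
    if (target) {
        setPlainText(target, "<strong>This won't be bold.</strong>");
    }
}
```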

jQuery DOM Wrapper

jQuery provides a handy wrapper for createTextNode named text. It's quite convenient. Here's the same functionality using jQuery:

$('#an-element').text("<strong>This won't be bold.  The tags will be " +
    "visible.</strong>");
1
  • @cHao Yes. It's pretty popular because it's strict. You know what you're going to get.
    – Zenexer
    Commented Jul 1, 2013 at 22:31
2

Here's a function that replaces angle brackets with their html entities. You might want to expand it to include other characters too.

function htmlEntities( html ) {
    html = html.replace( /[<>]/g, function( match ) {
        if( match === '<' ) return '&lt;';
        else return '&gt;';
    });
    return html;
}

console.log( htmlEntities( '<b>replaced</b>' ) ); // &lt;b&gt;replaced&lt;/b&gt;
2

You can encode all characters in your string:

function encode(e){return e.replace(/[^]/g,function(e){return"&#"+e.charCodeAt(0)+";"})}

Or just target the main characters to worry about (&, linebreaks, <, >, " and ') like:

function encode(r){
return r.replace(/[\x26\x0A\<>'"]/g,function(r){return"&#"+r.charCodeAt(0)+";"})
}

test.value=encode('Encode HTML entities!\n\n"Safe" escape <script id=\'\'> & useful in <pre> tags!');

testing.innerHTML=test.value;

/*************
* \x26 is & (ampersand); its position in the character class
* does not matter because the replace runs in a single pass,
* \x0A is newline,
*************/
<textarea id=test rows="9" cols="55"></textarea>

<div id="testing">www.WHAK.com</div>

0

I use the following function, which escapes every character with the &#nnn; notation except a-z, A-Z, 0-9 and space:

function Escape( s )
{
    var h,
        i,
        n,
        c;

    n = s.length;
    h = '';

    for( i = 0; i < n; i++ )
    {
        c = s.charCodeAt( i );
        if( ( c >= 48 && c <= 57 ) 
          ||( c >= 65 && c <= 90 ) 
          ||( c >= 97 && c <=122 )
          ||( c == 32 ) )
        {
            h += String.fromCharCode( c );
        }
        else
        {
            h += '&#' + c + ';';
        }
    }

    return h;
}

Example:

Escape('<b>This is some text</b>')

returns

&#60;b&#62;This is some text&#60;&#47;b&#62;

The function is proof against code injection attacks, handles Unicode, and is pure JavaScript.

This approach is about 50 times slower than the one that creates the DOM text node, but the function still escapes a one-million-character (1,000,000) string in 100-150 milliseconds.

(Tested on early 2011 MacBook Pro - Safari 9 - Mavericks)
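
Timings like these depend heavily on the machine and the JavaScript engine. A rough way to measure on your own setup is sketched below. It applies the same escaping rule through a regex replace rather than the loop above, and the names escapeAll and timeEscape are hypothetical:

```javascript
// Escapes everything except a-z, A-Z, 0-9 and space as &#nnn;.
function escapeAll(s) {
    return s.replace(/[^0-9A-Za-z ]/g, function (c) {
        return '&#' + c.charCodeAt(0) + ';';
    });
}

// Builds a test string of the requested length and times one escape call.
function timeEscape(length) {
    var unit = '<b>x</b> '; // 9 characters
    var input = unit.repeat(Math.ceil(length / unit.length)).slice(0, length);
    var start = Date.now();
    var output = escapeAll(input);
    return {
        inputLength: input.length,
        outputLength: output.length,
        elapsedMs: Date.now() - start
    };
}

console.log(timeEscape(1000000));
```

Date.now has millisecond resolution, so run the measurement several times before drawing conclusions.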
