5
\$\begingroup\$

After importing some products from a CSV, strange characters have shown up on the page. It would be too much work to go to each product manually and remove them, so I made this script to deploy on the product page and strip them out.

$(function() {
    var p_desc = $(".rte").html();
    var re = /\?ÕÌ_|Š|š|Ž|ž|À|Á|Â|Ã|Ä|Å|Æ|Ç|È|É|Ê|Ë|Ì|Í|Î|Ï|Ñ|Ò|Ó|Ô|Õ|Ö|Ø|Ù|Ú|Û|Ü|Ý|Þ|ß|à|á|â|ã|ä|å|æ|ç|è|é|ê|ë|ì|í|î|ï|ð|ñ|ò|ó|ô|õ|ö|ø|ù|ú|û|ý|þ|ÿ|_Œ‚|__|_/g;
    var result = p_desc.replace(re, ' ');
    var new_p_desc = result.replace(/[^\x00-\x7F]/g, "").replace(/\?/g, '');
    $(".rte").html(new_p_desc);
});

My script is working fine, but I'm not sure whether it could be made better. Was this the best way to go about it?

\$\endgroup\$
  • \$\begingroup\$ Can you provide a sample of the strange characters? There is probably a better solution than deleting them. \$\endgroup\$ Commented Dec 22, 2016 at 5:50

2 Answers

11
\$\begingroup\$

RegEx Improvements

The regex can be shortened by using a case-insensitive match with the i flag. We can then drop every character that is listed in both its lowercase and uppercase form.

After removing the lowercase characters, the regex becomes:

\?ÕÌ_|Š|Ž|À|Á|Â|Ã|Ä|Å|Æ|Ç|È|É|Ê|Ë|Ì|Í|Î|Ï|Ñ|Ò|Ó|Ô|Õ|Ö|Ø|Ù|Ú|Û|Ü|Ý|Þ|ß|ð|ÿ|_Œ‚|__|_

Here's a live demo of the regex.

The regex can be further improved by using a character class, which matches faster than a chain of OR alternatives:

\?ÕÌ_|_Œ‚|[ŠŽÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝÞßðÿ_]+

Adding the + quantifier also reduces the number of steps taken when characters from the character class appear next to each other, because a whole run is consumed in a single match. Note that such a run is then replaced by a single space rather than by one space per character.

Here are the demos on RegEx101 without the + quantifier (screenshot) and with the + quantifier (screenshot), applied to the same data. Note that PHP is selected in these demos because RegEx101 does not show the number of steps for JavaScript. The regex also differs: it contains the lowercase counterparts of the special characters, because the i flag does not work with PHP here and I did not want to apply the u (Unicode) flag, which is not supported by all JavaScript engines.

These demos were created only to show the difference when + is applied to the character class. The effect should be similar in JavaScript.

Note that the __ (two underscores) alternative is redundant, as _ is already in the character class and the g flag removes all occurrences.
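To make the comparison concrete, here is a small sketch that runs the alternation regex from the question and the character-class regex from this answer over the same string. The sample string and the collapseSpaces helper are made up for illustration. The two results differ only in the number of spaces, which HTML rendering normally collapses anyway.

```javascript
// Alternation form, copied from the question.
var alternation = /\?ÕÌ_|Š|š|Ž|ž|À|Á|Â|Ã|Ä|Å|Æ|Ç|È|É|Ê|Ë|Ì|Í|Î|Ï|Ñ|Ò|Ó|Ô|Õ|Ö|Ø|Ù|Ú|Û|Ü|Ý|Þ|ß|à|á|â|ã|ä|å|æ|ç|è|é|ê|ë|ì|í|î|ï|ð|ñ|ò|ó|ô|õ|ö|ø|ù|ú|û|ý|þ|ÿ|_Œ‚|__|_/g;

// Character-class form with the + quantifier and the i flag, copied from this answer.
var charClass = /\?ÕÌ_|_Œ‚|[ŠŽÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝÞßðÿ_]+/gi;

// Hypothetical sample input, not taken from the real product data.
var sample = 'Blue_ShirtÀélarge';

// Each matched character becomes its own space.
var viaAlternation = sample.replace(alternation, ' ');

// A run of adjacent matches ("Àé") becomes a single space.
var viaCharClass = sample.replace(charClass, ' ');

// Hypothetical helper: squeeze whitespace runs so the two results can be compared.
function collapseSpaces(text) {
    return text.replace(/\s+/g, ' ');
}

console.log(JSON.stringify(viaAlternation));
console.log(JSON.stringify(viaCharClass));
console.log(collapseSpaces(viaAlternation) === collapseSpaces(viaCharClass));
```

The lowercase é in the sample is matched by the character class only because of the i flag, since é itself is no longer listed.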

Method Chaining

As replace returns a string, any other string method can be called on it. Multiple calls to replace can be chained.

str.replace(someRegexOrString, someString)
    .replace(someOtherRegexOrString, someOtherString);

This is equivalent to

var temp = str.replace(someRegexOrString, someString);
var result = temp.replace(someOtherRegexOrString, someOtherString);

Replacing HTML

jQuery's html() accepts a function that receives the index of the element in the set and the element's current HTML as parameters. Whatever the function returns becomes the element's new HTML content.

The code can be written as

$('.rte').html(function(index, currentHTML) {
    return doSomeOperationOn(currentHTML);
});

Complete Code

With the above changes, the code becomes:

$(document).ready(function() {
    var regex = /\?ÕÌ_|_Œ‚|[ŠŽÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖØÙÚÛÜÝÞßðÿ_]+/gi;

    $('.rte').html(function(i, oldHTML) {
        return oldHTML.replace(regex, ' ')
            .replace(/[^\x00-\x7F]|\?/g, '');
    });
});

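$(document).ready(function() { is more readable than $(function() {, so you may also consider using the more expressive form. Be aware, though, that the jQuery documentation recommends only the shorter $(function() { syntax as of jQuery 3.0 and marks the others as deprecated.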

\$\endgroup\$
2
\$\begingroup\$

The code may be correct in itself, but it does the wrong thing.

If by strange you mean unknown to someone who only knows English, that's no excuse for removing any letters you don't know. Would you really want to look at street signs for Cafs (which were legitimate Cafés before)?

If you get strange character sequences like Ã¶ where an ö should be, that's an encoding problem and you need to fix it properly instead of hiding it.
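As a rough illustration of what fixing it properly can look like, the sketch below reverses one common kind of damage: UTF-8 bytes that were decoded as ISO-8859-1 (Latin-1). That encoding pair is an assumption, as is the helper name repairLatin1Mojibake. The characters in the question may well come from a different mismatch, and the more robust fix is to import the CSV with the correct encoding in the first place.

```javascript
// Hypothetical helper: undo UTF-8 text that was wrongly decoded as ISO-8859-1.
// Assumes every character of the damaged text stands for exactly one original byte.
function repairLatin1Mojibake(text) {
    var bytes = new Uint8Array(text.length);

    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        if (code > 0xFF) {
            // Not a single Latin-1 byte, so the assumption does not hold here.
            return text;
        }
        bytes[i] = code;
    }

    try {
        // fatal: true makes decode() throw on invalid UTF-8
        // instead of silently inserting replacement characters.
        return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    } catch (e) {
        // The bytes are not valid UTF-8, so leave the text untouched.
        return text;
    }
}

console.log(repairLatin1Mojibake('CafÃ©'));
console.log(repairLatin1Mojibake('Café'));
```

Text that is already correct, such as the second call, is returned unchanged because its bytes do not form valid UTF-8 under this interpretation.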

If you really have to keep your code, at least be honest and replace each unknown character with a question mark or the Unicode replacement character so that it is clearly visible that something unexpected happened here.
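A minimal sketch of that last suggestion, assuming the same "anything outside ASCII" test the original script uses. The function name markUnknownCharacters is made up for illustration.

```javascript
// Hypothetical helper: make unexpected characters visible instead of deleting them.
function markUnknownCharacters(text) {
    // \uFFFD is the Unicode replacement character.
    // Without the u flag the regex works on UTF-16 code units, so a character
    // outside the Basic Multilingual Plane yields two replacement characters.
    return text.replace(/[^\x00-\x7F]/g, '\uFFFD');
}

console.log(markUnknownCharacters('Café'));
```

Inside the html() callback this would stand in for the final replace call, so a reader of the page can see that something was lost rather than being shown a silently shortened word.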

\$\endgroup\$
