234

I have a string:

var string = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc"

And I would like to split this string with the delimiter <br /> followed by a special character.

To do that, I am using this:

string.split(/<br \/>&#?[a-zA-Z0-9]+;/g);

I am getting what I need, except that I am losing the delimiter. Here is the example: http://jsfiddle.net/JwrZ6/1/

How can I keep the delimiter?

5
  • if you know the delimiter beforehand, why not just do... var delim = "<br/>"; ? Commented Aug 17, 2012 at 7:59
  • Thank you @SiGanteng. I know the delimiter beforehand, but I can't make it work for my example. I need the delimiter to be <br /> followed by the special character, because sometimes I have a <br /> that is not followed by the special character, and that one must not be split. Commented Aug 17, 2012 at 8:02
  • 4
    Good question, I have a similar case where knowing the delimiter doesn't help. I'm splitting on "]&[". So really my delimiter is "&" but splitting on that is not precise enough, I need to get the brackets either side to determine a proper split. However, I need those brackets back in my split strings. 1 in each, either side.
    – PandaWood
    Commented Oct 23, 2014 at 0:44
  • 2
    @PandaWood So, you would use .split(/(?<=\])&(?=\[)/) these days. Commented Aug 3, 2021 at 6:54
  • Similar question (without regex): stackoverflow.com/q/4514144/9157799 Commented Feb 2, 2023 at 1:49

11 Answers

301

I was having a similar but slightly different problem. Anyway, here are examples of the different scenarios for where to keep the delimiter.

"1、2、3".split("、") == ["1", "2", "3"]
"1、2、3".split(/(、)/g) == ["1", "、", "2", "、", "3"]
"1、2、3".split(/(?=、)/g) == ["1", "、2", "、3"]
"1、2、3".split(/(?!、)/g) == ["1、", "2、", "3"]
"1、2、3".split(/(.*?、)/g) == ["", "1、", "", "2、", "3"]

Warning: The fourth will only work to split single characters. ConnorsFan presents an alternative:

// Split a path, but keep the slashes that follow directories
var str = 'Animation/rawr/javascript.js';
var tokens = str.match(/[^\/]+\/?|\//g);
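
To illustrate the warning with a longer input, here is a small sketch. The sample string "11、22、33" is borrowed from a comment on this answer, and the variable names are made up for illustration. The negative-lookahead form cuts between every pair of characters, while a match-based pattern in the same style as the one above keeps each delimiter attached to the element before it:

```javascript
const sample = "11、22、33";

// Negative lookahead splits at every position that is not followed by the
// delimiter, so multi-character elements fall apart.
const broken = sample.split(/(?!、)/);
console.log(broken); // ["1", "1、", "2", "2、", "3", "3"]

// match() with the g flag returns every "element plus optional delimiter" chunk.
const kept = sample.match(/[^、]+、?/g);
console.log(kept); // ["11、", "22、", "33"]
```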
10
  • 5
    I was looking for something like the third example, but this only works if the elements are only one character - it will split into individual characters otherwise. I had to go the tedious RegExp.exec route in the end.
    – Gordon
    Commented Dec 30, 2016 at 21:19
  • 5
    I don't understand why everybody is using /g Commented Jan 11, 2017 at 18:22
  • 2
    How would I use this regex "1、2、3".split(/(?!、)/g) == ["1、", "2、", "3"] for full words? For example "foo1, foo2, foo3,"
    – Waltari
    Commented Nov 6, 2017 at 10:17
  • 2
    Translation of the .match non-greedy solution for these examples: "11、22、33".match(/.*?、|.+$/g) -> ["11、", "22、", "33"]. Note /g modifier is crucial for match. Commented Apr 6, 2020 at 13:29
  • 1
    Perfect answer. #2 was exactly what I needed. I wish more questions were answered with concise examples covering multiple variants.
    – Matuszek
    Commented Oct 3, 2020 at 19:38
130

Use (positive) lookahead so that the regular expression asserts that the special character exists, but does not actually match it:

string.split(/<br \/>(?=&#?[a-zA-Z0-9]+;)/g);

See it in action:

var string = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc";
console.log(string.split(/<br \/>(?=&#?[a-zA-Z0-9]+;)/g));
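
As a quick check of the case raised in the comments on the question, where a <br /> that is not followed by a special character must not split, here is a sketch with a made-up input string. Note that the lookahead keeps the special character, while the <br /> itself is still consumed:

```javascript
// Hypothetical input: the first <br /> is not followed by an entity.
const mixed = "aaa<br />bbb<br />&dagger; ccc";

// Only the <br /> that is directly followed by an entity acts as a delimiter.
const pieces = mixed.split(/<br \/>(?=&#?[a-zA-Z0-9]+;)/);
console.log(pieces); // ["aaa<br />bbb", "&dagger; ccc"]
```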

5
  • When I use this code, it adds a 0 at the end of each string Commented Jul 24, 2019 at 20:56
  • 2
    I cannot find anything about positive lookahead in the link you gave. Commented Mar 23, 2020 at 20:55
  • @PaulJones the content was moved in the intervening time. Thanks for letting me know, I fixed the link.
    – Jon
    Commented Apr 15, 2020 at 16:11
  • 2
    Comment for myself when I come back to this: 'positive' lookahead is (?=)
    – Sam Lahm
    Commented Apr 5, 2021 at 8:28
  • Torsten Walter's answer is somewhat nicer since the delimiters are put into their own array item. Easier to operate on.
    – Simon E.
    Commented Nov 8, 2021 at 3:41
91

If you wrap the delimiter in parentheses, it will be part of the returned array.

string.split(/(<br \/>&#?[a-zA-Z0-9]+;)/g);
// returns ["aaaaaa", "<br />&dagger;", " bbbb", "<br />&Dagger;", " cccc"]

Depending on which part you want to keep, change which subgroup you capture:

string.split(/(<br \/>)&#?[a-zA-Z0-9]+;/g);
// returns ["aaaaaa", "<br />", " bbbb", "<br />", " cccc"]

You could improve the expression by ignoring the case of letters: string.split(/(<br \/>)&#?[a-z0-9]+;/gi);

You can also use predefined character classes: \d equals [0-9] and \w equals [a-zA-Z0-9_]. This means your expression could look like this:

string.split(/<br \/>(&#?[a-z\d]+;)/gi);
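
If the goal is to have each special character attached to the text that follows it, the captured pieces can be glued back on afterwards. A minimal sketch, assuming the captured-delimiter form above; the helper name joinPairs is made up for illustration:

```javascript
const input = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc";

// With one capturing group, the array alternates: text, delimiter, text, ...
const parts = input.split(/<br \/>(&#?[a-z\d]+;)/gi);
console.log(parts); // ["aaaaaa", "&dagger;", " bbbb", "&Dagger;", " cccc"]

// Hypothetical helper: prepend each captured delimiter to the text after it.
function joinPairs(alternating) {
  const merged = [alternating[0]];
  for (let i = 1; i < alternating.length; i += 2) {
    merged.push(alternating[i] + (alternating[i + 1] || ""));
  }
  return merged;
}

console.log(joinPairs(parts)); // ["aaaaaa", "&dagger; bbbb", "&Dagger; cccc"]
```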

There is a good Regular Expression Reference on JavaScriptKit.

4
  • 5
    Even better, I didn't know that we could keep only a part of the delimiter. In fact I need to keep only the special char, and I can do it with this: string.split(/<br \/>(&#?[a-zA-Z0-9]+;)/g); Commented Aug 17, 2012 at 8:16
  • 1
    You can optimize your expression by ignoring the case of words. Or match for a predefined character class. I'll update my answer. Commented Aug 17, 2012 at 8:44
  • 4
    Why is this so low? It's perfect and so flexible.
    – Tofandel
    Commented Nov 22, 2019 at 7:37
  • 5
    This is certainly the easiest way, and the most readable syntax. Commented Dec 31, 2019 at 14:04
6

If you group the split pattern, its match will be kept in the output and it is by design:

If separator is a regular expression with capturing parentheses, then each time separator matches, the results (including any undefined results) of the capturing parentheses are spliced into the output array.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split#description
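
The "including any undefined results" part matters when the pattern has alternatives with separate groups. A small sketch with a made-up input:

```javascript
// Two groups, only one of which participates in any given match.
const raw = "a1b-c".split(/(\d)|(-)/);
console.log(raw); // ["a", "1", undefined, "b", undefined, "-", "c"]

// Dropping the undefined placeholders leaves only text and delimiters.
const cleaned = raw.filter((piece) => piece !== undefined);
console.log(cleaned); // ["a", "1", "b", "-", "c"]
```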

You don't need a lookahead or global flag unless your search pattern uses one.

const str = `How much wood would a woodchuck chuck, if a woodchuck could chuck wood?`

const result = str.split(/(\s+)/);
console.log(result);

// We can verify the result
const isSame = result.join('') === str;
console.log({ isSame });

You can use multiple groups. You can be as creative as you like and what remains outside the groups will be removed:

const str = `How much wood would a woodchuck chuck, if a woodchuck could chuck wood?`

const result = str.split(/(\s+)(\w{1,2})\w+/);
console.log(result, result.join(''));
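
To see exactly what gets spliced in and what gets dropped, a shorter input (made up for illustration) helps. The empty strings are the empty text between two adjacent matches and after the last match:

```javascript
// Group 1 keeps the whitespace, group 2 keeps the first one or two word
// characters; the trailing \w+ is outside any group and is discarded.
const shortResult = "ab cdef gh".split(/(\s+)(\w{1,2})\w+/);
console.log(shortResult); // ["ab", " ", "cd", "", " ", "g", ""]
console.log(shortResult.join("")); // "ab cd g"
```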

1
5

I also answered this here: JavaScript Split Regular Expression keep the delimiter

Use the (?=pattern) lookahead in the regex. Example:

var string = '500x500-11*90~1+1';
string = string.replace(/(?=[$-/:-?{-~!"^_`\[\]])/gi, ",");
string = string.split(",");

this will give you the following result.

[ '500x500', '-11', '*90', '~1', '+1' ]

The original string can also be split directly, without the replace step:

string = string.split(/(?=[$-/:-?{-~!"^_`\[\]])/gi);

giving the same result

[ '500x500', '-11', '*90', '~1', '+1' ]
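
One difference between the two variants is worth keeping in mind: the character class range $-/ includes the comma, so the replace-then-split route gets confused by input that already contains commas, while the direct split does not. A sketch with a made-up input:

```javascript
// Same character class as above, with the slash escaped for readability.
const symbols = /(?=[$-\/:-?{-~!"^_`\[\]])/g;
const text = "1,000-5";

// Route 1: insert commas before each symbol, then split on commas.
const viaReplace = text.replace(symbols, ",").split(",");
console.log(viaReplace); // ["1", "", "000", "-5"]

// Route 2: split directly on the lookahead.
const direct = text.split(symbols);
console.log(direct); // ["1", ",000", "-5"]
```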
2
  • Why not just immediately split, as in Jon's accepted answer?
    – Gordon
    Commented Dec 30, 2016 at 21:21
  • @Gordon... :) I could just do that... updated the code... Cheers
    – Fry
    Commented Jan 3, 2017 at 12:06
5

I made a modification to jichi's answer, and put it in a function which also supports multiple letters.

String.prototype.splitAndKeep = function(separator, method='seperate'){
    var str = this;
    if(method == 'seperate'){
        str = str.split(new RegExp(`(${separator})`, 'g'));
    }else if(method == 'infront'){
        str = str.split(new RegExp(`(?=${separator})`, 'g'));
    }else if(method == 'behind'){
        str = str.split(new RegExp(`(.*?${separator})`, 'g'));
        str = str.filter(function(el){return el !== "";});
    }
    return str;
};

The 3rd method from jichi's answer would not work in this function, so I took the 4th method and filtered out the empty strings to get the same result.

Edit: a second version, which accepts an array in order to split on char1 or char2:

String.prototype.splitAndKeep = function(separator, method='seperate'){
    var str = this;
    function splitAndKeep(str, separator, method='seperate'){
        if(method == 'seperate'){
            str = str.split(new RegExp(`(${separator})`, 'g'));
        }else if(method == 'infront'){
            str = str.split(new RegExp(`(?=${separator})`, 'g'));
        }else if(method == 'behind'){
            str = str.split(new RegExp(`(.*?${separator})`, 'g'));
            str = str.filter(function(el){return el !== "";});
        }
        return str;
    }
    if(Array.isArray(separator)){
        var parts = splitAndKeep(str, separator[0], method);
        for(var i = 1; i < separator.length; i++){
            var partsTemp = parts;
            parts = [];
            for(var p = 0; p < partsTemp.length; p++){
                parts = parts.concat(splitAndKeep(partsTemp[p], separator[i], method));
            }
        }
        return parts;
    }else{
        return splitAndKeep(str, separator, method);
    }
};

usage:

str = "first1-second2-third3-last";

str.splitAndKeep(["1", "2", "3"]) == ["first", "1", "-second", "2", "-third", "3", "-last"];

str.splitAndKeep("-") == ["first1", "-", "second2", "-", "third3", "-", "last"];
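
Because the separator is interpolated into a RegExp, characters such as + or . are treated as regex syntax. If the separators are meant as literal text, they can be escaped first. A sketch, where escapeRegExp and splitAndKeepLiteral are made-up helper names and not part of the function above:

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split on a literal separator and keep it as its own array element.
function splitAndKeepLiteral(str, separator) {
  return str.split(new RegExp(`(${escapeRegExp(separator)})`, "g"));
}

console.log(splitAndKeepLiteral("1+2+3", "+")); // ["1", "+", "2", "+", "3"]
console.log(splitAndKeepLiteral("a.b", "."));   // ["a", ".", "b"]
```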
2
  • Very useful, thank you! Just for people passing by: this splits newline characters off as separate elements. If you do not want this behaviour, use 'gs' instead of 'g'. Commented Dec 1, 2021 at 9:19
  • You really helped me go out of my bug. Thanks
    – rozacek
    Commented Oct 5, 2022 at 13:16
5

Most of the existing answers predate the introduction of lookbehind assertions in JavaScript in 2018. You didn't specify how you wanted the delimiters to be included in the result. One typical use case would be sentences delimited by punctuation ([.?!]), where one would want the delimiters to be included at the ends of the resulting strings. This corresponds to the fourth case in the accepted answer, but as noted there, that solution only works for single characters. Arbitrary strings with the delimiters appended at the end can be formed with a lookbehind assertion:

'It is. Is it? It is!'.split(/(?<=[.?!])/)
/* [ 'It is.', ' Is it?', ' It is!' ] */
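
The same idea extends to multi-character delimiters and to dropping the whitespace that follows the punctuation. A sketch (the second input is adapted from a comment on the accepted answer):

```javascript
// Consume optional whitespace after the punctuation so pieces come out trimmed.
const sentences = "It is. Is it? It is!".split(/(?<=[.?!])\s*/);
console.log(sentences); // ["It is.", "Is it?", "It is!"]

// A two-character delimiter (comma plus space) kept at the end of each piece.
const words = "foo1, foo2, foo3".split(/(?<=, )/);
console.log(words); // ["foo1, ", "foo2, ", "foo3"]
```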
1
  • This is probably the best answer of this thread. Lookbehind/lookahead can also be more powerful than this Commented Aug 29, 2023 at 7:41
3

I know that this is a bit late, but you could also use lookarounds:

var string = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc";
var array = string.split(/(?<=<br \/>)/);
console.log(array);
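
This splits after every <br />. To split only where the <br /> is followed by a special character, as the question asks, a lookbehind and a lookahead can be combined. A sketch with a made-up input:

```javascript
const html = "aaa<br />bbb<br />&dagger; ccc";

// Zero-width match: nothing is consumed, so both sides keep their text.
const chunks = html.split(/(?<=<br \/>)(?=&#?[a-zA-Z0-9]+;)/);
console.log(chunks); // ["aaa<br />bbb<br />", "&dagger; ccc"]
```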

1

An extension function that splits a string by a substring or RegExp. The second parameter decides whether each delimiter is kept at the start of the following piece (ahead) or at the end of the preceding piece (the default).

    String.prototype.splitKeep = function (splitter, ahead) {
        var self = this;
        var result = [];
        if (splitter != '') {
            var matches = [];
            // Getting matched value and its index
            var replaceName = splitter instanceof RegExp ? "replace" : "replaceAll";
            var r = self[replaceName](splitter, function (m, i, e) {
                matches.push({ value: m, index: i });
                return getSubst(m);
            });
            // Finds split substrings
            var lastIndex = 0;
            for (var i = 0; i < matches.length; i++) {
                var m = matches[i];
                var nextIndex = ahead == true ? m.index : m.index + m.value.length;
                if (nextIndex != lastIndex) {
                    var part = self.substring(lastIndex, nextIndex);
                    result.push(part);
                    lastIndex = nextIndex;
                }
            };
            if (lastIndex < self.length) {
                var part = self.substring(lastIndex, self.length);
                result.push(part);
            };
            // Substitution of matched string
            function getSubst(value) {
                var substChar = value[0] == '0' ? '1' : '0';
                var subst = '';
                for (var i = 0; i < value.length; i++) {
                    subst += substChar;
                }
                return subst;
            };
        }
        else {
            // Arrays have push(), not add(); String() unwraps the String object
            result.push(String(self));
        };
        return result;
    };

The test:

    test('splitKeep', function () {
        // String
        deepEqual("1231451".splitKeep('1'), ["1", "231", "451"]);
        deepEqual("123145".splitKeep('1', true), ["123", "145"]);
        deepEqual("1231451".splitKeep('1', true), ["123", "145", "1"]);
        deepEqual("hello man how are you!".splitKeep(' '), ["hello ", "man ", "how ", "are ", "you!"]);
        deepEqual("hello man how are you!".splitKeep(' ', true), ["hello", " man", " how", " are", " you!"]);
        // Regex
        deepEqual("mhellommhellommmhello".splitKeep(/m+/g), ["m", "hellomm", "hellommm", "hello"]);
        deepEqual("mhellommhellommmhello".splitKeep(/m+/g, true), ["mhello", "mmhello", "mmmhello"]);
    });
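
For comparison, roughly the same behaviour can be sketched without touching String.prototype by walking the matches with matchAll. The function name splitKeepSketch is made up, it accepts only a RegExp (not a substring), and it is meant as an illustration rather than a drop-in replacement:

```javascript
// Hypothetical helper: split `text` on `pattern`, keeping each delimiter
// at the end of the preceding piece, or at the start of the next if `ahead`.
function splitKeepSketch(text, pattern, ahead = false) {
  // matchAll requires a global regex, so add the g flag if it is missing.
  const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
  const regex = new RegExp(pattern.source, flags);
  const parts = [];
  let last = 0;
  for (const match of text.matchAll(regex)) {
    if (match[0].length === 0) continue; // ignore empty matches
    const cut = ahead ? match.index : match.index + match[0].length;
    if (cut !== last) {
      parts.push(text.slice(last, cut));
      last = cut;
    }
  }
  if (last < text.length) parts.push(text.slice(last));
  return parts;
}

console.log(splitKeepSketch("mhellommhellommmhello", /m+/));
// ["m", "hellomm", "hellommm", "hello"]
console.log(splitKeepSketch("mhellommhellommmhello", /m+/, true));
// ["mhello", "mmhello", "mmmhello"]
```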
1

I've been using this:

String.prototype.splitBy = function (delimiter) {
  var 
    delimiterPATTERN = '(' + delimiter + ')', 
    delimiterRE = new RegExp(delimiterPATTERN, 'g');

  return this.split(delimiterRE).reduce((chunks, item) => {
    if (item.match(delimiterRE)){
      chunks.push(item)
    } else {
      chunks[chunks.length - 1] += item
    };
    return chunks
  }, [])
}

Except that you shouldn't mess with String.prototype, so here's a function version:

var splitBy = function (text, delimiter) {
  var 
    delimiterPATTERN = '(' + delimiter + ')', 
    delimiterRE = new RegExp(delimiterPATTERN, 'g');

  return text.split(delimiterRE).reduce(function(chunks, item){
    if (item.match(delimiterRE)){
      chunks.push(item)
    } else {
      chunks[chunks.length - 1] += item
    };
    return chunks
  }, [])
}

So you could do:

var haystack = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc"
var needle =  '<br \/>&#?[a-zA-Z0-9]+;';
var result = splitBy(haystack , needle)
console.log( JSON.stringify( result, null, 2) )

And you'll end up with:

[
  "<br />&dagger; bbbb",
  "<br />&Dagger; cccc"
]
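
Note that "aaaaaa" is missing from this output: the reducer starts from an empty array, so the text before the first delimiter has no chunk to be appended to. If that leading text should be kept, a lookahead split is one option. A sketch using the same haystack (the variable names are made up):

```javascript
const haystack2 = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc";

// Split just before each delimiter, so the delimiter starts the next chunk.
const withLead = haystack2.split(/(?=<br \/>&#?[a-zA-Z0-9]+;)/);
console.log(withLead);
// ["aaaaaa", "<br />&dagger; bbbb", "<br />&Dagger; cccc"]
```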
-3

I also came up with this solution. No regex needed, very readable.

const str = "hello world what a great day today balbla"
const separatorIndex = str.indexOf("great")
const parsedString = str.slice(separatorIndex)

console.log(parsedString)
