Revisions to How to capture multiple repeated groups?

deleted 30 characters in body

edited Oct 3, 2023 at 18:08

The problem with the attempted code, as discussed, is that a single capture group matches repeatedly, so in the end only the last match is kept.
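To see this behaviour concretely, here is a minimal sketch. The question's exact regex isn't reproduced here, so the pattern below is only a hypothetical stand-in that has the same shape: one capture group inside a repeated construct.

```python
import re

string = "HELLO,THERE,WORLD"

# Hypothetical stand-in for the attempted regex: a single capture
# group placed inside a repeated (non-capturing) group
repeated = re.match(r"^(?:([A-Z]+),?)+$", string)

# The group is overwritten on each repetition, so only the
# capture from the final repetition survives
last_only = repeated.group(1)

print(last_only)  # WORLD
```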

Instead, instruct the regex to match (and capture) all instances of the pattern in the string, which can be done in any regex implementation (language). So the task is to come up with a pattern that describes one such instance.

The defining property of the shown sample data is that the patterns of interest are separated by commas, so we can match anything-but-a-comma using a negated character class

[^,]+

and match (capture) globally, to get all matches in the string.

If your pattern needs to be more restrictive, adjust the exclusion list. For example, to capture words separated by any of the listed punctuation characters

[^,.!-]+

This extracts all words from hi,there-again!, without the punctuation. (A literal - should be given first or last in a character class, or be escaped, since anywhere else it forms a range like a-z or 0-9.)
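As a quick check of the wider exclusion list, the following sketch runs it over the sample string mentioned above. The variable names are made up for illustration, and the second pattern shows the escaped-hyphen alternative, which should behave the same way.

```python
import re

text = "hi,there-again!"

# Hyphen placed last in the class, so it is taken literally
words = re.findall(r"[^,.!-]+", text)

# Equivalent class with the hyphen escaped in the middle
words_escaped = re.findall(r"[^,\-.!]+", text)

print(words)          # ['hi', 'there', 'again']
print(words_escaped)  # ['hi', 'there', 'again']
```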

In Python

import re

string = "HELLO,THERE,WORLD"

pattern = r"([^,]+)"
matches = re.findall(pattern, string)  # every non-overlapping match, as a list

print(matches)  # ['HELLO', 'THERE', 'WORLD']

In Perl (and many other compatible systems)

use warnings;
use strict;
use feature 'say';

my $string = 'HELLO,THERE,WORLD';

my @matches = $string =~ /([^,]+)/g;  # with /g in list context, returns all captures

say "@matches";  # HELLO THERE WORLD

(In this specific example the capturing parentheses aren't actually needed, since we collect everything that is matched. But they don't hurt, and they are needed whenever only a part of each match is to be kept.)
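A short sketch of when the parentheses do matter, in Python. The key=value sample string is invented for illustration and is not from the question. With no group, re.findall returns whole matches; with one group, only the captured part; with several groups, a tuple per match.

```python
import re

pairs = "width=10,height=20,depth=5"  # made-up sample data

# No capture group: each whole match is returned
whole = re.findall(r"\w+=\d+", pairs)

# One capture group: only the captured part of each match is returned
keys = re.findall(r"(\w+)=\d+", pairs)

# Two capture groups: one tuple per match
items = re.findall(r"(\w+)=(\d+)", pairs)

print(whole)  # ['width=10', 'height=20', 'depth=5']
print(keys)   # ['width', 'height', 'depth']
print(items)  # [('width', '10'), ('height', '20'), ('depth', '5')]
```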

The approach above works as it stands for other patterns as well, including the one attempted in the question (as long as you remove the anchors, which make it too specific). The most common case is to capture all words (usually meaning runs of characters from [a-zA-Z0-9_]), with the pattern \w+. Or, as in the question, get only the substrings of upper-case ASCII letters, with [A-Z]+.
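Both of these patterns can be tried the same way. The sample string below is an assumption, chosen so that the two patterns give different results. Note that in Python 3 \w on a str pattern also matches non-ASCII word characters unless re.ASCII is given.

```python
import re

text = "HELLO,there_2,WORLD"  # made-up sample with mixed content

# All "words": runs of letters, digits and underscore
words = re.findall(r"\w+", text)

# Only runs of upper-case ASCII letters
upper = re.findall(r"[A-Z]+", text)

print(words)  # ['HELLO', 'there_2', 'WORLD']
print(upper)  # ['HELLO', 'WORLD']
```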

added 374 characters in body

edited Feb 15, 2023 at 21:34

zdim

added 62 characters in body

edited Feb 15, 2023 at 21:20

zdim

deleted 21 characters in body

edited Aug 8, 2022 at 18:09

zdim

created Mar 8, 2022 at 18:07

zdim
