
Regex

From Wikipedia
The search result is obtained with the regex
(?<=\.) {2,}(?=[A-Z])
At least two spaces must be found, but only where they occur after a period (.) and before an uppercase letter.
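
The pattern from the caption can be tried out as follows. This is a minimal sketch using Python's `re` module rather than Perl; the sample sentence and the variable names are made up for illustration.

```python
import re

# Two or more spaces, preceded by a period and followed by an uppercase
# ASCII letter. The lookbehind and lookahead are zero-width, so only the
# spaces themselves are part of the match.
pattern = re.compile(r"(?<=\.) {2,}(?=[A-Z])")

text = "First sentence.  Second sentence.   Third one. Fine.  lower"

# Collect the matched runs of spaces, then collapse each run to one space.
runs = [m.group() for m in pattern.finditer(text)]
normalized = pattern.sub(" ", text)

print(runs)
print(normalized)
```

The single space after "one." and the double space before the lowercase "lower" are left alone, because they fail the quantifier and the lookahead respectively.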

A regex or regular expression (Bavarian: Regulära Ausdruck) is a sequence of characters that defines a search pattern.

Regexes are used in software development, but also in text editors, where they serve to find and replace strings. For example, you can search a Wikipedia for all words that start with A and end with -bichl. Which characters lie in between does not matter. A search like this is only possible with a regex.
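
The "starts with A, ends with -bichl" search could look roughly like this. It is a sketch in Python's `re` module; the sample words are invented for the demonstration and are not taken from any real word list.

```python
import re

# \b marks a word boundary; \w* allows any number of word characters
# between the leading "A" and the trailing "bichl".
pattern = re.compile(r"\bA\w*bichl\b")

sample = "Aubichl, Ameisbichl and Abichl match; Kitzbichl and Aubichler do not."
found = pattern.findall(sample)

print(found)
```

"Kitzbichl" is skipped because it does not start with A, and "Aubichler" because the word does not end right after "bichl".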

The syntax of regexes varies slightly between applications.

Simple regexes

Operator Effect
. The dot operator matches any character.
[ ] A box (bracket expression) matches a single character that is listed inside the brackets.
[^ ] A complement box matches a single character that is not listed inside the brackets.
^ A caret anchor matches the start of a line (or of every line in multiline mode).
$ A dollar anchor matches the end of a line (or of every line in multiline mode).
( ) Parentheses define a marked subexpression. The text segment it matched can be recalled later.
\n n is a digit from 1 to 9; matches what the nth marked subexpression matched. This operator does not exist in the extended regex syntax.
* A single-character expression followed by "*" matches zero or more copies of that expression. For example, "ab*c" matches "ac", "abc", "abbbc", etc. "[xyz]*" matches "", "x", "y", "zx", "zyx", and so on.
  • \n*, where n is a digit from 1 to 9, matches zero or more iterations of what the nth marked subexpression matched. For example, "(a.)c\1*" matches "abcab" and "abcabab" but not "abcac".
  • An expression enclosed in "\(" and "\)" and followed by "*" is considered invalid.
  • "^[MH]uad"
    • Matches Muad and Huad, but only at the start of a line.
  • "[MH]uad$"
    • Matches Muad and Huad, but only at the end of a line.
[egh] one of the characters "e", "g" or "h"
[0-6] a digit from "0" to "6" (the hyphen denotes a range)
[A-Za-z0-9] any Latin letter or any digit
[^a] any character except "a" ("^" at the start of a character class means negation)
[-A-Z], [A-Z-] (or [A-Z\-a-z], but not under POSIX) the set also contains the hyphen "-"
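
Several of the operators listed above can be checked in a few lines. This is a sketch in Python's `re` module (not Perl), so the flag `re.MULTILINE` stands in for "multiline mode" and groups are written with plain parentheses; all variable names are illustrative.

```python
import re

lines = "Muad\nHuad\nder Huad\n"

# With re.MULTILINE, ^ and $ anchor at the start and end of every line.
starts = re.findall(r"^[MH]uad", lines, flags=re.MULTILINE)
ends = re.findall(r"[MH]uad$", lines, flags=re.MULTILINE)

# * repeats the preceding element zero or more times.
star_ok = [s for s in ("ac", "abc", "abbbc", "adc") if re.fullmatch(r"ab*c", s)]

# \1 refers back to the text matched by the first parenthesized group.
backref_ok = [s for s in ("abcab", "abcabab", "abcac")
              if re.fullmatch(r"(a.)c\1*", s)]

# A leading ^ negates a bracket expression; a leading - is a literal hyphen.
not_a = re.findall(r"[^a]", "banana")
with_hyphen = re.findall(r"[-A-Z]", "A-b")

print(starts, ends, star_ok, backref_ok, not_a, with_hyphen)
```

`re.fullmatch` is used for the "matches / does not match" claims so that a string only counts when the whole string fits the pattern.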

There are predefined character classes. However, not all implementations support them in the same way. Examples of character classes are:

\d digit: a digit, i.e. [0-9] (and possibly further numeral characters, e.g. from Unicode)
\D non-digit: a character that is not a digit, i.e. [^\d]
\w word character: a letter, a digit or an underscore, i.e. [a-zA-Z_0-9] (and possibly also non-Latin letters, e.g. umlauts)
\W non-word character: a character that is neither a letter, a digit nor an underscore, i.e. [^\w]
\s whitespace: usually at least the space and the control characters \f, \n, \r, \t and \v
\S non-whitespace: a character that is not whitespace, i.e. [^\s]
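
The remark that implementations differ can be seen in Python's `re` module, used here as one example implementation: by default `\w` also accepts non-Latin letters such as umlauts, while the `re.ASCII` flag restricts it to [a-zA-Z0-9_]. The sample strings are made up.

```python
import re

text = "Room_12\tis free"

digits = re.findall(r"\d", text)      # single digits
words = re.findall(r"\w+", text)      # runs of letters, digits, underscore
spaces = re.findall(r"\s", text)      # tab and space
non_words = re.findall(r"\W", text)   # everything that \w rejects

# Unicode-aware by default, ASCII-only with re.ASCII.
unicode_words = re.findall(r"\w+", "Größe")
ascii_words = re.findall(r"\w+", "Größe", flags=re.ASCII)

print(digits, words, spaces, non_words, unicode_words, ascii_words)
```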

Character classes according to the POSIX standard

POSIX Non-standard Perl/Tcl Vim Java ASCII Description
[:ascii:][1] \p{ASCII} [\x00-\x7F] ASCII characters
[:alnum:] \p{Alnum} [A-Za-z0-9] Alphanumeric characters
[:word:][1] \w \w \w [A-Za-z0-9_] Alphanumeric characters plus "_"
\W \W \W [^A-Za-z0-9_] Non-word characters
[:alpha:] \a \p{Alpha} [A-Za-z] Alphabetic characters
[:blank:] \s \p{Blank} [ \t] Space and tab
\b \< \> \b (?<=\W)(?=\w)|(?<=\w)(?=\W) Word boundaries
\B (?<=\W)(?=\W)|(?<=\w)(?=\w) Non-word boundaries
[:cntrl:] \p{Cntrl} [\x00-\x1F\x7F] Control characters
[:digit:] \d \d \p{Digit} or \d [0-9] Digits
\D \D \D [^0-9] Non-digits
[:graph:] \p{Graph} [\x21-\x7E] Visible characters
[:lower:] \l \p{Lower} [a-z] Lowercase letters
[:print:] \p \p{Print} [\x20-\x7E] Visible characters and the space character
[:punct:] \p{Punct} [][!"#$%&'()*+,./:;<=>?@\^_`{|}~-] Punctuation characters
[:space:] \s \_s \p{Space} or \s [ \t\r\n\v\f] Whitespace characters
\S \S \S [^ \t\r\n\v\f] Non-whitespace characters
[:upper:] \u \p{Upper} [A-Z] Uppercase letters
[:xdigit:] \x \p{XDigit} [A-Fa-f0-9] Hexadecimal digits
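
In an engine without POSIX bracket classes, the ASCII column of the table can be used directly. Python's `re` module is such an engine, so the sketch below maps a few class names to their ASCII equivalents; the dictionary `POSIX_ASCII` and the helper `posix_findall` are hypothetical names introduced only for this illustration.

```python
import re

# ASCII equivalents copied from the table above (a subset).
POSIX_ASCII = {
    "alnum": r"[A-Za-z0-9]",
    "alpha": r"[A-Za-z]",
    "blank": r"[ \t]",
    "digit": r"[0-9]",
    "lower": r"[a-z]",
    "upper": r"[A-Z]",
    "xdigit": r"[A-Fa-f0-9]",
}

def posix_findall(name: str, text: str) -> list[str]:
    """Return every character of text that belongs to the named class."""
    return re.findall(POSIX_ASCII[name], text)

hex_chars = posix_findall("xdigit", "0xBEEF, go!")
upper_chars = posix_findall("upper", "Hello World")

print(hex_chars, upper_chars)
```

Note that these equivalents cover ASCII only; a locale-aware POSIX implementation may classify further characters.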

Quantifiers (repetition factors) specify how often the preceding expression, i.e. the preceding character or group of characters, may occur.

? The preceding expression is optional: it may occur, but does not have to. That is, it occurs zero times or once. (Equivalent to {0,1}.)
+ The preceding expression must occur at least once, but may occur more often. (Equivalent to {1,}.)
* The preceding expression may occur any number of times, including not at all. (Equivalent to {0,}.)
{n} The preceding expression must occur exactly n times. (Equivalent to {n,n}.)
{min,} The preceding expression must occur at least min times.
{min,max} The preceding expression must occur at least min times and may occur at most max times.
{0,max} The preceding expression may occur at most max times.
  • a+ matches "a" but also "aaaa"
  • [0-9]+ matches "0123456789" but also "072345"
  • [ab]+ matches "a", "b", "aa", "bbaab", etc.
  • [0-9]{2,5} matches at least two and at most five digits, e.g. "91" or "63091"
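
The equivalences and examples above can be verified mechanically. The following sketch uses Python's `re` module; the helper `full` is a made-up convenience function that checks whether an entire string fits a pattern.

```python
import re

def full(pattern: str, s: str) -> bool:
    """Return True if the whole string s matches pattern."""
    return re.fullmatch(pattern, s) is not None

# The shorthand quantifiers behave like their {min,max} spellings.
samples = ["", "a", "aa", "aaa"]
equivalent = all(
    [full(short, s) for s in samples] == [full(explicit, s) for s in samples]
    for short, explicit in [("a?", "a{0,1}"), ("a+", "a{1,}"), ("a*", "a{0,}")]
)

# [0-9]{2,5} accepts two to five digits, nothing shorter or longer.
in_range = [s for s in ("7", "91", "63091", "123456") if full(r"[0-9]{2,5}", s)]

# [ab]+ accepts any non-empty mix of "a" and "b".
ab_ok = [s for s in ("a", "b", "aa", "bbaab", "abc") if full(r"[ab]+", s)]

print(equivalent, in_range, ab_ok)
```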

Practical examples

Operator Description Example
. Normally matches any character except a newline.
Inside square brackets the dot is literal.
$string1 = "Hello World\n";
if ($string1 =~ m/...../) {
  print "$string1 has length >= 5.\n";
}

Output:

Hello World
 has length >= 5.
( ) Groups a series of pattern elements into a single element.
When the expression inside the parentheses matches, the matched text can later be accessed through $1, $2, ...
$string1 = "Hello World\n";
if ($string1 =~ m/(H..).(o..)/) {
  print "We matched '$1' and '$2'.\n";
}

Output:

We matched 'Hel' and 'o W'.
+ Matches the preceding element one or more times.
$string1 = "Hello World\n";
if ($string1 =~ m/l+/) {
  print "There are one or more consecutive letter \"l\"'s in $string1.\n";
}

Output:

There are one or more consecutive letter "l"'s in Hello World.
? Matches the preceding element zero times or once.
$string1 = "Hello World\n";
if ($string1 =~ m/H.?e/) {
  print "There is an 'H' and a 'e' separated by ";
  print "0-1 characters (e.g., He Hue Hee).\n";
}

Output:

There is an 'H' and a 'e' separated by 0-1 characters (e.g., He Hue Hee).
? Modifies the *, +, ? or {M,N} quantifier that precedes it so that it matches as few times as possible (non-greedy match).
$string1 = "Hello World\n";
if ($string1 =~ m/(l.+?o)/) {
  print "The non-greedy match with 'l' followed by one or\n";
  print "more characters is 'llo' rather than 'llo Wo'.\n";
}

Output:

The non-greedy match with 'l' followed by one or
more characters is 'llo' rather than 'llo Wo'.
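
For comparison, the greedy and the non-greedy variant can be run side by side. This sketch uses Python's `re` module instead of Perl, with the same input string as the example above.

```python
import re

s = "Hello World\n"

# Greedy: .+ grabs as much as possible, then backs off to the last "o".
greedy = re.search(r"l.+o", s).group()

# Non-greedy: .+? takes as little as possible, stopping at the first "o".
lazy = re.search(r"l.+?o", s).group()

print(repr(greedy), repr(lazy))
```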
* Matches the preceding element zero or more times.
$string1 = "Hello World\n";
if ($string1 =~ m/el*o/) {
  print "There is an 'e' followed by zero to many ";
  print "'l' followed by 'o' (e.g., eo, elo, ello, elllo).\n";
}

Output:

There is an 'e' followed by zero to many 'l' followed by 'o' (e.g., eo, elo, ello, elllo).
{M,N} Defines the minimum M and the maximum N match count.
N can be omitted and M can be 0: {M} matches "exactly" M times; {M,} matches "at least" M times; {0,N} matches "at most" N times.
x* y+ z? is thus equivalent to x{0,} y{1,} z{0,1}.
$string1 = "Hello World\n";
if ($string1 =~ m/l{1,2}/) {
  print "There exists a substring with at least 1 ";
  print "and at most 2 l's in $string1\n";
}

Output:

There exists a substring with at least 1 and at most 2 l's in Hello World
[…] Defines a set of possible character matches.
$string1 = "Hello World\n";
if ($string1 =~ m/[aeiou]+/) {
  print "$string1 contains one or more vowels.\n";
}

Output:

Hello World
 contains one or more vowels.
| Separates alternative possibilities.
$string1 = "Hello World\n";
if ($string1 =~ m/(Hello|Hi|Pogo)/) {
  print "$string1 contains at least one of Hello, Hi, or Pogo.";
}

Output:

Hello World
 contains at least one of Hello, Hi, or Pogo.
\b Matches a zero-width boundary between a word-class character (see below) and either a non-word-class character or an edge; same as

(^\w|\w$|\W\w|\w\W).

$string1 = "Hello World\n";
if ($string1 =~ m/llo\b/) {
  print "There is a word that ends with 'llo'.\n";
}

Output:

There is a word that ends with 'llo'.
\w Matches an alphanumeric character, including "_";
same as [A-Za-z0-9_] in ASCII, and
[\p{Alphabetic}\p{GC=Mark}\p{GC=Decimal_Number}\p{GC=Connector_Punctuation}]

in Unicode, where Alphabetic covers more than Latin letters and Decimal_Number covers more than Arabic digits.

$string1 = "Hello World\n";
if ($string1 =~ m/\w/) {
  print "There is at least one alphanumeric ";
  print "character in $string1 (A-Z, a-z, 0-9, _).\n";
}

Output:

There is at least one alphanumeric character in Hello World
 (A-Z, a-z, 0-9, _).
\W Matches a non-alphanumeric character, excluding "_";
same as [^A-Za-z0-9_] in ASCII, and
[^\p{Alphabetic}\p{GC=Mark}\p{GC=Decimal_Number}\p{GC=Connector_Punctuation}]

in Unicode.

$string1 = "Hello World\n";
if ($string1 =~ m/\W/) {
  print "The space between Hello and ";
  print "World is not alphanumeric.\n";
}

Output:

The space between Hello and World is not alphanumeric.
\s Matches a whitespace character,
which in ASCII are tab, line feed, form feed, carriage return, and space; in Unicode it also matches no-break spaces, next line, and the variable-width spaces (among others).
$string1 = "Hello World\n";
if ($string1 =~ m/\s.*\s/) {
  print "In $string1 there are TWO whitespace characters, which may";
  print " be separated by other characters.\n";
}

Output:

In Hello World
 there are TWO whitespace characters, which may be separated by other characters.
\S Matches anything but whitespace.
$string1 = "Hello World\n";
if ($string1 =~ m/\S.*\S/) {
  print "In $string1 there are TWO non-whitespace characters, which";
  print " may be separated by other characters.\n";
}

Output:

In Hello World
 there are TWO non-whitespace characters, which may be separated by other characters.
\d Matches a digit;
same as [0-9] in ASCII;
in Unicode, same as \p{Digit} or \p{GC=Decimal_Number}, which itself is the same as \p{Numeric_Type=Decimal}.
$string1 = "99 bottles of beer on the wall.";
if ($string1 =~ m/(\d+)/) {
  print "$1 is the first number in '$string1'\n";
}

Output:

99 is the first number in '99 bottles of beer on the wall.'
\D Matches a non-digit;
same as [^0-9] in ASCII or \P{Digit} in Unicode.
$string1 = "Hello World\n";
if ($string1 =~ m/\D/) {
  print "There is at least one character in $string1";
  print " that is not a digit.\n";
}

Output:

There is at least one character in Hello World
 that is not a digit.
^ Matches the beginning of a line or string.
$string1 = "Hello World\n";
if ($string1 =~ m/^He/) {
  print "$string1 starts with the characters 'He'.\n";
}

Output:

Hello World
 starts with the characters 'He'.
$ Matches the end of a line or string.
$string1 = "Hello World\n";
if ($string1 =~ m/rld$/) {
  print "$string1 is a line or string ";
  print "that ends with 'rld'.\n";
}

Output:

Hello World
 is a line or string that ends with 'rld'.
\A Matches the beginning of a string (but not an internal line).
$string1 = "Hello\nWorld\n";
if ($string1 =~ m/\AH/) {
  print "$string1 is a string ";
  print "that starts with 'H'.\n";
}

Output:

Hello
World
 is a string that starts with 'H'.
\z Matches the end of a string (but not an internal line).[2]
$string1 = "Hello\nWorld\n";
if ($string1 =~ m/d\n\z/) {
  print "$string1 is a string ";
  print "that ends with 'd\\n'.\n";
}

Output:

Hello
World
 is a string that ends with 'd\n'.
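
The difference between the line anchors (^, $) and the string anchors (\A, \z) shows up as soon as the input has a trailing newline or several lines. The sketch below uses Python's `re` module, where the anchor spelled `\Z` behaves like Perl's `\z` (it matches only at the very end of the string); the variable names are illustrative.

```python
import re

s = "Hello\nWorld\n"

# $ also matches just before a trailing newline, so this succeeds.
dollar = re.search(r"World$", s) is not None

# \Z matches only at the very end, after the newline, so this fails ...
strict_end = re.search(r"World\Z", s) is not None

# ... unless the pattern accounts for the newline explicitly.
with_newline = re.search(r"World\n\Z", s) is not None

# \A never matches at an internal line start; ^ does in multiline mode.
string_start = re.search(r"\AWorld", s) is not None
line_start = re.search(r"^World", s, flags=re.MULTILINE) is not None

print(dollar, strict_end, with_newline, string_start, line_start)
```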
[^…] Matches every character except the ones inside brackets.
$string1 = "Hello World\n";
if ($string1 =~ m/[^abc]/) {
  print "$string1 contains a character other than ";
  print "a, b, and c.\n";
}

Output:

Hello World
 contains a character other than a, b, and c.
  1. 33.3.1.2 Character Classes. Emacs Lisp manual, version 25.1. In: gnu.org. 2016. Retrieved 13 April 2017.
  2. Damian Conway: Regular Expressions, End of String. In: Perl Best Practices, p. 240, O'Reilly 2005, ISBN 978-0-596-00173-5