
I want to extract substrings from a string that match a regex pattern.

So I'm looking for something like this:

func matchesForRegexInText(regex: String!, text: String!) -> [String] {
   ???
}

So this is what I have:

func matchesForRegexInText(regex: String!, text: String!) -> [String] {

    var regex = NSRegularExpression(pattern: regex, 
        options: nil, error: nil)

    var results = regex.matchesInString(text, 
        options: nil, range: NSMakeRange(0, countElements(text))) 
            as Array<NSTextCheckingResult>

    /// ???

    return ...
}

The problem is that matchesInString returns an array of NSTextCheckingResult, where NSTextCheckingResult.range is of type NSRange.

NSRange is incompatible with Range<String.Index>, so it prevents me from using text.substringWithRange(...).

Any idea how to achieve this simple thing in Swift without too many lines of code?


16 Answers

385

Even though the matchesInString() method takes a String as its first argument, it works internally with NSString, so the range parameter must be given in terms of the NSString length (UTF-16 code units), not the Swift string length. Otherwise it will fail for "extended grapheme clusters" such as flag emoji.
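To make the length mismatch concrete, here is a small sketch comparing the different counts for the example string used in this answer. The constant names are only illustrative.

```swift
import Foundation

let text = "🇩🇪€4€9"

// Swift counts user-perceived characters (extended grapheme clusters).
let characterCount = text.count

// NSString (and therefore NSRange) counts UTF-16 code units.
let utf16Count = text.utf16.count
let nsStringLength = NSString(string: text).length

// A range covering the whole string, expressed in UTF-16 code units.
let fullRange = NSRange(text.startIndex..., in: text)

print(characterCount, utf16Count, nsStringLength, fullRange.length)
// 5 8 8 8
```

The flag alone is one Character but four UTF-16 code units, which is why a range built from text.count is too short here.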

As of Swift 4 (Xcode 9), the Swift standard library provides functions to convert between Range<String.Index> and NSRange.

func matches(for regex: String, in text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex)
        let results = regex.matches(in: text,
                                    range: NSRange(text.startIndex..., in: text))
        return results.map {
            String(text[Range($0.range, in: text)!])
        }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "🇩🇪€4€9"
let matched = matches(for: "[0-9]", in: string)
print(matched)
// ["4", "9"]

Note: The forced unwrap Range($0.range, in: text)! is safe because the NSRange refers to a substring of the given string text. However, if you want to avoid it then use

        return results.compactMap {
            Range($0.range, in: text).map { String(text[$0]) }
        }

instead (compactMap requires Swift 4.1 or later; in Swift 4.0 use flatMap).
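The two conversions used in this answer can also be tried in isolation. The following is a minimal sketch (the constant names are made up for illustration) that converts a Range<String.Index> to an NSRange and back, and shows that the failable initializer returns nil for a range that does not fit the string.

```swift
import Foundation

let text = "🇩🇪€4€9"

// Range<String.Index> -> NSRange (measured in UTF-16 code units).
let digitRange: Range<String.Index>? = text.range(of: "4")
let nsRange: NSRange? = digitRange.map { NSRange($0, in: text) }

// NSRange -> Range<String.Index>, then back to the matched text.
let roundTripped: Substring? = nsRange
    .flatMap { Range($0, in: text) }
    .map { text[$0] }

// The conversion is failable: an out-of-bounds NSRange yields nil.
let outOfBounds: Range<String.Index>? =
    Range(NSRange(location: 100, length: 1), in: text)

print(nsRange as Any, roundTripped as Any, outOfBounds as Any)
```

The flag occupies four UTF-16 code units and the euro sign one, so the digit "4" sits at UTF-16 offset 5.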


(Older answer for Swift 3 and earlier:)

So you should convert the given Swift string to an NSString and then extract the ranges. The result will be converted to a Swift string array automatically.

(The code for Swift 1.2 can be found in the edit history.)

Swift 2 (Xcode 7.3.1) :

func matchesForRegexInText(regex: String, text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex, options: [])
        let nsString = text as NSString
        let results = regex.matchesInString(text,
                                            options: [], range: NSMakeRange(0, nsString.length))
        return results.map { nsString.substringWithRange($0.range)}
    } catch let error as NSError {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "🇩🇪€4€9"
let matches = matchesForRegexInText("[0-9]", text: string)
print(matches)
// ["4", "9"]

Swift 3 (Xcode 8)

func matches(for regex: String, in text: String) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex)
        let nsString = text as NSString
        let results = regex.matches(in: text, range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range)}
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

Example:

let string = "🇩🇪€4€9"
let matched = matches(for: "[0-9]", in: string)
print(matched)
// ["4", "9"]
22
  • 12
    You saved me from becoming insane. Not kidding. Thank you so much!
    – mitchkman
    Commented Jan 10, 2015 at 20:27
  • 1
    @MathijsSegers: I have updated the code for Swift 1.2/Xcode 6.3. Thanks for letting me know!
    – Martin R
    Commented Apr 16, 2015 at 13:01
  • 1
    but what if i want to search for strings between a tag? I need the same result (match information) like: regex101.com/r/cU6jX8/2. which regex pattern would you suggest? Commented Aug 18, 2015 at 21:09
  • The update is for Swift 1.2, not Swift 2. The code doesn't compile with Swift 2.
    – PatrickNLT
    Commented Sep 13, 2015 at 19:17
  • 1
    Thanks! What if you only want to extract what's actually between () in the regex? For example, in "[0-9]{3}([0-9]{6})" I'd only want to get the last 6 numbers.
    – p4bloch
    Commented Sep 23, 2015 at 23:01
75

My answer builds on the given answers but makes regex matching more robust:

  • Returns not only the matches but also all capture groups for each match (see examples below)
  • Supports optional groups: instead of returning an empty array, a group that did not participate in the match yields an empty string
  • Avoids do/catch (nothing is printed to the console) by using try? together with the guard construct
  • Adds matchingStrings as an extension to String

Swift 4.2

//: Playground - noun: a place where people can play

import Foundation

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
        let nsString = self as NSString
        let results  = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))
        return results.map { result in
            (0..<result.numberOfRanges).map {
                result.range(at: $0).location != NSNotFound
                    ? nsString.substring(with: result.range(at: $0))
                    : ""
            }
        }
    }
}

"prefix12 aaa3 prefix45".matchingStrings(regex: "fix([0-9])([0-9])")
// Prints: [["fix12", "1", "2"], ["fix45", "4", "5"]]

"prefix12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["prefix12", "12"]]

"12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["12", "12"]], other answers return an empty array here

// Safely accessing the capture of the first match (if any):
let number = "prefix12suffix".matchingStrings(regex: "fix([0-9]+)su").first?[1]
// Prints: Optional("12")

Swift 3

//: Playground - noun: a place where people can play

import Foundation

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
        let nsString = self as NSString
        let results  = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))
        return results.map { result in
            (0..<result.numberOfRanges).map {
                result.rangeAt($0).location != NSNotFound
                    ? nsString.substring(with: result.rangeAt($0))
                    : ""
            }
        }
    }
}

"prefix12 aaa3 prefix45".matchingStrings(regex: "fix([0-9])([0-9])")
// Prints: [["fix12", "1", "2"], ["fix45", "4", "5"]]

"prefix12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["prefix12", "12"]]

"12".matchingStrings(regex: "(?:prefix)?([0-9]+)")
// Prints: [["12", "12"]], other answers return an empty array here

// Safely accessing the capture of the first match (if any):
let number = "prefix12suffix".matchingStrings(regex: "fix([0-9]+)su").first?[1]
// Prints: Optional("12")

Swift 2

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: []) else { return [] }
        let nsString = self as NSString
        let results  = regex.matchesInString(self, options: [], range: NSMakeRange(0, nsString.length))
        return results.map { result in
            (0..<result.numberOfRanges).map {
                result.rangeAtIndex($0).location != NSNotFound
                    ? nsString.substringWithRange(result.rangeAtIndex($0))
                    : ""
            }
        }
    }
}
8
  • 1
    Good idea about the capture groups. But why is "guard" Swiftier than "do/catch"??
    – Martin R
    Commented Oct 17, 2016 at 5:27
  • I agree with people such as nshipster.com/guard-and-defer who say Swift 2.0 certainly seems to be encouraging a style of early return [...] rather than nested if statements. The same holds true for nested do/catch statements IMHO. Commented Oct 17, 2016 at 8:47
  • try/catch is the native error handling in Swift. try? can be used if you are only interested in the outcome of the call, not in a possible error message. So yes, guard try? .. is fine, but if you want to print the error then you need a do-block. Both ways are Swifty.
    – Martin R
    Commented Oct 17, 2016 at 8:58
  • 3
    I have added unittests to your nice snippet, gist.github.com/neoneye/03cbb26778539ba5eb609d16200e4522
    – neoneye
    Commented Nov 28, 2016 at 13:22
  • 2
    Was about to write my own based on the @MartinR answer until i saw this. Thanks!
    – Oritm
    Commented May 11, 2017 at 22:30
39
+100

The fastest way to return all matches and capture groups in Swift 5

extension String {
    func match(_ regex: String) -> [[String]] {
        let nsString = self as NSString
        return (try? NSRegularExpression(pattern: regex, options: []))?.matches(in: self, options: [], range: NSMakeRange(0, nsString.length)).map { match in
            (0..<match.numberOfRanges).map { match.range(at: $0).location == NSNotFound ? "" : nsString.substring(with: match.range(at: $0)) }
        } ?? []
    }
}

Returns a 2-dimensional array of strings:

"prefix12suffix fix1su".match("fix([0-9]+)su")

returns...

[["fix12su", "12"], ["fix1su", "1"]]

// First element of sub-array is the match
// All subsequent elements are the capture groups
3
  • is options: [] really required?
    – Higgs
    Commented Feb 5, 2021 at 11:40
  • How do we know this is the fastest way to do it? Commented Oct 10, 2023 at 3:43
  • Fantastic canned solution if not yet on iOS16
    – Fattie
    Commented Mar 8 at 20:24
15

If you want to extract the actual substrings from a String (including emojis), not just their positions, then the following may be a simpler solution.

extension String {
  func regex (pattern: String) -> [String] {
    do {
      let regex = try NSRegularExpression(pattern: pattern, options: NSRegularExpressionOptions(rawValue: 0))
      let nsstr = self as NSString
      let all = NSRange(location: 0, length: nsstr.length)
      var matches : [String] = [String]()
      regex.enumerateMatchesInString(self, options: NSMatchingOptions(rawValue: 0), range: all) {
        (result : NSTextCheckingResult?, _, _) in
        if let r = result {
          let result = nsstr.substringWithRange(r.range) as String
          matches.append(result)
        }
      }
      return matches
    } catch {
      return [String]()
    }
  }
} 

Example Usage:

"someText 👿🏅👿⚽️ pig".regex("👿⚽️")

Will return the following:

["👿⚽️"]

Note that using "\w+" may produce an unexpected, seemingly empty match (here the invisible variation selector that is part of "⚽️"):

"someText 👿🏅👿⚽️ pig".regex("\\w+")

Will return this String array

["someText", "️", "pig"]
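The odd middle element is easier to understand by looking at the Unicode scalars of the soccer ball emoji. This sketch assumes the emoji is encoded as U+26BD followed by the variation selector U+FE0F, and that the ICU \w class includes combining marks; under those assumptions \w+ matches the selector on its own.

```swift
// The soccer ball emoji written out as its two Unicode scalars (assumed encoding).
let ball = "\u{26BD}\u{FE0F}"

// One user-perceived character...
let characterCount = ball.count

// ...made of two scalars with different general categories.
let hexValues = ball.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }
let categories = ball.unicodeScalars.map { $0.properties.generalCategory }

print(characterCount, hexValues)
// 1 ["26BD", "FE0F"]
print(categories == [.otherSymbol, .nonspacingMark])
// true
```

The first scalar is a symbol, which \w does not match, while the second is a nonspacing mark, which would explain the stray invisible element in the result.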
3
  • 1
    This is what I wanted
    – Kyle KIM
    Commented Feb 3, 2016 at 13:10
  • 1
    Nice! It needs a little adjustment for Swift 3, but it's great.
    – Jelle
    Commented Dec 11, 2016 at 9:19
  • @Jelle what is the adjustment it needs? I'm using swift 5.1.3 Commented Mar 16, 2020 at 6:15
13

Update for iOS 16: Regex, RegexBuilder 👷‍♀️

Before Swift 5.7, regex support in Swift came from Foundation's NSRegularExpression. That API was verbose and challenging to use correctly, so Apple released regex literal support and RegexBuilder in 2022 with Swift 5.7. The regex syntax accepted by the Regex type is largely compatible with that of NSRegularExpression, i.e. the ICU syntax.

The new API, available from iOS 16 / macOS 13, removes much of the String range conversion logic and is intended to improve performance.

Another advantage of using literals is that we get compile time errors in case we use invalid RegEx syntax: Cannot parse regular expression... with a clear description of the RegEx error. Enjoy!

RegEx literals in Swift 5.7

func parseLine(_ line: Substring) throws -> MailmapEntry {

    let regex = /\h*([^<#]+?)??\h*<([^>#]+)>\h*(?:#|\Z)/

    guard let match = line.prefixMatch(of: regex) else {
        throw MailmapError.badLine
    }

    return MailmapEntry(name: match.1, email: match.2)
}

We are able to match using:

  1. firstMatch(of:): Returns the first match for the regex within this collection, where the regex is created by the given closure (RegEx literal).

  2. prefixMatch(of:): Returns a match if this string is matched by the given regex at its start.

  3. wholeMatch(of:): Matches a regex in its entirety, where the regex is created by the given closure (RegEx literal).

  4. matches(of:): Returns a collection containing all non-overlapping matches of the regex, created by the given closure (RegEx literal).

The new Regex API also includes further methods such as trimmingPrefix() and contains(), so I encourage exploring the docs for more nuanced use cases.

There is an equivalent syntax for the above methods in which we call, for example, prefixMatch(in:) on the Regex itself and pass in the string to search. I prefer the syntax above, but choose whichever you like.

Example code:

let aOrB = /[ab]+/

if let stringMatch = try aOrB.firstMatch(in: "The year is 2022; last year was 2021.") {
    print(stringMatch.0)
} else {
    print("No match.")
}
// prints "a"
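For the original question (all matches as an array of strings), a sketch using matches(of:) could look like the following. The helper name allMatches is hypothetical, and the regex is built at run time from a string with Regex(_:) rather than from a literal, so that an invalid pattern can be handled without a compile-time literal.

```swift
// Requires Swift 5.7 or later (on Apple platforms: iOS 16 / macOS 13 runtime).

/// Hypothetical helper: returns every substring of `text` matched by `pattern`.
/// An invalid pattern yields an empty array.
func allMatches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? Regex(pattern) else { return [] }
    // Each match carries a Range<String.Index>, so no NSRange conversion is needed.
    return text.matches(of: regex).map { String(text[$0.range]) }
}

let found = allMatches(of: "[0-9]", in: "🇩🇪€4€9")
print(found)
// ["4", "9"]

let numbers = allMatches(of: "[0-9]+", in: "42 apples and 7 pears")
print(numbers)
// ["42", "7"]

// An unterminated character class cannot be parsed, so the result is empty.
let invalid = allMatches(of: "[", in: "42 apples")
print(invalid)
// []
```

Because the ranges are native String ranges, the UTF-16 length pitfalls of the NSRegularExpression-based answers do not arise here.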

RegexBuilder in Swift 5.7

RegexBuilder is a new API released by Apple aimed at making RegEx code easier to write in Swift. We can translate the Regex literal /\h*([^<#]+?)??\h*<([^>#]+)>\h*(?:#|\Z)/ from above into a more declarative form using RegexBuilder if we want more readability.

Do note that we can use raw strings in a RegexBuilder and also interleave Regex Literals in the builder if we want to balance readability with conciseness.

import RegexBuilder

let regex = Regex {
    ZeroOrMore(.horizontalWhitespace)
    Optionally {
        Capture(OneOrMore(.noneOf("<#")))
    }
        .repetitionBehavior(.reluctant)
    ZeroOrMore(.horizontalWhitespace)
    "<"
    Capture(OneOrMore(.noneOf(">#")))
    ">"
    ZeroOrMore(.horizontalWhitespace)
    /#|\Z/
}

The RegEx literal /#|\Z/ is equivalent to:

ChoiceOf {
   "#"
   Anchor.endOfSubjectBeforeNewline
}

Composable RegexComponent

RegexBuilder syntax is similar to SwiftUI also in terms of composability because we can reuse RegexComponents within other RegexComponents:

struct MailmapLine: RegexComponent {
    @RegexComponentBuilder
    var regex: Regex<(Substring, Substring?, Substring)> {
        ZeroOrMore(.horizontalWhitespace)
        Optionally {
            Capture(OneOrMore(.noneOf("<#")))
        }
            .repetitionBehavior(.reluctant)
        ZeroOrMore(.horizontalWhitespace)
        "<"
        Capture(OneOrMore(.noneOf(">#")))
        ">"
        ZeroOrMore(.horizontalWhitespace)
        ChoiceOf {
           "#"
            Anchor.endOfSubjectBeforeNewline
        }
    }
}

Source: Some of this code is taken from the WWDC 2022 video "What's new in Swift".

12

I found that the accepted answer's solution unfortunately does not compile on Swift 3 for Linux. Here's a modified version, then, that does:

import Foundation

func matches(for regex: String, in text: String) -> [String] {
    do {
        let regex = try RegularExpression(pattern: regex, options: [])
        let nsString = NSString(string: text)
        let results = regex.matches(in: text, options: [], range: NSRange(location: 0, length: nsString.length))
        return results.map { nsString.substring(with: $0.range) }
    } catch let error {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

The main differences are:

  1. Swift on Linux seems to require dropping the NS prefix on Foundation objects for which there is no Swift-native equivalent. (See Swift evolution proposal #86.)

  2. Swift on Linux also requires specifying the options arguments for both the RegularExpression initialization and the matches method.

  3. For some reason, coercing a String into an NSString doesn't work in Swift on Linux but initializing a new NSString with a String as the source does work.

This version also works with Swift 3 on macOS / Xcode with the sole exception that you must use the name NSRegularExpression instead of RegularExpression.

7

Swift 4 without NSString.

extension String {
    func matches(regex: String) -> [String] {
        guard let regex = try? NSRegularExpression(pattern: regex, options: [.caseInsensitive]) else { return [] }
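        // NSRange counts UTF-16 code units, so NSMakeRange(0, self.count) would be wrong for non-ASCII text.
        let matches  = regex.matches(in: self, options: [], range: NSRange(self.startIndex..., in: self))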
        return matches.map { match in
            return String(self[Range(match.range, in: self)!])
        }
    }
}
1
  • 7
    Be careful with above solution: NSMakeRange(0, self.count) is not correct, because self is a String (=UTF8) and not an NSString (=UTF16). So the self.count is not necessarily the same as nsString.length (as used in other solutions). You can replace the range calculation with NSRange(self.startIndex..., in: self)
    – pd95
    Commented Jun 29, 2020 at 22:27
5

@p4bloch if you want to capture results from a series of capture parentheses, then you need to use the rangeAtIndex(index) method of NSTextCheckingResult instead of range. Here's @MartinR's method for Swift 2 from above, adapted for capture parentheses. In the array that is returned, the first result [0] is the entire match, and the individual capture groups begin at [1]. I commented out the map operation (so it's easier to see what I changed) and replaced it with nested loops.

func matches(for regex: String!, in text: String!) -> [String] {

    do {
        let regex = try NSRegularExpression(pattern: regex, options: [])
        let nsString = text as NSString
        let results = regex.matchesInString(text, options: [], range: NSMakeRange(0, nsString.length))
        var match = [String]()
        for result in results {
            for i in 0..<result.numberOfRanges {
                match.append(nsString.substringWithRange( result.rangeAtIndex(i) ))
            }
        }
        return match
        //return results.map { nsString.substringWithRange( $0.range )} //rangeAtIndex(0)
    } catch let error as NSError {
        print("invalid regex: \(error.localizedDescription)")
        return []
    }
}

As an example use case, say you want to split a "title year" string such as "Finding Dory 2016". You could do this:

print ( matches(for: "^(.+)\\s(\\d{4})" , in: "Finding Dory 2016"))
// ["Finding Dory 2016", "Finding Dory", "2016"]
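An optional capture group that does not take part in a match is reported with location NSNotFound, and passing such a range to a substring method fails at run time. The following is a sketch in current Swift, not the author's code, of one way to handle that: it keeps one inner array per match and returns nil for groups that did not participate. The function name captureGroups is hypothetical.

```swift
import Foundation

/// Hypothetical helper: one inner array per match; index 0 is the whole match,
/// the following entries are the capture groups (nil if a group did not participate).
func captureGroups(for pattern: String, in text: String) -> [[String?]] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: fullRange).map { result in
        (0..<result.numberOfRanges).map { index in
            // Range(_:in:) returns nil for a range whose location is NSNotFound.
            Range(result.range(at: index), in: text).map { String(text[$0]) }
        }
    }
}

let titleAndYear = captureGroups(for: "^(.+)\\s(\\d{4})", in: "Finding Dory 2016")
print(titleAndYear)

// The first group is optional and does not participate when matching "b".
let optionalGroup = captureGroups(for: "(a)?(b)", in: "b")
print(optionalGroup)
```

Returning [[String?]] keeps the group positions stable, so index 2 always refers to the second group even when the first one is missing.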
2
  • This answer made my day. I spent 2 hours searching for a solution that can satisfy a regular expression with additional capturing of groups.
    – Ahmad
    Commented Mar 9, 2018 at 22:55
  • This works but it will crash if any range is not found. I modified this code so that the function returns [String?] and in the for i in 0..<result.numberOfRanges block, you have to add a test that only appends the match if the range != NSNotFound, otherwise it should append nil. See: stackoverflow.com/a/31892241/2805570
    – stef
    Commented Jun 16, 2018 at 3:19
4

Most of the solutions above only give the full match as a result ignoring the capture groups e.g.: ^\d+\s+(\d+)

To get the capture group matches as expected you need something like (Swift4) :

public extension String {
    public func capturedGroups(withRegex pattern: String) -> [String] {
        var results = [String]()

        var regex: NSRegularExpression
        do {
            regex = try NSRegularExpression(pattern: pattern, options: [])
        } catch {
            return results
        }
        // NSRange is measured in UTF-16 code units, so use utf16.count rather than count.
        let matches = regex.matches(in: self, options: [], range: NSRange(location: 0, length: self.utf16.count))

        guard let match = matches.first else { return results }

        let lastRangeIndex = match.numberOfRanges - 1
        guard lastRangeIndex >= 1 else { return results }

        for i in 1...lastRangeIndex {
            let capturedGroupIndex = match.range(at: i)
            let matchedString = (self as NSString).substring(with: capturedGroupIndex)
            results.append(matchedString)
        }

        return results
    }
}
2
  • This is great if you're wanting just the first result, to get each result it needs for index in 0..<matches.count { around let lastRange... results.append(matchedString)}
    – Geoff
    Commented Feb 20, 2018 at 16:41
  • the for clause should look like this: for i in 1...lastRangeIndex { let capturedGroupIndex = match.range(at: i) if capturedGroupIndex.location != NSNotFound { let matchedString = (self as NSString).substring(with: capturedGroupIndex) results.append(matchedString.trimmingCharacters(in: .whitespaces)) } }
    – DonBaron
    Commented Sep 17, 2018 at 11:36
2

This is how I did it. I hope it brings a new perspective on how this works in Swift.

In the example below I extract every string enclosed in [ ] (brackets included):

var sample = "this is an [hello] amazing [world]"

var regex = NSRegularExpression(pattern: "\\[.+?\\]"
, options: NSRegularExpressionOptions.CaseInsensitive 
, error: nil)

var matches = regex?.matchesInString(sample, options: nil
, range: NSMakeRange(0, (sample as NSString).length)) as Array<NSTextCheckingResult>

for match in matches {
   let r = (sample as NSString).substringWithRange(match.range)//cast to NSString is required to match range format.
    println("found= \(r)")
}
2

This is a very simple solution that returns an array of strings with the matches.

Swift 3.

extension String {
    internal func stringsMatching(regularExpressionPattern: String, options: NSRegularExpression.Options = []) -> [String] {
        guard let regex = try? NSRegularExpression(pattern: regularExpressionPattern, options: options) else {
            return []
        }

        // Use the NSString length (UTF-16 code units) for the search range.
        let nsString = self as NSString
        let results = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))

        return results.map {
            nsString.substring(with: $0.range)
        }
    }
}
1
  • Be careful using NSMakeRange(0, self.count), because self is a String (=UTF8) and not an NSString (=UTF16). So the self.count is not necessarily the same as nsString.length (as used in other solutions). You can replace the range calculation with NSRange(self.startIndex..., in: self).
    – atereshkov
    Commented Sep 9, 2021 at 10:23
2

An update of @Mike Chirico's answer to Swift 5:

extension String {
  func regex(pattern: String) -> [String]? {
    do {
        let regex = try NSRegularExpression(pattern: pattern, options: NSRegularExpression.Options(rawValue: 0))
        // NSRange is measured in UTF-16 code units, so use utf16.count rather than count.
        let all = NSRange(location: 0, length: utf16.count)
        var matches = [String]()
        regex.enumerateMatches(in: self, options: NSRegularExpression.MatchingOptions(rawValue: 0), range: all) {
            (result : NSTextCheckingResult?, _, _) in
              if let r = result {
                    let nsstr = self as NSString
                    let result = nsstr.substring(with: r.range) as String
                    matches.append(result)
              }
        }
        return matches
    } catch {
        return nil
    }
  }
}
2
+200

In Swift 5.7 (iOS 16 / macOS 13 and later) there is new syntax that makes this much easier. For example, to extract everything within the braces in this string:

let randomLog = "2493875469750,1678798470864,{latitude: 50, longitude: 43}"

if let match = randomLog.firstMatch(of: /\{.*\}/) {
    print(match.output)
}

This prints

{latitude: 50, longitude: 43}

In order to become a Swift Regex Pro, or just for more info, have a look at WWDC 2022: https://developer.apple.com/videos/play/wwdc2022/110357/

1
  • 1
    The new regex literal Swift syntax is part of Swift 5.7, not iOS 16. And other answers already cover this feature.
    – HangarRash
    Commented Mar 15, 2023 at 17:16
1

Big thanks to Lars Blumberg for his answer on capturing groups and full matches with Swift 4, which helped me out a lot. I also made an addition to it for people who do want an error.localizedDescription response when their regex is invalid:

extension String {
    func matchingStrings(regex: String) -> [[String]] {
        do {
            let regex = try NSRegularExpression(pattern: regex)
            let nsString = self as NSString
            let results  = regex.matches(in: self, options: [], range: NSMakeRange(0, nsString.length))
            return results.map { result in
                (0..<result.numberOfRanges).map {
                    result.range(at: $0).location != NSNotFound
                        ? nsString.substring(with: result.range(at: $0))
                        : ""
                }
            }
        } catch let error {
            print("invalid regex: \(error.localizedDescription)")
            return []
        }
    }
}

For me, having the localizedDescription of the error helped me understand what went wrong with escaping, since it displays the final regex that Swift tries to compile.

1

Basic phone number cleanup (strips everything except digits and "+"):

let phoneNumbers = ["+79990001101", "+7 (800) 000-11-02", "+34 507 574 147 ", "+1-202-555-0118"]

let match: (String) -> String = {
    $0.replacingOccurrences(of: #"[^\d+]"#, with: "", options: .regularExpression)
}

print(phoneNumbers.map(match))
// ["+79990001101", "+78000001102", "+34507574147", "+12025550118"]
1

You can use matching(regex:) on the string like:

let array = try "Your String To Search".matching(regex: ".")

using this simple extension:

public extension String {
    func matching(regex: String) throws -> [String] {
        let regex = try NSRegularExpression(pattern: regex)
        let results = regex.matches(in: self, range: NSRange(startIndex..., in: self))
        return results.map { String(self[Range($0.range, in: self)!]) }
    }
}
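Because matching(regex:) throws, the caller decides how to handle an invalid pattern. Here is a sketch of one way to do that; the extension is repeated from above so the example is self-contained, and the describeMatches function is hypothetical.

```swift
import Foundation

extension String {
    // Repeated from the answer above so that this sketch compiles on its own.
    func matching(regex: String) throws -> [String] {
        let regex = try NSRegularExpression(pattern: regex)
        let results = regex.matches(in: self, range: NSRange(startIndex..., in: self))
        return results.map { String(self[Range($0.range, in: self)!]) }
    }
}

/// Hypothetical caller: turns either the matches or the error into a message.
func describeMatches(of pattern: String, in text: String) -> String {
    do {
        let found = try text.matching(regex: pattern)
        return "found \(found)"
    } catch {
        // NSRegularExpression throws when the pattern cannot be compiled.
        return "invalid regex: \(error.localizedDescription)"
    }
}

print(describeMatches(of: "[a-z]+", in: "one two"))
// found ["one", "two"]
print(describeMatches(of: "[", in: "one two"))
```

Callers that do not care about the error message can use try? instead and treat nil as "no usable pattern".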
