I have a function inside a macro in Excel 2016 (VBA) that uses a regular expression to remove all numbers from a piece of text, so that I end up with only alphabetic characters. The catch is that the numbers are not just digits; they can also be Roman numerals (only one through four, which are I, II, III and IV). As an example, take the following list of possible items:

Program Manager 3
Systems Engineer 3
Secretary III 12345
Consultant
IT Instructor 3
Computer Operations Manager 1
User Support Specialist 2
Engineering Tech II 12345
IT Instructor 2
Network Tech 3

My function uses the following VBA regular expression code to replace the digits and Roman numerals (I’m not worried about trimming or anything at this point):

Public Function RemoveNumbers(Txt As String) As String
    With CreateObject("VBScript.RegExp")
        .Global = True
        .IgnoreCase = True
        .Pattern = "[0-9]|\s[i]+|\s[iv]$"
        RemoveNumbers = .Replace(Txt, "")
    End With
End Function

Generally, that works OK except I’ve run into one problem. My RegEx incorrectly alters the phrase IT Instructor 2 and turns it into ITnstructor (because of the space and then the word Instructor, which starts with an I which is the same as Roman numeral one). I’ve tried finding the answer online and have tested many variations to get RegEx to exclude the phrase Instructor in the search but I can’t get it to work. Some of the patterns that I’ve tried to use include:

        .Pattern = "\b(!Instructor)\b|[0-9]|\s[i]+|\s[iv]$"

        .Pattern = "\b(!Instructor)\b\w+|[0-9]|\s[i]+|\s[iv]$"

        .Pattern = "(!Instructor\b)|[0-9]|\s[i]+|\s[iv]$"
...etc

And since I have to remove the Roman numeral one (I), I can’t use the following as a workaround:

        .Pattern = "[0-9]|\s[i]{2,}|\s[iv]$"

Is it possible to exclude a string (such as Instructor) from being part of the search using Excel 2016 VBA regular expressions? If so, can someone point me in the correct direction on how to exclude items during a VBA RegEx?

Thanks

3
  • I don't usually regex with VBA but something like [^(Instructor)] might work Commented Mar 5, 2018 at 17:36
  • What if you replace [0-9] with [:digit:]?
    – LPChip
    Commented Mar 5, 2018 at 17:53
  • So you also want to remove things like II ?? Commented Mar 5, 2018 at 18:21

1 Answer

I figured it out. The following syntax works for me (lots of trial and error):

    .Pattern = "\b(?!(?:Instructor)\b)(?:[0-9]+|\s[i]+|\s[iv]$)\b"

-- EDITED to add the details below --

I added an additional word (i.e., Info) to the RegEx exclusion:

    .Pattern = "\b(?!(?:Info|Instructor)\b)(?:[0-9]+|\s[i]+|\s[iv]$)\b"

Details of the pattern:

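  • \b - Leading word boundary
  • (?!(?:Info|Instructor)\b) - Negative lookahead: (?! ... ) succeeds only if the text at the current position is not one of the listed words, and it consumes no characters. The inner (?: ... ) is a non-capturing group that only groups the alternatives. The \b at the end is a word boundary, so only whole words are excluded
  • (?:[0-9]+|\s[i]+|\s[iv]$) - Match one or more digits 0-9, or a whitespace character (\s) followed by one or more i characters, or a whitespace character followed by a single i or v at the end of the string ($ anchors the match to the end)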
  • | = OR (used throughout)
  • \b - Trailing word boundary
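
A note on where the lookahead sits: in the pattern above it is evaluated before the \s or the digit, so as far as I can tell it never actually sees the word Instructor there. The sketch below is only an illustration, not the pattern I ended up using. It moves the lookahead to directly after \s, where the excluded word would begin. The Sub name DemoNegativeLookahead and the sample strings are made up for the example.

```vba
Option Explicit

' Hypothetical demo: negative lookahead placed right after \s,
' i.e. at the position where "Info" or "Instructor" would start.
Public Sub DemoNegativeLookahead()
    With CreateObject("VBScript.RegExp")
        .Global = True
        .IgnoreCase = True
        .Pattern = "\s(?!(?:Info|Instructor)\b)[i]+"

        ' The lookahead sees "Instructor" and blocks the match, so nothing is removed.
        Debug.Print .Replace("IT Instructor", "")   ' prints: IT Instructor

        ' "III" is not an excluded word, so " III" is matched and removed.
        Debug.Print .Replace("Secretary III", "")   ' prints: Secretary
    End With
End Sub
```

This only protects the words that are listed. Any other word starting with I (Inspector, for example) would still lose its first letter, which is why a trailing \b turned out to be the more general fix.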

-- EDITED because ultimately, this worked best for me --

        .Pattern = "\b(?:[0-9]+|\s[i]+|\s[iv]+$)\b"
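
For anyone who wants to try it, here is a minimal sketch of the whole function with that final pattern dropped in, plus a small test Sub. The Trim$ call and the DemoRemoveNumbers Sub are additions for the example and were not part of my original function. Paste it into a standard module (Alt+F11, Insert > Module) and run the Sub with the Immediate window open.

```vba
Option Explicit

Public Function RemoveNumbers(ByVal Txt As String) As String
    With CreateObject("VBScript.RegExp")
        .Global = True
        .IgnoreCase = True
        ' The trailing \b stops \s[i]+ from matching " I" in " Instructor",
        ' because "I" followed by "n" is not a word boundary.
        .Pattern = "\b(?:[0-9]+|\s[i]+|\s[iv]+$)\b"
        ' Trim$ is an addition: it drops the space left behind by the removals.
        RemoveNumbers = Trim$(.Replace(Txt, ""))
    End With
End Function

' Hypothetical test Sub; output goes to the Immediate window (Ctrl+G).
Public Sub DemoRemoveNumbers()
    Dim item As Variant
    For Each item In Array("IT Instructor 2", "Secretary III 12345", _
                           "Engineering Tech II 12345", "Consultant")
        Debug.Print RemoveNumbers(CStr(item))
    Next item
    ' Expected lines: IT Instructor / Secretary / Engineering Tech / Consultant
End Sub
```

Because the function is Public and lives in a standard module, it can also be called from a worksheet cell, for example =RemoveNumbers(A2). One limitation I am aware of: IV is only removed when it is the last thing in the string, since \s[iv]+ is anchored with $.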
6
  • Nice! Any chance you can break down that pattern? I was stuck at the same point OP was: how do you ignore an "I" with a space before it that is not a Roman numeral? I see you just went with looking for "Instructor", which is smarter than my way, as I'm still stuck.
    – BruceWayne
    Commented Mar 5, 2018 at 18:32
  • 1
    Hm, it seems this works too, unless I'm missing something? (?:[0-9]+|\s[i]+|\s[iv]$)\b
    – BruceWayne
    Commented Mar 5, 2018 at 18:33
  • 1
    @BruceWayne I figured it would be easier to just exclude an entire word versus trying to work with searching on "I". I edited the answer and added the breakdown of the expression.
    – SOSidb
    Commented Mar 5, 2018 at 19:32
  • 1
    @BruceWayne You were close with the above; the starting word boundary was needed, and I had to add a + symbol after [iv] to tell RegEx to match one or more. The end product (thus far) for me is "\b(?:[0-9]+|\s[i]+|\s[iv]+$)\b", and it seems I didn't need the beginning exclusion portion after all.
    – SOSidb
    Commented Mar 6, 2018 at 14:33
  • How would I use this in Excel? Sorry, I'm new to VBA in Excel. I've been searching and trying things for a few hours and can't figure out how to implement this code.
    – justif
    Commented Mar 22, 2018 at 1:37
