6

I have to parse out the system name from a larger string. The system name has a prefix of "ABC" and then a number. Some examples are:

ABC500
ABC1100
ABC1300

The full string that I need to parse the system name out of can look like any of the items below:

ABC1100 - 2ppl
ABC1300
ABC 1300
ABC-1300
Managers Associates Only (ABC1100 - 2ppl)

Before I saw the last one, I had this code that worked pretty well:

string[] trimmedStrings = jobTitle.Split(new char[] { '-', '–' },StringSplitOptions.RemoveEmptyEntries)
                           .Select(s => s.Trim())
                           .ToArray();

return trimmedStrings[0];

But it fails on the last example, where there is a bunch of other text before the ABC.

Can anyone suggest a more elegant and future-proof way of parsing out the system name here?

3
  • 8
    IMHO, Regex is the right way. You should form a regex that matches letters followed by digits.
    – Saravanan
    Commented May 20, 2013 at 13:56
  • 1
    The RegEx (?<=ABC)[0-9]+ should get you straight to the numeric part.
    – dash
    Commented May 20, 2013 at 13:59
  • You might want to check out "A sscanf() Replacement for .NET".
    Commented May 20, 2013 at 14:21

4 Answers

7

One way to do this:

string[] strings =
{
    "ABC1100 - 2ppl",
    "ABC1300",
    "ABC 1300",
    "ABC-1300",
    "Managers Associates Only (ABC1100 - 2ppl)"
};

var reg = new Regex(@"ABC[\s,-]?[0-9]+");

var systemNames = strings.Select(line => reg.Match(line).Value);

systemNames.ToList().ForEach(Console.WriteLine);

prints:

ABC1100
ABC1300
ABC 1300
ABC-1300
ABC1100
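Note that the output above keeps the separator ("ABC 1300", "ABC-1300"). If the goal is the canonical name from the question (e.g. ABC1300), one option is to capture only the digits and rebuild the name. This is a minimal sketch, not part of the original answer: the class name SystemNameParser, the method name Extract, and the choice to return an empty string on no match are assumptions made for illustration. The en dash is included in the separator class only because the question's own Split call handles it.

```csharp
using System.Text.RegularExpressions;

public static class SystemNameParser
{
    // "ABC", then at most one separator (whitespace, hyphen or en dash),
    // then one or more digits, which are captured in group 1.
    private static readonly Regex Pattern = new Regex(@"ABC[\s\-–]?([0-9]+)");

    // Returns the canonical name such as "ABC1300",
    // or an empty string when the input contains no system name.
    public static string Extract(string input)
    {
        Match match = Pattern.Match(input);
        return match.Success ? "ABC" + match.Groups[1].Value : string.Empty;
    }
}
```

With this, SystemNameParser.Extract("ABC-1300") and SystemNameParser.Extract("ABC 1300") both give "ABC1300", and the last example from the question gives "ABC1100".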


10
  • What does the * do? As opposed to the + used in Shaamaan's answer. Commented May 20, 2013 at 14:01
  • Plus, I don't think they ask anywhere in the question for just the numeric part. Commented May 20, 2013 at 14:02
  • * matches any number of digits (including none at all). + requires one or more.
    – MBender
    Commented May 20, 2013 at 14:03
  • @AshBurlaczenko thanks for the note. I really think whether * or + is required depends solely on the domain. Should a bare ABC be extracted or not? Commented May 20, 2013 at 14:04
  • @IlyaIvanov, I wasn't questioning your usage of *, I don't do Regex's so was asking so I could understand it better. Commented May 20, 2013 at 14:06
2

You really could leverage a Regex and get better results. This one should do the trick: [A-Za-z]{3}\d+ (and here is a Rubular to prove it). Then use it in the code like this:

var matches = Regex.Match(someInputString, @"[A-Za-z]{3}\d+");
if (matches.Success) {
    var val = matches.Value;
}
1

You can use a regular expression to parse this. There may be better expressions, but this one works for your case:

using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
  class Program
  {
    static void Main(string[] args)
    {
      string txt="ABC500";

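      string re1="((?:[a-z][a-z]+))";   // group 1: two or more letters (case-insensitive via RegexOptions)
      string re2="(\\d+)";              // group 2: one or more digits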

      Regex r = new Regex(re1+re2,RegexOptions.IgnoreCase|RegexOptions.Singleline);
      Match m = r.Match(txt);
      if (m.Success)
      {
            String word1=m.Groups[1].ToString();
            String int1=m.Groups[2].ToString();
            Console.Write("("+word1.ToString()+")"+"("+int1.ToString()+")"+"\n");
      }
    }
  }
}
1

You should definitely use Regex for this. Depending on the exact nature of the system name, something like this could prove to be enough:

Regex systemNameRegex = new Regex(@"ABC[0-9]+");

If the ABC part of the name can change, you can modify the Regex to something like this:

Regex systemNameRegex = new Regex(@"[a-zA-Z]+[0-9]+");
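If the prefix really can vary, it may help to pull the prefix and the number out separately and to handle the no-match case explicitly. The following is only a sketch under assumptions: the names SystemNameSplitter and TrySplit are hypothetical, the optional separator is borrowed from the examples in the question, and \b is added so a match cannot start in the middle of a word. Be aware that a generic letters-then-digits pattern will also match unrelated text such as "Only 2", so the fixed ABC prefix is safer when it applies.

```csharp
using System.Text.RegularExpressions;

public static class SystemNameSplitter
{
    // Letters (prefix), at most one whitespace or hyphen separator, then digits (number).
    private static readonly Regex Pattern =
        new Regex(@"\b(?<prefix>[A-Za-z]+)[\s\-]?(?<number>[0-9]+)");

    // Returns true and fills prefix/number for the first match;
    // returns false if nothing matches or the digits do not fit in an int.
    public static bool TrySplit(string input, out string prefix, out int number)
    {
        prefix = string.Empty;
        number = 0;

        Match match = Pattern.Match(input);
        if (!match.Success || !int.TryParse(match.Groups["number"].Value, out number))
        {
            return false;
        }

        prefix = match.Groups["prefix"].Value;
        return true;
    }
}
```

For the last example in the question, TrySplit returns true with prefix "ABC" and number 1100.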
