
I have a task where I need to parse C# scripts, look for a certain method attribute, and extract parts of it. I wonder if there is a more elegant way than how I do it:

[Info("Title", "Author", "5.2.5", ResourceId = 819)]

Here is what I do:

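using System;
using System.IO;

// scriptPath is a placeholder for the path of the script being scanned
foreach (var line in File.ReadLines(scriptPath))
{
    if (line.Contains("[Info(") && line.Contains("ResourceId"))
    {
        // Strip spaces, quotes and the attribute syntax, then split on commas
        var attributes = line
            .Replace(" ", "")
            .Replace("\"", "")
            .Replace("[Info(", "")
            .Replace(")]", "")
            .Replace("ResourceId=", "")
            .Split(new[] { "," }, StringSplitOptions.RemoveEmptyEntries);
        // Do stuff with attributes[0], attributes[1], etc.
        break;
    }
}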
  • Text parsing is best done with regular expressions. Creating a regular expression automatically from a string can be achieved with this site: txt2re.com
    Commented Apr 11, 2016 at 9:31
  • Maybe this should have gone to Code Review?
    – Kixoka
    Commented Apr 11, 2016 at 9:40
  • This task is ideal for regular expressions. But I disagree with Siderite that 'Text parsing is best done with regular expressions'. People tend to use regular expressions when they are not appropriate. If possible, always try a simpler method (such as string methods) before reaching for regular expressions. Regular expressions are slow and use lots of memory.
    – jdweng
    Commented Apr 11, 2016 at 9:51
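To illustrate jdweng's point that plain string methods are often enough, here is a minimal sketch that uses only IndexOf, Substring and Split. Unlike the Replace-based code in the question, it trims each value on its own, so spaces inside a title survive. ParseInfoLine is a hypothetical helper name, and the sketch assumes the attribute sits on a single line and that no value contains a comma or an escaped quote.

```csharp
using System;

// Hypothetical helper (not from the thread): extracts the arguments of an
// [Info(...)] attribute on one line using IndexOf, Substring and Split only.
// Assumes the attribute fits on one line and no value contains a comma or quote.
static string[]? ParseInfoLine(string line)
{
    const string open = "[Info(";
    int start = line.IndexOf(open, StringComparison.Ordinal);
    if (start < 0) return null;
    start += open.Length;

    int end = line.IndexOf(")]", start, StringComparison.Ordinal);
    if (end < 0) return null;

    string[] parts = line.Substring(start, end - start).Split(',');
    for (int i = 0; i < parts.Length; i++)
    {
        string part = parts[i].Trim();
        if (part.StartsWith("\"", StringComparison.Ordinal))
        {
            // Positional string argument: remove only the surrounding quotes,
            // so spaces inside a value such as "Some book" are kept.
            part = part.Trim('"');
        }
        else
        {
            // Named argument such as ResourceId = 819: keep what follows '='.
            // Anything else (e.g. an unquoted 5.2.5) is returned unchanged.
            int eq = part.IndexOf('=');
            if (eq >= 0) part = part.Substring(eq + 1).Trim();
        }
        parts[i] = part;
    }
    return parts;
}

string[]? values = ParseInfoLine(
    "        [Info(\"Some book\", \"Ray Brandenburg\", \"5.2.5\", ResourceId = 819)]");
Console.WriteLine(values == null ? "no match" : string.Join(" | ", values));
// prints: Some book | Ray Brandenburg | 5.2.5 | 819
```

Like the question's code, this does not know about comments, so a commented-out attribute would still be parsed.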

2 Answers


The easiest solution nowadays would be to use Roslyn. You can parse the code, find actual attributes (rather than things that merely look like the attribute you're looking for), and handle them all according to proper C# syntax rules.

Here's a simple example:

// Requires the Microsoft.CodeAnalysis.CSharp NuGet package (C# 9+ top-level statements).
using System;
using System.Linq;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

var infoAttributes = CSharpSyntaxTree.ParseText(@"
namespace MyNamespace
{
    public class SomeClass
    {
        const string SomeConstant = ""Hi!"";

        [Info(""Some book"", ""Ray Brandenburg"", ""5.2.5"", ResourceId = 819)]
        public void SomeMethod()
        {

        }

        [InfoAttribute(SomeConstant, 42, ""Banana"")]
        public void SomeMethod2()
        {

        }

        // [Info(""Not going to happen"", ""Hilary Clinton"", ""1.2.0"")]
        public void SomeMethod3()
        {

        }
    }
}
")
.GetRoot()
.DescendantNodes()
.OfType<AttributeSyntax>()
.Where(i => i.Name.ToString() == "Info" || i.Name.ToString() == "InfoAttribute")
.Where
(
  i =>
    // [Info] written without parentheses has no argument list at all
    i.ArgumentList != null
    // exactly three positional (unnamed) arguments, each a string literal
    && i.ArgumentList.Arguments.Count(j => j.NameEquals == null) == 3
    && i.ArgumentList.Arguments[0].Expression.IsKind(SyntaxKind.StringLiteralExpression)
    && i.ArgumentList.Arguments[1].Expression.IsKind(SyntaxKind.StringLiteralExpression)
    && i.ArgumentList.Arguments[2].Expression.IsKind(SyntaxKind.StringLiteralExpression)
)
.Select
(
  i =>
  new
  {
    Title = (string)i.ArgumentList.Arguments[0].GetFirstToken().Value,
    Author = (string)i.ArgumentList.Arguments[1].GetFirstToken().Value,
    Version = (string)i.ArgumentList.Arguments[2].GetFirstToken().Value,
    // value of the named argument "ResourceId = ...", if present
    ResourceId =
      i.ArgumentList.Arguments
       .Where(j => j.NameEquals != null && j.NameEquals.Name.ToString() == "ResourceId")
       .Select(j => j.Expression.GetFirstToken().Value?.ToString())
       .FirstOrDefault()
  }
);

// Print each match (in LINQPad, infoAttributes.Dump() does the same job).
foreach (var info in infoAttributes)
    Console.WriteLine(info);

At this level, this only parses the source code. To make things simpler, I added defensive clauses so that it only works with literal values; you'll probably want to turn those into warnings to be handled manually or something similar. The code correctly handles trivia (e.g. whitespace), code that looks like an attribute declaration but isn't, comments, and plenty of other possible issues. There's still a simplifying assumption: the values must be literals (string or otherwise). The example will only find one Info attribute: the one on SomeMethod2 uses a constant and a different constructor overload, and the one on SomeMethod3 is commented out.

Another level is creating a compilation (with its semantic model) from this. That's a bit more involved, but it allows you to make everything work as if it were real C# code; for example, the attribute on SomeMethod2 will resolve SomeConstant correctly. Of course, if you really want to be 100% correct, this requires gathering all the dependencies etc., which sounds like overkill. Unless this is a real problem in your code, warnings should do fine for the outliers. If local constants are used often in your code, extending the code to handle a local literal constant is still pretty easy.

As a disclaimer, this surely isn't the best way to do the parsing using Roslyn. It's just the first thing that came to mind, and it only took a short while to get working. I'm still finding better ways of dealing with Roslyn pretty much every day :)

  • This is very interesting indeed; could you possibly provide an example of how to get the attribute and its values? Commented Apr 11, 2016 at 9:48
  • @Dan-LeviTømta Added sample code. Note that it can be made more or less complex depending on your exact requirements - I went the cautious way for the most part, with a few simplifying assumptions.
    – Luaan
    Commented Apr 11, 2016 at 10:53
  • This is truly very interesting. I did not know about Roslyn; thank you for telling me about it. I will definitely keep this as a reference for later projects. Commented Apr 11, 2016 at 15:23

If for some reason what @Luaan suggests cannot be done, you can use an expression such as this: \[Info\("(.+?)", "(.+?)", "([\d.]+)", ResourceId\s*=\s*(\d+)\)\] to match and extract the values you are after.

An example is available here.
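In C#, the pattern above is easiest to use as a verbatim string (@"..."), where backslashes are taken literally and a quote is written as a doubled quote (""). The following is a minimal sketch (variable names and the sample line are illustrative) that compiles the pattern and reads the four capture groups; Groups[0] is the whole match, so the captured values start at index 1.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from this answer, written as a verbatim string so the
// backslashes do not need doubling; "" stands for a single quote character.
var infoRegex = new Regex(
    @"\[Info\(""(.+?)"", ""(.+?)"", ""([\d.]+)"", ResourceId\s*=\s*(\d+)\)\]");

string line = "[Info(\"Title\", \"Author\", \"5.2.5\", ResourceId = 819)]";
Match m = infoRegex.Match(line);
if (m.Success)
{
    // Groups[0] is the whole match; the captures start at index 1.
    string title = m.Groups[1].Value;
    string author = m.Groups[2].Value;
    string version = m.Groups[3].Value;
    int resourceId = int.Parse(m.Groups[4].Value);
    Console.WriteLine($"{title} / {author} / {version} / {resourceId}");
    // prints: Title / Author / 5.2.5 / 819
}
```

With a regular (non-verbatim) string, every backslash has to be doubled and every quote written as \", and the spaces in the pattern must match the spaces in the source exactly.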

EDIT: As pointed out by @Evk, this expression will also match commented attributes. If this is not something which you are after, please let me know.

EDIT: In response to your question, you would need to use something like this: \[Info\("(.+?)", "(.+?)", "?([\d.]+)"?, ResourceId\s*=\s*(\d+)\)\]. Here, each quotation mark around the 3rd argument is followed by the ? character, which tells the engine that the quotation mark is optional. An example is available here.
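A quick way to check the revised pattern in C# is to run it over both spellings of the version. This is a sketch with made-up sample lines; the pattern itself is the one from the edit above, written as a verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

// Revised pattern: "? makes each quote around the version optional.
var pattern = new Regex(
    @"\[Info\(""(.+?)"", ""(.+?)"", ""?([\d.]+)""?, ResourceId\s*=\s*(\d+)\)\]");

string[] samples =
{
    "[Info(\"Title\", \"Author\", \"5.2.5\", ResourceId = 819)]", // quoted version
    "[Info(\"Title\", \"Author\", 5.2.5, ResourceId = 819)]",     // unquoted version
};

foreach (string sample in samples)
{
    Match match = pattern.Match(sample);
    // Group 3 holds the version without any surrounding quotes.
    Console.WriteLine(match.Success ? match.Groups[3].Value : "no match");
}
// prints 5.2.5 twice
```

Because each quote is optional independently, the pattern also accepts a version with a quote on only one side; if that matters, a stricter alternative would be needed.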

  • Don't forget to handle comments (//, /* ... */) in a proper way. If you just use this regex - it will match all commented attributes too.
    – Evk
    Commented Apr 11, 2016 at 9:38
  • @Evk: Yeah, but the OP uses .Contains, and commented code does not seem to be an issue. I'll add a note just in case.
    – npinti
    Commented Apr 11, 2016 at 9:40
  • Well, that is more of a reminder to the author, since he might not have realized that attributes can be commented out.
    – Evk
    Commented Apr 11, 2016 at 9:44
  • Thanks! What if, for some reason, the third attribute value is written as 5.2.5 instead of "5.2.5"? Would I need a second expression that matches this, or could one regular expression handle both cases? Sorry, I am not that familiar with regular expressions. Commented Apr 11, 2016 at 9:56
  • I'm trying to give the last regular expression a go, but I'm having difficulty getting a match. Should I not escape the backslashes and quotes in the expression? Regex re = new Regex("\\[Info\\(\"(.+?)\", \"(.+?)\", \" ? ([\\d.] +)\"?, ResourceId\\s*=\\s*(\\d+)\\)\\]"); Commented Apr 11, 2016 at 13:51
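Following up on Evk's point about comments: one way to keep the regex approach from matching commented-out attributes is to strip comments before matching. The sketch below uses a hypothetical StripComments helper that removes // and /* */ comments while copying ordinary "..." string literals unchanged. It deliberately ignores verbatim, interpolated, raw and character literals, so it is an approximation rather than a C# lexer; the Roslyn approach in the other answer handles all of this properly.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: removes // and /* */ comments from C# source while
// leaving ordinary "..." string literals alone. Assumption: verbatim (@"..."),
// interpolated, raw and char literals are not treated specially.
static string StripComments(string source)
{
    var sb = new StringBuilder(source.Length);
    int i = 0;
    while (i < source.Length)
    {
        char c = source[i];
        char next = i + 1 < source.Length ? source[i + 1] : '\0';
        if (c == '"')
        {
            // Copy a string literal unchanged, skipping over \" escapes.
            int start = i++;
            while (i < source.Length && source[i] != '"')
                i += source[i] == '\\' ? 2 : 1;
            i = Math.Min(i + 1, source.Length);
            sb.Append(source, start, i - start);
        }
        else if (c == '/' && next == '/')
        {
            // Line comment: skip up to, but not including, the line break.
            while (i < source.Length && source[i] != '\n') i++;
        }
        else if (c == '/' && next == '*')
        {
            // Block comment: skip past the closing */ (or to the end of input).
            int close = source.IndexOf("*/", i + 2, StringComparison.Ordinal);
            i = close < 0 ? source.Length : close + 2;
        }
        else
        {
            sb.Append(c);
            i++;
        }
    }
    return sb.ToString();
}

string script = @"
class C
{
    // [Info(""Old"", ""Someone"", ""1.0.0"", ResourceId = 1)]
    /* [Info(""Older"", ""Someone"", ""0.9.0"", ResourceId = 2)] */
    [Info(""Title"", ""Author"", ""5.2.5"", ResourceId = 819)]
    void M() { }
}";

var infoRegex = new Regex(
    @"\[Info\(""(.+?)"", ""(.+?)"", ""?([\d.]+)""?, ResourceId\s*=\s*(\d+)\)\]");
foreach (Match m in infoRegex.Matches(StripComments(script)))
    Console.WriteLine(m.Groups[1].Value); // prints only: Title
```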
