5
\$\begingroup\$

I have a working C# (version 5) function that I use to match an input string to one of many unique regular expression patterns and return the replacement string associated with the matched pattern (via Regex.Replace). I've tested it well enough to know that the code acts as intended and is reliable.

One benefit of this approach is that it is readable (to me) and easy to edit any of the constant string variables. However, it is what I would consider to be "the long way."

Am I missing out on a more elegant technique that doesn't require the newing up of the seven Regex variables and a long if-else if-else block? If I need to add more patterns, then I'd be newing up more variables and adding to the if-else if-else block. I have not yet seen any better techniques in Pluralsight or in Stack Overflow (those were more concerned with fixing a specific expression and bug hunting).

The code block you see below was created in Visual Studio 2013 Ultimate as a SQL Database Project and published to a SQL Server Database (version 2014), so that the User Defined Function can be used in T-SQL Select queries. All this because T-SQL does not have true regular expression functionality as exists in C#/.NET.

Most of the space of this function is taken up by variable definition.

  • Two strings (pattern and replacement) -- at first empty, to be assigned at the end of the function.
  • Seven Regular Expression patterns (the search input must match to one-and-only-one pattern).
  • Seven replacement strings, where each replacement string is paired to a pattern.
  • Seven Regex variables

The behavior takes place in a long if-else if-else block. If the search matches any of the patterns, then the pattern and replacement variables are assigned.

Finally, the replacement string is returned (via the Regex.Replace(search, pattern, replacement) function).

Are there any better approaches?

using System.Data.SqlTypes;
using System.Text.RegularExpressions;

namespace CustomClrFunctions
{
    /// <summary>
    /// This set of CLR functions is published to the CustomClrFunctions Database to apply 
    /// true Regex match and replacement functionality, as T-SQL does not (yet) provide
    /// that feature.
    /// </summary>
    public partial class UserDefinedFunctions
    {
        /// <summary>
        /// This function replaces the OW format into ADE format for ELA standards.
        /// </summary>
        /// <param name="search">An ELA standard in OW format. Example: "LA.11-12.11-12.L.1.a"</param>
        /// <returns>The same ELA standard translated to ADE format.</returns>
        [Microsoft.SqlServer.Server.SqlFunction]
        public static SqlString RegexReplaceElaHs(SqlChars search)
        {
            /* Known patterns and replacements (to replace the search term) */
            const string pattern1 = @"LA\.11-12\.11-12\.(\w{1,4}).(\d{1,2})(\.\w)?";
            const string replacement1 = "LA.11-12.$1.$2$3";

            const string pattern2 = @"LA\.9-10\.9-10\.(\w{1,4}).(\d{1,2})(\.\w)?";
            const string replacement2 = "LA.9-10.$1.$2$3";

            const string pattern3 = @"LA\.K-12\.CCSS\.ELA-Literacy\.CCRA\.(\w{1,2})\.(\d{1,2})";
            const string replacement3 = "CCRA.$1.$2";

            const string pattern4 = @"LA\.11-12\.(\d{1,2})\.(\w{1,2})\.(\d{1,2})";
            const string replacement4 = "LA.$1.$2.$3";

            const string pattern5 = @"LA\.9-10\.(\d{1,2})\.(\w{1,2})\.(\d{1,2})";
            const string replacement5= "LA.$1.$2.$3";

            const string pattern6 = @"LA\.6-8\.6-8\.(\w{1,4}).(\d{1,2})(\.\w)?";
            const string replacement6 = "LA.6-8.$1.$2$3";

            const string pattern7 = @"LA\.8\.8\.(\w{1,2}).(\d)(\.\w)?";
            const string replacement7 = "LA.8.$1.$2$3";

            var regex1 = new Regex(pattern1);
            var regex2 = new Regex(pattern2);
            var regex3 = new Regex(pattern3);
            var regex4 = new Regex(pattern4);
            var regex5 = new Regex(pattern5);
            var regex6 = new Regex(pattern6);
            var regex7 = new Regex(pattern7);

            string pattern;
            string replacement;

            /*  The following if-else block assigns 
             *  values to "pattern" and "replacement" 
             *  depending on which pattern matches "search"
             */

            if (regex1.IsMatch(new string(search.Value)))
            {
                pattern = pattern1;
                replacement = replacement1;
            }
            else if (regex2.IsMatch(new string(search.Value)))
            {
                pattern = pattern2;
                replacement = replacement2;
            }
            else if (regex3.IsMatch(new string(search.Value)))
            {
                pattern = pattern3;
                replacement = replacement3;
            }
            else if (regex4.IsMatch(new string(search.Value)))
            {
                pattern = pattern4;
                replacement = replacement4;
            }
            else if (regex5.IsMatch(new string(search.Value)))
            {
                pattern = pattern5;
                replacement = replacement5;
            }
            else if (regex6.IsMatch(new string(search.Value)))
            {
                pattern = pattern6;
                replacement = replacement6;
            }
            else if (regex7.IsMatch(new string(search.Value)))
            {
                pattern = pattern7;
                replacement = replacement7;
            }
            else
            {
                pattern = string.Empty;
                replacement = string.Empty;
            }

            // This returns the transformation of the "search" value in ADE format.
            // replacement is a string replacement.
            return Regex.Replace(new string(search.Value), pattern, replacement);
        }
    }
}
\$\endgroup\$

2 Answers

3
\$\begingroup\$

The biggest and most obvious improvement you're missing out on is that regular expressions in .NET can be compiled (which can give a huge performance boost), and you can use readonly fields to make sure you don't recompile the Regex every time you call the method.

private static readonly Regex _pattern1 = new Regex(@"LA\.11-12\.11-12\.(\w{1,4}).(\d{1,2})(\.\w)?", RegexOptions.Compiled);

This could literally save tens of milliseconds per pattern, which means in your case it could save a lot of time. You should then create a local string for search.Value, to save more time and keep things readable, as well as some other miscellaneous tweaks. In the end you end up with:

private static readonly Regex _pattern1 = new Regex(@"LA\.11-12\.11-12\.(\w{1,4}).(\d{1,2})(\.\w)?", RegexOptions.Compiled);
private static readonly string _replacement1 = "LA.11-12.$1.$2$3";

// Remaining patterns

[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString RegexReplaceElaHs(SqlChars search)
{
    var value = new string(search.Value);

    if (_pattern1.IsMatch(value))
    {
        return _pattern1.Replace(value, _replacement1);
    }

    if (_pattern2.IsMatch(value))
    {
        return _pattern2.Replace(value, _replacement2);
    }

    // Remaining matches

    // Final return is just the default value, which is what yours does anyway
    return value;
}

Then, you could make a private static readonly Dictionary<Regex, string> _replacements = ... and loop over it:

foreach (var entry in _replacements)
{
    if (entry.Key.IsMatch(value))
    {
        return entry.Key.Replace(value, entry.Value);
    }
}

And ta-da, you eliminated many LoC and got a lot of processing time back.
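
As a rough sketch of how the dictionary and loop could fit together (not a definitive implementation): the class name `ElaReplacements`, the method name `Translate`, and the `Main` demo are assumptions made for illustration, and only two of the seven pattern pairs from the question are shown. The syntax is kept compatible with C# 5.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ElaReplacements
{
    // Built once per AppDomain; each Regex is compiled a single time.
    // Only two of the question's seven pairs are listed here.
    private static readonly Dictionary<Regex, string> _replacements = new Dictionary<Regex, string>
    {
        { new Regex(@"LA\.11-12\.11-12\.(\w{1,4}).(\d{1,2})(\.\w)?", RegexOptions.Compiled), "LA.11-12.$1.$2$3" },
        { new Regex(@"LA\.K-12\.CCSS\.ELA-Literacy\.CCRA\.(\w{1,2})\.(\d{1,2})", RegexOptions.Compiled), "CCRA.$1.$2" },
    };

    // Hypothetical helper: returns the first matching replacement,
    // or the input unchanged when no pattern matches.
    public static string Translate(string value)
    {
        foreach (var entry in _replacements)
        {
            if (entry.Key.IsMatch(value))
            {
                return entry.Key.Replace(value, entry.Value);
            }
        }

        return value;
    }

    public static void Main()
    {
        Console.WriteLine(Translate("LA.11-12.11-12.L.1.a")); // prints "LA.11-12.L.1.a"
        Console.WriteLine(Translate("no match here"));        // prints "no match here"
    }
}
```

Because the question states that each input matches one and only one pattern, the enumeration order of the dictionary does not affect the result here; it would matter if patterns could overlap.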


I'll post up a much more complete solution later, but you should be able to make this work as expected. :)

\$\endgroup\$
  • \$\begingroup\$ a Dictionary with a Regex as a Key ;-] if this is not dictionary abuse then I don't know what is :-P \$\endgroup\$
    – t3chb0t
    Commented Jun 23, 2017 at 20:07
  • \$\begingroup\$ @t3chb0t That's fair, though with C#7.0 I would use a Tuple<Regex, string>, in OP's case I prefer the syntax of a dictionary. :) \$\endgroup\$ Commented Jun 23, 2017 at 21:01
  • \$\begingroup\$ It's the creation of the Dictionary and the for-each statement in the latter half of your answer that worked well for me. It certainly looks more compact. I look forward to seeing the rest of your solution. \$\endgroup\$ Commented Jun 23, 2017 at 21:01
  • \$\begingroup\$ @RandomHandle A Dictionary<Regex, string> may not be the greatest option, but for C#5/6 it's the one I'd choose. The jabbing by t3chb0t isn't just for fun - he does bring up a valid point. ;) \$\endgroup\$ Commented Jun 23, 2017 at 21:05
  • \$\begingroup\$ @EBrown -- Is there a purpose of defining the Regex variables on a class level rather than on a function level? \$\endgroup\$ Commented Jun 23, 2017 at 21:05
1
\$\begingroup\$

The solution with a Dictionary is definitely an improvement because it reduces a lot of repetition and gathers all regexes and replacements together, but I'd like to point out one flaw it might have. I don't know if it applies to your use case here, but you should keep it in mind in case you want to use this technique for something else.

A Dictionary<> does not guarantee the order of its elements (see: Why is a Dictionary “not ordered”?). This means that if you had a long list and wanted the most probable cases checked first, there is no guarantee they would be enumerated in the same order as you added them.


For the reason detailed above I find it's better to create a simple object like (I used read/write properties but an immutable object would be more appropriate)

class Translation
{
    public Regex Matcher { get; set; }

    public string Replacement { get; set; } 

    public bool CanTranslate(string value)
    {
        return Matcher.IsMatch(value);
    }

    public string Translate(string value)
    {
        return Matcher.Replace(value, Replacement);
    }
}

and put them in a List<Translation> so that they are always processed in the original order.

var translation = translations.FirstOrDefault(t => t.CanTranslate(value));
if (translation != null)
{
    return translation.Translate(value);
}
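
To make the ordered-list idea concrete, here is a minimal sketch under a few assumptions: the names `Program`, `Translations`, and `TranslateOrDefault` are invented for illustration, the `Translation` class is repeated so the example compiles on its own, and only two of the question's seven pairs are listed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Translation
{
    public Regex Matcher { get; set; }

    public string Replacement { get; set; }

    public bool CanTranslate(string value)
    {
        return Matcher.IsMatch(value);
    }

    public string Translate(string value)
    {
        return Matcher.Replace(value, Replacement);
    }
}

static class Program
{
    // A List<T> enumerates in insertion order, so entries are tried top to bottom.
    static readonly List<Translation> Translations = new List<Translation>
    {
        new Translation
        {
            Matcher = new Regex(@"LA\.9-10\.9-10\.(\w{1,4}).(\d{1,2})(\.\w)?", RegexOptions.Compiled),
            Replacement = "LA.9-10.$1.$2$3"
        },
        new Translation
        {
            Matcher = new Regex(@"LA\.9-10\.(\d{1,2})\.(\w{1,2})\.(\d{1,2})", RegexOptions.Compiled),
            Replacement = "LA.$1.$2.$3"
        },
    };

    // Hypothetical helper: falls back to the unchanged input when nothing matches.
    static string TranslateOrDefault(string value)
    {
        var translation = Translations.FirstOrDefault(t => t.CanTranslate(value));
        return translation != null ? translation.Translate(value) : value;
    }

    static void Main()
    {
        Console.WriteLine(TranslateOrDefault("LA.9-10.9.RL.1")); // prints "LA.9.RL.1"
        Console.WriteLine(TranslateOrDefault("unknown"));        // prints "unknown"
    }
}
```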

Additionally, you can implement the translation in this object itself and thus simplify the code a little more by completely encapsulating the fact that you are actually working with Regex. If you wanted to use another technique for translating, the Translation class would be the only thing you'd have to change.

You could for example also create an interface for it if you wanted to have translations that work with either Regex or other methods.
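
One possible shape for such an interface is sketched below. Everything here is an assumption for illustration: the names `ITranslation`, `RegexTranslation`, and `PrefixTranslation` do not come from the original code, and the prefix-stripping implementation (with its made-up `LEGACY:` prefix) exists only to show a translation that does not use Regex. The classes are also written as immutable objects rather than with read/write properties.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical abstraction over "something that can translate a value".
interface ITranslation
{
    bool CanTranslate(string value);
    string Translate(string value);
}

// Regex-backed implementation; immutable after construction.
sealed class RegexTranslation : ITranslation
{
    private readonly Regex _matcher;
    private readonly string _replacement;

    public RegexTranslation(string pattern, string replacement)
    {
        _matcher = new Regex(pattern, RegexOptions.Compiled);
        _replacement = replacement;
    }

    public bool CanTranslate(string value)
    {
        return _matcher.IsMatch(value);
    }

    public string Translate(string value)
    {
        return _matcher.Replace(value, _replacement);
    }
}

// Hypothetical non-regex implementation: strips a fixed prefix.
sealed class PrefixTranslation : ITranslation
{
    private readonly string _prefix;

    public PrefixTranslation(string prefix)
    {
        _prefix = prefix;
    }

    public bool CanTranslate(string value)
    {
        return value.StartsWith(_prefix, StringComparison.Ordinal);
    }

    public string Translate(string value)
    {
        return value.Substring(_prefix.Length);
    }
}

static class Program
{
    static void Main()
    {
        // The caller only sees ITranslation, not how each entry works.
        var translations = new ITranslation[]
        {
            new RegexTranslation(@"LA\.8\.8\.(\w{1,2}).(\d)(\.\w)?", "LA.8.$1.$2$3"),
            new PrefixTranslation("LEGACY:"),
        };

        foreach (var input in new[] { "LA.8.8.RL.1", "LEGACY:CCRA.R.1" })
        {
            foreach (var translation in translations)
            {
                if (translation.CanTranslate(input))
                {
                    Console.WriteLine(translation.Translate(input));
                    break;
                }
            }
        }
    }
}
```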

However the most comfortable solution would be to implement one more method like this one (either by the class itself or an extension):

bool TryTranslate(string value, out string translated)
{
    if(CanTranslate(value))
    {
        translated = Translate(value);
        return true;
    }
    translated = null;
    return false;
}

and use it with a loop like this:

foreach(var translation in translations)
{
    string translated;
    if(translation.TryTranslate(value, out translated))
    {
        return translated;
    }
}
\$\endgroup\$
