\$\begingroup\$

Once upon a time, there was a duck that wanted to know where and how user code was calling into the VBA standard library and Excel object model. To match the rest of its API, the poor little duck had to dig through pages and pages and pages of MSDN documentation, and instantiate a Declaration object for each and every single module, class, enum, function, property, event and whatnot.

For example, the ColorConstants built-in module from the VBA standard library would be hard-coded as a series of Declaration fields, and exposed through a static getter that used reflection to pull all the members:

public static IEnumerable<Declaration> Declarations
{
    get
    {
        if (_standardLibDeclarations == null)
        {
            var nestedTypes = typeof(VbaStandardLib).GetNestedTypes(BindingFlags.NonPublic).Where(t => Attribute.GetCustomAttribute(t, typeof(CompilerGeneratedAttribute)) == null);
            var fields = nestedTypes.SelectMany(t => t.GetFields());
            var values = fields.Select(f => f.GetValue(null));
            _standardLibDeclarations = values.Cast<Declaration>();
        }

        return _standardLibDeclarations;
    }
}

//...

private class ColorConstantsModule
{
    private static readonly QualifiedModuleName ColorConstantsModuleName = new QualifiedModuleName("VBA", "ColorConstants");
    public static readonly Declaration ColorConstants = new Declaration(new QualifiedMemberName(ColorConstantsModuleName, "ColorConstants"), VbaLib.Vba, "VBA", "ColorConstants", false, false, Accessibility.Global, DeclarationType.Module);
    public static Declaration VbBlack = new ValuedDeclaration(new QualifiedMemberName(ColorConstantsModuleName, "vbBlack"), ColorConstants, "VBA.ColorConstants", "Long", Accessibility.Global, DeclarationType.Constant, "0");
    public static Declaration VbBlue = new ValuedDeclaration(new QualifiedMemberName(ColorConstantsModuleName, "vbBlue"), ColorConstants, "VBA.ColorConstants", "Long", Accessibility.Global, DeclarationType.Constant, "16711680");
    public static Declaration VbCyan = new ValuedDeclaration(new QualifiedMemberName(ColorConstantsModuleName, "vbCyan"), ColorConstants, "VBA.ColorConstants", "Long", Accessibility.Global, DeclarationType.Constant, "16776960");
    public static Declaration VbGreen = new ValuedDeclaration(new QualifiedMemberName(ColorConstantsModuleName, "vbGreen"), ColorConstants, "VBA.ColorConstants", "Long", Accessibility.Global, DeclarationType.Constant, "65280");
    public static Declaration VbMagenta = new ValuedDeclaration(new QualifiedMemberName(ColorConstantsModuleName, "vbMagenta"), ColorConstants, "VBA.ColorConstants", "Long", Accessibility.Global, DeclarationType.Constant, "16711935");
    public static Declaration VbRed = new ValuedDeclaration(new QualifiedMemberName(ColorConstantsModuleName, "vbRed"), ColorConstants, "VBA.ColorConstants", "Long", Accessibility.Global, DeclarationType.Constant, "255");
    public static Declaration VbWhite = new ValuedDeclaration(new QualifiedMemberName(ColorConstantsModuleName, "vbWhite"), ColorConstants, "VBA.ColorConstants", "Long", Accessibility.Global, DeclarationType.Constant, "16777215");
    public static Declaration VbYellow = new ValuedDeclaration(new QualifiedMemberName(ColorConstantsModuleName, "vbYellow"), ColorConstants, "VBA.ColorConstants", "Long", Accessibility.Global, DeclarationType.Constant, "65535");
}

This was very unfortunate: not only was it a fairly ugly and not-quite-justified use of reflection, it also meant that the VBE add-in would only ever know about hard-coded declarations. Hard-coding the declarations for every member of every possible VBA host application that could ever run Rubberduck would be beyond ridiculous; even the thought of it is ludicrous. Imagine the above, times 700. Now imagine you want to add a Declaration for every parameter of every function out there: the result would be an unmaintainable pile of hard-coded goo.

So I decided to scratch that and go with a wildly different approach instead. Not any less ludicrous though.

When hosted in MS-Excel 2010, the below code yields 37,135 built-in declarations in... well, about two thirds of a second:

VBA declarations added in 77ms
Excel declarations added in 582ms
stdole declarations added in 2ms
37135 built-in declarations added.

It's being used like this, in the RubberduckParser.ParseParallel method:

if (!_state.AllDeclarations.Any(declaration => declaration.IsBuiltIn))
{
    // multiple projects can (do) have same references; avoid adding them multiple times!
    var references = projects.SelectMany(project => project.References.Cast<Reference>())
        .GroupBy(reference => reference.Guid)
        .Select(grouping => grouping.First());

    foreach (var reference in references)
    {
        var stopwatch = Stopwatch.StartNew();
        var declarations = _comReflector.GetDeclarationsForReference(reference);
        foreach (var declaration in declarations)
        {
            _state.AddDeclaration(declaration);
        }
        stopwatch.Stop();
        Debug.WriteLine("{0} declarations added in {1}ms", reference.Name, stopwatch.ElapsedMilliseconds);
    }

    Debug.WriteLine("{0} built-in declarations added.", _state.AllDeclarations.Count(d => d.IsBuiltIn));
}

So, it works wonderfully well - too well even (the resolver code wasn't quite ready to handle that many declarations). Given a Reference, we load its COM type library, start iterating its types and members, and yield return a Declaration as soon as we have enough information to provide one.

using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Runtime.InteropServices.ComTypes;
using Microsoft.Vbe.Interop;
using Rubberduck.VBEditor;
using FUNCFLAGS = System.Runtime.InteropServices.ComTypes.FUNCFLAGS;
using TYPEDESC = System.Runtime.InteropServices.ComTypes.TYPEDESC;
using TYPEKIND = System.Runtime.InteropServices.ComTypes.TYPEKIND;
using FUNCKIND = System.Runtime.InteropServices.ComTypes.FUNCKIND;
using INVOKEKIND = System.Runtime.InteropServices.ComTypes.INVOKEKIND;
using PARAMFLAG = System.Runtime.InteropServices.ComTypes.PARAMFLAG;
using TYPEATTR = System.Runtime.InteropServices.ComTypes.TYPEATTR;
using FUNCDESC = System.Runtime.InteropServices.ComTypes.FUNCDESC;
using ELEMDESC = System.Runtime.InteropServices.ComTypes.ELEMDESC;
using VARDESC = System.Runtime.InteropServices.ComTypes.VARDESC;

namespace Rubberduck.Parsing.Symbols
{
    public class ReferencedDeclarationsCollector
    {
        /// <summary>
        /// Controls how a type library is registered.
        /// </summary>
        private enum REGKIND
        {
            /// <summary>
            /// Use default register behavior.
            /// </summary>
            REGKIND_DEFAULT = 0,
            /// <summary>
            /// Register this type library.
            /// </summary>
            REGKIND_REGISTER = 1,
            /// <summary>
            /// Do not register this type library.
            /// </summary>
            REGKIND_NONE = 2
        }

        [DllImport("oleaut32.dll", CharSet = CharSet.Unicode)]
        private static extern void LoadTypeLibEx(string strTypeLibName, REGKIND regKind, out ITypeLib TypeLib);

        private static readonly IDictionary<VarEnum, string> TypeNames = new Dictionary<VarEnum, string>
        {
            {VarEnum.VT_DISPATCH, "DISPATCH"},
            {VarEnum.VT_VOID, string.Empty},
            {VarEnum.VT_VARIANT, "Variant"},
            {VarEnum.VT_BLOB_OBJECT, "Object"},
            {VarEnum.VT_STORED_OBJECT, "Object"},
            {VarEnum.VT_STREAMED_OBJECT, "Object"},
            {VarEnum.VT_BOOL, "Boolean"},
            {VarEnum.VT_BSTR, "String"},
            {VarEnum.VT_LPSTR, "String"},
            {VarEnum.VT_LPWSTR, "String"},
            {VarEnum.VT_I1, "Variant"}, // no signed byte type in VBA
            {VarEnum.VT_UI1, "Byte"},
            {VarEnum.VT_I2, "Integer"},
            {VarEnum.VT_UI2, "Variant"}, // no unsigned integer type in VBA
            {VarEnum.VT_I4, "Long"},
            {VarEnum.VT_UI4, "Variant"}, // no unsigned long integer type in VBA
            {VarEnum.VT_I8, "Variant"}, // LongLong on 64-bit VBA
            {VarEnum.VT_UI8, "Variant"}, // no unsigned LongLong integer type in VBA
            {VarEnum.VT_INT, "Long"}, // same as I4
            {VarEnum.VT_UINT, "Variant"}, // same as UI4
            {VarEnum.VT_DATE, "Date"},
            {VarEnum.VT_DECIMAL, "Currency"}, // best match?
            {VarEnum.VT_EMPTY, "Empty"},
            {VarEnum.VT_R4, "Single"},
            {VarEnum.VT_R8, "Double"},
        };

        private string GetTypeName(ITypeInfo info)
        {
            string typeName;
            string docString; // todo: put the docString to good use?
            int helpContext;
            string helpFile;
            info.GetDocumentation(-1, out typeName, out docString, out helpContext, out helpFile);

            return typeName;
        }

        public IEnumerable<Declaration> GetDeclarationsForReference(Reference reference)
        {
            var projectName = reference.Name;
            var path = reference.FullPath;

            var projectQualifiedModuleName = new QualifiedModuleName(projectName, projectName);
            var projectQualifiedMemberName = new QualifiedMemberName(projectQualifiedModuleName, projectName);

            var projectDeclaration = new Declaration(projectQualifiedMemberName, null, null, projectName, false, false, Accessibility.Global, DeclarationType.Project);
            yield return projectDeclaration;

            ITypeLib typeLibrary;
            LoadTypeLibEx(path, REGKIND.REGKIND_NONE, out typeLibrary);

            var typeCount = typeLibrary.GetTypeInfoCount();
            for (var i = 0; i < typeCount; i++)
            {
                ITypeInfo info;
                typeLibrary.GetTypeInfo(i, out info);

                if (info == null)
                {
                    continue;
                }

                var typeName = GetTypeName(info);
                var typeDeclarationType = GetDeclarationType(typeLibrary, i);

                QualifiedModuleName typeQualifiedModuleName;
                QualifiedMemberName typeQualifiedMemberName;
                if (typeDeclarationType == DeclarationType.Enumeration ||
                    typeDeclarationType == DeclarationType.UserDefinedType)
                {
                    typeQualifiedModuleName = projectQualifiedModuleName;
                    typeQualifiedMemberName = new QualifiedMemberName(projectQualifiedModuleName, typeName);
                }
                else
                {
                    typeQualifiedModuleName = new QualifiedModuleName(projectName, typeName);
                    typeQualifiedMemberName = new QualifiedMemberName(typeQualifiedModuleName, typeName);
                }

                var moduleDeclaration = new Declaration(typeQualifiedMemberName, projectDeclaration, projectDeclaration, typeName, false, false, Accessibility.Global, typeDeclarationType, null, Selection.Home);
                yield return moduleDeclaration;

                IntPtr typeAttributesPointer;
                info.GetTypeAttr(out typeAttributesPointer);

                var typeAttributes = (TYPEATTR)Marshal.PtrToStructure(typeAttributesPointer, typeof (TYPEATTR));
                //var implements = GetImplementedInterfaceNames(typeAttributes, info);

                for (var memberIndex = 0; memberIndex < typeAttributes.cFuncs; memberIndex++)
                {
                    IntPtr memberDescriptorPointer;
                    info.GetFuncDesc(memberIndex, out memberDescriptorPointer);
                    var memberDescriptor = (FUNCDESC) Marshal.PtrToStructure(memberDescriptorPointer, typeof (FUNCDESC));

                    var memberNames = new string[255]; // member name at index 0; array contains parameter names too
                    int namesArrayLength;
                    info.GetNames(memberDescriptor.memid, memberNames, 255, out namesArrayLength);

                    var memberName = memberNames[0];

                    var funcValueType = (VarEnum)memberDescriptor.elemdescFunc.tdesc.vt;
                    var memberDeclarationType = GetDeclarationType(memberDescriptor, funcValueType);

                    var asTypeName = string.Empty;
                    if (memberDeclarationType != DeclarationType.Procedure && !TypeNames.TryGetValue(funcValueType, out asTypeName))
                    {
                        asTypeName = funcValueType.ToString(); //TypeNames[VarEnum.VT_VARIANT];
                    }

                    var memberDeclaration = new Declaration(new QualifiedMemberName(typeQualifiedModuleName, memberName), moduleDeclaration, moduleDeclaration, asTypeName, false, false, Accessibility.Global, memberDeclarationType, null, Selection.Home);
                    yield return memberDeclaration;

                    var parameterCount = memberDescriptor.cParams - 1;
                    for (var paramIndex = 0; paramIndex < parameterCount; paramIndex++)
                    {
                        var paramName = memberNames[paramIndex + 1];

                        var paramPointer = new IntPtr(memberDescriptor.lprgelemdescParam.ToInt64() + Marshal.SizeOf(typeof (ELEMDESC))*paramIndex);
                        var elementDesc = (ELEMDESC) Marshal.PtrToStructure(paramPointer, typeof (ELEMDESC));
                        var isOptional = elementDesc.desc.paramdesc.wParamFlags.HasFlag(PARAMFLAG.PARAMFLAG_FOPT);
                        var asParamTypeName = string.Empty;

                        var isByRef = false;
                        var isArray = false;
                        var paramDesc = elementDesc.tdesc;
                        var valueType = (VarEnum) paramDesc.vt;
                        if (valueType == VarEnum.VT_PTR || valueType == VarEnum.VT_BYREF)
                        {
                            //var paramTypeDesc = (TYPEDESC) Marshal.PtrToStructure(paramDesc.lpValue, typeof (TYPEDESC));
                            isByRef = true;
                            var paramValueType = (VarEnum) paramDesc.vt;
                            if (!TypeNames.TryGetValue(paramValueType, out asParamTypeName))
                            {
                                asParamTypeName = TypeNames[VarEnum.VT_VARIANT];
                            }
                            //var href = paramDesc.lpValue.ToInt32();
                            //ITypeInfo refTypeInfo;
                            //info.GetRefTypeInfo(href, out refTypeInfo);

                            // todo: get type info?
                        }
                        if (valueType == VarEnum.VT_CARRAY || valueType == VarEnum.VT_ARRAY || valueType == VarEnum.VT_SAFEARRAY)
                        {
                            // todo: tell ParamArray arrays from normal arrays
                            isArray = true;
                        }

                        yield return new ParameterDeclaration(new QualifiedMemberName(typeQualifiedModuleName, paramName), memberDeclaration, asParamTypeName, isOptional, isByRef, isArray);
                    }
                }

                for (var fieldIndex = 0; fieldIndex < typeAttributes.cVars; fieldIndex++)
                {
                    IntPtr ppVarDesc;
                    info.GetVarDesc(fieldIndex, out ppVarDesc);

                    var varDesc = (VARDESC) Marshal.PtrToStructure(ppVarDesc, typeof (VARDESC));

                    var names = new string[255];
                    int namesArrayLength;
                    info.GetNames(varDesc.memid, names, 255, out namesArrayLength);

                    var fieldName = names[0];
                    var fieldValueType = (VarEnum)varDesc.elemdescVar.tdesc.vt;
                    var memberType = GetDeclarationType(varDesc, typeDeclarationType);

                    string asTypeName;
                    if (!TypeNames.TryGetValue(fieldValueType, out asTypeName))
                    {
                        asTypeName = TypeNames[VarEnum.VT_VARIANT];
                    }

                    yield return new Declaration(new QualifiedMemberName(typeQualifiedModuleName, fieldName), moduleDeclaration, moduleDeclaration, asTypeName, false, false, Accessibility.Global, memberType, null, Selection.Home);
                }
            }           
        }

        //private IEnumerable<string> GetImplementedInterfaceNames(TYPEATTR typeAttr, ITypeInfo info)
        //{
        //    for (var implIndex = 0; implIndex < typeAttr.cImplTypes; implIndex++)
        //    {
        //        int href;
        //        info.GetRefTypeOfImplType(implIndex, out href);

        //        ITypeInfo implTypeInfo;
        //        info.GetRefTypeInfo(href, out implTypeInfo);

        //        var implTypeName = GetTypeName(implTypeInfo);

        //        yield return implTypeName;
        //        //Debug.WriteLine(string.Format("\tImplements {0}", implTypeName));
        //    }
        //}

        private DeclarationType GetDeclarationType(ITypeLib typeLibrary, int i)
        {
            TYPEKIND typeKind;
            typeLibrary.GetTypeInfoType(i, out typeKind);

            DeclarationType typeDeclarationType = DeclarationType.Control; // todo: a better default
            if (typeKind == TYPEKIND.TKIND_ENUM)
            {
                typeDeclarationType = DeclarationType.Enumeration;
            }
            else if (typeKind == TYPEKIND.TKIND_COCLASS || typeKind == TYPEKIND.TKIND_INTERFACE ||
                     typeKind == TYPEKIND.TKIND_ALIAS || typeKind == TYPEKIND.TKIND_DISPATCH)
            {
                typeDeclarationType = DeclarationType.Class;
            }
            else if (typeKind == TYPEKIND.TKIND_RECORD)
            {
                typeDeclarationType = DeclarationType.UserDefinedType;
            }
            else if (typeKind == TYPEKIND.TKIND_MODULE)
            {
                typeDeclarationType = DeclarationType.Module;
            }
            return typeDeclarationType;
        }

        private DeclarationType GetDeclarationType(FUNCDESC funcDesc, VarEnum funcValueType)
        {
            DeclarationType memberType;
            if (funcDesc.invkind.HasFlag(INVOKEKIND.INVOKE_PROPERTYGET))
            {
                memberType = DeclarationType.PropertyGet;
            }
            else if (funcDesc.invkind.HasFlag(INVOKEKIND.INVOKE_PROPERTYPUT))
            {
                memberType = DeclarationType.PropertyLet;
            }
            else if (funcDesc.invkind.HasFlag(INVOKEKIND.INVOKE_PROPERTYPUTREF))
            {
                memberType = DeclarationType.PropertySet;
            }
            else if (funcValueType == VarEnum.VT_VOID)
            {
                memberType = DeclarationType.Procedure;
            }
            else if (funcDesc.funckind == FUNCKIND.FUNC_PUREVIRTUAL)
            {
                memberType = DeclarationType.Event;
            }
            else
            {
                memberType = DeclarationType.Function;
            }
            return memberType;
        }

        private DeclarationType GetDeclarationType(VARDESC varDesc, DeclarationType typeDeclarationType)
        {
            var memberType = DeclarationType.Variable;
            if (varDesc.varkind == VARKIND.VAR_CONST)
            {
                memberType = typeDeclarationType == DeclarationType.Enumeration
                    ? DeclarationType.EnumerationMember
                    : DeclarationType.Constant;
            }
            else if (typeDeclarationType == DeclarationType.UserDefinedType)
            {
                memberType = DeclarationType.UserDefinedTypeMember;
            }
            return memberType;
        }
    }
}

Should I break it down further? How readable / maintainable is it? I left some //todo comments in there, so I'll be coming back to this code in a number of weeks/months - what am I going to be regretting?

\$\endgroup\$

1 Answer

\$\begingroup\$

I think you can make this a tiny bit clearer:

if (!_state.AllDeclarations.Any(declaration => declaration.IsBuiltIn))
{
    // multiple projects can (do) have same references; avoid adding them multiple times!
    var references = projects.SelectMany(project => project.References.Cast<Reference>())
        .GroupBy(reference => reference.Guid)
        .Select(grouping => grouping.First());

    foreach (var reference in references)
    {
        var stopwatch = Stopwatch.StartNew();
        var declarations = _comReflector.GetDeclarationsForReference(reference);
        foreach (var declaration in declarations)
        {
            _state.AddDeclaration(declaration);
        }
        stopwatch.Stop();
        Debug.WriteLine("{0} declarations added in {1}ms", reference.Name, stopwatch.ElapsedMilliseconds);
    }
    // ...
}

You don't actually need to construct the whole group at all. You can use a HashSet instead:

var deduper = new HashSet<string>(); // Reference.Guid is exposed as a string by the VBE interop

var references = projects
        .SelectMany(project => project.References.Cast<Reference>());

foreach (var reference in references)
{
    if (!deduper.Add(reference.Guid))
    {
        continue;
    }
    // do your stuff.
}
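
To make the HashSet idea concrete, here is a minimal, self-contained sketch. FakeReference is a hypothetical stand-in for the VBE Reference type (it only carries a name and a GUID, modelled here as a string, which is an assumption of this sketch), and FirstPerGuid is a made-up helper name. It relies only on the fact that HashSet<T>.Add returns false when the item is already in the set.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetDedupSketch
{
    // Hypothetical stand-in for a VBE Reference: only the members used here.
    public sealed class FakeReference
    {
        public FakeReference(string name, string guid)
        {
            Name = name;
            Guid = guid;
        }

        public string Name { get; }
        public string Guid { get; }
    }

    // Returns the first reference seen for each GUID, in encounter order.
    public static List<FakeReference> FirstPerGuid(IEnumerable<FakeReference> references)
    {
        var seen = new HashSet<string>();
        var result = new List<FakeReference>();
        foreach (var reference in references)
        {
            // Add returns false when the GUID is already in the set.
            if (!seen.Add(reference.Guid))
            {
                continue;
            }
            result.Add(reference);
        }
        return result;
    }

    public static void Main()
    {
        var references = new[]
        {
            new FakeReference("VBA", "guid-vba"),
            new FakeReference("Excel", "guid-excel"),
            new FakeReference("VBA", "guid-vba"), // same library referenced by a second project
        };

        foreach (var reference in FirstPerGuid(references))
        {
            Console.WriteLine(reference.Name); // prints VBA, then Excel
        }
    }
}
```

Unlike GroupBy, nothing is buffered per key: each reference is either passed through or skipped as soon as it is seen.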

In general a DistinctBy extension method comes in damn handy:

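public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> source, Func<T, TKey> keySelector)
{
    // Validate eagerly; the iterator below only runs once enumeration starts.
    if (source == null)
    {
        throw new ArgumentNullException(nameof(source));
    }
    if (keySelector == null)
    {
        throw new ArgumentNullException(nameof(keySelector));
    }
    return DistinctByIterator(source, keySelector);
}

private static IEnumerable<T> DistinctByIterator<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector)
{
    // A fresh set per enumeration, so the result can be enumerated more than once.
    var deduper = new HashSet<TKey>();
    foreach (var item in source)
    {
        if (deduper.Add(keySelector(item)))
        {
            yield return item;
        }
    }
}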

You can get really fancy and pass in an IEqualityComparer<TKey> for the keys if you want to.
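
As a rough sketch of what the comparer variant might look like: the method below is named DistinctByKey (a hypothetical name, picked so it does not clash with any DistinctBy already in scope) and takes an optional comparer for the keys. The string data in Main is made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctByKeySketch
{
    // Hypothetical name; keeps the first item seen for each distinct key.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source,
        Func<T, TKey> keySelector,
        IEqualityComparer<TKey> comparer = null)
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }
        if (keySelector == null)
        {
            throw new ArgumentNullException(nameof(keySelector));
        }
        return Iterate(source, keySelector, comparer);
    }

    private static IEnumerable<T> Iterate<T, TKey>(
        IEnumerable<T> source,
        Func<T, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        // A null comparer makes HashSet fall back to EqualityComparer<TKey>.Default.
        var seen = new HashSet<TKey>(comparer);
        foreach (var item in source)
        {
            if (seen.Add(keySelector(item)))
            {
                yield return item;
            }
        }
    }

    public static void Main()
    {
        var names = new[] { "Excel", "VBA", "excel", "stdole", "vba" };
        var distinct = names.DistinctByKey(n => n, StringComparer.OrdinalIgnoreCase);
        Console.WriteLine(string.Join(", ", distinct)); // Excel, VBA, stdole
    }
}
```

The comparer applies to the keys, not the items, which is why its type argument is TKey.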

With DistinctBy available, you could simply do:

var references = projects
        .SelectMany(project => project.References.Cast<Reference>())
        .DistinctBy(r => r.Guid);

foreach (var reference in references)
{
    // do your stuff.
}

I generally prefer to keep SelectMany simple and add the call to Cast afterwards. That's a personal preference, but I find it easier to scan that way. As you note in the comments, though, you can't do that here :)


GetDeclarationsForReference seems much too long. You should break it up.

e.g.

for (var paramIndex = 0; paramIndex < parameterCount; paramIndex++)
{
    yield return CreateParameterDeclaration(/* lots of parameters */);
}
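
To give a feel for the shape of that extraction, here is a deliberately simplified sketch. ParameterSketch is a hypothetical stand-in for Rubberduck's ParameterDeclaration, the three-entry TypeNames table is a cut-down copy of the one in the question, and the helper takes plain values instead of the marshalled ELEMDESC. Unlike the original loop, it looks up a type name for every parameter, which is an assumption of this sketch rather than the question's behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class ExtractMethodSketch
{
    // Hypothetical, simplified stand-in for Rubberduck's ParameterDeclaration.
    public sealed class ParameterSketch
    {
        public string Name { get; set; }
        public string AsTypeName { get; set; }
        public bool IsByRef { get; set; }
        public bool IsArray { get; set; }
    }

    // Cut-down version of the question's VarEnum-to-VBA type name table.
    private static readonly IDictionary<VarEnum, string> TypeNames = new Dictionary<VarEnum, string>
    {
        { VarEnum.VT_BSTR, "String" },
        { VarEnum.VT_I4, "Long" },
        { VarEnum.VT_VARIANT, "Variant" },
    };

    // The body of the parameter loop, moved into a method that takes plain values.
    public static ParameterSketch CreateParameterDeclaration(string paramName, VarEnum valueType)
    {
        string asTypeName;
        if (!TypeNames.TryGetValue(valueType, out asTypeName))
        {
            asTypeName = TypeNames[VarEnum.VT_VARIANT]; // unknown types fall back to Variant
        }

        return new ParameterSketch
        {
            Name = paramName,
            AsTypeName = asTypeName,
            IsByRef = valueType == VarEnum.VT_PTR || valueType == VarEnum.VT_BYREF,
            IsArray = valueType == VarEnum.VT_CARRAY || valueType == VarEnum.VT_ARRAY || valueType == VarEnum.VT_SAFEARRAY,
        };
    }

    // The loop itself shrinks to one line per parameter.
    public static IEnumerable<ParameterSketch> GetParameters(string[] names, VarEnum[] valueTypes)
    {
        for (var paramIndex = 0; paramIndex < names.Length; paramIndex++)
        {
            yield return CreateParameterDeclaration(names[paramIndex], valueTypes[paramIndex]);
        }
    }

    public static void Main()
    {
        var names = new[] { "Index", "Target" };
        var types = new[] { VarEnum.VT_I4, VarEnum.VT_PTR };
        foreach (var parameter in GetParameters(names, types))
        {
            Console.WriteLine("{0} As {1} (ByRef: {2})", parameter.Name, parameter.AsTypeName, parameter.IsByRef);
        }
    }
}
```

A side benefit of a helper that takes plain values is that it can be exercised without loading a type library at all.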

It's a pain to do it but you'll thank yourself in the long run! Not much of a review... I'll hopefully take a closer look later in the week.

\$\endgroup\$
  • 1
    \$\begingroup\$ I like that DistinctBy idea! The Cast<Reference>() needs to be inside the SelectMany though, otherwise the compiler can't infer the generic type involved, and explicit generic type arguments are a greater evil than a Cast<T>() inside a SelectMany ;-) \$\endgroup\$ Commented Mar 8, 2016 at 2:09
