2
\$\begingroup\$

I have a piece of code (below) that checks a data table (45,000 rows) to see if the row matches any entries in a range elsewhere on the workbook (using custom functions). The routine then creates an array of Flags and writes that array back to a field in the table - so a series of Sumifs formulae reference only those records that are relevant to the user's selections.

  • Edited to add Context:

    The data table is referenced by a Report sheet containing c. 22,000 Sumifs formulae (plus as many dependent formulae again) to create a Profit and Loss report. (and I know that it's probably not the best way to do it, but I'm bound by the end user, who wants it in excel and doesn't want to use a Pivot table, no matter how well designed). Maybe this is the reason that writing to the Table is so slow.

  • Here's one of the sumifs:

    =$D14*SUMIFS(Table1[Value],Table1[Map Code],$B14,Table1[Service],$A14,Table1[Flag],"TRUE",Table1[Period],Z$5,Table1[Type],$AA$4,Table1[Year],$Z$4)

    So I can add in some helper cells and make those a bit more efficient, but there's still a lot of them.

So here's the code

Sub flagselected()

  Dim datablock As Variant
  Dim x As Long, i As Integer
  Dim p As Integer
  Dim selectedUnits() As String
  Dim selectKey() As String
  Dim selectFlag() As Variant
  Dim startTime As Variant
  Dim midTime As Variant
  Dim endTime As Variant
  Dim postTrans As Variant
  Dim targetService As String, targetMapCode As Integer, multiplier As Integer
  Dim cell As Range, gap1 As Integer

  startTime = Now

  Application.ScreenUpdating = False
  Application.Calculation = xlCalculationManual

  datablock = Sheets("DataBlock").Range("Table1")
  selectedUnits = RangeToArray(Sheets("Tables").Range("SelectedCC"))
  ReDim selectKey(1 To UBound(datablock))
  ReDim selectFlag(1 To UBound(datablock))

  For x = LBound(datablock) To UBound(datablock)  ' loops thru the datablock
    If Contains(selectedUnits, datablock(x, 2)) = True Then 'only considers this row if it's in selected units
      selectFlag(x) = True
    End If
  Next x

  midTime = Now

  selectFlag = Application.WorksheetFunction.Transpose(selectFlag)
  postTrans = Now

  ' Sheets("datablock").Range("table1[flag]").Value2 = selectFlag  ' Commented out to test unload to range
  Sheets("datablock").Range("P2:p" & UBound(selectFlag)).Value2 = selectFlag ' this range is outside the Table

  endTime = Now
  Debug.Print "Started " & startTime
  Debug.Print "Variable filled " & midTime
  Debug.Print "Transposed " & postTrans
  Debug.Print "Unloaded to range " & endTime

  'Application.Calculation = xlCalculationAutomatic
  Application.Calculation = xlCalculationSemiautomatic

End Sub

The code seems to work fine (although I'm sure can be improved) but it's the last activity that's tripping me up. When I try to populate the Flag field in the table with the array, it takes 18 or 19 seconds. When I comment that row out and populate a range on the same worksheet but outside of the Table, it's near instantaneous.

Is there a table property or action that I can switch off for a short time while I write to it so that it's the same as writing to a simple range? There are a lot of formulae on the Workbook that reference the table, so I don't really want to convert it to a range and then recreate the table (unless that would leave formulae untouched?)

The results of the debug.print rows are as follows

when using Sheets("datablock").Range("table1[flag]").Value2 = selectFlag

Started 14/11/2019 09:15:01
Variable filled 14/11/2019 09:15:01
Transposed 14/11/2019 09:15:01
Unloaded to range 14/11/2019 09:15:19

When using Sheets("datablock").Range("P2:p" & UBound(selectFlag)).Value2 = selectFlag

Started 14/11/2019 09:18:30
Variable filled 14/11/2019 09:18:30
Transposed 14/11/2019 09:18:30
Unloaded to range 14/11/2019 09:18:30

Edited to add the custom functions. I'll try Mattieu's suggestions tomorrow and revert.

Function Contains(arr, v) As Boolean
Dim rv As Boolean, lb As Long, ub As Long, i As Long
    lb = LBound(arr)
    ub = UBound(arr)
    For i = lb To ub
        If arr(i) = v Then
            rv = True
            Exit For
        End If
    Next i
    Contains = rv
End Function

Function RangeToArray(ByVal my_range As Range) As String()

Dim vArray As Variant
Dim sArray() As String
Dim i As Long

vArray = my_range.value
ReDim sArray(1 To UBound(vArray))

For i = 1 To UBound(vArray)
    sArray(i) = vArray(i, 1)
Next

RangeToArray = sArray()

End Function

Thanks for help with formatting Question and responses so far.

\$\endgroup\$
5
  • \$\begingroup\$ The write times for a range and a table (listobject) are nearly identical. Are you using Worksheet_Change event by chance? \$\endgroup\$
    – TinMan
    Commented Nov 14, 2019 at 11:52
  • \$\begingroup\$ No - this sub runs as part of another routine where the User has selected one or more values from a listbox. The routine then populates the range SelectedCC and calls this sub. It's baffling me as I've tested it multiple times both with the line populating Table1[Flag] and the line populating the range, and the time difference is enormous \$\endgroup\$ Commented Nov 14, 2019 at 11:59
  • 1
    \$\begingroup\$ Does RangeToArray() or Contains() change the calculation mode? I would test the calculation mode the line before writing the data. \$\endgroup\$
    – TinMan
    Commented Nov 14, 2019 at 12:22
  • \$\begingroup\$ Thanks for the tip - I'd not thought to look at those - but sadly, no. I even put in an extra line to set calc to manual again, just before the unload part of the routine, and the time difference between the 2 is still 19 seconds. \$\endgroup\$ Commented Nov 14, 2019 at 12:32
  • 2
    \$\begingroup\$ Would be nice to include the helper functions RangeToArray and Contains; that way reviewers could copy and compile your code making fewer assumptions. \$\endgroup\$ Commented Nov 14, 2019 at 16:56

1 Answer 1

2
\$\begingroup\$
Sub flagselected()

The procedure is implicitly Public. This is potentially confusing, because in most programming languages (including VB.NET) the implicit default would be Private. Consider always making all access modifiers explicit.

Procedure names should be PascalCase, to adhere to both the naming conventions in place (everything in the VBA standard library and Excel object model uses this naming convention) and the modern naming conventions recommended for VB.NET code, which IMO apply perfectly well to VBA code too). Big huge kudos for avoiding Hungarian Notation prefixing though, but alllowercase isn't ideal.

  Dim datablock As Variant
  Dim x As Long, i As Integer
  Dim p As Integer
  Dim selectedUnits() As String
  Dim selectKey() As String
  Dim selectFlag() As Variant
  Dim startTime As Variant
  Dim midTime As Variant
  Dim endTime As Variant
  Dim postTrans As Variant
  Dim targetService As String, targetMapCode As Integer, multiplier As Integer
  Dim cell As Range, gap1 As Integer

Don't do this to yourself, especially in procedure scopes that are any longer than just a handful of lines: this Great Wall of Declarations at the top of the procedure is a huge distraction, and only serves to make it harder to see what's used where.

Instead, consider declaring local variables where you're using them. That way it's much harder to declare a variable... and then not use it anywhere.

Rubberduck (a free, open-source VBIDE add-in project I manage) can't find any uses for the following local variables, which are declared but never assigned or even referred to:

  • i
  • p
  • targetService
  • targetMapCode
  • multiplier
  • cell
  • gap1

It also warns about variables declared As Integer, strongly suggesting to use As Long instead.

The 2-spaces indent is non-standard (default is 4 spaces), but it's consistent so it's not too distracting.

  Application.ScreenUpdating = False
  Application.Calculation = xlCalculationManual

Whenever this global state is toggled, it should be obligatory to handle runtime errors: that way if anything goes wrong during execution, the global application state is guaranteed to always be reset back to what it was.

But why toggle this at all, if we're only writing to a worksheet once? Disabling ScreenUpdating is useful when you have inefficient code with .Select and .Activate and Excel ends up spending more time repainting itself than running your code, and making Calculation manual is useful when you make so many worksheet writes that Excel is constantly recalculating - but here, none of this is happening: toggling off screen repaints and deferring calculations isn't buying you anything here - I'd just remove these instructions completely.

datablock = Sheets("DataBlock").Range("Table1")

There's quite a bit of implicit code going on here - this would be explicit equivalent:

datablock = ActiveWorkbook.Worksheets("DataBlock").Range("Table1").Value

Note that the implicit ActiveWorkbook reference is very much a potential bug: if the "DataBlock" sheet exists in ThisWorkbook (the VBA project's host document) at compile-time, then there's no need to dereference it from any Sheets collection - the worksheet has a CodeName that you can use anywhere in your VBA project: this "code name" is the sheet module's (Name) property. Set it to a valid VBA identifier name (e.g. DataBlockSheet), and then you can do this:

datablock = DataBlockSheet.Range("Table1").Value

Now, this looks like you're getting the named Range for a ListObject table. Why not retrieve the actual ListObject reference and make it explicit that you're looking at a ListObject and not just any other Range?

datablock = DataBlockSheet.ListObjects("Table1").DataBodyRange.Value

Out of 6 references to the datablock variable, 5 are passing it as an argument to LBound or UBound; the other is reading a specific value in the table, and is probably indeed more efficient as an in-memory array read - but the LBound/UBound stuff shouldn't need to be re-computed 5 times.

ReDim selectKey(1 To UBound(datablock))
ReDim selectFlag(1 To UBound(datablock))

For x = LBound(datablock) To UBound(datablock) ' loops thru the datablock

Because the array came from Range.Value, it's necessarily a 1-based, 2D variant array: the LBound will always be 1. You've hard-coded that 1 in 2 places, and computing it for the For x loop - that's inconsistent, but it's good practice to never assume what array bounds are when looping, so kudos for that.

I'd declare a local.

Dim datablockRows As Long
datablockRows = UBound(datablock)

And then...

ReDim selectKey(1 To datablockRows)
ReDim selectFlag(1 To datablockRows)

For x = 1 To datablockRows

Note that the loops thru the datablock comment isn't saying anything that the code isn't already sayign loud & clear: it's redundant, and potentially distracting & confusing. Imagine you rename datablock to tableData: the comment now needs to be updated, only to keep up with the code. Don't bother writing comments that say what - write comments that say why instead.

If Contains(selectedUnits, datablock(x, 2)) = True Then 'only considers this row if it's in selected units
  selectFlag(x) = True
End If

That's a better comment already (although, the Contains method name is kind of already making that clear enough), but I suspect that results in something like \$O(n^2)\$ complexity: you're iterating up to every single row in selectedUnits (presumably that's very few rows?) for every single row in datablock. We don't know how Contains is implemented, but it looks like it's basically reinventing the wheel of WorksheetFunction.Match, which as a native function should theoretically perform better than a VBA equivalent. I do like the abstraction, but to me Contains all by itself isn't sufficient to tell enough about its usage: StringContains would obviously be an abstraction over InStr / finding a given value within a string, and ArrayContains would obviously be an abstraction over finding a given value within an array.

The Boolean literal value True is redundant in the conditional expression. Contains already returns a Boolean: it is a Boolean expression that If will be happy to work with.

If Contains(selectedUnits, datablock(x, 2)) Then

Now, that conditional is assigning to a literal Boolean value, and if there was an Else block it would be assigning the inverse value... but there's no Else block here, and selectFlags is a Variant array. This means the selectFlags array contains Variant/True and Variant/Empty after the loop.

If having explicit FALSE Boolean literal values (rather than empty cells) is ok, then I'd recommend removing the conditional block, and assigning directly to the array subscript:

selectFlag(x) = Contains(selectedUnits, datablock(x, 2))

And now we get to the worksheet write operation...

Sheets("datablock").Range("P2:p" & UBound(selectFlag)).Value2 = selectFlag

I like the version that doesn't assume what specific worksheet column the destination column is located in; this one will break if the table's [flag] column is moved anywhere. Actually it won't break - it'll just happily wreck the table by writing to the wrong column. Why assign to Value2 though? You're using Value everywhere else. Range.Value2 is useful when reading values that are of a Date or Currency data type, under certain specific circumstances (you get a Double instead of a Date or Currency - but most of the time you want to work with Date and Currency and not Double). ...but we're dealing with Boolean values here.

The worksheet doesn't need to be dereferenced again, nor does the target range. It does, but only because you haven't persisted it to a local variable when you dereferenced it before the loop.

I'd have a variable for the table, assigned near the beginning - just before datablock gets assigned:

Dim dataTable As ListObject
Set dataTable = DataBlockSheet.ListObjects("Table1")

Dim dataBlock As Variant
datablock = dataTable.DataBodyRange.Value

And with that you'd have a reference to your table for writing back the values:

dataTable.ListColumns("flag").DataBodyRange.Value = selectFlag

Now, given this:

selectedUnits = RangeToArray(Sheets("Tables").Range("SelectedCC"))

You could probably scrap a lot of that code (if not all of it), and simply have a formula in the flag column, something that might look like this:

=NOT(ISERROR(MATCH([@Column2],SelectedCC[@Column1],0)))
\$\endgroup\$
8
  • \$\begingroup\$ Excellent post as always +1.but I have couple nits to pick. You know far more about this stuff than I ever will but I think that it is a misnomer to say that LBound() and UBound()`` compute the size of the Arrays. The size of the Arrays are computed at time of declaration or redeclaration. LBound()` and `UBound()`` are simply referencing the Array's header information. \$\endgroup\$
    – TinMan
    Commented Nov 14, 2019 at 22:46
  • \$\begingroup\$ The difference in speed is roughly 12 MS per million cycles. I would elimiante LBound() because it is a given and I can type 1 a lot faster than LBound(datablock) . IMO datablockRows is just another brick to add to the Great Wall. \$\endgroup\$
    – TinMan
    Commented Nov 14, 2019 at 22:46
  • \$\begingroup\$ @TinMan that is correct - LBound and UBound are basically language keywords, not quite functions, and the array pointer just readily contains this information. Regardless, dereferencing that pointer 3 times in a row strikes me as redundant, hence introducing a local variable to hold it. As for performance, I paid zero attention to it... \$\endgroup\$ Commented Nov 14, 2019 at 22:51
  • \$\begingroup\$ @MathieuGuindon Many thanks for your superb critique. As you can see I'm self-taught and pretty new so it's much appreciated. Some went above my head, but I'll learn. I've edited the original post to provide more context (hope I've done it right). My first thought was to use a formula in the Flag field, however, that seemed to slow the whole model down to standstill. Hence my wish to write to the Field from vba. Which brings us back to the point of my original query. Writing to the table field is very much slower than writing to a range outside of the table. \$\endgroup\$ Commented Nov 15, 2019 at 9:17
  • \$\begingroup\$ And that's what I was trying to get to understand. I've made your suggested changes to the code, and yet still the very last step, writing to the table, takes 18 seconds - despite calculation being switched to manual. I'd have thought that manual calc would have meant that it didn't matter how many formulae reference the table? \$\endgroup\$ Commented Nov 15, 2019 at 9:18

Not the answer you're looking for? Browse other questions tagged or ask your own question.