1
\$\begingroup\$

The code is doing everything that I need it to. It's taking much too long though upwards of a couple minutes to complete the macro. Does anyone know how I code optimize this VBA code to run quicker? I'm new to VBA and I'm not quite sure how to proceed. This code will be running on roughly 35,000 lines of data.

Public Sub matchRow()

    Dim dumpSheet, referencesheet, outputSheet As Worksheet
    Dim startRow, outputRow, tempDumpRow, tempActiveRow, referenceRowCount, finishedreferenceIndex As Integer
    Dim finishedreference() As String
    Dim isExist As Boolean


    'Set sheets
    Set dumpSheet = Sheets("Dump")
    Set referencesheet = Sheets("Active Directory")
    Set outputSheet = Sheets("Output")

    'Set start row of each sheet for data
    startRow = 5
    outputRow = 5

    'Get row count from Active Depository sheet
    referenceRowCount = referencesheet.Range("B5:D5").End(xlDown).Row

    'Set index
    finishedreferenceIndex = 5

    'Re-define array
    ReDim finishedreference(5 To referenceRowCount - 1)

    'Set the start row
    tempDumpRow = startRow

    'Here I looped with OR state, you can modify it to AND start if you want
    Do While dumpSheet.Range("B" & tempDumpRow) <> "" Or dumpSheet.Range("C" & tempDumpRow) <> "" Or dumpSheet.Range("D" & tempDumpRow) <> ""

        'Reset exist flag
        isExist = False

        'loop all row in Active Depository sheet
        For tempActiveRow = 5 To referenceRowCount Step 1

            'If row is not finished for checking.
            If UBound(Filter(finishedreference, tempActiveRow)) < 0 Then

                'If all cell are equal
                If dumpSheet.Range("B" & tempDumpRow) = referencesheet.Range("B" & tempActiveRow) And _
                   dumpSheet.Range("C" & tempDumpRow) = referencesheet.Range("C" & tempActiveRow) And _
                   dumpSheet.Range("D" & tempDumpRow) = referencesheet.Range("D" & tempActiveRow) Then

                    'Set true to exist flag
                    isExist = True

                    'Store finished row
                    finishedreference(finishedreferenceIndex) = tempActiveRow

                    finishedreferenceIndex = finishedreferenceIndex + 1

                    'exit looping
                    Exit For

                End If

            End If

        Next tempActiveRow

        'Show result
        outputSheet.Range("B" & outputRow) = dumpSheet.Range("B" & tempDumpRow)
        outputSheet.Range("C" & outputRow) = dumpSheet.Range("C" & tempDumpRow)
        outputSheet.Range("D" & outputRow) = dumpSheet.Range("D" & tempDumpRow)

        If isExist Then
            outputSheet.Range("E" & outputRow) = ""
        Else
            outputSheet.Range("E" & outputRow) = "Item found in ""Dump"" but not in ""Active Directory"""
        End If

        'increase output row
        outputRow = outputRow + 1

        'go next row
        tempDumpRow = tempDumpRow + 1

    Loop

    'loop all row in Active Depository sheet
    For tempActiveRow = 5 To referenceRowCount Step 1

        'If row is not finished for checking.
        If UBound(Filter(finishedreference, tempActiveRow)) < 0 Then

            'Show result
            outputSheet.Range("B" & outputRow) = referencesheet.Range("B" & tempActiveRow)
            outputSheet.Range("C" & outputRow) = referencesheet.Range("C" & tempActiveRow)
            outputSheet.Range("D" & outputRow) = referencesheet.Range("D" & tempActiveRow)
            outputSheet.Range("E" & outputRow) = "Item found in ""Active Directory"" but not in ""Dump"""

            'increase output row
            outputRow = outputRow + 1

        End If

    Next tempActiveRow

End Sub
\$\endgroup\$
6
  • 1
    \$\begingroup\$ Could you add detail to explain what your code does? \$\endgroup\$
    – IEatBagels
    Commented Nov 26, 2019 at 19:23
  • 1
    \$\begingroup\$ My immediate suggestion is to look through previous Code Reviews for VBA and pick out some constant themes - apply those themes to your code and then resubmit for review. \$\endgroup\$
    – AJD
    Commented Nov 26, 2019 at 20:22
  • 1
    \$\begingroup\$ Key themes to look for are Option Explicit, declaring variables on one line (hint: Dim dumpSheet, referencesheet, outputSheet As Worksheet does not do what you think it does), explicitly using default actions (e.g. Range(x).Value="" instead of Range(x) = "") and declaring variables near where you use them. \$\endgroup\$
    – AJD
    Commented Nov 26, 2019 at 20:26
  • 1
    \$\begingroup\$ Reading the previous reviews you will see many examples of using arrays to improve performance. The reason I mention all this is because while there are many people here happy to review code and provide advice, we are not a free refactoring or rewriting service. Your main question "Someone suggested loading data as variant arrays as opposed to ranges. I'm stuck." implies that you are looking for specific help, not a review. There are also many examples of using arrays instead of ranges on StackOverflow. Remember to filter searches with '[vba]' \$\endgroup\$
    – AJD
    Commented Nov 26, 2019 at 20:29
  • 1
    \$\begingroup\$ Using SQL is great for these type of set based comparisons. You can use ADO in Excel if you wish, although it plays nicer in an actual database. I'd recommend checking that out though! \$\endgroup\$ Commented Nov 26, 2019 at 21:55

1 Answer 1

1
\$\begingroup\$

This is the sample dataset I created. The OP's code suggests that the Active Directory tab has an extra row.

Sample Dataset

Always Use Worksheets CodeNames Whenever Possible

Referencing worksheets by their code names will avoid naming conflicts while working with multiple workbooks and changing the worksheet name will not break any code.

  • Sheets("Active Directory") -> wsActiveDirectory
  • Sheets("Dump")-> wsDump
  • Sheets("Output") -> wsOutput

Use Constants for Magic Numbers

Using constants for values that should only be set once will make your code easier to read and maintain. Constants will also throw an error if you try to change their values.

Before

startRow = 5
outputRow = 5

After

Const startRow As Long = 5, outputRow As Long = 5

Matching Lists

Dictionaries are optimised for fast lookups. Using a Scripting.Dictionary will match the values will easily make the code run 100 times faster.

The trick is to create a composite key for all fields. Note: make sure to use a delimiter.

1;Towney;Research and Development

Private Function getKey(ByVal rowIndex As Long, ByRef Target As Range) As String
    getKey = Target.Cells(rowIndex, 1) & ";" & Target.Cells(rowIndex, 2) & ";" & Target.Cells(rowIndex, 3)
End Function

SQL Solution

As Ryan Wildry stated "Using SQL is great for these type of set based comparisons." but this can be a little tricky. The way I did it is I pasted my datasets into an Access Database as tables and use the Query Designer to help me write the code.

SELECT Dump.ID, Dump.Name, Dump.Department, "Item found in ""Dump"" but not in ""Active Directory""" AS [Found]
FROM Dump
WHERE (((Exists (SELECT NULL
  FROM [Active Directory]
  WHERE ([Active Directory].ID = Dump.ID) AND ([Active Directory].Name = Dump.Name) AND ([Active Directory].Department = Dump.Department)
))=False));
UNION ALL SELECT [Active Directory].ID, [Active Directory].Name, [Active Directory].Department, "Item found in ""Active Directory"" but not in ""[Active Directory]""" AS [Found]
FROM [Active Directory]
WHERE (((Exists (SELECT NULL
  FROM [Dump]
  WHERE ([Active Directory].ID = Dump.ID) AND ([Active Directory].Name = Dump.Name) AND ([Active Directory].Department = Dump.Department)
))=False));

I then aliased the tables to make it easier to replace the table names with the Excel Table Definition.

 SELECT t1.Name, t1.ID,  t1.Department, 'Item found in "Dump" but not in "Active Directory"' AS [Found]
 FROM [Dump$B4:E23] As t1
 WHERE (((Exists (SELECT NULL
   FROM [Active Directory] As t2
   WHERE (t2.ID = t1.ID) And (t2.Name = t1.Name) And (t2.Department = t1.Department)
 ))=False))
 UNION ALL
 SELECT t1.Name, t1.ID,  t1.Department, 'Item found in "Active Directory" but not in "Dump"' AS [Found]
 FROM [Active Directory$B4:E20] As t1
 WHERE (((Exists (SELECT NULL
   FROM [Dump] As t2
   WHERE (t2.ID = t1.ID) And (t2.Name = t1.Name) And (t2.Department = t1.Department)
 ))=False))

Now that I had the SQL working, I replaced the messages and created a single Select statement that I could modify to handle both selecting record in Dump and not in Active Directory or selecting records in Active Directory that are not in Dump.

SELECT t1.ID, t1.Name, t1.Department, "Message" AS [Found]
FROM [Dump] As t1
WHERE (((Exists (SELECT NULL
  FROM [Active Directory] As t2
  WHERE (t2.ID = t1.ID) AND (t2.Name = t1.Name) AND (t2.Department = t1.Department)
))=False));

Sub FindUnmatchedRowsCopyFromRecordset()

Create a recordset and use Range.CopyFromRecordset to transfer the records.

Sample SQl:

SELECT t1.Name, t1.ID,  t1.Department, 'Item found in "Dump" but not in "Active Directory"' AS [Found]
FROM [Dump$B4:E23] As t1
WHERE (((Exists (SELECT NULL
  FROM [Active Directory$B4:E20] As t2
  WHERE (t2.ID = t1.ID) And (t2.Name = t1.Name) And (t2.Department = t1.Department)
))=False))
UNION ALL
SELECT t1.Name, t1.ID,  t1.Department, 'Item found in "Active Directory" but not in "Dump"' AS [Found]
FROM [Active Directory$B4:E20] As t1
WHERE (((Exists (SELECT NULL
  FROM [Dump$B4:E23] As t2
  WHERE (t2.ID = t1.ID) And (t2.Name = t1.Name) And (t2.Department = t1.Department)
))=False))

Sub FindUnmatchedRowsAppend()

This is a slightly more complicated technique that appends the records directly to the Output tab.

Sample SQl:

INSERT INTO [Output$B4:E4] SELECT t3.* FROM (SELECT t1.Name, t1.ID,  t1.Department, 'Item found in "Dump" but not in "Active Directory"' AS [Found]
FROM [Dump$B4:E23] As t1
WHERE (((Exists (SELECT NULL
  FROM [Active Directory$B4:E20] As t2
  WHERE (t2.ID = t1.ID) And (t2.Name = t1.Name) And (t2.Department = t1.Department)
))=False))
UNION ALL
SELECT t1.Name, t1.ID,  t1.Department, 'Item found in "Active Directory" but not in "Dump"' AS [Found]
FROM [Active Directory$B4:E20] As t1
WHERE (((Exists (SELECT NULL
  FROM [Dump$B4:E23] As t2
  WHERE (t2.ID = t1.ID) And (t2.Name = t1.Name) And (t2.Department = t1.Department)
))=False))) as t3

Code

Option Explicit

Sub FindUnmatchedRowsAppend()
    Dim conn As Object
    Set conn = getThisWorkbookConnection
    conn.Open

    DeleteOutputResults
    Dim OutputDef As String
    OutputDef = getTableDefinition(wsOutput)

    Dim SQL As String
    SQL = "INSERT INTO " & OutputDef & " SELECT t3.* FROM (" & getOutputResultQuery & ") as t3"

    conn.Execute SQL
    conn.Close
End Sub

Public Sub FindUnmatchedRowsCopyFromRecordset()
    Dim conn As Object
    Set conn = getThisWorkbookConnection
    conn.Open
    Dim SQL As String
    SQL = getOutputResultQuery

    Dim rs As Object
    Set rs = conn.Execute(SQL)

    DeleteOutputResults
    wsOutput.Range("B5").CopyFromRecordset rs

    conn.Close

End Sub

Private Function getOutputResultQuery() As String
    Dim ActiveDirectoryDef As String
    ActiveDirectoryDef = getTableDefinition(wsActiveDirectory)

    Dim DumpDef As String
    DumpDef = getTableDefinition(wsDump)

    Const BaseSQl As String = "SELECT t1.Name, t1.ID,  t1.Department, '@Message' AS [Found]" & vbNewLine & _
        "FROM [xlTable1] As t1" & vbNewLine & _
        "WHERE (((Exists (SELECT NULL" & vbNewLine & _
        "  FROM [xlTable2] As t2" & vbNewLine & _
        "  WHERE (t2.ID = t1.ID) And (t2.Name = t1.Name) And (t2.Department = t1.Department)" & vbNewLine & _
        "))=False))"

    Dim SelectDump As String
    SelectDump = Replace(BaseSQl, "[xlTable1]", DumpDef)
    SelectDump = Replace(SelectDump, "[xlTable2]", ActiveDirectoryDef)
    SelectDump = Replace(SelectDump, "@Message", "Item found in ""Dump"" but not in ""Active Directory""")

    Dim SelectAD As String
    SelectAD = Replace(BaseSQl, "[xlTable1]", ActiveDirectoryDef)
    SelectAD = Replace(SelectAD, "[xlTable2]", DumpDef)
    SelectAD = Replace(SelectAD, "@Message", "Item found in ""Active Directory"" but not in ""Dump""")

    Dim SQL As String
    SQL = SelectDump & vbNewLine & "UNION ALL" & vbNewLine & SelectAD

    getOutputResultQuery = SQL
End Function

Private Sub DeleteOutputResults()
    Dim Target As Range
    With wsOutput
        Set Target = .Range("B4:E4", .Cells(.Rows.Count, "B").End(xlUp))
        Target.Offset(1).ClearContents
    End With
End Sub

Private Function getTableDefinition(ws As Worksheet) As String
    Dim Target As Range
    Select Case ws.Name
        Case wsActiveDirectory.Name, wsDump.Name
            With ws
                Set Target = .Range("B4:E4", .Cells(.Rows.Count, "B").End(xlUp))
            End With
            If ws Is wsActiveDirectory Then
                Rem Remove Summary Row
                Set Target = Target.Resize(Target.Rows.Count - 1)
            End If
        Case wsOutput.Name
            With ws
                Set Target = .Range("B4:E4", .Cells(.Rows.Count, "B").End(xlUp))
            End With
    End Select

    getTableDefinition = getTableDefinitionFromRange(Target)
End Function

Private Function getThisWorkbookConnection() As Object
    Dim conn As Object
    Set conn = CreateObject("ADODB.Connection")
    With conn
       .Provider = "Microsoft.ACE.OLEDB.12.0"
       .ConnectionString = "Data Source=" & ThisWorkbook.FullName & ";" & _
       "Extended Properties=""Excel 12.0 Xml;HDR=YES"";"
    End With

    Set getThisWorkbookConnection = conn
End Function

Private Function getTableDefinitionFromRange(Target As Range) As String
    Dim SheetName As String
    SheetName = Target.Parent.Name
    Dim Address As String
    Address = Target.Address(RowAbsolute:=False, ColumnAbsolute:=False)
    getTableDefinitionFromRange = "[" & SheetName & "$" & Address & "]"
End Function

Download Link

ADDump.xlsm

\$\endgroup\$

Not the answer you're looking for? Browse other questions tagged or ask your own question.