6
\$\begingroup\$

The below function is used to Remove Duplicate Lines per each cell.
It works without problem, but it is slow with a range of only one column and 17k rows.
Actually, (with the same range in addition to another two column) , I am using a different excellent code by @VBasic2008 Link which perform complex tasks than my function and it takes just 1.5 second to finish.
I do not know where is the bottleneck on my code and How to optimize it.
There is no problem to totally change my codes or provide a new one. In advance, pleased for all your help.

Option Explicit
Option Compare Text
 
Function RemoveDuplicateLines(ByVal Text As String, Optional delimiter As String = vbLf) As String
 
    Dim dictionary As Object
    Dim x, part
 
    Set dictionary = CreateObject("Scripting.Dictionary")
    dictionary.CompareMode = vbTextCompare
    For Each x In Split(Text, delimiter)
        part = Trim(x)
        If part <> "" And Not dictionary.Exists(part) Then
            dictionary.Add part, Nothing
        End If
    Next
 
    If dictionary.Count > 0 Then
        RemoveDuplicateLines = Join(dictionary.keys, delimiter)
    Else
        RemoveDuplicateLines = ""
    End If
 
    Set dictionary = Nothing
End Function
 
Sub Remove_Duplicate_Lines()
    With Application
       .Calculation = xlCalculationManual
       .ScreenUpdating = False
       .EnableEvents = False
    End With
 
On Error GoTo Errorhandler
 
    Dim ws As Worksheet: Set ws = sh2
     Dim crg As Range
     Set crg = ws.Range("O2:O" & ws.Cells(Rows.Count, "O").End(xlUp).Row) '#Column contains Combined URL
 
   Dim arr: arr = crg.Value2
   Dim i As Long
    For i = LBound(arr) To UBound(arr)
         arr(i, 1) = RemoveDuplicateLines(arr(i, 1))
    Next i
   crg.value = arr
 
Errorhandler:
    With Application
      .Calculation = xlCalculationAutomatic
      .ScreenUpdating = True
      .EnableEvents = True
    End With
End Sub
\$\endgroup\$
1
  • 1
    \$\begingroup\$ You should better define « slow », is it 5 sec, 30 sec, 5 mins ? \$\endgroup\$ Commented Aug 4, 2022 at 6:23

1 Answer 1

6
\$\begingroup\$

Binding

Late-binding is slower than early-binding. Add a reference to Microsoft Scripting Runtime:
ref

then change Dim dictionary As Object to Dim dict As Dictionary and Set dictionary = CreateObject("Scripting.Dictionary") to Set dict = New Dictionary. Note that these are not the final declarations but more on this below.

Declare variables

The line Dim x, part simply declares 2 variables of type Variant by default. Although x needs to be a Variant because it's used in a For Each... loop you should still declare it as a best practice. part however should be declared as String. Never use Variant if you already know the var type because the extra wrapping is using extra resources.

Also, a For... To... loop is slightly faster on a 1D array, compared to a For Each..., so really you don't need x to be Variant at all but rather an iterator declared as Long and an array of String() as that is returned by Split.

Efficiency

You are presuming that all values in the O column are strings. It's better to only run the RemoveDuplicateLines for strings only and not for anything else like blanks or numbers. So, use VarType to check for type.

If the number of individual lines returned by the call to Split is exactly the same as the number of keys in the dictionary (i.e. no duplicates) then there is no need to join the keys because the original string would already be satisfactory as the result. Same goes for trimming - if trim does not remove any character then we can use the original text as long as there were no duplicates either.

You could avoid a lot of string copying by changing the string by reference and not returning as a result of the function. This improves efficiency a lot.

Using a Static dictionary will remove the need to instantiate a Dictionary on each call.

Other

You should not restore the state of the application to 'On' as maybe it was intentionally off before running your macro. So, store state, turn things off and finally restore when done.

To make the main method reusable, you should pass the range from a higher level method call so that you can run your macro on other ranges as well.

No need for Option Compare Text as the Dictionary.CompareMode option takes care of text comparison for keys.

Solution

Run Main method below:

Option Explicit

Public Sub Main()
    Dim rng As Range
    '
    On Error Resume Next
    With ActiveSheet 'Or whatever worksheet
        Set rng = .Range("O2:O" & .Cells(Rows.Count, "O").End(xlUp).Row) 'Or whatever range
    End With
    On Error GoTo 0
    If rng Is Nothing Then Exit Sub 'Or display a message
    '
    RemoveDuplicateLinesFromRange rng
End Sub

Public Sub RemoveDuplicateLinesFromRange(ByVal rng As Range _
                                       , Optional ByVal delimiter As String = vbLf)
    If rng Is Nothing Then
        Err.Raise 91, , "Range not set"
    ElseIf rng.Areas.Count > 1 Then
        Err.Raise 5, , "Non-contigous range"
    End If
    '
    Dim xlCalc As XlCalculation: xlCalc = Application.Calculation
    Dim displayOn As Boolean:    displayOn = Application.ScreenUpdating
    Dim eventsOn As Boolean:     eventsOn = Application.EnableEvents
    '
    With Application
       If xlCalc <> xlCalculationManual Then .Calculation = xlCalculationManual
       If displayOn Then .ScreenUpdating = False
       If eventsOn Then .EnableEvents = False
    End With
    '
    Dim arr() As Variant
    Dim i As Long
    Dim j As Long
    '
    If rng.Count = 1 Then
        ReDim arr(1 To 1, 1 To 1)
        arr(1, 1) = rng.Value2
    Else
        arr = rng.Value2
    End If
    '
    If UBound(arr, 1) < UBound(arr, 2) Then
        For i = LBound(arr, 1) To UBound(arr, 1)
            For j = LBound(arr, 2) To UBound(arr, 2)
                RemoveDuplicateLines arr(i, j), delimiter
            Next j
        Next i
    Else
        For j = LBound(arr, 2) To UBound(arr, 2)
            For i = LBound(arr, 1) To UBound(arr, 1)
                RemoveDuplicateLines arr(i, j), delimiter
            Next i
        Next j
    End If
    rng.Value2 = arr
    '
    With Application
       If xlCalc <> xlCalculationManual Then .Calculation = xlCalc
       If displayOn Then .ScreenUpdating = True
       If eventsOn Then .EnableEvents = True
    End With
End Sub

Private Sub RemoveDuplicateLines(ByRef v As Variant _
                               , Optional ByVal delimiter As String = vbLf)
    If VarType(v) <> vbString Then Exit Sub
    '
    Static dict As dictionary
    Dim parts() As String
    Dim i As Long
    Dim hasChanged As Boolean
    '
    If dict Is Nothing Then
        Set dict = New dictionary
        dict.CompareMode = vbTextCompare
    Else
        dict.RemoveAll
    End If
    '
    parts = Split(v, delimiter)
    If LBound(parts) = UBound(parts) Then
        v = Trim$(v)
        Exit Sub
    End If
    '
    For i = LBound(parts, 1) To UBound(parts, 1)
        If TrimIfNeeded(parts(i)) Then hasChanged = True
        dict(parts(i)) = Empty
    Next
    '
    If hasChanged Or (UBound(parts, 1) - LBound(parts, 1) + 1 > dict.Count) Then
        v = Join(dict.Keys, delimiter)
    End If
End Sub

Private Function TrimIfNeeded(ByRef Text As String) As Boolean
    Dim size As Long: size = Len(Text)
    If size = 0 Then Exit Function
    '
    Text = Trim$(Text)
    TrimIfNeeded = (size > Len(Text))
End Function

Final thoughts

You might want to check for formulas. When you read an entire range, you could have a combination of formulas and values so you might want to update code to exclude formula cells from the macro.

\$\endgroup\$
1
  • 2
    \$\begingroup\$ It works excellently 👍, your code just took 0.4 second to finish, It’s like rocket speed. I will follow your notes and recommendations. \$\endgroup\$
    – Leedo
    Commented Aug 3, 2022 at 13:25

Not the answer you're looking for? Browse other questions tagged or ask your own question.