0
\$\begingroup\$

Background

The CSV files being used can be 6+ mill records in size, using a small sample size to proof the methodology.

Why are you doing this on excel? Access or SQL would be quicker

My company has disabled the VBE for anyone who is not recognised as a developer (progressive, I know.....).

I want to submit a third party macro & be recognised as a developer to continue building small applications when native tools do not fulfil my needs.

Objective

  1. Import two CSV files & add to respective arrays for comparison (attached for reference)
  2. Files contain account & unit balance data (dummy data for this example)
  3. Consolidate records onto the 'original' array
  4. If a record from the 'original' file is also on the 'external' file, the units from the 'external' file should be added to the corresponding record on the 'original' file.
  5. Output a CSV file with double quote text qualifiers & ";" delimiters.

enter image description here enter image description here

I have compiled the below code to complete but really appreciate any feedback on how to optimise given the size of the data that will be used.

Sub Import_Files_Create_Consolidated_File()

    Const txtFilepath = "C:\Users\jason\OneDrive\Desktop\VBA_Folder\Macros\Balance_Consolidation\"
    Dim arrOrig, arrExt As Variant
    Dim i, j, n, n2, x As Long
    Dim joinOrig, joinExt, strTempArray1d, strTempArray1dExt As String
    Dim FSOreadOrigBal As Object: Set FSOreadOrigBal = CreateObject("Scripting.FileSystemObject")
    Dim FSOreadExtBal As Object: Set FSOreadExtBal = CreateObject("Scripting.FileSystemObject")
    Dim FSOcreatetxt As Object: Set FSOcreatetxt = CreateObject("Scripting.FileSystemObject")
    'I move output file to later in the procedure after err handling is finished
    Dim outputFile As Object: Set outputFile = FSOcreatetxt.CreateTextFile(txtFilepath & "Output.csv", True)
    Dim fdImportOrigBal As FileDialog: Set fdImportOrigBal = Application.FileDialog(msoFileDialogFilePicker)
    Dim fdImportExtBal As FileDialog: Set fdImportExtBal = Application.FileDialog(msoFileDialogFilePicker)
    Dim strOrigBal, strExtBal As String
    Dim tempArray1d, tempArray1dExt, finalArrayOrig2d, finalArrayExt2d As Variant
    
        'Allow the user to select their input files, Enter err handling later
        With fdImportOrigBal
            .Filters.Clear
            .Filters.Add "CSV files", "*.CSV*"
            .Title = "Select original balance file"
            .AllowMultiSelect = False
            .InitialFileName = txtFilepath
            .Show
        End With
                
            strOrigBal = fdImportOrigBal.SelectedItems(1)
            
            Set FSOreadOrigBal = FSOreadOrigBal.OpenTextFile(strOrigBal, 1)
    
        With fdImportExtBal
            .Filters.Clear
            .Filters.Add "CSV files", "*.CSV*"
            .Title = "Select external balance file"
            .AllowMultiSelect = False
            .InitialFileName = txtFilepath
            .Show
        End With
                
            strExtBal = fdImportExtBal.SelectedItems(1)
            
            Set FSOreadExtBal = FSOreadExtBal.OpenTextFile(strExtBal, 1)
        
        'Compile both files into a 1d array
        arrOrig = Split(FSOreadOrigBal.ReadAll, vbNewLine)
        arrExt = Split(FSOreadExtBal.ReadAll, vbNewLine)
        
        'So I want to check if an item is on the external array but not on the original array_
        'If not found, move the item from the external array & add to the original array_
        'Important that the item just moved is deleted from the external array, otherwise it will incorrectly sum the units when_
        'we complete this check
        'I will need to only use the first 4 columns for the unique key, as units will differ even if on both external & original arrays.
        
        ReDim tempArray1d(UBound(arrOrig))
        
        'Create unique key on original array & pass it to a temp array
        For n = LBound(arrOrig, 1) + 1 To UBound(arrOrig, 1)
            tempArray1d(n) = Split(arrOrig(n), ",")(0) & Split(arrOrig(n), ",")(1) & Split(arrOrig(n), ",")(2) & Split(arrOrig(n), ",")(3)
        Next n
        
        ReDim tempArray1dExt(UBound(arrExt))
        
        'Create unique key on original array & pass it to a temp array_
        'Using a UDF & unique key, search the temp original array to find the record from the external array_
        'If not found, resize original array & add the record which has not been found.
        For n = LBound(arrExt, 1) + 1 To UBound(arrExt, 1)
        
            tempArray1dExt(n) = Split(arrExt(n), ",")(0) & Split(arrExt(n), ",")(1) & Split(arrExt(n), ",")(2) & Split(arrExt(n), ",")(3)
            strTempArray1dExt = tempArray1dExt(n)
            
                         'Credit function - https://stackoverflow.com/questions/38267950/check-if-a-value-is-in-an-array-or-not-with-excel-vba
                         If IsInArray(strTempArray1dExt, tempArray1d) = False Then

                            ReDim Preserve arrOrig(UBound(arrOrig) + 1)
                            arrOrig(UBound(arrOrig)) = arrExt(n)
                            arrExt(n) = ""
                            
                            End If
        Next n
        
        'Now I want to resize my array before making a 2d array by removing the blank indexes
        
        ReDim tempArray1dExt(0)
        
        For n = LBound(arrExt, 1) To UBound(arrExt, 1)
            'Write non blank rows to a different temp array & then resize & write back to arrExt
            
            If arrExt(n) <> "" Then
                tempArray1dExt(UBound(tempArray1dExt)) = arrExt(n)
                ReDim Preserve tempArray1dExt(UBound(tempArray1dExt) + 1)
            End If
           
        Next n
        
        ReDim arrExt(UBound(tempArray1dExt) - 1)

        For n = LBound(tempArray1dExt, 1) To UBound(tempArray1dExt, 1) - 1
             arrExt(n) = tempArray1dExt(n)
        Next n

        'Create the template for my final 2d array, will be used to write to a CSV output
        
        ReDim finalArrayOrig2d(UBound(arrOrig), 4)

        For n = LBound(arrOrig, 1) To UBound(arrOrig, 1)
            
            tempArray1d = Split(arrOrig(n), ",")

                For n2 = 0 To 4
                    finalArrayOrig2d(n, n2) = tempArray1d(n2)
                Next n2
        Next n
        
        'Create the template for my final 2d external array, will be used to sum units if unique key is found
        
        ReDim finalArrayExt2d(UBound(arrExt), 4)

        For n = LBound(arrExt, 1) To UBound(arrExt, 1)
            
            tempArray1d = Split(arrExt(n), ",")

                For n2 = 0 To 4
                    finalArrayExt2d(n, n2) = tempArray1d(n2)
                Next n2
        Next n
        
        'Change data type to double in final arrays for units addition
        
        For i = LBound(finalArrayOrig2d, 1) + 1 To UBound(finalArrayOrig2d, 1)
            finalArrayOrig2d(i, 4) = CDbl(finalArrayOrig2d(i, 4))
        Next i
        
        For i = LBound(finalArrayExt2d, 1) + 1 To UBound(finalArrayExt2d, 1)
            finalArrayExt2d(i, 4) = CDbl(finalArrayExt2d(i, 4))
        Next i
        
        'Enter a loop to determine if there is a match & add units if there is
        
            For i = LBound(finalArrayOrig2d, 1) + 1 To UBound(finalArrayOrig2d, 1)
                joinOrig = Join(Array(finalArrayOrig2d(i, 0), finalArrayOrig2d(i, 1), finalArrayOrig2d(i, 2), finalArrayOrig2d(i, 3)))
                
                    'Enter external balance file & compile unique key
                    For j = LBound(finalArrayExt2d, 1) + 1 To UBound(finalArrayExt2d, 1)
                        joinExt = Join(Array(finalArrayExt2d(j, 0), finalArrayExt2d(j, 1), finalArrayExt2d(j, 2), finalArrayExt2d(j, 3)))

                            'Add units here is a match
                            If joinOrig = joinExt Then
                                 finalArrayOrig2d(i, 4) = finalArrayOrig2d(i, 4) + finalArrayExt2d(j, 4)
                            End If
                    Next j
            Next i
        
        'Wrap final array in double quote text qualifiers
        For i = LBound(finalArrayOrig2d, 1) To UBound(finalArrayOrig2d, 1)
            For j = LBound(finalArrayOrig2d, 2) To UBound(finalArrayOrig2d, 2)
            
            finalArrayOrig2d(i, j) = """" & finalArrayOrig2d(i, j) & """"
            Next j
        Next i
        
        'Write the updated original array to a CSV file
        For n = LBound(finalArrayOrig2d, 1) To UBound(finalArrayOrig2d, 1)
            outputFile.WriteLine Join(Array(finalArrayOrig2d(n, 0), finalArrayOrig2d(n, 1), finalArrayOrig2d(n, 2), finalArrayOrig2d(n, 3), finalArrayOrig2d(n, 4)), ";")
        Next n


End Sub

Public Function IsInArray(stringToBeFound As String, arr As Variant) As Boolean
    Dim i As Long
    For i = LBound(arr) To UBound(arr)
        If arr(i) = stringToBeFound Then
            IsInArray = True
            Exit Function
        End If
    Next i
    IsInArray = False

End Function
\$\endgroup\$

2 Answers 2

1
\$\begingroup\$

A well-known code development aphorism is "Make it work. Make it right. Make it fast" (Kent Beck). The most nebulous statement here is "Make it Right". I would propose that making code easier to read/understand, maintain, and test is a reasonable interpretation of "Make it Right".

Making it Work

Since you've posted this question on CR, it is assumed that the code works.
Step 1 complete!

Making it Right

Comprehension of code for a first time reader is made much easier if it emulates an outline of a document. That is, a well written document uses a nested structure consisting of a title, sections, paragraphs, and sentences.

When a module's code consists of a single long procedure, it is like publishing a document where there is a title (EntryPoint Sub), no sections, and a single, really really long paragraph (the code). Sure, it's possible to read it. But, it is painful - and the reader is likely to give up if the ideas are not presented in a way that can be easily organized in the reader's head. (BTW: the 'first time reader' includes yourself when the code is set aside for a few days/weeks/months).

So, how to evaluate your code to avoid the single long paragraph in the above analogy? The simplest metric to apply is lines of code (Loc). Which, of course, only generates a new question - How many lines of code are just right? There is no perfect answer. But, if a reader has to scroll up and down frequently to see and understand the content in a function, then that function is likely too long. (BTW: after completing the "Make it Work" step, my own code always contains functions that are too long)

Import_Files_Create_Consolidated_File has 167 lines of code...that's a lot of scrolling. Everyone's setup is different, but, when I have the VBE open, my editor pane displays around 32 lines (+/- depending on my Immediate and Locals Windows height). For lack of any better criteria, I would set out to initially get Import_Files_Create_Consolidated_File to fit within 32 lines of code.

Despite my justifications, 32-Loc per function is an arbitrary goal (because, admittedly, one could just buy a really big monitor to avoid scrolling). The point of establishing a function Loc limit is that it naturally forces the use of helper functions to carry out specific tasks. After refactoring, Import_Files_Create_Consolidated_File will contain function calls with names that describe the high-level tasks necessary to 'ImportFiles' and 'CreateConsolidatedFile'. Additionally, coding best practices are also naturally brought to bear during refactoring to meet this goal. Two in particular: The Single Responsibility Principle (SRP) and Don't Repeat Yourself (DRY).

Import_Files_Create_Consolidated_File can be easily reduced to the following code:

Option Explicit

Private Const txtFilepath As String = "C:\Users\jason\OneDrive\Desktop\VBA_Folder\Macros\Balance_Consolidation\"

Private strOrigBal As String
Private strExtBal As String

Sub Import_Files_Create_Consolidated_File()
    
    ImportFiles
    
    CreateConsolidatedFile strOrigBal, strExtBal, txtFilepath & "Output.csv"
End Sub
 
Private Sub ImportFiles()
        
    Dim fdImportOrigBal As FileDialog
    Set fdImportOrigBal = Application.FileDialog(msoFileDialogFilePicker)
    
    Dim fdImportExtBal As FileDialog
    Set fdImportExtBal = Application.FileDialog(msoFileDialogFilePicker)
        
    'Allow the user to select their input files, Enter err handling later
    With fdImportOrigBal
        .Filters.Clear
        .Filters.Add "CSV files", "*.CSV*"
        .Title = "Select original balance file"
        .AllowMultiSelect = False
        .InitialFileName = txtFilepath
        .Show
    End With
            
    strOrigBal = fdImportOrigBal.SelectedItems(1)
        
    With fdImportExtBal
        .Filters.Clear
        .Filters.Add "CSV files", "*.CSV*"
        .Title = "Select external balance file"
        .AllowMultiSelect = False
        .InitialFileName = txtFilepath
        .Show
    End With
            
    strExtBal = fdImportExtBal.SelectedItems(1)
End Sub

Public Sub CreateConsolidatedFile(ByVal pOrigBalFilepath As String, _
    ByVal pExtBalFilepath As String, ByVal pOutputFilepath As String)
    
    Dim FSOreadOrigBal As Object
    Set FSOreadOrigBal = CreateObject("Scripting.FileSystemObject")
    
    Dim FSOreadExtBal As Object
    Set FSOreadExtBal = CreateObject("Scripting.FileSystemObject")
    
    Set FSOreadOrigBal = FSOreadOrigBal.OpenTextFile(pOrigBalFilepath, 1)
    Set FSOreadExtBal = FSOreadExtBal.OpenTextFile(pExtBalFilepath, 1)
    
    'Compile both files into a 1d array

    '** The rest of the code **

End Sub

Import_Files_Create_Consolidated_File now contains 4 lines of code...Check. It presents better because the wall-of-declarations is gone and it is easy to tell what the high level tasks are required to be accomplished. Note: I've moved two variables and a constant to module-scope to simplify the hand-off of data between the two procedures. That's not necessarily a good practice, but handing off the data differently brings in more concepts that would distract from the goals of this answer.

ImportFiles now contains 29 lines of code. This meets that 32-line criteria - and it is easy to read and understand. Still, I want to refactor this function to be even shorter to demonstrate the DRY principle. There is a fair amount of repeated code here. Removing the repeated code using a new function results in:

Private Sub ImportFiles()

    strOrigBal = GetFilePathFromUser(txtFilepath, "Select original balance file")
    
    strExtBal = GetFilePathFromUser(txtFilepath, "Select external balance file")
        
End Sub

Private Function GetFilePathFromUser(ByVal pInitialPath As String, _
    ByVal pTitle As String) As String
    
    Dim fDialog As FileDialog
    Set fDialog = Application.FileDialog(msoFileDialogFilePicker)
    
    With fDialog
        .Filters.Clear
        .Filters.Add "CSV files", "*.CSV*"
        .Title = pTitle
        .AllowMultiSelect = False
        .InitialFileName = pInitialPath
        .Show
    End With
            
    GetFilePathFromUser = fDialog.SelectedItems(1)
End Function

ImportFiles is now really short. It's now so short that it's content can be relocated back to the entry point Sub. Doing so removes the need for the module-scope fields introduced by the initial refactoring. The code now looks like:

Option Explicit

Sub Import_Files_Create_Consolidated_File()
    
    Const txtFilepath As String _
        = "C:\Users\jason\OneDrive\Desktop\VBA_Folder\Macros\Balance_Consolidation\"
    
    Dim strOrigBal As String
    Dim strExtBal As String
    
    strOrigBal = GetFilePathFromUser(txtFilepath, "Select original balance file")
    
    strExtBal = GetFilePathFromUser(txtFilepath, "Select external balance file")
    
    CreateConsolidatedFile strOrigBal, strExtBal, txtFilepath & "Output.csv"
End Sub
 
Public Sub CreateConsolidatedFile(ByVal pOrigBalFilepath As String, _
    ByVal pExtBalFilepath As String, ByVal pOutputFilepath As String)
    
    Dim FSOreadOrigBal As Object
    Set FSOreadOrigBal = CreateObject("Scripting.FileSystemObject")
    
    Dim FSOreadExtBal As Object
    Set FSOreadExtBal = CreateObject("Scripting.FileSystemObject")
    
    Set FSOreadOrigBal = FSOreadOrigBal.OpenTextFile(pOrigBalFilepath, 1)
    Set FSOreadExtBal = FSOreadExtBal.OpenTextFile(pExtBalFilepath, 1)
    
    'Compile both files into a 1d array

    '** The rest of the code **

End Sub

Private Function GetFilePathFromUser(ByVal pInitialPath As String, _
    ByVal pTitle As String) As String
    
    Dim fDialog As FileDialog
    Set fDialog = Application.FileDialog(msoFileDialogFilePicker)
    
    With fDialog
        .Filters.Clear
        .Filters.Add "CSV files", "*.CSV*"
        .Title = pTitle
        .AllowMultiSelect = False
        .InitialFileName = pInitialPath
        .Show
    End With
            
    GetFilePathFromUser= fDialog.SelectedItems(1)
End Function

Import_Files_Create_Consolidated_File now has 12 lines of code (including the 2 lines used for Const txtFilepath to avoid scrolling left/right)...Check

GetFilePathFromUser now has 15 lines of code...Check

CreateConsolidatedFile is now somewhere around 132 lines. Progress, but there is more work to be done.

Testing

The changes so far have been fairly straight forward. And, it is likely that the code hasn't been broken and still 'works'. However, additional refactoring of CreateConsolidatedFile will be more complex and incurs the risk of breaking the code. How should one proceed from this point and avoid this risk?

Separating the User interaction code (retained in the EntryPoint procedure) from the consolidation code (CreateConsolidateFile) has resulted in making CreateConsolidateFile a testable function. This UI/Domain logic separation is really important to facilitate testing. Now you can add test functions in a separate test module that looks something like:

Sub TestConsolidatedFile()


    Const myTestOriginalBalanceFilepath = "asdf/asdf/MyTestFiles/TestBalances.csv"
    
    Const myTestExtBalanceFilepath = "asdf/asdf/MyTestFiles/TestExtBalances.csv"
    
    Const myTestOutputFilepath = "asdf/asdf/MyTestFiles/TestConsolidationResults.csv"
    
    CreateConsolidatedFile myTestOriginalBalanceFilepath, _
        myTestExtBalanceFilepath, myTestOutputFilepath
        
    Dim testResult As Boolean
    testResult = False


    'Add code to read/check that TestConsolidationResults.csv contains what you expect and
    'sets the testResult flag
    
    
    If testResult Then
        Debug.Print "Success!"
    Else
        Debug.Print "TestConsolidatedFile Failed"
    End If
End Sub

If you don't have them, create test files that can be used with CreateConsolidatedFile. Add the necessary code to get the above test to print the success message when it executes. Now, you have a the ability to detect when you make a change that breaks the original/working logic.

From this point forward, the process is Modify-Test-Fix(if it fails) until all your functions are around 32 (or your own limit) lines of code. You will likely discover that many comments become unnecessary as function names describe what is happening within them. With respect to code documentation, always prefer a well-named function to a comment.

Make it Fast

By applying DRY and SRP along the way, the result will be code that is far more readable and easier to "Make Fast". When you are ready for that step, create test files and tests specifically designed to stress-test your code to verify that it performs well against your anticipated real csv files.

Obligatory

  1. Your posted code does not declare Option Explicit at the top of the module. Get in the habit of always doing this. It guarantees that you declare all variables before they are used. The VBE Tools->Options has a 'Require Variable Declaration' setting that will insert Option Explicit for new modules automatically.
  2. Variable Declarations

VBA interprets

     Dim joinOrig, joinExt, strTempArray1d, strTempArray1dExt As String

As

     Dim joinOrig As Variant
     Dim joinExt As Variant
     Dim strTempArray1d As Variant
     Dim strTempArray1dExt As String

The code still 'works' because a Variant can be assigned values or objects. And, in this case is able to behave like a String if it is assigned a String value. Still, it is a best practice to explicitly declare any variable as its intended Type rather than letting VBA decide for you.

More information

If you are interested in more information about best/better coding practices demonstrated using VBA, I would recommend you visit RubberDuck and consider using the free RubberDuck Excel Addin to augment the VBE. Full Disclosure: I contribute to the Rubberduck project...but you should consider using it nevertheless :)

\$\endgroup\$
2
  • \$\begingroup\$ Nice review! Introduction to a lot of concepts. I really like the point on some sub becoming short enough that it no longer needs to be its own thing and can be expanded back into the parent method. Another general rule of thumb is not LOC, but if a child sub has all the same parameters as the parent sub then it should not be its own routine. One point, the only place you should really never have underscores is in public method names (and public variables) - it will come back to bite you when you get round to using interfaces, and you want to remain consistent throughout. PascalCase Not_This. \$\endgroup\$
    – Greedo
    Commented Apr 21, 2022 at 8:09
  • \$\begingroup\$ Really appreciate your in depth review! It's much clearer to me now how an experienced person constructs the project in their mind & putting it to action. I will replicate these concepts on some other projects I have lined up. I can't upvote just yet!! \$\endgroup\$
    – JasonC
    Commented Apr 25, 2022 at 18:13
0
\$\begingroup\$

The structure of your code would suggest that your chances of being recognised as a developer are practically zero.

You have many functions encapsulated in a single method. Split your Method into smaller methods that can be tested.

You are manually searching linear arrays. Itr would be simpler to put those arrays in a dictionary and use the .Exists method (or an ArrayList and use the '.Contains' method.

You don't provide any test methods for your code so maintenenance is going to be an issue.

\$\endgroup\$
2
  • \$\begingroup\$ Don't worry, I'm aware it's not optimal, I'm pretty much at ground zero of my learnings. I will attempt both of your suggestions. What do you mean y test methods? \$\endgroup\$
    – JasonC
    Commented Apr 19, 2022 at 12:59
  • \$\begingroup\$ Google 'Test driven development'. \$\endgroup\$
    – Freeflow
    Commented Apr 19, 2022 at 13:06

Not the answer you're looking for? Browse other questions tagged or ask your own question.