3
\$\begingroup\$

I have built a COVID model using UVA data. Currently that data is unavailable so I'm using another source. The new source is, of course, a different format. So rather than refactor all of my model macros, I'm formatting the new data in the old format on import.

The new data looks like this:

enter image description here

The xlsx files go out to column EH with a new column added daily. There are 267 rows in these files. The import function ends up with a file that looks like this:

enter image description here

In this file I don't import all the Confirmed = 0 and I wind up deleting a bunch of the countries using the population column (population = 0 gets deleted). So the file end up with about 6800 rows. This will grow daily also.

I have a file for confirmed, another for deaths, and a third for recovered. Importing the Confirmed and deleting what I don't want takes about a minute. When I try to add in the Deaths file, I can see the column being filled in with the correct numbers from the new data but it's taking so long I can't imagine waiting for it to end. I've waited over 30 minutes before hitting Esc and Deaths still won't be finished.

I realize I'm going through a lot of cells a lot of times. So, is there a way to optimize my nested For loops in the Deaths and Recovered file imports to still be in the desired format yet not take over half an hour?

Option Explicit

Sub ImportCSSEConfirmed()

Dim i As Variant
Dim j As Variant
Dim lastrow As Long
Dim clastrow As Long
Dim lastcol As Long
Dim currentData As Range
Dim filePath As String
Dim wb As Excel.Workbook
Dim ws As Excel.Worksheet
Dim cws As Excel.Worksheet

Set cws = ThisWorkbook.Sheets("Raw_Data")

lastrow = cws.Cells(Rows.count, "a").End(xlUp).Row

If lastrow < 2 Then lastrow = 2
Set currentData = cws.Range("a2:l" & lastrow)

currentData.ClearContents

filePath = "C:\Users\chris.h\Desktop\COVID\Other_Data\CSSE\CSSE_Confirmed.xlsx"

Set wb = Excel.Workbooks.Open(filePath)
Set ws = wb.Worksheets(1)
        
lastrow = ws.Cells(Rows.count, "b").End(xlUp).Row
lastcol = ws.Cells(1, Columns.count).End(xlToLeft).Column
        
clastrow = cws.Cells(Rows.count, "a").End(xlUp).Row + 1
        
'takes the csse data files and combines and reformats them into the raw_data sheet in the combined file
'col a = province/state, col b = country, col c = date, col d = confirmed
For i = 2 To lastrow
    For j = 3 To lastcol
        If ws.Cells(i, j).Value <> 0 Then
            cws.Cells(clastrow, "a").Value = ws.Cells(i, 1).Value
            cws.Cells(clastrow, "b").Value = ws.Cells(i, 2).Value
            cws.Cells(clastrow, "c").Value = ws.Cells(1, j).Value
            cws.Cells(clastrow, "d").Value = ws.Cells(i, j).Value
            cws.Cells(clastrow, "d").NumberFormat = "#,##0"
            clastrow = clastrow + 1
        End If
    Next j
Next i
wb.Close False

Call PopulationColumn
Call DeleteExtras

predictDone = False
End Sub

Sub ImportCSSEDeaths()

Dim i As Variant
Dim j As Variant
Dim k As Variant
Dim lastrow As Long
Dim clastrow As Long
Dim lastcol As Long
Dim dte As Date
Dim filePath As String
Dim wb As Excel.Workbook
Dim ws As Excel.Worksheet
Dim cws As Excel.Worksheet

Dim t As Double
Dim tt As String

t = Timer

Set cws = ThisWorkbook.Sheets("Raw_Data")

lastrow = cws.Cells(Rows.count, "a").End(xlUp).Row

filePath = "C:\Users\chris.h\Desktop\COVID\Other_Data\CSSE\CSSE_Deaths.xlsx"

Set wb = Excel.Workbooks.Open(filePath)
Set ws = wb.Worksheets(1)
    
clastrow = cws.Cells(Rows.count, "b").End(xlUp).Row
        
lastrow = ws.Cells(Rows.count, "b").End(xlUp).Row
lastcol = ws.Cells(1, Columns.count).End(xlToLeft).Column

For i = 2 To clastrow
    For j = 2 To lastrow
        For k = 3 To lastcol
            
            If cws.Cells(i, "a").Value = ws.Cells(j, "a").Value And _
            cws.Cells(i, "b").Value = ws.Cells(j, "b").Value And _
            cws.Cells(i, "c").Value = ws.Cells(1, k).Value Then
                    
                cws.Cells(i, "e").Value = ws.Cells(j, k).Value
                cws.Cells(i, "e").NumberFormat = "#,##0"
            End If
        Next k
    Next j
Next i
wb.Close False

tt = Format((Timer - t) / 86400, "hh:mm:ss")
predictDone = False

End Sub

Sub ImportCSSERecovered()

Dim i As Variant
Dim j As Variant
Dim k As Variant
Dim lastrow As Long
Dim clastrow As Long
Dim lastcol As Long
Dim dte As Date
Dim filePath As String
Dim wb As Excel.Workbook
Dim ws As Excel.Worksheet
Dim cws As Excel.Worksheet

Set cws = ThisWorkbook.Sheets("Raw_Data")

lastrow = cws.Cells(Rows.count, "a").End(xlUp).Row

filePath = "C:\Users\chris.h\Desktop\COVID\Other_Data\CSSE\CSSE_Deaths.xlsx"

Set wb = Excel.Workbooks.Open(filePath & fileName)
Set ws = wb.Worksheets(1)
    
clastrow = cws.Cells(Rows.count, "b").End(xlUp).Row
        
lastrow = ws.Cells(Rows.count, "b").End(xlUp).Row
lastcol = ws.Cells(1, Columns.count).End(xlToLeft).Column
        
For i = 2 To clastrow
    For j = 2 To lastrow
        For k = 3 To lastcol
            
            If cws.Cells(i, "a").Value = ws.Cells(j, "a").Value And _
            cws.Cells(i, "b").Value = ws.Cells(j, "b").Value And _
            cws.Cells(i, "c").Value = ws.Cells(1, k).Value Then
                    
                cws.Cells(i, "f").Value = ws.Cells(j, k).Value
                cws.Cells(i, "f").NumberFormat = "#,##0"
            End If
        Next k
    Next j
Next i
wb.Close False

predictDone = False

End Sub
Sub PopulationColumn()

Dim i As Variant
Dim country As String
Dim state As String
Dim rng As Range
Dim lastrow As Long
Dim population As Long
Dim landarea As Double
Dim popdensity As Double
Dim cws As Worksheet

Set cws = ThisWorkbook.Worksheets("Raw_Data")

lastrow = cws.Cells(Rows.count, "b").End(xlUp).Row
Set rng = cws.Range("b2:b" & lastrow)

For Each i In rng
    country = i
    state = cws.Cells(i.Row, "a").Value
    
    If country = "United Arab Emirates" Then
        population = 9890402
        landarea = 32278
        popdensity = population / landarea
    ElseIf country = "Iran" Then
        population = 83992949
        landarea = 628786
        popdensity = population / landarea
    ElseIf country = "Oman" Then
        population = 5080712
        landarea = 119499
        popdensity = population / landarea
    ElseIf country = "Kuwait" Then
        population = 4270571
        landarea = 6880
        popdensity = population / landarea
    ElseIf country = "Bahrain" Then
        population = 1701575
        landarea = 293
        popdensity = population / landarea
    ElseIf country = "Iraq" Then
        population = 40222493
        landarea = 167692
        popdensity = population / landarea
    ElseIf country = "Pakistan" Then
        population = 220892340
        landarea = 297638
        popdensity = population / landarea
    ElseIf country = "Qatar" Then
        population = 2881053
        landarea = 4483
        popdensity = population / landarea
    ElseIf country = "Jordan" Then
        population = 10203134
        landarea = 34278
        popdensity = population / landarea
    ElseIf country = "Saudi Arabia" Then
        population = 34810000
        landarea = 830000
        popdensity = population / landarea
    ElseIf country = "Kazakhstan" Then
        population = 18776707
        landarea = 1042360
        popdensity = population / landarea
    ElseIf country = "Syria" Then
        population = 17500658
        landarea = 70900
        popdensity = population / landarea
    ElseIf country = "Yemen" Then
        population = 29825964
        landarea = 203850
        popdensity = population / landarea
    ElseIf country = "Afghanistan" Then
        population = 38928346
        landarea = 252071
        popdensity = population / landarea
    ElseIf country = "Italy" Then
        population = 60478457
        landarea = 113568
        popdensity = population / landarea
    ElseIf country = "France" Then
        population = 65273511
        landarea = 211413
        popdensity = population / landarea
    ElseIf country = "South Korea" Then
        population = 51269185
        landarea = 37541
        popdensity = population / landarea
    ElseIf country = "Spain" Then
        population = 46754778
        landarea = 192588
        popdensity = population / landarea
    ElseIf state = "South Carolina" Then
        population = 5210095
        landarea = 30111
        popdensity = population / landarea
    ElseIf state = "Texas" Then
        population = 29472295
        landarea = 261914
        popdensity = population / landarea
    ElseIf state = "Georgia" Then
        population = 10736059
        landarea = 57919
        popdensity = population / landarea
    ElseIf state = "Kentucky" Then
        population = 4499692
        landarea = 39732
        popdensity = population / landarea
    ElseIf state = "North Carolina" Then
        population = 10611862
        landarea = 48718
        popdensity = population / landarea
    ElseIf country = "United Kingdom" Then
        population = 67886011
        landarea = 93410
        popdensity = population / landarea
    ElseIf country = "Switzerland" Then
        population = 8654281
        landarea = 15257
        popdensity = population / landarea
    ElseIf country = "Hungary" Then
        population = 9660351
        landarea = 34954
        popdensity = population / landarea
    ElseIf country = "Turkey" Then
        population = 84339067
        landarea = 297156
        popdensity = population / landarea
    ElseIf country = "Portugal" Then
        population = 10196709
        landarea = 35363
        popdensity = population / landarea
    ElseIf country = "Austria" Then
        population = 9010000
        landarea = 31818
        popdensity = population / landarea
    ElseIf country = "Poland" Then
        population = 37846611
        landarea = 118236
        popdensity = population / landarea
    ElseIf country = "Germany" Then
        population = 83783942
        landarea = 134580
        popdensity = population / landarea
    ElseIf country = "Egypt" Then
        population = 102334404
        landarea = 384345
        popdensity = population / landarea
    ElseIf state = "Kansas" Then
        population = 2910357
        landarea = 81823
        popdensity = population / landarea
    ElseIf country = "Argentina" Then
        population = 45516865
        landarea = 1056641
        popdensity = population / landarea
    ElseIf country = "Belize" Then
        population = 397628
        landarea = 8807
        popdensity = population / landarea
    ElseIf country = "Norway" Then
        population = 5413094
        landarea = 141031
        popdensity = population / landarea
    ElseIf country = "Finland" Then
        population = 5540720
        landarea = 117333
        popdensity = population / landarea
    ElseIf country = "Japan" Then
        population = 126476461
        landarea = 140755
        popdensity = population / landarea
    ElseIf country = "Australia" Then
        population = 25701300
        landarea = 2969907
        popdensity = population / landarea
    ElseIf state = "Colorado" Then
        population = 5845526
        landarea = 103730
        popdensity = population / landarea
    ElseIf state = "Oregon" Then
        population = 4301089
        landarea = 96105
        popdensity = population / landarea
    ElseIf country = "Sweden" Then
        population = 10087218
        landarea = 173860
        popdensity = population / landarea
    Else
        population = 0
        popdensity = 0
    End If
    
    cws.Cells(i.Row, "h").Value = population
    cws.Cells(i.Row, "i").Value = popdensity
Next i

cws.Range("h2:h" & lastrow).NumberFormat = "#,##0"
cws.Range("i2:i" & lastrow).NumberFormat = "#,##0"

End Sub

Sub DeleteExtras()

Dim lastrow As Long
Dim rng As Range
Dim i As Variant
Dim count As Integer
Dim cws As Worksheet

Set cws = ThisWorkbook.Worksheets("Raw_Data")
lastrow = cws.Cells(Rows.count, "b").End(xlUp).Row
Set rng = cws.Range("h2:h" & lastrow)
count = 0

Do While count <= 10
For Each i In rng
    
    If i = 0 Then
        i.EntireRow.Delete
    End If
Next i
count = count + 1
Loop

End Sub
\$\endgroup\$
4
  • \$\begingroup\$ Can you post the code for PopulationColumn() and DeleteExtras()? As is, this doesn't compile and those could likely be optimized as well. \$\endgroup\$ Commented Jun 17, 2020 at 17:46
  • \$\begingroup\$ The Population and Pop Density columns are just a list of interested countries. The states that are still in the list are from the UVA data. My new data set is only outside the US.I left them in the Sub becuase I just haven't gotten around to deleting them yet. The DeleteExtra Sub is looped because for some reason it doesn't catch all the 0 values the first time. \$\endgroup\$
    – Chris H.
    Commented Jun 17, 2020 at 18:42
  • \$\begingroup\$ The question may be OBE. I've broken up the three loops into seperate macros and changed the order of the nesting. I did this so I could manage the new data files easier. I've also put a timer on the Deaths and Recovered macros. I will edit the above to reflect the new code and the actual time it takes to run. \$\endgroup\$
    – Chris H.
    Commented Jun 18, 2020 at 14:19
  • \$\begingroup\$ In the second and the third loop you are looping on rows twice (clastrow and lastrow also). So, if the both rows are say 500 then it will loop 500*500=250000 times. It will take huge time. In the first loop, you are incrementing clastrow by 1 in every loop. Do the same in second and third loop. Before For i = 2 To clastrow mention lastrow = 2 and before i loop ends increment lastrow by 1. So you can remove For j loop. \$\endgroup\$
    – Naresh
    Commented Jun 18, 2020 at 15:10

1 Answer 1

3
\$\begingroup\$

I have analyzed your code and suggest the following changes:

  • the second loop over j to find the startdate is superfluous and can be included in the search loop by using a simple if.

  • avoid using Redim Preserve in a loop, it's time consuming as each time the array has to be copied completely; Dim the array once to a set maximum and shorten it once after the loop.

  • from your code I am deducting that the array deaths() is filled from scratch for each loop over i. Therefore, k should be set to zero within the i-loop.

  • in the end, the whole array deaths() is copied cell-by-cell to a target range. This can be done in one statement, which is multiple times faster than touching each element.

      Sub ImportCSSEDeaths()
    
          Dim i As Long, j As Long, k As Long
          Dim lastrow As Long, clastrow As Long, lastcol As Long
          Dim deaths() As Long
          Dim startDate As Date
          Dim filePath As String
          Dim wb As Excel.Workbook, ws As Excel.Worksheet, cws As Excel.Worksheet
    
          Set cws = ThisWorkbook.Sheets("Raw_Data")
          clastrow = cws.Cells(Rows.count, "b").End(xlUp).row
    
          filePath = "C:\Users\chris.h\Desktop\COVID\Other_Data\CSSE\CSSE_Deaths.xlsx"
          Set wb = Excel.Workbooks.Open(filePath)
          Set ws = wb.Worksheets(1)
          lastrow = ws.Cells(Rows.count, "b").End(xlUp).row
          lastcol = ws.Cells(1, Columns.count).End(xlToLeft).Column
    
          For i = 2 To lastrow
              'puts country row deaths into array
              With ws
                  k = 0  ' deaths() is zero-based! Option Base 0
                  ReDim deaths(lastcol) ' cannot get larger than this
                  For j = 3 To lastcol
                      If .Cells(i, j).Value <> 0 Then
                          deaths(k) = .Cells(i, j).Value
                          If k = 0 Then
                              startDate = .Cells(1, j).Value
                          End If
                          k = k + 1
                      End If
                  Next j
              End With
              ReDim Preserve deaths(k - 1) ' shrink once to actual size
    
              'finds startdate in compiled data and enters array values down column E
              With cws
                  For j = 2 To clastrow
                      If .Cells(j, "a").Value = ws.Cells(i, "a").Value And _
                      .Cells(j, "b").Value = ws.Cells(i, "b") And _
                      .Cells(j, "c").Value = startDate Then
                          ' copy deaths(0..ub) to .cells(j..ub+j,"e") in one step
                          Dim dest As Range
                          Set dest = .Cells(j, "e") ' first cell in destination
                          Set dest = dest.Resize(UBound(deaths) + 1, 1)
                          dest.Value = Application.Transpose(deaths)
                      End If
                  Next j
              End With
          Next i
    
          wb.Close False
      End Sub  ' ImportCSSEDeaths()
    

Edit: delete rows with a null value

Following your comment, your routine Delete_Extras() not only searches row-by-row but does so for 11 times. You will probably have noticed that not all matching lines got deleted on the first pass.
One way to fix this is to loop from the end to the beginning of the range, so that deleting a row does not affect rows yet unprocessed.
Instead, I suggest the following: filter the range for a "0" in column H and delete all visible rows in one command, like this

    Sub Delete_Extra_Rows_Based_On_Value()
    ' autofilter a range and delete visible rows
    ' 2020-07-01
    
        Dim cws As Worksheet
        Dim lastrow As Long
        Dim result As Range
        
        Set cws = ThisWorkbook.Worksheets("H:\Raw_Data")
        lastrow = cws.Cells(Rows.count, "B").End(xlUp).row
        
        With Application
            .ScreenUpdating = False
            .EnableEvents = False
            .Calculation = xlCalculationManual
            .DisplayAlerts = False
        End With
        
        ' clear any existing filters
        If cws.AutoFilterMode Then cws.ShowAllData
        ' apply filter
        With cws.Range("A1:H" & lastrow)
            .AutoFilter Field:=8, Criteria1:=0
            ' delete matching rows
            .Offset(1, 0).SpecialCells(xlCellTypeVisible).Delete
            .AutoFilter
        End With
        
        With Application
            .ScreenUpdating = True
            .EnableEvents = True
            .Calculation = xlCalculationAutomatic
            .DisplayAlerts = True
        End With
    End Sub
\$\endgroup\$
6
  • \$\begingroup\$ This is awesome! Thank you so much! I caught the k=0 should be inside the first loop after posting but the rest is all new. This teaches me a lot actually. \$\endgroup\$
    – Chris H.
    Commented Jun 29, 2020 at 12:42
  • \$\begingroup\$ I was getting an out of bounds error at ReDim Preserve deaths(k - 1) so I changed it to ReDim Preserve deaths(k) and it removed the error. Honestly I'm not sure why this removed the error but it did. Now the biggest time suck on this process is removing the countries I don't track (population = 0). Which takes anywhere from 3 minutes to 20 depending on what else is happening on the computer. \$\endgroup\$
    – Chris H.
    Commented Jun 29, 2020 at 13:59
  • \$\begingroup\$ Yes I wasn't sure if ReDim takes the length or the highest index as parameter. For the other sub, have you already posted it here, as a new post? \$\endgroup\$ Commented Jun 30, 2020 at 13:12
  • \$\begingroup\$ I actually cut off screen updateing on each sub and it really sped things up. Delete extras takes about a minute, now. The whole model takes around 2 minutes to run, which is totally exceptable. If that changes, I'll definitely post another question. Thank you so much!!!! \$\endgroup\$
    – Chris H.
    Commented Jul 1, 2020 at 14:21
  • \$\begingroup\$ please see the sub I added - we missed each other while I was writing it. \$\endgroup\$ Commented Jul 1, 2020 at 15:23

Not the answer you're looking for? Browse other questions tagged or ask your own question.