INTRO
I am using an Excel worksheet as my "database." No headers so [F1]
is assigned to column one by default. I filled the entire column, all 1,048,576 cells with RANDBETWEEN(1,20)
. I then hard set these values by copy/pasting as value.
It is very slow. Much slower than using Excel functions would be to generate same data. I realize that I could use 1,048,576 as a constant denominator but I wanted to practice with SQL query language, and keep model more dynamic.
EDIT: it should not say % in results not many times. I am getting the % of times each var occurred in my data set. I am basically seeing what the distribution is of randbetween(1-20) outputs are over X calls. In this instance x is 1,048,576.
My SQL Query
"SELECT Round(SUM(IIF([F1]=" & searchKey & ",1,0))*100.0/SUM(IIF([F1]<> Null,1,0)),10) From [Sheet1$];"
Is returning the total number of records in column F1 that are equal to searchKey and dividing that number by the total number of records in column F1.
CODE
Option Explicit
Private Declare PtrSafe Function timeGetTime Lib "winmm.dll" () As Long
Sub SqlQueryOnWorkSheet()
Dim started As Long
Dim cn As ADODB.Connection
Dim filePath As String
Dim counter As Long
Dim outCome As Double
Dim ended As Long
started = timeGetTime
filePath = "Z:\Test\Test1.xlsx"
Set cn = EstablishConnection(filePath)
If cn.State <> 1 Then GoTo CleanFail:
For counter = 1 To 20
outCome = FindCount(cn, counter)
PrintOutcome counter, outCome
Next counter
cn.Close
Set cn = Nothing
ended = timeGetTime
Debug.Print "QUERIES RAN IN " & (ended - started) / 1000 & " SECONDS"
Exit Sub
CleanFail:
Debug.Print "CONNECTION COULD NOT BE MADE"
End Sub
Function EstablishConnection(ByVal filePath As String) As ADODB.Connection
Set EstablishConnection = New ADODB.Connection
EstablishConnection.Open _
"Provider=Microsoft.ACE.OLEDB.12.0;" & _
"Data Source='" & filePath & "';" & _
"Extended Properties=""Excel 12.0 Macro;HDR=No;IMEX=1;"";"
End Function
Function FindCount(ByRef cn As ADODB.Connection, ByVal searchKey As Long) As Double
Dim strSql As String
Dim rs As ADODB.Recordset
On Error GoTo CleanFail:
Set rs = New ADODB.Recordset
strSql = "SELECT Round(SUM(IIF([F1]=" & searchKey & ",1,0))*100.0/SUM(IIF([F1]<> Null,1,0)),10) From [Sheet1$];"
rs.Open strSql, cn
FindCount = rs.GetString
rs.Close
Set rs = Nothing
Exit Function
CleanFail:
Debug.Print "QUERY FAILED"
End Function
Sub PrintOutcome(ByVal counter As Long, ByVal outCome As Double)
Debug.Print "Variable " & counter & " Occured " & outCome & " Many Times"
End Sub
Results
EDIT: THIS SHOULD BE % NOT MANY -- FIXED
Variable 1 Occured 4.9837112427 % Of Time
Variable 2 Occured 5.0171852112 % Of Time
Variable 3 Occured 4.9752235413 % Of Time
Variable 4 Occured 4.9716949463 % Of Time
Variable 5 Occured 5.0051689148 % Of Time
Variable 6 Occured 4.9989700317 % Of Time
Variable 7 Occured 4.9901008606 % Of Time
Variable 8 Occured 5.0283432007 % Of Time
Variable 9 Occured 5.0018310547 % Of Time
Variable 10 Occured 5.0164222717 % Of Time
Variable 11 Occured 4.9933433533 % Of Time
Variable 12 Occured 5.0059318542 % Of Time
Variable 13 Occured 5.0333976746 % Of Time
Variable 14 Occured 4.9952507019 % Of Time
Variable 15 Occured 5.0163269043 % Of Time
Variable 16 Occured 4.9654006958 % Of Time
Variable 17 Occured 4.9822807312 % Of Time
Variable 18 Occured 5.0310134888 % Of Time
Variable 19 Occured 5.0113677979 % Of Time
Variable 20 Occured 4.9770355225 % Of Time
QUERIES RAN IN 38.754 SECONDS
RANDBETWEEN
is volatile, it's updating every single one of the 1,048,576 results every time the cursor moves for a read... \$\endgroup\$RANDBETWEEN
in the code anywhere, so are you really testing it or are you performing a similar query to compare results? Please adjust your title to reflect what your code does. \$\endgroup\$RANDBETWEEN
in them, you should expect the performance to be poor.RANDBETWEEN
is volatile function. That means every time you calculate the value of a cell withRANDBETWEEN
in it, all of them recalculate. That means as the ADO cursor moves from cell to cell, you end up recalculating 1,048,576 cells. If you aggregate them in the query, you get a trillion cell calculations. That's why it's slow. \$\endgroup\$