-1

In Excel VBA to load a website and get it into a sheet I have been using the following:

Dim IE As Object  
Set IE = CreateObject("InternetExplorer.Application")  
IE .navigate "https://www.wsj.com/market-data/bonds/treasuries"

And then I can copy and paste it into my Excel sheet. But this website no longer works with IE11, and Excel VBA insists on using IE11 even though it is about to be deprecated.

Is there another way? I have also looked at:

  • Selenium: but it seems to be pretty much obsolete for VBA (not updated since 2016) and I couldn’t get it to work with Edge or Firefox in VBA anyway.

  • AutoIt: I got it to write the website’s HTML code to a TXT file (oHTTP = ObjCreate("winhttp.winhttprequest.5.1") ; $oHTTP.Open("GET", $URL1, False) ; $oHTTP.Send(); $oReceived = $oHTTP.ResponseText; FileWrite($file, $oReceived)) but the txt file contents are far from convenient as there is endless HTML stuff in it. It’ll take a fair amount of VBA code to sort through the mess, which probably means it won’t be reliable going forward. Also given the size of my workbook which is very slow, it will take literally several minutes to copy the website data into a sheet element by element.

Surely there must be an easy way to load the site, or just the table within the site, into an Excel sheet? This must be a well trodden path, but after much googling I can’t find an easy solution that actually works.

I have a 5-10 web pages being loaded into this workbook, and it seems to be a full time job keeping the whole thing working!! Any thoughts/help very much appreciated!!!

8
  • 1
    SeleniumBasic is old, thats right. But it works with the newest WebDrivers. Look in the following link at the answer of YasserKhalil, how to install SeleniumBasic and the WebDriver: stackoverflow.com/questions/57216623/…
    – Zwenn
    Commented Jul 3, 2021 at 12:57
  • 2
    What data are you looking for in the website given in your question? I tried with XMLHTTP and although the HTML code may seem endless but if you are looking for Treasury Notes and Bonds or Treasury Bills data, it exists as a JSON string within the HTML document that you can extract it from and parse it.
    – Raymond Wu
    Commented Jul 3, 2021 at 13:12
  • (1) I did download the latest drivers and Selenium still wouldn’t work, even work with Chrome. But TBH I am trying to keep Chrome off my PC altogether and use Brave / Firefox / Edge. Guess I better start from scratch again but if there a good explanation somewhere on how to use Brave/Firefox/Edge that is current? And is Selenium going to be working for a good few years yet?
    – drb01
    Commented Jul 3, 2021 at 17:40
  • 1
    @Zwenn It works with some of the newest webdrivers. I think FF and Opera and PhatomJS might be no-no's now.
    – QHarr
    Commented Jul 3, 2021 at 21:56
  • 1
    @QHaar I tried it myself with Chrome and PhantomJS. With PJS actually only to try out screenshots of entire websites. I tried it with FF and Edge at the very beginning without success, but I hadn't yet found YasserKhalil's installation instructions. It works very well with Chrome, so I didn't explore the other browsers any further.
    – Zwenn
    Commented Jul 4, 2021 at 11:05

2 Answers 2

1

Similar idea to Christopher's answer in using regex. I am grabbing the instruments data (JS array), splitting the component dictionaries out (minus the end }), and then use regex, based on headers, to grab the appropriate values.

I use a dictionary to handle input/output headers, and set a couple of request headers to help to signal browser based request and to mitigate for being served cached results.

Ideally, one would use an html parser and grab the script tag, then use a json parser on the JavaScript object within the script tag.

If you want the data from the other tabbed results, I can add that in by explicitly setting re.Global = True, then looping the returned matches. Depends whether you want those and how you want them to appear in the sheet(s).

I currently write results out to a sheet called Treasury Notes & Bonds.


Option Explicit

Public Sub GetTradeData()
    Dim s As String, http As MSXML2.XMLHTTP60 'required reference Microsoft XML v6,
    
    Set http = New MSXML2.XMLHTTP60

    With http
        .Open "GET", "https://www.wsj.com/market-data/bonds/treasuries", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        s = .responseText
    End With
    
    Dim re As VBScript_RegExp_55.RegExp 'required reference Microsoft VBScript Regular Expressions
    
    Set re = New VBScript_RegExp_55.RegExp
    re.Pattern = "instruments"":\[(.*?)\]"
    s = re.Execute(s)(0).SubMatches(0)
    
    Dim headers() As Variant, r As Long, c As Long, mappingDict As Scripting.Dictionary 'required reference Microsoft Scripting Runtime
    
    Set mappingDict = New Scripting.Dictionary
    mappingDict.Add "maturityDate", "MATURITY"
    mappingDict.Add "coupon", "COUPON"
    mappingDict.Add "bid", "BID"
    mappingDict.Add "ask", "ASKED"
    mappingDict.Add "change", "CHG"
    mappingDict.Add "askYield", "ASKED YIELD"
    
    headers = mappingDict.keys
    
    Dim results() As String, output() As Variant, key As Variant
    
    results = Split(s, "}")
    ReDim output(1 To UBound(results), 1 To UBound(headers) + 1)
    
    For r = LBound(results) To UBound(results) - 1
        c = 1
        For Each key In mappingDict.keys
            re.Pattern = "" & key & """:""(.*?)"""
            output(r + 1, c) = re.Execute(results(r))(0).SubMatches(0)
            c = c + 1
        Next
    Next
    
    re.Pattern = "timestamp"":""(.*?)"""
    re.Global = True
    
    With ThisWorkbook.Worksheets("Treasury Notes & Bonds")
        
        .UsedRange.ClearContents
        
         Dim matches As VBScript_RegExp_55.MatchCollection
         
         Set matches = re.Execute(http.responseText)
        .Cells(1, 1) = matches(matches.Count - 1).SubMatches(0)
        .Cells(2, 1).Resize(1, UBound(headers) + 1) = headers
        .Cells(3, 1).Resize(UBound(output, 1), UBound(output, 2)) = output
    End With
End Sub
16
  • 1
    I have updated to return the timestamp requested. Please note that the use of ClearContents towards the end will empty the sheet's usedrange between runs.
    – QHarr
    Commented Jul 4, 2021 at 8:39
  • 1
    I know you stated why headers were added to the request but could you explain your reason again.... dumbed down a little for someone like myself? Commented Jul 4, 2021 at 13:08
  • 1
    Hi. Just pre-emptive measures. The User-Agent to simulate a browser request to the server and the other header to hopefully avoid being served cached (old) results rather than the latest. Usually more useful with frequently updating pages. @ChristopherWeckesser
    – QHarr
    Commented Jul 4, 2021 at 13:40
  • 1
    Depends on the website. Authentication (login) is possible with many sites using xmlhttp requests (which is the type of browserless request shown above). codingislove.com/http-requests-excel-vba
    – QHarr
    Commented Jul 4, 2021 at 14:39
  • 1
    I figured with the User-Agent. I've seen people rotate Proxies and User-Agents to ensure their web scraping goes "uninterrupted". I had no idea about the second header though. Thanks for the info. @QHarr Commented Jul 4, 2021 at 16:08
1

The following code (not using web drivers) works but isn't an easy solution. I was able to find the information stored within the body, which was isolated by using REGEX and then stored into a JSON file for parsing.

Dim XMLPage As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim strPattern As String: strPattern = "window.__STATE__ = ({.+}}}});"
Dim JSON As Object
Dim Key As Variant
Dim key1, key2 As String

XMLPage.Open "GET", "https://www.wsj.com/market-data/bonds/treasuries", False
XMLPage.send

Set JSON = JsonConverter.ParseJson(REGEX(XMLPage.responseText, strPattern, "$1"))

' Notes and Bonds
key1 = "mdc_treasury_{" & """" & "treasury" & """" & ":" & """" & "NOTES_AND_BONDS" & """" & "}"

For Each Key In JSON("data")(key1)("data")("data")("instruments")
    Debug.Print Key("maturityDate")
    Debug.Print Key("ask")
    Debug.Print Key("askYield")
    Debug.Print Key("bid")
    Debug.Print Key("change")
Next Key

 ' Bills
key2 = "mdc_treasury_{" & """" & "treasury" & """" & ":" & """" & "BILLS" & """" & "}"

For Each Key In JSON("data")(key2)("data")("data")("instruments")
    Debug.Print Key("maturityDate")
    Debug.Print Key("ask")
    Debug.Print Key("askYield")
    Debug.Print Key("bid")
    Debug.Print Key("change")
Next Key

The following function will need to be copied into a module:

Function REGEX(strInput As String, matchPattern As String, Optional ByVal outputPattern As String = "$0") As Variant
    Dim inputRegexObj As New VBScript_RegExp_55.RegExp, outputRegexObj As New VBScript_RegExp_55.RegExp, outReplaceRegexObj As New VBScript_RegExp_55.RegExp
    Dim inputMatches As Object, replaceMatches As Object, replaceMatch As Object
    Dim replaceNumber As Integer

    With inputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = matchPattern
    End With
    With outputRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        .Pattern = "\$(\d+)"
    End With
    With outReplaceRegexObj
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
    End With

    Set inputMatches = inputRegexObj.Execute(strInput)
    If inputMatches.Count = 0 Then
        REGEX = False
    Else
        Set replaceMatches = outputRegexObj.Execute(outputPattern)
        For Each replaceMatch In replaceMatches
            replaceNumber = replaceMatch.SubMatches(0)
            outReplaceRegexObj.Pattern = "\$" & replaceNumber

            If replaceNumber = 0 Then
                outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).value)
            Else
                If replaceNumber > inputMatches(0).SubMatches.Count Then
                    'regex = "A to high $ tag found. Largest allowed is $" & inputMatches(0).SubMatches.Count & "."
                    REGEX = CVErr(xlErrValue)
                    Exit Function
                Else
                    outputPattern = outReplaceRegexObj.Replace(outputPattern, inputMatches(0).SubMatches(replaceNumber - 1))
                End If
            End If
        Next
        REGEX = outputPattern
    End If
End Function

The following resources will help:

How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

https://github.com/VBA-tools/VBA-JSON

You will need to install the JSON converter and reference Regular Expression in the library. The REGEX function was found elsewhere on stack overflow so someone else deserves the credit for it.

Not the answer you're looking for? Browse other questions tagged or ask your own question.