PROBLEM

I'm trying to write a macro that lets me enter a reference numeral (e.g., 102) and scans the entire document for instances of that numeral. The code should identify the 2-word string immediately preceding each instance and return the most common one (i.e., the mode of all the 2-word strings preceding the reference numeral). This way, if "black sheep 106" is referenced repeatedly in the document, entering "106" will return "black sheep". Choosing the modal 2-word string accounts for inconsistencies in naming throughout the document, e.g., "sheep 106, which is black" appearing in places, or the wrong reference numeral being used in error.

My code does not work, though:

Sub FindFeature()
Dim refNum As String
Dim doc As Document
Dim rng As Range
Dim wordArray() As String
Dim wordDict As Object
Dim maxCount As Long
Dim commonString As String
Dim i As Long

' Set the reference numeral (e.g., "102")
refNum = InputBox("Enter the reference numeral:")

' Initialize dictionary to store word frequencies
Set wordDict = CreateObject("Scripting.Dictionary")

' Set the document to the active document
Set doc = ActiveDocument

' Loop through all words in the document
For Each rng In doc.Words
    ' Check if the word is the reference numeral
    If IsNumeric(rng.text) And rng.text = refNum Then
        ' Get the 2-word string preceding the reference numeral
        wordArray = Split(rng.Previous.text, " ")
        If UBound(wordArray) >= 1 Then
            ' Construct the 2-word string (last two words)
            Dim key As String
            For i = UBound(wordArray) - 1 To UBound(wordArray)
                key = key & wordArray(i) & " "
            Next i
            key = Trim(key)
            
            ' Add the 2-word string to the dictionary
            If Not wordDict.Exists(key) Then
                wordDict.Add key, 1
            Else
                wordDict(key) = wordDict(key) + 1
            End If
            
            ' Update the maximum count and common string
            If wordDict(key) > maxCount Then
                maxCount = wordDict(key)
                commonString = key
            End If
        End If
    End If
Next rng

' Display the most common 2-word string
If commonString <> "" Then
    MsgBox "Reference numeral " & refNum & ": " & commonString
Else
    MsgBox "No feature with reference numeral " & refNum & " found."
End If
End Sub

For example, on this gobbledygook copilot-made story:

"My mother 10 was a lady 12 who could have won the Worst Gift Giver contest 14 hands down. For Christmas, she gave us expired boxes 16 of Stove Top Stuffing, sheets 18 of gold star stickers 20 with a few stars missing, toy rubber giraffes 22, a single tarnished spoon 24. When my daughter 26 turned 15, a package arrived in the mail 28. I will never forget the look on her face 30 as she unwrapped the present 32 to find a set of tiny wooden clothespins 34, each one no bigger than an almond 36. My mother had carefully written “NO NUKES” with a black pen 38 on each one 40. My mother use to like to play the flute 82 but my mother 11 wasn’t very good at it. Luckily my mother 10 gets bored easily".

Entering "16" should return "boxes", but it doesn't.

Any ideas how to fix the code?

  • Please see How to create a Minimal, Reproducible Example, and share a sample doc. "Does not work" doesn't provide enough information about the problem.
    – taller
    Commented May 22 at 14:25
  • I've provided a random text sample for testing, but the code doesn't work as intended at all. Ideally, it would not only output the right feature when you input a numeral; in cases where multiple numerals are (erroneously) associated with one feature (as with "mother 10/11" in the sample text), I would want it to return whichever feature is most often associated with that numeral. In this particular example, that would mean returning "mother" for both numerals, because "11" isn't used as the numeral for any other feature.
    – cjrc
    Commented May 22 at 15:05
  • 'Entering "16" should return "boxes", but it doesn't.' - shouldn't that be "expired boxes"? Commented May 22 at 15:39

1 Answer

Try this out:

Sub FindFeature()
    Const NUM_WORDS As Long = 2 '# of words preceding the number

    Dim refNum As String, doc As Document, rng As Range
    Dim wordDict As Object, colWords As New Collection
    Dim maxCount As Long, commonString As String, key As String, sep As String
    Dim i As Long, n As Long
    
    refNum = InputBox("Enter the reference numeral:")
    
    Set wordDict = CreateObject("Scripting.Dictionary")
    Set doc = ActiveDocument
    
    Debug.Print doc.Words.Count
    
    For Each rng In doc.Words 'first collect all (trimmed) words
        colWords.Add Trim(rng.Text)
    Next rng
    
    'loop over words
    For i = NUM_WORDS + 1 To colWords.Count
        If colWords(i) = refNum Then
            key = ""
            sep = ""
            For n = 1 To NUM_WORDS 'construct key
                key = colWords(i - n) & sep & key
                sep = " "
            Next n
            Debug.Print key
            wordDict(key) = wordDict(key) + 1
            ' new max count?
            If wordDict(key) > maxCount Then
                maxCount = wordDict(key)
                commonString = key
            End If
        End If
    Next i
    
    ' Display the most common 2-word string
    If commonString <> "" Then
        MsgBox "Reference numeral " & refNum & ": " & commonString
    Else
        MsgBox "No feature with reference numeral " & refNum & " found."
    End If
End Sub
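
One likely reason the original attempt never found a match: each item in doc.Words usually carries its trailing space, so rng.Text would be "16 " rather than "16", and a direct comparison with the entered numeral fails. That is why the code above trims every word before comparing. A minimal sketch of that comparison in isolation, where IsRefNum and DemoIsRefNum are hypothetical names used only for illustration and are not part of the macro above:

```vba
' Hypothetical helper: True when a Words item equals the reference numeral
' once leading and trailing spaces are ignored.
Function IsRefNum(ByVal wordText As String, ByVal refNum As String) As Boolean
    IsRefNum = (Trim$(wordText) = Trim$(refNum))
End Function

' Small demo to run from the VBA editor; results go to the Immediate window.
Sub DemoIsRefNum()
    Debug.Print IsRefNum("16 ", "16")   ' item with a trailing space
    Debug.Print IsRefNum("160 ", "16")  ' a different numeral
End Sub
```

Note that VBA's Trim removes only space characters; any other whitespace in an item would need separate handling.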
