Determine fragment identifier from HTML page in Swift

Question

For an iOS app which helps me roll back vandalism on Stack Exchange, I have a piece of Swift code which downloads a revision page (example) and tries to find the 'spacer' fragment just above a certain revision. It might not be clear what I'm talking about, so here's a picture from Firefox + developer tools:

I have the HTML content of this page, and the GUID of the revision (8f9ab85f-1401-41e9-8f75-8a07b10bad32) from the Stack Exchange API. I'm looking for that element just above the revision header, since those are the only HTML elements with IDs on the page. I need that spacer-9617187a-fe48-4212-9a1a-f3a366e62736 so I can link directly to https://codereview.stackexchange.com/posts/189958/revisions#spacer-9617187a-fe48-4212-9a1a-f3a366e62736
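Once the spacer ID is known, the link can be assembled with Foundation's `URLComponents` instead of string concatenation, which also takes care of the `#` separator. A minimal sketch; the helper name `revisionsLink(postID:fragment:)` is hypothetical, and the host is hard-coded to Code Review only for this example.

```swift
import Foundation

// Hypothetical helper: builds https://<host>/posts/<id>/revisions#<fragment>.
func revisionsLink(postID: Int, fragment: String) -> URL? {
    var components = URLComponents()
    components.scheme = "https"
    components.host = "codereview.stackexchange.com"  // assumed site for this example
    components.path = "/posts/\(postID)/revisions"
    // URLComponents inserts the leading "#" itself, so pass the bare ID.
    components.fragment = fragment
    return components.url
}

let link = revisionsLink(postID: 189958,
                         fragment: "spacer-9617187a-fe48-4212-9a1a-f3a366e62736")
print(link?.absoluteString ?? "invalid URL")
```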

For that, I've written a few lines of Swift code. The problem is that string handling in Swift confuses the **** out of me. Most of the language feels rather good, but I'd rather do string manipulation in SQL than in Swift...

Here is what I have so far. It works, but I was wondering if it could break in cases I haven't foreseen, or if it can be made more understandable/manageable by a future me. You see, even Stack Exchange's syntax highlighter has problems understanding it...

The input parameters for this piece of code are `html` (a `String` containing the content of the revisions page, e.g. https://codereview.stackexchange.com/posts/189958/revisions) and `revisionGUID` (8F9AB85F-1401-41E9-8F75-8A07B10BAD32 in the example above; the API returns GUIDs in upper case). `fragment` holds the result. The 43 is the length of `spacer-` (7 characters) plus a GUID (36 characters).

```
// Find fragment just above selected revision
let range = html.range(of: #"onclick="StackExchange.revisions.toggle('"# + revisionGUID.lowercased() + #"')""#)!
let index = html.range(of: #"<tr id=""#, options: .backwards, range: html.startIndex..<range.lowerBound)!.upperBound
let fragment = String(html[index..<html.index(index, offsetBy: 43)])
```

Could you include an appendix with an example of a full HTML page you are parsing like this? - dfhwze, Commented Aug 1, 2019 at 19:44
I've added the link in a more appropriate place. I'm not sure including the full 20 pages of HTML would be beneficial. It's a Stack Exchange link and not likely to break :) - Glorfindel, Commented Aug 1, 2019 at 19:47
Not enough for an answer, but I would prefer `let offset = "spacer-${guid_format}".characters.count` over the magic number 43, with `${guid_format}` being a default GUID. - dfhwze, Commented Aug 1, 2019 at 19:55
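The comment's suggestion can be sketched in current Swift, where `String` is itself a collection and `.count` replaces the older `.characters.count`. Instead of a template string, this assumes the GUID is always in the canonical 36-character 8-4-4-4-12 form and uses a freshly generated `UUID` as the template; the constant names are illustrative.

```swift
import Foundation

// "spacer-" prefix as it appears in the page's <tr id="..."> attributes.
let spacerPrefix = "spacer-"

// Any GUID in canonical 8-4-4-4-12 form has the same length, so a freshly
// generated UUID serves as the template (36 characters).
let guidLength = UUID().uuidString.count

// Replaces the magic number 43 from the question.
let spacerFragmentLength = spacerPrefix.count + guidLength
print(spacerFragmentLength)  // 43
```

In the question's third line, `html.index(index, offsetBy: spacerFragmentLength)` would then take the place of `offsetBy: 43`.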

Martin R · Accepted Answer · 2019-08-01 20:50:15Z

I don't know how stable the precise HTML structure of those pages is; could it change in the future? Using an HTML parsing library might be a more robust approach.

Some remarks concerning the Swift implementation:

- Don't force-unwrap optionals. If one of the searched strings is not found, your program will terminate with a runtime error. Use optional binding with `if let` or `guard let` instead, and handle the failure case properly.
- Instead of converting `revisionGUID` to lowercase, you can do a case-insensitive search.
- The first search string can be created with string interpolation instead of concatenation, which makes the expression slightly shorter:

  ```
  #"onclick="StackExchange.revisions.toggle('\#(revisionGUID)')""#
  ```
- Use a regular expression with positive look-ahead and look-behind for the second search. That lets you find the precise range of the spacer without relying on a particular length.
- Put the code in a function, and add documentation.
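The interpolation and case-insensitive points are easy to trip over, because raw strings (`#"..."#`) change how interpolation is written. A small sketch of both; the one-line `html` value is a hypothetical stand-in for the page, not its real markup.

```swift
import Foundation

let revisionGUID = "8F9AB85F-1401-41E9-8F75-8A07B10BAD32"  // upper case, as the API returns it

// Inside a raw string a plain \( is literal text; interpolation needs \#( ... ).
let notInterpolated = #"\(revisionGUID)"#
print(notInterpolated)  // \(revisionGUID)

let needle = #"onclick="StackExchange.revisions.toggle('\#(revisionGUID)')""#
print(needle)

// Hypothetical stand-in for the page text, which uses lower-case GUIDs.
let html = #"<a onclick="StackExchange.revisions.toggle('8f9ab85f-1401-41e9-8f75-8a07b10bad32')">"#

// .caseInsensitive lets the upper-case needle match the lower-case page text,
// so no lowercased() call is needed.
let found = html.range(of: needle, options: .caseInsensitive) != nil
print(found)  // true
```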

Putting it together, the function could look like this:

```
import Foundation

/// Find spacer fragment for GUID on revisions page
/// - Parameter html: HTML of a revisions page
/// - Parameter revisionGUID: A revision GUID from the Stack Exchange API
/// - Returns: The spacer fragment, or `nil` if not found
func findFragment(html: String, revisionGUID: String) -> String? {
    let pattern1 = #"onclick="StackExchange.revisions.toggle('\#(revisionGUID)')""#
    guard let range1 = html.range(of: pattern1, options: .caseInsensitive) else {
        return nil
    }
    let pattern2 = #"(?<=<tr id=")[^"]+(?=")"#
    guard let range2 = html.range(of: pattern2,
                                  options: [.backwards, .regularExpression],
                                  range: html.startIndex..<range1.lowerBound) else {
        return nil
    }

    return String(html[range2])
}
```
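The function above combines `.backwards` with `.regularExpression`. Foundation's documentation for `.regularExpression` states that only `.caseInsensitive` and `.anchored` can be combined with it, so it may be safer not to rely on `.backwards` being honored on every platform. A sketch that sidesteps the question by collecting all matches with `NSRegularExpression` and keeping the last one; the name `findFragmentLastMatch` is hypothetical, and `sampleHTML` is a trimmed, made-up stand-in for a revisions page, not the real markup.

```swift
import Foundation

/// Hypothetical variant of findFragment(html:revisionGUID:) that keeps the
/// last `<tr id="...">` before the revision instead of searching backwards.
func findFragmentLastMatch(html: String, revisionGUID: String) -> String? {
    let toggle = #"onclick="StackExchange.revisions.toggle('\#(revisionGUID)')""#
    guard let toggleRange = html.range(of: toggle, options: .caseInsensitive) else {
        return nil
    }
    // Same look-behind/look-ahead pattern as in the answer.
    guard let regex = try? NSRegularExpression(pattern: #"(?<=<tr id=")[^"]+(?=")"#) else {
        return nil
    }
    // Only search the part of the page before the revision's toggle link.
    let searchRange = NSRange(html.startIndex..<toggleRange.lowerBound, in: html)
    guard let lastMatch = regex.matches(in: html, range: searchRange).last,
          let idRange = Range(lastMatch.range, in: html) else {
        return nil
    }
    return String(html[idRange])
}

// Made-up, heavily trimmed stand-in for a revisions page.
let sampleHTML = """
<tr id="spacer-11111111-2222-3333-4444-555555555555"></tr>
<tr><td onclick="StackExchange.revisions.toggle('aaaaaaaa-0000-0000-0000-000000000000')"></td></tr>
<tr id="spacer-9617187a-fe48-4212-9a1a-f3a366e62736"></tr>
<tr><td onclick="StackExchange.revisions.toggle('8f9ab85f-1401-41e9-8f75-8a07b10bad32')"></td></tr>
"""

let fragment = findFragmentLastMatch(html: sampleHTML,
                                     revisionGUID: "8F9AB85F-1401-41E9-8F75-8A07B10BAD32")
print(fragment ?? "not found")
```

Collecting every match costs a full scan of the prefix, which is negligible for a page of this size and keeps the result independent of how the backwards option is treated.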
