
Let's say http://www.foo.bar/?domain=wibble returns a search page for the word "wibble" from an engine whose front-door address is www.foo.bar, and you have a variable called searchvar containing the term you want to search for.

You can assign the resulting search page to variable searchresult like this:

searchresult = Import[StringJoin[{"http://www.foo.bar/?domain=", searchvar}], "HTML"];

And then you can look in searchresult for whatever it is that you are trying to find.
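A minimal sketch of that Import-then-search step, assuming the hypothetical www.foo.bar engine from above. searchURL and pageContainsQ are illustrative names, not built-ins. URLBuild is used instead of StringJoin so that search terms containing spaces or "&" are percent-encoded rather than breaking the query string.

```mathematica
(* Hypothetical front-door address taken from the example above *)
searchURL[term_String] := URLBuild["http://www.foo.bar/", {"domain" -> term}]

(* True if the plain text of the imported page contains phrase.
   Import only sees text present in the HTML the server sends;
   text that a browser adds later with JavaScript is not included. *)
pageContainsQ[term_String, phrase_String] :=
 With[{text = Import[searchURL[term], "HTML"]},
  StringQ[text] && StringContainsQ[text, phrase]]
```

The StringQ guard makes a failed download (Import returning $Failed) count as "phrase not found" instead of producing an error.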

I am trying to do this to find whether a given string has been registered as a Twitter username.

When you browse to, say, https://twitter.com/aosododof, you get a page that contains the words "This account doesn't exist", and then you can go ahead and register that name if you wish.

However, although the words "This account doesn't exist" appear on my screen, I do not see them in the page source, so they are not returned by Import[].

So my question is: how can I use Import[] to grab the text from a page such as https://twitter.com/aosododof that includes the words "doesn't exist" so that I can then log that name as available?

I have a few thousand that I would like to look up, so this needs to be automated.

1 Answer

When possible, avoid parsing websites to get information. As you have discovered, this can lead to many issues (such as content not being visible in a "dumb" download), and it is very prone to breaking if anything about the website changes. Instead, many websites offer APIs through which other services can communicate with them. These APIs are well-defined, stable ways of getting information into and out of those websites. Twitter publishes its API documentation on its developer site.

In this particular case, Mathematica already has API access built into ServiceConnect; see the documentation for the Twitter service connection:

twitter = ServiceConnect["Twitter"]
(* ServiceObject["Twitter", "ID" -> "connection-XXXXXXXXXXXX"] *)

twitter["UserData", "Username" -> "WolframResearch"]
(* {<|"ID" -> 22659609, "ScreenName" -> "WolframResearch", 
  "Name" -> "Wolfram", "Location" -> Missing["NotAvailable"], 
  "FavouritesCount" -> 3455, "FollowersCount" -> 54999, 
  "FriendsCount" -> 371, "TweetsCount" -> 8230, 
  "CreationDate" -> 
   DateObject[{2009, 3, 3, 18, 56, 15}, "Instant", "Gregorian", 0.]|>} *)

Asking for a non-existent user gives an error:

twitter["UserData", "Username" -> "WolframResearch2"]
(* ServiceExecute::nouser: No user matches for specified terms. *)
(* ServiceObject["Twitter", "ID" -> "connection-XXXXXXXXXXXX"]["UserData", 
 "Username" -> "WolframResearch2"] *)

We can now pack it up into a function:

twitterUserExistsQ[user_] := 
 Quiet@Check[twitter["UserData", "Username" -> user]; True, False]

twitterUserExistsQ["WolframResearch"]
(* True *)

twitterUserExistsQ["WolframResearch2"]
(* False *)
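To run a few thousand lookups, as the question asks, the predicate can be mapped over a list of names. This is a sketch: availableNames is a hypothetical helper, and the one-second default delay is an assumed conservative spacing between calls, not a documented Twitter rate limit. The predicate is passed in as an argument so the batching logic can be checked without a live connection.

```mathematica
(* Hypothetical helper: returns the names for which existsQ gives False.
   delay is the pause in seconds before each lookup (assumed value). *)
availableNames[existsQ_, names_List, delay_ : 1] :=
 Keys@Select[
   (* Association of name -> True/False, in the order of names *)
   AssociationMap[(Pause[delay]; existsQ[#]) &, names],
   Not]
```

Usage: availableNames[twitterUserExistsQ, {"WolframResearch", "WolframResearch2"}] should give {"WolframResearch2"}, given the outputs shown above. Duplicate names in the input list are looked up only once, because they become a single association key.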
  • Beautiful! Many thanks for this. Is there a way to make it count a Twitter username as existing not only when the account is active but also when it has been suspended, e.g. a name such as "boiddy"?
    – tell
    Commented Aug 11, 2022 at 8:35
  • Checking on "Friends" rather than on "Username" seems to do the job, because suspended status, which counts as existence for "Username", appears to count as non-existence for "Friends". (I am still testing this and therefore not wholly certain.)
    – tell
    Commented Aug 11, 2022 at 13:02
