
Objective

I need a reliable way to check in Python whether a domain under any TLD has been registered or is available. Reliability and support for any TLD are the key points I'm struggling with.

What I tried

  1. WHOIS is the obvious way to do the check, and an existing Python library, the popular python-whois, was my first try. The problem is that it doesn't seem to be able to retrieve information for some TLDs, e.g. .run, while it works mostly fine for older ones, e.g. .com.
  2. So if python-whois is not reliable, maybe a wrapper around the Linux whois command would be better. I tried the whois library, but unfortunately it supports only a limited set of TLDs, apparently to make sure it can always parse the results.
  3. As I don't really need to parse the results, I ripped the code out of the whois library and tried to do the query by calling the Linux whois command myself:

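    import subprocess

    # Run the system whois client, capturing stdout and stderr together
    p = subprocess.Popen(['whois', 'example.com'], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    r = p.communicate()[0]
    print(r.decode(errors='replace'))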
    

    That works much better, except that it isn't fully reliable either. I tried one particular domain and got "Your connection limit exceeded. Please slow down and try again later." It's not me who is exceeding the limit, though: being behind a single IP address in a large office means somebody else might hit the limit before I make a query.

  4. Another thought was to skip WHOIS and do a DNS lookup instead. However, I need to deal with domains that are registered, or in the protected phase after expiry, yet have no DNS records, so this apparently isn't an option.
  5. My last idea was to do the queries via a third-party service's API. The problem is trust: such services might snatch an available domain that I check.
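To illustrate why point 4 falls short: a DNS lookup can only ever give a one-sided answer. The sketch below is a minimal, hypothetical helper (`dns_hint` is a name invented here) built on Python's standard `socket.getaddrinfo`. Note that `getaddrinfo` only looks up address records, not NS records, so it is an even weaker signal than a proper NS query would be.

```python
import socket


def dns_hint(domain, resolve=socket.getaddrinfo):
    """Return 'has_dns' if the name resolves to an address, else 'unknown'.

    A failed lookup never proves availability: a registered domain may have
    no DNS records at all (for example while in a post-expiry phase).
    The resolver is injectable so the logic can be tested offline.
    """
    try:
        resolve(domain, None)  # address lookup only (A/AAAA), not NS
    except socket.gaierror:
        return "unknown"  # not "available": absence of DNS proves nothing
    return "has_dns"
```

Usage would be `dns_hint("example.com")`; the only conclusion worth drawing from `"has_dns"` is that the name is very likely registered, and `"unknown"` still has to be resolved by some other means such as WHOIS.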

Similar questions

There are already similar questions:

...but they either deal only with a limited set of TLDs or are not that bothered by reliability.

  • That's actually a really well-written question. You don't get many like that, haha. Commented Jan 2, 2018 at 15:00
  • This project on GitHub does something similar: github.com/WeiChiaChang/domain-cli/blob/master/bin/index.js Commented Jan 2, 2018 at 15:04
  • @danielcooperxyz Thanks! I see that it uses api.domainsdb.info for the check, so this falls under point #5 in my question.
    – tmt
    Commented Jan 2, 2018 at 15:10
  • I'm late, but may I suggest github.com/funilrys/PyFunceble? It tries to check availability based on WHOIS, DNS lookups (not only NS), HTTP status codes, or its own set of rules. It's not the fastest, but it may be useful. Disclaimer: I'm the author.
    – funilrys
    Commented Mar 9, 2021 at 16:22
  • @funilrys Interesting! I would suggest adding your own answer to this question, provided you're willing to go into a good level of detail and describe which problems the package addresses and which it doesn't (the lookup limits in my point #3 would be my guess), etc. All in all, thanks for sharing!
    – tmt
    Commented Mar 9, 2021 at 18:21

1 Answer


If you do not have specific access (like being a registrar), and if you do not target a specific TLD (as some TLDs do have a specific public service called domain availability), the only tool that makes sense is to query whois servers.

You then face at least the following two problems:

  1. using the appropriate whois server for the given domain name
  2. taking into account that whois servers are rate-limited: if you bulk-query them carelessly, you will first hit delays and then risk having your IP address blacklisted for some time.
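For the first problem, one common approach (assumed here; the answer's linked replies go into more detail) is to ask IANA's whois server, whois.iana.org, about the TLD itself and follow the `refer:` line in its reply to the registry's own whois server. The helpers below (`tld_of` and `find_referral` are hypothetical names) only cover the parsing step; how you send the query is up to you.

```python
import re


def tld_of(domain):
    """Return the lower-cased last label of a domain name ('Example.RUN.' -> 'run')."""
    return domain.rstrip(".").rsplit(".", 1)[-1].lower()


def find_referral(iana_response):
    """Return the server named on the first 'refer:' line of an IANA reply, or None."""
    match = re.search(r"^refer:[ \t]*(\S+)", iana_response,
                      re.IGNORECASE | re.MULTILINE)
    return match.group(1) if match else None
```

The flow would be: query whois.iana.org with `tld_of(domain)`, pass the reply to `find_referral`, then send the full domain name to the server it returns. A `None` result means the reply named no whois server, which for some TLDs is exactly the situation where a generic tool cannot help.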

For the second point, the usual methods apply (handling delays on your side, using multiple endpoints, etc.).
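Handling delays on your side can be as simple as retrying with exponential backoff whenever the server answers with a rate-limit message. This is a minimal sketch, assuming you supply your own `query` function and `is_rate_limited` check (both hypothetical parameters); the delay values are illustrative and not figures published by any registry.

```python
import time


def query_with_backoff(query, domain, is_rate_limited,
                       max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call query(domain), retrying with growing pauses while rate-limited."""
    delay = base_delay
    for attempt in range(max_attempts):
        response = query(domain)
        if not is_rate_limited(response):
            return response
        if attempt < max_attempts - 1:
            sleep(delay)  # wait before the next attempt
            delay *= 2    # double the pause each time
    raise RuntimeError(f"{domain}: still rate-limited after {max_attempts} attempts")
```

Backoff only helps with your own traffic. As the question notes, a shared office IP can be throttled by other people's queries, which is why spreading queries over several endpoints is the other half of the advice.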

For the first point, another of my replies, https://unix.stackexchange.com/a/407030/211833, explains what you observe depending on which whois wrapper you use, along with some countermeasures. See also my other reply at https://webmasters.stackexchange.com/a/111639/75842, specifically point 2.

Note that depending on your specific requirements, and if you can relax at least some of them, other solutions may be available. For example, for gTLDs, if you can tolerate a 24-hour delay, you can use the zone files published by registries to find registered domain names (only those published in the DNS, so not all of them).
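A rough sketch of the zone-file approach, assuming a gTLD zone file downloaded in advance (for example through ICANN's Centralized Zone Data Service) with one fully qualified record per line in the form `owner ttl class type data`. The hypothetical `registered_names` helper collects every name under the TLD that has NS records. A name missing from the result is not necessarily available, for the reason given above: only names published in the DNS appear.

```python
def registered_names(zone_lines, tld):
    """Collect the names under `tld` that carry NS records in a zone file."""
    suffix = "." + tld.strip(".").lower() + "."
    names = set()
    for raw in zone_lines:
        line = raw.split(";", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        if len(fields) < 5:  # expect: owner ttl class type data...
            continue
        owner, rtype = fields[0].lower(), fields[3].lower()
        # Skip the TLD apex itself ("run.") and every non-NS record
        if rtype == "ns" and owner.endswith(suffix):
            names.add(owner.rstrip("."))
    return names
```

For a large zone you would stream the file (e.g. `with open(path) as f: names = registered_names(f, "run")`) and then test membership with `"example.run" in names`.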

Also, while you are right that, in a general sense, using a third party has its weaknesses, if you find a trustworthy registrar that has access to many registries and provides an API, you could use it for your needs.

In short, I do not believe you can achieve this in all cases (100% reliability, 100% of TLDs, etc.). You will need some compromises, and which ones depend on your initial needs.

Also, very important: do not shell out to run a whois command, as this creates many security and performance problems. Use the appropriate libraries in your programming language to do whois queries, or simply open a TCP socket to port 43, send your query on a single line terminated by CR+LF, and read back a blob of text. That is basically all that RFC 3912 defines.
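The raw protocol really is that small. Below is a minimal sketch of an RFC 3912 client using only Python's standard `socket` module. `whois_query` is a hypothetical name, and the UTF-8 encoding is an assumption, since the RFC does not fix a character set (internationalized names would normally be sent in their ASCII/punycode form).

```python
import socket


def whois_query(server, query, port=43, timeout=10.0):
    """Send one whois query (RFC 3912) and return the server's full reply as text."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        # The request is a single line terminated by CR+LF
        sock.sendall(query.encode("utf-8") + b"\r\n")
        chunks = []
        # The server closes the connection once it has sent its answer
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    # Replies are free text of unspecified encoding; replace undecodable bytes
    return b"".join(chunks).decode("utf-8", errors="replace")
```

For example, `print(whois_query("whois.iana.org", "run"))` asks IANA about the .run TLD. No subprocess is spawned, so there is no shell quoting to get wrong, and the timeout keeps a silent server from hanging your program.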

  • Thank you for your answer and the longer ones that you linked. Very informative! If nothing else, they help me confirm the issues and steer my effort in a particular direction.
    – tmt
    Commented Jan 2, 2018 at 15:36
  • So you're saying that querying any whois server and then parsing its text result is not a safe action?
    – garry man
    Commented Dec 20, 2018 at 10:53
  • @garryman No, I don't think I am saying that. I am saying it is difficult (because the output is free text, so you have to adapt to many different formats, and it is also heavily rate-limited), and that if you do it from a program, you should use a library rather than spawning a shell to run a whois command. Commented Dec 20, 2018 at 14:36
  • I need a library that parses WHOIS in PHP. Since I haven't found a working one, I am currently relying on checkdnsrr to determine whether the domain is available, because that's what I am looking for. Am I on the right track?
    – garry man
    Commented Dec 20, 2018 at 14:38
  • @garryman If you have new questions, please post them separately as a new question. But in short, as its name implies, checkdnsrr does a DNS query, so it has nothing to do with whois, and you will get false negatives (a domain can be registered but not published in the DNS). Depending on which resolvers you ask, you may also get false positives (if your recursive nameserver hijacks NXDOMAIN replies). Commented Dec 20, 2018 at 14:45
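Because whois answers are free text, turning them into "available" or "registered" means matching per-registry wording, as the comments point out. This is a minimal sketch in which every pattern is an assumption to check against the servers you actually query: "No match for" is the phrasing Verisign's .com/.net server uses for unregistered names, while the other entries are illustrative placeholders. `classify_whois` is a hypothetical name.

```python
import re

# Assumed wording only; extend per registry after checking real responses
RATE_LIMITED = [re.compile(r"limit exceeded", re.IGNORECASE)]
NOT_FOUND = [
    re.compile(r"^No match for", re.IGNORECASE | re.MULTILINE),  # e.g. Verisign
    re.compile(r"^NOT FOUND", re.IGNORECASE | re.MULTILINE),        # placeholder
    re.compile(r"^Domain not found", re.IGNORECASE | re.MULTILINE), # placeholder
]
REGISTERED = [re.compile(r"^[ \t]*Domain Name:[ \t]*\S", re.IGNORECASE | re.MULTILINE)]


def classify_whois(text):
    """Map a raw whois reply to 'rate_limited', 'available', 'registered' or 'unknown'."""
    if any(p.search(text) for p in RATE_LIMITED):
        return "rate_limited"
    if any(p.search(text) for p in NOT_FOUND):
        return "available"
    if any(p.search(text) for p in REGISTERED):
        return "registered"
    # Unfamiliar wording or an empty reply: needs a new pattern or a human
    return "unknown"
```

Keeping an explicit `"unknown"` result, rather than defaulting to one side, is what keeps the inevitable gaps in the pattern table from turning into silent false answers.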
