1

In iptables I am logging certain public IP addresses... say of websites visited from different workstations, depending on destination port number... All this is not really relevant to the question but I am just stating it here so my goals become clear.

Now I want to analyze the data... I only have the IP addresses... What is the best way to get the domain name? OK, I know you can use nslookup and dig, but the name returned is whatever the reverse DNS (PTR) record for the IP says, which usually is not THE domain name that one is looking for...

I am a little fuzzy about the details... but here is an example of what I need: someone visits cnbc.com, I look at the IP addresses logged, and I get all kinds of domains, from Amazon Web Services to facebook.com. The closest domain for which an IP was logged was nbcuni.com...

Is there some "service," API, software, third-party solution, available to get the "closest" recognizable domain name for a given IP?

EDIT: There is another problem, one that monitoring systems appear to handle. Proxy systems (as recommended below) cannot distinguish between the URL typed into the browser and the URLs of content embedded in the page visited. Or can they? Every URL, whether expressly entered in the browser or fetched indirectly because its content is displayed in the page, will show up as a URL visited. Is there a way to distinguish them, through proxy logs or otherwise?

3 Answers

7

I'm not sure I get the whole picture, but since you are writing about websites, I think you are using a tool that is not exactly suitable for the task.

IMHO you are looking for that information (the visited domain name) at the wrong level: you should have a proxy and analyze its logs to gather that info.

A proxy is 'near' the client and has exactly the information you are looking for.

A transparent proxy would be able to gather this information without any client configuration change.
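
As a rough illustration of what "analyze its logs" could look like, here is a minimal Python sketch that pulls the client address and requested hostname out of proxy log lines. It assumes Squid's default native access.log layout (timestamp, elapsed, client, result/status, bytes, method, URL, ...); the sample line and the helper name `host_from_squid_line` are made up for the example, and a customized log format would need different field positions.

```python
from urllib.parse import urlsplit


def host_from_squid_line(line):
    """Return (client_ip, hostname) from one access.log line, or None.

    Assumes Squid's default native log format, where field 2 is the client
    address, field 5 the request method and field 6 the requested URL.
    """
    fields = line.split()
    if len(fields) < 7:
        return None  # not a line in the assumed format
    client, method, url = fields[2], fields[5], fields[6]
    if method == "CONNECT":
        # HTTPS tunnels are logged as "host:port" rather than a full URL.
        host = url.rsplit(":", 1)[0]
    else:
        host = urlsplit(url).hostname
    return (client, host) if host else None


# Hypothetical sample line, not taken from a real log.
sample = ("1430167380.123    245 192.168.1.10 TCP_MISS/200 5120 "
          "GET http://www.cnbc.com/ - HIER_DIRECT/203.0.113.7 text/html")
print(host_from_squid_line(sample))
```

Counting the returned hostnames per client would then give a per-workstation list of sites, which is the information the IP-only log cannot provide.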

8
  • 2
    You're reading my thoughts: "proxy, proxy, proxy, ..."
    – Marki
    Commented Apr 27, 2015 at 20:43
  • I was not aware of transparent proxies. Yes, I was looking at the wrong "level" (as @scurius commented). I have now realized that there is yet another problem which I think a proxy will not handle. If a user specifies site a.b.c, with contents like ads from other sites, is it possible from the proxy logs to know the main URL specified? For example, if a user specifies a techie site and there is content from a dating site, both URLs will show in the proxy log. Is there a way to know which URL was expressly stated? How do monitoring systems do this?
    – Sunny
    Commented Apr 28, 2015 at 3:58
  • 1
    @Samir I can't get a grasp on your goal. Could you describe the whole system in a little detail? At first it was to get the domain name from the IP address and now there is request correlation. The more detail you give, the better the answer you get...
    – Paolo
    Commented Apr 28, 2015 at 5:37
  • @Paulo, what I really need is a system which logs all activities, at least HTTP requests to begin with. It has a dual purpose: prevent abuse of company resources and keep a log for security reasons. The iptables platform exists already, so I thought I could use the logs to get the source and destination IP addresses. I thought that reverse-DNS-ing that info would be the solution to my problem. Then I saw the log flooded with multiple requests (embedded HTTP content) for each page and realized there were other problems beyond dig not returning the precise URL entered. Hope this helps.
    – Sunny
    Commented Apr 28, 2015 at 6:21
  • 1
    @Samir Look for the referer. The proxy 'squid' is able to log info about the referer, so I suppose other products are able to log that information too; it will require some grunt work, but with that info you could achieve something similar to what you are describing. Your goal is still not clear, because this level of control is beyond simple access logging.
    – Paolo
    Commented Apr 28, 2015 at 9:25
5

You cannot easily determine what the user typed into their browser's URL bar using just an IP address log: You can't tell if someone accessing 104.16.13.13 got there by typing aviation.stackexchange.com or tex.stackexchange.com (the best you can determine is that it's a CloudFlare IP address).

In order to get the information you seek you would need to either cross-reference with queries on your DNS server around the same time, or capture the whole packet and look for something in the protocol data (like an HTTP request) that discloses the hostname. The latter is trivial to foil: Just access sites over https or some other encrypted transport.
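
The DNS cross-reference idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout `(timestamp, client_ip, queried_name, answer_ip)`, the function name `name_for_connection` and the `max_age` window are all hypothetical, and the sample data simply reuses the two hostnames from the example above.

```python
def name_for_connection(dns_log, client, dst_ip, conn_time, max_age=3600):
    """Guess which hostname a connection was for.

    Picks the most recent DNS answer that the same client received for
    dst_ip, no later than conn_time and no older than max_age seconds.
    Returns None when no matching query is found.
    """
    best = None
    for ts, q_client, name, answer_ip in dns_log:
        if q_client != client or answer_ip != dst_ip:
            continue
        if ts > conn_time or conn_time - ts > max_age:
            continue
        if best is None or ts > best[0]:
            best = (ts, name)
    return best[1] if best else None


# Hypothetical DNS query log: (timestamp, client, name, resolved address).
dns_log = [
    (100.0, "10.0.0.5", "aviation.stackexchange.com", "104.16.13.13"),
    (160.0, "10.0.0.6", "tex.stackexchange.com", "104.16.13.13"),
]

print(name_for_connection(dns_log, "10.0.0.5", "104.16.13.13", 101.2))
print(name_for_connection(dns_log, "10.0.0.6", "104.16.13.13", 161.0))
```

The same destination address maps to different names depending on which client asked, which is exactly what an IP-only log cannot express. A lookup answered from the client's cache falls outside the window and comes back as `None`, so the window size is a trade-off rather than a fixed value.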


Given an IP address the best you can do is get the reverse DNS PTR record (dig -x or equivalent), or the netblock & netblock owner info (via whois), which you've already rejected as inadequate for your needs.
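
For reference, a Python equivalent of `dig -x` might look like the sketch below. `reverse_name` only builds the name that a PTR query asks for, while `ptr_lookup` performs the actual lookup through the system resolver; both function names are invented for this example, and the result of a live lookup depends entirely on what the address owner has published.

```python
import ipaddress
import socket


def reverse_name(ip):
    """Return the DNS name that a PTR query for this address uses."""
    return ipaddress.ip_address(ip).reverse_pointer


def ptr_lookup(ip):
    """Return the PTR hostname for ip, or None if there is no usable answer."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
        return hostname
    except OSError:  # covers socket.herror and socket.gaierror
        return None


print(reverse_name("104.16.13.13"))  # 13.13.16.104.in-addr.arpa
```

Calling `ptr_lookup` on a logged address returns whatever single name the PTR record holds, which for shared hosting or CDN addresses is usually the provider's name rather than the site visited.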

5
  • Your suggestion about analyzing DNS traffic is certainly helpful. I think I can log DNS packets in iptables and examine them for the host name for which the DNS query was issued. I wonder how systems that report the websites employees visit work. How are these systems able to get the domain names visited (not from information on the client machine itself but from the firewall or iptables machine)?
    – Sunny
    Commented Apr 27, 2015 at 18:37
  • 3
    @Samir Note that DNS correlation is not without limitations too (only an initial request will show up, so if a site has a 12-hour TTL or something and is in the machine's cache you may have to go back quite a ways to find the DNS query that matches the corresponding request). The web-logging software can employ a number of techniques, but deep packet inspection (actually looking at the content of each packet) is probably the most reliable. (Reverse DNS lookups are also generally reliable, though not bulletproof as I described above. They're often "Good Enough" for what you need.)
    – voretaq7
    Commented Apr 27, 2015 at 18:53
  • 1
    The TLS SNI info is visible in cleartext, so this isn't limited to HTTP.
    Commented Apr 28, 2015 at 1:09
  • @voretaq7, I think that my approach is wrong... As I found out, if someone visits site a.b.c with content (ads, for example) from sites z.y, k.l.m, etc., the iptables log has multiple entries, one for each connection initiated. Even if deep packet inspection of the DNS request packets is done, it will still not resolve this problem. In theory, yes, the first DNS request can be categorized as the main domain specified, but in practice what is "first" would be difficult to define. I suspect that the same problem will arise with a transparent proxy. Your comments are certainly very helpful.
    – Sunny
    Commented Apr 28, 2015 at 3:49
  • 1
    @Samir Yes, that's another failing of trying to use an IP log to determine "what happened": all it can tell you is who talked to whom, not what they talked about. You can't really get what you want using IP addresses alone. Like sciurus said, you're ultimately examining the wrong layer (proxy systems get the information you want by examining the application-layer content rather than the network/transport layers, and they're pretty good at it too).
    – voretaq7
    Commented Apr 28, 2015 at 18:04
5

You are tackling this problem at the wrong layer. Literally, layer 4 when you should use layer 7.

Don't log TCP connections in iptables. Instead, capture HTTP traffic and inspect the Host header in the requests that the clients are making.
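
To make "inspect the Host header" concrete, here is a small Python sketch that extracts it from the raw bytes of a captured plain-HTTP request. How the payload is captured is left out, and the function name `host_from_http_request` and the sample request are made up for illustration.

```python
def host_from_http_request(payload):
    """Return the Host header value from raw HTTP request bytes, or None."""
    # Only the header section (before the blank line) is of interest.
    head = payload.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")
    # lines[0] is the request line ("GET / HTTP/1.1"); headers follow it.
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("ascii", errors="replace")
    return None


# Hypothetical captured request.
request = (b"GET /markets HTTP/1.1\r\n"
           b"Host: www.cnbc.com\r\n"
           b"User-Agent: example\r\n"
           b"\r\n")
print(host_from_http_request(request))  # www.cnbc.com
```

The value is returned as sent, so it may include a port such as `example.com:8080`. For encrypted traffic the payload is not a readable HTTP request and the function simply returns `None`.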

2
  • Yes... you are right... the transparent proxy described above would fall in the category of solutions you suggest. Your comment is certainly helpful.
    – Sunny
    Commented Apr 28, 2015 at 3:27
  • This is the correct answer here. You're trying to identify the make and model of the car by looking at the road 10 minutes after it drove along it.
    – fukawi2
    Commented Apr 28, 2015 at 4:54
