I have large traffic capture files that I'm trying to analyze statistically, to check whether a user clicks on links on specific sites.

It is important to say that my packets are sorted by flows (IP1 <=> IP2).

My first idea was to look in the packets' content and search for hrefs and links, save them all in some kind of data structure together with their timestamps, and then iterate over the packets again to search for requests made close to the time the links appeared.

Something like in the following pseudo code:

for each packet in each flow:
      search the payload for "href", "http://" or "https://"
      save each link found together with its timestamp
for each packet in each flow:
      if it is an HTTP request, its URL matches a URL in the list,
         and its time is close enough to the link's timestamp: record it
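
For concreteness, here is a minimal Python sketch of the two passes above. It is only an illustration under assumptions: `Packet` is a hypothetical, already-parsed record (a real tool would fill it from a pcap parser), the regular expression only catches absolute URLs inside `href` attributes, and the 30-second `max_delay` window is an arbitrary choice.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    """Hypothetical parsed packet; not tied to any capture library."""
    timestamp: float                   # capture time in seconds
    payload: str                       # decoded HTTP payload, may be empty
    request_url: Optional[str] = None  # set only when the packet is an HTTP request


# Assumption: only absolute http(s) URLs inside href attributes are collected.
LINK_RE = re.compile(r'href=["\'](https?://[^"\']+)["\']')


def collect_links(flows):
    """First pass: map every link found in a payload to the times it appeared."""
    seen = {}
    for flow in flows:
        for pkt in flow:
            for url in LINK_RE.findall(pkt.payload):
                seen.setdefault(url, []).append(pkt.timestamp)
    return seen


def find_clicks(flows, seen, max_delay=30.0):
    """Second pass: requests whose URL was seen as a link at most max_delay seconds earlier."""
    clicks = []
    for flow in flows:
        for pkt in flow:
            if pkt.request_url is None:
                continue
            for link_time in seen.get(pkt.request_url, []):
                if 0 <= pkt.timestamp - link_time <= max_delay:
                    clicks.append((pkt.request_url, link_time, pkt.timestamp))
                    break  # one match per request is enough
    return clicks
```

Each flow is assumed to be a list of `Packet` objects, so `find_clicks(flows, collect_links(flows))` returns `(url, link_time, request_time)` tuples.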

The problem with this code is that some (important) links are dynamically generated while the page is loading, and cannot be found using the above method.

Another idea was to check the Referer field in the HTTP header and look for requests that were referred by the relevant sites. This method generates a lot of false positives because of frames and embedded objects.
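
A rough Python sketch of the Referer check, to show what I mean. The request records (dicts with `url` and `headers` keys) and the function name are hypothetical. The `Accept` filter is only a heuristic I am assuming here: it drops requests for images, scripts and stylesheets, but frames also ask for `text/html`, so they would still be counted.

```python
from urllib.parse import urlsplit


def referred_page_requests(requests, target_hosts):
    """Return URLs of requests referred by one of target_hosts that look like page loads."""
    hits = []
    for req in requests:
        # Header names are case-insensitive, so normalize them first.
        headers = {name.lower(): value for name, value in req["headers"].items()}
        referer = headers.get("referer")
        if not referer:
            continue
        if urlsplit(referer).hostname not in target_hosts:
            continue
        # Heuristic (assumption): keep only requests that ask for HTML first,
        # which filters out most embedded objects but not frames.
        if not headers.get("accept", "").startswith("text/html"):
            continue
        hits.append(req["url"])
    return hits
```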

It is important to mention that this is not my server, and my intention is to make a tool for statistical analysis of users' behavior (thus, I can't add some kind of click tracker to the site).

Does anyone have an idea of what I can do to check whether users clicked on links, based on their network traffic?
Any help will be appreciated!
Thank you

  • You are using the wrong tool for the job. Why not (at least) use the server logs? Additionally if the links are dynamically generated you won't be able to get that information without either doing the same generation (it has to follow a pattern?) or some good guessing. After all it's likely that there is only a number of ways to access a specific site.
    – Seth
    Commented May 18, 2017 at 9:21
  • Thank you very much for your reply! I think you are right and my point of view is wrong here. My intention is to determine the users' behavior only from their traffic, and according to my assumptions I will not have access to the server itself (in the long run, I would want to check the link-clicking behavior for a few specific websites, so I can't use server-side programs). I actually saw that there are clicks whose URLs do not appear in the packets in their full form (but as concatenations of variables). Do you have any other idea of what I can do to check it?
    – kobibo
    Commented May 18, 2017 at 9:40
