4th Parties: Uninvited Guests to Your Site

When we speak of “third party” content on the web, we mean content created and controlled by a third party. A lot has been written on how third parties affect performance: here are Harry Roberts’ talks on the subject.  Recently, Patrick Hulce did an incredible analysis of which 3rd party scripts have the most impact on page load.  According to the HTTP Archive, of 184.5M requests on 1.2M pages, 74.8M (40%) are requests for content from 3rd parties! The median mobile site accesses 12 3rd party domains, with a median of 37 different requests (or about 3 requests made to each third party)!

I recently gave a talk on performance, and while speaking afterwards with Paul Hammant, he reminded me that these third parties can themselves call “fourth party” content, and that there is nothing stopping further calls to 5th parties, 6th parties, etc.  This matters for performance: what happens if one of these services goes down, and does it affect your 3rd party?  It also raises a second question: how are these external parties (that you are not affiliated with) handling sensitive information about your customers and how they use your product?

Initial Search for 4th Parties

Initially I thought that perhaps 4th party calls were not terribly prevalent, but a quick hunt shows that they happen everywhere.  For example, here is the request map for cnn.com, showing the chain of each request to different domains:

[Image: request map for cnn.com]

The blue sunburst on the left is CNN, and all of the other fireworks are 3rd party domains making calls to 4th party domains.  For example, let’s look at the purple firework farthest to the right:
[Image: close-up of the request map around ads.pubmatic.com]

It may be a little difficult to read, but this host is ads.pubmatic.com.  If we look carefully, the arrow pointing to pubmatic is green, and comes from s.amazon-adsystem.com, making pubmatic a 4th party call already.  From pubmatic, we can see calls to other pubmatic domains at the end of the trails (at the bottom right), but we can also see calls to digitru.st and ad.turn.com: these are 5th party calls!

So, wow!  4th and 5th party calls happen on cnn.com. I wonder how many sites have 4th, 5th, or 6th party calls?

[Image: request map for spin.com]

This is not the grand finale of a fireworks show; it is the request map of spin.com.

Digging Deeper into 4th Parties

The HTTP Archive has a table of every request made to 1.2M mobile websites (I am using December 15, 2018 data), and this includes the referer URL, which lets us trace the chain through which each request was made.  So, we can track down these calls that go beyond third parties.  But to get there, we first need all of the third party requests:

Third Party Requests

I first created a table of all 3rd party requests with a referer URL. There are 74.8M third party requests in this table, from 1.14M sites (roughly 95% of the 1.2M sites in the dataset).  We can then use this table to find all of the 3rd party requests that are directly accessed from the page.  Again, using the CNN example, we find a list like this:

[Image: list of 3rd party domains requested directly by cnn.com]

Looking at this list, #4 is cnn.io – which is technically a 2nd party. Out of 50 domains from CNN, there are two second party domains (there is a turner domain as well), so perhaps the 3rd party numbers are inflated by about 5% per site.  In a bulk analysis, I’ll take that!

If I restrict the 3rd party requests to only those directly referred by the website domain, we lose a lot of websites whose requests do not include the referer header.  Only 487k sites (42% of the 1.14M with 3rd party domains), with 30.7M 3rd party requests, appear in this query. (Here is the search I used).
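To make the filter concrete, here is a minimal Python sketch of the same idea on a tiny in-memory request log. It is not the search used for the numbers in this post: the `site_of` helper, the "last two hostname labels" rule, and the sample URLs are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Naive site key: the last two labels of the hostname.

    Assumption for illustration only; suffixes such as .co.uk need the
    Public Suffix List to be handled correctly.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def direct_third_party(page_url, requests):
    """Return request URLs on another site whose referer is on the page's own site."""
    page_site = site_of(page_url)
    direct = []
    for url, referer in requests:
        if not referer:
            continue  # no referer header: the request cannot be attributed
        if site_of(url) != page_site and site_of(referer) == page_site:
            direct.append(url)
    return direct


# Hypothetical request log for one page: (request URL, referer URL or None).
sample = [
    ("https://www.cnn.com/main.css", "https://www.cnn.com/"),
    ("https://s.amazon-adsystem.com/tag.js", "https://www.cnn.com/"),
    ("https://cdn.cnn.io/app.js", "https://www.cnn.com/"),
    ("https://ads.pubmatic.com/ad.js", "https://s.amazon-adsystem.com/tag.js"),
    ("https://static.example.org/font.woff", None),
]

print(direct_third_party("https://www.cnn.com/", sample))
```

Note that a purely domain-based rule like this one counts cnn.io as a third party of cnn.com, which is the same "2nd party" inflation described above, and that any request without a referer simply drops out.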

Fourth Party Requests

With a list of the domains from the third party requests, any request on the page that is referred by one of those third parties is a fourth party request.  Using this search, I find 10.9M fourth party requests from 2.4M different domains. There are 330,000 pages (27.5% of the HTTP Archive) with 4th party requests.
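The same referer-following step can be repeated to label every site on a page with a party level. The Python sketch below is one way to do that under stated assumptions: sites are compared with a naive "last two hostname labels" rule, a request to the same site as its referer does not add a level, and each site gets the lowest level at which it is reached. The domains follow the CNN example above, but the URL paths are made up.

```python
from urllib.parse import urlsplit


def site_of(url):
    """Naive site key (last two hostname labels); an assumption, not a PSL lookup."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def party_levels(page_url, requests):
    """Map each non-first-party site to the lowest party level at which it appears."""
    page_site = site_of(page_url)
    # Build site-to-site edges: referer site -> requested site.
    edges = {}
    for url, referer in requests:
        if not referer:
            continue
        src, dst = site_of(referer), site_of(url)
        if src != dst and dst != page_site:
            edges.setdefault(src, set()).add(dst)

    # Breadth-first walk from the page: direct referrals are 3rd parties,
    # their referrals are 4th parties, and so on.
    levels = {}
    frontier = [page_site]
    level = 3
    while frontier:
        next_frontier = []
        for src in frontier:
            for dst in sorted(edges.get(src, ())):
                if dst not in levels:
                    levels[dst] = level
                    next_frontier.append(dst)
        frontier = next_frontier
        level += 1
    return levels


# Hypothetical request log: (request URL, referer URL).
chain = [
    ("https://s.amazon-adsystem.com/a.js", "https://www.cnn.com/"),
    ("https://ads.pubmatic.com/b.js", "https://s.amazon-adsystem.com/a.js"),
    ("https://image2.pubmatic.com/c.gif", "https://ads.pubmatic.com/b.js"),
    ("https://cdn.digitru.st/d.js", "https://ads.pubmatic.com/b.js"),
    ("https://ad.turn.com/e", "https://ads.pubmatic.com/b.js"),
]

for site, level in sorted(party_levels("https://www.cnn.com/", chain).items()):
    print(site, level)
```

Because the walk keeps only the first level at which a site is seen, it counts sites rather than paths, so it does not suffer from the multiple-path inflation that shows up when counting requests.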

 

Fifth Party Requests

It really is “turtles all the way down”, and we can continue tracking ‘nth party’ requests forever. I continue to 5th parties, and find that 92k sites (7.6% of the dataset) have fifth party requests. When I attempt to count the number of requests, there are 29M requests that count as 5th party. That’s nearly three times the 10.9M we found for 4th parties, on far fewer sites... so what is going on?  In the CNN example we’ve been looking at, there are 4 paths into s.amazon-adsystem.com:

[Image: the four request paths into s.amazon-adsystem.com]

Each of these paths is unique, and all four paths have a further path to ib.adnxs.com (and to all the other domains that are requested by amazon-adsystem.com).  So, the number of paths in this recursive tree will grow exponentially as I continue down the tree.
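A small Python sketch shows why counting by path inflates the totals. The graph below is made up: only s.amazon-adsystem.com and ib.adnxs.com come from the example, and the `*.example` hosts are placeholders. It also assumes the referral graph has no cycles.

```python
from functools import lru_cache


def count_paths(edges, root):
    """Count distinct referral paths from root to every node of an acyclic graph."""
    # Invert the graph so each node knows which nodes refer to it.
    parents = {}
    for src, dsts in edges.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)

    @lru_cache(maxsize=None)
    def paths_to(node):
        if node == root:
            return 1
        # Every path into a parent extends to one path into this node.
        return sum(paths_to(parent) for parent in parents.get(node, []))

    nodes = set(edges) | {dst for dsts in edges.values() for dst in dsts}
    return {node: paths_to(node) for node in nodes}


# Hypothetical referral graph: four 3rd parties all call the same 4th party.
edges = {
    "page": ["a.example", "b.example", "c.example", "d.example"],
    "a.example": ["s.amazon-adsystem.com"],
    "b.example": ["s.amazon-adsystem.com"],
    "c.example": ["s.amazon-adsystem.com"],
    "d.example": ["s.amazon-adsystem.com"],
    "s.amazon-adsystem.com": ["ib.adnxs.com", "sync.example"],
    "ib.adnxs.com": ["pixel.example"],
    "sync.example": ["pixel.example"],
}

counts = count_paths(edges, "page")
for host in ["s.amazon-adsystem.com", "ib.adnxs.com", "pixel.example"]:
    print(host, counts[host])
```

In this toy graph a single 6th party host is reached by eight distinct paths, even though only one request to it may ever be made. Counting distinct hosts or sites per level avoids the blow-up.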

As I move further and further out – the number of multiple paths continues to increase, and the number of domains with ‘x’th party requests continues to decrease.  I went as far as 8th party before I grew tired of the exercise:

[Image: results for each party level, out to 8th party]

We can only imagine that this could continue on forever.

What Can We Learn From this Data?

Are sites with ‘x’th party calls slower?  In general, these calls appear to be non-render-blocking, as the SpeedIndex of sites with 5th and 8th party calls is about the same as that of other sites:

[Image: SpeedIndex comparison chart]

However, since many of the 3rd party files are ads, we do see a significant change in Visually Complete, with sites that have 5th and 8th party request chains being significantly slower (7 seconds at the median):

[Image: Visually Complete comparison chart]

Are Specific 3rd Parties to Blame?

What leads to these long chains of ‘x’th party content?  Are there specific tools and 3rd parties that can be blamed for this behaviour?  I think that yes, some 3rd parties are more likely to use 4th/5th/etc. party content than others.  By comparing all 3rd parties with the third parties that lead to 8th party requests, I was able to find a few domains that appear to lead to more 3rd party requests than others.

As a caveat, it is difficult to parse these behaviours under bulk conditions, and of course, ad/analytics providers change their methods frequently (and is it correlation or causation?).  For that reason, I’m not going to name names, but I can chart the percentile of 3rd party requests when a specific 3rd party is present:

[Image: percentile chart of 3rd party requests when a specific 3rd party is present]

While sites that have Doubleclick present as a third party do see a small increase in 3rd party calls (compared to all sites), the presence of the other 3rd parties leads to a huge increase in the number of 3rd party requests.

Conclusion

3rd parties provide essential tooling for your site, like ads and analytics.  However, it is worthwhile to audit your 3rd parties to ensure that they are not calling 4th, 5th, or ‘x’th parties that are not only aggregating information about your site and your customers, but could also be adding a performance impact to your site.