An Update on FLEDGE Chrome Testing

Published in Criteo R&D Blog, Dec 15, 2022

In this article, we walk you through some findings from our very first experience testing FLEDGE.

Criteo has participated in testing and providing feedback since the earliest stages of the Privacy Sandbox design. We are actively engaged in multiple W3C standardization groups, including WICG and PATCG, on the FLEDGE, Topics and Reporting API proposals. As a collaborator on the Google Privacy Sandbox design, Criteo proposed SPARROW. Some of its ideas were incorporated into FLEDGE, the first experiment for bidding on marketer-defined interest audiences.

FLEDGE has been in testing as a Chrome Origin Trial since April 2022. We believe that testing the proposals in Google’s Privacy Sandbox is a good way to understand their implications for advertising utility, and our hope is to drive those principles forward.

Criteo is also in a unique position to assess the Privacy Sandbox proposals because of our global exposure as an ad tech provider. As a DSP, we are directly integrated with more than 20k active partners worldwide. We also handle 300 billion bid requests a day, serving ads to roughly 725 million users.

TL;DR

Criteo started implementing the FLEDGE API in April and is currently tagging 5% of Chrome traffic.
We are currently bidding and generating displays on a small share of U.S. traffic and we are working on stabilizing the pipeline to increase these volumes.
We are still awaiting some key features to be deployed, such as complete reporting capabilities and an A/B test framework to fully quantify the impact on marketer effectiveness and publisher revenues for FLEDGE in 2023.

The FLEDGE API

First, let’s start with a high-level overview of the FLEDGE API.

In a nutshell, FLEDGE allows a marketer to assign visitors to its website to specific audiences called interest groups. For example, a retailer website could tag all the users that visited a certain category of products and create an interest group scoped only to users interested in that category. FLEDGE then enables marketers to bid on programmatic opportunities to re-engage this and other interest groups when the users visit publisher websites. Concretely, when a user visits a publisher, the FLEDGE bidding logic automatically calls all interest groups that have been assigned to that user, and each interest group returns a bid and a creative advertisement. Finally, the interest group with the highest bid displays its selected advertisement to the user.
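The flow described above can be sketched as a toy simulation. This is not the browser API: `InterestGroup`, `Bid`, and `run_auction` are hypothetical names introduced for illustration, the bidding functions are stand-ins for real bidding logic, and the sketch follows the simplified "highest bid wins" rule from the description above while ignoring any seller-side scoring.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Bid:
    """What one interest group returns for an ad opportunity."""
    amount: float      # bid value, in an arbitrary currency unit
    creative_url: str  # ad to render if this bid wins


@dataclass(frozen=True)
class InterestGroup:
    """Hypothetical stand-in for an interest group stored in the browser."""
    owner: str  # the marketer that created the group
    name: str   # e.g. a product category the user visited
    generate_bid: Callable[[str], Optional[Bid]]  # publisher domain -> bid


def run_auction(
    groups: List[InterestGroup], publisher: str
) -> Optional[Tuple[InterestGroup, Bid]]:
    """Ask every interest group for a bid and return the highest one."""
    candidates = []
    for group in groups:
        bid = group.generate_bid(publisher)
        if bid is not None and bid.amount > 0:
            candidates.append((group, bid))
    if not candidates:
        return None  # nobody bid, so no FLEDGE ad is shown
    return max(candidates, key=lambda pair: pair[1].amount)


# Two hypothetical interest groups the user joined on a retailer's site.
shoes = InterestGroup(
    owner="retailer.example",
    name="running-shoes",
    generate_bid=lambda publisher: Bid(1.20, "https://retailer.example/ads/shoes"),
)
tents = InterestGroup(
    owner="retailer.example",
    name="camping-tents",
    generate_bid=lambda publisher: Bid(0.80, "https://retailer.example/ads/tents"),
)

winner = run_auction([shoes, tents], publisher="news.example")
if winner is not None:
    group, bid = winner
    print(f"{group.name} wins with a bid of {bid.amount:.2f}: {bid.creative_url}")
```

In this sketch the bid is computed per interest group and per publisher only, which mirrors the idea that the bidding logic does not see a cross-site user profile.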

To further understand the details, please look at the specifications and the API documentation here.

Origin Trial Traffic

Interest Group Tagging

Figure 2 shows the interest group tagging traffic we have observed since early June 2022.

The Origin Trial was first activated on 50% of all Chrome Beta users back in May and was then gradually ramped up, first to 1% of Chrome stable users and now to 5%. Currently, we observe close to 80 million tagged interest groups per day on 14,000 of our partners’ websites.

FLEDGE Bid Request

In terms of bidding, below is the total traffic we’ve received over time in the U.S., which for the moment is the only country where we are testing. An extension beyond the U.S. is currently blocked because FLEDGE does not yet support multiple currencies (for more info see here).

The bidding traffic is provided by our SSP partner Google Ad Manager, hence it is not proportional to the number of users in FLEDGE’s Origin Trial shown in Figure 2. We must be conservative when ramping up bidding traffic. Even 1% of Chrome traffic, which represents over 20 million creative displays per day, could have a significant impact. Any unresolved or undetected bugs may put considerable amounts of publisher and advertiser revenue at risk.

In the first phase, between May and August, we set up the bidding pipeline with a non-rendering experiment, meaning all the bidding logic was executed but the final ads were not rendered. Nevertheless, we were still receiving FLEDGE reporting events which allowed us to make some first assessments, such as computing Criteo’s win rate compared to the existing Criteo bidder, checking reporting capabilities, correctly applying the campaign and invalid traffic filters, and adjusting our bid level to the new constraints.

We are now in the next phase, announced by Google Ad Manager at the beginning of September, which allows the rendering of FLEDGE ads. Since the beginning of October, as seen in Figure 3, we have been receiving around 10 million daily FLEDGE contextual bid requests in the U.S. The received traffic is lower than the tagging traffic because we have set a daily budget cap of 100 USD so that we can fully stabilize the bidding pipeline and creative rendering before scaling up any further. Once we are confident in our metrics, we want to scale up U.S. bidding traffic to the full Origin Trial and draft a first analysis.
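To make the effect of such a cap concrete, here is a minimal sketch of a daily budget guard. It is an assumption about how a cap could be enforced, not a description of Criteo's or Google Ad Manager's implementation; `DailyBudgetCap` and the per-display cost used below are hypothetical. Amounts are kept in integer cents to avoid floating-point drift.

```python
class DailyBudgetCap:
    """Tracks one day's spend and refuses bids once the cap would be exceeded."""

    def __init__(self, cap_cents: int) -> None:
        self.cap_cents = cap_cents
        self.spent_cents = 0

    def can_bid(self, max_cost_cents: int) -> bool:
        """True if paying max_cost_cents would still keep spend within the cap."""
        return self.spent_cents + max_cost_cents <= self.cap_cents

    def record_win(self, cost_cents: int) -> None:
        """Register the cost of a won display."""
        self.spent_cents += cost_cents

    def reset(self) -> None:
        """Start a new day with zero spend."""
        self.spent_cents = 0


# 100 USD cap, as in the text; the 250-cent cost per won display is made up.
cap = DailyBudgetCap(cap_cents=100 * 100)
displays = 0
for _ in range(1000):          # 1000 hypothetical opportunities in one day
    if not cap.can_bid(250):
        continue               # cap reached: skip the remaining opportunities
    cap.record_win(250)
    displays += 1

print(f"displays: {displays}, spent: {cap.spent_cents / 100:.2f} USD")
```

With a guard like this, most opportunities go unanswered once the cap is hit, which is one way a low cap keeps bidding volume well below tagging volume.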

Awaited Features

In summary, we can already create interest groups, bid for those interest groups, and measure displays. However, our FLEDGE testing is far from complete, and some key features are still missing from this first test:

we are still able to use event-level reporting instead of aggregated reporting
we are still able to use our own servers for bidding and creative rendering, instead of a trusted server
rendering is still possible in an iframe, as fenced frames are not yet available in their final state
no K-anonymity enforcement
limited functionality of daily update URL

The absence of those features does not allow us to fully experiment with FLEDGE. All technologies need to be available for testing FLEDGE’s business impacts on the advertising landscape, in particular quantifying the impact on marketer effectiveness and publisher revenues. Nevertheless, this does not stop us from properly evaluating the design and collecting our first data points.

When each of those technologies becomes available, we must be careful not to roll it out prematurely. Fenced frames and trusted servers will severely impact the way we can debug our pipeline and find problems. Once the full design is validated and we have a stable implementation, we can move forward with evaluating the utility of FLEDGE.

Now, let us discuss in more detail some limitations that need to be addressed for Criteo to properly evaluate the utility and impact of FLEDGE.

Complete Reporting Capabilities Are Essential

One limitation is the lack of reporting capabilities. For the moment, reporting is activated through the event level reportWin API, which is supposed to be a temporary mechanism. This API allows ad tech players to observe displays together with information on the value of bids, the winning creative, and some contextual information about the publisher. Yet, the current API misses some essential reporting features. For instance, it is not possible to retrieve information regarding the features of the interest group. You might ask yourself, “why are these not captured by the event level API?” Adding interest group feature reporting is actually against the FLEDGE principles because it would allow other organizations to link user data from the advertiser website to contextual information from the publisher website, and therefore track a user’s browsing behavior across these two events.

Because only displays can be observed with the event level reportWin API for now, Google proposed some new event level APIs, such as

the event level Fenced Frame Ads Reporting API that would allow reporting on clicks,
integration with the event level Attribution Reporting API that would allow reporting on conversions.

However, no official guidelines have been communicated on how to use those APIs in a consistent way and which use cases they are meant to address. For example, it seems unlikely that the noise level present in the Attribution Reporting API would be satisfactory for handling billing, where exact measures are required.
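The billing concern can be illustrated with a small sketch. It uses generic additive Laplace noise as a stand-in and does not reproduce the mechanism actually used by the Attribution Reporting API; the noise scale, the conversion count, and the price per conversion are all made-up values.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def billed_amount(conversions: float, price_per_conversion: float) -> float:
    """Amount invoiced when billing is based on a conversion count."""
    return conversions * price_per_conversion


rng = random.Random(2022)  # fixed seed so the sketch is reproducible
exact_conversions = 40     # hypothetical true count for one small campaign
noise_scale = 10.0         # hypothetical; not a parameter of any real API
noisy_conversions = exact_conversions + laplace_noise(noise_scale, rng)

exact_bill = billed_amount(exact_conversions, 5.0)
noisy_bill = billed_amount(noisy_conversions, 5.0)
print(f"exact bill: {exact_bill:.2f}, bill from noisy count: {noisy_bill:.2f}")
```

Under these assumptions the expected absolute error on the count equals the noise scale, so the relative error shrinks for large campaigns but can be substantial for small ones, which is the situation where an invoice based on the noisy count would be hardest to justify.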

Because we are not able to access reporting on user features within the interest groups we bid on, we cannot effectively use machine learning to automatically optimize ad performance. Currently, a large part of Criteo’s campaign performance comes from optimizing ads using user signals collected on advertiser websites, such as purchasing power and user context (e.g., whether the user is a previous buyer).

Reporting is also needed for technical monitoring. Currently, there is no easy way to collect feedback on the auction itself, such as the time spent in bidding functions or errors that occurred directly in the Chrome browser.

Fortunately, things are moving forward. The long-term vision of FLEDGE is to use aggregated reporting, which could serve both needs: machine learning on user signals and technical monitoring. An aggregation API would allow companies to retrieve aggregated metrics at a cohort level from a trusted server. It would therefore enable the collection of contextual signals like day and domain, together with the advertiser signals previously mentioned, without making it possible to track a user’s browsing behavior across advertiser and publisher websites.
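As a rough sketch of what "aggregated metrics at a cohort level" could look like, the snippet below counts displays per (day, publisher domain, advertiser signal) cohort and drops cohorts that are too small. It is an illustration only: `DisplayEvent`, `aggregate_by_cohort`, and the `min_cohort_size` threshold are hypothetical, and the real proposals rely on a trusted server and other protections that are not modeled here.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class DisplayEvent(NamedTuple):
    """One display, described only by coarse signals (no user identifier)."""
    day: str
    publisher_domain: str
    advertiser_signal: str  # e.g. "previous-buyer"


def aggregate_by_cohort(
    events: Iterable[DisplayEvent], min_cohort_size: int
) -> Dict[DisplayEvent, int]:
    """Count displays per cohort and keep only cohorts that are large enough."""
    counts = Counter(events)
    return {cohort: n for cohort, n in counts.items() if n >= min_cohort_size}


events = [
    DisplayEvent("2022-12-01", "news.example", "previous-buyer"),
    DisplayEvent("2022-12-01", "news.example", "previous-buyer"),
    DisplayEvent("2022-12-01", "news.example", "previous-buyer"),
    DisplayEvent("2022-12-01", "blog.example", "new-visitor"),
]

report = aggregate_by_cohort(events, min_cohort_size=2)
for cohort, displays in report.items():
    print(cohort.day, cohort.publisher_domain, cohort.advertiser_signal, displays)
```

The single-display cohort is suppressed, so the report exposes the joint distribution of contextual and advertiser signals without exposing any individual display.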

For conversion events, Google recently proposed a way to integrate the aggregated Attribution Reporting API into FLEDGE. For wins, lost bids, and clicks, Google proposes the Extended Private Aggregation API. We would welcome the convergence of both APIs’ surfaces in terms of usability.

One challenge that still needs to be solved is how to automatically train ML models on aggregated data. We started the discussion by introducing the Privacy-preserving ML Challenge at AdKDD’21 on Criteo’s data and received many interesting submissions. This use case was discussed in the PATCG W3C standardization group in May 2022 but more research is still needed before getting any practical implementation for ML-based optimization with aggregated data.

Missing A/B Test Framework

As highlighted in this GitHub issue submitted by our teams, in the current test framework, FLEDGE bids will compete against traditional bids relying on 3rd-party cookies.

While having access to 3rd-party cookies is good for debugging, it also means that simulating the real impact of the Privacy Sandbox on publishers’ revenue, advertisers’ ROAS, or machine learning models’ performance will prove difficult.

As the current bidders use more information about the user (for example, by combining multi-advertiser signals), their bids would probably be better. Therefore, we could expect that:

FLEDGE bids will lose against existing bidders for better opportunities
FLEDGE will win opportunities of lower quality

It would be optimal to have an isolated environment where 3rd-party cookies are not available anymore, so we can compare to the behavior of the current ad tech ecosystem in a scenario as similar as possible to that of a full deprecation of third-party cookies.

This could be achieved with an A/B test that includes a treatment and control group. Chrome could assign the users randomly to a certain group and keep the user in the same group for the duration of the experiment.

For users in the control group, Chrome would keep 3rd-party cookies and remove the FLEDGE API; for users in the treatment group, Chrome would remove all 3rd-party cookies and keep the FLEDGE API.
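One common way to get a random but stable split is to hash a stable identifier together with an experiment name. The sketch below shows that idea; it is an assumption about how such a split could work, not a description of anything Chrome does or has announced, and `assign_group`, `browser_features`, and the experiment name are hypothetical.

```python
import hashlib


def assign_group(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'treatment' or 'control'.

    The same (user_id, experiment) pair always lands in the same group,
    so a user stays put for the whole duration of the experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return "treatment" if bucket < round(treatment_share * 10_000) else "control"


def browser_features(group: str) -> dict:
    """Feature set per group, following the split described above."""
    if group == "treatment":
        return {"third_party_cookies": False, "fledge_api": True}
    return {"third_party_cookies": True, "fledge_api": False}


group = assign_group("user-123", experiment="fledge-ab-test")
print(group, browser_features(group))
```

Because the assignment depends only on the hash, it needs no stored state, and changing the experiment name reshuffles users independently of any earlier experiment.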

Those groups should be made available with the reporting APIs along with the associated metrics. We could then compare those metrics to assess FLEDGE’s true impact on the current ad ecosystem.

What’s Next?

Currently, the FLEDGE tests are planned to run until April 2023. However, Google has already extended the deadline in the past, and this is very likely to happen again. Now that the technical pipeline is functional, we will have the opportunity to work on its stabilization and compare basic metrics to our current Criteo pipeline.

Until then, we hope more players, especially SSPs and DSPs, but also publishers and advertisers, will engage in the FLEDGE tests to cover all use cases and make the experiment fully representative.

If you need any advice or want to partner with us for testing FLEDGE, contact us at privacy-sandbox-testing@criteo.com.

Competition in a real auction with multiple players will be crucial for measuring relevant publisher revenues and comparing them to the existing bid streams. We are looking forward to contributing to the implementation of the necessary changes for the next steps of testing (like currency management, A/B testing and aggregated reporting) and quantifying the impact on marketer effectiveness and publisher revenues for FLEDGE in 2023.