Enumeration-Resistant Labeling Ideas #2581
Replies: 1 comment
-
Don't have much top-of-mind to contribute to this, but very interested in thinking and options around this!

A bit of our previous thinking has been that we want users (i.e., end clients) to be able to validate/confirm the individual labels which result in an action/behavior. Related to this, we likely want to be able to reliably query which labels (and thus behaviors/actions) would apply to a single specified piece of content.

For some labelers (like the Bluesky Moderation Service), enumerability is at least partially beneficial and allows auditing, accountability, reporting, etc. Even there it is not always beneficial; for example, we intentionally do not have an API or mechanism to query the AppView specifically for a list of records or blobs which have been taken down at the infra level. And for other labelers, enumeration might not be desirable at all.

In the original design of the label system we thought it would be cool to let label services gate access, enabling subscription models or more-private bespoke labeling. We didn't end up with a way to ensure this end-to-end: to maintain label secrecy, AppViews would have to be trusted parties, and we don't want them to be too trusted, because then it becomes difficult to start new independent AppViews without permission, which is one of our goals for ecosystem exit/governance.

The door is left open for folks to add friction around this though, with some small changes. Maybe PDS instances can have access to labels, and hydrate them into label lists (if we tweak Lexicons a bit, this could be app-agnostic, similar to blob extraction today). Or maybe we do allow labels to be limited to specific trusted AppViews (which won't provide open enumerability), and that friction could be enough. Another aspect we have discussed is locking label access to specific app-client implementations (e.g., via OAuth client identifiers; the branded client software, not any particular user).
I suspect that labels may end up like "the news": valuable today, much less valuable tomorrow. That doesn't cover the use-case of folks wanting to make labels less enumerable because they want to tamp down on drama/discourse though.
-
There are certain classes of moderation labels where the intention is to make the labeled content less visible, and yet certain individuals would want to do the opposite: treat the label as an endorsement and use it to gather matching content. From the perspective of the entity doing the labeling, this is likely undesirable.
In these cases, ideally, it would be expensive to enumerate the set of records with a particular classification, but still cheap to find out which labels apply to a particular record. I don't think it's possible to solve this problem outright, but maybe some friction could be added.
Maybe I'm overthinking this, and the solution is just to have labeling services be selective about who they allow to subscribe.
Or maybe it's just not a problem that needs solving at all (I'm not aware of it being an active issue).
I don't have any good ideas for a solution either, but here are some bad ones:
Have the label reference the hash of the target URI, rather than the URI itself. This is a non-solution because it's far too easy to maintain an index mapping hashes to URIs.
As above, but with a salt picked secretly at random from a finite set (say, integers between 0 and 100,000). If you wanted to build an index mapping hashes to URIs, your index would need to be 100,000x larger than before, hopefully making it infeasible in terms of storage. So to answer the question of "what labels apply to this post?", you'd have to compute 100,000 hashes "live" and look each of them up in an index of hashes (Bloom filters and the like can make this cheap-ish). That's cheap-ish to do once, but expensive for someone enumerating through an archive of ~every record. Unfortunately this is still pretty expensive for legitimate use cases, probably too expensive for an AppView to do for every record it wants to serve, even with the results cached for some time period. Maybe a smaller salt range N could be picked? Or maybe you could get creative and find a way to offload the compute onto the client? Or maybe a similar but different approach could be used to get a different trade-off?
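To make the salted-hash idea concrete, here's a rough Python sketch. Everything in it is an assumption for illustration, not anything from the atproto label spec: the function names, the use of SHA-256, the `salt:uri` encoding, and the small default `N_SALTS`. A plain Python `set` stands in for the Bloom-filter-style index.

```python
import hashlib
import secrets

# Hypothetical salt range. The post suggests 100,000; a smaller value
# keeps this demo fast and corresponds to the "smaller N" trade-off.
N_SALTS = 1_000


def salted_hash(uri: str, salt: int) -> bytes:
    """Hash a target URI together with a salt (encoding is an assumption)."""
    return hashlib.sha256(f"{salt}:{uri}".encode("utf-8")).digest()


def publish_label(uri: str, n_salts: int = N_SALTS) -> bytes:
    """Labeler side: pick a secret random salt and emit only the hash."""
    salt = secrets.randbelow(n_salts)  # uniform in [0, n_salts)
    return salted_hash(uri, salt)


def is_labeled(uri: str, published: set[bytes], n_salts: int = N_SALTS) -> bool:
    """Consumer side: try every possible salt for this one URI.

    Costs n_salts hashes per lookup. Someone enumerating M archived
    records has to pay M * n_salts hashes (or store that many entries).
    """
    return any(salted_hash(uri, s) in published for s in range(n_salts))
```

Usage would look something like `published = {publish_label(uri)}` on the labeler side and `is_labeled(uri, published)` on the consumer side, where `uri` is any AT URI. Note the sketch does nothing about the legitimate-use cost problem described above; it just shows where the factor of N lands.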