Recently, I've become interested in distributed, peer-to-peer (p2p) file sharing, including BitTorrent (en.wikipedia.org/wiki/BitTorrent) and the InterPlanetary File System (IPFS, ipfs.io). This question specifically concerns IPFS.
IPFS is very complicated for non-technical people, so I'll try to break it down for the purpose of this discussion:
In IPFS, you can add data to the global file system, which produces a "hash" (a string uniquely identifying the data) that you can share with others. Data in IPFS is content-addressed, meaning that instead of using URLs or locations for data, it uses fingerprints of the data itself. This has several useful characteristics:
- anybody on the network with the same data can serve it to you, reducing the burden on a web server and eliminating the waste of bandwidth that is HTTP;
- data can remain available even after the original host fails, which makes archiving data much more reliable on IPFS;
- and there is no central component, like a DNS resolver, whose failure can prevent the system from operating (aside from, of course, normal internet infrastructure).
When sharing on IPFS, you aren't actually sending whole files to peers; instead, these files are broken apart into "blocks" of data which are retrieved separately (and often from different peers altogether). These blocks are re-assembled at the end to form the complete data.
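To make the block idea concrete, here is a minimal Python sketch of content addressing. It is a toy model, not the real IPFS implementation: the function names, the in-memory `store` dictionary, the tiny block size, and the use of a plain SHA-256 hex digest as the identifier are all my own assumptions for illustration (real IPFS uses larger blocks, CIDs, and a Merkle DAG).

```python
import hashlib


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_id(block: bytes) -> str:
    """Identify a block by a fingerprint of its content, not by its location."""
    return hashlib.sha256(block).hexdigest()


def add(data: bytes, store: dict[str, bytes], block_size: int = 4) -> list[str]:
    """Store every block under its hash and return the ordered list of hashes."""
    ids = []
    for block in split_into_blocks(data, block_size):
        bid = block_id(block)
        store[bid] = block  # identical blocks land under the same key
        ids.append(bid)
    return ids


def fetch(ids: list[str], store: dict[str, bytes]) -> bytes:
    """Re-assemble the original data from its blocks, looked up by hash."""
    return b"".join(store[bid] for bid in ids)


store: dict[str, bytes] = {}
ids = add(b"abcdabcdxy", store)
print(len(ids), "blocks referenced,", len(store), "blocks stored")
print(fetch(ids, store))
```

Note that the store ends up holding fewer blocks than the file references, because the repeated block `abcd` is stored only once. A node holding that one block has no way of telling, from the block alone, which file it was requested for.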
Now, suppose someone is sharing child pornography via IPFS. Under U.S. law, the mere possession of child pornography is illegal, so the sender is certainly acting illegally. The recipient of these files, after assembling the blocks to form the files, is now in possession of content that visually depicts minors engaging in sexual activity. However, would a hypothetical intermediary node that has cached a block needed to form the violating data be guilty of possession? The block alone cannot represent the visual activity that constitutes illegal material. In fact, the hash for the block matches any block with the same data (since the hash identifies the data, not a location), so it's entirely possible (if slightly improbable) that a block used in child pornography could also be a block used to represent part of a Disney movie. For example, take these two sentences, broken down into uniquely identifiable chunks of two characters:
I have two cats in my house.
I |ha|ve| t|wo| c|at|s |in| m|y |ho|us|e.
and
You should burn in Hell.
Yo|u |sh|ou|ld| b|ur|n |in| H|el|l.
You'll notice that both sentences have a common chunk, "in". Under U.S. law, would someone who possesses this block, "in", be responsible for its use in all contexts, such as in the second sentence?
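For anyone who wants to check the example, this small Python sketch chunks both sentences into two-character pieces, hashes each piece, and lists the pieces whose hashes appear in both. The helper names and the choice of SHA-256 are assumptions made for illustration only; they are not how IPFS itself chunks or names data.

```python
import hashlib


def chunk(text: str, size: int = 2) -> list[str]:
    """Split text into consecutive pieces of `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_hash(piece: str) -> str:
    """Fingerprint a piece by its content alone."""
    return hashlib.sha256(piece.encode("utf-8")).hexdigest()


first = "I have two cats in my house."
second = "You should burn in Hell."

# Map each hash back to the piece it identifies, per sentence.
first_ids = {chunk_hash(c): c for c in chunk(first)}
second_ids = {chunk_hash(c): c for c in chunk(second)}

# Hashes present in both sentences point at the very same block.
shared = sorted(first_ids[h] for h in first_ids.keys() & second_ids.keys())
print(shared)  # ['in']
```

The hash of "in" is identical in both cases, so a node that cached it for the first sentence would serve the same block to someone assembling the second.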
Obviously, in real data these chunks are between 256 kilobytes and 2 megabytes, so the chance of two unrelated files sharing a block is greatly reduced, but avoiding duplicate data is an intended feature of content addressing. My question is how that affects the legal treatment of such data.