In this answer, it is mentioned that facts about a copyrighted work, such as a list of how many times it uses each word, are not really derivative works because, as facts, they aren't subject to copyright. So preparing or publishing a list of the words used in, say, a book, along with their counts, does not violate the book's normal license.
I could also make a table of digraph frequencies, though: there are 10 instances of the most frequent word followed by the second most frequent word, 8 of the most frequent word followed by the third most frequent word, and so on. It seems likely that this table would also be just a fact about the book, not really subject to copyright.
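To make the first two kinds of table concrete, here is a rough Python sketch of how they might be computed. The function names (`word_counts`, `rank_of`, `digraph_table`), the whitespace tokenization, and the sample sentence are all my own assumptions for illustration; a real book would need more careful handling of punctuation and ties.

```python
from collections import Counter


def word_counts(text):
    """Count how many times each word appears (naive whitespace split)."""
    return Counter(text.lower().split())


def rank_of(counts):
    """Map each item to its frequency rank, where 1 is the most frequent.

    Ties are broken by first appearance, which is how
    Counter.most_common() orders items with equal counts.
    """
    return {item: rank
            for rank, (item, _) in enumerate(counts.most_common(), start=1)}


def digraph_table(text):
    """Count adjacent word pairs, keyed by the ranks of the two words."""
    words = text.lower().split()
    ranks = rank_of(Counter(words))
    pairs = Counter(zip(words, words[1:]))
    return Counter({(ranks[a], ranks[b]): c for (a, b), c in pairs.items()})


# Hypothetical sample text, standing in for a book.
sample = "the cat saw the dog and the cat ran"
print(word_counts(sample).most_common(2))  # [('the', 3), ('cat', 2)]
print(digraph_table(sample)[(1, 2)])       # 2: "the" followed by "cat"
```

Each entry of the second table reads as a statement like "the most frequent word is followed by the second most frequent word 2 times", which is the form of fact described above.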
And I could make a table of digraph frequencies on the digraphs themselves: the third most common digraph is followed twice by the eighth most common digraph, and so on. Kind of a weird fact to explain, but still a genuine fact. It's not an obviously fake fact, like a jillion-digit number that just so happens to reveal the book when stored in a file in binary and opened in Notepad.
But if I repeat this process with enough levels, I eventually have a collection of facts that can be true of only one possible book, and the collection taken together could (assuming I am right about the correctness of this exact system) be used to practically reconstruct the book.
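As a rough illustration of why enough such facts could pin down one text, here is a sketch that swaps in plain n-gram counts for the nested digraph tables described above. It does not show that my exact nesting scheme works; it only shows the analogous effect. The names `ngram_counts` and `reconstruct` are hypothetical, and the sketch assumes n is at least 2 and large enough that every run of n-1 words has only one possible continuation.

```python
from collections import Counter


def ngram_counts(words, n):
    """Count every run of n consecutive words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def reconstruct(counts):
    """Rebuild the word sequence from n-gram counts alone (n >= 2).

    Raises ValueError when the counts do not determine a single text
    under this simple scheme.
    """
    n = len(next(iter(counts)))
    successors = {}
    as_prefix, as_suffix = Counter(), Counter()
    for gram, c in counts.items():
        as_prefix[gram[:-1]] += c
        as_suffix[gram[1:]] += c
        successors.setdefault(gram[:-1], []).append(gram[-1])
    if any(len(nxt) > 1 for nxt in successors.values()):
        raise ValueError("some context has several continuations; use a larger n")
    # The opening context starts an n-gram once more often than it ends one.
    starts = [p for p in as_prefix if as_prefix[p] > as_suffix[p]]
    if len(starts) != 1:
        raise ValueError("cannot identify a unique starting context")
    words = list(starts[0])
    for _ in range(sum(counts.values())):
        context = tuple(words[-(n - 1):])
        words.append(successors[context][0])
    return words


# Hypothetical sample text, standing in for a book.
text = "it was the best of times it was the worst of times".split()
facts = ngram_counts(text, 5)
print(reconstruct(facts) == text)  # True
```

With n = 2 the same sample is ambiguous ("the" is followed by both "best" and "worst"), so `reconstruct` raises `ValueError`; the facts only become sufficient once the level is high enough.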
Am I actually allowed to publish all of these facts? Or do I have to publish only a subset of the facts, because the whole collection taken together actually is a copy of the book?
Can I hang up a sign reading "Facts sufficient to reconstruct $POPULAR_BOOK, 10¢ each"? Or am I obligated not to encourage or deliberately facilitate book reconstruction unless I have reason to believe the reconstructor would be doing it under a license or under fair use?