I'm developing open source software that (internally) uses font glyph identification. The glyphs I'd like to identify come from copyrighted fonts, for which I have personal use licenses.
By font glyph identification, I mean the following:
Let's say a user of my software has a font file.
A brief technical primer on fonts
Broadly, a font file is a table which says "draw the character a like this" for each character supported by the font. We'll call the character X the abstract idea of the letter X, and the glyph "X" a visual representation of that idea. These "draw" instructions are usually vector artwork (like SVG, which can be resized without the pixelation you get if you were to, say, enlarge a desktop background JPEG to twice its size). The instructions can be fed into some math which produces a rasterized version of the glyph, a format that can directly inform pixels on a screen or dots printed by a printer.
Usually, the glyph that a font file has for the character a actually draws an "a". But, in this case, the user's font file has been compressed. For example, let's say only the a and c characters are used. In the user's font, instead of saying:
- character a is drawn like "a"
- character b is drawn like (nothing)
- character c is drawn like "c"
They remove the b and move the c up one:
- character a is drawn like "a"
- character b is drawn like "c"
Given the letters "abba" to be drawn, the only way to recognize that this actually corresponds to (draws) the letters "acca" is to recognize that the b character in the user's font actually draws a "c". This recognition is what I mean by glyph identification.
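To make the "abba" to "acca" example concrete, here is a minimal sketch. It assumes each glyph can be reduced to some comparable description; the placeholder strings and the names `reference`, `user_font`, and `identify` are hypothetical and only illustrate the lookup, not how real outline data would be compared.

```python
# Toy model: each glyph is represented by an opaque "shape" value standing in
# for real outline data (an assumption made purely for illustration).
SHAPE_A = "shape-of-a"
SHAPE_C = "shape-of-c"

# Reference knowledge shipped with the software: which shape means which letter.
reference = {SHAPE_A: "a", SHAPE_C: "c"}

# The user's compressed font: the character b now draws the "c" glyph.
user_font = {"a": SHAPE_A, "b": SHAPE_C}


def identify(text, font, reference):
    """Map each character to the letter its glyph actually draws."""
    return "".join(reference[font[ch]] for ch in text)


decoded = identify("abba", user_font, reference)
print(decoded)  # acca
```

In practice the shapes would not match exactly, so the dictionary lookup would be replaced by some nearest-match comparison.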
To do this glyph identification, I understand that I can't just include copyrighted fonts in software that I plan to open source. I also don't need to include the entire fonts. I just need some information about how the glyphs are drawn to be able to do identification.
To this end, my plan is to derive information from each copyrighted glyph and only include this derived information in my software. The derivation would work as follows:
Take a glyph from a copyrighted font and draw it at a small size. Place dots around the edges of the drawn glyph and record their coordinates. These coordinates (in order) would be the derived information.
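A rough sketch of that derivation, under stated assumptions: the glyph has already been rasterized to a tiny bitmap (here a hand-drawn placeholder, not taken from any real font), an "edge" is any filled pixel touching an empty pixel or the bitmap border, and the points are recorded in row-major order. The function name `edge_points` and the ordering are illustrative choices; tracing the contour in order would be another option.

```python
def edge_points(bitmap):
    """Return (row, col) of filled pixels that touch an empty pixel or the border."""
    rows, cols = len(bitmap), len(bitmap[0])

    def filled(r, c):
        # Anything outside the bitmap counts as empty.
        return 0 <= r < rows and 0 <= c < cols and bitmap[r][c] == "#"

    points = []
    for r in range(rows):
        for c in range(cols):
            if not filled(r, c):
                continue
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(not filled(nr, nc) for nr, nc in neighbours):
                points.append((r, c))
    return points


# Hypothetical 5x5 rasterization of a solid square "glyph".
glyph = [
    ".....",
    ".###.",
    ".###.",
    ".###.",
    ".....",
]
points = edge_points(glyph)
print(points)
```

Only the coordinate list would be shipped with the software; the bitmap and the font it came from would not.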
What's (potentially) important to note is that you cannot derive the original font from these coordinates (it's a similar problem to scaling an SVG vs scaling a JPEG, except the pixelation with glyphs would make them unreadable, and likely indiscernible, at most sizes). The only thing that could potentially be derived is a rough rasterization of the glyphs rendered at that small size, but this would require significant technical effort (e.g. it's not as easy as opening an image or installing a font on your computer).
Would including such derived information be a violation of the copyright on the font? Does my method of derivation make my work a derivative work of the copyrighted fonts?
I believe there may be an argument for fair use here, as one of the factors for evaluating fair use says:
Effect of the use upon the potential market for or value of the copyrighted work: Here, courts review whether, and to what extent, the unlicensed use harms the existing or future market for the copyright owner’s original work. In assessing this factor, courts consider whether the use is hurting the current market for the original work (for example, by displacing sales of the original) and/or whether the use could cause substantial harm if it were to become widespread.
Because it is impossible to recover the original font from my derived information, this software would likely have no impact on sales of the original work (because the only way to obtain the copyrighted font would be to buy it).
Would this constitute fair use?
tl;dr I have an open source software program which needs to use points derived from copyrighted fonts to perform part of its function. The points need to be included with the software, but they can't be used to reproduce the copyrighted font file. Is this a violation of the fonts' copyright?
A quick aside:
This problem has some parallels to Optical Character Recognition (OCR), for which there are many open source libraries. These libraries include machine learning models (of which my software could be considered a gross simplification), which have been trained on text set in copyrighted fonts. The same idea holds: you couldn't reconstruct the copyrighted fonts from the open source OCR libraries.
Unfortunately, using an OCR library is not well suited for my specific problem.