99

Pretty much as the title says. I understand that rendering all of Unicode correctly is really hard, what with composite characters, characters that affect other characters, and ligatures. We have fonts that seem to be designed for maximum Unicode symbol support (Symbola, Code2001, others) and specialized fonts for certain planes or character ranges (BabelStone Han, others).

I don't know much about the underlying technical details of fonts. Is there a maximum size? Is it a copyright problem? Is essentially redrawing all ~110,000 extant glyphs too hard? I understand style concerns, but why not fall back to a 'default' font that has glyphs for everything? They're on unicode.org; redrawing them all would be pretty hard work, but then you'd have a guaranteed fallback font for everything. If you got the rights to some pre-existing fonts, you could just composite them, and that should help a lot. Such a font would be a great help to humanity and I can't see a good technical reason why it doesn't exist or at least an open-source effort to create it, so I presume an invisible-to-me reason why it can't be done.

What is that reason?

5
  • 1
    If you want your typeface to not look like an amateur effort then you need a specialist for each script. And Unicode has a lot of scripts. Commented Jan 12, 2016 at 1:44
  • 23
    Style wasn't something I was concerned about. I was thinking "well why not have a fallback font that had everything so you'd never see 'glyph not present' because that's not helpful" and an ugly glyph beats no glyph. As pointed out in Mike's answer there are technical reasons why a font collection is required, and very good open-source font collection efforts. Commented Jan 12, 2016 at 15:53
  • Weight limitations aside, style matching is why font families are preferred. Most CJK fonts have odd-looking Latin letters, deliberately drawn to be consistent with the CJK glyphs. When glyphs are not actively made to match, you get oddities: ⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾ⁿ is my pet peeve. Adobe Garamond Premier (pricey) has Latin and Greek, but it was not intended for chemists who use both simultaneously: "α-ketoglutarate" looks weird in many fonts. Commented May 16, 2016 at 7:10
  • 6
    I know this is older but I'm surprised by the "Is redrawing 110K glyphs (with metrics and kerning and combining attributes and hinting) too hard?" I used to do typography. A plain, unoriginal typeface with 255 straightforward latin-# oriented letters is at least a couple days of work; probably a couple weeks; couple months for truly good work. 110K is the equivalent of 400+ faces with much harder metrics and such. 15,000 hours of work or drastically more; so at least 7 or so years. So, kinda hard.
    – Ashley
    Commented Aug 17, 2018 at 20:48
  • 1
    Same with me: I just want to be able to see all Unicode characters! If the OpenType spec doesn't cut it, it must be extended!
    – cskwg
    Commented Nov 25, 2020 at 10:49

3 Answers

134
+50

"Why would you even want that?" questions aside, from a programming perspective there's a very simple reason: the OpenType spec only affords an addressable glyph index space of one USHORT, so one font can only support 16 bits' worth of glyph identifiers, or 65,536 glyphs max. (And note the terminology: a "glyph" is not the same as a "character" or "letter".)
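To make the 16-bit limit concrete, here is a minimal Python sketch. It assumes the glyph count is stored as a big-endian unsigned 16-bit field (the USHORT mentioned above). The helper names `pack_glyph_count` and `fits_in_one_font` are made up for illustration and are not part of any font library.

```python
import struct

UINT16_MAX = 0xFFFF  # largest value an unsigned 16-bit field can hold (65,535)


def pack_glyph_count(num_glyphs: int) -> bytes:
    """Pack a glyph count as a big-endian unsigned 16-bit integer.

    Hypothetical helper: it only illustrates the field width and does not
    write a real font table.
    """
    return struct.pack(">H", num_glyphs)


def fits_in_one_font(num_glyphs: int) -> bool:
    """Return True if the count can be represented in a 16-bit field."""
    try:
        pack_glyph_count(num_glyphs)
        return True
    except struct.error:
        # struct refuses values outside 0..65535 for the "H" format.
        return False


if __name__ == "__main__":
    for count in (255, UINT16_MAX, 120_737):
        print(count, fits_in_one_font(count))
```

Any count above what 16 bits can hold simply cannot be written into the field, which is the whole problem in one line.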

The current version of Unicode, v8 as of this answer, contains 120,737 assigned code points, or almost twice as many as fit in a modern font (2021 edit: v13 upped this number to 143,859). In fact, Unicode hasn't been able to fit in a modern OpenType font since 2001, with the release of Unicode 3.1, which upped the number of code points from 49,259 to 94,205.
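The arithmetic behind those numbers can be checked with a short Python sketch. It assumes exactly one glyph per code point, which is optimistic (real fonts need extra glyphs for ligatures and contextual forms), so the result is only a lower bound. The function name `min_fonts_needed` is an illustrative choice.

```python
import math

MAX_GLYPHS_PER_FONT = 65_536  # 16-bit glyph ID space, as stated in this answer

# Code point counts quoted in this answer, keyed by Unicode version.
UNICODE_COUNTS = {
    "3.0": 49_259,
    "3.1": 94_205,
    "8.0": 120_737,
    "13.0": 143_859,
}


def min_fonts_needed(code_points: int, per_font: int = MAX_GLYPHS_PER_FONT) -> int:
    """Lower bound on the number of fonts, assuming one glyph per code point."""
    return math.ceil(code_points / per_font)


if __name__ == "__main__":
    for version, count in UNICODE_COUNTS.items():
        fonts = min_fonts_needed(count)
        print(f"Unicode {version}: {count:,} code points -> at least {fonts} font(s)")
```

Under that assumption, Unicode 3.0 still fits in one font, 3.1 and 8.0 need at least two, and 13.0 needs at least three.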

"So what about font collections?" I hear you ask. Why not use multiple fonts and support all of Unicode that way? Well now, you've just described Adobe's Sans Pro and Google's Noto (which are the same font).
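The "multiple fonts" idea boils down to a fallback lookup, which can be sketched in Python as follows. Everything here is hypothetical: the font names and code point ranges are invented for illustration, and real engines read coverage from each font's character map and usually choose fonts per run of text rather than per character.

```python
from typing import List, Optional, Tuple

# Hypothetical font stack: (font name, range of code points it covers).
# These names and ranges are illustrative, not real font data.
FONT_STACK: List[Tuple[str, range]] = [
    ("ExampleLatin", range(0x0020, 0x0250)),
    ("ExampleGreek", range(0x0370, 0x0400)),
    ("ExampleCJK", range(0x4E00, 0xA000)),
]


def pick_font(char: str) -> Optional[str]:
    """Return the first font in the stack that covers the character, else None."""
    code_point = ord(char)
    for name, coverage in FONT_STACK:
        if code_point in coverage:
            return name
    return None  # nothing covers it: this is where "glyph not present" comes from


def assign_fonts(text: str) -> List[Tuple[str, Optional[str]]]:
    """Pair every character in the text with the font chosen for it."""
    return [(ch, pick_font(ch)) for ch in text]


if __name__ == "__main__":
    print(assign_fonts("aα字"))
```

No single entry in the stack needs more than 65,536 glyphs, yet together the fonts can cover far more than any one of them.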

As for the "how hard can it be": a uniform style for all glyphs in Unicode, across 129 established written scripts on this planet, each with its own typesetting rules? Incredibly hard. You may think fonts are just files with pictures for letters, and that when someone types a letter, that picture shows up: that is not how fonts work, and it isn't how fonts have worked since the late 1980s.

Modern fonts are the typographic equivalent of a game ROM: sure, it's not much use without the hardware or software to run that ROM on, but all the things that actually matter are in the ROM. Similarly, modern fonts contain all the information for typesetting. Not just pictures: they contain the metadata, the metrics, the positioning and substitution rules for arbitrary sequences, with separate rule sets for each written script that OpenType supports, mandatory and optional ligatures, language-specific character replacements for letters at the start/middle/final position in a word, or in isolation, character repositioning relative to arbitrarily complex sequences of other characters either before or after it, arbitrarily complex sequence replacements with other arbitrarily complex sequences, possible bitmap fallbacks for small-point rendering, hinting instructions on how to properly rasterize vector graphics that are inherently not aligned to any particular pixel grid, and more. A modern font is a ridiculously complex application that a font engine consults to figure out how to typeset sequences of code points.
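As a toy illustration of why glyphs are not characters, here is a Python sketch of a single ligature substitution rule set. It is a deliberately simplified model: the glyph names (`f_f_i` and so on) and the greedy longest-match strategy are assumptions made for the example, and real OpenType substitution tables are far more elaborate.

```python
from typing import Dict, List, Tuple

# Toy substitution table: a sequence of characters -> one ligature glyph name.
# The glyph names are hypothetical.
LIGATURES: Dict[Tuple[str, ...], str] = {
    ("f", "f", "i"): "f_f_i",
    ("f", "i"): "f_i",
    ("f", "l"): "f_l",
}
MAX_SEQUENCE = max(len(sequence) for sequence in LIGATURES)


def shape(text: str) -> List[str]:
    """Turn characters into glyph names, replacing known sequences with ligatures."""
    glyphs: List[str] = []
    i = 0
    while i < len(text):
        # Greedy match: try the longest candidate sequence first.
        for length in range(MAX_SEQUENCE, 1, -1):
            candidate = tuple(text[i:i + length])
            if candidate in LIGATURES:
                glyphs.append(LIGATURES[candidate])
                i += length
                break
        else:
            # No ligature starts here, so the character maps to its own glyph.
            glyphs.append(text[i])
            i += 1
    return glyphs


if __name__ == "__main__":
    print(shape("office"))
```

Six characters go in and four glyphs come out, so even this trivial rule set breaks the "one character, one picture" assumption.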

Making a (set of) Unicode-encompassing font(s) that looks good for all contexts is a vast team effort.

So, "Why isn't there a font that contains all Unicode glyphs?" Because that's been technically impossible since 2001. We can, and do, make font families that cover all of Unicode, but with 129 different scripts, all with their own typesetting rules, it's a lot of work, and almost (almost) not worth the effort compared to only covering a subset of all languages.

And as for this:

Such a font would be a great help to humanity and I can't see a good technical reason why it doesn't exist or at least an open-source effort to create it, so I presume an invisible-to-me reason why it can't be done.

Just because you didn't know about them doesn't mean they don't exist, with millions of people who are familiar with them. They exist =)

They're even open source, go out and thank the people who made them!

11
  • 14
    Adobe Blank is an extreme specialty font, it does the opposite of a full Unicode implementation: it has a special CMAP that maps every single Unicode code point to the same, single, glyph (the "blank"). Instead of implementing everything, it implements nothing, and represents that nothing with an empty picture. It's used in font debugging as fallback during testing: if you see Adobe Blank's "blank" (which has a width, so you can see it in your text), you know the font you're debugging is missing something. Commented Jan 12, 2016 at 16:15
  • 3
    it's special purpose for when you're doing type design implementation, as well as proofing before sending something off for production (be that text, a webpage, whatever). Especially for the latter, seeing "nothing" rather than text styled with a different font that might be similar enough that you don't catch it on the first proofing is quite valuable. Commented Jan 12, 2016 at 17:03
  • 9
    I wonder why the OTF/TTF specifications aren't updated to support more than 65536 maximum glyphs. Clearly we've surpassed this limit a while ago and downloading a single font would be easier than trying to navigate a font family.
    – Gili
    Commented Aug 22, 2016 at 22:36
  • 5
    Because they can't. A USHORT can only fit 65k numbers. Want more characters? Good news: use a font collection. Which the spec has been updated with (microsoft.com/typography/otspec/otff.htm => "font collections") Commented Aug 23, 2016 at 3:24
  • 8
    Then feel free to sign up for the OpenType discussion list and posit that statement, and then you will likely get a pretty well-reasoned response on why that isn't happening (most notably: it fixes something that isn't a problem because font engines can deal with font stacks just fine, while at the same time breaking compatibility for every device on the planet. Not just computers, but also the million or so models of printers currently in use across the world) Commented Dec 2, 2016 at 17:19
10

There is GNU Unifont. It aims to contain all Unicode, except Apple Emoji.

3
  • 12
    Except it doesn't - it only implements the Basic Multilingual Plane, which isn't even half of Unicode, and it's not actually very good at being a font: it's just a character map. If you need any kind of complex text shaping, as required for quite a lot of languages covered by the BMP, then GNU Unifont is basically useless to you. Also, as mentioned, a single font cannot, due to programming limitations, contain more than one USHORT of glyph IDs, so you'll never be able to put everything in a single font. That's why collections exist. Commented Apr 13, 2019 at 15:51
  • 1
    @Mike'Pomax'Kamermans In fact, as noted in the page itself, GNU Unifont has to be used as a font collection (with Unifonts Upper & CSUR) for it to cover all (characters that do not need high-resolution symbols in) Unicode. Even then, the authors also noted that complex scripts with special forms for letter combinations...will not render well in Unifont and that Unifont is only suitable as a font of last resort. Note that I exclusively use the Unifont collection in my web browsing because I hate myself. Commented May 14, 2019 at 15:05
  • 4
    I know? I looked up what it did, which is why I left a comment. It is an insane font to use, and if you want local "all languages" support, go grab the Noto family or something, because those do support real languages instead of just "some glyphs". Commented May 14, 2019 at 15:36
-1

You will probably find what you are looking for at the following links.

Unicode Character Table

HTML Character Entity References

Huge List of Unicode Symbols

List of Unicode Characters of Category “Other Symbol”

This one is fun for finding a particular character, since you can draw what you are searching for:

Unicode Character Recognition

Can't enter unicode character with Alt+ even with EnableHexNumpad

Basic Questions

Q: How many characters are in Unicode? A: The short answer is that as of Version 13.0, the Unicode Standard contains 143,859 characters. The long answer is rather more complicated, because of all the different kinds of characters that people might be interested in counting.

Unicode font A Unicode font is a computer font that maps glyphs to code points defined in the Unicode Standard. The vast majority of modern computer fonts use Unicode mappings, even those fonts which only include glyphs for a single writing system, or even only support the basic Latin alphabet.

Fonts which support a wide range of Unicode scripts and Unicode symbols are sometimes referred to as "pan-Unicode fonts", although as the maximum number of glyphs that can be defined in a TrueType font is restricted to 65,535, it is not possible for a single font to provide individual glyphs for all defined Unicode characters (143,859 characters, with Unicode 13.0).

...

No single "Unicode font" includes all the characters defined in the present revision of ISO 10646 (Unicode) standard, as more and more languages and characters are continually added to it, and common font formats cannot contain more than 65,535 glyphs (about half the number of characters encoded in Unicode).

As a result, font developers and foundries incorporate new characters in newer versions or revisions of a font, or in separate auxiliary fonts intended specifically for particular languages.

Enjoy!
