This is a problem I've encountered a couple of times recently. Here is my most recent experience:
Trying to browse https://www.scape.sc/release.php?id=48, a page that contains Japanese text. The Japanese on this page is completely garbled, displayed as Unicode replacement squares, symbols, and various Latin accented characters. This is true even in the HTML source, so I don't think it is an issue of font choice.
The site uses what I understand, from this webhint.io article, to be an out-of-date method of declaring the character set: <META Http-equiv="Content-Type" Content="text/html; charset=utf8">. Although the article does mention that this shouldn't be a problem nowadays.
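For reference, my understanding is that the modern equivalent declaration (which the HTML spec treats the same way, since "utf8" is an accepted alias for "utf-8") would simply be:

```html
<meta charset="utf-8">
```

So the old-style declaration alone shouldn't cause the garbling I'm seeing.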
This is how the raw HTML looks when I visit the page in my browser:
<TR><TD>2.</TD><TD>記憶ã¨ç©º</TD><TD> <I>(kioku to sora)</I></TD></TR>
In the past, I had found that searching for older versions of websites with this issue on the Internet Archive's Wayback Machine would display the Japanese characters correctly. This is true in my current case as well.
In the following two examples from the Wayback Machine, the first is from a capture in 2016; both the page source and the rendered page show valid, uncorrupted Japanese characters. The second is from 2023 and displays the same garbled text that I see on my own machine, which makes me more confident that the problem is not on my end.
Raw HTML from 2016:
<tr><td>2.</td><td>記憶と空</td><td> <i>(kioku to sora)</i></td></tr>
Raw HTML from 2023:
<tr><td>2.</td><td>記憶ã¨ç©º</td><td> <i>(kioku to sora)</i></td></tr>
My suspicion is that this is an error on the webmaster's part: perhaps there was a charset mismatch when the site was edited in a text editor sometime between 2016 and now, i.e. the UTF-8 source was at some point read as a single-byte encoding and then re-saved as UTF-8. Does this sound reasonable? Is there any way to recover the "corrupted" Unicode, rather than having to rely on old captures of sites on the Wayback Machine?
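If my suspicion is right, I'd expect a mechanical round-trip to undo one round of mangling: encode the garbled text back to bytes using the single-byte encoding it was misread as, then decode those bytes as UTF-8. A minimal Python sketch, assuming the intermediate encoding was Latin-1 (ISO-8859-1):

```python
def unmangle(garbled: str, intermediate: str = "latin-1") -> str:
    """Reverse one round of UTF-8 -> single-byte -> UTF-8 mangling.

    Latin-1 maps every byte value 0x00-0xFF to a code point, so it can
    round-trip even invisible control bytes such as 0x81; with cp1252
    a handful of byte values are unassigned and may be lost for good.
    """
    return garbled.encode(intermediate).decode("utf-8")

# "ç©º" is the mangled form of the last character of 記憶と空:
print(unmangle("ç©º"))        # -> 空

# と survives only if the invisible U+0081 is still present in the text:
print(unmangle("ã\x81¨ç©º"))  # -> と空
```

This only works if the mangled bytes were preserved losslessly; if the editor dropped or substituted unmappable bytes when re-saving, part of the original text is unrecoverable.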
TL;DR: The website used to contain valid Unicode text but no longer does. How can such an issue occur? Can the garbled text be reversed/made legible by the end user?