Here is a string from a text file:
@™TdaŽ®Æ‚êƒ~ƒNƒXƒgƒŒ[ƒgEƒrƒLƒjver1.11d1.d2iƒrƒLƒjƒ‚ƒfƒ‹ver.1.1³Ž®”z•z”Åj
It includes many nonprinting characters and is copied here: https://pastebin.com/TUG4agN4
Using https://2cyr.com/decode/?lang=en, we can confirm that it translates to the following:
☆Tda式照れミクストレート・ビキニver1.11d1.d2(ビキニモデルver.1.1正式配布版)
This is with source encoding = SJIS (shift-jis), displayed as Windows-1252.
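For reference, the garbling itself can be reproduced offline. This is a minimal sketch in Python rather than iconv. It assumes that the five bytes Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) were carried through as the invisible control characters U+0081 and so on. The function name is made up for illustration.

```python
def show_sjis_as_cp1252(text: str) -> str:
    """Encode text as Shift-JIS, then misread every byte as Windows-1252."""
    out = []
    for b in text.encode("shift_jis"):
        byte = bytes([b])
        try:
            out.append(byte.decode("cp1252"))
        except UnicodeDecodeError:
            # Assumption: bytes with no Windows-1252 mapping survive as
            # C1 control characters (same code point as the byte value).
            out.append(byte.decode("latin-1"))
    return "".join(out)


garbled = show_sjis_as_cp1252("☆Tda式")
print(repr(garbled))  # repr() makes the nonprinting U+0081 visible as \x81
```

Under that assumption, ☆ becomes an invisible U+0081 followed by ™, which is consistent with the ™Tda seen at the start of the garbled string.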
But how can we obtain the same result without a website? The relevant tool is iconv, but something in the tool chain is broken. If I cat the source text file, or feed it as standard input with '<' in bash, one of the iconv calls in the chain quickly errors out. If I instead copy the above string from the text editor gedit (reading the file as utf-16le), or from the output of iconv's utf16-to-utf8 conversion, the result is close but still wrong:
@儺da式ニれミクストレ[トEビキニver1.11d1.d2iビキニモデルver.1.1ウ式配布版j
Some evidence of the tool chain failing:
$ cat 'utf8.txt' |head -1
@™TdaŽ®Æ‚êƒ~ƒNƒXƒgƒŒ[ƒgEƒrƒLƒjver1.11d1.d2iƒrƒLƒjƒ‚ƒfƒ‹ver.1.1³Ž®”z•z”Å
$ cat 'utf8.txt' |head -1| iconv -f utf8 -t utf16
���@�"!Tda}��� ��~�N�X�g�R�[�g�E�r�L�jver1.11d1.d2�i�r�L�j� �f�9 ver.1.1��}� z" z ��j
Note three invalid characters at start.
$ cat 'utf8.txt' |head -1| iconv -f utf8 -t utf16|iconv -f utf16 -t windows-1252
iconv: illegal input sequence at position 2
$ echo "@™TdaŽ®Æ‚êƒ~ƒNƒXƒgƒŒ[ƒgEƒrƒLƒjver1.11d1.d2iƒrƒLƒjƒ‚ƒfƒ‹ver.1.1³Ž®”z•z”Åj"| iconv -f utf8 -t utf16
��@"!Tda}�� ��~�N�X�g�R[�gE�r�L�jver1.11d1.d2i�r�L�j� �f�9 ver.1.1�}� z" z �j
Note two invalid characters at start, other differences. The sequence copied from terminal matches the string displayed in text editor, confirmed by find (ctrl-F) matching it, which is the same string that gives the correct result on 2cyr.com.
Extending the last command above with '|iconv -f utf16 -t windows-1252|iconv -f shift-jis -t utf8' gives the close, but incorrect result quoted above, instead of erroring out as the direct chain does.
When I made a file with the example string as its name and ran the tool convmv on it, convmv warned that the output filename contained "characters, which are not POSIX filesystem conform! This may result in data loss." Most filenames that are invalid as UTF-8 don't trigger this warning.
Is there any bit sequence that piping in bash can't handle? If not, why is the tool chain not working?
Apparently the difference arises because bash won't paste nonprinting characters (the boxes with numbers) onto the command line; maybe 'readline' can't handle them? But the result being close suggests the conversion order in the tool chain is correct, so why isn't it working?
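One way to check this hypothesis outside bash is a small Python sketch (using Python instead of iconv is a substitution, and the helper names are hypothetical). It assumes the invisible characters are C1 controls such as U+0081, standing in for bytes that Windows-1252 leaves undefined. It rebuilds the original bytes one character at a time, then repeats the conversion with the nonprinting characters stripped, as a paste into the terminal might do.

```python
def to_original_bytes(garbled: str) -> bytes:
    """Map each displayed character back to the single byte it came from."""
    raw = bytearray()
    for ch in garbled:
        try:
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Assumption: C1 controls (U+0080..U+009F) stand for the raw byte
            # of the same value; latin-1 maps them back one-to-one.
            raw += ch.encode("latin-1")
    return bytes(raw)


def undo(garbled: str) -> str:
    """Reinterpret the recovered bytes as Shift-JIS."""
    return to_original_bytes(garbled).decode("shift_jis")


# Start of the string with the (assumed) invisible U+0081 characters intact.
garbled = "\u0081@\u0081™TdaŽ®"
intact = undo(garbled)

# Simulate a paste that silently drops nonprinting characters.
pasted = "".join(ch for ch in garbled if ch.isprintable())
damaged = undo(pasted)

print(intact)
print(damaged)
```

With the controls intact, the sketch recovers the expected text. With them stripped, the lead byte of ☆ is lost, so the ™ byte (0x99) pairs with the following T (0x54) instead, which would explain the stray kanji at the start of the near-miss output quoted above.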
The original file, with its filename scrambled in a different way (expires after 30 days): https://ufile.io/oorcq