Playing with UTF8 in bash
Update 2023...
Print UTF8
For some time now, bash's printf builtin has expanded \U escapes through the %b format:
printf %b\\n \\U1F600
😀
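As a side note, bash accepts two escape forms: \u followed by up to four hex digits and \U followed by up to eight. As far as I know both arrived with bash 4.2, and they are understood in the printf format string and in $'...' quoting as well as through %b. A small sketch, assuming a UTF-8 locale:

```bash
# All four commands should print the same character (U+2620)
# when run in a UTF-8 locale (assumption).
printf '%b\n' '\u2620'       # escape expanded by %b
printf '%b\n' '\U00002620'   # \U form, zero-padded to eight digits
printf '\u2620\n'            # escape placed directly in the format string
echo $'\u2620'               # ANSI-C quoting, expanded by the shell itself
```

For code points above U+FFFF, such as the emoji used here, only the \U form is long enough.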
Store UTF8 into a variable
So you could assign a variable by using -v
flag of bash's printf
builtin:
printf -v smiley \\U1F600
echo $smiley
😀
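Note that the variable then holds one character but several bytes. Here is a minimal sketch of a helper that reports the byte count by switching to the C locale inside the function. The name byteLen is mine, not something bash provides.

```bash
# byteLen: print the number of bytes in its first argument.
# In the C locale every byte counts as one character, so ${#1} counts bytes.
# (byteLen is a hypothetical helper name used for illustration.)
byteLen() {
    local LC_ALL=C
    echo "${#1}"
}

printf -v smiley '\U1F600'
echo "${#smiley}"      # characters, as counted in the current locale
byteLen "$smiley"      # bytes
```

In a UTF-8 locale this should report one character and four bytes for the smiley.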
Strictly answering the SO question:
user@host:~$ printf -v skull %b '\U2620'
user@host:~$ PS1=${PS1/%\\$ /$skull\\$ }
user@host:~☠$
This does the job. (Note that %b is nearly useless here: printf expands \U escapes in its format string too.)
Showing part of table
To quickly show part of the Unicode table:
printf %b\\n \\U1F6{{0..9},{A..F}}{{0..9},{a..f}} | paste -d ' ' -{,,,}{,,,}
😀 😁 😂 😃 😄 😅 😆 😇 😈 😉 😊 😋 😌 😍 😎 😏
😐 😑 😒 😓 😔 😕 😖 😗 😘 😙 😚 😛 😜 😝 😞 😟
😠 😡 😢 😣 😤 😥 😦 😧 😨 😩 😪 😫 😬 😭 😮 😯
😰 😱 😲 😳 😴 😵 😶 😷 😸 😹 😺 😻 😼 😽 😾 😿
🙀 🙁 🙂 🙃 🙄 🙅 🙆 🙇 🙈 🙉 🙊 🙋 🙌 🙍 🙎 🙏
🙐 🙑 🙒 🙓 🙔 🙕 🙖 🙗 🙘 🙙 🙚 🙛 🙜 🙝 🙞 🙟
🙠 🙡 🙢 🙣 🙤 🙥 🙦 🙧 🙨 🙩 🙪 🙫 🙬 🙭 🙮 🙯
🙰 🙱 🙲 🙳 🙴 🙵 🙶 🙷 🙸 🙹 🙺 🙻 🙼 🙽 🙾 🙿
🚀 🚁 🚂 🚃 🚄 🚅 🚆 🚇 🚈 🚉 🚊 🚋 🚌 🚍 🚎 🚏
🚐 🚑 🚒 🚓 🚔 🚕 🚖 🚗 🚘 🚙 🚚 🚛 🚜 🚝 🚞 🚟
🚠 🚡 🚢 🚣 🚤 🚥 🚦 🚧 🚨 🚩 🚪 🚫 🚬 🚭 🚮 🚯
🚰 🚱 🚲 🚳 🚴 🚵 🚶 🚷 🚸 🚹 🚺 🚻 🚼 🚽 🚾 🚿
🛀 🛁 🛂 🛃 🛄 🛅 🛆 🛇 🛈 🛉 🛊 🛋 🛌 🛍 🛎 🛏
🛐 🛑 🛒 🛓 🛔 🛕 🛖 🛗 🛜 🛝 🛞 🛟
🛠 🛡 🛢 🛣 🛤 🛥 🛦 🛧 🛨 🛩 🛪 🛫 🛬
🛰 🛱 🛲 🛳 🛴 🛵 🛶 🛷 🛸 🛹 🛺 🛻 🛼
Showing braille part:
printf %b\\n \\U28{{0..9},{A..F}}{{0..9},{a..f}} | paste -d ' ' -{,,,}{,,,}
⠀ ⠁ ⠂ ⠃ ⠄ ⠅ ⠆ ⠇ ⠈ ⠉ ⠊ ⠋ ⠌ ⠍ ⠎ ⠏
⠐ ⠑ ⠒ ⠓ ⠔ ⠕ ⠖ ⠗ ⠘ ⠙ ⠚ ⠛ ⠜ ⠝ ⠞ ⠟
⠠ ⠡ ⠢ ⠣ ⠤ ⠥ ⠦ ⠧ ⠨ ⠩ ⠪ ⠫ ⠬ ⠭ ⠮ ⠯
⠰ ⠱ ⠲ ⠳ ⠴ ⠵ ⠶ ⠷ ⠸ ⠹ ⠺ ⠻ ⠼ ⠽ ⠾ ⠿
⡀ ⡁ ⡂ ⡃ ⡄ ⡅ ⡆ ⡇ ⡈ ⡉ ⡊ ⡋ ⡌ ⡍ ⡎ ⡏
⡐ ⡑ ⡒ ⡓ ⡔ ⡕ ⡖ ⡗ ⡘ ⡙ ⡚ ⡛ ⡜ ⡝ ⡞ ⡟
⡠ ⡡ ⡢ ⡣ ⡤ ⡥ ⡦ ⡧ ⡨ ⡩ ⡪ ⡫ ⡬ ⡭ ⡮ ⡯
⡰ ⡱ ⡲ ⡳ ⡴ ⡵ ⡶ ⡷ ⡸ ⡹ ⡺ ⡻ ⡼ ⡽ ⡾ ⡿
⢀ ⢁ ⢂ ⢃ ⢄ ⢅ ⢆ ⢇ ⢈ ⢉ ⢊ ⢋ ⢌ ⢍ ⢎ ⢏
⢐ ⢑ ⢒ ⢓ ⢔ ⢕ ⢖ ⢗ ⢘ ⢙ ⢚ ⢛ ⢜ ⢝ ⢞ ⢟
⢠ ⢡ ⢢ ⢣ ⢤ ⢥ ⢦ ⢧ ⢨ ⢩ ⢪ ⢫ ⢬ ⢭ ⢮ ⢯
⢰ ⢱ ⢲ ⢳ ⢴ ⢵ ⢶ ⢷ ⢸ ⢹ ⢺ ⢻ ⢼ ⢽ ⢾ ⢿
⣀ ⣁ ⣂ ⣃ ⣄ ⣅ ⣆ ⣇ ⣈ ⣉ ⣊ ⣋ ⣌ ⣍ ⣎ ⣏
⣐ ⣑ ⣒ ⣓ ⣔ ⣕ ⣖ ⣗ ⣘ ⣙ ⣚ ⣛ ⣜ ⣝ ⣞ ⣟
⣠ ⣡ ⣢ ⣣ ⣤ ⣥ ⣦ ⣧ ⣨ ⣩ ⣪ ⣫ ⣬ ⣭ ⣮ ⣯
⣰ ⣱ ⣲ ⣳ ⣴ ⣵ ⣶ ⣷ ⣸ ⣹ ⣺ ⣻ ⣼ ⣽ ⣾ ⣿
Better, wrapped in a small function:
showU8_256() {
    local i a
    for a; do
        for i in {0..9} {A..F}; do
            printf '\\U%05Xx: %b %b %b %b %b %b %b %b %b %b %b %b %b %b %b %b\n' \
                0x$a$i \\U$a${i}{{0..9},{A..F}}
        done
    done
}
Then
showU8_256 1f{3,4}
\U01F30x: 🌀 🌁 🌂 🌃 🌄 🌅 🌆 🌇 🌈 🌉 🌊 🌋 🌌 🌍 🌎 🌏
\U01F31x: 🌐 🌑 🌒 🌓 🌔 🌕 🌖 🌗 🌘 🌙 🌚 🌛 🌜 🌝 🌞 🌟
\U01F32x: 🌠 🌡 🌢 🌣 🌤 🌥 🌦 🌧 🌨 🌩 🌪 🌫 🌬 🌭 🌮 🌯
\U01F33x: 🌰 🌱 🌲 🌳 🌴 🌵 🌶 🌷 🌸 🌹 🌺 🌻 🌼 🌽 🌾 🌿
\U01F34x: 🍀 🍁 🍂 🍃 🍄 🍅 🍆 🍇 🍈 🍉 🍊 🍋 🍌 🍍 🍎 🍏
\U01F35x: 🍐 🍑 🍒 🍓 🍔 🍕 🍖 🍗 🍘 🍙 🍚 🍛 🍜 🍝 🍞 🍟
\U01F36x: 🍠 🍡 🍢 🍣 🍤 🍥 🍦 🍧 🍨 🍩 🍪 🍫 🍬 🍭 🍮 🍯
\U01F37x: 🍰 🍱 🍲 🍳 🍴 🍵 🍶 🍷 🍸 🍹 🍺 🍻 🍼 🍽 🍾 🍿
\U01F38x: 🎀 🎁 🎂 🎃 🎄 🎅 🎆 🎇 🎈 🎉 🎊 🎋 🎌 🎍 🎎 🎏
\U01F39x: 🎐 🎑 🎒 🎓 🎔 🎕 🎖 🎗 🎘 🎙 🎚 🎛 🎜 🎝 🎞 🎟
\U01F3Ax: 🎠 🎡 🎢 🎣 🎤 🎥 🎦 🎧 🎨 🎩 🎪 🎫 🎬 🎭 🎮 🎯
\U01F3Bx: 🎰 🎱 🎲 🎳 🎴 🎵 🎶 🎷 🎸 🎹 🎺 🎻 🎼 🎽 🎾 🎿
\U01F3Cx: 🏀 🏁 🏂 🏃 🏄 🏅 🏆 🏇 🏈 🏉 🏊 🏋 🏌 🏍 🏎 🏏
\U01F3Dx: 🏐 🏑 🏒 🏓 🏔 🏕 🏖 🏗 🏘 🏙 🏚 🏛 🏜 🏝 🏞 🏟
\U01F3Ex: 🏠 🏡 🏢 🏣 🏤 🏥 🏦 🏧 🏨 🏩 🏪 🏫 🏬 🏭 🏮 🏯
\U01F3Fx: 🏰 🏱 🏲 🏳 🏴 🏵 🏶 🏷 🏸 🏹 🏺 🏻 🏼 🏽 🏾 🏿
\U01F40x: 🐀 🐁 🐂 🐃 🐄 🐅 🐆 🐇 🐈 🐉 🐊 🐋 🐌 🐍 🐎 🐏
\U01F41x: 🐐 🐑 🐒 🐓 🐔 🐕 🐖 🐗 🐘 🐙 🐚 🐛 🐜 🐝 🐞 🐟
\U01F42x: 🐠 🐡 🐢 🐣 🐤 🐥 🐦 🐧 🐨 🐩 🐪 🐫 🐬 🐭 🐮 🐯
\U01F43x: 🐰 🐱 🐲 🐳 🐴 🐵 🐶 🐷 🐸 🐹 🐺 🐻 🐼 🐽 🐾 🐿
\U01F44x: 👀 👁 👂 👃 👄 👅 👆 👇 👈 👉 👊 👋 👌 👍 👎 👏
\U01F45x: 👐 👑 👒 👓 👔 👕 👖 👗 👘 👙 👚 👛 👜 👝 👞 👟
\U01F46x: 👠 👡 👢 👣 👤 👥 👦 👧 👨 👩 👪 👫 👬 👭 👮 👯
\U01F47x: 👰 👱 👲 👳 👴 👵 👶 👷 👸 👹 👺 👻 👼 👽 👾 👿
\U01F48x: 💀 💁 💂 💃 💄 💅 💆 💇 💈 💉 💊 💋 💌 💍 💎 💏
\U01F49x: 💐 💑 💒 💓 💔 💕 💖 💗 💘 💙 💚 💛 💜 💝 💞 💟
\U01F4Ax: 💠 💡 💢 💣 💤 💥 💦 💧 💨 💩 💪 💫 💬 💭 💮 💯
\U01F4Bx: 💰 💱 💲 💳 💴 💵 💶 💷 💸 💹 💺 💻 💼 💽 💾 💿
\U01F4Cx: 📀 📁 📂 📃 📄 📅 📆 📇 📈 📉 📊 📋 📌 📍 📎 📏
\U01F4Dx: 📐 📑 📒 📓 📔 📕 📖 📗 📘 📙 📚 📛 📜 📝 📞 📟
\U01F4Ex: 📠 📡 📢 📣 📤 📥 📦 📧 📨 📩 📪 📫 📬 📭 📮 📯
\U01F4Fx: 📰 📱 📲 📳 📴 📵 📶 📷 📸 📹 📺 📻 📼 📽 📾 📿
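showU8_256 always prints whole blocks of 256 code points. For an arbitrary range, a loop over the numeric code points could look like the sketch below. showRange is a hypothetical name, and the output layout (sixteen characters per line, space separated) is my own choice.

```bash
# showRange START END: print the characters from START to END (inclusive),
# sixteen per line. Both bounds are code points, e.g. 0x2801 or 10241.
# (showRange is a hypothetical helper, not part of the original answer.)
showRange() {
    local -i cp n=0
    local esc
    for (( cp = $1; cp <= $2; cp++ )); do
        printf -v esc '\\U%X' "$cp"     # build the escape, e.g. \U2801
        printf '%b' "$esc"              # let %b expand it
        if (( ++n % 16 == 0 || cp == $2 )); then
            printf '\n'                 # end of row, or end of range
        else
            printf ' '
        fi
    done
}

showRange 0x2801 0x2810
```

Building the escape with printf -v first keeps the hex conversion and the %b expansion as two separate, easy to check steps.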
Browsing unicode table
For this, after searching for a reliable way, I finally posted my Python dumpUnicode script on SuperUser, under Dumping / browsing full unicode table.
In short:
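dumpUnicode() {
    # The Python source is passed as one $'...' string; \n and the spaces after it carry the indentation.
    python3 -c $'from unicodedata import name\nfor i in range(0x10FFFF):\n  try:\n    var = name(chr(i))\n  except ValueError:\n    var = None\n  if var:\n    print("\\\\U%06X: \47%s\47 %s" % (i, chr(i), var))'
}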
dumpUnicode | grep 'SMIL.*SUNGLAS\|FONDUE'
\U01F60E: '😎' SMILING FACE WITH SUNGLASSES
\U01FAD5: '🫕' FONDUE
Or, to strictly answer the SO request:
dumpUnicode |grep "' SKULL AND CROSSBONES"
\U002620: '☠' SKULL AND CROSSBONES
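When the code point is already known, scanning the whole table is not needed. A minimal sketch of the reverse lookup, again relying on python3 being installed. uName is a hypothetical helper name, and "<no name>" is just a placeholder I chose for unnamed code points.

```bash
# uName HEX: print the Unicode name of the code point given in hexadecimal.
# Hypothetical helper; requires python3, like dumpUnicode.
uName() {
    python3 -c 'import sys, unicodedata
cp = int(sys.argv[1], 16)
# The second argument of name() is returned when the code point has no name.
print(unicodedata.name(chr(cp), "<no name>"))' "$1"
}

uName 2620
uName 1F60E
```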
Converting to ASCII values
The result is not a fixed 4 digits, but a variable number of bytes:
printf -v skull '%b' \\U2620
LANG=C printf -v skull %q $skull
IFS=\' read -r _ skull _ <<<"$skull"
echo ${skull//\\/\\0}
\0342\0230\0240
echo -e ${skull//\\/\\0}
☠
As a function:
u8toBytes() {
    local char
    printf -v char %b "$1"
    LANG=C printf -v char %q "$char"
    IFS=\' read -r _ char _ <<< "$char"
    echo ${char//\\/\\0}
    echo -e ${char//\\/\\0}
}
u8toBytes \\U2620
\0342\0230\0240
☠
u8toBytes \\UA0
\0302\0240
u8toBytes 😎
\0360\0237\0230\0216
😎
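The number of bytes follows directly from the size of the code point: one byte below U+0080, two below U+0800, three below U+10000 and four above. The sketch below computes the same octal escapes with plain shell arithmetic, without relying on the locale. cp2utf8 is a hypothetical name, and the function does not reject surrogates or values above U+10FFFF.

```bash
# cp2utf8 CODEPOINT: print the UTF-8 bytes of a code point as \0ooo octal
# escapes, in the same style as u8toBytes. No input validation is done.
# (cp2utf8 is a hypothetical helper, not part of the original answer.)
cp2utf8() {
    local -i cp=$1
    if (( cp < 0x80 )); then          # 1 byte:  0xxxxxxx
        printf '\\0%o\n' "$cp"
    elif (( cp < 0x800 )); then       # 2 bytes: 110xxxxx 10xxxxxx
        printf '\\0%o\\0%o\n' \
            $(( 0xC0 | (cp >> 6) )) \
            $(( 0x80 | (cp & 0x3F) ))
    elif (( cp < 0x10000 )); then     # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        printf '\\0%o\\0%o\\0%o\n' \
            $(( 0xE0 | (cp >> 12) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    else                              # 4 bytes: 11110xxx and three 10xxxxxx
        printf '\\0%o\\0%o\\0%o\\0%o\n' \
            $(( 0xF0 | (cp >> 18) )) \
            $(( 0x80 | ((cp >> 12) & 0x3F) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) \
            $(( 0x80 | (cp & 0x3F) ))
    fi
}

cp2utf8 0x2620
cp2utf8 0xA0
cp2utf8 0x1F60E
```

The three calls should reproduce the octal sequences printed by u8toBytes above for the same characters.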
Further
Have a look at Using Unicode specific character in bash
"\x7F"
in a UTF-8 locale (which the bash tag suggests yours is)... patterns represented by a single byte are never in the range \x80-\xFF. This range is illegal in single-byte UTF-8 chars. E.g. a Unicode codepoint value of U+0080 (i.e. \x80) is actually 2 bytes in UTF-8: \xC2\x80
.. printf "\\u007C\\u001C" ..
.. gnome-terminal, echo -e '\ufc' does not produce a ü, even with character encoding set to UTF-8. However, e.g. urxvt does print e.g. printf "\\ub07C\\ub01C" as expected (not with a � or box).
.. bash tag such a useful hint? Are different terminals common in CJK or … ?