The expected approach, String::truncate(usize), fails because its argument is a byte length rather than a character count, and it panics when that length does not fall on a char boundary (which is baffling considering Rust treats strings as Unicode).
let mut s = "ボルテックス".to_string();
s.truncate(4);
thread '' panicked at 'assertion failed: self.is_char_boundary(new_len)'
Additionally, truncate modifies the original string in place, which is not always desired.
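To make the panic above concrete, here is a small sketch using only the standard library. Each character in "ボルテックス" takes 3 bytes in UTF-8, so byte offset 4 lands in the middle of a character. The helper names char_boundaries and truncated_copy are hypothetical, introduced only for illustration.

```rust
/// Lists every byte offset in `0..=s.len()` where `s` may be split safely.
/// (Hypothetical helper, for illustration only.)
fn char_boundaries(s: &str) -> Vec<usize> {
    (0..=s.len()).filter(|&i| s.is_char_boundary(i)).collect()
}

/// Non-mutating byte truncation that returns `None` instead of panicking.
/// Note: unlike `String::truncate`, this also returns `None` when
/// `new_len` is greater than `s.len()`, because `str::get` rejects
/// out-of-range slices.
fn truncated_copy(s: &str, new_len: usize) -> Option<String> {
    s.get(..new_len).map(str::to_owned)
}
```

With these, char_boundaries("ボルテックス") yields the offsets 0, 3, 6, 9, 12, 15 and 18, which is why a length of 4 is rejected while 3 or 6 would be accepted.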
The best I've come up with is to convert to chars and collect into a String.
fn truncate(s: String, max_width: usize) -> String {
    s.chars().take(max_width).collect()
}
e.g.
fn main() {
    assert_eq!(truncate("ボルテックス".to_string(), 0), "");
    assert_eq!(truncate("ボルテックス".to_string(), 4), "ボルテッ");
    assert_eq!(truncate("ボルテックス".to_string(), 100), "ボルテックス");
    assert_eq!(truncate("hello".to_string(), 4), "hell");
}
However, this feels very heavy-handed.
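One lighter-weight possibility, sketched under the assumption that borrowing from the input is acceptable: find the byte offset of the character at position max_chars with char_indices and return a slice, which avoids allocating a new String. The name truncate_chars is hypothetical.

```rust
/// Returns the longest prefix of `s` containing at most `max_chars` chars.
/// Borrows from `s`, so nothing is allocated or copied.
fn truncate_chars(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        // `byte_idx` is where the first char to be dropped begins,
        // so it is always a valid char boundary.
        Some((byte_idx, _)) => &s[..byte_idx],
        // Fewer than `max_chars + 1` chars: keep the whole string.
        None => s,
    }
}
```

A caller that needs an owned value can still write truncate_chars(&s, n).to_string(), and one that wants in-place behaviour can pass the resulting length to String::truncate.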
Is char (which corresponds to code points) really the right unit, and not grapheme clusters?
Another option is to limit by bytes (keeping as many chars as possible without going over N bytes). While this does not match people's perception of character counts, it is reasonable when the restriction is storage-motivated (e.g., the size of a database column).
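The byte-limited variant can be sketched with the standard library alone: start at the byte limit and step back until the offset is a char boundary. The name truncate_bytes is hypothetical. Counting grapheme clusters instead would need an external crate such as unicode-segmentation, which is not shown here.

```rust
/// Returns the longest prefix of `s` that fits in `max_bytes` bytes
/// without splitting a char.
fn truncate_bytes(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s;
    }
    // Walk backwards to the nearest char boundary. Offset 0 is always
    // a boundary, so the loop terminates.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

Because a UTF-8 char is at most 4 bytes long, the loop steps back at most 3 times.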