I would like to transform a Java String str into byte[] b with the following characteristics:

  • b is a valid C string (i.e. it has b.length == str.length() + 1 and b[str.length()] == 0).
  • the characters in b are obtained by converting the characters in str to 8-bit ASCII characters.

What is the most efficient way to do this — preferably an existing library function? Sadly, str.getBytes("ISO-8859-1") doesn't meet my first requirement...

2 Answers

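import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// do this once to set up
CharsetEncoder enc = Charset.forName("ISO-8859-1").newEncoder();

// for each string
int len = str.length();
byte[] b = new byte[len + 1];
ByteBuffer bbuf = ByteBuffer.wrap(b);
enc.encode(CharBuffer.wrap(str), bbuf, true);
// you might want to ensure that bbuf.position() == len
b[len] = 0; // redundant, since new arrays are zero-filled, but it makes the terminator explicit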

This requires allocating a couple of wrapper objects, but does not copy the string characters twice.
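For anyone who wants to drop this into a project, here is a minimal self-contained sketch of the same idea. The class name CStrings and the method name toLatin1CString are made up for illustration, and throwing IllegalArgumentException on characters outside ISO-8859-1 is my assumption about the desired behaviour, not part of the original snippet. It creates a fresh encoder per call because CharsetEncoder instances are not safe to share between threads.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class; the name is illustrative only.
final class CStrings {

    // Encodes str as ISO-8859-1 into an array with one extra trailing zero byte.
    static byte[] toLatin1CString(String str) {
        // A new encoder per call keeps the method thread-safe.
        CharsetEncoder enc = StandardCharsets.ISO_8859_1.newEncoder();
        int len = str.length();
        byte[] b = new byte[len + 1];          // Java zero-fills, so b[len] is already 0
        // Expose only the first len bytes so the terminator slot cannot be overwritten.
        ByteBuffer bbuf = ByteBuffer.wrap(b, 0, len);
        CoderResult result = enc.encode(CharBuffer.wrap(str), bbuf, true);
        // Assumption: reject input that cannot be represented instead of truncating it.
        if (!result.isUnderflow() || bbuf.position() != len) {
            throw new IllegalArgumentException("Not encodable as ISO-8859-1: " + result);
        }
        return b;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toLatin1CString("h\u00e9llo")));
    }
}
```

The explicit check replaces the "you might want to ensure" comment: any character above U+00FF makes the encoder stop early and report an unmappable-character result.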

  • Note this won't work if converting to UTF-8. The number of bytes returned from enc.encode() might not equal UTF-16 string length.
    – richb
    Commented Apr 22, 2016 at 2:40
  • @richb You're right, but the question is specifically restricted to the ISO-8859-1 encoding. UTF-8 is a variable-size encoding and would require precalculation or pessimistic allocation (CharsetEncoder.maxBytesPerChar()). You'd probably be better off giving up on the single-allocation goal and just using CharsetEncoder.encode(CharBuffer). Commented Apr 23, 2016 at 16:49
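To make the pessimistic-allocation idea from the comments concrete, here is a rough sketch for UTF-8. The names Utf8CStrings and toUtf8CString are hypothetical, and this is only one way to do it: it sizes a scratch buffer from maxBytesPerChar() and then trims it, so it gives up the single-allocation goal.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class; the name is illustrative only.
final class Utf8CStrings {

    // Encodes str as UTF-8 and appends a single zero byte.
    static byte[] toUtf8CString(String str) {
        CharsetEncoder enc = StandardCharsets.UTF_8.newEncoder();
        // Worst-case size: every char needs maxBytesPerChar() bytes.
        int maxBytes = (int) Math.ceil(str.length() * (double) enc.maxBytesPerChar());
        ByteBuffer bbuf = ByteBuffer.allocate(maxBytes + 1);
        bbuf.limit(maxBytes);                  // keep the last slot free for the terminator
        CoderResult result = enc.encode(CharBuffer.wrap(str), bbuf, true);
        // Assumption: malformed input (e.g. a lone surrogate) is rejected.
        if (!result.isUnderflow()) {
            throw new IllegalArgumentException("Not encodable as UTF-8: " + result);
        }
        // position() is the number of bytes written; the byte after it is still 0.
        return Arrays.copyOf(bbuf.array(), bbuf.position() + 1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toUtf8CString("h\u00e9llo")));
    }
}
```

Note that the resulting length is the encoded byte count plus one, not str.length() + 1, which is exactly the point richb raised.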

You can use str.getBytes("ISO-8859-1") with a little trick at the end:

byte[] stringBytes = str.getBytes("ISO-8859-1");
byte[] ntBytes = new byte[stringBytes.length + 1];
System.arraycopy(stringBytes, 0, ntBytes, 0, stringBytes.length);

arraycopy is relatively fast, as it can use native tricks and optimizations in many cases. Java zero-fills new arrays, so every byte we didn't overwrite (here, just the last one) is already the terminating 0.

ntBytes is the array you need.
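A slightly shorter variant of the same trick, as a sketch: Arrays.copyOf allocates the longer array and zero-pads it in one call, and StandardCharsets.ISO_8859_1 (Java 7+) avoids the checked UnsupportedEncodingException that the string-named overload declares. The class and method names here are made up for illustration. Be aware that String.getBytes silently replaces characters that ISO-8859-1 cannot represent with '?'.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class; the name is illustrative only.
final class NullTerminated {

    // Encodes str as ISO-8859-1 and returns a copy with one trailing zero byte.
    static byte[] toCString(String str) {
        byte[] stringBytes = str.getBytes(StandardCharsets.ISO_8859_1);
        // copyOf pads the extra element with 0, which serves as the C terminator.
        return Arrays.copyOf(stringBytes, stringBytes.length + 1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toCString("abc")));
    }
}
```

Like the arraycopy version, this still copies the encoded bytes twice, so it trades a little efficiency for brevity.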

