3

I would like to transform a Java String str into byte[] b with the following characteristics:

  • b is a valid C string (i.e., it has b.length == str.length() + 1 and b[str.length()] == 0).
  • the characters in b are obtained by converting the characters in str to 8-bit ASCII characters.

What is the most efficient way to do this — preferably an existing library function? Sadly, str.getBytes("ISO-8859-1") doesn't meet my first requirement...

6
  • 4
    What have you tried so far, that is not efficient enough for you? You have to show some code.
    – Bruno Reis
    Commented Jul 19, 2013 at 2:32
  • 1
    I'm asking whether there's a library function. See above "preferably an existing library function." That implies I'm looking for ... a library function. I'm sorry that you couldn't be bothered to read the whole question and instead got stuck on the phrase you so carefully italicized.
    – 0xbe5077ed
    Commented Jul 19, 2013 at 4:07
  • Actually, I did read the whole question. The thing is, as you ask it, your question reads: "Hello, I have to do some work, but I don't want to do it. Will you do it for me?". You don't seem to be looking for "the most efficient", or for a specific library function; instead, you seem to be using that phrase just to try to hide from others and from yourself that you didn't have anything to show, that you didn't try anything. Finally, since you are very, very new to StackOverflow, maybe you simply didn't know that you should do some work before asking, and I'm kindly providing you with that information.
    – Bruno Reis
    Commented Jul 19, 2013 at 12:00
  • 1
    Funny, then, that Nova was able to provide an excellent answer without any snark or downvotes. In fact, I did do "some" work, and I wasn't trying to "hide" anything. It's just that "some" people on this site are just looking for an excuse to be aggressive, churlish, and smarter-than-thou. Would my question have "had less to hide" if I had posted a for loop that truncates the high-order byte and said "golly, there has to be a better way"? If after my search I didn't locate any other way besides that and str.getBytes(), which I mentioned, how do I represent NO IDEA in code for you?
    – 0xbe5077ed
    Commented Jul 19, 2013 at 15:54
  • It certainly would help you better ask your question, or, at least, better reason about what you are asking. You specifically said you are looking for the most efficient way to do this, maybe with a library function. What is efficient for you? If you did have a solution (as you say in your comment), why wasn't this solution efficient, or good enough? What were the problems with it? Now, for the answers, how are they better than what you have? Did you measure? Are you sure the answer you marked as accepted does indeed give you the most efficient solution? Did you consider JNI?
    – Bruno Reis
    Commented Jul 19, 2013 at 16:35

2 Answers

11
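import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// do this once to set up
CharsetEncoder enc = Charset.forName("ISO-8859-1").newEncoder();

// for each string
int len = str.length();
byte[] b = new byte[len + 1];
ByteBuffer bbuf = ByteBuffer.wrap(b);
enc.reset(); // clear any state left over from the previous string
enc.encode(CharBuffer.wrap(str), bbuf, true);
// you might want to ensure that bbuf.position() == len
b[len] = 0; // redundant (new arrays are zero-filled), but makes the terminator explicit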

This requires allocating a couple of wrapper objects, but does not copy the string characters twice.
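
As a rough sketch of how the snippet above might be packaged for reuse, the helper below wraps the same encoder call and turns the "ensure that bbuf.position() == len" suggestion into an explicit check of the returned CoderResult. The class and method names (Latin1CString, toCString) are made up for illustration, and throwing IllegalArgumentException on unmappable characters is an assumption about the desired behavior, not something the question specifies.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1CString {
    // Hypothetical helper: encodes str as ISO-8859-1 into a zero-terminated array.
    static byte[] toCString(CharsetEncoder enc, String str) {
        int len = str.length();
        byte[] b = new byte[len + 1];                 // b[len] is already 0
        ByteBuffer out = ByteBuffer.wrap(b, 0, len);  // encoder may not touch the terminator
        enc.reset();                                  // make the encoder safe to reuse
        CoderResult result = enc.encode(CharBuffer.wrap(str), out, true);
        if (result.isError()) {
            // Assumption: reject characters that have no ISO-8859-1 mapping.
            throw new IllegalArgumentException("Not representable in ISO-8859-1: " + result);
        }
        enc.flush(out);
        return b;
    }

    public static void main(String[] args) {
        // Create the encoder once and reuse it for every string.
        CharsetEncoder enc = StandardCharsets.ISO_8859_1.newEncoder();
        System.out.println(Arrays.toString(toCString(enc, "caf\u00e9")));
    }
}
```

Restricting the wrapped buffer to the first len bytes means the encoder can never overwrite the terminator, whatever the input contains.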

2
  • Note this won't work if converting to UTF-8. The number of bytes returned from enc.encode() might not equal UTF-16 string length.
    – richb
    Commented Apr 22, 2016 at 2:40
  • @richb You're right, but the question is specifically restricted to the ISO-8859-1 encoding. UTF-8 is a variable-size encoding and would require precalculation or pessimistic allocation (CharsetEncoder.maxBytesPerChar()). You'd probably be better off giving up on the single-allocation goal and just using CharsetEncoder.encode(CharBuffer).
    Commented Apr 23, 2016 at 16:49
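
For readers who do need UTF-8, the last suggestion in the comment above could look roughly like the following. This is only a sketch: the class and method names are invented for illustration, and it deliberately accepts a second allocation, because the encoded length is not known up front. Rethrowing the checked CharacterCodingException as IllegalArgumentException is an assumption made to keep the example short.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

final class Utf8CString {
    // Hypothetical helper: encodes str as UTF-8 and appends a zero terminator.
    static byte[] toCString(String str) {
        ByteBuffer encoded;
        try {
            // encode(CharBuffer) sizes its own buffer, so no length guess is needed.
            encoded = StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(str));
        } catch (CharacterCodingException e) {
            // Raised for malformed input such as an unpaired surrogate.
            throw new IllegalArgumentException("Malformed UTF-16 input", e);
        }
        int n = encoded.remaining();      // number of encoded bytes
        byte[] b = new byte[n + 1];       // b[n] is already 0
        encoded.get(b, 0, n);             // second copy: the price of the unknown length
        return b;
    }
}
```

Note that the result length is the encoded byte count plus one, not str.length() + 1.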
7

You can use str.getBytes("ISO-8859-1") with a little trick at the end:

byte[] stringBytes = str.getBytes("ISO-8859-1");
byte[] ntBytes = new byte[stringBytes.length + 1];
System.arraycopy(stringBytes, 0, ntBytes, 0, stringBytes.length);

arraycopy is relatively fast because it can use native tricks and optimizations in many cases. A new array is filled with zero bytes, so every position we didn't overwrite (basically just the last byte) already holds the terminator.

ntBytes is the array you need.
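
A slightly shorter variant of the same idea, offered as a sketch rather than a measured improvement: Arrays.copyOf allocates the longer array and zero-pads it in one call, and the getBytes(Charset) overload with StandardCharsets.ISO_8859_1 (Java 7+) avoids the checked UnsupportedEncodingException. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CopyOfCString {
    // Hypothetical helper: ISO-8859-1 bytes of str followed by a single 0 byte.
    static byte[] toCString(String str) {
        byte[] raw = str.getBytes(StandardCharsets.ISO_8859_1);
        // copyOf pads the extra position with 0, which serves as the terminator.
        return Arrays.copyOf(raw, raw.length + 1);
    }
}
```

Like the original, this still copies the bytes twice, and getBytes silently replaces characters outside ISO-8859-1 with '?' instead of reporting them.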
