How can I replace two newline characters \n\n with a null \0 character in bash, and vice-versa?

I see that tr can replace a single newline with null, but not two consecutive newline characters. I tried sed, but it doesn't seem possible: sed seems "line based" and doesn't like messing with newline characters.

My reason for wanting to do this is to be able to sort a file that has records separated by empty lines using sort -z. For example, given a file

record b
foo bar

record a
zee bee
dee da

I want to pipe that file into a transform that replaces blank lines with null, then into sort -z, and then replace null with blank lines, and finally have it spit out

record a
zee bee
dee da

record b
foo bar

2 Answers

Preliminary note

The title asks about your attempted solution; the real problem is described in the question body (compare: XY problem). This answer doesn't exactly "replace empty lines with null and vice-versa", so it doesn't solve the title. Instead it injects null bytes in the right places and later removes them (because I think it's easier), which solves the problem from the body.


Solution

Let's adjust this other answer of mine: Sort packs of lines alphabetically. Your question is almost a duplicate of the one I answered.

There, each record starts with a header like [ProfileX]. In your case we can treat an empty line as such a header, but the very first record lacks one. To use the linked solution we need to add the missing first header beforehand and remove the leading header at the end.

sed -e '1 s/^/\n/' -e '1 ! s/^$/\x00/' | sort -z | tr -d '\0' | sed -e '1 d'
#   ^^^^^^^^^^^^^^                                                           add missing header
#                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                apply the other answer (adjusted)
#                                                               ^^^^^^^^^^^^ remove excessive header

Without this modification the first record from the input would appear in the output without any empty line before it; this would be a problem, unless it's the first record in the output. And the first record in the output would have its header (if any) before it; this would be a problem unless it was the first record of the input (the one without a header).
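To try the pipeline on the sample from the question, it can be wrapped in a function. This is only a sketch: it assumes GNU sed (for \x00 in the replacement) and a sort that supports -z, and the name sort_records is made up for this illustration.

```shell
#!/bin/bash
# Sketch only: assumes GNU sed and a sort with -z support.
# sort_records is a hypothetical wrapper name, not part of the pipeline itself.
sort_records() {
  sed -e '1 s/^/\n/' -e '1 ! s/^$/\x00/' | sort -z | tr -d '\0' | sed -e '1 d'
}

# Sample input from the question: two records separated by one empty line.
printf 'record b\nfoo bar\n\nrecord a\nzee bee\ndee da\n' | sort_records
```

On a GNU system this should print the "record a" block first, then an empty line, then the "record b" block.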

  • Thanks! For me, sed -e 's/^$/\x00/' replaces empty lines with the character sequence "x00" instead of a null character. Don't all versions of sed understand that escape sequence? Commented Oct 23, 2020 at 18:34
  • @ChristopherKing Not all versions. This answer, like the other one, was tested on Debian. I have GNU sed there. Commented Oct 23, 2020 at 18:39
  • I tested on OSX, so I have BSD sed, and apparently it doesn't support hex escape sequences: riptutorial.com/sed/topic/9436/… So it looks like my problem is that I can't even express a null character on OSX! I have a crazy solution that uses base64 to encode each record onto a single line, prefixed with the plain text I'm sorting on. I guess I'll just have to go with that. Commented Oct 23, 2020 at 20:15

You don't need to struggle with 0 as a delimiter. Let's use 255 (hex FF) instead:

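#!/bin/bash

ORIGINAL=/path/to/yourfile.txt
SORTED=${ORIGINAL}.sorted
FF=$'\xff'

# Join each record onto a single line, using byte 0xFF in place of its newlines.
# IFS= and -r keep whitespace and backslashes intact; the || test keeps a last
# line that has no trailing newline.
while IFS= read -r LINE || [ -n "$LINE" ]; do
  if [ -z "$LINE" ]; then
    echo                            # empty line: end the current record
  else
    printf '%s%s' "$LINE" "$FF"     # append the line plus the 0xFF marker
  fi
# LC_ALL=C makes sort and tr work on raw bytes, so 0xFF is not rejected
# as an invalid character in UTF-8 locales.
done <"$ORIGINAL" | LC_ALL=C sort | LC_ALL=C tr "$FF" '\n' >"$SORTED"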

Result:

record a
zee bee
dee da

record b
foo bar

Notice: The above assumes that your line endings are Unix-style (LF), not Windows (CR+LF) or classic Mac OS (CR). If you want it to work with any kind of line endings, then we'll have to rework the script a little.
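One possible rework, as a rough sketch rather than a tested extension of the script: normalize the line endings first and feed the result to the loop. The helper name normalize_eol is hypothetical, and the sketch assumes bash (for the $'...' quoting that passes a literal CR to sed).

```shell
#!/bin/bash
# Sketch only: normalize_eol is a hypothetical helper name.
# Step 1 (sed): drop a CR that sits directly before a line's LF (CR+LF -> LF).
# Step 2 (tr):  turn any CR that is still left into LF (lone CR -> LF).
normalize_eol() {
  sed $'s/\r$//' | tr '\r' '\n'
}

# Example: CR+LF input with the same record layout as in the question.
printf 'record b\r\nfoo bar\r\n\r\nrecord a\r\n' | normalize_eol
```

The loop would then read from normalize_eol through a pipe instead of reading the file directly.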
