105

I need to periodically run a command that ensures some text files keep Unix (LF) line endings. Unfortunately, dos2unix always rewrites the file, which would change the file's and folder's timestamps and cause unnecessary writes.

The script I'm writing is in Bash, so I'd prefer Bash-based answers.


11 Answers

100

Note: This is meant for a quick manual check and should not be used in automation scripts. For automation, I'd suggest you look at the other answers.

Use cat -A

$ cat file
hello
hello

Now if this file was made in *NIX systems, it would display

$ cat -A file
hello$
hello$

But if this file was made in Windows, it would display

$ cat -A file
hello^M$
hello

^M represents CR, and $ marks the end of each line (the LF).

I created the files with Notepad on Windows and Vim on Linux. Notice that Notepad did not terminate the last line with CRLF.

This does not change the file contents either.
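To reproduce the Windows output above without Notepad, here is a minimal sketch. It assumes cat -ve (the more portable spelling of cat -A mentioned in the comments); show_endings is a hypothetical wrapper name, not a standard command.

```bash
# show_endings is a hypothetical helper, not part of any package.
show_endings() {
    # -v prints CR as ^M; -e additionally marks every LF with $.
    cat -ve -- "$1"
}

# Build a sample like the Notepad file: CRLF after the first line,
# no terminator at all after the last line.
sample=$(mktemp)
printf 'hello\r\nhello' > "$sample"
show_endings "$sample"
rm -f -- "$sample"
```

This should print hello^M$ followed by a bare hello, matching the Windows example above.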

  • 3
    +1 By far the best answer. No dependencies, no complicated bash scripts. Just -A to cat. One tip though would be to use cat -A file | less if the file is too big. I'm sure it's not uncommon to have to check file endings for a particularly long file. (Press q to leave less) Commented Aug 2, 2019 at 20:15
  • 6
    I think cat -ve can be used as well and is more portable.
    – Oskar Skog
    Commented Jan 28, 2020 at 18:52
  • If you use Rust's version of cat (bat), then it's even more clear visually.
    – psygo
    Commented Feb 21, 2021 at 19:19
  • 3
    Thanks, @Oskar, indeed cat -A did not work on macOS but cat -ve was fine.
    – user118967
    Commented Mar 11, 2021 at 7:39
  • If anybody has bat (github.com/sharkdp/bat, a handy cat/less replacement to have anyway), then bat -A myfile actually makes this very nice to read, by displaying formatted CR and LF markers as applicable. Commented Aug 30, 2022 at 7:35
60

You can use dos2unix as a filter and compare its output to the original file:

dos2unix < myfile.txt | cmp - myfile.txt
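For the periodic job in the question, the comparison can gate the actual conversion. The following is only a sketch under assumptions: convert_if_needed is a hypothetical name, and cmp -s is used so that only the exit status matters.

```bash
# convert_if_needed is a hypothetical helper name.
# It rewrites the file only when dos2unix would actually change it.
convert_if_needed() {
    # cmp -s prints nothing and exits 0 when the filtered stream
    # is byte-for-byte identical to the file on disk.
    if dos2unix < "$1" | cmp -s - "$1"; then
        return 0    # already in Unix format: file and timestamps untouched
    fi
    dos2unix -- "$1"
}
```

Usage: convert_if_needed myfile.txt. A file that is already in Unix format is only read, never written.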
  • 3
    Very smart and useful, because it tests the complete file and not only the first line or a few lines.
    – halloleo
    Commented May 19, 2015 at 0:04
33

If the goal is just to avoid affecting the timestamp, dos2unix has a -k or --keepdate option which will keep the timestamp the same. It will still have to do a write to make the temporary file and rename it, but your timestamps will not be affected.

If any modification of the file is unacceptable, you can use the following solution from this answer.

find . -not -type d -exec file "{}" ";" | grep CRLF

The file command will output CRLF if it detects such line terminations.

  • 1
    Do you mean you literally write CRLF as 4 characters C, R, L and F?
    – bodacydo
    Commented Dec 3, 2015 at 3:19
  • 7
    Do you also mean that grep can take CR and LF just like that?
    – bodacydo
    Commented Dec 3, 2015 at 3:19
  • @bodacydo It's explained in the answer he links to, and now also in Scott's edit of BertS' answer here unix.stackexchange.com/a/79708/59699 . Commented Dec 3, 2015 at 5:14
  • @dave_thompson_085 I don't see an explanation. It only mentions CRLF but doesn't explain what it is.
    – bodacydo
    Commented Dec 3, 2015 at 6:05
  • 2
    @bodacydo stackoverflow.com/questions/73833/… says that find ... -exec file ... | grep CRLF for a file with DOS line endings (i.e. bytes 0D 0A) "will get you something like: ./1/dos1.txt: ASCII text, with CRLF line terminators". As you can see, this contains the actual string CRLF and is therefore matched by grep looking for the simple string CRLF. Commented Dec 4, 2015 at 8:40
30

Since version 7.1, dos2unix has an -i, --info option to get information about line breaks. Since version 7.3.5 (July 2017), the -i option supports a 0 flag to use a null separator. You can use dos2unix itself to test which files need conversion.

For example, assuming GNU xargs:

dos2unix -ic0 -- *.txt | xargs -r0 dos2unix --

would only convert the files that need converting (reported as such by dos2unix -ic).

  • Here is a link to the changelog itself: waterlan.home.xs4all.nl/dos2unix/NEWS.txt Commented Sep 23, 2015 at 14:16
  • 2
    The fields output by that flag have no headers, but they are: number of DOS line breaks, number of Unix line breaks, number of Mac line breaks, byte order mark, text or binary, file name. So if the first column is nonzero then you have some DOS linebreaks.
    – Migwell
    Commented May 1, 2023 at 1:13
28

You could try to grep for the CR (carriage return) character, in octal:

grep -U $'\015' myfile.txt

or hex:

grep -U $'\x0D' myfile.txt
  • Of course, the assumption is that this is a text file.
    – mdpc
    Commented Jun 17, 2013 at 17:24
  • 2
    I like this grep usage because it allows me to easily list all such files in the directory with grep -lU $'\x0D' * and pass the output to xargs.
    – Melebius
    Commented Apr 30, 2015 at 6:19
  • What's the meaning of the $ before the search pattern? @don_crissti
    – fersarr
    Commented Oct 31, 2017 at 10:07
  • 1
    @fersarr - unix.stackexchange.com/a/401451/22142 Commented Oct 31, 2017 at 11:03
  • 2
    This only tests for a CR, not necessarily a CR-LF sequence.
    – Jim L.
    Commented Oct 25, 2021 at 20:27
21

First method (grep):

Count the lines that contain a carriage return:

[[ $(grep -c $'\r' myfile.txt) -gt 0 ]] && echo dos

Count the lines that end with a carriage return:

[[ $(grep -c $'\r$' myfile.txt) -gt 0 ]] && echo dos

These will typically be equivalent; a carriage return in the interior of a line (i.e., not at the end) is rare.

More efficient:

grep -q $'\r' myfile.txt && echo dos

This is more efficient

  1. because it doesn't need to convert the count to an ASCII string, and then convert that string back to an integer, and compare it to zero, and
  2. because grep -c needs to read the entire file, to count all the occurrences of the pattern, while grep -q can exit upon seeing the first occurrence of the pattern.

Notes:

  • Throughout the above, you may need to add the -U option (i.e., use -cU or -qU), because GNU grep guesses whether the file is a text file.  If it thinks the file is text, it ignores carriage returns at the ends of lines, in an attempt to make $ in regular expressions work "correctly" — even if the regular expression is \r$!  Specifying -U (or --binary) overrules this guesswork, causing grep to treat the file(s) as binary and pass the data to the matching mechanism verbatim, with CR-endings intact.
  • Do not do grep … $'\r\n' myfile.txt, because grep treats \n as a pattern delimiter.  Just as grep -E 'foo|' looks for lines containing foo or a null string, grep $'\r\n' looks for lines containing \r or a null string, and every line matches a null string.
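
Putting the notes above together, here is a minimal sketch of a reusable test. The name has_crlf is hypothetical. The CR is produced with printf so the function also runs under plain sh; in Bash, the pattern is equivalent to $'\r$'.

```bash
# has_crlf is a hypothetical helper name.
# Exit status 0 if at least one line ends in CR, non-zero otherwise.
has_crlf() {
    # -q: stop at the first match and print nothing.
    # -U: have GNU grep treat the data as binary so the CR is kept.
    # The pattern is a literal CR followed by the end-of-line anchor.
    grep -qU -- "$(printf '\r')\$" "$1"
}

# Example use in a loop (commented out so the sketch has no side effects):
# for f in *.txt; do has_crlf "$f" && echo "$f needs dos2unix"; done
```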

Second method (file):

[[ $(file myfile.txt) =~ CRLF ]] && echo dos

because file reports something like:

myfile.txt: UTF-8 Unicode text, with CRLF line terminators

Safer variant:

[[ $(file -b - < myfile.txt) =~ CRLF ]] && echo dos

where

  • file -b outputs only the file type, and not the file name.  Without this, a file whose name included the characters CRLF would trigger a false positive.
  • file - < filename works even if filename begins with -. See Bash script: check if a file is a text file.

Beware that checking the output from file might not work in a non-English locale.

  • 1
    You can replace "$(echo -e '\r')" with the much simpler $'\r', although personally I'd use $'\r\n' to reduce the number of false positives.
    – rici
    Commented Jun 17, 2013 at 17:03
  • @rici grep $'\r\n' seems to match all files on my system...
    – depquid
    Commented Jun 17, 2013 at 17:09
  • @rici: good catch. I edited my answer according to your suggestion. — depquid: Maybe you are on Windows? :-) rici's tip works here.
    – BertS
    Commented Jun 17, 2013 at 17:11
  • @depquid (and BertS): Actually, I think the correct invocation is grep -U $'\r$', to prevent grep trying to second-guess line-endings.
    – rici
    Commented Jun 17, 2013 at 17:18
  • Also, you can use -q to just set the return code if a match is found, instead of -c which requires an additional check. Personally I like your second solution, although it's highly dependent on the whims of file and might not work in a non-English locale.
    – rici
    Commented Jun 17, 2013 at 17:22
9

Use file:

$ file README.md
README.md: ASCII text, with CRLF line terminators

$ dos2unix README.md
dos2unix: converting file README.md to Unix format...

$ file README.md
README.md: ASCII text
  • This idea has been discussed much more thoroughly in two previous answers. Commented May 10, 2018 at 15:57
  • You can use the -k switch of the file command to check the complete file. Here's the excerpt from the help: "-k, --keep-going  don't stop at the first match". Commented May 25, 2020 at 11:17
7

A Bash function for you:

# return 0 (true) if first line ends in CR
isDosFile() {
    [[ $(head -1 "$1") == *$'\r' ]]  
}

Then you can do stuff like

streamFile () {
    if isDosFile "$1"; then
        sed 's/\r$//' "$1"
    else
        cat "$1"
    fi
}

streamFile /tmp/foo.txt | process_lines_without_CR
  • 3
    You don't have to use isDosFile() in your example: streamFile() { sed 's/\r$//' "$1" ; }.
    – user26112
    Commented Jun 17, 2013 at 18:59
  • 1
    I think this is the most elegant solution; it doesn't read whole file, just the first line. Commented Jun 22, 2013 at 20:50
5

If a file has DOS/Windows-style CR-LF line endings, then if you look at it using a Unix-based tool you'll see CR ('\r') characters at the end of each line.

This command:

grep -l '^M$' filename

will print filename if the file contains one or more lines with Windows-style line endings, and will print nothing if it doesn't. Note that the ^M has to be a literal carriage return character, typically entered in the terminal by typing Ctrl+V followed by Enter (or Ctrl+V and then Ctrl+M). The bash shell lets you write a literal carriage return as $'\r' (documented here), so you can write:

grep -l $'\r$' filename

Other shells may provide a similar feature.

You can use another tool instead:

awk '/\r$/ { exit(1) }' filename

This will exit with a status of 1 (setting $? to 1) if the file contains any Windows-style line endings, and with a status of 0 if it doesn't, making it useful in a shell if statement (note the lack of [ brackets ]):

if awk '/\r$/ { exit(1) }' filename ; then
    echo filename has Unix-style line endings
else
    echo filename has at least one Windows-style line ending
fi

A file can contain a mixture of Unix-style and Windows-style line endings. I'm assuming here that you want to detect files that have any Windows-style line endings.
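
If you also want to tell fully converted files apart from mixed ones, the same awk pattern can count both kinds of lines. This is only a sketch: classify_endings is a hypothetical name, a final line without any terminator is counted as Unix-style, and an empty file is reported as unix.

```bash
# classify_endings is a hypothetical helper name.
# Prints "dos", "unix", or "mixed" for the file given as $1.
classify_endings() {
    awk '
        /\r$/ { crlf++; next }   # line ends in CR right before the LF
              { lf++ }           # every other line
        END {
            if (crlf && lf) print "mixed"
            else if (crlf)  print "dos"
            else            print "unix"
        }
    ' < "$1"
}
```

Reading the file through a redirection keeps awk from interpreting an unusual file name (for example one containing =) as a variable assignment.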

  • 1
    You can encode a carriage return on the command line in bash (and some other shells) by typing $'\r', as mentioned in other answers to this question. Commented Dec 2, 2015 at 6:41
  • +1 for insisting that the CR must immediately precede the LF to qualify as a CR-LF. Also, note that grep -q $'\r''$' filename is equally useful in if statements, although the test polarity is reversed from your awk example. if grep -q $'\r''$' filename ; then echo DOS-style; else echo UNIX-style; fi
    – Jim L.
    Commented Oct 25, 2021 at 20:39
3

I have been using

cat -v filename.txt | diff - filename.txt

which seems to work. I find the output a little easier to read than

dos2unix < filename.txt | diff - filename.txt

It is also useful if you can't install dos2unix for some reason.

3

file tells you the line endings, but only if they are not Unix-style:

❯ echo "hello1\nhello2\n" > hello-unix.txt
❯ cp hello-unix.txt hello-dos.txt
❯ cp hello-unix.txt hello-mac.txt
❯ unix2dos hello-dos.txt
unix2dos: converting file hello-dos.txt to DOS format...
❯ unix2mac hello-mac.txt
unix2mac: converting file hello-mac.txt to Mac format...
❯ file hello-unix.txt
hello-unix.txt: ASCII text
❯ file hello-dos.txt
hello-dos.txt: ASCII text, with CRLF line terminators
❯ file hello-mac.txt
hello-mac.txt: ASCII text, with CR line terminators

So:

  • If file reports "CRLF line terminators", the file is DOS-style
  • If file reports "CR line terminators", the file is Mac-style
  • If file doesn't mention line terminators, the file is Unix-style
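
The three rules above can be sketched as a small function. The name line_ending_style is hypothetical, and the exact wording of file's description may vary between versions and locales, so treat the patterns as assumptions. Mixed line endings and binary files are not handled specially here and fall through to the last branch.

```bash
# line_ending_style is a hypothetical helper name.
# Prints "dos", "mac", or "unix" based on the description from file.
line_ending_style() {
    # -b omits the file name; reading stdin avoids trouble with
    # names that start with a dash.
    case $(file -b - < "$1") in
        *CRLF*)                    echo dos  ;;
        *"CR line terminators"*)   echo mac  ;;
        *)                         echo unix ;;
    esac
}
```

Usage: [ "$(line_ending_style myfile.txt)" = dos ] && echo "needs dos2unix".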
  • I think this is the simplest answer to the original question.
    – PRouleau
    Commented Apr 30, 2022 at 18:10
