128

If I have really long output from a command (single line) but I know I only want the first [x] (let's say 8) characters of the output, what's the easiest way to get that? There aren't any delimiters.

0

8 Answers

174

One way is to use cut:

 command | cut -c1-8

This will give you the first 8 characters of each line of output. Since cut is part of POSIX, it is likely to be on most Unices.
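For example (with an echo standing in for the long-output command):

```shell
# Placeholder command whose single-line output is longer than 8 characters.
echo 'abcdefghijklmnopqrstuvwxyz' | cut -c1-8
# → abcdefgh
```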

5
  • 8
    Note that cut -c selects characters; cut -b or head -c selects bytes. This makes a difference in some locales (in practice, when using UTF-8). Commented Oct 24, 2010 at 22:07
  • 1
    You also don't have to specify the start index in this case. Saying cut -c-8 will select from character 1 to 8.
    – Sparhawk
    Commented May 9, 2014 at 5:08
  • @Steven, cut's equivalent on Windows is?
    – Pacerier
    Commented Aug 25, 2015 at 13:06
  • Also command | dd bs=8 count=1 2>/dev/null. Not saying it's shorter or superior. Just another alternative.
    – dubiousjim
    Commented Sep 24, 2015 at 3:50
  • 1
    @Gilles, but note that with current versions of GNU cut, cut -c works like cut -b (that is, it doesn't work properly for multi-byte characters). Commented Aug 9, 2016 at 13:49
52

These are some other ways to get only the first 8 characters.

command | head -c8

command | awk '{print substr($0,1,8);exit}' 

command | sed 's/^\(........\).*/\1/;q'

And if you have bash:

var=$(command)
echo ${var:0:8}
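A quick sanity check that these approaches agree, using a placeholder string in place of the command's output (the last two lines need bash):

```shell
s='abcdefghijklmnop'            # placeholder for the command's output
echo "$s" | head -c8; echo      # head -c8 itself prints no trailing newline
echo "$s" | awk '{print substr($0,1,8);exit}'
var=$s
echo "${var:0:8}"
# all three print: abcdefgh
```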
2
  • 3
    I think the following sed formulation is a bit easier to read: command | sed 's/\(.\{8\}\).*/\1/' or if your sed supports it: command | sed -r 's/(.{8}).*/\1/'; Otherwise, +1
    – Steven D
    Commented Oct 24, 2010 at 4:48
  • 2
    Good stuff, but note that head -c counts bytes, not characters. Similarly, among the major Awk implementations, only GNU awk handles multi-byte characters correctly - FreeBSD Awk and Mawk do not.
    – mklement0
    Commented Jul 5, 2015 at 17:30
16

Another one-liner solution, using shell parameter expansion:

echo ${word:0:x}

E.g., with word="Hello world":
echo ${word:0:3} or echo ${word::3}
output: Hel

E.g. 2: with word="Hello world":
echo ${word:1:3}
output: ell
5
  • 1
    You can also use a variable holding the length, e.g.: x=8; echo ${word:0:$x} instead of hard-coding the integer.
    – Cometsong
    Commented Apr 25, 2019 at 14:58
  • worth noting this will not be possible in ksh88, only 93 Commented Apr 29, 2020 at 1:45
  • 1
    @Cometsong Testing with the Bash shell that came with "Git for Windows", it looks like you don't need to prefix x with the $ sign in this case: x=8; echo ${word:0:x} will work the same.
    – AJM
    Commented Mar 26, 2021 at 11:10
  • The source you're linking to does not describe the construct you used.
    – zrajm
    Commented Mar 12, 2023 at 19:44
  • @zrajm Corrected the link. Commented Sep 14, 2023 at 8:53
4

If you have a sufficiently advanced shell (for example, the following will work in Bash, not sure about dash), you can do:

read -n8 -d$'\0' -r < <(command)

After executing read ... < <(command), your characters will be in the shell variable REPLY. Type help read to learn about the other options.

Explanation: the -n8 argument to read says that we want up to 8 characters. The -d$'\0' says to read until a null, rather than to a newline. This way the read will continue for 8 characters even if one of the earlier characters is a newline (but not if it's a null). An alternative to -n8 -d$'\0' is to use -N8, which reads exactly 8 characters or until stdin reaches EOF; no delimiter is honored. That probably fits your needs better, but I don't know offhand how many shells have a read that honors -N as opposed to honoring -n and -d. Continuing with the explanation: -r says to ignore \-escapes, so that, for example, we treat \\ as two characters, rather than as a single \.

Finally, we do read ... < <(command) rather than command | read ... because in the second form, the read is executed in a subshell which then immediately exits, losing the information you just read.
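A minimal sketch of the difference (bash; the sample string is a placeholder):

```shell
# The read runs in a subshell, so REPLY is empty in the parent afterwards.
echo abcdefghijklm | read -rn8 -d ''
echo "after pipe:    <$REPLY>"           # prints: after pipe:    <>

# With a process substitution, read runs in the current shell.
read -rn8 -d '' < <(echo abcdefghijklm)
echo "after procsub: <$REPLY>"           # prints: after procsub: <abcdefgh>
```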

Another option is to do all your processing inside the subshell. For example:

$ echo abcdefghijklm | { read -n8 -d$'\0' -r; printf "REPLY=<%s>\n" "$REPLY"; }
REPLY=<abcdefgh>
7
  • 1
    If you just want to output the 8 chars, and don't need to process them in the shell, then just use cut.
    – dubiousjim
    Commented Sep 8, 2012 at 14:04
Good to know about read -n <num>; small caveat: Bash 3.x (still current on OS X) mistakenly interprets <num> as a byte count and thus fails with multi-byte characters; this has been fixed in Bash 4.x.
    – mklement0
    Commented Jul 6, 2015 at 1:41
  • This is a great and useful answer. Much more general than the others.
    – not2qubit
    Commented Oct 25, 2019 at 10:08
On my git bash, I have the "-N" flag, which reads exactly N chars until EOF or timeout. Isn't that what you're trying to achieve with your "-d" flag? Commented May 3, 2022 at 8:16
  • @Itération122442 yes but as I wrote "I don't know offhand how many shells have a read that honors -N as opposed to honoring -n and -d."
    – dubiousjim
    Commented May 4, 2022 at 9:43
2

This is portable:

a="$(command)"             # Get the output of the command.
b="????????"               # As many ?s as characters needed (8 here).
echo "${a%"${a#$b}"}"      # Select that many characters from $a.

Building a string of ?s of variable length has its own question here.
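A worked example of the trick, with eight ?s for eight characters and a placeholder string standing in for the command output:

```shell
a='abcdefghijklmnop'        # placeholder for "$(command)"
b='????????'                # eight ?s match eight characters
# ${a#$b} is $a minus its first 8 characters (ijklmnop);
# stripping that as a suffix leaves the first 8 characters.
echo "${a%"${a#$b}"}"
# → abcdefgh
```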

2

I had this problem when manually generating checksum files in a maven repository. Unfortunately, cut -c always prints a newline at the end of its output. To suppress that, I use xxd:

command | xxd -l$BYTES | xxd -r

It outputs exactly $BYTES bytes, unless the command's output is shorter, in which case it outputs exactly that output.
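For example (assuming xxd is available, with a placeholder printf standing in for the command):

```shell
BYTES=8
# xxd -l limits the hex dump to the first $BYTES bytes;
# xxd -r converts it back to raw bytes, with no added newline.
printf 'abcdefghijklmnop\n' | xxd -l "$BYTES" | xxd -r
# → abcdefgh (no trailing newline)
```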

1
another method to take off cut's trailing newline is to pipe it into: | tr -d '\n'
    – Cometsong
    Commented Apr 25, 2019 at 15:00
1

How to consider Unicode + UTF-8

Let's do a quick test for those interested in Unicode characters rather than just bytes. Each character of áéíóú (acute accented vowels) is made up of two bytes in UTF-8. With:

printf 'áéíóú' | LC_CTYPE=en_US.UTF-8 awk '{print substr($0,1,3);exit}'
printf 'áéíóú' | LC_CTYPE=C awk '{print substr($0,1,3);exit}'
printf 'áéíóú' | LC_CTYPE=en_US.UTF-8 head -c3
echo
printf 'áéíóú' | LC_CTYPE=C head -c3
echo

we get:

áéí
á
á
á

so we see that only awk + LC_CTYPE=en_US.UTF-8 considered the UTF-8 characters. The other approaches took only three bytes. We can confirm that with:

printf 'áéíóú' | LC_CTYPE=C head -c3 | hd

which gives:

00000000  c3 a1 c3                                          |...|
00000003

and the c3 by itself is an incomplete UTF-8 sequence that does not show up on the terminal, so we saw only á.

awk + LC_CTYPE=en_US.UTF-8 actually returns 6 bytes (3 characters × 2 bytes each), however.

We could also have equivalently tested with:

printf '\xc3\xa1\xc3\xa9\xc3\xad\xc3\xb3\xc3\xba' | LC_CTYPE=en_US.UTF-8 awk '{print substr($0,1,3);exit}'

and if you want a general parameter:

n=3
printf 'áéíóú' | LC_CTYPE=en_US.UTF-8 awk "{print substr(\$0,1,$n);exit}"

Question more specific about Unicode + UTF-8: https://superuser.com/questions/450303/unix-tool-to-output-first-n-characters-in-an-utf-8-encoded-file

Related: https://stackoverflow.com/questions/1405611/how-to-extract-the-first-two-characters-of-a-string-in-shell-scripting

Tested on Ubuntu 21.04.

1

With zsh, you can do:

cmd | read -u0 -k4 -e

read will read as many bytes as needed to read 4 characters (-k was initially for key, but with -u specifying a file descriptor it reads characters from there instead of key presses from the terminal) and echoes (-e) those characters on stdout. You can change -e to a variable name to read those characters into a variable.

ksh93 later added (in ksh93o in 2003, while zsh's -k dates from the 90s) a -N option as an equivalent of zsh's -k, which was later copied by bash (though with some differences; see below). It doesn't have an equivalent of -e, though.

cmd | read -N4 var

Unlike zsh, ksh93 cannot store NUL characters in its variables and, more generally, will fail in random ways if there are NULs in its input.

Now, besides -N, ksh93 also has a -n x option which reads up to x characters from a line, and the record delimiter can be changed with -d, and with recent versions of ksh93u+m, -d '' is for NUL-delimited records.

So:

cmd | read -d '' -n 4

fails in a less random way if the input contains NUL characters: it just stops at the first NUL.

Now, bash copied all of -n, -N, -d (including -d '') from ksh93 but with important differences:

  • it still does backslash processing when -n/-N are specified, so you need -r to work around it as usual.
  • it still does IFS processing, which you need to work around by calling it as IFS= read... as usual
  • it skips all NUL characters in its input
  • by default, the last component of a pipeline also runs in a subshell, which you can work around by using a redirection to a process substitution.

So in bash, you'd do:

IFS= read -rN4 var < <(cmd)

to read the first 4 non-NUL characters of the output of cmd (if cmd is cat /dev/zero, it will never return). And:

IFS= read -d '' -rn4 var < <(cmd)

To read 4 characters up to the first NUL.
