42

Given a hostname in format of aaa0.bbb.ccc, I want to extract the first substring before ., that is, aaa0 in this case. I use following awk script to do so,

echo aaa0.bbb.ccc | awk '{if (match($0, /\./)) {print substr($0, 0, RSTART - 1)}}'

While the script running on one machine A produces aaa0, running on machine B produces only aaa, without 0 in the end. Both machine runs Ubuntu/Linaro, but A runs newer version of awk(gawk with version 3.1.8 while B with older awk (mawk with version 1.2)

I am asking in general, how to write a compatible awk script that performs the same functionality ...

7 Answers 7

82

You just want to set the field separator as . using the -F option and print the first field:

$ echo aaa0.bbb.ccc | awk -F'.' '{print $1}'
aaa0

Same thing but using cut:

$ echo aaa0.bbb.ccc | cut -d'.' -f1
aaa0

Or with sed:

$ echo aaa0.bbb.ccc | sed 's/[.].*//'
aaa0

Even grep:

$ echo aaa0.bbb.ccc | grep -o '^[^.]*'
aaa0
2
  • how would you get bbb with grep?
    – red888
    Commented May 21, 2019 at 23:00
  • in the sed case, you can just escape the . i.e. echo aaa0.bbb.ccc | sed 's/\..*//' Commented Nov 5, 2021 at 15:04
7

I am asking in general, how to write a compatible awk script that performs the same functionality ...

To solve the problem in your quesiton is easy. (check others' answer).

If you want to write an awk script, which portable to any awk implementations and versions (gawk/nawk/mawk...) it is really hard, even if with --posix (gawk)

for example:

  • some awk works on string in terms of characters, some with bytes
  • some supports \x escape, some not
  • FS interpreter works differently
  • keywords/reserved words abbreviation restriction
  • some operator restriction e.g. **
  • even same awk impl. (gawk for example), the version 4.0 and 3.x have difference too.
  • the implementation of certain functions are also different. (your problem is one example, see below)

well all the points above are just spoken in general. Back to your problem, you problem is only related to fundamental feature of awk. awk '{print $x}' the line like that will work all awks.

There are two reasons why your awk line behaves differently on gawk and mawk:

  • your used substr() function wrongly. this is the main cause. you have substr($0, 0, RSTART - 1) the 0 should be 1, no matter which awk do you use. awk array, string idx etc are 1-based.

  • gawk and mawk implemented substr() differently.

5

Or just use cut:

echo aaa0.bbb.ccc | cut -d'.' -f1
2

You don't need awk for this...

echo aaa0.bbb.ccc | cut -d. -f1
cut -d. -f1 <<< aaa0.bbb.ccc

echo aaa0.bbb.ccc | { IFS=. read a _ ; echo $a ; }
{ IFS=. read a _ ; echo $a ; } <<< aaa0.bbb.ccc 

x=aaa0.bbb.ccc; echo ${x/.*/}

Heavier options:

sed:
echo aaa0.bbb.ccc | sed 's/\..*//'
sed 's/\..*//' <<< aaa0.bbb.ccc 
awk:
echo aaa0.bbb.ccc | awk -F. '{print $1}'
awk -F. '{print $1}' <<< aaa0.bbb.ccc 
0
2

You do not need any external command at all, just use Parameter Expansion in bash:

hostname=aaa0.bbb.ccc
echo ${hostname%%.*}
1

if you don't want to change the input field separator, then it's possible to use split function:

echo "some aaa0.bbb.ccc text" | awk '{split($2, a, "."); print a[1]}'

documentation:

split(string, array [, fieldsep [, seps ] ])
    Divide string into pieces separated by fieldsep 
    and store the pieces in array and the separator 
    strings in the seps array.
0

awk is still the cleanest approach :

mawk NF=1 FS='[.]' <<< aaa0.bbb.ccc
aaa0

If there's stuff before or after :

mawk ++NF FS='[.].+$|^[^ ]* ' OFS= <<< 'some aaa0.bbb.ccc text'
mawk '$!NF=$2' FS='[ .]'           <<< 'some aaa0.bbb.ccc text'
aaa0

Not the answer you're looking for? Browse other questions tagged or ask your own question.