Use Awk to extract substring

Question

Given a hostname in the format aaa0.bbb.ccc, I want to extract the first substring before the ., that is, aaa0 in this case. I use the following awk script to do so:

echo aaa0.bbb.ccc | awk '{if (match($0, /\./)) {print substr($0, 0, RSTART - 1)}}'

The script produces aaa0 on machine A, but on machine B it produces only aaa, without the trailing 0. Both machines run Ubuntu/Linaro, but A runs a newer awk (gawk version 3.1.8) while B runs an older one (mawk version 1.2).

I am asking in general, how to write a compatible awk script that performs the same functionality ...

Chris Seymour · Accepted Answer · 2013-04-16 15:09:19Z

82

You just want to set the field separator as . using the -F option and print the first field:

$ echo aaa0.bbb.ccc | awk -F'.' '{print $1}'
aaa0

Same thing but using cut:

$ echo aaa0.bbb.ccc | cut -d'.' -f1
aaa0

Or with sed:

$ echo aaa0.bbb.ccc | sed 's/[.].*//'
aaa0

Even grep:

$ echo aaa0.bbb.ccc | grep -o '^[^.]*'
aaa0

how would you get bbb with grep?
– red888
Commented May 21, 2019 at 23:00
in the sed case, you can just escape the . i.e. echo aaa0.bbb.ccc | sed 's/\..*//'
– Alexander Cska
Commented Nov 5, 2021 at 15:04


Kent · Accepted Answer · 2013-04-16 15:43:47Z

I am asking in general, how to write a compatible awk script that performs the same functionality ...

Solving the problem in your question is easy (see the other answers).

Writing an awk script that is portable to every awk implementation and version (gawk/nawk/mawk...) is really hard, even with gawk's --posix option.

For example:

some awks operate on strings in terms of characters, others in terms of bytes
some support the \x escape, some do not
FS is interpreted differently
restrictions on abbreviating keywords/reserved words differ
some operators are restricted, e.g. **
even within one implementation (gawk, for example), versions 4.0 and 3.x differ
the implementations of certain functions also differ (your problem is one example, see below)
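
As one concrete illustration of the operator point, here is a small sketch. It assumes only that ^ is the exponentiation operator defined by POSIX awk, while ** is an extension that not every implementation accepts. The function name power_of_two is hypothetical, used just for the example:

```shell
# Hypothetical helper: compute 2 to the power of its argument.
# Uses ^ (POSIX awk) instead of ** (an extension in some awks).
power_of_two() {
    awk -v n="$1" 'BEGIN { print 2 ^ n }'
}

power_of_two 10   # prints 1024
```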

All the points above are general remarks. Back to your problem: it only involves fundamental awk features. A line like awk '{print $x}' will work in all awks.

There are two reasons why your awk line behaves differently on gawk and mawk:

You used the substr() function incorrectly. This is the main cause. You wrote substr($0, 0, RSTART - 1); the 0 should be 1, no matter which awk you use. String indexes, field numbers, and the arrays built by split() are 1-based in awk.
gawk and mawk implement substr() differently.
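
A minimal sketch of the corrected call, assuming a POSIX-style awk. Starting substr() at 1 avoids relying on how a particular implementation treats a start index of 0, which is where the two machines in the question disagreed. The function name first_label is hypothetical:

```shell
# Hypothetical helper: print everything before the first "." of its argument.
first_label() {
    printf '%s\n' "$1" | awk '{
        # match() sets RSTART to the 1-based position of the first ".".
        if (match($0, /\./))
            print substr($0, 1, RSTART - 1)   # start at 1, not 0
    }'
}

first_label aaa0.bbb.ccc   # prints aaa0
```

Like the original script, this prints nothing when the input contains no dot.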

perreal · Accepted Answer · 2013-04-16 15:11:04Z

5

Or just use cut:

echo aaa0.bbb.ccc | cut -d'.' -f1

anishsane · Accepted Answer · 2013-04-16 15:12:20Z

2

You don't need awk for this...

echo aaa0.bbb.ccc | cut -d. -f1
cut -d. -f1 <<< aaa0.bbb.ccc

echo aaa0.bbb.ccc | { IFS=. read a _ ; echo $a ; }
{ IFS=. read a _ ; echo $a ; } <<< aaa0.bbb.ccc 

x=aaa0.bbb.ccc; echo ${x/.*/}

Heavier options:

sed:
echo aaa0.bbb.ccc | sed 's/\..*//'
sed 's/\..*//' <<< aaa0.bbb.ccc 
awk:
echo aaa0.bbb.ccc | awk -F. '{print $1}'
awk -F. '{print $1}' <<< aaa0.bbb.ccc

choroba · Accepted Answer · 2013-11-23 15:35:52Z

2

You do not need any external command at all, just use Parameter Expansion in bash:

hostname=aaa0.bbb.ccc
echo ${hostname%%.*}
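
The same family of expansions can pick out other parts of the name. A minimal sketch using only POSIX parameter expansion, so it should not be bash-specific; the variable names first, rest, and second are illustrative:

```shell
hostname=aaa0.bbb.ccc

first=${hostname%%.*}   # strip the longest suffix matching ".*"  -> aaa0
rest=${hostname#*.}     # strip the shortest prefix matching "*." -> bbb.ccc
second=${rest%%.*}      # first label of the remainder            -> bbb

echo "$first $second"   # prints: aaa0 bbb
```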

edited Nov 23, 2013 at 15:35

answered Apr 16, 2013 at 15:21

atti · Accepted Answer · 2022-07-08 00:16:46Z

1

If you don't want to change the input field separator, you can use the split function instead:

echo "some aaa0.bbb.ccc text" | awk '{split($2, a, "."); print a[1]}'

documentation:

split(string, array [, fieldsep [, seps ] ])
    Divide string into pieces separated by fieldsep 
    and store the pieces in array and the separator 
    strings in the seps array.
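
Note that the fourth argument, seps, is a gawk extension; the three-argument form is the portable one. A small sketch of that form, assuming a single-character separator (which awk treats literally rather than as a regular expression). The helper name nth_label is hypothetical:

```shell
# Hypothetical helper: print the Nth dot-separated label of a hostname.
nth_label() {
    awk -v host="$1" -v n="$2" 'BEGIN {
        count = split(host, parts, ".")   # split() returns the number of pieces
        if (n >= 1 && n <= count)
            print parts[n]                # pieces are stored in parts[1..count]
    }'
}

nth_label aaa0.bbb.ccc 1   # prints aaa0
nth_label aaa0.bbb.ccc 2   # prints bbb
```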

RARE Kpop Manifesto · Accepted Answer · 2022-07-08 04:28:41Z

0

awk is still the cleanest approach :

mawk NF=1 FS='[.]' <<< aaa0.bbb.ccc

aaa0
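
The one-liner above is terse, so here is a sketch of the same idea written out in longer form. It relies on assigning to NF truncating the record to that many fields and rebuilding $0, which gawk and mawk both do; that this holds for every awk is an assumption. The function name first_field is hypothetical:

```shell
# Hypothetical helper: keep only the first dot-separated field of a line.
first_field() {
    printf '%s\n' "$1" | awk -F'[.]' '{
        NF = 1     # keep only the first field; $0 is rebuilt from it
        print      # print the rebuilt record
    }'
}

first_field aaa0.bbb.ccc   # prints aaa0
```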

If there's stuff before or after :

mawk ++NF FS='[.].+$|^[^ ]* ' OFS= <<< 'some aaa0.bbb.ccc text'
mawk '$!NF=$2' FS='[ .]'           <<< 'some aaa0.bbb.ccc text'

aaa0
