1

I'm trying to get the character that precedes each occurrence of a given character/pattern in a string, using standard bash tools such as grep, awk/gawk, sed ...

Step I: get the character that precedes each occurrence of the character :

Example:

String 1 => :hd:fg:kl:

String 2 => :df:lkjh:

String 3 => :glki:l:s:d:

Expected results

Result 1 => dgl

Result 2 => fh

Result 3 => ilsd

I tried many times with awk but without success

Step II: Insert a given character between each character of the resulting string

Example with /

Result 1 => d/g/l

Result 2 => f/h

Result 3 => i/l/s/d

I have an awk expression for this step: awk -F '' -v OFS="/" '{$1=$1;print}'

I don't know whether Step I is possible with awk or sed, and, if it is, whether Step I and Step II can be done at once.

Kind Regards

4
  • You might find this question helpful stackoverflow.com/questions/2777579/…
    – Leonard
    Commented Jul 4, 2018 at 1:01
  • You should include a case of back-to-back colons in your sample input/output (e.g. foo::bar) if it can occur as that could be hard to handle depending on your requirements for doing so. Is the output o or o: or something else? If it cannot happen then add a statement to your question saying so.
    – Ed Morton
    Commented Jul 4, 2018 at 11:34
  • @RavinderSingh13 I apologize for my late answer because I was offline
    – moocan
    Commented Jul 7, 2018 at 1:12
  • 1
    @Ed Morton, that cannot happen in my case ... but it's very good advice
    – moocan
    Commented Jul 7, 2018 at 1:14

8 Answers

1

What about:

awk 'BEGIN{FS=":"}{for(i=1;i<NF;i++){if(i>2)printf "/";printf "%s",substr($i,length($i))}print ""}' input.txt

input.txt:

:hd:fg:kl:
:df:lkjh:
:glki:l:s:d:

Output:

d/g/l
f/h
i/l/s/d
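
The one-liner above decides when to print the separator from the field index (i>2), which fits the samples because every line starts with a colon. As a hedged sketch only, not part of the original answer, the variant below builds the output in a variable so that the separator does not depend on a leading colon. The function name last_chars and the second sample line are made up for illustration.

```shell
# Hypothetical helper: last character of every colon-terminated field,
# joined with "/". The final field is skipped because no colon follows it.
last_chars() {
  awk -F: '{
    out = ""
    for (i = 1; i < NF; i++) {
      if ($i == "") continue                 # empty field, e.g. before a leading colon
      out = out (out == "" ? "" : "/") substr($i, length($i))
    }
    print out
  }'
}

printf '%s\n' ':hd:fg:kl:' 'ab:cd:e' | last_chars
# → d/g/l
# → b/d
```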
0
1

Solution 1st: Please try the following and let me know if it helps.

awk -F":" '
{
  for(i=1;i<=NF;i++){
    if($i){ val=(val?val:"")substr($i,length($i)) }
  }
  print val;
  val=""
}' Input_file

Output will be as follows.

dgl
fh
ilsd

Solution 2nd: With a / in between output strings.

awk '
BEGIN{
  OFS="/";
  FS=":"
}
{
  for(i=1;i<=NF;i++){
    if($i){
      val=(val?val OFS:"")substr($i,length($i))
    }}
  print val;
  val=""
}' Input_file

Output will be as follows.

d/g/l
f/h
i/l/s/d

Solution 3rd: With awk's match function.

awk '
{
  while(match($0,/[a-zA-Z]:/)){
    val=(val?val:"")substr($0,RSTART,RLENGTH-1)
    $0=substr($0,RSTART+RLENGTH)
   }
  print val
  val=""
}'  Input_file
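
The match loop above covers Step I only. A minimal sketch of how the same loop could also insert the separator from Step II is shown below. It assumes that any non-colon character counts (the class [^:] instead of [a-zA-Z]), and the function name match_before_colon is hypothetical.

```shell
# Hypothetical helper: repeatedly find "<char>:" and collect <char>,
# joining the collected characters with "/".
match_before_colon() {
  awk '{
    val = ""
    while (match($0, /[^:]:/)) {
      val = val (val == "" ? "" : "/") substr($0, RSTART, 1)
      $0 = substr($0, RSTART + RLENGTH)      # continue after the matched colon
    }
    print val
  }'
}

printf '%s\n' ':hd:fg:kl:' ':hfd:l:jh:m' | match_before_colon
# → d/g/l
# → d/l/h
```

Because only characters directly followed by a colon are collected, a trailing letter such as the final m is left out.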
2
  • 1
    In my question, I always ended my examples with ":", which was my mistake, because the string can also end with any letter. For a pattern such as ":hfd:l:jh:m", the output is "dlhm" for your first solution and "d/l/h/m" for the second solution. Your third solution works well because the output is "dlh".
    – moocan
    Commented Jul 7, 2018 at 1:48
  • @moocan, sure, thanks for letting me know. Please keep the question's samples in line with your actual requirements, because solutions will be written against those samples. Cheers and happy learning.
    Commented Jul 7, 2018 at 2:19
0

This might work for you (GNU sed):

sed -r 's/[^:]*([^:]):+|:+/\1/g;s/\B/\//g' file

Globally throughout the line, replace either zero or more non-: characters followed by a single character and one or more :'s, or a lone run of :'s, with that single character (nothing in the second case). Then insert a / between each pair of remaining characters.

1
  • In the case of a pattern such as ":hfd:l:jh:m", the output is "d/l/ h/m". In my question, I always ended my examples with ":", which was my mistake, because the string can also end with any letter
    – moocan
    Commented Jul 7, 2018 at 1:32
0

Perl and negative lookahead:

$ perl -p -e 's/.(?!:)//g' file
dgl
fh
ilsd
0

This is easier to do with perl

$ cat ip.txt
:hd:fg:kl:
:df:lkjh:
:glki:l:s:d:

$ perl -lne 'print join "/", /.(?=:)/g' ip.txt
d/g/l
f/h
i/l/s/d
  • /.(?=:)/g gets all characters preceding :
  • the resulting matches are then printed using / as delimiter string
1
  • Works very well with all my test patterns, even when the pattern ends with a letter instead of ":". Thanks
    – moocan
    Commented Jul 7, 2018 at 2:01
0

With sed alone, using ERE:

sed -E 's#[^:]*(.):#\1/#g;s/^.|.$//g' infile
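
To see what each substitution contributes, the command can be run one stage at a time. The sketch below uses hypothetical function names (stage1, both_stages) and assumes, as in the question's samples, that every line starts and ends with a colon.

```shell
# Stage 1: each "<chars><char>:" group becomes "<char>/".
stage1() {
  sed -E 's#[^:]*(.):#\1/#g'
}

# Stage 1 plus stage 2: additionally drop the first and the last character
# (the leading ":" and the trailing "/").
both_stages() {
  sed -E 's#[^:]*(.):#\1/#g;s/^.|.$//g'
}

printf '%s\n' ':hd:fg:kl:' | stage1
# → :d/g/l/
printf '%s\n' ':hd:fg:kl:' | both_stages
# → d/g/l
```

The second substitution removes the first and last characters unconditionally, so it relies on the leading colon and the trailing slash being present.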
0

Using GNU sed:

sed -E 's/[^:]*([^:]):/\1/g; s/([^:])/\/\1/g; s/^:\///'

The first command, s/[^:]*([^:]):/\1/g, strips out the extra characters and the colons (except the first one), so it yields this:

:dgl
:fh
:ilsd

The second command s/([^:])/\/\1/g inserts a / before each character, yielding:

:/d/g/l
:/f/h
:/i/l/s/d

The last command s/^:\/// simply removes the :/ from the beginning of each line:

d/g/l
f/h
i/l/s/d
0

You could iterate across each line with awk, starting at the second character. Every time the iterator hits a colon, print the previous character.

$ awk <file.txt '{for(i=2;i<=length($0);i++) { \
                    if (substr($0,i,1)==":") printf "%s", substr($0,i-1,1);} printf "\n";}'
dgl
fh
ilsd
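
The loop above handles Step I. As a hedged sketch, the same iteration could also cover Step II by collecting the characters in a variable and joining them with a separator. The function name prev_of_colon and the awk variable sep are hypothetical names chosen for this example.

```shell
# Hypothetical helper: walk each line from the second character on and, at
# every colon, append the previous character to the output, separated by
# sep (default "/").
prev_of_colon() {
  awk -v sep="${1:-/}" '{
    out = ""
    for (i = 2; i <= length($0); i++)
      if (substr($0, i, 1) == ":")
        out = out (out == "" ? "" : sep) substr($0, i - 1, 1)
    print out
  }'
}

printf '%s\n' ':hd:fg:kl:' ':glki:l:s:d:' | prev_of_colon
# → d/g/l
# → i/l/s/d
```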
