The top answer to this question demonstrates that cut can be used with tr to cut based on repeated spaces with

< file tr -s ' ' | cut -d ' ' -f 8

I want to get the remotes of several Git repos in a directory and am attempting to extract the remote URL fields from each with the following:

ls | xargs -I{} git -C {} remote -vv | sed -n 'p;n' | tr -s " " | cut -d ' ' -f1

However, this results in (for example) the following output, where I can see that two consecutive spaces (Unicode code point 32) are retained:

origin  https://github.com/jik876/hifi-gan.git
origin  https://github.com/NVIDIA/NeMo.git
origin  https://github.com/NVIDIA/tacotron2.git

(I have also tried using xargs with tr.)

The desired output is a list of URLs, like:

https://github.com/jik876/hifi-gan.git
https://github.com/NVIDIA/NeMo.git
https://github.com/NVIDIA/tacotron2.git

What am I missing here?

    Piping ls into xargs is very dangerous, and the output of ls shouldn't be used for anything except displaying file/directory listings for a human to read. See Why not parse ls (and what to do instead)?
    – cas
    Commented Aug 22, 2023 at 12:09

2 Answers

5

That's a tab, not two spaces.
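A quick way to check this is to make the whitespace visible. The sketch below uses a hand-built sample line rather than real git output (the `sample` and `url_field` helper names are made up for illustration), and shows one possible repair of the original tr/cut approach: translate tabs to spaces before cutting.

```shell
# Sample line in the layout `git remote -v` prints; the \t is a real tab.
sample() { printf 'origin\thttps://github.com/NVIDIA/NeMo.git (fetch)\n'; }

# Make the whitespace visible: the `l` command of sed prints a tab as \t.
sample | sed -n l

# Hypothetical fix for the original pipeline: turn tabs into spaces
# (squeezing repeats), then take the second space-delimited field.
url_field() { tr -s '\t' ' ' | cut -d ' ' -f 2; }
sample | url_field
```

Note that the field number changes as well: the URL is the second field, not the first.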

You can get the same output more safely with a shell loop that iterates over the subdirectories of the current working directory that contain a .git directory. Cut the first space-delimited field (to remove the (fetch) and (push) labels that git adds at the end), then pass the result through uniq to show only a single line for each remote+URL pair:

for r in ./*/.git/; do
    git -C "$r" remote -v
done | cut -f 1 -d ' ' | uniq | cut -f 2

The final cut -f 2 isolates the URLs by returning the 2nd tab-delimited field.
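To see what each stage contributes, here is the same pipeline wrapped in a function and fed sample lines instead of real git output. The `remote_urls` name is made up for this sketch, and it assumes every line has the form name, tab, URL, space, label.

```shell
# Hypothetical helper: reduce `git remote -v`-style lines on stdin to URLs.
# Assumes each line is "<name><TAB><url><SPACE>(fetch)" or "... (push)".
remote_urls() {
    cut -f 1 -d ' ' |   # drop the trailing label: leaves "<name><TAB><url>"
    uniq |              # fetch and push lines are now identical and adjacent
    cut -f 2            # cut splits on tabs by default: leaves "<url>"
}

# Sample input standing in for the output of the loop; printf reuses the
# format for each group of three arguments, so this emits two lines.
printf '%s\t%s (%s)\n' \
    origin https://github.com/NVIDIA/NeMo.git fetch \
    origin https://github.com/NVIDIA/NeMo.git push | remote_urls
```

Since uniq only collapses adjacent duplicates, this relies on git printing the fetch and push lines of a remote one after the other.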

Taking into account that awk treats tabs and spaces the same (unless you use a specific separator character or pattern), we can replace the trailing pipeline with a single invocation of awk:

for r in ./*/.git/; do
    git -C "$r" remote -v
done | awk '!seen[$2]++ { print $2 }'
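The `!seen[$2]++` pattern is true only the first time a given second field appears, so each URL is printed once. A small sketch on sample input (the `unique_urls` name and the example.com URL are placeholders, not real remotes):

```shell
# Hypothetical helper: print each distinct URL (field 2) once, in first-seen
# order. awk splits on runs of tabs and spaces by default.
unique_urls() {
    awk '!seen[$2]++ { print $2 }'
}

# Two remotes, each listed twice (fetch and push), as git would print them.
printf '%s\t%s (%s)\n' \
    origin   https://github.com/NVIDIA/NeMo.git fetch \
    origin   https://github.com/NVIDIA/NeMo.git push \
    upstream https://example.com/fork.git       fetch \
    upstream https://example.com/fork.git       push | unique_urls
```

Unlike uniq, this also removes duplicates that are not adjacent, so a URL shared by two different clones is printed only once.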
1

Instead of messing around with bash and tr and cut and xargs and all the whitespace and word-splitting and glob issues that bash brings, you could use a language more suited to the job...and one that includes a library module for interacting with git repos. For example, perl with the Git::Raw module.

Here's a very simple one-liner example (although it's worth noting that Git::Raw is capable of much more than this and you'd probably be better off writing a stand-alone perl script that uses the module). I've added newlines and indentation for readability; it works as-is, or all squished up on one line:

$ perl -MGit::Raw -l -e '
  foreach my $d (@ARGV) {
    $d =~ s{/\.git$}{};          # strip the trailing /.git to get the work tree
    next unless -d "$d/.git";    # skip anything that is not a git repo
    my $repo = Git::Raw::Repository->open($d);

    print $d;
    foreach my $r ($repo->remotes()) {
      print $r->url;
    }
    print "";
  }' */.git

In English, that's:

  1. Open each repo listed on the command line, ignoring directories that aren't actually git repos,
  2. iterate over the repo's remotes and print their URLs.
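For comparison only, the same two steps can be roughly sketched in plain shell without parsing `git remote -v` at all. This assumes a git new enough to have `git remote get-url` (2.7 or later), and the `list_repo_remotes` name is made up for illustration. It is not a substitute for the module, just the same logic spelled out.

```shell
# Hypothetical helper: for each "<dir>/.git" argument, print the directory
# name, then the URL of every remote, then a blank line.
list_repo_remotes() {
    for gitdir in "$@"; do
        [ -d "$gitdir" ] || continue      # skip arguments that are not repos
        d=${gitdir%/.git}                 # "<dir>/.git" -> "<dir>"
        printf '%s\n' "$d"
        # `git remote` prints one remote name per line.
        git -C "$d" remote | while IFS= read -r name; do
            git -C "$d" remote get-url "$name"
        done
        printf '\n'
    done
}

# If nothing matches, the glob stays literal and is skipped by the -d test.
list_repo_remotes */.git
```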

Sample output. I ran the one-liner above in a directory that I sometimes use to clone various repos from github. I wrote the one-liner to print the directory name before the list of remotes belonging to that repo, and a blank line after each directory processed.

mgetty
https://github.com/Distrotech/mgetty.git

roxterm
https://github.com/realh/roxterm.git

zpool-iostat-viz
https://github.com/chadmiller/zpool-iostat-viz.git

Note: Git::Raw is not included with perl. It needs to be installed with cpan or via a distro package (e.g. on Debian etc., apt-get install libgit-raw-perl; other distros probably have it too). The module is a perl wrapper around libgit2, so installing it manually with CPAN requires gcc plus the libgit2 development library and headers.

Also worth noting: even without Git::Raw, parsing the output of git with perl (or almost any other language) is going to be a lot easier and a lot less error-prone than doing it in bash. Perl in particular is designed for string matching and manipulation, so what you're trying to do in bash is trivial in perl.


BTW, if you prefer python, you may want to look at GitPython instead. On Debian etc, you can install it with apt-get install python3-git, and it's probably packaged for other distros too. This one doesn't use libgit2, it's a wrapper around the git command, similar to what you're trying to do in bash.

I somehow missed this yesterday, but the libgit2 web site says that pygit2 does for python what Git::Raw does for perl (i.e. it uses the C libgit2 library - so will be faster than forking git whenever needed, and avoids the risk of output-parsing problems). Debian package is python3-pygit2, and it's probably packaged for other distros too.
