1057

How would I count the total number of lines present in all the files in a git repository?

git ls-files gives me a list of files tracked by git.

I'm looking for a command to cat all those files. Something like

git ls-files | [cat all these files] | wc -l

18 Answers

1615

xargs will let you cat all the files together before passing them to wc, like you asked:

git ls-files | xargs cat | wc -l

But skipping the intermediate cat gives you more information and is probably better:

git ls-files | xargs wc -l
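As a rough sketch of the same idea that also copes with spaces in file names and lets you filter by extension: `count_tracked_lines` is a hypothetical helper name (not part of git), and it assumes your xargs supports `-0`, which GNU and BSD xargs both do. Any arguments are passed to git ls-files as pathspecs.

```shell
# Hypothetical helper: total line count of tracked files.
# Optional arguments are git pathspecs, e.g. '*.cpp' '*.hpp'.
count_tracked_lines() {
  # -z / -0 separate file names with NUL bytes, so spaces in names are safe.
  git ls-files -z -- "$@" | xargs -0 cat | wc -l
}

# Usage (run inside a repository):
#   count_tracked_lines                  # every tracked file
#   count_tracked_lines '*.cpp' '*.hpp'  # only C++ sources and headers
```

Because everything goes through a single cat stream, this prints one number even when xargs has to split the file list into several batches.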
28
  • 16
    Probably trivial, but how about including only source code files (e.g. *.cpp)? We have some bin files committed :)
    – Daniel
    Commented Sep 5, 2012 at 14:25
  • 63
    Stick grep cpp | in there before the xargs, then.
    – Carl Norum
    Commented Sep 5, 2012 at 15:18
  • 48
    Use git ls-files -z | xargs -0 wc -l if you have files with spaces in the name.
    – mpontillo
    Commented Nov 19, 2013 at 4:33
  • 57
    For including/excluding certain files use: git ls-files | grep -P ".*(hpp|cpp)" | xargs wc -l where the grep part is any perl regex you want!
    – Gabriel
    Commented Nov 19, 2014 at 14:41
  • 41
    If you were interested in just .java files you can use git ls-files | grep "\.java$" | xargs wc -l
    – dseibert
    Commented Dec 9, 2014 at 15:27
470

If you want this count because you want to get an idea of the project’s scope, you may prefer the output of CLOC (“Count Lines of Code”), which gives you a breakdown of significant and insignificant lines of code by language.

cloc $(git ls-files)

(This line is equivalent to git ls-files | xargs cloc. It uses sh’s $() command substitution feature.)

Sample output:

      20 text files.
      20 unique files.                              
       6 files ignored.

http://cloc.sourceforge.net v 1.62  T=0.22 s (62.5 files/s, 2771.2 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Javascript                       2             13            111            309
JSON                             3              0              0             58
HTML                             2              7             12             50
Handlebars                       2              0              0             37
CoffeeScript                     4              1              4             12
SASS                             1              1              1              5
-------------------------------------------------------------------------------
SUM:                            14             22            128            471
-------------------------------------------------------------------------------

You will have to install CLOC first. You can probably install cloc with your package manager – for example, brew install cloc with Homebrew.

cloc $(git ls-files) is often an improvement over running cloc . on the whole directory. For example, the above sample output with git ls-files reports 471 lines of code. For the same project, cloc . reports a whopping 456,279 lines (and takes six minutes to run), because it searches the dependencies in the Git-ignored node_modules folder.

12
  • 4
    CLOC ignores some languages, such as TypeScript. Commented Oct 2, 2015 at 14:31
  • 11
    @MarceloCamargo at this moment TypeScript is supported
    – Alex
    Commented Jun 9, 2016 at 9:39
  • 50
    You can just use cloc --vcs git these days, which avoids some edge cases with badly named files (or too many of them).
    – seanf
    Commented Jan 24, 2017 at 3:08
  • 2
    Does this leak the code? I mean the GitHub credentials and all.
    – Madhu Nair
    Commented Feb 18, 2019 at 12:19
  • 6
    @MadhuNair Of course not. cloc counts lines of files in a local directory, without ever accessing the network. It doesn’t even know whether the code came from GitHub or not. Commented Feb 18, 2019 at 22:06
428
git diff --stat 4b825dc642cb6eb9a060e54bf8d69288fbee4904

This shows the differences from the empty tree to your current working tree, which happens to count all lines in your current working tree.

To get the numbers in your current working tree, do this:

git diff --shortstat `git hash-object -t tree /dev/null`

It will give you a string like 1770 files changed, 166776 insertions(+).
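If you only want the number, a minimal sketch along these lines should work. `lines_since_empty_tree` is a made-up helper name, and the sed pattern assumes the English "N insertions(+)" wording that --shortstat prints. Computing the empty-tree hash with git hash-object instead of hard-coding it is a choice of mine, so the sketch does not depend on the repository's hash algorithm.

```shell
# Hypothetical helper: print only the insertion count from --shortstat.
lines_since_empty_tree() {
  # Ask git for the empty tree's hash rather than hard-coding it.
  empty_tree=$(git hash-object -t tree /dev/null)
  # Output looks like " 2 files changed, 3 insertions(+)"; keep the number.
  git diff --shortstat "$empty_tree" |
    sed -n 's/.* \([0-9][0-9]*\) insertion.*/\1/p'
}

# Usage (run inside a repository):
#   lines_since_empty_tree
```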

14
  • 47
    BTW, you can get that hash by running git hash-object -t tree /dev/null.
    – ephemient
    Commented Jan 27, 2011 at 23:00
  • 87
    And even more succinct: git diff --stat `git hash-object -t tree /dev/null`
    – rpetrich
    Commented Jul 8, 2012 at 21:40
  • 11
    This is the better solution since it does not count binary files like archives or images, which are counted in the version above!
    – BrainStone
    Commented Jul 20, 2013 at 22:02
  • 33
    +1 I like this solution better as binaries don't get counted. Also we are really just interested in the last line of the git diff output: git diff --stat `git hash-object -t tree /dev/null` | tail -1 Commented Oct 16, 2013 at 20:07
  • 36
    Instead, use git diff --shortstat `git hash-object -t tree /dev/null` to get just the last line; tail isn't needed.
    – Jim Wolff
    Commented Oct 16, 2014 at 11:38
79

The best solution, to me anyway, is buried in the comments of @ephemient's answer. I am just pulling it up here so that it doesn't go unnoticed. The credit for this should go to @FRoZeN (and @ephemient).

git diff --shortstat `git hash-object -t tree /dev/null`

returns the total of files and lines in the working directory of a repo, without any additional noise. As a bonus, only the source code is counted - binary files are excluded from the tally.

The command above works on Linux and OS X. The cross-platform version of it is

git diff --shortstat 4b825dc642cb6eb9a060e54bf8d69288fbee4904

That works on Windows, too.

For the record, the options for excluding blank lines,

  • -w/--ignore-all-space,
  • -b/--ignore-space-change,
  • --ignore-blank-lines,
  • --ignore-space-at-eol

don't have any effect when used with --shortstat. Blank lines are counted.

2
  • 1
    git mktree </dev/null or true|git mktree or git mktree <&- or :|git mktree for the keystroke-counters among us :-) - a spare empty tree floating around the repo isn't going to hurt anything.
    – jthill
    Commented Mar 12, 2015 at 16:38
  • 6
    For people wondering what is that hash out of the blue : stackoverflow.com/questions/9765453/…
    – Tejas Kale
    Commented Jul 13, 2017 at 12:03
75

I've encountered batching problems with git ls-files | xargs wc -l when dealing with large numbers of files, where the line counts will get chunked out into multiple total lines.

Taking a tip from question Why does the wc utility generate multiple lines with "total"?, I've found the following command to bypass the issue:

wc -l $(git ls-files)

Or if you want to only examine some files, e.g. code:

wc -l $(git ls-files | grep '\.cs$')
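Another way around the batching problem, sketched here under a couple of assumptions, is to keep xargs but let awk add up the per-file counts and skip the per-batch "total" lines. `sum_wc_batches` is a hypothetical name. The filter assumes no tracked file at the repository root is literally named "total" and that no file name contains a newline.

```shell
# Hypothetical helper: one grand total, however many batches xargs creates.
# Optional arguments are git pathspecs, e.g. '*.cs'.
sum_wc_batches() {
  git ls-files -z -- "$@" |
    xargs -0 wc -l |
    # Skip lines of the form "<number> total"; add up everything else.
    awk '!(NF == 2 && $2 == "total") { sum += $1 } END { print sum + 0 }'
}

# Usage (run inside a repository):
#   sum_wc_batches
#   sum_wc_batches '*.cs'
```

Unlike wc -l $(git ls-files), this does not depend on the whole file list fitting on one command line, and it handles spaces in file names.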

8
  • This is great but it seems to fail for paths which contain white spaces. Is there a way to solve that?
    – Lea Hayes
    Commented Jun 8, 2014 at 22:48
  • 1
    Had trouble with grep '.*\.m' picking up binary files like .mp3, .mp4. Had more success with using the find command to list code files wc -l $(git ls-files | find *.m *.h) Commented Oct 13, 2014 at 21:04
  • 3
    @LeaHayes this is one way: wc -l --files0-from=<(git ls-files -z). The <(COMMAND) syntax returns the name of a file whose contents are the result of COMMAND.
    – buck
    Commented Nov 21, 2014 at 2:59
  • 1
    @LeaHayes I came up with this script which I think would work for you:
    ```
    #!/bin/bash
    # Run wc in batches, then add up the per-batch "total" lines.
    results=$(git ls-files | xargs -d '\n' wc -l)
    grand_total=0
    while read -r count _; do
      grand_total=$((grand_total + count))
    done < <(echo "$results" | grep -E '^ *[[:digit:]]+ total$')
    echo "${results}"
    echo "grand total: ${grand_total}"
    ```
    – buck
    Commented Nov 23, 2014 at 0:54
  • 1
    the -n switch with xargs can be used to increase the maximum number of lines within a chunk
    – Anthony
    Commented Dec 29, 2014 at 12:48
31

This works as of cloc 1.68:

cloc --vcs=git

2
  • 1
    --vcs didn't work for me, maybe it was removed. cloc . while at the git repo did work, OTOH.
    – acdcjunior
    Commented Jul 10, 2019 at 9:29
  • 1
    --vcs=git worked for me on version v1.90 =) But yes I ran it at the root, it's just an option to tell cloc what it can ignore Commented Aug 10, 2021 at 10:38
26

I use the following:

git grep ^ | wc -l

This searches all files versioned by git for the regex ^, which represents the beginning of a line, so this command gives the total number of lines!
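A variation on this, offered as a sketch rather than a drop-in replacement: git grep -c prints one file:count pair per file, so awk can add up the counts. The -I flag skips binary files. `tracked_line_total` is a hypothetical name, and the empty pattern '' matches every line just like ^ does.

```shell
# Hypothetical helper: sum the per-file counts reported by git grep -c.
# Optional arguments are git pathspecs, e.g. '*.py'.
tracked_line_total() {
  # Each output line is "path:count"; the count is always the last field,
  # even if the path itself contains a colon.
  git grep -Ic '' -- "$@" | awk -F: '{ sum += $NF } END { print sum + 0 }'
}

# Usage (run inside a repository):
#   tracked_line_total
#   git grep -Ic '' | sort -t: -k2 -n    # per-file counts, smallest first
```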

2
  • This is concise and doesn't require any new software, and gives a fast count of textual lines (which is all the question really asks for). But it isn't a precise measure of executable code. It counts blank lines and comment lines, which are ignored by most of the purpose-built tools. (As an experiment I ran this on a small repo of utility code. git grep method: 5322; sloccount: 2942; cloc: 3251) Commented Oct 12, 2022 at 20:38
  • @PaulBissex very true! Total lines is often what I want, but I've seen others modify this to git grep . | wc -l to only match lines containing at least one character Commented May 2, 2023 at 17:41
15

I was playing around with cmder (http://gooseberrycreative.com/cmder/) and I wanted to count the lines of HTML, CSS, Java and JavaScript. While some of the answers above worked, the "or" pattern in grep didn't. I found here (https://unix.stackexchange.com/questions/37313/how-do-i-grep-for-multiple-patterns) that I had to escape it.

So this is what I use now:

git ls-files | grep "\.\(html\|css\|js\|java\)$" | xargs wc -l

1
  • 3
    This seemed to respond with chunks for me. Using your grep in combination with Justin Aquadro's solution resulted well for me. wc -l $(git ls-files | grep "\(.html\|.css\|.js\|.php\|.json\|.sh\)$")
    – PeterM
    Commented Sep 16, 2016 at 16:21
5

I did this:

git ls-files | xargs file | grep "ASCII" | cut -d : -f 1 | xargs wc -l

This works if you count all text files in the repository as the files of interest. If some are considered documentation, etc., an exclusion filter can be added.

5

Try:

find . -type f -name '*.*' -exec wc -l {} + 

on the directory/directories in question

5

If you want to get the number of lines from a certain author, try the following code:

git ls-files "*.java" | xargs -I{} git blame {} | grep "${your_name}" | wc -l
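If you would rather see a breakdown for every author at once, something like the following sketch may help. It relies on git blame --line-porcelain, which prints an "author <name>" header for each line of the file. `lines_by_author` is a hypothetical name, and running one git blame per file can be slow on large repositories.

```shell
# Hypothetical helper: number of currently surviving lines per author.
# Optional arguments are git pathspecs, e.g. '*.java'.
lines_by_author() {
  git ls-files -z -- "$@" |
    xargs -0 -n 1 git blame --line-porcelain -- |
    # File content lines start with a tab, so only headers match here.
    sed -n 's/^author //p' |
    sort | uniq -c | sort -rn
}

# Usage (run inside a repository with at least one commit):
#   lines_by_author '*.java'
```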
3

This tool on GitHub, https://github.com/flosse/sloc, can give the output in a more descriptive way. It creates stats of your source code:

  • physical lines
  • lines of code (source)
  • lines with comments
  • single-line comments
  • lines with block comments
  • lines mixed up with source and comments
  • empty lines
0
3

Depending on whether or not you want to include binary files, there are two solutions.

  1. git grep --cached -al '' | xargs -P 4 cat | wc -l
  2. git grep --cached -Il '' | xargs -P 4 cat | wc -l

    "xargs -P 4" means it can read the files using up to four parallel processes. This can be really helpful if you are scanning very large repositories. Depending on the capacity of the machine, you can increase the number of processes.

    -a, process binary files as text (Include Binary)
    -l, show only filenames instead of matching lines; the empty pattern '' matches every line, so only non-empty files are listed
    -I, don't match patterns in binary files (Exclude Binary)
    --cached, search the blobs registered in the index instead of the work tree (so staged but not yet committed content is included)

3

If you want to find the total number of non-empty lines, you could use AWK:

git ls-files | xargs cat | awk '/[^[:space:]]/{x++} END{print "Total number of non-empty lines:", x+0}'

This uses regex to count the lines containing a non-whitespace character.
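A small extension of the same idea, as a sketch: report blank and non-blank lines in one pass. `blank_breakdown` is a hypothetical name. To stay portable across awk implementations it treats only spaces, tabs and carriage returns as whitespace, which is an assumption of this sketch rather than part of the answer above.

```shell
# Hypothetical helper: split the total into non-blank and blank lines.
# Optional arguments are git pathspecs, e.g. '*.py'.
blank_breakdown() {
  git ls-files -z -- "$@" |
    xargs -0 cat |
    # A line is non-blank if it has any character other than space, tab or CR.
    awk '/[^ \t\r]/ { n++ } END { printf "non-blank: %d, blank: %d\n", n, NR - n }'
}

# Usage (run inside a repository):
#   blank_breakdown
```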

3

The answer by Carl Norum assumes that no file names contain spaces. xargs splits its input on whitespace (spaces, tabs and newlines), so such names get broken into pieces. The solution is to terminate each file name with a NUL byte:

 git ls-files -z | xargs -0 cat | wc -l
2
: | git mktree | git diff --shortstat --stdin

Or:

git ls-tree @ | sed '1i\\' | git mktree --batch | xargs | git diff-tree --shortstat --stdin
1

From a Windows 11 terminal:

wsl.exe /bin/bash -c "git ls-files .| xargs wc -mwl"

Here . is the current directory, so run the command from the root of your Git repository.

Output:

Lines count | Word count | Character count

0

Per AlDanial, the cloc maintainer, this is the proper way to use git ls-files with cloc to reduce any issues with filenames:

git ls-files | cloc --list-file -
