54

How can I count the number of characters in the compiled version of a LaTeX file? The count should include spaces and cover the whole document (index, footnotes, bibliography, etc.).

5
  • 4
    Have you tried the suggestions listed at Is there any way to do a correct word count of a LaTeX document?
    – Werner
    Commented Mar 18, 2012 at 15:00
  • 3
    How should maths formulae be counted? E.g., \(\sqrt{\sin(x)}\)? Commented Mar 18, 2012 at 16:13
  • I actually don't have any formulae in my document, but I guess every character would have to be counted.
    – Bob
    Commented Mar 18, 2012 at 19:57
  • The word count question delivered no answers which met my criteria, unfortunately.
    – Bob
    Commented Mar 18, 2012 at 20:02
  • I am pretty sure that the wordcount solution in the cited question (the fourth) will deliver what you're looking for.
    – krlmlr
    Commented Mar 21, 2012 at 22:09

6 Answers 6

35
+50

This is probably as much as you can get

pdftotext document.pdf -enc UTF-8 - | wc -m

For DVI files one can use

catdvi -e UTF-8 -s document.dvi | wc -m

(Thanks to Bob for having pointed to the -enc option and to catdvi.)
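One caveat with piping straight into wc -m: the line breaks in the extracted text are counted as characters too, and, as far as I know, so are the form feeds that pdftotext puts between pages. A minimal sketch for counting without them follows. The function name count_visible_chars is made up for this example, and document.pdf is a placeholder.

```shell
# count_visible_chars: hypothetical helper, not part of pdftotext or catdvi.
# Reads text on stdin, deletes newline, carriage-return and form-feed
# characters, and prints the number of characters that remain.
count_visible_chars() {
    # The final tr strips the padding and trailing newline some wc versions print.
    tr -d '\n\r\f' | wc -m | tr -d '[:space:]'
}

# Assumed usage, mirroring the commands above:
#   pdftotext -enc UTF-8 document.pdf - | count_visible_chars
#   catdvi -e UTF-8 -s document.dvi | count_visible_chars
```

Since a line end in the extracted text usually stands for an inter-word space, this figure is best read as a lower bound and the plain wc -m figure as an upper bound.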

8
  • 1
    Thanks for your answer. This led me to the idea of using catdvi. Using catdvi -s document.dvi | wc -m gives me good results. pdftotext has some problems reproducing special characters.
    – Bob
    Commented Mar 22, 2012 at 21:48
  • Ciao @egreg: where exactly is this command pdftotext... to be typed? Commented Jul 2, 2012 at 8:23
  • pdftotext is a program coming with xpdf; how to invoke it depends on the operating system: on Unix systems it's called from the command line.
    – egreg
    Commented Jul 2, 2012 at 8:29
  • @egreg: Grazie. I have Windows XP; can you please tell me whether this applies in that case as well? And is xpdf to be installed via \usepackage? Commented Jul 2, 2012 at 8:31
  • @AbhimanyuArora I can only point to this link
    – egreg
    Commented Jul 2, 2012 at 8:37
14

How does detex file.tex | wc -C work for you? detex removes all the TeX macros, and wc -C returns the number of characters remaining. This should be a good enough proxy for the number of characters in the output file, given that there's no maths.

This obviously won't count things like running headers or other automatically generated text. For that, I guess you'd need to parse the .dvi as Bruno Le Floch suggested in comments.

1
  • 1
    I think the right option is a lowercase 'c', i.e. detex file.tex | wc -c, otherwise that works fine for me.
    – twsh
    Commented Aug 28, 2012 at 19:49
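On the choice of option: in POSIX wc, -c counts bytes and -m counts characters, so the two only agree for single-byte text. A small sketch of the difference, with helper names (byte_count, char_count) invented for this example:

```shell
# Hypothetical helpers; both read stdin and print a bare number.
byte_count() { wc -c | tr -d '[:space:]'; }   # -c: number of bytes
char_count() { wc -m | tr -d '[:space:]'; }   # -m: number of characters

# Assumed usage; '\303\251' is the two-byte UTF-8 encoding of "é":
#   printf '\303\251' | byte_count    # two bytes
#   printf '\303\251' | char_count    # depends on the current locale
#   detex file.tex | char_count
```

For a document with accented letters, wc -c will therefore overcount, and wc -m only gives the intended figure when the locale matches the file's encoding.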
9

A completely different approach would be to use the stdpage package. It creates 'standard pages' of 30 lines with 60 characters each (of course, you can change these values). This approach dates from the time when people used typewriters to write the manuscripts they handed in to their publishers. Some publishers still ask for standard pages today and pay per standard page.

The stdpage package allows you to switch between ragged and justified lines, and you can turn hyphenation and line numbers on or off. In the best case, the usage is as simple as adding

\usepackage[linenumbers,lines=30,chars=50,noindent]{stdpage}

to your preamble. As this package changes the line spacing and fonts, you will have to adapt the rest of your preamble (I had to remove a couple of packages). I personally hand in two PDFs: one with standard pages, and a second one with the same text but using a nicer font, hyphenation, microtype and so on.

5

If you are using WinEdt, you can count everything. The following is what I got from counting a file that included \(\sqrt{\sin(x)}\) and white space:

Monday, March 19, 2012 at 14:43

C:\Users\test\minmimal.tex
Mode: TeX

Size:    973  bytes

Words:   25      [10%]
Numbers: 12      [1%]

Spaces:        35         [4%] %<--------------------spaces here
Alpha Chars:   100        [10%]
Numeric Chars: 14         [1%]

Lines:              43
Empty Lines:        1
Max Line Length:    93
Average  Length:    21
Environments:       3

Paragraphs:         43         [41%]
"\" Commands:       37         [52%]
"%" Comments:       6          [19%]

LaTeX Math \(?\):   1          [0%]. %<--------------------Math here.

WinEdt is shareware, though.

2
  • 1
    That counts characters in the .tex file, not the compiled PDF. I wonder whether the DVI could be parsed more easily? Commented Mar 19, 2012 at 18:11
  • Thanks for your answer. I'll give it a try, but actually I'm more a Linux user than a Windows user ;-)
    – Bob
    Commented Mar 20, 2012 at 8:32
4

If you have the .tex file, you can use TeXcount which gives you the number of characters with the -char option:

texcount -char mydoc.tex
1
  • 1
    According to the texcount documentation, this doesn't count spaces, so don't use this: "-char, -character, -letters Counts letters/characters instead of words. Note that spaces and punctuation is not counted."
    – EdwardAndo
    Commented Jan 5, 2022 at 18:14
0

An easy way to do this would be to select the entire text of the PDF document (Ctrl+A, Ctrl+C) and paste it into a .txt file (Ctrl+V). You can then use wc or other tools on the plain-text file.

1
  • 5
    Have you tried this? It will probably fail with hyphenated words, and ligatures will be counted wrongly. Commented Dec 11, 2012 at 7:34
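For the ligature part of that objection, one possible workaround is to expand the Latin ligature code points back into their component letters before counting. This is only a sketch: it assumes the pasted or extracted text is UTF-8 and that ligatures arrive as the single characters U+FB00 to U+FB04, which depends on the PDF and the viewer. The function name expand_ligatures and the file name pasted.txt are made up for this example. It does nothing about words hyphenated at line breaks.

```shell
# expand_ligatures: hypothetical filter. Reads UTF-8 text on stdin and
# replaces the five common Latin ligature characters with plain letters.
expand_ligatures() {
    sed -e 's/ﬀ/ff/g' \
        -e 's/ﬁ/fi/g' \
        -e 's/ﬂ/fl/g' \
        -e 's/ﬃ/ffi/g' \
        -e 's/ﬄ/ffl/g'
}

# Assumed usage on the pasted text file:
#   expand_ligatures < pasted.txt | wc -m
```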
