Is there a way to get a word count of natural language words in Markdown (or better, Pandoc Markdown), via the command line? It's possible to just use wc to get a very rough estimate, but wc is naive, and counts anything surrounded by white space as a word. This includes things like header formatting, bullet points, and URLs in links.

The ideal would be to remove all Markdown formatting (including Pandoc citations, if possible) and then pass the result through wc, but I can't find a way to do that: pandoc's plain-text output format still includes a lot of Markdown styling.

  • You could try "rendering" the Markdown document as plain text and run wc on the resulting file -- something like this: stackoverflow.com/questions/761824/…
    – user319088
    Commented May 27, 2014 at 13:27
  • @CongMa: that doesn't work properly (see my last sentence). But it probably is the closest I'm going to get at the moment. And I guess it's not that far out, really.
    – naught101
    Commented May 29, 2014 at 6:19

3 Answers


There is a new Lua filter for that: https://pandoc.org/lua-filters.html#counting-words-in-a-document

Save the following code as wordcount.lua:

-- counts words in a document

words = 0

wordcount = {
  Str = function(el)
    -- we don't count a word if it's entirely punctuation:
    if el.text:match("%P") then
        words = words + 1
    end
  end,

  Code = function(el)
    -- count whitespace-separated tokens in inline code:
    _,n = el.text:gsub("%S+","")
    words = words + n
  end,

  CodeBlock = function(el)
    -- same for code blocks:
    _,n = el.text:gsub("%S+","")
    words = words + n
  end
}

function Pandoc(el)
    -- skip metadata, just count body:
    pandoc.walk_block(pandoc.Div(el.blocks), wordcount)
    print(words .. " words in body")
    -- stop here so pandoc does not also print the converted document:
    os.exit(0)
end

and call pandoc like this:

pandoc --lua-filter wordcount.lua myfile.md
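If you would rather not use Lua, roughly the same counting rule can be sketched in Python over pandoc's JSON output. This is only an approximation of the filter above, not part of pandoc: the function name count_words and the file names are made up for illustration, and it assumes the JSON layout pandoc currently emits (tagged objects with "t" and "c" keys).

```python
import json
import string
import sys


def count_words(node) -> int:
    """Recursively count words in a fragment of pandoc's JSON AST."""
    if isinstance(node, list):
        return sum(count_words(child) for child in node)
    if not isinstance(node, dict):
        # Plain strings and numbers (attributes, URLs, levels) are not words.
        return 0
    kind = node.get("t")
    content = node.get("c")
    if kind == "Str":
        # Like the Lua filter: skip tokens that are entirely punctuation.
        return int(any(ch not in string.punctuation for ch in content))
    if kind in ("Code", "CodeBlock"):
        # Content is [attributes, text]; count whitespace-separated tokens.
        return len(content[1].split())
    # Any other element: descend into its content (may be None).
    return count_words(content)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python wordcount_json.py myfile.json
    with open(sys.argv[1], encoding="utf-8") as handle:
        doc = json.load(handle)
    # Only the body is walked, so metadata is skipped.
    print(f"{count_words(doc['blocks'])} words in body")
```

To try it, first write the AST to a file with pandoc -t json -o myfile.json myfile.md, then pass that file to the script. Link targets are stored as plain strings rather than Str elements, so URLs are not counted.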

A somewhat manual solution:

  1. use pandoc to convert the Markdown file to an MS Word document (*.docx) or OpenOffice/LibreOffice Writer document (*.odt)
  2. open that document in LibreOffice [1]
  3. select everything (Ctrl+A)
  4. Menu Tools > Word Count

[1] OpenOffice would probably work the same, but I haven't tested that.


I was facing the same challenge, and I have written a Python script for it. It removes special characters and Markdown/HTML elements and counts the remaining words!
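The script itself is not reproduced here, so what follows is not the author's code, just a minimal sketch of the general idea under some assumptions: strip the most common Markdown/HTML constructs with regular expressions, then count the tokens that still contain a letter or digit. The function name count_markdown_words and the exact set of patterns are illustrative choices; regexes like these will miss edge cases that a real parser such as pandoc handles.

```python
import re


def count_markdown_words(text: str) -> int:
    """Roughly count natural-language words in a Markdown string."""
    # Drop fenced code blocks entirely.
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)
    # Drop HTML comments and tags.
    text = re.sub(r"<!--.*?-->", " ", text, flags=re.DOTALL)
    text = re.sub(r"<[^>]+>", " ", text)
    # Drop images, but keep the visible text of inline links.
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", " ", text)
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Drop Pandoc-style bracketed citations such as [@doe2020, p. 3].
    text = re.sub(r"\[[^\]]*@[^\]]*\]", " ", text)
    # Drop bare URLs.
    text = re.sub(r"https?://\S+", " ", text)
    # Count tokens with at least one letter or digit, so that leftover
    # markers like "#", "-", "*" or "---" are ignored.
    return sum(1 for token in text.split() if re.search(r"[^\W_]", token))


sample = (
    "# Title\n\n"
    "- a [link](http://example.com/x) here\n"
    "- see http://example.com/y now\n\n"
    "Text with **bold** [@doe2020, p. 3].\n\n"
    "```\ncode words here\n```\n"
)
print(count_markdown_words(sample))
```

Unlike the Lua filter above, this sketch skips code blocks rather than counting their contents; whether code should count as words is a judgment call.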

