6
\$\begingroup\$

Write a stripped-down version of the wc command. The program can either open a file (the file name does not need to be configurable) or read from stdin until it encounters end of file. When the program is finished reading, it should print:

  • # bytes read
  • # characters read
  • # lines read
  • max line length
  • # words read

The program must at least work with ASCII; bonus points if it also works with other encodings.

Shortest code wins.
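For orientation, here is a non-golfed sketch of the five counts in Python 3. It is not part of the challenge. The function name `wc_counts` is made up for illustration. The sketch assumes that "lines read" means the number of newline characters, that line length excludes the newline, and that words are separated by any run of whitespace.

```python
def wc_counts(data: bytes, encoding: str = "ascii"):
    """Return (bytes, characters, lines, max line length, words).

    Assumptions (not fixed by the challenge text): lines are counted as
    newline characters, line length excludes the newline, and words are
    runs of non-whitespace characters.
    """
    text = data.decode(encoding)
    line_count = text.count("\n")
    # split("\n") always yields at least one element, so max() is safe
    longest = max(len(line) for line in text.split("\n"))
    word_count = len(text.split())
    return len(data), len(text), line_count, longest, word_count


print(wc_counts(b"hello world\nfoo\n"))  # (16, 16, 2, 11, 3)
```

With ASCII input the first two numbers are always equal; they can differ only when a multi-byte encoding is passed as `encoding`.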

\$\endgroup\$
4
  • 1
    \$\begingroup\$ criteria 1 & 2 should be the same on ascii. \$\endgroup\$ Commented Oct 9, 2011 at 22:52
  • 2
    \$\begingroup\$ As yet "words" are undefined (see below for one way that is profitable for a terse c implementation). This is the kind of issue that could have been profitably hammered out in the puzzle-lab chat or in the sandbox on meta. \$\endgroup\$ Commented Oct 10, 2011 at 0:00
  • \$\begingroup\$ in this problem, a word is a sequence of characters delimited by spaces. They can even include special characters. Consider them chunks, actually. \$\endgroup\$ Commented Oct 12, 2011 at 1:37
  • 3
    \$\begingroup\$ Some testcases would be fine. \$\endgroup\$ Commented Oct 13, 2011 at 9:37

14 Answers

11
\$\begingroup\$

Haskell - 80 characters

l=length
main=interact$ \s->show$map($s)[l,l,l.lines,maximum.map l.lines,l.words]
\$\endgroup\$
4
  • \$\begingroup\$ That's actually 81 chars (not including a final newline) \$\endgroup\$
    – FUZxxl
    Commented Oct 12, 2011 at 22:35
  • \$\begingroup\$ This isn't C or C++ where not ending in a newline results in undefined behavior... or am I wrong, and the Haskell specs mandate it? \$\endgroup\$ Commented Oct 12, 2011 at 23:04
  • 1
    \$\begingroup\$ @trinithis There is no such mandate in C or C++. There were (in the misty depths of time) some antique Unix systems that boggled on unterminated final lines: thus the purely human convention of always appending a newline. Line endings (like any other whitespace) have no significance to C or C++ except to the preprocessor and to separate tokens; examine the grammar. \$\endgroup\$ Commented Oct 13, 2011 at 15:09
  • 12
    \$\begingroup\$ C++ Standard [2.1.1.2]: If a source file that is not empty does not end in a new-line character, or ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, the behavior is undefined. \$\endgroup\$ Commented Oct 13, 2011 at 16:04
6
\$\begingroup\$

Perl - 71

$c+=$x=y!!!c,$w+=split,$m=$m<$x?$x-1:$m,$l++for<>;say"$c,$c,$l,$m,$w"

I think it can be shortened further but I don't know how.

(edit: used Zaid's suggestion)
(edit 2: changed length to y!!!c)
(edit 3: changed print to say)

\$\endgroup\$
2
  • 1
    \$\begingroup\$ Replace the while with a for to save two chars. \$\endgroup\$
    – Zaid
    Commented Oct 29, 2011 at 16:38
  • \$\begingroup\$ Use say instead of print, which is allowed without penalty (meta.codegolf.stackexchange.com/questions/273/…), and save 2 chars. \$\endgroup\$
    – Toto
    Commented Dec 9, 2011 at 14:13
5
\$\begingroup\$

c -- 214

The implementation is almost straight out of K&R. It relies on K&R semantics for main (unspecified return type), but I think it otherwise conforms to C89. The output format is specified below (it assumes that chars == bytes).

#include <stdio.h>
int b,w,l,t,L;main(int c,char**v){FILE*f=fopen(v[1],"r");for(;(c=getc(f))
!=-1;++b){t++;if(c=='\n')l++,L=t>L?t:L,t=0;else if(c==' '||c=='\t')w++;}
L=t>L?t:L;printf("%d (%d) %d %d %d\n",l,L,w,b,b);}

There is some ambiguity in the meaning of "words" as yet. This version defines words as breaking only on [ \t\n] and does not account for a word that ends with EOF. This will work for files following the old Unix convention of always ending with a newline, but will break for those that stop hard on EOF.

It does test lines that end at EOF for maximal length.
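To illustrate the EOF caveat, here is a small Python 3 sketch (not the answer's C code) that contrasts counting delimiters with counting word starts. Both function names are hypothetical, and the delimiter set `" \t\n"` is taken from the description above.

```python
DELIMITERS = " \t\n"


def count_delimiters(text):
    """Count delimiter characters; a final word with no delimiter after it is missed."""
    return sum(ch in DELIMITERS for ch in text)


def count_word_starts(text):
    """Count transitions from 'outside a word' to 'inside a word'."""
    words = 0
    in_word = False
    for ch in text:
        if ch in DELIMITERS:
            in_word = False
        elif not in_word:
            in_word = True
            words += 1
    return words


print(count_delimiters("a b\n"), count_word_starts("a b\n"))  # 2 2
print(count_delimiters("a b"), count_word_starts("a b"))      # 1 2
```

Counting word starts also avoids over-counting when several delimiters appear in a row, for example with indented source code.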

Un-golfed

#include <stdio.h>
int b, /* bytes */
  w, /* words */
  l, /* lines */
  t, /*this line*/
  L; /* longest line */
main(int c,char**v){
  FILE*f=fopen(v[1],"r");
  for(;(c=getc(f))!=-1;++b){
    t++;
    if(c=='\n') l++, L = t>L ? t : L, t=0;
    else if(c==' '||c=='\t')w++;
  }
  L = t>L ? t : L;
  /* Output format is lines (max line length) words chars bytes */
  printf("%d (%d) %d %d %d\n",l,L,w,b,b);
}

Validation

$ gcc -o wc wordcount_golf.c
$ ./wc wordcount.c
17 (67) 82 410 410
$ ./wc wordcount_golf.c
4 (74) 9 217 217
$ wc wordcount.c
 17  67 410 wordcount.c
$ wc wordcount_golf.c
  4  13 217 wordcount_golf.c
\$\endgroup\$
1
  • \$\begingroup\$ How about 198? I am actually happy that gcc doesn't refuse to eat that... \$\endgroup\$
    – FUZxxl
    Commented Oct 12, 2011 at 22:58
3
\$\begingroup\$

ruby (72 characters)

t=readlines;s=t*'';p [s.size]*2+[t.size,t.map(&:size).max,s.split.size]

This solution uses a functional style similar to trinithis's Haskell solution.

I assume ASCII text and space-separated words.

\$\endgroup\$
3
\$\begingroup\$

awk - 58

awk 'x=1+length{w+=NF;if(m<x)m=x;c+=x}END{print NR,w,c,m}'
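For readers who want the one-liner spelled out, here is a rough Python 3 rendering of what it appears to do. This is a sketch, not a verified equivalent: the function name `awk_style_counts` is made up, and Python's `str.split()` is assumed to be close enough to awk's default field splitting. Note that `x=1+length` adds one for the newline, so the maximum line length printed includes it.

```python
def awk_style_counts(text):
    """Return (NR, w, c, m) roughly as the awk program accumulates them."""
    records = text.split("\n")
    if records[-1] == "":
        records.pop()          # a trailing newline ends the last record

    nr = w = c = m = 0
    for line in records:
        nr += 1                # NR: records read so far
        x = 1 + len(line)      # x=1+length (always non-zero, so the action runs)
        w += len(line.split()) # w+=NF
        if m < x:              # if(m<x)m=x
            m = x
        c += x                 # c+=x
    return nr, w, c, m         # END{print NR,w,c,m}


print(awk_style_counts("one two\nthree\n"))  # (2, 3, 14, 8)
```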
\$\endgroup\$
1
  • \$\begingroup\$ This deserves more thumbs up, but I really need to see a non-obfuscated version of it. Awk is truly the winner, and then I noticed K, what on earth? I'll stick with awk. \$\endgroup\$
    – Tomachi
    Commented Mar 30, 2021 at 20:58
2
\$\begingroup\$

Groovy, 85 81

a={it.size()};[s=a(t=f.text),s,a(z=t.split"\n"),z*.size().max(),a(t.split(/\s/))]
\$\endgroup\$
2
\$\begingroup\$

Perl - 66 characters

for(<>){$t+=$l=y!!!c,$m=$l-1if$m<$l,$w+=split}say"$. $w $t $m $t"

I managed to save a couple of bytes by letting Perl handle the number of lines, using the $. variable. Other than that, even without looking, I ended up with what Marinus got, for the most part.

Here's the program I used to verify it:

#!/bin/bash

function test1() {
    cat <<EOF
Lorem ipsum dolor sit amet,
consectetur adipiscing elit. Duis eget
neque vel ipsum porta bibendum
dictum in ante. Ut accumsan
magna id nisl bibendum et
tincidunt turpis eleifend. Duis nec
mi hendrerit lorem hendrerit convallis.
Quisque sit amet tincidunt diam.
Sed vehicula velit sed risus
pellentesque vitae auctor nisi semper.
Nulla erat massa, semper sit
amet luctus non, bibendum id
eros. Etiam non lacus odio.
Donec vitae nisl vitae nisi
elementum suscipit. Cras ut mollis
mauris.
EOF
}

function test2() {
    cat <<EOF
Curabitur quis elit turpis. Vestibulum
ut elementum magna. Lorem ipsum
dolor sit amet, consectetur adipiscing
elit. Nam sit amet quam
ante. Nullam in risus est,
quis cursus magna. Vestibulum feugiat
nisl nec velit scelerisque molestie.
Pellentesque habitant morbi tristique senectus
et netus et malesuada fames
ac turpis egestas. Vestibulum feugiat
sem vitae mauris aliquet eget
ullamcorper velit molestie.
EOF
}

function test3() {
    cat <<EOF
Proin diam elit, imperdiet id
gravida et, facilisis nec lectus.
Nullam placerat enim sed nulla
porttitor hendrerit. Praesent eu quam
enim, et commodo orci. Nam
eu purus ut ipsum malesuada
rhoncus vitae ut turpis. Morbi
a risus eu ligula faucibus
tincidunt. Cum sociis natoque penatibus
et magnis dis parturient montes,
nascetur ridiculus mus. Nulla faucibus
vehicula diam at tempus. Aliquam
tristique, erat vel fringilla scelerisque,
magna purus venenatis nulla, at
elementum mi lectus vitae nibh.
Phasellus nibh neque, tempus commodo
pretium eget, gravida quis felis.
Vivamus venenatis tristique volutpat. Nunc
vulputate accumsan magna, sit amet
vehicula orci imperdiet vel. Nam
vitae laoreet purus.
EOF
}

PROG1=wc
PROG2='perl5.14.2 -Mv5.10 wc-golfed.pl'

for((i=1;i<4;i++)); do
    wc=( $(test$i | $PROG1) )
    golf=( $(test$i | $PROG2) )
    for((j=0;j<2;j++)); do
        if [ ! "${wc[$j]}" = "${golf[$j]}" ]; then
            echo "Test #$i failed"
            echo "     wc: ${wc[@]}"
            echo "   golf: ${golf[@]}"
            continue 2
        fi
    done
    echo "Test #$i passed"
done
\$\endgroup\$
1
\$\begingroup\$

C - 164 chars

This is based on dmckee's code with FUZxxl's improvements, but shortened further and converted to standard C. The code reads from standard input only. Any sequence of one or more consecutive characters with ASCII code >= 33 is counted as a word.

#include<stdio.h>
int b,w,l,t,L,c,s;int main(void){for(;c=getchar()+1;++b)t-=c==11?l++,t>L?L=t:t:-1,s=c>33?w+=!s:0;printf("%d %d %d %d\n",l,t>L?t:L,w,b);return 0;}

Un-golfed

#include <stdio.h>
int b, /* bytes */
     w, /* words */
     l, /* lines */
     t, /* this line */
     L, /* longest line */
     c, /* current character */
     s; /* s==1 if the current word has been counted */
int main(void) {
  for (; c = getchar()+1; ++b)
       t -= (c==11 ? (l++, (t>L ? L=t : t))
                   : -1),
       s = (c>33 ? w += !s
                 : 0);
  printf("%d %d %d %d\n", l, (t>L ? t : L), w, b);
  return 0;
}
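
The two tricks in the loop are the `+1` shift (EOF, which is -1, becomes 0 and ends the loop; '\n' becomes 11; `c>33` means "character code at least 33") and the `s` flag that marks a word as already counted. A rough Python 3 transliteration, with a made-up function name and a bytes argument standing in for stdin, might look like this:

```python
def count_stream(data: bytes):
    """Return (lines, longest line, words, bytes), mirroring the C loop."""
    b = w = l = t = L = 0
    s = False                  # True once the current word has been counted
    for byte in data:
        c = byte + 1           # mirrors c = getchar() + 1
        if c == 11:            # '\n' is 10, shifted by one
            l += 1
            L = max(L, t)      # line length excludes the newline
            t = 0
        else:
            t += 1
        if c > 33:             # original character code >= 33
            if not s:
                w += 1
            s = True
        else:
            s = False
        b += 1
    return l, max(t, L), w, b


print(count_stream(b"hi there\nx"))  # (1, 8, 3, 10)
```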
\$\endgroup\$
1
  • 1
    \$\begingroup\$ You can remove #include<stdio.h>. It'll compile and run without those 19 chars. \$\endgroup\$
    – Patrick
    Commented Dec 7, 2011 at 20:47
1
\$\begingroup\$

R, 91 characters

Golfed

x=scan(w="c",se="\n");n=nchar(x);sum(n);length(x);max(n);sum(sapply(strsplit(x," "),length))

Ungolfed

x=scan(what="c",sep="\n")
n=nchar(x)
num.char=sum(n)
num.lines=length(x)
max.line.length=max(n)
word.count=sum(sapply(strsplit(x," "),length))

Output

> x=scan(what="c",sep="\n");n=nchar(x);sum(n);length(x);max(n);sum(sapply(strsplit(x," "),length))
1: This is a test
2: of how well
3: the program works!
4: 
Read 3 items
[1] 43
[1] 3
[1] 18
[1] 10

Alternatives

x=scan(w="c",se="\n");n=nchar(x);sum(n);length(x);max(n);length(unlist(gregexpr("\\w+",x)))

Saves a character at the expense of not really matching everything.
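
To show the kind of mismatch this refers to, here is a small Python 3 sketch comparing space splitting with a `\w+` match. It uses Python's `re` module rather than R, so details may differ from `gregexpr`; both function names are hypothetical.

```python
import re


def words_by_space(line):
    """Count chunks separated by single spaces, as in strsplit(x, " ")."""
    return len(line.split(" "))


def words_by_regex(line):
    """Count runs of word characters, as a \\w+ pattern would."""
    return len(re.findall(r"\w+", line))


print(words_by_space("a - b"), words_by_regex("a - b"))          # 3 2
print(words_by_space("it's fine"), words_by_regex("it's fine"))  # 2 3
```

Punctuation-only chunks are dropped by the regex, and chunks containing punctuation are split into several matches.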

\$\endgroup\$
3
  • \$\begingroup\$ A little shorter: x=scan(,"",se="\n");n=nchar(x);sum(n);l=length;l(x);max(n);sum(sapply(strsplit(x," "),l)) \$\endgroup\$
    – Tommy
    Commented Dec 7, 2011 at 18:55
  • \$\begingroup\$ @Tommy Yup. And I believe that's actually 91 characters if you count the \ and the n separately :-O \$\endgroup\$ Commented Dec 8, 2011 at 3:15
  • \$\begingroup\$ We can define variable in argument + define length() to save some characters - 87 now. sum(n<-nchar(x<-scan(,"",se="\n")));(l=length)(x);max(n);sum(sapply(strsplit(x," "),l)) \$\endgroup\$
    – Vlo
    Commented Aug 15, 2014 at 19:52
0
\$\begingroup\$

JavaScript (runs in Rhino) - 213 chars

The Java file I/O calls take up a lot of space. But I didn't realize you could call Java so easily from JavaScript, and thought that was pretty neat:

var r=new BufferedReader(new FileReader(f));var s=true;var n=0,l=0,m=0,w=0;while(s){s=r.readLine();if(s){n+=s.length();l+=1;m=m<s.length()?s.length():m;w+=s.split("\\s+").length;}}print(n+','+n+','+l+','+m+','+w);

Ungolfed:

        var reader = new BufferedReader( new FileReader(filename) );
        var s = true;
        var n=0,l=0,m=0,w=0;

        while (s) {
            s = reader.readLine();
            if (s) {
                n+=s.length(); 
                l+=1;
                m = m<s.length()?s.length():m;
                w += s.split("\\s+").length;
            }
        }
        System.out.println('bytes/chars: '+n+', Lines: '+l+', Max Line: '+m+', Words: '+w);
\$\endgroup\$
0
\$\begingroup\$

Python 160

Your file name is assigned to variable q.

g=len
z=open(q,'r')
w=z.read()
z.close()
b,e,f=g(w),g(w.replace('\n',' ').split()),w.split('\n')
d,c,a=g(max(f)),g(f),g(open(q,'rb').read())
for i in 'abcde':
     print eval(i)

This prints the 5 requirements in order:

bytes read
characters read
lines read
max line length
words read

(TyrantWave's suggestion implemented)

\$\endgroup\$
5
  • \$\begingroup\$ one problem with this, is that it will think the\nbear is one word, while it should be two. you could do w.split(w.replace('\n', ' ')) \$\endgroup\$
    – Blazer
    Commented Jan 8, 2012 at 6:34
  • \$\begingroup\$ Could also do l=len to save 3 more characters. (5 len's = 15, whilst \nl=len\n + 5 l's = 12) \$\endgroup\$
    – TyrantWave
    Commented Jan 9, 2012 at 14:05
  • \$\begingroup\$ Thanks, taken into consideration! (I didn't even know you could assign a function to a variable like that! :O Living in the dark ages) \$\endgroup\$
    – Anti Earth
    Commented Jan 11, 2012 at 2:52
  • \$\begingroup\$ You could also get rid of close and the read/write mode: do z=open(q).read() \$\endgroup\$ Commented May 10, 2012 at 4:56
  • \$\begingroup\$ Of course, but what terrible practise! ;) Does open(q) open the file in binary read? \$\endgroup\$
    – Anti Earth
    Commented May 11, 2012 at 11:15
0
\$\begingroup\$

K, 40

{a:0:x;(#a;+/b;max b:#:'a;+/#:'" "\:'a)}

Prints linecount, char count, max line length, word count.

Words are space-separated.

\$\endgroup\$
0
0
\$\begingroup\$

Python, 98

l=len
f=open(q).read()
j=f.split("\n")
for i in (l(f),l(f),l(j),max(j,key=l),l(f.split())):print i

98 is including newlines. Do newlines count?

\$\endgroup\$
2
  • \$\begingroup\$ It fails on Windows for \r\n newlines (invalid byte count). q is not defined (the OP allows a fixed name such as 'q', or use sys.stdin). It expects only ASCII (invalid character count). There is a missing l() call around max() (it prints the line instead of its length). \$\endgroup\$
    – jfs
    Commented Nov 9, 2013 at 18:14
  • \$\begingroup\$ Open in binary mode \$\endgroup\$ Commented Feb 26, 2016 at 1:34
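
A non-golfed Python 3 sketch of how the points raised in these comments could be addressed: read in binary mode so the byte count is exact, decode to get the character count, and take the length of the longest line rather than the line itself. The function names, the UTF-8 default, and the temporary-file demo are assumptions for illustration, not part of the answer.

```python
import os
import tempfile


def counts_from_file(path, encoding="utf-8"):
    """Return (bytes, characters, lines, max line length, words) for a file."""
    with open(path, "rb") as fh:        # binary mode: no newline translation
        data = fh.read()
    text = data.decode(encoding)        # characters, as opposed to bytes
    lines = text.split("\n")
    longest = len(max(lines, key=len))  # length of the longest line, not the line
    return len(data), len(text), text.count("\n"), longest, len(text.split())


def demo():
    """Write a small CRLF, non-ASCII sample to a temporary file and count it."""
    sample = "h\u00e9llo w\u00f6rld\r\nok\r\n".encode("utf-8")
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(sample)
        return counts_from_file(path)
    finally:
        os.remove(path)


print(demo())  # (19, 17, 2, 12, 3)
```

In this sketch a trailing '\r' is counted as part of the line, so the longest line is 12 rather than 11; stripping it first would be a reasonable alternative.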
0
\$\begingroup\$

PHP, 163

<?$f=file($argv[1]);$l=0;foreach($f as$i)$l=$l<strlen($i)?strlen($i):$l;$n=count($f);$c=strlen($g=implode($f));$w=count(explode(' ',$g))+$n-1;echo"$c $c $n $l $w";
\$\endgroup\$
