Implement wc (word count) shortest code wins [duplicate]

Question

Write a stripped-down version of the wc command. The program can either open a file (the file name does not need to be configurable) or read from stdin until it encounters the end of file. When the program is finished reading, it should print:

# bytes read
# characters read
# lines read
max line length
# words read

The program must at least work with ASCII, but bonus points if it works on other encodings.

Shortest code wins.

As yet "words" are undefined (see below for one way that is profitable for a terse C implementation). This is the kind of issue that could have been profitably hammered out in the puzzle-lab chat or in the sandbox on meta.
– dmckee --- ex-moderator kitten
Commented Oct 10, 2011 at 0:00
In this problem, a word is a sequence of characters delimited by spaces. Words can even include special characters. Consider them chunks, actually.
– Thomas Dignan
Commented Oct 12, 2011 at 1:37
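
For reference, the five required numbers can be computed without any golfing. The following is a minimal Python 3 sketch and not part of the original question. The function name wc_counts, the UTF-8 default, counting lines as newline characters, and excluding the newline from the line length are all assumptions made for illustration.

```python
def wc_counts(data: bytes, encoding: str = "utf-8"):
    """Return (bytes, characters, lines, max line length, words).

    Assumptions (not fixed by the question): lines are counted as newline
    characters, the newline is excluded from the line length, and a word
    is any chunk delimited by whitespace.
    """
    text = data.decode(encoding)
    return (
        len(data),                                      # bytes read
        len(text),                                      # characters read (code points)
        text.count("\n"),                               # lines read
        max(len(line) for line in text.split("\n")),    # max line length
        len(text.split()),                              # words read
    )
```

For example, wc_counts(b"the bear\nsleeps\n") gives (16, 16, 2, 8, 3). With non-ASCII input the first two numbers differ, because one character may occupy several bytes.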

Community · Accepted Answer · 2020-06-17 09:04:33Z

11

Haskell - 80 characters

l=length
main=interact$ \s->show$map($s)[l,l,l.lines,maximum.map l.lines,l.words]
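
For readers less familiar with Haskell: the one-liner builds a list of five functions and applies each of them to the same input string. A rough Python 3 rendering of that idea might look like the sketch below. The name haskell_style_counts is hypothetical, and str.splitlines and str.split only approximate Haskell's lines and words (they recognise a few more separator characters). The default=0 is also a deviation: Haskell's maximum fails on empty input.

```python
def haskell_style_counts(s: str) -> list:
    """Apply a list of measuring functions to the same string."""
    measures = [
        len,                                                  # l (bytes, assuming ASCII)
        len,                                                  # l (characters)
        lambda t: len(t.splitlines()),                        # l.lines
        lambda t: max(map(len, t.splitlines()), default=0),   # maximum.map l.lines
        lambda t: len(t.split()),                             # l.words
    ]
    return [f(s) for f in measures]                           # map($s)[...]
```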

edited Jun 17, 2020 at 9:04

answered Oct 10, 2011 at 20:16

Thomas Eding

That's actually 81 chars (not including a final newline)
– FUZxxl
Commented Oct 12, 2011 at 22:35
This isn't C or C++ where not ending in a newline results in undefined behavior... or am I wrong, and the Haskell specs mandate it?
– Thomas Eding
Commented Oct 12, 2011 at 23:04
1

@trinithis There is no such mandate in C or C++. There were (in the misty depths of time) some antique Unix systems that boggled on unterminated final lines: thus the purely human convention of always appending a newline. Line endings (like any other whitespace) have no significance to C or C++ except to the preprocessor and to separate tokens; examine the grammar.
– dmckee --- ex-moderator kitten
Commented Oct 13, 2011 at 15:09
12

C++ Standard [2.1.1.2]: If a source file that is not empty does not end in a new-line character, or ends in a new-line character immediately preceded by a backslash character before any such splicing takes place, the behavior is undefined.
– Thomas Eding
Commented Oct 13, 2011 at 16:04

marinus · Accepted Answer · 2012-01-01 15:51:41Z

6

Perl - 71

$c+=$x=y!!!c,$w+=split,$m=$m<$x?$x-1:$m,$l++for<>;say"$c,$c,$l,$m,$w"

I think it can be shortened further but I don't know how.

(edit: used Zaid's suggestion)
(edit 2: changed length to y!!!c)
(edit 3: changed print to say)

edited Jan 1, 2012 at 15:51

answered Oct 24, 2011 at 17:40

marinus

1

Replace the while with a for to save two chars.
– Zaid
Commented Oct 29, 2011 at 16:38
Use say instead of print, which is allowed without penalty (meta.codegolf.stackexchange.com/questions/273/…), and save 2 chars.
– Toto
Commented Dec 9, 2011 at 14:13

dmckee --- ex-moderator kitten · Accepted Answer · 2011-10-09 23:58:22Z

c -- 214

Implementation is almost right out of K&R. Relies on K&R semantics for main (unspecified return type), but I think it otherwise conforms to c89. Output format is specified below (but it assumes that chars == bytes).

#include <stdio.h>
int b,w,l,t,L;main(int c,char**v){FILE*f=fopen(v[1],"r");for(;(c=getc(f))
!=-1;++b){t++;if(c=='\n')l++,L=t>L?t:L,t=0;else if(c==' '||c=='\t')w++;}
L=t>L?t:L;printf("%d (%d) %d %d %d\n",l,L,w,b,b);}

There is some ambiguity in the meaning of "words" as yet. This version defines words as breaking only on [ \t\n] and does not account for a word that ends with EOF. This will work for files following the old Unix convention of always ending with a newline, but break for those that stop hard on EOF.

It does test lines that end at EOF for maximal length.
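
To make the EOF caveat concrete, here is a small Python 3 sketch of delimiter counting as described above. It is an illustration only, not a translation of the C program; the name count_delimiters is hypothetical.

```python
def count_delimiters(text: str) -> int:
    """Count one 'word' per space, tab or newline seen in the input."""
    return sum(ch in " \t\n" for ch in text)
```

With a trailing newline, count_delimiters("one two\n") gives 2, which matches the number of words. Without it, count_delimiters("one two") gives 1, so the last word is lost. Runs of consecutive delimiters are also counted once each, so "a  b\n" (two spaces) gives 3.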

Un-golfed

#include <stdio.h>
int b, /* bytes */
  w, /* words */
  l, /* lines */
  t, /*this line*/
  L; /* longest line */
main(int c,char**v){
  FILE*f=fopen(v[1],"r");
  for(;(c=getc(f))!=-1;++b){
    t++;
    if(c=='\n') l++, L = t>L ? t : L, t=0;
    else if(c==' '||c=='\t')w++;
  }
  L = t>L ? t : L;
  /* Output format is lines (max line length) words chars bytes */
  printf("%d (%d) %d %d %d\n",l,L,w,b,b);
}

Validation

$ gcc -o wc wordcount_golf.c
$ ./wc wordcount.c
17 (67) 82 410 410
$ ./wc wordcount_golf.c
4 (74) 9 217 217
$ wc wordcount.c
 17  67 410 wordcount.c
$ wc wordcount_golf.c
  4  13 217 wordcount_golf.c

How about 198? I am actually happy that gcc doesn't refuse to eat that...
– FUZxxl
Commented Oct 12, 2011 at 22:58

mking · Accepted Answer · 2011-10-29 07:54:32Z

3

ruby (72 characters)

t=readlines;s=t*'';p [s.size]*2+[t.size,t.map(&:size).max,s.split.size]

This solution uses a functional style similar to trinithis's Haskell solution.

I assume ASCII text and space-separated words.

answered Oct 29, 2011 at 7:54

mking

kev · Accepted Answer · 2012-01-08 06:14:29Z

3

awk - 58

awk 'x=1+length{w+=NF;if(m<x)m=x;c+=x}END{print NR,w,c,m}'

edited Jan 8, 2012 at 6:14

answered Jan 8, 2012 at 6:04

kev

This deserves more thumbs up, but I really need to see a non-obfuscated version of it. Awk is truly the winner, and then I noticed K. What on earth? I'll stick with awk.
– Tomachi
Commented Mar 30, 2021 at 20:58
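
As a readable rendering of what the awk one-liner computes, here is a Python 3 sketch (Python rather than awk, and only an approximation: the name awk_style_counts is hypothetical, and str.split is assumed to be close enough to awk's default field splitting). The pattern x=1+length is an assignment that is always true, so the action runs for every record.

```python
def awk_style_counts(text: str):
    """Return (NR, w, c, m) as printed by the awk program's END block."""
    records = text.split("\n")
    if records[-1] == "":        # a trailing newline does not start a new record
        records.pop()
    words = chars = longest = 0
    for record in records:
        x = 1 + len(record)              # x=1+length: record plus its newline
        words += len(record.split())     # w+=NF
        longest = max(longest, x)        # if(m<x)m=x
        chars += x                       # c+=x
    return len(records), words, chars, longest
```

Note that the maximum line length reported this way includes the newline, so awk_style_counts("the bear\nsleeps\n") gives (2, 3, 16, 9).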

Armand · Accepted Answer · 2011-12-07 19:08:15Z

2

Groovy, 85 81

a={it.size()};[s=a(t=f.text),s,a(z=t.split"\n"),z*.size().max(),a(t.split(/\s/))]

edited Dec 7, 2011 at 19:08

answered Dec 5, 2011 at 16:53

Armand

Catherine · Accepted Answer · 2012-01-09 06:29:58Z

Perl - 66 characters

for(<>){$t+=$l=y!!!c,$m=$l-1if$m<$l,$w+=split}say"$. $w $t $m $t"

I managed to save a couple of bytes by letting Perl handle the number of lines, using the $. variable. Other than that, even without looking, I ended up with what Marinus got, for the most part.

Here's the program I used to verify it:

#!/bin/bash

function test1() {
    cat <<EOF
Lorem ipsum dolor sit amet,
consectetur adipiscing elit. Duis eget
neque vel ipsum porta bibendum
dictum in ante. Ut accumsan
magna id nisl bibendum et
tincidunt turpis eleifend. Duis nec
mi hendrerit lorem hendrerit convallis.
Quisque sit amet tincidunt diam.
Sed vehicula velit sed risus
pellentesque vitae auctor nisi semper.
Nulla erat massa, semper sit
amet luctus non, bibendum id
eros. Etiam non lacus odio.
Donec vitae nisl vitae nisi
elementum suscipit. Cras ut mollis
mauris.
EOF
}

function test2() {
    cat <<EOF
Curabitur quis elit turpis. Vestibulum
ut elementum magna. Lorem ipsum
dolor sit amet, consectetur adipiscing
elit. Nam sit amet quam
ante. Nullam in risus est,
quis cursus magna. Vestibulum feugiat
nisl nec velit scelerisque molestie.
Pellentesque habitant morbi tristique senectus
et netus et malesuada fames
ac turpis egestas. Vestibulum feugiat
sem vitae mauris aliquet eget
ullamcorper velit molestie.
EOF
}

function test3() {
    cat <<EOF
Proin diam elit, imperdiet id
gravida et, facilisis nec lectus.
Nullam placerat enim sed nulla
porttitor hendrerit. Praesent eu quam
enim, et commodo orci. Nam
eu purus ut ipsum malesuada
rhoncus vitae ut turpis. Morbi
a risus eu ligula faucibus
tincidunt. Cum sociis natoque penatibus
et magnis dis parturient montes,
nascetur ridiculus mus. Nulla faucibus
vehicula diam at tempus. Aliquam
tristique, erat vel fringilla scelerisque,
magna purus venenatis nulla, at
elementum mi lectus vitae nibh.
Phasellus nibh neque, tempus commodo
pretium eget, gravida quis felis.
Vivamus venenatis tristique volutpat. Nunc
vulputate accumsan magna, sit amet
vehicula orci imperdiet vel. Nam
vitae laoreet purus.
EOF
}

PROG1=wc
PROG2='perl5.14.2 -Mv5.10 wc-golfed.pl'

for((i=1;i<4;i++)); do
    wc=( $(test$i | $PROG1) )
    golf=( $(test$i | $PROG2) )
    for((j=0;j<2;j++)); do
        if [ ! ${wc[$j]} = ${golf[$j]} ]; then
            echo "Test #$i failed"
            echo "     wc: ${wc[@]}"
            echo "   golf: ${golf[@]}"
            continue 2
        fi
    done
    echo "Test #$i passed"
done

han · Accepted Answer · 2011-11-08 23:17:14Z

C - 164 chars

This is based on dmckee's code with FUZxxl's improvements, but shortened further and converted to standard C. The code reads from standard input only. Any sequence of one or more consecutive characters with ASCII code >= 33 is counted as a word.

#include<stdio.h>
int b,w,l,t,L,c,s;int main(void){for(;c=getchar()+1;++b)t-=c==11?l++,t>L?L=t:t:-1,s=c>33?w+=!s:0;printf("%d %d %d %d\n",l,t>L?t:L,w,b);return 0;}

Un-golfed

#include <stdio.h>
int b, /* bytes */
     w, /* words */
     l, /* lines */
     t, /* this line */
     L, /* longest line */
     c, /* current character */
     s; /* s==1 if the current word has been counted */
int main(void) {
  for (; c = getchar()+1; ++b)
       t -= (c==11 ? (l++, (t>L ? L=t : t))
                   : -1),
       s = (c>33 ? w += !s
                 : 0);
  printf("%d %d %d %d\n", l, (t>L ? t : L), w, b);
  return 0;
}
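
The same loop can be followed more easily in a Python 3 transliteration. This is a sketch for reading purposes, not a golfed entry; the name han_style_counts is hypothetical, and the comparisons against 10 and 33 correspond to the C tests c==11 and c>33 on getchar()+1.

```python
def han_style_counts(data: bytes):
    """Return (lines, max line length, words, bytes), mirroring the C loop."""
    b = w = l = t = L = 0    # same variable names as the C version
    s = False                # True once the current word has been counted
    for byte in data:
        b += 1
        if byte == 10:       # newline: finish the current line
            l += 1
            L = max(L, t)
            t = 0
        else:
            t += 1
        if byte >= 33:       # any byte with code >= 33 is part of a word
            if not s:
                w += 1
            s = True
        else:
            s = False
    return l, max(L, t), w, b
```

Because a word is counted when it starts rather than when it ends, a final word that runs into EOF is still counted: han_style_counts(b"the bear\nsleeps") gives (1, 8, 3, 15).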

You can remove #include<stdio.h>. It'll compile and run without those 19 chars.
– Patrick
Commented Dec 7, 2011 at 20:47

Ari B. Friedman · Accepted Answer · 2011-12-07 18:05:10Z

1

R, 91 characters

Golfed

x=scan(w="c",se="\n");n=nchar(x);sum(n);length(x);max(n);sum(sapply(strsplit(x," "),length))

Ungolfed

x=scan(what="c",sep="\n")
n=nchar(x)
num.char=sum(n)
num.lines=length(x)
max.line.length=max(n)
word.count=sum(sapply(strsplit(x," "),length))

Output

> x=scan(what="c",sep="\n");n=nchar(x);sum(n);length(x);max(n);sum(sapply(strsplit(x," "),length))
1: This is a test
2: of how well
3: the program works!
4: 
Read 3 items
[1] 43
[1] 3
[1] 18
[1] 10

Alternatives

x=scan(w="c",se="\n");n=nchar(x);sum(n);length(x);max(n);length(unlist(gregexpr("\\w+",x)))

Saves a character at the expense of not really matching everything.
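
The difference between the two word definitions can be illustrated in Python 3 (used here instead of R, so regex details may differ slightly). The function names are hypothetical.

```python
import re

def words_by_spaces(line: str) -> int:
    """Words as chunks separated by single spaces, like strsplit(x, " ")."""
    return len(line.split(" "))

def words_by_regex(line: str) -> int:
    """Words as runs of word characters, like the \\w+ pattern."""
    return len(re.findall(r"\w+", line))
```

Both give 3 for "the program works!", but for "don't stop" the regex version gives 3 (don, t, stop) while splitting on spaces gives 2.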

edited Dec 7, 2011 at 18:05

answered Dec 7, 2011 at 16:45

Ari B. Friedman

A little shorter: x=scan(,"",se="\n");n=nchar(x);sum(n);l=length;l(x);max(n);sum(sapply(strsplit(x," "),l))
– Tommy
Commented Dec 7, 2011 at 18:55
@Tommy Yup. And I believe that's actually 91 characters if you count the \ and the n separately :-O
– Ari B. Friedman
Commented Dec 8, 2011 at 3:15
We can define variables inside arguments and alias length() to save some characters: 87 now. sum(n<-nchar(x<-scan(,"",se="\n")));(l=length)(x);max(n);sum(sapply(strsplit(x," "),l))
– Vlo
Commented Aug 15, 2014 at 19:52

Upgradingdave · Accepted Answer · 2011-11-08 14:52:13Z

JavaScript (runs in Rhino) - 213 chars

The Java file I/O calls take up a lot of space. But I didn't realize you could call Java so easily from JavaScript and thought that was pretty neat:

var r=new BufferedReader(new FileReader(f));var s=true;var n=0,l=0,m=0,w=0;while(s){s=r.readLine();if(s){n+=s.length();l+=1;m=m<s.length()?s.length():m;w+=s.split("\\s+").length;}}print(n+','+n+','+l+','+m+','+w);

Ungolfed:

        var reader = new BufferedReader( new FileReader(filename) );
        var s = true;
        var n=0,l=0,m=0,w=0;

        while (s) {
            s = reader.readLine();
            if (s) {
                n+=s.length(); 
                l+=1;
                m = m<s.length()?s.length():m;
                w += s.split("\\s+").length;
            }
        }
        System.out.println('bytes/chars: '+n+', Lines: '+l+', Max Line: '+m+', Words: '+w);

Anti Earth · Accepted Answer · 2012-01-11 02:51:19Z

0

Python 160

Your file name is assigned to variable q.

g=len
z=open(q,'r')
w=z.read()
z.close()
b,e,f=g(w),g(w.replace('\n',' ').split()),w.split('\n')
d,c,a=g(max(f)),g(f),g(open(q,'rb').read())
for i in 'abcde':
     print eval(i)

This prints the 5 requirements in order:

bytes read
characters read
lines read
max line length
words read

(TyrantWave's suggestion implemented)

edited Jan 11, 2012 at 2:51

answered Nov 10, 2011 at 11:00

Anti Earth

One problem with this is that it will think the\nbear is one word, while it should be two. You could do w.split(w.replace('\n', ' '))
– Blazer
Commented Jan 8, 2012 at 6:34
Could also do l=len to save 3 more characters. (5 len's = 15, whilst \nl=len\n + 5 l's = 12)
– TyrantWave
Commented Jan 9, 2012 at 14:05
Thanks, taken into consideration! (I didn't even know you could assign a function to a variable like that! :O Living in the dark ages)
– Anti Earth
Commented Jan 11, 2012 at 2:52
You could also get rid of close and the read/write mode. Do z=open(q).read()
– Joel Cornett
Commented May 10, 2012 at 4:56
Of course, but what terrible practise! ;) Does open(q) open the file in binary read?
– Anti Earth
Commented May 11, 2012 at 11:15

tmartin · Accepted Answer · 2012-05-09 17:02:02Z

0

K, 40

{a:0:x;(#a;+/b;max b:#:'a;+/#:'" "\:'a)}

Prints linecount, char count, max line length, word count.

Words are space-separated.

edited May 9, 2012 at 17:02

answered May 8, 2012 at 17:06

tmartin

Community · Accepted Answer · 2020-06-17 09:04:33Z

0

Python, 98

l=len
f=open(q).read()
j=f.split("\n")
for i in (l(f),l(f),l(j),max(j,key=l),l(f.split())):print i

98 is including newlines. Do newlines count?

edited Jun 17, 2020 at 9:04

answered May 10, 2012 at 5:07

Joel Cornett

It fails on Windows for \r\n newlines (invalid byte count). q is not defined (OP allows a fixed name such as 'q' or use sys.stdin). It expects only ASCII (invalid character count). There is a missing l() call around max() (it prints the line instead of its length).
– jfs
Commented Nov 9, 2013 at 18:14
Open in binary mode
– CalculatorFeline
Commented Feb 26, 2016 at 1:34
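
The max() issue raised above is easy to get wrong in both directions, so here is a short Python 3 sketch of the three variants. The function names are hypothetical. max() on a list of strings compares them lexicographically, so taking len() of its result does not give the longest line either.

```python
def longest_lexicographic(lines):
    """len(max(lines)): length of the alphabetically last line (wrong)."""
    return len(max(lines))

def longest_line_itself(lines):
    """max(lines, key=len): the longest line, not its length."""
    return max(lines, key=len)

def longest_length(lines):
    """Length of the longest line, 0 for no lines."""
    return max(map(len, lines), default=0)
```

For lines = ["zebra", "a much longer line"], the first variant returns 5, the second returns the string "a much longer line", and only the third returns the intended 18.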

Community · Accepted Answer · 2020-06-17 09:04:33Z

0

PHP, 163

<?$f=file($argv[1]);$l=0;foreach($f as$i)$l=$l<strlen($i)?strlen($i):$l;$n=count($f);$c=strlen($g=implode($f));$w=count(explode(' ',$g))+$n-1;echo"$c $c $n $l $w";

edited Jun 17, 2020 at 9:04

answered May 16, 2012 at 11:39

l0n3sh4rk
