Back in 2002, Payne conducted a study comparing three similar Unix-like operating systems across a number of security metrics: one closed-source (Solaris) and two open-source (Debian and OpenBSD). He concludes:
> The results show that, of the three systems, OpenBSD had the most number of security features (18) with Debian second (15) and Solaris third (11). Of these features, OpenBSD's features rated highest scoring 7.03 out of 10 while Debian's scored 6.42 and Solaris’ scored 5.92. A similar pattern was observed for the vulnerabilities with OpenBSD having the fewest (5).
>
> ...
>
> Based on these results it would appear that open source systems tend to be more secure, however, ... in scoring 10.2, OpenBSD was the only system of the three to receive a positive score and, a comparison with the magnitudes of the other two scores suggests this is a relatively high score also. Therefore, the significant differences between Debian and OpenBSD's score support the argument that making a program ‘open source’ does not, by itself, automatically improve the security of the program (Levy, 2000), (Viega, 2000). What, therefore, accounts for the dramatically better security exhibited by the OpenBSD system over the other two? The author believes that the answer to this question lies in the fact that, while the source code for the Debian system is available for anyone who cares to examine it, the OpenBSD source code is regularly and purposefully examined with the explicit intention of finding and fixing security holes (Payne, 1999), (Payne, 2000). Thus it is this auditing work, rather than simply the general availability of source code, that is responsible for OpenBSD's low number of security problems.
Edit: To summarize, Payne explains his results by claiming that it is the culture of security itself that promotes actual security. While that is likely true, I think it is also important to note that, with all else being equal, the general public can't independently audit that which is not open.
That study is a bit dated and of limited breadth, though.
I tried looking for a more comprehensive study, but I couldn't find anything substantive (there are many "opinion pieces" arguing that open source is better, but not much data). So I took a quick look at the National Vulnerability Database (NVD), which collects, rates, and publishes software vulnerabilities and has records dating back to the 1980s. I quickly hacked together this Perl script to parse it:
```perl
#!/usr/bin/perl
# Classifies NVD vulnerability entries as open source, closed source, both,
# or neither by keyword-matching their descriptions, then reports the mean
# and standard deviation of the CVSS score for each group.
use strict;
use warnings;
use Cwd 'abs_path';
use File::Basename;
use XML::Parser;

my (@csseverity, @osseverity, @bothseverity);
my $numNeither = 0;

# Parser state shared between the XML::Parser callbacks.
my $item;   # hashref holding the <entry> currently being parsed
my $next;   # name of the child element whose text is being collected

sub mean {
    return 0 unless @_;
    my $sum = 0;
    $sum += $_ foreach @_;
    return $sum / @_;
}

# Population standard deviation: sqrt(E[x^2] - E[x]^2).
sub stddev {
    my $mean     = mean(@_);
    my $variance = mean(map { $_ ** 2 } @_) - $mean ** 2;
    return sqrt($variance > 0 ? $variance : 0);   # clamp tiny negative rounding errors
}

sub handle_start {
    my ($expat, $element, %attrs) = @_;
    if ($element eq 'entry') {
        $item = {%attrs};   # entry attributes such as CVSS_score and reject
        undef $next;
    } elsif (defined $item) {
        $next = $element;
    }
}

sub handle_char {
    my ($expat, $text) = @_;
    # XML::Parser may split one text node across several calls, so append.
    $item->{$next} .= $text if defined $item && defined $next;
}

sub handle_end {
    my ($expat, $element) = @_;
    if ($element ne 'entry') {
        undef $next;   # finished collecting this child element's text
        return;
    }
    my $rejected = ($item->{'reject'} // 0) == 1;
    my $score    = $item->{'CVSS_score'};
    # Skip rejected entries, and entries without a score (which would
    # otherwise be averaged in as 0).
    if (!$rejected && defined $score) {
        my $d    = $item->{'descript'} // '';
        my $isOS = $d =~ m/(^|\W)(linux|nfs|openssl|(net|open|free)?bsd|netscape|red hat|lynx|apache|mozilla|perl|x windowing|xlock|php|w(u|f)-?ftpd|sendmail|ghostscript|gnu|slackware|postfix|vim|bind|kde|mysql|squirrelmail|ssh-agent|formmail|sshd|suse|hsftp|xfree86|Mutt|mpg321|cups|tightvnc|pam|bugzilla|mediawiki|tor|piwiki|ruby|chromium|open source)(\W|$)/i;
        # NOTE: Samba and Mantis are open-source projects, yet 'samba' and
        # 'mantis' appear in this closed-source list.
        my $isCS = $d =~ m/(^|\W)(windows|tooltalk|solaris|sun|microsoft|apple|macintosh|sybergen|mac\s*os|mcafee|irix|iis|sgi|internet explorer|ntmail|sco|cisco(secure)?|aix|samba|sunos|novell|dell|netware|outlook|hp(-?ux)?|iplanet|flash|aol instant|aim|digital|compaq|tru64|wingate|activex|ichat|remote access service|qnx|mantis|veritas|chrome|3com|vax|vms|alcatel|xeneo|msql|unixware|symantec|oracle|realone|real\s*networks|realserver|realmedia|ibm|websphere|coldfusion|dg\/ux|synaesthesia|helix|check point|proofpoint|martinicreations|webfort|vmware)(\W|$)/i;
        if ($isOS && $isCS) {
            push @bothseverity, $score;
        } elsif ($isOS) {
            push @osseverity, $score;
        } elsif ($isCS) {
            push @csseverity, $score;
        } else {
            $numNeither++;
            #print "$d\n";
        }
    }
    undef $item;
}

my ($scriptfile, $scriptdir) = fileparse(abs_path($0));

sub process_year {
    my ($year)   = @_;
    my $filename = "nvdcve-$year.xml";
    my $path     = $scriptdir . $filename;
    # Download each yearly feed once, caching it next to the script.
    # (These legacy XML feeds may no longer be available from NVD.)
    system('wget', '-P', $scriptdir, "http://nvd.nist.gov/download/$filename") unless -e $path;
    my $p = XML::Parser->new(Handlers => {
        Start => \&handle_start,
        End   => \&handle_end,
        Char  => \&handle_char,
    });
    $p->parsefile($path);   # parse from the script directory, not the cwd
}

# Change the start year to 2003 to restrict the analysis to 2003 onward.
my $currentyear = (localtime)[5] + 1900;
for my $year (2002 .. $currentyear) {
    process_year($year);
}

print "Total vulnerabilities: " . (@osseverity + @csseverity + @bothseverity + $numNeither) . "\n";
print "\t# Open Source (OS): " . @osseverity . "\n";
print "\t# Closed Source (CS): " . @csseverity . "\n";
print "\t# Both: " . @bothseverity . "\n";
print "\t# Unclassified: " . $numNeither . "\n";
printf "OS Severity: %.2f\t%.2f\n",   mean(@osseverity),   stddev(@osseverity);
printf "CS Severity: %.2f\t%.2f\n",   mean(@csseverity),   stddev(@csseverity);
printf "Both Severity: %.2f\t%.2f\n", mean(@bothseverity), stddev(@bothseverity);
```
Feel free to modify the code, if you'd like. Here are the results:
The full database has 46102 vulnerabilities. My script classified 15748 of them as specifically related to open source software and 11430 as related to closed source software; 782 applied to both, and 18142 were unclassified (I didn't have time to optimize my classifier very much; feel free to improve it). Among the classified vulnerabilities, the open source ones had an average CVSS severity score of 6.24 with a standard deviation of 1.74 (higher scores are more severe). The closed source vulnerabilities had an average severity of 6.65 (stddev = 2.21), and those classified as both had an average severity of 6.47 (stddev = 2.13). This may not be a completely fair comparison, though, since open source software has become much more popular in recent years. If I restrict the results to the years 2003 to the present, I get:
- Total vulnerabilities: 39445
- # Open Source (OS): 14595
- # Closed Source (CS): 9293
- # Both: 675
- # Unclassified: 14882
- Avg. OS Severity: 6.25 (stddev 1.70)
- Avg. CS Severity: 6.79 (stddev 2.24)
- Avg. Both Severity: 6.52 (stddev 2.15)
I haven't had time to do any statistical analysis on these results; however, it does look like, on average, vulnerabilities affecting open source software have a slightly lower severity rating than those affecting closed source software.
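As a rough first step toward that analysis, here is a minimal sketch of Welch's two-sample t statistic (plus Cohen's d as an effect size), computed only from the rounded 2003-to-present summary figures listed above. It assumes the OS and CS groups can be treated as independent samples and ignores the "both" and unclassified entries. The subroutine names `welch_t` and `cohens_d` are my own, and it stops short of a p-value because core Perl has no t-distribution function.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Welch's t statistic for two independent samples, from summary stats only:
#   t = (m1 - m2) / sqrt(s1^2/n1 + s2^2/n2)
sub welch_t {
    my ($m1, $s1, $n1, $m2, $s2, $n2) = @_;
    return ($m1 - $m2) / sqrt($s1**2 / $n1 + $s2**2 / $n2);
}

# Cohen's d using the pooled standard deviation of the two groups.
sub cohens_d {
    my ($m1, $s1, $n1, $m2, $s2, $n2) = @_;
    my $pooled = sqrt((($n1 - 1) * $s1**2 + ($n2 - 1) * $s2**2) / ($n1 + $n2 - 2));
    return ($m1 - $m2) / $pooled;
}

# Rounded 2003-to-present figures from the list above: (mean, stddev, count).
my @os = (6.25, 1.70, 14595);
my @cs = (6.79, 2.24, 9293);

printf "Welch t:   %.2f\n", welch_t(@os, @cs);
printf "Cohen's d: %.2f\n", cohens_d(@os, @cs);
```

With these inputs it prints a t of about -19.9 and a d of about -0.28. With thousands of samples per group, |t| that far above 2 means the gap in means is very unlikely to be chance alone, while a d near 0.2 is conventionally a "small" effect. Both numbers inherit every weakness of the keyword classifier, and CVSS scores are bounded and coarse, so they are indicators rather than proof.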
When I get some more time, I'll try to generate a graph of the running average of severity over time.
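One way to compute that running average is sketched below, assuming the script is modified to keep scores grouped by year (for example, keyed on the year passed to `process_year`). The helper `running_means` and its input layout are hypothetical. For each year it returns that year's mean and the cumulative mean of all scores up to and including that year, printed as tab-separated rows that most plotting tools can read. A moving-window average would be an equally reasonable choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes a hashref of year => [CVSS scores] and returns
# rows of [year, mean for that year, cumulative (running) mean so far].
sub running_means {
    my ($scores_by_year) = @_;
    my ($sum, $count) = (0, 0);
    my @rows;
    for my $year (sort { $a <=> $b } keys %$scores_by_year) {
        my @scores = @{ $scores_by_year->{$year} };
        next unless @scores;   # skip years with no scores
        my $year_sum = 0;
        $year_sum += $_ for @scores;
        $sum   += $year_sum;
        $count += @scores;
        push @rows, [$year, $year_sum / @scores, $sum / $count];
    }
    return @rows;
}

# Toy input for illustration only (not NVD data).
my %toy = (2003 => [4, 8], 2004 => [9]);
for my $row (running_means(\%toy)) {
    printf "%d\t%.2f\t%.2f\n", @$row;
}
```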