I ran the following benchmark:
bench=function(...,n=1,r=3){
a=match.call(expand.dots=F)$...
t=matrix(ncol=length(a),nrow=n)
for(i in 1:length(a))for(j in 1:n){t1=Sys.time();eval(a[[i]],parent.frame());t[j,i]=Sys.time()-t1}
o=t(apply(t,2,function(x)c(median(x),min(x),max(x),mean(x))))
round(1e3*`dimnames<-`(o,list(names(a),c("median","min","max","mean"))),r)
}
ns=10^c(3:7)
m=sapply(ns,function(n)bench(n=5,
`vector at length + 1`={l=c();for(i in 1:n)l[length(l)+1]=i},
`vector at index`={l=c();for(i in 1:n)l[i]=i},
`vector at index, initialize with type`={l=integer();for(i in 1:n)l[i]=i},
`vector at index, initialize with length`={l=vector(length=n);for(i in 1:n)l[i]=i},
`vector at index, initialize with type and length`={l=integer(n);for(i in 1:n)l[i]=i},
`list at length + 1`={l=list();for(i in 1:n)l[[length(l)+1]]=i},
`list at index`={l=list();for(i in 1:n)l[[i]]=i},
`list at index, initialize with length`={l=vector('list',n);for(i in 1:n)l[[i]]=i},
`list at index, initialize with double length, remove null`={l=vector("list",2*n);for(i in 1:n)l[[i]]=i;l=head(l,i)},
`list at index, double when full, get length from variable`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==len){len=len*2;length(l)=len}};l=head(l,i)},
`list at index, double when full, check length inside loop`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==length(l)){length(l)=i*2}};l=head(l,i)},
`nested lists`={l=list();for(i in 1:n)l=list(l,i)},
`nested lists with unlist`={if(n<=1e5){l=list();for(i in 1:n)l=list(l,i);o=unlist(l)}},
`nested lists with manual unlist`={l=list();for(i in 1:n)l=list(l,i);o=integer(n);for(i in 1:n){o[n-i+1]=l[[2]];l=l[[1]]}},
`JanKanis better_env_as_container`={env=new.env(hash=T,parent=globalenv());for(i in 1:n)env[[as.character(i)]]=i},
`JanKanis inlineLinkedList`={a=list();for(i in 1:n)a=list(a,i);b=vector('list',n);head=a;for(i in n:1){b[[i]]=head[[2]];head=head[[1]]}},
`JanKanis inlineExpandingList`={l=vector('list',10);cap=10;len=0;for(i in 1:n){if(len==cap){l=c(l,vector('list',cap));cap=cap*2};len=len+1;l[[len]]=i};l[1:len]},
`c`={if(n<=1e5){l=c();for(i in 1:n)l=c(l,i)}},
`append vector`={if(n<=1e5){l=integer(n);for(i in 1:n)l=append(l,i)}},
`append list`={if(n<=1e9){l=list();for(i in 1:n)l=append(l,i)}}
)[,1])
m[rownames(m)%in%c("nested lists with unlist","c","append vector","append list"),4:5]=NA
m2=apply(m,2,function(x)formatC(x,max(0,2-ceiling(log10(min(x,na.rm=T)))),format="f"))
m3=apply(rbind(paste0("1e",log10(ns)),m2),2,function(x)formatC(x,max(nchar(x)),format="s"))
writeLines(apply(cbind(m3,c("",rownames(m))),1,paste,collapse=" "))
Output:
1e3 1e4 1e5 1e6 1e7
2.35 24.5 245 2292 27146 vector at length + 1
0.61 5.9 60 590 7360 vector at index
0.61 5.9 64 587 7132 vector at index, initialize with type
0.56 5.6 54 523 6418 vector at index, initialize with length
0.54 5.5 55 522 6371 vector at index, initialize with type and length
2.65 28.8 299 3955 48204 list at length + 1
0.93 9.2 96 1605 13480 list at index
0.58 5.6 57 707 8461 list at index, initialize with length
0.62 5.8 59 739 9413 list at index, initialize with double length, remove null
0.88 8.4 81 962 11872 list at index, double when full, get length from variable
0.96 9.5 92 1264 15813 list at index, double when full, check length inside loop
0.21 1.9 22 426 3826 nested lists
0.25 2.4 29 NA NA nested lists with unlist
2.85 27.5 295 3065 31427 nested lists with manual unlist
1.65 20.2 293 6505 8835 JanKanis better_env_as_container
1.11 10.1 110 1534 27119 JanKanis inlineLinkedList
2.66 26.3 266 3592 47120 JanKanis inlineExpandingList
1.22 118.6 15466 NA NA c
3.64 512.0 45167 NA NA append vector
6.35 664.8 71399 NA NA append list
The table above shows the median time for each method and not the mean time, because occasionally a single run took much longer than a typical run which distorted the mean running time. But none of the methods became much faster on subsequent runs after the first run, so the minimum time and median time were typically similar for each method.
The method "vector at index" (l=c();for(i in 1:n)l[i]=i
) was about 5 times faster than "vector at length + 1" (l=c();for(i in 1:n)l[length(l)]=i
), because getting the length of the vector took longer than adding an element to the vector. When I initialized the vector with a predetermined length, it made the code about 20% faster, but initializing with a specific type didn't make a difference, because the type just needs to be changed once when the first item is added to the vector. And in the case of lists, when you compare the methods "list at index" and "list at index initialized with length", initializing the list with a predetermined length made a bigger difference as the length of the list increased, because it made the code about twice as fast at length 1e6 but about 3 times as fast at length 1e7.
The method "list at index" (l=list();for(i in 1:n)l[[i]]=i
) was about 3-4 times faster than the method "list at length + 1" (l=list();for(i in 1:n)l[[length(l)+1]]=i
).
The linked list and expanding list methods by JanKanis were slower than "list at index" but faster than "list at length + 1". The linked list was faster than the expanding list.
Some people claim that the append
function is faster than the c
function, but in my benchmark append
was about 3-4 times slower than c
.
In the table above, the lengths 1e6 and 1e7 and are missing for three methods: for "c", "append vector", and "append list" because they had quadratic time complexity, and for "nested lists with unlist" because it resulted in a stack overflow.
The "nested lists" option was the fastest, but it doesn't include the time that it takes to flatten the list. When I used the unlist
function to flatten the nested list, I got a stack overflow when the length of the list was around 1.26e5 or higher, because the unlist
function calls itself recursively by default: n=1.26e5;l=list();for(i in 1:n)l=list(l,list(i));u=unlist(l)
. And when I used repeated calls of unlist(recursive=F)
, it took about 4 seconds to run even for a list with only 10,000 items: for(i in 1:n)l=unlist(l,recursive=F)
. But when I did the unlisting manually, it only took about 0.3 seconds to run for a list with a million items: o=integer(n);for(i in 1:n){o[n-i+1]=l[[2]];l=l[[1]]}
.
If you don't know how many items you are going to append to a list in advance but you know the maximum number of items, then you can try to initialize the list at the maximum length and then later remove NULL values. Or another approach is to double the size of the list every time the list becomes full (which you can do faster if you have one variable for the length of the list and another variable for the number of items you have added to the list, so then you don't have to check the length of the list object on each iteration of a loop):
ns=10^c(2:7)
m=sapply(ns,function(n)bench(n=5,
`list at index`={l=list();for(i in 1:n)l[[i]]=i},
`list at length + 1`={l=list();for(i in 1:n)l[[length(l)+1]]=i},
`list at index, initialize with length`={l=vector("list",n);for(i in 1:n)l[[i]]=i},
`list at index, initialize with double length, remove null`={l=vector("list",2*n);for(i in 1:n)l[[i]]=i;l=head(l,i)},
`list at index, initialize with length 1e7, remove null`={l=vector("list",1e7);for(i in 1:n)l[[i]]=i;l=head(l,i)},
`list at index, initialize with length 1e8, remove null`={l=vector("list",1e8);for(i in 1:n)l[[i]]=i;l=head(l,i)},
`list at index, double when full, get length from variable`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==len){len=len*2;length(l)=len}};l=head(l,i)},
`list at index, double when full, check length inside loop`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==length(l)){length(l)=i*2}};l=head(l,i)}
)[,1])
m2=apply(m,2,function(x)formatC(x,max(0,2-ceiling(log10(min(x)))),format="f"))
m3=apply(rbind(paste0("1e",log10(ns)),m2),2,function(x)formatC(x,max(nchar(x)),format="s"))
writeLines(apply(cbind(m3,c("",rownames(m))),1,paste,collapse=" "))
Output:
1e4 1e5 1e6 1e7
9.3 102 1225 13250 list at index
27.4 315 3820 45920 list at length + 1
5.7 58 726 7548 list at index, initialize with length
5.8 60 748 8057 list at index, initialize with double length, remove null
33.4 88 902 7684 list at index, initialize with length 1e7, remove null
333.2 393 2691 12245 list at index, initialize with length 1e8, remove null
8.6 83 1032 10611 list at index, double when full, get length from variable
9.3 96 1280 14319 list at index, double when full, check length inside loop