258

If I have some R list mylist, I can append an item obj to it like so:

mylist[[length(mylist)+1]] <- obj

But surely there is some more compact way. When I was new at R, I tried writing lappend() like so:

lappend <- function(lst, obj) {
    lst[[length(lst)+1]] <- obj
    return(lst)
}

but of course that doesn't work due to R's call-by-name semantics (lst is effectively copied upon call, so changes to lst are not visible outside the scope of lappend()). I know you can do environment hacking in an R function to reach outside the scope of your function and mutate the calling environment, but that seems like a large hammer to write a simple append function.

Can anyone suggest a more beautiful way of doing this? Bonus points if it works for both vectors and lists.

8
  • 7
    R has the immutable data characteristics that are often found in functional languages. I hate to say this, but I think you just have to deal with it. It has its pros and its cons.
    – Dan
    Commented Mar 14, 2010 at 0:15
  • When you say "call-by-name" you really mean "call-by-value", right? Commented Mar 15, 2010 at 17:54
  • 8
    No, it's definitely not call-by-value, otherwise this wouldn't be a problem. R actually uses call-by-need (en.wikipedia.org/wiki/Evaluation_strategy#Call_by_need).
    – Nick
    Commented Mar 18, 2010 at 1:56
  • 4
    A good idea is to pre-allocate your vector/list: N = 100; mylist = vector('list', N); for (i in 1:N) { #mylist[[i]] = ... }. Avoid 'growing' objects in R.
    – Fernando
    Commented Sep 9, 2012 at 19:46
  • I accidentally found the answer here, stackoverflow.com/questions/17046336/… So hard to implement such an easy algorithm!
    – KH Kim
    Commented Feb 11, 2014 at 20:47

17 Answers 17

260

If it's a list of strings, just use the c() function:

R> LL <- list(a="tom", b="dick")
R> c(LL, c="harry")
$a
[1] "tom"

$b
[1] "dick"

$c
[1] "harry"

R> class(LL)
[1] "list"
R> 

That works on vectors too, so do I get the bonus points?

Edit (2015-Feb-01): This post is coming up on its fifth birthday. Some kind readers keep repeating any shortcomings with it, so by all means also see some of the comments below. One suggestion for list types:

newlist <- list(oldlist, list(someobj))

In general, R types can make it hard to have one and just one idiom for all types and uses.
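The difference between these idioms is easiest to see by checking lengths. The following is a minimal sketch that reuses LL from above; the value c(6, 7) and the variable names are only illustrative. Note that the list(oldlist, list(someobj)) form yields a nested result, which is what several commenters point out.

```r
LL <- list(a = "tom", b = "dick")

# Wrapping the new item in list() appends it as ONE element,
# whatever its type or length.
appended <- c(LL, list(c = c(6, 7)))
length(appended)      # 3

# Without the wrapper, c() splices the vector in element by element.
spliced <- c(LL, c = c(6, 7))
length(spliced)       # 4
names(spliced)        # "a" "b" "c1" "c2"

# list() does not append at all: it builds a new two-element list
# whose first element is the whole old list.
nested <- list(LL, list(c = c(6, 7)))
length(nested)        # 2
```

In each case the result still has to be assigned back (for example LL <- c(LL, list(...))) for the change to persist.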

17
  • 21
    This doesn't append... it concatenates. LL would still have two elements after c(LL, c="harry") is called.
    – Nick
    Commented Mar 13, 2010 at 6:29
  • 27
    Just reassign to LL: LL <- c(LL, c="harry"). Commented Mar 13, 2010 at 11:52
  • 53
    This only works with strings. If a, b and c are integer vectors, the behavior is completely different. Commented Dec 13, 2010 at 17:14
  • 8
    @Dirk: You have the parens nested differently than I do. My call to c() has 2 arguments: the list I'm trying to append to, namely list(a=3, b=c(4, 5)), and the item I'm trying to append, namely c=c(6, 7). If you use my approach, you'll see that 2 list items are appended (6 and 7, with names c1 and c2) instead of a single 2-element vector named c as is clearly intended! Commented Dec 5, 2011 at 16:45
  • 8
    So is the conclusion mylist <- list(mylist, list(obj))? If yes it would be nice to modify the answer
    – Matthew
    Commented Oct 23, 2014 at 15:33
103

The OP (in the April 2012 updated revision of the question) is interested in knowing if there's a way to add to a list in amortized constant time, such as can be done, for example, with a C++ vector<> container. The best answer(s?) here so far only show the relative execution times for various solutions given a fixed-size problem, but do not address any of the various solutions' algorithmic efficiency directly. Comments below many of the answers discuss the algorithmic efficiency of some of the solutions, but in every case to date (as of April 2015) they come to the wrong conclusion.

Algorithmic efficiency captures the growth characteristics, either in time (execution time) or space (amount of memory consumed) as a problem size grows. Running a performance test for various solutions given a fixed-size problem does not address the various solutions' growth rate. The OP is interested in knowing if there is a way to append objects to an R list in "amortized constant time". What does that mean? To explain, first let me describe "constant time":

  • Constant or O(1) growth:

    If the time required to perform a given task remains the same as the size of the problem doubles, then we say the algorithm exhibits constant time growth, or stated in "Big O" notation, exhibits O(1) time growth. When the OP says "amortized" constant time, he simply means "in the long run"... i.e., if performing a single operation occasionally takes much longer than normal (e.g. if a preallocated buffer is exhausted and occasionally requires resizing to a larger buffer size), as long as the long-term average performance is constant time, we'll still call it O(1).

    For comparison, I will also describe "linear time" and "quadratic time":

  • Linear or O(n) growth:

    If the time required to perform a given task doubles as the size of the problem doubles, then we say the algorithm exhibits linear time, or O(n) growth.

  • Quadratic or O(n^2) growth:

    If the time required to perform a given task increases with the square of the problem size (for example, quadruples when the problem size doubles), then we say the algorithm exhibits quadratic time, or O(n^2) growth.

There are many other efficiency classes of algorithms; I defer to the Wikipedia article for further discussion.
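To make "amortized constant time" concrete, here is a small counting sketch. It does not time anything; it only counts how many element copies n appends would cost, under the simplifying assumption that growing a buffer copies every element already stored. The function name count_copies and the two growth policies are made up for this illustration.

```r
# Count how many element copies n appends cost under a given growth policy.
# Assumption: every reallocation copies all elements currently stored.
count_copies <- function(n, grow) {
  capacity <- 1
  copies <- 0
  for (len in 0:(n - 1)) {       # len = number of items stored before this append
    if (len == capacity) {       # buffer full: reallocate and copy
      copies <- copies + len
      capacity <- grow(capacity)
    }
  }
  copies
}

n <- 1024
by_one   <- count_copies(n, function(cap) cap + 1)  # grow by a single slot
doubling <- count_copies(n, function(cap) cap * 2)  # double the capacity

by_one / n     # 511.5: copies per append grow with n
doubling / n   # just under 1: copies per append stay bounded
```

Growing by one slot costs about n/2 copies per append, while doubling keeps the average below one copy per append, even though an individual append that triggers a resize is expensive.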

I thank @CronAcronis for his answer, as I am new to R and it was nice to have a fully-constructed block of code for doing a performance analysis of the various solutions presented on this page. I am borrowing his code for my analysis, which I duplicate (wrapped in a function) below:

library(microbenchmark)
### Using environment as a container
lPtrAppend <- function(lstptr, lab, obj) {lstptr[[deparse(substitute(lab))]] <- obj}
### Store list inside new environment
envAppendList <- function(lstptr, obj) {lstptr$list[[length(lstptr$list)+1]] <- obj} 
runBenchmark <- function(n) {
    microbenchmark(times = 5,  
        env_with_list_ = {
            listptr <- new.env(parent=globalenv())
            listptr$list <- NULL
            for(i in 1:n) {envAppendList(listptr, i)}
            listptr$list
        },
        c_ = {
            a <- list(0)
            for(i in 1:n) {a = c(a, list(i))}
        },
        list_ = {
            a <- list(0)
            for(i in 1:n) {a <- list(a, list(i))}
        },
        by_index = {
            a <- list(0)
            for(i in 1:n) {a[length(a) + 1] <- i}
            a
        },
        append_ = { 
            a <- list(0)    
            for(i in 1:n) {a <- append(a, i)} 
            a
        },
        env_as_container_ = {
            listptr <- new.env(parent=globalenv())
            for(i in 1:n) {lPtrAppend(listptr, i, i)} 
            listptr
        }   
    )
}

The results posted by @CronAcronis definitely seem to suggest that the a <- list(a, list(i)) method is fastest, at least for a problem size of 10000, but the results for a single problem size do not address the growth of the solution. For that, we need to run a minimum of two profiling tests, with differing problem sizes:

> runBenchmark(2e+3)
Unit: microseconds
              expr       min        lq      mean    median       uq       max neval
    env_with_list_  8712.146  9138.250 10185.533 10257.678 10761.33 12058.264     5
                c_ 13407.657 13413.739 13620.976 13605.696 13790.05 13887.738     5
             list_   854.110   913.407  1064.463   914.167  1301.50  1339.132     5
          by_index 11656.866 11705.140 12182.104 11997.446 12741.70 12809.363     5
           append_ 15986.712 16817.635 17409.391 17458.502 17480.55 19303.560     5
 env_as_container_ 19777.559 20401.702 20589.856 20606.961 20939.56 21223.502     5
> runBenchmark(2e+4)
Unit: milliseconds
              expr         min         lq        mean    median          uq         max neval
    env_with_list_  534.955014  550.57150  550.329366  553.5288  553.955246  558.636313     5
                c_ 1448.014870 1536.78905 1527.104276 1545.6449 1546.462877 1558.609706     5
             list_    8.746356    8.79615    9.162577    8.8315    9.601226    9.837655     5
          by_index  953.989076 1038.47864 1037.859367 1064.3942 1065.291678 1067.143200     5
           append_ 1634.151839 1682.94746 1681.948374 1689.7598 1696.198890 1706.683874     5
 env_as_container_  204.134468  205.35348  208.011525  206.4490  208.279580  215.841129     5
> 

First of all, a word about the min/lq/mean/median/uq/max values: Since we are performing the exact same task for each of 5 runs, in an ideal world, we could expect that it would take exactly the same amount of time for each run. But the first run is normally biased toward longer times due to the fact that the code we are testing is not yet loaded into the CPU's cache. Following the first run, we would expect the times to be fairly consistent, but occasionally our code may be evicted from the cache due to timer tick interrupts or other hardware interrupts that are unrelated to the code we are testing. By testing the code snippets 5 times, we are allowing the code to be loaded into the cache during the first run and then giving each snippet 4 chances to run to completion without interference from outside events. For this reason, and because we are really running the exact same code under the exact same input conditions each time, we will consider only the 'min' times to be sufficient for the best comparison between the various code options.

Note that I chose to first run with a problem size of 2000 and then 20000, so my problem size increased by a factor of 10 from the first run to the second.

Performance of the list solution: O(1) (constant time)

Let's first look at the growth of the list solution, since we can tell right away that it's the fastest solution in both profiling runs: In the first run, it took 854 microseconds (0.854 milliseconds) to perform 2000 "append" tasks. In the second run, it took 8.746 milliseconds to perform 20000 "append" tasks. A naïve observer would say, "Ah, the list solution exhibits O(n) growth, since as the problem size grew by a factor of ten, so did the time required to execute the test." The problem with that analysis is that what the OP wants is the growth rate of a single object insertion, not the growth rate of the overall problem. Knowing that, it's clear then that the list solution provides exactly what the OP wants: a method of appending objects to a list in O(1) time.

Performance of the other solutions

None of the other solutions come even close to the speed of the list solution, but it is informative to examine them anyway:

Most of the other solutions appear to be O(n) in performance. For example, the by_index solution, a very popular solution based on the frequency with which I find it in other SO posts, took 11.6 milliseconds to append 2000 objects, and 953 milliseconds to append ten times that many objects. The overall problem's time grew by a factor of 100, so a naïve observer might say "Ah, the by_index solution exhibits O(n^2) growth, since as the problem size grew by a factor of ten, the time required to execute the test grew by a factor of 100." As before, this analysis is flawed, since the OP is interested in the growth of a single object insertion. If we divide the overall time growth by the problem's size growth, we find that the time growth of appending objects increased by a factor of only 10, not a factor of 100, which matches the growth of the problem size, so the by_index solution is O(n). There are no solutions listed which exhibit O(n^2) growth for appending a single object.
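The per-append arithmetic can be reproduced directly from the 'min' columns of the two runs above. This is just a sketch of the division described in the text; the variable names are my own.

```r
# 'min' timings from the two benchmark runs above, converted to microseconds.
n        <- c(2000, 20000)
list_us  <- c(854.110, 8.746356 * 1000)
index_us <- c(11656.866, 953.989076 * 1000)

# Average time per single append at each problem size.
list_per_append  <- list_us / n
index_per_append <- index_us / n

round(list_per_append, 3)    # 0.427 0.437
round(index_per_append, 2)   # 5.83 47.70

# Growth of the per-append cost when the problem size grows tenfold.
list_per_append[2] / list_per_append[1]     # about 1.02: flat
index_per_append[2] / index_per_append[1]   # about 8.2: roughly proportional to n
```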

6
  • 1
    To the reader: Please read JanKanis's answer, which provides a very practical extension to my findings above, and dives a bit into the overhead of various solutions given the internal workings of the C implementation of R. Commented Nov 3, 2015 at 16:50
  • 7
    Not sure the list option implements what it is required: >length(c(c(c(list(1)),list(2)),list(3))) [1] 3 > length(list(list(list(list(1)),list(2)),list(3))) [1] 2. Looks more like nested lists.
    – Picarus
    Commented Jan 24, 2016 at 23:45
  • 1
    @Picarus - I think you're right. I'm not working with R anymore, but thankfully JanKanis posted an answer with a much more useful O(1) solution and notes the issue you identified. I'm sure JanKanis will appreciate your upvote. Commented Jan 25, 2016 at 3:16
  • @phonetagger, you should edit your answer. Not everybody will read all the answers.
    – Picarus
    Commented Jan 25, 2016 at 5:35
  • "not a single answer has addressed the actual question" --> The problem is that the original question was not about algorithm complexity; take a look at the edit history of the question. The OP first asked how to append an element to a list, then, several months later, he changed the question. Commented Mar 7, 2016 at 15:11
44

In the other answers, only the list approach results in O(1) appends, but it results in a deeply nested list structure rather than a plain single list. I have used the data structures below; they support amortized O(1) appends and allow the result to be converted back to a plain list.

expandingList <- function(capacity = 10) {
    buffer <- vector('list', capacity)
    length <- 0

    methods <- list()

    methods$double.size <- function() {
        buffer <<- c(buffer, vector('list', capacity))
        capacity <<- capacity * 2
    }

    methods$add <- function(val) {
        if(length == capacity) {
            methods$double.size()
        }

        length <<- length + 1
        buffer[[length]] <<- val
    }

    methods$as.list <- function() {
        b <- buffer[0:length]
        return(b)
    }

    methods
}

and

linkedList <- function() {
    head <- list(0)
    length <- 0

    methods <- list()

    methods$add <- function(val) {
        length <<- length + 1
        head <<- list(head, val)
    }

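    methods$as.list <- function() {
        b <- vector('list', length)
        # Walk the chain through a local cursor; the newest item is outermost.
        h <- head
        # rev(seq_len()) is empty when nothing was added, unlike length:1.
        for(i in rev(seq_len(length))) {
            b[[i]] <- h[[2]]
            h <- h[[1]]
        }
        return(b)
    }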
    methods
}

Use them as follows:

> l <- expandingList()
> l$add("hello")
> l$add("world")
> l$add(101)
> l$as.list()
[[1]]
[1] "hello"

[[2]]
[1] "world"

[[3]]
[1] 101

These solutions could be expanded into full objects that support all list-related operations by themselves, but that will remain as an exercise for the reader.

Another variant for a named list:

namedExpandingList <- function(capacity = 10) {
    buffer <- vector('list', capacity)
    names <- character(capacity)
    length <- 0

    methods <- list()

    methods$double.size <- function() {
        buffer <<- c(buffer, vector('list', capacity))
        names <<- c(names, character(capacity))
        capacity <<- capacity * 2
    }

    methods$add <- function(name, val) {
        if(length == capacity) {
            methods$double.size()
        }

        length <<- length + 1
        buffer[[length]] <<- val
        names[length] <<- name
    }

    methods$as.list <- function() {
        b <- buffer[0:length]
        names(b) <- names[0:length]
        return(b)
    }

    methods
}

Benchmarks

Performance comparison using @phonetagger's code (which is based on @Cron Arconis' code). I have also added a better_env_as_container and changed the env_as_container_ a bit. The original env_as_container_ was broken and doesn't actually store all the numbers.

library(microbenchmark)
lPtrAppend <- function(lstptr, lab, obj) {lstptr[[deparse(lab)]] <- obj}
### Store list inside new environment
envAppendList <- function(lstptr, obj) {lstptr$list[[length(lstptr$list)+1]] <- obj} 
env2list <- function(env, len) {
    l <- vector('list', len)
    for (i in 1:len) {
        l[[i]] <- env[[as.character(i)]]
    }
    l
}
envl2list <- function(env, len) {
    l <- vector('list', len)
    for (i in 1:len) {
        l[[i]] <- env[[paste(as.character(i), 'L', sep='')]]
    }
    l
}
runBenchmark <- function(n) {
    microbenchmark(times = 5,  
        env_with_list_ = {
            listptr <- new.env(parent=globalenv())
            listptr$list <- NULL
            for(i in 1:n) {envAppendList(listptr, i)}
            listptr$list
        },
        c_ = {
            a <- list(0)
            for(i in 1:n) {a = c(a, list(i))}
        },
        list_ = {
            a <- list(0)
            for(i in 1:n) {a <- list(a, list(i))}
        },
        by_index = {
            a <- list(0)
            for(i in 1:n) {a[length(a) + 1] <- i}
            a
        },
        append_ = { 
            a <- list(0)    
            for(i in 1:n) {a <- append(a, i)} 
            a
        },
        env_as_container_ = {
            listptr <- new.env(hash=TRUE, parent=globalenv())
            for(i in 1:n) {lPtrAppend(listptr, i, i)} 
            envl2list(listptr, n)
        },
        better_env_as_container = {
            env <- new.env(hash=TRUE, parent=globalenv())
            for(i in 1:n) env[[as.character(i)]] <- i
            env2list(env, n)
        },
        linkedList = {
            a <- linkedList()
            for(i in 1:n) { a$add(i) }
            a$as.list()
        },
        inlineLinkedList = {
            a <- list()
            for(i in 1:n) { a <- list(a, i) }
            b <- vector('list', n)
            head <- a
            for(i in n:1) {
                b[[i]] <- head[[2]]
                head <- head[[1]]
            }                
        },
        expandingList = {
            a <- expandingList()
            for(i in 1:n) { a$add(i) }
            a$as.list()
        },
        inlineExpandingList = {
            l <- vector('list', 10)
            cap <- 10
            len <- 0
            for(i in 1:n) {
                if(len == cap) {
                    l <- c(l, vector('list', cap))
                    cap <- cap*2
                }
                len <- len + 1
                l[[len]] <- i
            }
            l[1:len]
        }
    )
}

# We need to repeatedly add an element to a list. With normal list concatenation
# or element setting this would lead to a large number of memory copies and a
# quadratic runtime. To prevent that, this function implements a bare bones
# expanding array, in which list appends are (amortized) constant time.
    expandingList <- function(capacity = 10) {
        buffer <- vector('list', capacity)
        length <- 0

        methods <- list()

        methods$double.size <- function() {
            buffer <<- c(buffer, vector('list', capacity))
            capacity <<- capacity * 2
        }

        methods$add <- function(val) {
            if(length == capacity) {
                methods$double.size()
            }

            length <<- length + 1
            buffer[[length]] <<- val
        }

        methods$as.list <- function() {
            b <- buffer[0:length]
            return(b)
        }

        methods
    }

    linkedList <- function() {
        head <- list(0)
        length <- 0

        methods <- list()

        methods$add <- function(val) {
            length <<- length + 1
            head <<- list(head, val)
        }

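        methods$as.list <- function() {
            b <- vector('list', length)
            # Walk the chain through a local cursor; the newest item is outermost.
            h <- head
            # rev(seq_len()) is empty when nothing was added, unlike length:1.
            for(i in rev(seq_len(length))) {
                b[[i]] <- h[[2]]
                h <- h[[1]]
            }
            return(b)
        }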

        methods
    }

# We need to repeatedly add an element to a list. With normal list concatenation
# or element setting this would lead to a large number of memory copies and a
# quadratic runtime. To prevent that, this function implements a bare bones
# expanding array, in which list appends are (amortized) constant time.
    namedExpandingList <- function(capacity = 10) {
        buffer <- vector('list', capacity)
        names <- character(capacity)
        length <- 0

        methods <- list()

        methods$double.size <- function() {
            buffer <<- c(buffer, vector('list', capacity))
            names <<- c(names, character(capacity))
            capacity <<- capacity * 2
        }

        methods$add <- function(name, val) {
            if(length == capacity) {
                methods$double.size()
            }

            length <<- length + 1
            buffer[[length]] <<- val
            names[length] <<- name
        }

        methods$as.list <- function() {
            b <- buffer[0:length]
            names(b) <- names[0:length]
            return(b)
        }

        methods
    }

result:

> runBenchmark(1000)
Unit: microseconds
                    expr       min        lq      mean    median        uq       max neval
          env_with_list_  3128.291  3161.675  4466.726  3361.837  3362.885  9318.943     5
                      c_  3308.130  3465.830  6687.985  8578.913  8627.802  9459.252     5
                   list_   329.508   343.615   389.724   370.504   449.494   455.499     5
                by_index  3076.679  3256.588  5480.571  3395.919  8209.738  9463.931     5
                 append_  4292.321  4562.184  7911.882 10156.957 10202.773 10345.177     5
       env_as_container_ 24471.511 24795.849 25541.103 25486.362 26440.591 26511.200     5
 better_env_as_container  7671.338  7986.597  8118.163  8153.726  8335.659  8443.493     5
              linkedList  1700.754  1755.439  1829.442  1804.746  1898.752  1987.518     5
        inlineLinkedList  1109.764  1115.352  1163.751  1115.631  1206.843  1271.166     5
           expandingList  1422.440  1439.970  1486.288  1519.728  1524.268  1525.036     5
     inlineExpandingList   942.916   973.366  1002.461  1012.197  1017.784  1066.044     5
> runBenchmark(10000)
Unit: milliseconds
                    expr        min         lq       mean     median         uq        max neval
          env_with_list_ 357.760419 360.277117 433.810432 411.144799 479.090688 560.779139     5
                      c_ 685.477809 734.055635 761.689936 745.957553 778.330873 864.627811     5
                   list_   3.257356   3.454166   3.505653   3.524216   3.551454   3.741071     5
                by_index 445.977967 454.321797 515.453906 483.313516 560.374763 633.281485     5
                 append_ 610.777866 629.547539 681.145751 640.936898 760.570326 763.896124     5
       env_as_container_ 281.025606 290.028380 303.885130 308.594676 314.972570 324.804419     5
 better_env_as_container  83.944855  86.927458  90.098644  91.335853  92.459026  95.826030     5
              linkedList  19.612576  24.032285  24.229808  25.461429  25.819151  26.223597     5
        inlineLinkedList  11.126970  11.768524  12.216284  12.063529  12.392199  13.730200     5
           expandingList  14.735483  15.854536  15.764204  16.073485  16.075789  16.081726     5
     inlineExpandingList  10.618393  11.179351  13.275107  12.391780  14.747914  17.438096     5
> runBenchmark(20000)
Unit: milliseconds
                    expr         min          lq       mean      median          uq         max neval
          env_with_list_ 1723.899913 1915.003237 1921.23955 1938.734718 1951.649113 2076.910767     5
                      c_ 2759.769353 2768.992334 2810.40023 2820.129738 2832.350269 2870.759474     5
                   list_    6.112919    6.399964    6.63974    6.453252    6.910916    7.321647     5
                by_index 2163.585192 2194.892470 2292.61011 2209.889015 2436.620081 2458.063801     5
                 append_ 2832.504964 2872.559609 2983.17666 2992.634568 3004.625953 3213.558197     5
       env_as_container_  573.386166  588.448990  602.48829  597.645221  610.048314  642.912752     5
 better_env_as_container  154.180531  175.254307  180.26689  177.027204  188.642219  206.230191     5
              linkedList   38.401105   47.514506   46.61419   47.525192   48.677209   50.952958     5
        inlineLinkedList   25.172429   26.326681   32.33312   34.403442   34.469930   41.293126     5
           expandingList   30.776072   30.970438   34.45491   31.752790   38.062728   40.712542     5
     inlineExpandingList   21.309278   22.709159   24.64656   24.290694   25.764816   29.158849     5

I have added linkedList and expandingList and an inlined version of both. The inlineLinkedList is basically a copy of list_, but it also converts the nested structure back into a plain list. Beyond that, the difference between the inlined and non-inlined versions is due to the overhead of the function calls.

All variants of expandingList and linkedList show O(1) append performance, with the benchmark time scaling linearly with the number of items appended. linkedList is slower than expandingList, and the function call overhead is also visible. So if you really need all the speed you can get (and want to stick to R code), use an inlined version of expandingList.

I've also had a look at the C implementation of R, and both approaches should be O(1) append for any size up until you run out of memory.

I have also changed env_as_container_: the original version would store every item under index "i", overwriting the previously appended item. The better_env_as_container I have added is very similar to env_as_container_ but without the deparse stuff. Both exhibit O(1) performance, but they have an overhead that is quite a bit larger than the linked/expanding lists.

Memory overhead

In the C R implementation there is an overhead of 4 words and 2 ints per allocated object. The linkedList approach allocates one list of length two per append, for a total of (4*8+4+4+2*8=) 56 bytes per appended item on 64-bit computers (excluding memory allocation overhead, so probably closer to 64 bytes). The expandingList approach uses one word per appended item, plus a copy when doubling the vector length, so a total memory usage of up to 16 bytes per item. Since the memory is all in one or two objects the per-object overhead is insignificant. I haven't looked deeply into the env memory usage, but I think it will be closer to linkedList.
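The byte counts above are plain arithmetic and can be spelled out as follows. The header layout (4 words and 2 ints) is taken from the paragraph above and is not independently checked against R's source here; the variable names are illustrative.

```r
word <- 8   # bytes per pointer-sized word on a 64-bit build
int  <- 4   # bytes per C int

# linkedList: one length-2 list per append = object header + two element slots.
header <- 4 * word + 2 * int          # 4 words and 2 ints of per-object overhead
linked_per_item <- header + 2 * word
linked_per_item                       # 56

# expandingList: one word per stored item, and the text's upper bound of
# two words per item once the doubling copy is taken into account.
expanding_steady <- word
expanding_upper  <- 2 * word
c(expanding_steady, expanding_upper)  # 8 16
```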

4
  • what is the point of keeping the list option if it does not solve the problem we are trying to solve?
    – Picarus
    Commented Jan 25, 2016 at 5:37
  • 1
    @Picarus I'm not sure what you mean. Why I kept it in the benchmark? As comparison with the other options. The list_ option is faster and could be useful if you don't need to convert to a normal list, i.e. if you use the result as a stack.
    – JanKanis
    Commented Jan 25, 2016 at 11:33
  • @Gabor Csardi posted a faster way to convert environments back to lists in a different question at stackoverflow.com/a/29482211/264177. I benchmarked that as well on my system. It is about twice as fast as better_env_as_container but still slower than linkedList and expandingList.
    – JanKanis
    Commented Mar 24, 2016 at 15:31
  • Deeply nested (n=99999) lists seem manageable and tolerable for certain applications: Anyone want to benchmark nestoR? (I'm still a bit of a noob at the environment stuff I used for nestoR.) My bottleneck is almost always human time spent coding and doing data analysis, but I appreciate the benchmarks I've found on this post. As for memory overhead, I would not mind up to about a kB per node for my applications. I hold on to large arrays, etc.
    – Ana Nimbus
    Commented Feb 1, 2020 at 6:23
16

In Lisp we did it this way:

> l <- c(1)
> l <- c(2, l)
> l <- c(3, l)
> l <- rev(l)
> l
[1] 1 2 3

though it was 'cons', not just 'c'. If you need to start with an empty list, use l <- NULL.
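Written as a loop starting from NULL, the same prepend-then-reverse pattern looks roughly like this (a sketch with made-up values). Keep in mind that c() builds a new vector on every call, so unlike cons in Lisp each prepend here still copies the existing elements.

```r
# Prepend each new value, then reverse once at the end.
l <- NULL                 # empty starting point
for (x in c(1, 2, 3)) {
  l <- c(x, l)            # newest value goes to the front
}
l <- rev(l)               # restore insertion order
l                         # 1 2 3
```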

5
  • 3
    Excellent! All the other solutions return some weird list of lists.
    – metakermit
    Commented Nov 7, 2013 at 13:16
  • 4
    In Lisp, prepending to a list is an O(1) operation, while appending runs in O(n), @flies. The need for reversion is outweighed by performance gain. This is not the case in R. Not even in pairlist, which generally resembles Lisp lists the most.
    – Palec
    Commented Mar 9, 2015 at 3:16
  • @Palec "This is not the case in R" - I'm not sure which "this" you're referring to. Are you saying that appending is not O(1), or it's not O(n)?
    – flies
    Commented Jun 5, 2015 at 15:26
  • 1
    I’m saying that if you were coding in Lisp, your approach would be inefficient, @flies. That remark was meant to explain why the answer is written as it is. In R, the two approaches are on par performance-wise, AFAIK. But now I’m not sure about the amortized complexity. Haven’t touched R since about the time my previous comment was written.
    – Palec
    Commented Jun 5, 2015 at 16:45
  • 4
    In R, this approach will be O(n). The c() function copies its arguments into a new vector/list and returns that.
    – JanKanis
    Commented Jan 25, 2016 at 11:16
7

You want something like this maybe?

> push <- function(l, x) {
   lst <- get(l, parent.frame())
   lst[length(lst)+1] <- x
   assign(l, lst, envir=parent.frame())
 }
> a <- list(1,2)
> push('a', 6)
> a
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 6

It's not a very polite function (assigning to parent.frame() is kind of rude) but IIUYC it's what you're asking for.
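If passing the variable name as a string feels awkward, a similar effect can be sketched with substitute() and eval.parent(), which rewrite the call into an ordinary assignment in the caller's frame. The name push_unquoted is hypothetical, and this variant always appends the item as a single list element; it is equally "rude" about modifying the caller's variables.

```r
# Hypothetical helper: turns push_unquoted(a, 3) into the expression
# a <- c(a, list(3)) and evaluates it in the caller's frame.
push_unquoted <- function(lst, x) {
  eval.parent(substitute(lst <- c(lst, list(x))))
  invisible(NULL)
}

a <- list(1, 2)
push_unquoted(a, 3)
length(a)    # 3
a[[3]]       # 3
```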

5

If you pass in the list variable as a quoted string, you can reach it from within the function like:

push <- function(l, x) {
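  # Look the variable up in the caller's frame, so push() also works when called from inside another function.
  assign(l, append(get(l, envir=parent.frame()), x), envir=parent.frame())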
}

so:

> a <- list(1,2)
> a
[[1]]
[1] 1

[[2]]
[1] 2

> push("a", 3)
> a
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

> 

or for extra credit:

> v <- vector()
> push("v", 1)
> v
[1] 1
> push("v", 2)
> v
[1] 1 2
> 
1
  • 1
    This is basically the behavior that I want, however it still calls append internally, resulting in O(n^2) performance.
    – Nick
    Commented Mar 13, 2010 at 6:31
5

Not sure why you think your first method won't work. You have a bug in the lappend function: length(list) should be length(lst). With that fixed, it works fine and returns a list with the appended obj.
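A quick check of the corrected function, as a sketch with made-up values: the function returns a new list, so the caller has to assign the result back for the append to stick.

```r
lappend <- function(lst, obj) {
  lst[[length(lst) + 1]] <- obj
  return(lst)
}

mylist <- list("a", "b")
lappend(mylist, "c")             # returns a new 3-element list...
length(mylist)                   # 2: ...but mylist itself is unchanged

mylist <- lappend(mylist, "c")   # reassign to keep the result
length(mylist)                   # 3
```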

1
  • 4
    You are absolutely right. There was a bug in the code and I've fixed it. I've tested the lappend() that I've provided and it seems to perform about as well as c() and append(), all of which exhibit O(n^2) behavior.
    – Nick
    Commented Sep 21, 2010 at 16:51
5

I have made a small comparison of methods mentioned here.

n = 1e+4
library(microbenchmark)
### Using environment as a container
lPtrAppend <- function(lstptr, lab, obj) {lstptr[[deparse(substitute(lab))]] <- obj}
### Store list inside new environment
envAppendList <- function(lstptr, obj) {lstptr$list[[length(lstptr$list)+1]] <- obj} 

microbenchmark(times = 5,  
        env_with_list_ = {
            listptr <- new.env(parent=globalenv())
            listptr$list <- NULL
            for(i in 1:n) {envAppendList(listptr, i)}
            listptr$list
        },
        c_ = {
            a <- list(0)
            for(i in 1:n) {a = c(a, list(i))}
        },
        list_ = {
            a <- list(0)
            for(i in 1:n) {a <- list(a, list(i))}
        },
        by_index = {
            a <- list(0)
            for(i in 1:n) {a[length(a) + 1] <- i}
            a
        },
        append_ = { 
            a <- list(0)    
            for(i in 1:n) {a <- append(a, i)} 
            a
        },
        env_as_container_ = {
            listptr <- new.env(parent=globalenv())
            for(i in 1:n) {lPtrAppend(listptr, i, i)} 
            listptr
        }   
)

Results:

Unit: milliseconds
              expr       min        lq       mean    median        uq       max neval cld
    env_with_list_  188.9023  198.7560  224.57632  223.2520  229.3854  282.5859     5  a 
                c_ 1275.3424 1869.1064 2022.20984 2191.7745 2283.1199 2491.7060     5   b
             list_   17.4916   18.1142   22.56752   19.8546   20.8191   36.5581     5  a 
          by_index  445.2970  479.9670  540.20398  576.9037  591.2366  607.6156     5  a 
           append_ 1140.8975 1316.3031 1794.10472 1620.1212 1855.3602 3037.8416     5   b
 env_as_container_  355.9655  360.1738  399.69186  376.8588  391.7945  513.6667     5  a 
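
One caveat when reading these numbers: the list_ variant wraps the previous list inside a new two-element list on each iteration, so it does not build the same object as the other variants. A small sketch to check this, with n reduced for illustration (the names flat and nested are not from the benchmark):

```r
n <- 5

# c_ variant: one top-level element per appended value
flat <- list(0)
for (i in 1:n) flat <- c(flat, list(i))

# list_ variant: each step nests the old list inside a new 2-element list
nested <- list(0)
for (i in 1:n) nested <- list(nested, list(i))

length(flat)    # n + 1 top-level elements
length(nested)  # always 2, however large n is

# The values are all there, but only after flattening
all.equal(unlist(nested), unlist(flat))
```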
3
  • This is great info: would never ever have guessed that the list = list was not only the winner - but by 1 to 2 orders of magnitude! Commented Jun 24, 2018 at 22:24
  • This comparison is not valid: list_ does not create a list of integers as expected. It will contain list of lists. At each iteration, a new list is created with 2 elements, one is the new integer and the other one is the previous version of the same list. Because values are not overwritten, a simple copy by reference is done internally. It's why it's so fast, but we don't have the same object at all. For all the other examples, we have a list of length n+1 Commented Oct 19, 2020 at 10:21
  • @DavidBellot it is correct, it is meant there for benchmark level. You can flatten it at the end :) Commented Dec 4, 2020 at 9:06
3

Try this lappend function:

lappend <- function (lst, ...){
  lst <- c(lst, list(...))
  return(lst)
}

and other suggestions from this page Add named vector to a list

Bye.

3

In fact, there is a subtlety with the c() function. If you do:

x <- list()
x <- c(x,2)
x = c(x,"foo")

you will obtain as expected:

[[1]]
[1] 2

[[2]]
[1] "foo"

but if you add a matrix with x <- c(x, matrix(5,2,2)), your list will gain another 4 elements of value 5! It is better to do:

x <- c(x, list(matrix(5,2,2)))

It works for any other object and you will obtain as expected:

[[1]]
[1] 2

[[2]]
[1] "foo"

[[3]]
     [,1] [,2]
[1,]    5    5
[2,]    5    5

Finally, your function becomes:

push <- function(l, ...) c(l, list(...))

and it works for any type of object. You can be smarter and do:

push_back <- function(l, ...) c(l, list(...))
push_front <- function(l, ...) c(list(...), l)
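
A short usage sketch of the two helpers above, contrasting them with a bare c() call on a matrix. The variable names flat, wrapped and y are only for illustration:

```r
push_back  <- function(l, ...) c(l, list(...))
push_front <- function(l, ...) c(list(...), l)

x <- list(2, "foo")

# Bare c(): the matrix is coerced to a list and spread into 4 elements
flat <- c(x, matrix(5, 2, 2))
length(flat)

# push_back(): the matrix is wrapped first, so it stays a single element
wrapped <- push_back(x, matrix(5, 2, 2))
length(wrapped)
is.matrix(wrapped[[3]])

# push_front() adds at the head instead
y <- push_front(wrapped, "first")
y[[1]]
```

Both helpers still return a new list, so the result has to be assigned back by the caller.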
2

I think what you want to do is actually pass by reference (pointer) to the function-- create a new environment (which are passed by reference to functions) with the list added to it:

listptr=new.env(parent=globalenv())
listptr$list=mylist

#Then the function is modified as:
lPtrAppend <- function(lstptr, obj) {
    lstptr$list[[length(lstptr$list)+1]] <- obj
}

Now you are only modifying the existing list (not creating a new one)
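
A minimal usage sketch of the environment approach above, assuming mylist starts with two elements (the sample values are made up for illustration):

```r
mylist <- list(1, 2)

# Environments are passed by reference, so the function can update the
# list stored inside one without returning anything.
listptr <- new.env(parent = globalenv())
listptr$list <- mylist

lPtrAppend <- function(lstptr, obj) {
    lstptr$list[[length(lstptr$list) + 1]] <- obj
    invisible(NULL)
}

lPtrAppend(listptr, "three")

length(listptr$list)  # the list inside the environment grew
length(mylist)        # the original variable is a separate copy
```

Note that the caller now has to read the data from listptr$list rather than from mylist.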

3
  • 1
    This appears to have quadratic time complexity again. The problem is obviously that list/vector resize is not implemented in the way it is usually implemented in most languages.
    – eold
    Commented Aug 31, 2011 at 22:12
  • Yes-- looks like the appending at the end is very slow-- probably b/c lists are recursive, and R is best at vector operations rather than loop type operations. It's much better to do:
    – DavidM
    Commented Sep 1, 2011 at 16:21
  • 1
    system.time(for(i in 1:10000) mylist[i] <- i) (a few seconds), or better yet do it all in one operation: system.time(mylist <- as.list(1:100000)) (less than a second), then modifying that preallocated list with the for loop will also be faster.
    – DavidM
    Commented Sep 1, 2011 at 16:38
2

This is a straightforward way to add items to an R List:

# create an empty list:
small_list = list()

# now put some objects in it:
small_list$k1 = "v1"
small_list$k2 = "v2"
small_list$k3 = 1:10

# retrieve them the same way:
small_list$k1
# returns "v1"

# "index" notation works as well:
small_list["k2"]

Or programmatically:

kx = paste(LETTERS[1:5], 1:5, sep="")
vx = runif(5)
lx = list()
cn = 1

for (itm in kx) { lx[itm] = vx[cn]; cn = cn + 1 }

print(length(lx))
# returns 5
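
When all keys and values are already available as vectors, the loop above can be replaced by a single call. This is a sketch of an alternative using base R's setNames() and as.list(), not the method from the answer itself:

```r
kx <- paste(LETTERS[1:5], 1:5, sep = "")
vx <- runif(5)

# Build the named list in one step instead of growing it in a loop
lx <- setNames(as.list(vx), kx)

length(lx)
names(lx)
lx[["C3"]]   # same value as vx[3]
```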
3
  • This isn't really appending. What if I have 100 objects and I want to append them to a list programmatically? R has an append() function, but it's really a concatenate function and it only works on vectors.
    – Nick
    Commented Mar 13, 2010 at 1:45
  • append() works on vectors and lists, and it is a true append (which is basically the same as concatenate, so I don't see what your problem is)
    – hadley
    Commented Mar 13, 2010 at 3:31
  • 9
    An append function should mutate an existing object, not create a new one. A true append would not have O(N^2) behavior.
    – Nick
    Commented Mar 13, 2010 at 6:30
1

There is also list.append from the rlist package (link to the documentation):

require(rlist)
LL <- list(a="Tom", b="Dick")
list.append(LL,d="Pam",f=c("Joe","Ann"))

It's very simple and efficient.

3
  • 1
    doesn't look like R to me... Python?
    – JD Long
    Commented Aug 2, 2018 at 20:49
  • 1
    I made an edit and tried it out: It is freaking slow. Better use the c() or list-method. Both are way faster.
    – 5th
    Commented Sep 15, 2018 at 18:21
  • 1
    Looking at the code for rlist::list.append(), it's essentially a wrapper around base::c().
    – nbenn
    Commented Apr 12, 2019 at 12:00
0
> LL<-list(1:4)

> LL

[[1]]
[1] 1 2 3 4

> LL<-list(c(unlist(LL),5:9))

> LL

[[1]]
 [1] 1 2 3 4 5 6 7 8 9
2
  • 2
    I don't think this is the kind of appending the OP was looking for.
    – joran
    Commented Dec 3, 2011 at 1:55
  • This is not appending elements in a list. Here you are increasing the elements of the integer vector, which is the only element of the list. The list has only one element, an integer vector.
    – Sergio
    Commented Oct 11, 2016 at 23:33
0

This is a very interesting question, and I hope the idea below contributes a possible solution. This method does give a flat list without indexing, but it relies on list and unlist to avoid nested structures. I'm not sure about the speed, since I don't know how to benchmark it.

a_list<-list()
for(i in 1:3){
  a_list<-list(unlist(list(unlist(a_list,recursive = FALSE),list(rnorm(2))),recursive = FALSE))
}
a_list

[[1]]
[[1]][[1]]
[1] -0.8098202  1.1035517

[[1]][[2]]
[1] 0.6804520 0.4664394

[[1]][[3]]
[1] 0.15592354 0.07424637
1
  • What I want to add is that it does give a two-level nested list, but that's it. How list and unlist work is not very clear to me, but this is the result from testing the code
    – xappppp
    Commented Apr 14, 2016 at 16:50
0

For validation I ran the benchmark code provided by @Cron. There is one major difference (in addition to running faster on the newer i7 processor): the by_index now performs nearly as well as the list_:

Unit: milliseconds
              expr        min         lq       mean     median         uq
    env_with_list_ 167.882406 175.969269 185.966143 181.817187 185.933887
                c_ 485.524870 501.049836 516.781689 518.637468 537.355953
             list_   6.155772   6.258487   6.544207   6.269045   6.290925
          by_index   9.290577   9.630283   9.881103   9.672359  10.219533
           append_ 505.046634 543.319857 542.112303 551.001787 553.030110
 env_as_container_ 153.297375 154.880337 156.198009 156.068736 156.800135

For reference here is the benchmark code copied verbatim from @Cron's answer (just in case he later changes the contents):

n = 1e+4
library(microbenchmark)
### Using environment as a container
lPtrAppend <- function(lstptr, lab, obj) {lstptr[[deparse(substitute(lab))]] <- obj}
### Store list inside new environment
envAppendList <- function(lstptr, obj) {lstptr$list[[length(lstptr$list)+1]] <- obj}

microbenchmark(times = 5,
        env_with_list_ = {
            listptr <- new.env(parent=globalenv())
            listptr$list <- NULL
            for(i in 1:n) {envAppendList(listptr, i)}
            listptr$list
        },
        c_ = {
            a <- list(0)
            for(i in 1:n) {a = c(a, list(i))}
        },
        list_ = {
            a <- list(0)
            for(i in 1:n) {a <- list(a, list(i))}
        },
        by_index = {
            a <- list(0)
            for(i in 1:n) {a[length(a) + 1] <- i}
            a
        },
        append_ = {
            a <- list(0)
            for(i in 1:n) {a <- append(a, i)}
            a
        },
        env_as_container_ = {
            listptr <- new.env(parent=globalenv())
            for(i in 1:n) {lPtrAppend(listptr, i, i)}
            listptr
        }
)
1
  • See my comment above. The list_ does not provide the same result as the others. So this comparison is not valid! Commented Oct 19, 2020 at 10:22
0

I ran the following benchmark:

bench=function(...,n=1,r=3){
  a=match.call(expand.dots=F)$...
  t=matrix(ncol=length(a),nrow=n)
  for(i in 1:length(a))for(j in 1:n){t1=Sys.time();eval(a[[i]],parent.frame());t[j,i]=Sys.time()-t1}
  o=t(apply(t,2,function(x)c(median(x),min(x),max(x),mean(x))))
  round(1e3*`dimnames<-`(o,list(names(a),c("median","min","max","mean"))),r)
}

ns=10^c(3:7)
m=sapply(ns,function(n)bench(n=5,
  `vector at length + 1`={l=c();for(i in 1:n)l[length(l)+1]=i},
  `vector at index`={l=c();for(i in 1:n)l[i]=i},
  `vector at index, initialize with type`={l=integer();for(i in 1:n)l[i]=i},
  `vector at index, initialize with length`={l=vector(length=n);for(i in 1:n)l[i]=i},
  `vector at index, initialize with type and length`={l=integer(n);for(i in 1:n)l[i]=i},
  `list at length + 1`={l=list();for(i in 1:n)l[[length(l)+1]]=i},
  `list at index`={l=list();for(i in 1:n)l[[i]]=i},
  `list at index, initialize with length`={l=vector('list',n);for(i in 1:n)l[[i]]=i},
  `list at index, initialize with double length, remove null`={l=vector("list",2*n);for(i in 1:n)l[[i]]=i;l=head(l,i)},
  `list at index, double when full, get length from variable`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==len){len=len*2;length(l)=len}};l=head(l,i)},
  `list at index, double when full, check length inside loop`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==length(l)){length(l)=i*2}};l=head(l,i)},
  `nested lists`={l=list();for(i in 1:n)l=list(l,i)},
  `nested lists with unlist`={if(n<=1e5){l=list();for(i in 1:n)l=list(l,i);o=unlist(l)}},
  `nested lists with manual unlist`={l=list();for(i in 1:n)l=list(l,i);o=integer(n);for(i in 1:n){o[n-i+1]=l[[2]];l=l[[1]]}},
  `JanKanis better_env_as_container`={env=new.env(hash=T,parent=globalenv());for(i in 1:n)env[[as.character(i)]]=i},
  `JanKanis inlineLinkedList`={a=list();for(i in 1:n)a=list(a,i);b=vector('list',n);head=a;for(i in n:1){b[[i]]=head[[2]];head=head[[1]]}},
  `JanKanis inlineExpandingList`={l=vector('list',10);cap=10;len=0;for(i in 1:n){if(len==cap){l=c(l,vector('list',cap));cap=cap*2};len=len+1;l[[len]]=i};l[1:len]},
  `c`={if(n<=1e5){l=c();for(i in 1:n)l=c(l,i)}},
  `append vector`={if(n<=1e5){l=integer(n);for(i in 1:n)l=append(l,i)}},
  `append list`={if(n<=1e9){l=list();for(i in 1:n)l=append(l,i)}}
)[,1])

m[rownames(m)%in%c("nested lists with unlist","c","append vector","append list"),4:5]=NA
m2=apply(m,2,function(x)formatC(x,max(0,2-ceiling(log10(min(x,na.rm=T)))),format="f"))
m3=apply(rbind(paste0("1e",log10(ns)),m2),2,function(x)formatC(x,max(nchar(x)),format="s"))
writeLines(apply(cbind(m3,c("",rownames(m))),1,paste,collapse=" "))

Output:

 1e3   1e4   1e5  1e6   1e7
2.35  24.5   245 2292 27146 vector at length + 1
0.61   5.9    60  590  7360 vector at index
0.61   5.9    64  587  7132 vector at index, initialize with type
0.56   5.6    54  523  6418 vector at index, initialize with length
0.54   5.5    55  522  6371 vector at index, initialize with type and length
2.65  28.8   299 3955 48204 list at length + 1
0.93   9.2    96 1605 13480 list at index
0.58   5.6    57  707  8461 list at index, initialize with length
0.62   5.8    59  739  9413 list at index, initialize with double length, remove null
0.88   8.4    81  962 11872 list at index, double when full, get length from variable
0.96   9.5    92 1264 15813 list at index, double when full, check length inside loop
0.21   1.9    22  426  3826 nested lists
0.25   2.4    29   NA    NA nested lists with unlist
2.85  27.5   295 3065 31427 nested lists with manual unlist
1.65  20.2   293 6505  8835 JanKanis better_env_as_container
1.11  10.1   110 1534 27119 JanKanis inlineLinkedList
2.66  26.3   266 3592 47120 JanKanis inlineExpandingList
1.22 118.6 15466   NA    NA c
3.64 512.0 45167   NA    NA append vector
6.35 664.8 71399   NA    NA append list

The table above shows the median time for each method and not the mean time, because occasionally a single run took much longer than a typical run which distorted the mean running time. But none of the methods became much faster on subsequent runs after the first run, so the minimum time and median time were typically similar for each method.

The method "vector at index" (l=c();for(i in 1:n)l[i]=i) was about 5 times faster than "vector at length + 1" (l=c();for(i in 1:n)l[length(l)+1]=i), because getting the length of the vector took longer than adding an element to the vector. When I initialized the vector with a predetermined length, it made the code about 20% faster, but initializing with a specific type didn't make a difference, because the type just needs to be changed once when the first item is added to the vector. And in the case of lists, when you compare the methods "list at index" and "list at index, initialize with length", initializing the list with a predetermined length made a bigger difference as the length of the list increased, because it made the code about twice as fast at length 1e6 but about 3 times as fast at length 1e7.

The method "list at index" (l=list();for(i in 1:n)l[[i]]=i) was about 3-4 times faster than the method "list at length + 1" (l=list();for(i in 1:n)l[[length(l)+1]]=i).
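
For readability, here is a spelled-out sketch of the three list variants compared above. They differ only in how the next position is found and whether the list is preallocated; the names grow_len, grow_idx and prealloc are only for illustration, and n is kept small:

```r
n <- 1000

# "list at length + 1": look up the current length on every iteration
grow_len <- list()
for (i in seq_len(n)) grow_len[[length(grow_len) + 1]] <- i

# "list at index": use the loop counter as the position
grow_idx <- list()
for (i in seq_len(n)) grow_idx[[i]] <- i

# "list at index, initialize with length": allocate all n slots up front
prealloc <- vector("list", n)
for (i in seq_len(n)) prealloc[[i]] <- i

# All three build the same object
identical(grow_len, grow_idx) && identical(grow_idx, prealloc)
```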

The linked list and expanding list methods by JanKanis were slower than "list at index" but faster than "list at length + 1". The linked list was faster than the expanding list.

Some people claim that the append function is faster than the c function, but in my benchmark append was about 3-4 times slower than c.

In the table above, the lengths 1e6 and 1e7 are missing for four methods: for "c", "append vector", and "append list" because they had quadratic time complexity, and for "nested lists with unlist" because it resulted in a stack overflow.

The "nested lists" option was the fastest, but it doesn't include the time that it takes to flatten the list. When I used the unlist function to flatten the nested list, I got a stack overflow when the length of the list was around 1.26e5 or higher, because the unlist function calls itself recursively by default: n=1.26e5;l=list();for(i in 1:n)l=list(l,list(i));u=unlist(l). And when I used repeated calls of unlist(recursive=F), it took about 4 seconds to run even for a list with only 10,000 items: for(i in 1:n)l=unlist(l,recursive=F). But when I did the unlisting manually, it only took about 0.3 seconds to run for a list with a million items: o=integer(n);for(i in 1:n){o[n-i+1]=l[[2]];l=l[[1]]}.
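
The nested-list approach with manual unlisting can be written out as follows. This is a readable sketch of the one-liners above with a small n; it assumes every appended item is a single integer, since the output vector o is created with integer(n):

```r
n <- 1000

# Build phase: each step wraps the previous list, so nothing is copied
l <- list()
for (i in seq_len(n)) l <- list(l, i)

# Flatten phase: peel off the newest value (second element) and
# descend into the older list (first element), filling o from the back
o <- integer(n)
for (i in seq_len(n)) {
    o[n - i + 1] <- l[[2]]
    l <- l[[1]]
}

head(o)
```

Because the loop walks the nesting iteratively, it avoids the deep recursion that unlist() runs into on long chains.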

If you don't know how many items you are going to append to a list in advance but you know the maximum number of items, then you can try to initialize the list at the maximum length and then later remove NULL values. Or another approach is to double the size of the list every time the list becomes full (which you can do faster if you have one variable for the length of the list and another variable for the number of items you have added to the list, so then you don't have to check the length of the list object on each iteration of a loop):

ns=10^c(4:7)
m=sapply(ns,function(n)bench(n=5,
  `list at index`={l=list();for(i in 1:n)l[[i]]=i},
  `list at length + 1`={l=list();for(i in 1:n)l[[length(l)+1]]=i},
  `list at index, initialize with length`={l=vector("list",n);for(i in 1:n)l[[i]]=i},
  `list at index, initialize with double length, remove null`={l=vector("list",2*n);for(i in 1:n)l[[i]]=i;l=head(l,i)},
  `list at index, initialize with length 1e7, remove null`={l=vector("list",1e7);for(i in 1:n)l[[i]]=i;l=head(l,i)},
  `list at index, initialize with length 1e8, remove null`={l=vector("list",1e8);for(i in 1:n)l[[i]]=i;l=head(l,i)},
  `list at index, double when full, get length from variable`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==len){len=len*2;length(l)=len}};l=head(l,i)},
  `list at index, double when full, check length inside loop`={len=1;l=list();for(i in 1:n){l[[i]]=i;if(i==length(l)){length(l)=i*2}};l=head(l,i)}
)[,1])

m2=apply(m,2,function(x)formatC(x,max(0,2-ceiling(log10(min(x)))),format="f"))
m3=apply(rbind(paste0("1e",log10(ns)),m2),2,function(x)formatC(x,max(nchar(x)),format="s"))
writeLines(apply(cbind(m3,c("",rownames(m))),1,paste,collapse=" "))

Output:

  1e4 1e5  1e6   1e7
  9.3 102 1225 13250 list at index
 27.4 315 3820 45920 list at length + 1
  5.7  58  726  7548 list at index, initialize with length
  5.8  60  748  8057 list at index, initialize with double length, remove null
 33.4  88  902  7684 list at index, initialize with length 1e7, remove null
333.2 393 2691 12245 list at index, initialize with length 1e8, remove null
  8.6  83 1032 10611 list at index, double when full, get length from variable
  9.3  96 1280 14319 list at index, double when full, check length inside loop
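
The "double when full" idea can also be spelled out with separate variables for capacity and item count. This is a minimal sketch, not the exact benchmarked code; the names cap and len are only for illustration:

```r
cap <- 1L                   # number of allocated slots
len <- 0L                   # number of items actually stored
l <- vector("list", cap)

for (i in 1:1000) {
    if (len == cap) {       # list is full: double the allocation
        cap <- cap * 2L
        length(l) <- cap    # new slots are filled with NULL
    }
    len <- len + 1L
    l[[len]] <- i
}

l <- l[seq_len(len)]        # drop the unused trailing slots
length(l)
```

One thing to keep in mind with this sketch: assigning NULL with l[[len]] <- NULL removes an element rather than storing it, so it only suits non-NULL items.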
