Error "'DataFrame' object has no attribute 'append'"

Question

I am trying to append a dictionary to a DataFrame object, but I get the following error:

AttributeError: 'DataFrame' object has no attribute 'append'

As far as I know, DataFrame does have the method "append".

Code snippet:

df = pd.DataFrame(df).append(new_row, ignore_index=True)

I was expecting the dictionary new_row to be added as a new row.

How can I fix it?

I imagine you use pandas 2.0, please make it explicit in your question (and question title) — mozway, Commented Apr 7, 2023 at 7:10
Also give an example of new_row for clarity. You might need to use pd.DataFrame([new_row]) or pd.DataFrame(new_row) depending on the format. — mozway, Commented Apr 7, 2023 at 7:11

Peter Mortensen · Accepted Answer · 2023-08-13 22:58:54Z

As of pandas 2.0, append (previously deprecated) was removed.

You need to use concat instead (for most applications):

df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

As noted by @cottontail, it's also possible to use loc, although this only works if the new index label is not already present in the DataFrame (typically, this will be the case if the index is a RangeIndex):

df.loc[len(df)] = new_row # only use with a RangeIndex!
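To illustrate why the RangeIndex caveat matters, here is a minimal sketch with made-up data (the column names and values are assumptions, not from the question). When the label len(df) already exists in the index, loc silently overwrites that row instead of adding one, while concat is unaffected:

```python
import pandas as pd

# Hypothetical frame whose index labels are 1, 2, 3 instead of the default 0, 1, 2
df = pd.DataFrame({"A": [10, 20, 30], "B": ["x", "y", "z"]}, index=[1, 2, 3])

# len(df) is 3 and the label 3 already exists, so this overwrites the last row
df.loc[len(df)] = [99, "new"]

# concat does not look at the existing labels, so the row is really added
df2 = pd.DataFrame({"A": [10, 20, 30], "B": ["x", "y", "z"]}, index=[1, 2, 3])
df2 = pd.concat([df2, pd.DataFrame([{"A": 99, "B": "new"}])], ignore_index=True)
```

After running this, df still has three rows (the last one replaced), whereas df2 has four.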

Why was it removed?

We frequently see new users of pandas try to code as they would in pure Python. They use iterrows to access items in a loop (see here why you shouldn't), or append in a way that is similar to Python's list.append.

However, as noted in pandas' issue #35407, pandas's append and list.append are really not the same thing. list.append is in place, while pandas's append creates a new DataFrame:

I think that we should deprecate Series.append and DataFrame.append. They're making an analogy to list.append, but it's a poor analogy since the behavior isn't (and can't be) in place. The data for the index and values needs to be copied to create the result.

These are also apparently popular methods. DataFrame.append is around the 10th most visited page in our API docs.

Unless I'm mistaken, users are always better off building up a list of values and passing them to the constructor, or building up a list of NDFrames followed by a single concat.

As a consequence, while list.append is amortized O(1) at each step of the loop, pandas' append is O(n), making it inefficient when repeated insertion is performed.
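A small sketch of the difference, using made-up values: list.append mutates the list and returns None, while pd.concat leaves its inputs untouched and returns a new object. The last lines only do the arithmetic for the copying cost, assuming each step copies every row accumulated so far:

```python
import pandas as pd

lst = [1, 2]
result = lst.append(3)  # mutates lst in place; the return value is None

df = pd.DataFrame({"A": [1, 2]})
new_df = pd.concat([df, pd.DataFrame([{"A": 3}])], ignore_index=True)
# df is unchanged (2 rows); new_df is a separate DataFrame with 3 rows

# Growing a frame to n rows one row at a time copies 1 + 2 + ... + n rows in total
n = 1000
rows_copied = sum(range(1, n + 1))  # equals n * (n + 1) // 2
```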

What if I need to repeat the process?

Using append or concat repeatedly is not a good idea: each call creates a new DataFrame, so the overall cost is quadratic.

In such a case, collect the new items in a list, convert the list to a DataFrame once the loop is done, and then concatenate it to the original DataFrame.

lst = []

for new_row in items_generation_logic:
    lst.append(new_row)

# create extension
df_extended = pd.DataFrame(lst, columns=['A', 'B', 'C'])
# or columns=df.columns if identical columns

# concatenate to original
out = pd.concat([df, df_extended])
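For a version of the snippet above that runs as is, here is a sketch in which generate_rows is a hypothetical stand-in for items_generation_logic and the starting frame is made up:

```python
import pandas as pd

# Hypothetical existing frame
df = pd.DataFrame({"A": [1], "B": [2], "C": [3]})

def generate_rows():
    """Hypothetical stand-in for items_generation_logic; yields one dict per row."""
    for i in range(3):
        yield {"A": i, "B": i * 10, "C": i * 100}

rows = []
for new_row in generate_rows():
    rows.append(new_row)  # cheap: appending to a list is in place

# Build the extension once, then concatenate once
df_extended = pd.DataFrame(rows, columns=df.columns)
out = pd.concat([df, df_extended], ignore_index=True)
```

Passing ignore_index=True renumbers the rows 0 to 3; without it, the result keeps the original labels of both frames, so labels can repeat.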

when I collect the data in a list and THEN convert the data to a dataframe - then I do not see the benefit of pandas at all. I have the lists and can plot them etc — Alex, Commented Jan 25 at 7:43
@Alex that's absolutely true, if you can perform your workflow without pandas then just do so. Pandas is a convenience to manipulate data, but if you already have lists or arrays and no pandas operations to run, then don't use pandas! matplotlib for instance works very well with lists/dictionaries/arrays. — mozway, Commented Jan 25 at 7:50
I opened issue github.com/pandas-dev/pandas/issues/57849 to bring back append. This was a sensibly named method that did what its name implied. Performance concerns should have been addressed by improving append(). I am now forced to fix broken code that used to be perfectly fine. The fix makes the code more verbose and less readable. — Aren Cambre, Commented Mar 15 at 4:03
@ArenCambre I think the idea was to push people to refactor the code to avoid needing append. Could you provide a minimal reproducible example version of your code (open a new question) to see if it can be refactored? — mozway, Commented Mar 15 at 4:16
@ArenCambre btw, is downvoting this answer because you're upset that append was removed really relevant? ;) — mozway, Commented Mar 15 at 4:20

Peter Mortensen · Accepted Answer · 2023-08-13 23:03:50Z


Disclaimer: this answer seems to attract popularity, but the proposed approach should not be used. append was not changed to _append, _append is a private internal method and append was removed from pandas API. The claim "The append method in pandas look similar to list.append in Python. That's why append method in pandas is now modified to _append." is utterly incorrect. The leading _ only means one thing: the method is private and is not intended to be used outside of pandas' internal code.

In the new version of Pandas, the append method is changed to _append. You can simply use _append instead of append, i.e., df._append(df2).

df = df1._append(df2,ignore_index=True)

Why is it changed?

The append method in pandas looks similar to list.append in Python. That's why the append method in pandas is now modified to _append.

Answered by Anubhav, Jun 11, 2023 at 6:55; edited by Peter Mortensen, Aug 13, 2023 at 23:03

This is a private method and thus not part of the official API. It cannot be used reliably (the method could be changed or removed without notice). TBH, almost every time someone asked a question about append here, there was a better alternative than using it. – mozway, Commented Jun 11, 2023 at 9:10
I added a disclaimer to the answer. We now start seeing questions using _append on SO, which is a fully incorrect use of the API. _append is not intended to be part of the public API. append was already leading to bad code, please do not encourage others to do even worse… – mozway, Commented Aug 3, 2023 at 13:22
@Nils I explained it in my answer: unlike list.append, for which appending an item is in place and amortized, DataFrame.append was creating a new DataFrame for each step. Thus, in a loop the complexity is quadratic (1+2+3+4+...). – mozway, Commented Oct 3, 2023 at 15:38
This is easier than switching to concat for fixing old code. Ty – gumdropsteve, Commented Oct 12, 2023 at 7:18
64 upvotes on this, and 51 of those came after someone pointed out that it's completely wrong and added a disclaimer... I guess people really don't care about what is correct or proper practice, they just want to copy and paste something that seems to give the right result, until it breaks and they can go looking again. I wish people understood that programming is about thinking. – Karl Knechtel, Commented Mar 15 at 6:40

cottontail · Accepted Answer · 2023-11-09 22:01:28Z

If you are enlarging a dataframe in a loop using DataFrame.append or concat or loc, consider rewriting your code to enlarge a Python list and construct a dataframe once. Sometimes, you may not even need pd.concat, you may just need a DataFrame constructor on a list of dicts.

A pretty common example of appending new rows to a dataframe is scraping data from a webpage and storing it in a dataframe. In that case, instead of appending to a dataframe, literally just replace the dataframe with a list and call pd.DataFrame() or pd.concat once at the end.

So instead of:

df = pd.DataFrame()       # <--- initial dataframe (doesn't have to be empty)
for url in ticker_list:
    data = pd.read_csv(url)
    df = df.append(data, ignore_index=True)  # <--- enlarge dataframe

use:

lst = []                  # <--- initial list (doesn't have to be empty; 
for url in ticker_list:   #                    could store the initial df)
    data = pd.read_csv(url)
    lst.append(data)                         # <--- enlarge list
df = pd.concat(lst)                          # <--- concatenate the frames

The data-reading logic could be response data from an API, data scraped from a webpage, or anything else; either way, the code refactoring is really minimal. In the above example, we assumed that lst is a list of dataframes, but if it were a list of dicts, lists, etc., then we could use df = pd.DataFrame(lst) instead in the last line of code.
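As a sketch of the list-of-dicts case, with fetch_records as a hypothetical stand-in for whatever produces the rows (API responses, scraped values, and so on):

```python
import pandas as pd

def fetch_records():
    """Hypothetical data source; returns one dict per row."""
    return [
        {"ticker": "AAA", "price": 1.5},
        {"ticker": "BBB", "price": 2.5},
    ]

lst = []
for record in fetch_records():
    lst.append(record)  # enlarge the list, not a dataframe

df = pd.DataFrame(lst)  # construct the dataframe once, from the list of dicts
```

The dict keys become the column names, so no separate columns argument is needed here.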

That said, if a single row is to be appended to a dataframe, loc could also do the job.

df.loc[len(df)] = new_row

With the loc call, the dataframe is enlarged with index label len(df), which makes sense only if the index is RangeIndex; RangeIndex is created by default if an explicit index is not passed to the dataframe constructor.

A working example:

df = pd.DataFrame({'A': range(3), 'B': list('abc')})
df.loc[len(df)] = [4, 'd']
df.loc[len(df)] = {'A': 5, 'B': 'e'}
df.loc[len(df)] = pd.Series({'A': 6, 'B': 'f'})
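If you are unsure whether the index is a default RangeIndex, one option is to check before using loc. The helper below is only a sketch: append_row is a hypothetical name, not a pandas function, and falling back to concat with ignore_index=True is an assumption that discards any custom index labels.

```python
import pandas as pd

def append_row(df, new_row):
    """Hypothetical helper: return a copy of df with new_row (a dict) as its last row."""
    has_default_index = (
        isinstance(df.index, pd.RangeIndex)
        and df.index.start == 0
        and df.index.step == 1
    )
    if has_default_index:
        # The label len(df) is guaranteed to be unused, so loc enlarges the frame
        out = df.copy()
        out.loc[len(out)] = new_row
        return out
    # Otherwise fall back to concat; note that ignore_index=True renumbers the rows
    return pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)

df = pd.DataFrame({"A": range(3), "B": list("abc")})
out = append_row(df, {"A": 3, "B": "d"})

df_custom = pd.DataFrame({"A": [1, 2], "B": ["x", "y"]}, index=["r1", "r2"])
out_custom = append_row(df_custom, {"A": 3, "B": "z"})
```

This is meant for a single row; for repeated insertion, the advice above about collecting rows in a list still applies.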

As pointed out by @mozway, enlarging a pandas dataframe has O(n^2) complexity because in each iteration, the entire dataframe has to be read and copied. The following perfplot shows the runtime difference relative to concatenation done once.¹ As you can see, both ways to enlarge a dataframe are much, much slower than enlarging a list and constructing a dataframe once (e.g. for a dataframe with 10k rows, concat in a loop is about 800 times slower and loc in a loop is about 1600 times slower).

¹ The code used to produce the perfplot:

import pandas as pd
import perfplot

def concat_loop(lst):
    df = pd.DataFrame(columns=['A', 'B'])
    for dic in lst:
        df = pd.concat([df, pd.DataFrame([dic])], ignore_index=True)
    return df.infer_objects()
    
def concat_once(lst):
    df = pd.DataFrame(columns=['A', 'B'])
    df = pd.concat([df, pd.DataFrame(lst)], ignore_index=True)
    return df.infer_objects()

def loc_loop(lst):
    df = pd.DataFrame(columns=['A', 'B'])
    for dic in lst:
        df.loc[len(df)] = dic
    return df


perfplot.plot(
    setup=lambda n: [{'A': i, 'B': 'a'*(i%5+1)} for i in range(n)],
    kernels=[concat_loop, concat_once, loc_loop],
    labels= ['concat in a loop', 'concat once', 'loc in a loop'],
    n_range=[2**k for k in range(16)],
    xlabel='Length of dataframe',
    title='Enlarging a dataframe in a loop',
    relative_to=1,
    equality_check=pd.DataFrame.equals);

I quoted your answer in mine to give some more details on when loc can be used. Let me know if you want to edit it (or feel free to do it). — mozway, Commented May 1, 2023 at 9:00
@mozway Thanks for the heads up. I'll come around to editing my post when I have the time. — cottontail, Commented May 1, 2023 at 18:44
