Insert a row to pandas dataframe

Question

I have a dataframe:

s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])

df = pd.DataFrame([list(s1), list(s2)],  columns =  ["A", "B", "C"])

   A  B  C
0  5  6  7
1  7  8  9

[2 rows x 3 columns]

and I need to add a first row [2, 3, 4] to get:

   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9

I've tried the append() and concat() functions but can't find the right way to do that.

How do I add/insert a series into a dataframe?

note that it's better to use s1.values as opposed to list(s1) as you will be creating an entirely new list using list(s1). — acushner, Commented Jun 18, 2014 at 13:56
I don't understand why everyone loves pandas so much when something that should be so simple is such a pain in the ass and so slow. — MattCochrane, Commented Aug 2, 2017 at 9:27
@MattCochrane - Almost every time that I have found Pandas to be slow, I have found a different pandas method that is much faster later on or realised I was doing things weirdly backward. I find a lot of database functions like how you describe -I think that's due to the way database theory works, not down to Pandas specifically. I'm aware that there are other more specialised libraries that are faster for specific purposes, but few that do as much as broadly well as Pandas. If you / anyone has an alternate suggestion, I'd love to hear it! — ciaran haines, Commented Aug 2, 2022 at 10:25
@ciaranhaines I find pandas and numpy being just bandaids for the fact that python being a (very slow) interpreted language. There's only a handful of 'optimized' building blocks that they provide, versus the infinite scope of potential problems I regularly face. I spend countless time finding the right combination of those primitives that would do what I need, and more often than not I figure out that there isn't one. I can write an unvectorized loop to do the same in a fraction of my time, but it will run slow. Python is good only as a prototyping language. — Yakov Galka, Commented Sep 8, 2022 at 15:06

Martin · Accepted Answer · 2020-07-30 12:00:00Z

271

Just assign the row to a particular index label, using loc:

 df.loc[-1] = [2, 3, 4]  # adding a row
 df.index = df.index + 1  # shifting index
 df = df.sort_index()  # sorting by index

And you get, as desired:

   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9

See the pandas documentation, Indexing: Setting with enlargement.

edited Jul 30, 2020 at 12:00 by Martin
answered Jun 18, 2014 at 11:44 by Piotr Migdal

2

If you don't want to set with enlargement, but insert inside the dataframe, have a look at stackoverflow.com/questions/15888648/…
– FooBar
Commented Jun 18, 2014 at 11:51
7

shifting index alternative: df.sort().reset_index(drop=True)
– Meloun
Commented Jun 18, 2014 at 11:56
5

df.sort is deprecated, use df.sort_index()
– GBGOLC
Commented Sep 20, 2017 at 13:30
41

I think df.loc[-1] = [2, 3, 4] # adding a row is a bit misleading, as -1 isn't the last row/element, as it is for Python arrays.
– flow2k
Commented Apr 24, 2019 at 2:05
9

If you don't want to do any re-sorting of the index you can just do df.loc[len(df)] = [2,3,4]. Of course this makes assumption that the last index in the frame would be len(df)-1. However most of the dataframes I work with are structured like this.
– Greg
Commented Nov 18, 2021 at 5:42

OfirD · Accepted Answer · 2023-12-12 22:13:28Z

Testing a few answers, it is clear that using pd.concat() is more efficient for large dataframes.

Comparing the performance using dict and list, the list is more efficient, but for small dataframes, using a dict should be no problem and somewhat more readable.

1st - `pd.concat() + list`

%%timeit
df = pd.DataFrame(columns=['a', 'b'])
for i in range(10000):
    df = pd.concat([pd.DataFrame([[1,2]], columns=df.columns), df], ignore_index=True)

4.88 s ± 47.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

2nd - `DataFrame.append() + dict` [removed as of pandas v2.0.0]

%%timeit

df = pd.DataFrame(columns=['a', 'b'])
for i in range(10000):
    df = df.append({'a': 1, 'b': 2}, ignore_index=True)

10.2 s ± 41.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

3rd - `pd.DataFrame().loc + index operations`

%%timeit
df = pd.DataFrame(columns=['a','b'])
for i in range(10000):
    df.loc[-1] = [1,2]
    df.index = df.index + 1
    df = df.sort_index()

17.5 s ± 37.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
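
All three loops above grow the frame one row at a time, so each iteration copies the data accumulated so far. A commonly suggested alternative, sketched here and not timed as part of the measurements above, is to collect the rows in a plain Python list and build the frame once at the end. The row values are placeholders chosen for illustration.

```python
import pandas as pd

rows = []
for i in range(10000):
    rows.append([i, 2 * i])  # cheap list append; no DataFrame is copied here

# Reverse so the most recently added row ends up on top,
# mimicking what repeated prepending would produce.
df = pd.DataFrame(rows[::-1], columns=["a", "b"])
```

This only applies when the rows can be gathered before the frame is needed; it does not help if the frame must be queried between insertions.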

If one really has no choice, _append() can be used instead of the removed append() (see some discussion here). — OfirD, Commented Dec 12, 2023 at 22:17

smci · Accepted Answer · 2019-12-11 04:13:01Z

75

Not sure how you were calling concat() but it should work as long as both objects are of the same type. Maybe the issue is that you need to cast your second vector to a dataframe? Using the df that you defined the following works for me:

df2 = pd.DataFrame([[2,3,4]], columns=['A','B','C'])
pd.concat([df2, df])
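
One detail worth knowing: pd.concat keeps each frame's own index labels unless told otherwise, so the call above yields two rows labelled 0. A minimal sketch of the difference, using the ignore_index parameter of pd.concat (the variable names stacked and renumbered are only for illustration):

```python
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])
df2 = pd.DataFrame([[2, 3, 4]], columns=["A", "B", "C"])

# Default behaviour: original labels are kept, so label 0 appears twice.
stacked = pd.concat([df2, df])

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([df2, df], ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```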

edited Dec 11, 2019 at 4:13 by smci
answered Jun 18, 2014 at 13:42 by mgilbert

4

Best answer ^ :)
– Cam.Davidson.Pilon
Commented Mar 16, 2019 at 0:48
3

Should not be this modified a bit to do the job correctly? I think that code by @mgilbert inserts row at 0 but we end up with two rows having index 0. I think line two needs to be modified to look like the one below pd.concat([df2, df]).reset_index(drop=True)
– The smell of roses
Commented Mar 24, 2021 at 9:03
1

One problem is if the row that we want to be inserted is pd.Series due to iloc. The solution is to use iloc with double bracket as shown in my answer.
– Muhammad Yasirroni
Commented Jul 31, 2022 at 7:45
2

@Thesmellofroses Or, better yet, pd.concat([df2, df], ignore_index=True)
– Antony Hatchkins
Commented Dec 23, 2022 at 18:08

FooBar · Accepted Answer · 2014-06-18 11:48:48Z

One way to achieve this is

>>> pd.DataFrame(np.array([[2, 3, 4]]), columns=['A', 'B', 'C']).append(df, ignore_index=True)
Out[330]: 
   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9

Generally, it's easiest to append dataframes, not series. In your case, since you want the new row to be "on top" (with starting id), and there is no function pd.prepend(), I first create the new dataframe and then append your old one.

ignore_index will ignore the old index of your dataframe and ensure that its first row gets index 1 instead of restarting with index 0.

Typical disclaimer: Ceterum censeo ... appending rows is quite an inefficient operation. If you care about performance and can first create a dataframe with the correct (longer) index and then just insert the additional row into it, you should definitely do that. See:

>>> index = np.array([0, 1, 2])
>>> df2 = pd.DataFrame(columns=['A', 'B', 'C'], index=index)
>>> df2.loc[0:1] = [list(s1), list(s2)]
>>> df2
Out[336]: 
     A    B    C
0    5    6    7
1    7    8    9
2  NaN  NaN  NaN
>>> df2 = pd.DataFrame(columns=['A', 'B', 'C'], index=index)
>>> df2.loc[1:] = [list(s1), list(s2)]

So far, we have what you had as df:

>>> df2
Out[339]: 
     A    B    C
0  NaN  NaN  NaN
1    5    6    7
2    7    8    9

But now you can easily insert the row as follows. Since the space was preallocated, this is more efficient.

>>> df2.loc[0] = np.array([2, 3, 4])
>>> df2
Out[341]: 
   A  B  C
0  2  3  4
1  5  6  7
2  7  8  9
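
The same preallocation idea as a plain script, for readers who prefer that to a REPL transcript. This is a sketch that assumes the final number of rows is known in advance; the dtype="float64" argument and the final astype call are additions of this example (NaN placeholders need a float dtype), not part of the answer above.

```python
import pandas as pd

n_rows = 3  # assumed to be known before any data arrives

# Allocate the full frame once; every cell starts out as NaN.
df2 = pd.DataFrame(index=range(n_rows), columns=["A", "B", "C"], dtype="float64")

# Fill the existing data into rows 1 and 2, leaving row 0 free.
df2.loc[1:] = [[5, 6, 7], [7, 8, 9]]

# Writing to label 0 fills the reserved slot; the frame does not grow.
df2.loc[0] = [2, 3, 4]

# Convert back to integers once no NaN placeholders remain.
df2 = df2.astype("int64")
print(df2)
```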

That's nice workarround solution, I was trying to insert series into dataframe. It's good enough for me at the moment. — Meloun, Commented Jun 18, 2014 at 11:43
I like most the last option. This truly matches what I really want to do. Thank you @FooBar! — Jade Cacho, Commented Dec 19, 2019 at 8:03

elPastor · Accepted Answer · 2017-09-21 22:34:42Z

22

I put together a short function that allows for a little more flexibility when inserting a row:

def insert_row(idx, df, df_insert):
    dfA = df.iloc[:idx, ]
    dfB = df.iloc[idx:, ]

    df = dfA.append(df_insert).append(dfB).reset_index(drop = True)

    return df

which could be further shortened to:

def insert_row(idx, df, df_insert):
    return df.iloc[:idx, ].append(df_insert).append(df.iloc[idx:, ]).reset_index(drop = True)

Then you could use something like:

df = insert_row(2, df, df_new)

where 2 is the index position in df where you want to insert df_new.
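
DataFrame.append was removed in pandas 2.0, so the function above fails on current versions. A sketch of the same idea with pd.concat follows; this is a substitution for illustration, not the author's original code, and the sample frames are taken from the question.

```python
import pandas as pd

def insert_row(idx, df, df_insert):
    """Return a new frame with df_insert placed before positional row idx."""
    parts = [df.iloc[:idx], df_insert, df.iloc[idx:]]
    # ignore_index=True renumbers the result 0..n-1, like reset_index(drop=True).
    return pd.concat(parts, ignore_index=True)

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])
df_new = pd.DataFrame([[2, 3, 4]], columns=df.columns)

result = insert_row(1, df, df_new)
print(result)
```

As with the original, idx is a position (as used by iloc), not an index label.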

answered Sep 21, 2017 at 22:34 by elPastor

1

Note that 2 is a positional index here, not index in ordinary pandas meaning.
– Antony Hatchkins
Commented Dec 23, 2022 at 18:18

Alex L · Accepted Answer · 2023-09-23 18:35:08Z

15

We can use numpy.insert. This has the advantage of flexibility. You only need to specify the index you want to insert to.

s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])

df = pd.DataFrame([list(s1), list(s2)],  columns =  ["A", "B", "C"])

pd.DataFrame(np.insert(df.values, 0, values=[2, 3, 4], axis=0), columns=df.columns)

    A   B   C
0   2   3   4
1   5   6   7
2   7   8   9

For np.insert(df.values, 0, values=[2, 3, 4], axis=0), 0 tells the function the place/index you want to place the new values.
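
To illustrate the flexibility mentioned above, here is a small sketch that inserts the row in the middle instead of at the top. It uses df.to_numpy() in place of df.values, which is a choice of this example. One caveat to keep in mind: converting a frame with mixed column types to a single array yields an object array, so per-column dtypes may not survive the round trip.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

# Position 1 means the new row goes before the current second row.
values = np.insert(df.to_numpy(), 1, [2, 3, 4], axis=0)

# Rebuild the frame, passing the column names back in explicitly.
result = pd.DataFrame(values, columns=df.columns)
print(result)
```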

edited Sep 23, 2023 at 18:35 by Alex L
answered Jan 15, 2018 at 23:09 by Tai

1

Good solution for this example. Generally, you lose column names this way though
– verwirrt
Commented Jun 23, 2022 at 10:08
1

you can modify it slightly so you don't lose column names, I've edited the answer to show that
– Alex L
Commented Sep 23, 2023 at 18:34

David Golembiowski · Accepted Answer · 2020-07-11 01:23:51Z

13

It is pretty simple to add a row into a pandas DataFrame:

1. Create a regular Python dictionary with the same column names as your DataFrame;
2. Call the .append() method on your DataFrame instance and pass in your dictionary;
3. Add ignore_index=True right after your dictionary name.
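
Since DataFrame.append was removed in pandas 2.0, the steps above no longer run on current versions. A rough equivalent, assuming pd.concat as the replacement, wraps the dictionary in a one-row DataFrame first. The sample data comes from the question; new_row is a name chosen for this sketch.

```python
import pandas as pd

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

# Keys must match the column names of the existing frame.
new_row = {"A": 2, "B": 3, "C": 4}

# A list holding one dict becomes a one-row DataFrame.
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)
```

Like append, this adds the row at the end; swapping the order of the two frames in the list puts it on top.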

edited Jul 11, 2020 at 1:23 by David Golembiowski
answered Apr 28, 2020 at 8:11 by Pepe

3

This is probably the most preferable option (circa 2020).
– David Golembiowski
Commented Jun 25, 2020 at 0:12
3

This function doesn't have an inplace argument, so: df = df.append(your_dict, ignore_index=True)
– SocraticDatum
Commented Jan 15, 2021 at 21:12
3

append has been deprecated for a while now
– Fran Marzoa
Commented Jun 19, 2022 at 12:32

Aaron Melgar · Accepted Answer · 2019-07-10 19:14:50Z

8

This might seem overly simple, but it's incredible that a simple insert-new-row function isn't built in. I've read a lot about appending a new df to the original, but I'm wondering if this would be faster.

df.loc[0] = [row1data, blah...]
i = len(df) + 1
df.loc[i] = [row2data, blah...]

answered Jul 10, 2019 at 19:14 by Aaron Melgar

Did you mean "appending a new df" or just "appending a new row", as your code shows?
– smci
Commented Dec 11, 2019 at 6:52
sorry my sentence wasn't clear. i've read other people solutions that concat/append a whole new dataframe with just a single row. but in my solution its just a single row in the existing dataframe no need for an additional dataframe to be created
– Aaron Melgar
Commented Jan 15, 2020 at 19:15
OP wanted to insert a row at position 0. Your code overwrites whatever there is at position 0.
– Antony Hatchkins
Commented Dec 23, 2022 at 18:23

Sagar Rathod · Accepted Answer · 2019-10-01 09:55:55Z

7

Below would be the best way to insert a row into a pandas dataframe without sorting and resetting an index:

import pandas as pd

df = pd.DataFrame(columns=['a','b','c'])

def insert(df, row):
    insert_loc = df.index.max()

    if pd.isna(insert_loc):
        df.loc[0] = row
    else:
        df.loc[insert_loc + 1] = row

insert(df,[2,3,4])
insert(df,[8,9,0])
print(df)

edited Oct 1, 2019 at 9:55
answered Apr 8, 2019 at 4:16 by Sagar Rathod

why would you say this is the best way?
– Yuca
Commented Apr 12, 2019 at 16:44
1

then it would be nice to provide evidence to support that claim, did you time it?
– Yuca
Commented Apr 15, 2019 at 11:42
2

you can use pd.isna to avoid importing numpy
– kato2
Commented Jun 14, 2019 at 21:13

M. Viaz · Accepted Answer · 2020-05-18 13:52:48Z

concat() seems to be a bit faster than last-row insertion and reindexing. In case someone wonders about the speed of the two top approaches:

In [x]: %%timeit
     ...: df = pd.DataFrame(columns=['a','b'])
     ...: for i in range(10000):
     ...:     df.loc[-1] = [1,2]
     ...:     df.index = df.index + 1
     ...:     df = df.sort_index()

17.1 s ± 705 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [y]: %%timeit
     ...: df = pd.DataFrame(columns=['a', 'b'])
     ...: for i in range(10000):
     ...:     df = pd.concat([pd.DataFrame([[1,2]], columns=df.columns), df])

6.53 s ± 127 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Steven · Accepted Answer · 2021-02-15 08:10:10Z

It just occurred to me that the T attribute may be a valid choice. Transposing avoids the somewhat misleading df.loc[-1] = [2, 3, 4] that @flow2k mentioned, and it suits the more general situation where you want to insert [2, 3, 4] before an arbitrary row, which is hard to achieve with concat() or append(). And there's no need to bear the trouble of defining and debugging a function.

a = df.T
a.insert(0,'anyName',value=[2,3,4])
# just give insert() any column name you want, we'll rename it.
a.rename(columns=dict(zip(a.columns,[i for i in range(a.shape[1])])),inplace=True)
# set inplace to a Boolean as you need.
df=a.T
df

    A   B   C
0   2   3   4
1   5   6   7
2   7   8   9

I guess this can partly address @MattCochrane's complaint about why pandas doesn't have a method to insert a row the way insert() does for columns.

Alessio Pan · Accepted Answer · 2022-04-05 11:26:43Z

3

Create an empty df with column names:

df = pd.DataFrame(columns = ["A", "B", "C"])

Insert new row:

df.loc[len(df.index)] = [2, 3, 4]
df.loc[len(df.index)] = [5, 6, 7]
df.loc[len(df.index)] = [7, 8, 9]

answered Apr 5, 2022 at 11:26 by Alessio Pan

Muhammad Yasirroni · Accepted Answer · 2022-12-25 07:36:59Z

2

For those that want to concat a row from the previous data frame, use double bracket ([[...]]) for iloc.

s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])

df = pd.DataFrame([list(s1), list(s2)],  columns =  ["A", "B", "C"])

#   A   B   C
# 0 5   6   7
# 1 7   8   9

pd.concat((df.iloc[[0]],  # [[...]] used to slice DataFrame as DataFrame
           df), ignore_index=True)

#   A   B   C
# 0 5   6   7
# 1 5   6   7
# 2 7   8   9

To replicate a row an arbitrary number of times, combine list multiplication with star unpacking.

pd.concat((df.iloc[[0]],
           df,
           *[df.iloc[[1]]] * 4), ignore_index=True)

#   A   B   C
# 0 5   6   7
# 1 5   6   7
# 2 7   8   9
# 3 7   8   9
# 4 7   8   9
# 5 7   8   9
# 6 7   8   9

edited Dec 25, 2022 at 7:36
answered Jul 31, 2022 at 7:42 by Muhammad Yasirroni

What's the benefit of using a nested concat here?
– Antony Hatchkins
Commented Dec 23, 2022 at 18:12
@AntonyHatchkins apparently, there is none. You can remove the concat inside and it just works. I might be use that due to copy paste from the previous section. Might edit and delete that after testing to make sure.
– Muhammad Yasirroni
Commented Dec 24, 2022 at 21:59

Xinyi Li · Accepted Answer · 2020-04-15 03:16:26Z

1

You can simply append the row to the end of the DataFrame, and then adjust the index.

For instance:

df = df.append(pd.DataFrame([[2,3,4]],columns=df.columns),ignore_index=True)
df.index = (df.index + 1) % len(df)
df = df.sort_index()

Or use concat as:

df = pd.concat([pd.DataFrame([[2,3,4]],columns=df.columns),df],ignore_index=True)

answered Apr 15, 2020 at 3:16 by Xinyi Li

Ehsan Akbaritabar · Accepted Answer · 2021-09-09 13:18:33Z

1

Do as in the following example:

a_row = pd.Series([1, 2])

df = pd.DataFrame([[3, 4], [5, 6]])

row_df = pd.DataFrame([a_row])

df = pd.concat([row_df, df], ignore_index=True)

and the result is:

   0  1
0  1  2
1  3  4
2  5  6

answered Sep 9, 2021 at 13:18 by Ehsan Akbaritabar

ZTang · Accepted Answer · 2022-06-08 14:59:24Z

s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])

df = pd.DataFrame([list(s1), list(s2)],  columns =  ["A", "B", "C"])

To insert a new row anywhere, you can specify the row position: row_pos = -1 for inserting at the top or row_pos = 0.5 for inserting between row 0 and row 1.

row_pos = -1
insert_row = [2,3,4]

df.loc[row_pos] = insert_row
df = df.sort_index()
df = df.reset_index(drop = True)

With row_pos = -1, the outcome is:

    A   B   C
0   2   3   4
1   5   6   7
2   7   8   9

With row_pos = 0.5, the outcome is:

    A   B   C
0   5   6   7
1   2   3   4
2   7   8   9
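
The fractional-label trick can be wrapped in a small helper. This is a sketch that assumes the frame has a default integer index running 0..n-1; insert_before is a hypothetical name introduced here, and the pos - 0.5 label is simply one way to land between two existing integer labels.

```python
import pandas as pd

def insert_before(df, pos, row):
    """Insert row before positional row pos; assumes a default 0..n-1 index."""
    out = df.copy()               # leave the caller's frame untouched
    out.loc[pos - 0.5] = row      # new label sorts just before label pos
    return out.sort_index().reset_index(drop=True)

df = pd.DataFrame([[5, 6, 7], [7, 8, 9]], columns=["A", "B", "C"])

top = insert_before(df, 0, [2, 3, 4])
middle = insert_before(df, 1, [2, 3, 4])
print(top)
print(middle)
```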

mins · Accepted Answer · 2024-05-14 13:28:16Z

1

Assuming the index is a default index with integer values starting at 0:

import pandas as pd

data = [[5, 6, 7], [7, 8, 9]]
df = pd.DataFrame(data, columns=list('ABC'))
row = [2, 3, 4]

# Insert new row
df.loc[-1] = row
df = df.sort_index()
df.index = range(len(df))

print(df)

Adjust df.loc[-1] for any position in the original index.

answered May 14 at 13:28 by mins

Xin Niu · Accepted Answer · 2022-04-21 14:35:34Z

Given that a pandas dataframe is structured as a list of series (each series is a column), it is convenient to insert a column at any position. So one idea I came up with is to first transpose your data frame, insert a column, and transpose it back. You may also need to rename the index (row names), like this:

s1 = pd.Series([5, 6, 7])
s2 = pd.Series([7, 8, 9])

df = pd.DataFrame([list(s1), list(s2)],  columns =  ["A", "B", "C"])
df = df.transpose()
df.insert(0, 2, [2,3,4])
df = df.transpose()
df.index = [i for i in range(3)]
df

    A   B   C
0   2   3   4
1   5   6   7
2   7   8   9

m02ph3u5 · Accepted Answer · 2020-04-28 10:07:51Z

-3

The simplest way to add a row in a pandas data frame is:

DataFrame.loc[ location of insertion ] = list( )

Example:

DF.loc[9] = ['Pepe', 33, 'Japan']

NB: the length of your list should match the number of columns in the data frame.
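
It helps to remember that the value inside loc[...] is an index label, not a position: assigning to an existing label overwrites that row, while assigning to a new label adds a row. A small sketch of both cases follows; the column names and sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical starting frame with a single row labelled 0.
df = pd.DataFrame([["Ana", 28, "Peru"]], columns=["name", "age", "country"])

# Label 0 already exists, so this replaces the row instead of inserting one.
df.loc[0] = ["Pepe", 33, "Japan"]

# Label 9 does not exist yet, so the frame grows by one row labelled 9.
df.loc[9] = ["Kim", 41, "Korea"]

print(df)
```

The new row sits at the end of the frame even though its label is 9, so a sort or reset of the index is still needed if row order matters.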

edited Apr 28, 2020 at 10:07 by m02ph3u5
answered Apr 28, 2020 at 9:21 by Pepe
