Split a string in pandas row and insert new rows by enlarging the dataframe

Question

I have the following DataFrame:

	no	word	status
0	0	one	to_check
1	1	two	to_check
2	2	:)	emoticon
3	3	dr.	to_check
4	4	"future"	to_check
5	5	to	to_check
6	6	be	to_check

I want to iterate through each row to find quotes at word-initial and word-final positions and create a DataFrame like this:

	no	word	status
0	0	one	to_check
1	1	two	to_check
2	2	:)	emoticon
3	3	dr.	to_check
4	4	"	quotes
5	4	future	word
6	4	"	quotes
7	5	to	to_check
8	6	be	to_check

I can strip the quotes and split the word into three pieces, but I get the following DataFrame, in which the last two rows are overwritten:

	no	word	status
0	0	one	to_check
1	1	two	to_check
2	2	:)	emoticon
3	3	dr.	to_check
4	4	"	quotes
5	4	future	word
6	4	"	quotes

I tried df.loc[index], df.iloc[index], df.at[index] but none of them helped me to extend the number of rows in the DataFrame.

Is it possible to add new rows at a specific index without overwriting the last two rows?

Did you try any of these: stackoverflow.com/questions/24284342/… ? — Chaos_Is_Harmony, Commented Sep 14, 2021 at 1:31

BENY · Accepted Answer · 2021-09-14 02:37:44Z

6

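In your case, you can split the strings and then explode the resulting lists into rows:

# The capturing group keeps each quote as its own piece;
# the empty strings left at the edges are filtered out afterwards.
out = (
    df.assign(word=df.word.str.split(r'(")'))
      .explode('word')
      .loc[lambda x: x['word'] != '']
)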
   no    word    status
0   0     one  to_check
1   1     two  to_check
2   2      :)  emoticon
3   3     dr.  to_check
4   4       "  to_check
4   4  future  to_check
4   4       "  to_check
5   5      to  to_check
6   6      be  to_check
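
The empty strings filtered out above come from how a capturing group behaves in a regex split: the delimiter is kept as its own piece, and a match at the very start or end of the string leaves an empty piece next to it. A minimal illustration with Python's re.split, on the assumption that the pattern passed to str.split is handled the same way:

```python
import re

# A quoted word: the quotes are kept, plus an empty piece at each edge
quoted = re.split(r'(")', '"future"')
print(quoted)  # ['', '"', 'future', '"', '']

# A word without quotes comes back as a single piece
plain = re.split(r'(")', 'one')
print(plain)  # ['one']
```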

To change the status:

import numpy as np

out['status'] = np.where(out['word'].eq('"'), 'quotes', out['status'])
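
The expected output in the question also marks the inner word as word and uses a fresh 0..8 index. Here is a sketch of one way to get there, assuming the sample data from the question. It does the split with re.split through map instead of str.split, and n_pieces is a made-up helper column used only to remember which rows came from a quoted word:

```python
import re

import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "no": range(7),
    "word": ["one", "two", ":)", "dr.", '"future"', "to", "be"],
    "status": ["to_check", "to_check", "emoticon", "to_check",
               "to_check", "to_check", "to_check"],
})

# Split on quotes, keep the quotes as pieces, drop empty pieces
pieces = df["word"].map(lambda w: [p for p in re.split(r'(")', w) if p])

# One row per piece; n_pieces (hypothetical name) records how many
# pieces the original word produced
out = (
    df.assign(word=pieces, n_pieces=pieces.map(len))
      .explode("word")
      .reset_index(drop=True)
)

# Quotes get 'quotes', the remaining pieces of a split word get 'word',
# everything else keeps its original status
is_quote = out["word"].eq('"')
was_split = out["n_pieces"] > 1
out["status"] = np.where(
    is_quote, "quotes", np.where(was_split, "word", out["status"])
)
out = out.drop(columns="n_pieces")
print(out)
```

The reset_index(drop=True) call is what replaces the repeated index label 4 with consecutive labels.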

edited Sep 14, 2021 at 2:37

answered Sep 14, 2021 at 1:38

BENY


This is beautiful! You must change status as well though.
– haneulkim
Commented Sep 14, 2021 at 1:50


Hanseo Park · Accepted Answer · 2021-09-14 01:49:16Z

0

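You could try a function like this, which splits the DataFrame at position idx and inserts a quote row between the two parts:

import pandas as pd

def InsertRow(df, idx):
    # Split the frame at integer position idx
    df1 = df.iloc[:idx]
    df2 = df.iloc[idx:]

    # Row to insert; it reuses the "no" of the row just before idx
    new_row = pd.DataFrame([[idx - 1, '"', 'quotes']], columns=df.columns)

    # Put the new row between the two parts and renumber the index
    return pd.concat([df1, new_row, df2], ignore_index=True)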
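
To apply the slice-and-concat idea to the question's data, the quoted row can be swapped for three new rows in one step. This is only a sketch: replace_row_with is a hypothetical helper name, and the position of the quoted row (4) and the replacement rows are written out by hand instead of being detected.

```python
import pandas as pd

def replace_row_with(df, pos, new_rows):
    """Return a new frame where the row at integer position pos
    is replaced by new_rows (a list of tuples, one per row)."""
    before = df.iloc[:pos]        # rows above the target row
    after = df.iloc[pos + 1:]     # rows below the target row
    middle = pd.DataFrame(new_rows, columns=df.columns)
    # ignore_index renumbers the rows 0..n-1 after the insertion
    return pd.concat([before, middle, after], ignore_index=True)

# Sample data copied from the question
df = pd.DataFrame({
    "no": range(7),
    "word": ["one", "two", ":)", "dr.", '"future"', "to", "be"],
    "status": ["to_check", "to_check", "emoticon", "to_check",
               "to_check", "to_check", "to_check"],
})

result = replace_row_with(
    df, 4,
    [(4, '"', "quotes"), (4, "future", "word"), (4, '"', "quotes")],
)
print(result)
```

Because the result is a new DataFrame that is two rows longer, the rows for "to" and "be" are shifted down instead of being overwritten.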

answered Sep 14, 2021 at 1:49

Hanseo Park


As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
– Community Bot
Commented Sep 14, 2021 at 2:23


haneulkim · Accepted Answer · 2021-09-14 01:54:46Z

This is not very efficient, but I think it is very readable. You could use it if efficiency is not a big concern.

import pandas as pd

no_lst = []
word_lst = []
status_lst = []

def check_quote(w):
    # True when the word is wrapped in double quotes, e.g. '"future"'
    return len(w) >= 2 and w.startswith('"') and w.endswith('"')

for row in df.itertuples():
    if check_quote(row.word):
        # One quoted word becomes three rows that share the same "no"
        no_lst += [row.no] * 3
        word_lst += ['"', row.word.strip('"'), '"']
        status_lst += ["quotes", "word", "quotes"]
        continue

    # Any other word is copied through unchanged
    no_lst.append(row.no)
    word_lst.append(row.word)
    status_lst.append(row.status)

new_df = pd.DataFrame({"no": no_lst,
                       "word": word_lst,
                       "status": status_lst})
