I have the following DataFrame:

   no      word    status
0   0       one  to_check
1   1       two  to_check
2   2        :)  emoticon
3   3       dr.  to_check
4   4  "future"  to_check
5   5        to  to_check
6   6        be  to_check

I want to iterate through each row, find quotes at word-initial and word-final positions, and create a DataFrame like this:

   no    word    status
0   0     one  to_check
1   1     two  to_check
2   2      :)  emoticon
3   3     dr.  to_check
4   4       "    quotes
5   4  future      word
6   4       "    quotes
7   5      to  to_check
8   6      be  to_check

I can strip the quotes and split the word into three pieces, but I get the following DataFrame, in which the last two rows are overwritten:

   no    word    status
0   0     one  to_check
1   1     two  to_check
2   2      :)  emoticon
3   3     dr.  to_check
4   4       "    quotes
5   4  future      word
6   4       "    quotes

I tried df.loc[index], df.iloc[index], and df.at[index], but none of them let me extend the number of rows in the DataFrame.

Is it possible to add new rows at a specific index without overwriting the last two rows?

3 Answers

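In your case you can split on the quote character (the capture group keeps the quotes as separate pieces), then explode and drop the empty strings:

out = (df.assign(word=df.word.str.split(r'(")'))
         .explode('word')
         .loc[lambda x: x['word'] != ''])
print(out)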
   no    word    status
0   0     one  to_check
1   1     two  to_check
2   2      :)  emoticon
3   3     dr.  to_check
4   4       "  to_check
4   4  future  to_check
4   4       "  to_check
5   5      to  to_check
6   6      be  to_check

To change the status of the quote rows:

import numpy as np
out['status'] = np.where(out['word'].eq('"'), 'quotes', out['status'])
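
Putting the pieces together, here is a minimal end-to-end sketch that also marks the unquoted word as `word` and renumbers the index, as the question's expected output requires. The sample DataFrame is rebuilt from the question, and the helper column `quoted` is a made-up name used only for illustration. It assumes quotes appear only at the start and end of a word; a quote in the middle of a word would also be split off.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "no": range(7),
    "word": ["one", "two", ":)", "dr.", '"future"', "to", "be"],
    "status": ["to_check", "to_check", "emoticon", "to_check",
               "to_check", "to_check", "to_check"],
})

# Remember which words were wrapped in double quotes before splitting.
quoted = df["word"].str.match(r'^".+"$')

out = (
    df.assign(word=df["word"].str.split(r'(")'), quoted=quoted)
      .explode("word")                      # one row per piece
      .loc[lambda x: x["word"] != ""]       # drop empty pieces from the split
      .reset_index(drop=True)               # renumber rows 0..n-1
)

# Quote characters become "quotes"; the remainder of a quoted word becomes "word".
is_quote = out["word"].eq('"')
out["status"] = np.where(is_quote, "quotes",
                         np.where(out["quoted"], "word", out["status"]))
out = out.drop(columns="quoted")
print(out)
```

The `no` column keeps the original numbering (three rows with `no` equal to 4), while the index runs from 0 to 8.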
  • This is beautiful! You must change status as well though.
    – haneulkim
    Commented Sep 14, 2021 at 1:50

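You could try a function like this, which splits the DataFrame at position idx and places a quotes row between the two halves:

import pandas as pd

def InsertRow(df, idx):
    # Rows before and after the insertion point.
    df1 = df[0:idx]
    df2 = df[idx:]

    # One-row frame holding the quote, numbered idx - 1.
    new_row = pd.DataFrame([[idx - 1, '"', 'quotes']], columns=df.columns)

    # Concatenate rather than assign to df1.iloc[-1], which would overwrite a row.
    df = pd.concat([df1, new_row, df2], ignore_index=True)

    return df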
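
For reference, here is a minimal self-contained sketch of inserting rows at a given position by slicing and concatenating, so that no existing row is overwritten. The helper name `insert_rows` and the three-row sample frame are made up for illustration.

```python
import pandas as pd

def insert_rows(df, pos, rows):
    """Return a new frame with `rows` (a list of dicts) inserted before position `pos`."""
    new = pd.DataFrame(rows, columns=df.columns)
    # Slice around the insertion point and renumber the index afterwards.
    return pd.concat([df.iloc[:pos], new, df.iloc[pos:]], ignore_index=True)

# Hypothetical sample: the quoted word has already been stripped to "future".
sample = pd.DataFrame({
    "no": [3, 4, 5],
    "word": ["dr.", "future", "to"],
    "status": ["to_check", "word", "to_check"],
})
quote = {"no": 4, "word": '"', "status": "quotes"}

result = insert_rows(sample, 1, [quote])   # opening quote before "future"
result = insert_rows(result, 3, [quote])   # closing quote after "future"
print(result)
```

Each call builds a new DataFrame, so calling it in a loop over a large frame will be slow; for many insertions, the split-and-explode approach is likely the better fit.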
  • As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
    – Community Bot
    Commented Sep 14, 2021 at 2:23

This is not very efficient, but I think it is very readable. You could use it if efficiency is not a major concern.

import pandas as pd

no_lst = []
word_lst = []
status_lst = []

def check_quote(w):
    # True only if the word is wrapped in double quotes, e.g. "future";
    # the length check keeps a lone '"' from matching.
    return len(w) >= 2 and w.startswith('"') and w.endswith('"')

for row in df.itertuples():
    if check_quote(row.word):
        # Emit three rows that share the original "no": quote, word, quote.
        no_lst += [row.no] * 3
        word_lst += ['"', row.word.strip('"'), '"']
        status_lst += ["quotes", "word", "quotes"]
        continue

    # Any other word is copied through unchanged.
    no_lst.append(row.no)
    word_lst.append(row.word)
    status_lst.append(row.status)

new_df = pd.DataFrame({"no": no_lst,
                       "word": word_lst,
                       "status": status_lst})
