
As seen in my previous question

Rename columns regex, keep name if no match

Why does the same regex produce different output with str.extract() and re.search()?

import pandas as pd

# Raw strings keep the backslashes in the column names literal
data = {'First_Column': [1, 2, 3], 'Second_Column': [1, 2, 3],
        r'\First\Mid\LAST.Ending': [1, 2, 3], r'First1\Mid1\LAST1.Ending': [1, 2, 3]}

df = pd.DataFrame(data)

     First_Column   Second_Column   \First\Mid\LAST.Ending  First1\Mid1\LAST1.Ending

pd.Index.str.extract()

df.columns.str.extract(r'([^\\]+)\.Ending')   

    0
0   NaN
1   NaN
2   LAST
3   LAST1

re.search()

import re

col = df.columns.tolist()
for i in col[2:]:
    print(re.search(r'([^\\]+)\.Ending', i).group())

LAST.Ending
LAST1.Ending

THX


From the pandas.Series.str.extract docs:

Extract capture groups in the regex pat as columns in a DataFrame.

So str.extract returns only the capture group(s). re.search(), by contrast, returns a match object: its group() (equivalently group(0)) is the whole match, including the .Ending part, while group(1) returns only capture group 1.
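A minimal sketch of that difference on a single column name from the question, using only the standard re module: group() and group(0) both give the whole match, while group(1) and groups() give only what the parentheses captured.

```python
import re

# One of the column names from the question; a raw string keeps the backslashes literal.
name = r'\First\Mid\LAST.Ending'
m = re.search(r'([^\\]+)\.Ending', name)

print(m.group())   # whole match: LAST.Ending
print(m.group(0))  # same as group(): LAST.Ending
print(m.group(1))  # capture group 1 only: LAST
print(m.groups())  # tuple of all capture groups: ('LAST',)
```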

This returns the full match:

for i in col[2:]:
    print(re.search(r'([^\\]+)\.Ending', i).group())

LAST.Ending
LAST1.Ending

This will return only the capture group:

for i in col[2:]:
    print(re.search(r'([^\\]+)\.Ending', i).group(1))

LAST
LAST1
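The loops above start at col[2:] because re.search() returns None for the first two column names, and calling .group() on None raises AttributeError; str.extract instead yields NaN for non-matching rows. A sketch that mirrors that behaviour with plain re, returning None where pandas shows NaN (extract_group1 is a hypothetical helper, not part of re or pandas):

```python
import re

# Column names from the question, written as raw strings.
cols = ['First_Column', 'Second_Column',
        r'\First\Mid\LAST.Ending', r'First1\Mid1\LAST1.Ending']
pattern = re.compile(r'([^\\]+)\.Ending')


def extract_group1(names, pat):
    """Hypothetical helper: capture group 1 per name, or None when nothing matches."""
    result = []
    for name in names:
        m = pat.search(name)
        # Guard against None so non-matching names don't raise AttributeError.
        result.append(m.group(1) if m else None)
    return result


print(extract_group1(cols, pattern))  # [None, None, 'LAST', 'LAST1']
```

For the goal in the previous question (keep the name if there is no match), the same guard could return name instead of None, i.e. m.group(1) if m else name.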

Further reading: Link
