Different output for pd.str.extract() and re.search()

Question

This follows up on my previous question:

Rename columns regex, keep name if no match

Why do pd.str.extract() and re.search() give different output for the same regex?

import re
import pandas as pd

# Raw strings keep the backslashes in the column names literal
data = {'First_Column': [1,2,3], 'Second_Column': [1,2,3],
        r'\First\Mid\LAST.Ending': [1,2,3], r'First1\Mid1\LAST1.Ending': [1,2,3]}

df = pd.DataFrame(data)

     First_Column   Second_Column   \First\Mid\LAST.Ending  First1\Mid1\LAST1.Ending

pd.str.extract()

df.columns.str.extract(r'([^\\]+)\.Ending')

    0
0   NaN
1   NaN
2   LAST
3   LAST1

re.search()

col = df.columns.tolist()
for i in col[2:]:
    print(re.search(r'([^\\]+)\.Ending', i).group())

LAST.Ending
LAST1.Ending

THX

ManojK · Accepted Answer · 2020-03-25 09:30:16Z

From the pandas.Series.str.extract docs:

Extract capture groups in the regex pat as columns in a DataFrame.

str.extract returns only the capture group. In contrast, re.search followed by group() (equivalent to group(0)) returns the whole match. If you call group(1) instead, it returns capture group 1, which is what str.extract gives you.

This will return the full match:

for i in col[2:]:
    print(re.search(r'([^\\]+)\.Ending', i).group())

LAST.Ending
LAST1.Ending

This will return only the capture group:

for i in col[2:]:
    print(re.search(r'([^\\]+)\.Ending', i).group(1))

LAST
LAST1
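To see both behaviours side by side, here is a minimal sketch using the column names from the question. The variable names (cols, extracted, full, by_hand) and the helper capture_or_none are only illustrative and not part of pandas or re. It also guards against re.search returning None, which the loops above avoid only by slicing col[2:].

```python
import re
import pandas as pd

# Column names from the question, written as raw strings
cols = pd.Index(['First_Column', 'Second_Column',
                 r'\First\Mid\LAST.Ending', r'First1\Mid1\LAST1.Ending'])

pattern = r'([^\\]+)\.Ending'

# str.extract yields one column per capture group; names without a match become NaN
extracted = cols.str.extract(pattern)[0]

# Wrapping the whole pattern in an outer group makes column 0 hold the full match
full = cols.str.extract(r'(([^\\]+)\.Ending)')[0]


def capture_or_none(name):
    """Hypothetical helper: return capture group 1, or None when nothing matches."""
    m = re.search(pattern, name)
    return m.group(1) if m else None


by_hand = [capture_or_none(c) for c in cols]

print(extracted.tolist())
print(full.tolist())
print(by_hand)
```

The outer-group trick is just one way to get the full match out of str.extract; if you only need the captured part, the original pattern is enough.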

Further reading: Link
