\$\begingroup\$

I have two dataframes. The first contains company names and their corresponding texts. The texts for each company are stored in a list:

supplier_company_name   Main_Text
JDA SOFTWARE            ['Supply chains', 'The answer is simple -RunJDA!']
PTC                     ['Hello', 'Solution']

The second dataframe contains texts extracted from each company's website:

      Company            Text   
0   JDA SOFTWARE    About | JDA Software    
1   JDA SOFTWARE    833.JDA.4ROI
2   JDA SOFTWARE    Contact Us
3   JDA SOFTWARE    Customer Support    
4   PTC             Training    
5   PTC             Partner Advantage

I want to create a new column in the second dataframe: True if the text extracted from the web matches any item in that company's list in the Main_Text column of the first dataframe, and False otherwise.

Code:

target = []
for x in tqdm(range(len(df['supplier_company_name']))): #company name in df1
    #print(x)
    for y in range(len(samp['Company'])): #company name in df2
        if samp['Company'][y] == df['supplier_company_name'][x]: #if the company name matches
            #check if the text matches
            if samp['Text'][y] in df['Main_Text'][x]:
                target.append(True)
            else:
                target.append(False)

How can I change my code to run more efficiently?

\$\endgroup\$

1 Answer

\$\begingroup\$

I’ll assume that your first dataframe (df) has unique company names. If so, you can set the company name as its index and select the Main_Text column; the resulting Series then behaves much like a plain dict that maps each company name to its list of texts:

main_text = df.set_index('supplier_company_name')['Main_Text']
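To see what that lookup gives you, here is a minimal sketch built from the sample rows in the question. The uniqueness check is my addition: if a company name appeared twice, main_text.loc[name] would return a Series rather than a single list, and an `in` test against a Series checks its index, not its values.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "supplier_company_name": ["JDA SOFTWARE", "PTC"],
    "Main_Text": [
        ["Supply chains", "The answer is simple -RunJDA!"],
        ["Hello", "Solution"],
    ],
})

# Fail fast if the uniqueness assumption does not hold.
assert df["supplier_company_name"].is_unique

# Series indexed by company name; each value is that company's list of texts.
main_text = df.set_index("supplier_company_name")["Main_Text"]

# Label-based lookup, much like main_text_dict["PTC"] on a dict.
ptc_texts = main_text.loc["PTC"]
print(ptc_texts)
```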

Now we just need to iterate over each row of samp, look up the Main_Text list for its Company value, and test whether its Text value is in that list. This is a job for apply with axis=1:

target = samp.apply(lambda row: row['Text'] in main_text.loc[row['Company']], axis=1)
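Putting it together, here is a self-contained sketch using the sample data from the question. Two things are my own additions and not in your data or code: the last row of samp (PTC / "Solution") is a hypothetical row added so that at least one match shows up, and the helper text_is_listed uses Series.get with an empty-list default so that a company missing from df yields False instead of raising a KeyError.

```python
import pandas as pd

# First dataframe, as in the question.
df = pd.DataFrame({
    "supplier_company_name": ["JDA SOFTWARE", "PTC"],
    "Main_Text": [
        ["Supply chains", "The answer is simple -RunJDA!"],
        ["Hello", "Solution"],
    ],
})

# Second dataframe, as in the question, plus one hypothetical
# matching row (PTC / "Solution") added for illustration.
samp = pd.DataFrame({
    "Company": ["JDA SOFTWARE"] * 4 + ["PTC"] * 3,
    "Text": [
        "About | JDA Software",
        "833.JDA.4ROI",
        "Contact Us",
        "Customer Support",
        "Training",
        "Partner Advantage",
        "Solution",
    ],
})

# Lookup Series: company name -> list of texts (assumes unique names).
main_text = df.set_index("supplier_company_name")["Main_Text"]


def text_is_listed(row):
    """Return True if the row's Text is in its company's Main_Text list."""
    # Series.get falls back to the default for companies absent from df.
    listed = main_text.get(row["Company"], [])
    return row["Text"] in listed


# One pass over samp instead of a nested loop over both dataframes.
samp["target"] = samp.apply(text_is_listed, axis=1)
print(samp)
```

Note that this is an exact, case-sensitive membership test, the same comparison your loop performs; if you need substring or fuzzy matching, the helper is the place to change it.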
\$\endgroup\$
