I'm trying to automate a program that periodically scrapes prices from Amazon and other sites (I'm starting with Amazon).

The problem is that when I call soup.find from PyCharm, it finds its target and returns it correctly, but from the Windows terminal it returns None.

The code runs fine from PyCharm, but I need it to run from the Windows terminal so I can automate it through a .bat file.

I find this a very strange issue and couldn't find any documentation about it, so any help would be awesome!

Here are some things I've already tried or checked, so they can be ruled out:

  • Uninstalling and reinstalling bs4
  • Verifying that all the required modules are installed
  • Verifying that Windows runs the program from the same folder as PyCharm
  • The issue doesn't always happen: I built a log report, and it shows that out of 12 sites it fails on 10, 11 or 12 (the number isn't fixed either). This only happens when running from the terminal.
  • The response is <Response [200]> in both cases

I've compared the soup I get from PyCharm with the one from the Windows terminal and they are different; in the Windows one I couldn't find the expected text even when searching manually.

Finally, here is the code I'm using so you can see what I'm seeing:

import time
import requests
from bs4 import BeautifulSoup
import pandas as pd
import os
from csv import writer
from datetime import date, datetime
from tqdm import tqdm

def r_Amazon(URL):
    headers = {
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'es-ES,es;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36'
    }
    response = requests.get(URL, headers=headers)
    soup = BeautifulSoup(response.content, 'lxml')
    # Check whether the item is in stock or only available second-hand
    get_error = 0
    try:
        product_price_stat = soup.find('span', {'class': 'a-text-bold'}).text.strip() #<- HERE IT FAILS

        if product_price_stat == 'Comprar de segunda mano' or product_price_stat == 'Ofertas destacadas no disponibles':
            # The item has a SECOND-HAND price, use the corresponding script
            try:
                product_price = 'ND'
                get_error = 1
            except:
                print('ERROR 2nd TRY')
                get_error = 1
        else:
            # The item has a NORMAL price, use the corresponding script
            try:
                product_price = soup.find('span', {'class': 'a-offscreen'}).text.strip()
                # Format Correctly
                product_price = product_price.replace('.', '')
                product_price = product_price.replace(',', '.')
                product_price = product_price.replace('€', '')
            except:
                print('ERROR 1rs TRY')
                get_error = 1
    except:
        product_price = 'ND'
        print('ERROR')
        get_error = 1
    return product_price, get_error

1 Answer

The problem is neither your code nor the terminal: Amazon is blocking the scraping because it thinks you are a robot (yes, even if you send headers, Amazon can detect it most of the time).

If you print the soup inside the function when the error occurs, you will get this:

Enter the characters you see below

Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.
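
To tell this case apart from a genuinely missing element, you could check the HTML for the robot-check wording before parsing it. This is only a rough sketch: looks_like_captcha and CAPTCHA_MARKERS are names I made up, the phrases are taken from the message quoted above, and a localized page (for example a Spanish one) may word it differently.

```python
from html import unescape

# Phrases copied from the robot-check page quoted above (English wording only).
CAPTCHA_MARKERS = (
    "enter the characters you see below",
    "make sure you're not a robot",
)


def looks_like_captcha(page_html):
    """Return True if the HTML contains one of the robot-check phrases."""
    # Decode entities such as &#39; and normalise curly apostrophes and case.
    text = unescape(page_html).replace("\u2019", "'").lower()
    return any(marker in text for marker in CAPTCHA_MARKERS)
```

Inside r_Amazon you could call looks_like_captcha(response.text) right after requests.get and log that case separately. That would also explain why the status is still 200 while soup.find returns None.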

I found myself in this same mess in the past. I recommend using Selenium to get the content of the web page instead of requests.

This is how you can do it:

import time
# import requests
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
import pandas as pd
import os
from csv import writer
from datetime import date, datetime
from tqdm import tqdm

def r_Amazon(URL):
    chrome_options = Options()
    chrome_options.add_argument('--headless') # run the browser in the background, without opening a window
    driver = webdriver.Chrome(options=chrome_options)
    driver.get(URL)
    soup = BeautifulSoup(driver.page_source, 'lxml')
    driver.quit()
    # Check whether the item is in stock or only available second-hand
    get_error = 0
    try:
        product_price_stat = soup.find('span', {'class': 'a-text-bold'}).text.strip() #<- HERE IT FAILS

        if product_price_stat == 'Comprar de segunda mano' or product_price_stat == 'Ofertas destacadas no disponibles':
            # The item has a SECOND-HAND price, use the corresponding script
            try:
                product_price = 'ND'
                get_error = 1
            except:
                print('ERROR 2nd TRY')
                get_error = 1
        else:
            # The item has a NORMAL price, use the corresponding script
            try:
                product_price = soup.find('span', {'class': 'a-offscreen'}).text.strip()
                # Format Correctly
                product_price = product_price.replace('.', '')
                product_price = product_price.replace(',', '.')
                product_price = product_price.replace('€', '')
            except:
                print('ERROR 1rs TRY')
                get_error = 1
    except:
        product_price = 'ND'
        print('ERROR')
        get_error = 1
    os.system("cls" if os.name == 'nt' else "clear") # clear your screen before returning the output
    return product_price, get_error


Make sure to install Selenium with pip install selenium.
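
A side note on the price clean-up in r_Amazon: the three replace calls could live in a small helper. The sketch below is my own assumption of how that might look (parse_euro_price is a made-up name). It also drops a non-breaking space between the number and the symbol, which the replace chain above would leave in place if the page contains one.

```python
from decimal import Decimal


def parse_euro_price(raw):
    """Convert text such as '1.234,56 €' into Decimal('1234.56')."""
    # Remove the currency symbol and any non-breaking space around it.
    cleaned = raw.replace("€", "").replace("\xa0", "").strip()
    # Drop the thousands separator, then turn the decimal comma into a dot.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)
```

Text that is not a number after cleaning raises decimal.InvalidOperation, so the existing try/except around the price lookup would still catch it.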


Why does Amazon do this? Most probably because they have their own API for this data, so they don't want us to scrape it for free.

One more thing: you can also do this with Selenium alone, without BeautifulSoup.
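
A rough sketch of that Selenium-only variant, under a few assumptions: open_headless_chrome, first_price_text and price_text_for are made-up names, the span.a-offscreen selector is the one the code above already uses, and Chrome is installed locally. It reads textContent because, as far as I know, Selenium's .text only returns visible text, so an off-screen span may come back empty.

```python
def open_headless_chrome():
    """Start Chrome without a visible window (needs selenium and a local Chrome)."""
    # Imported here so the lookup helper below has no import-time dependency.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def first_price_text(driver):
    """Return the text of the first non-empty span.a-offscreen, or None."""
    # "css selector" is the string behind selenium's By.CSS_SELECTOR constant.
    elements = driver.find_elements("css selector", "span.a-offscreen")
    for element in elements:
        text = (element.get_attribute("textContent") or "").strip()
        if text:
            return text
    return None


def price_text_for(url):
    """Load the page in headless Chrome and return the raw price text, or None."""
    driver = open_headless_chrome()
    try:
        driver.get(url)
        return first_price_text(driver)
    finally:
        # Always close the browser, even if the page load fails.
        driver.quit()
```

price_text_for returns the raw text (or None when nothing matched), so the same formatting step as in r_Amazon still applies afterwards. Amazon may still serve the captcha page to a headless browser, in which case this returns None as well.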

  • Thanks for that. I've tried Selenium with Amazon in the past, but I got the captcha. I'll try again and see the results! The strange thing is that I had no problems running it from PyCharm, but the same code didn't work from the Windows terminal. Commented Jul 6 at 13:09
  • @OscarTarrago That will work in your terminal too, but the probability of that code not working for Amazon is high
    – Sharim09
    Commented Jul 6 at 14:47