I have noticed a problem with Selenium when I run it headless: some pages don't fully load or render some of their elements. I don't know exactly what prevents the page from loading completely; maybe the JavaScript isn't running?

My code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from decouple import config
from time import sleep

DEBUG = config('DEBUG')

class DiscordME(object):
    def __init__(self):
        self.LINUX = config('LINUX', cast=bool)
        self.DRIVER_VERSION = config('DRIVER_VERSION')
        self.HEADLESS = True

        options = Options()
        options.add_argument('--no-sandbox')
        options.add_argument('--disable-gpu')
        options.add_argument('--ignore-certificate-errors')
        options.add_argument('--disable-extensions')
        options.add_argument('--disable-dev-shm-usage')
        if self.HEADLESS:
            options.add_argument('--headless')
            options.add_argument('--window-size=1920,1200')

        if self.LINUX:
            self.browser = webdriver.Chrome(executable_path=f'./drivers/chromedriver-{self.DRIVER_VERSION}', options=options)
        else:
            # Raw f-string so the backslashes in the Windows path are not treated as escape sequences
            self.browser = webdriver.Chrome(executable_path=rf'.\drivers\chromedriver-{self.DRIVER_VERSION}.exe', options=options)

    def get_website(self):
        self.browser.get('https://discord.me/login')
        WebDriverWait(self.browser, 10).until(
            EC.url_changes('https://discord.me/login')
        )
        print(self.browser.current_url)
        print(self.browser.page_source)
        #print(self.browser.find_element_by_xpath('//*[@id="app-mount"]/div[2]/div/div[2]/div/div/form/div/div/div[1]/div[3]/div[1]/div/div[2]/input'))

DiscordME().get_website()

In this script, the login inputs are not loaded when the browser reaches the Discord API login page. Looking at page_source, I noticed that the page is not being mounted, so that could be the problem.

3 Answers

from selenium import webdriver
from time import sleep

options = webdriver.ChromeOptions()
options.add_argument("--window-size=1920,1080")
options.add_argument("--headless")
options.add_argument("--disable-gpu")
options.add_argument(
    "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36")
browser = webdriver.Chrome(options=options)

Some websites use the user-agent to detect whether the browser is running in headless mode, because headless Chrome sends a different user-agent (it contains "HeadlessChrome") than a normal browser. So set the user-agent explicitly.

Headless browser detection
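
A hard-coded user-agent string goes stale as Chrome updates. As a minimal sketch (not part of the original answer), one could instead read the user-agent that headless Chrome itself reports and strip the "HeadlessChrome" marker from it. The helper names normalize_user_agent and build_headless_options are hypothetical, and the sketch assumes Selenium can locate chromedriver without an explicit path.

```python
def normalize_user_agent(reported_ua: str) -> str:
    """Return the user-agent with the headless marker replaced by the normal product name."""
    return reported_ua.replace("HeadlessChrome", "Chrome")


def build_headless_options():
    """Probe headless Chrome for its default user-agent, then build options that hide the marker.

    Assumption: chromedriver is discoverable by Selenium (on PATH or otherwise configured).
    """
    # Imported here so normalize_user_agent can be used without Selenium installed.
    from selenium import webdriver

    # First, start a throwaway headless session only to read navigator.userAgent.
    probe_options = webdriver.ChromeOptions()
    probe_options.add_argument("--headless")
    probe = webdriver.Chrome(options=probe_options)
    try:
        reported_ua = probe.execute_script("return navigator.userAgent")
    finally:
        probe.quit()

    # Then build the real options with the cleaned user-agent.
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--window-size=1920,1080")
    options.add_argument(f"user-agent={normalize_user_agent(reported_ua)}")
    return options
```

With this, browser = webdriver.Chrome(options=build_headless_options()) would send a user-agent matching the installed Chrome version. Sites that detect headless mode by other means would not be affected by this change.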

  • Copying and pasting your user-agent argument worked for me on Ubuntu Linux, even though the version of ChromeDriver I am using is 92.0.4515.107.
    – EliSquared
    Commented Aug 11, 2021 at 21:15

Another thing to consider if you are having trouble loading a website with Selenium is processing power.

I was using a micro AWS instance with a single CPU, which worked for many websites. On a more complex site, however, a search such as find_elements_by_xpath('//a[@href]') intermittently returned 0 elements, while at other times it succeeded and found the hyperlinks. I upgraded to an instance with more CPUs (4, though 2 would probably have been sufficient), and that allowed me to fully load the site and scrape the elements.

I would definitely try the other two solutions posted here first (Chrome options or the Firefox browser), but processing power could be the problem as well.
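
To check whether slow rendering is the cause before paying for a bigger instance, one option is to poll for the elements instead of searching once. The following is only a sketch under that assumption; find_when_ready is a hypothetical helper, and fetch stands for any zero-argument callable that returns a list, such as a lambda wrapping find_elements_by_xpath.

```python
import time


def find_when_ready(fetch, timeout=30.0, interval=0.5):
    """Call fetch() repeatedly until it returns a non-empty result or the timeout expires.

    Returns the last result, which is empty if nothing appeared in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        found = fetch()
        # Stop as soon as something is found, or when the time budget is used up.
        if found or time.monotonic() >= deadline:
            return found
        time.sleep(interval)
```

For example, links = find_when_ready(lambda: browser.find_elements_by_xpath('//a[@href]')) keeps retrying for up to 30 seconds. If the elements show up only after a long delay, that points to the machine being too slow rather than the page refusing to render.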

I would just like to share my experience, as solving this issue took a lot of my time trying many options and settings for the Chrome webdriver.

The user-agent setting solved the problem for some of the websites I scraped, but for some others the only solution that worked for me was to use the Firefox webdriver instead of Chrome, as follows:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

fireFoxOptions = Options()  
fireFoxOptions.add_argument("--headless") 
fireFoxOptions.add_argument("--window-size=1920,1080")
fireFoxOptions.add_argument('--start-maximized')
fireFoxOptions.add_argument('--disable-gpu')
fireFoxOptions.add_argument('--no-sandbox')

driver = webdriver.Firefox(
    options=fireFoxOptions,
    executable_path=r'C:\[your path to firefox webdriver exe file]\geckodriver.exe')

driver.get('https://discord.me/login')

Use the link here to download the latest geckodriver for Firefox, and make sure the Firefox browser is already installed on your machine.

  • I was running chrome driver v 92.0.4515.107 and I could not render some websites with --headless so I had to install Firefox and use it as a backup driver. So far Chrome + Firefox has worked for 100% of use cases.
    – EliSquared
    Commented Sep 1, 2021 at 19:53
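
The "backup driver" idea from the comment above could be sketched roughly like this. It is an illustration, not the commenter's actual code: load_with_fallback, driver_factories and is_rendered are hypothetical names, and each factory is assumed to return a ready Selenium driver (for example, headless Chrome first and headless Firefox second).

```python
def load_with_fallback(url, driver_factories, is_rendered):
    """Try each browser in order and return the first page source that looks rendered.

    driver_factories: zero-argument callables, each returning a new webdriver instance.
    is_rendered: callable taking the page source and returning True if it is usable.
    Returns None if no browser produced a usable page.
    """
    for make_driver in driver_factories:
        driver = make_driver()
        try:
            driver.get(url)
            source = driver.page_source
        finally:
            # Always close the browser, even if loading the page raised an error.
            driver.quit()
        if is_rendered(source):
            return source
    return None
```

A call might look like load_with_fallback('https://discord.me/login', [make_chrome, make_firefox], lambda source: '<input' in source), where make_chrome and make_firefox are small functions that build the headless drivers shown in the answers, and the '<input' check is just one possible way to decide that the login form was rendered.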
