\$\begingroup\$

Use case - motivation & challenge

Hi all! I have been working with Python for the last two years, but never learned proper object-oriented programming and design patterns. I've decided for this year to close this gap by reading some books and applying the knowledge to a real-world problem. I am looking forward to learning a lot from all the suggestions :)

To kick off my learning, I've decided to automate a recurring weekly task of filling some timesheets located in Microsoft Teams, using a bot to do the heavy lifting for me. The bot should perform the following steps:

  • Navigate to the login page
  • Fill in username and password
  • Sign in
  • Navigate to the Excel page with the timesheet
  • Fill in my weekly hours

Currently, the bot does almost all steps, except the last two, which I haven't implemented yet.

Code breakdown

The code is quite simple. I rely heavily on Selenium to perform all actions, so I create a Chrome instance in which the agent performs them.

Naturally, I first import the libraries I am going to use:

import os
import time
import random

from selenium import webdriver
from dataclasses import dataclass
from abc import ABC, abstractmethod
from webdriver_manager.chrome import ChromeDriverManager

Next up, I define immutable classes whose only purpose is to containerize information that is static, so that code duplication can be avoided.

@dataclass(frozen=True)
class XPathsContainer:
    teams_login_button: str = '//*[@id="mectrl_main_trigger"]/div/div[1]'
    teams_login_user_button: str = '//*[@id="i0116"]'
    teams_login_next_button: str = '//*[@id="idSIButton9"]'
    teams_login_pwd_button: str = '//*[@id="i0118"]'
    teams_sign_in_button: str = '//*[@id="idSIButton9"]'
    teams_sign_in_keep_logged_in: str = '//*[@id="KmsiCheckboxField"]'


@dataclass(frozen=True)
class UrlsContainer:
    teams_login_page: str = 'https://www.microsoft.com/en-in/microsoft-365/microsoft-teams/group-chat-software'

Next, I implement an abstract base class called Driver. It initializes the Chrome object and lays the foundation for agents to inherit from. Each Agent child class might (in the future) have different actions, but every one must have a sleep method (to avoid restrictions on bots) and must be able to click, write information, and navigate to pages.

class Driver(ABC):
    def __init__(self, action, instruction, driver=None):
        if driver:
            self.driver = driver
        else:
            self.driver = webdriver.Chrome(ChromeDriverManager().install())

        self.actions = {
            'navigate': self.navigate,
            'click': self.click,
            'write': self.write
        }

        self.parameters = {
            'action': None,
            'instruction': None
        }

    @abstractmethod
    def sleep(self, current_tick=1):
        pass

    @abstractmethod
    def navigate(self, *args):
        pass

    @abstractmethod
    def click(self, *args):
        pass

    @abstractmethod
    def write(self, **kwargs):
        pass

    @abstractmethod
    def main(self, **kwargs):
        pass

Now I implement a basic Agent child class, which implements the logic of required functions of the base class Driver.

class Agent(Driver):
    def __init__(self, action, instruction, driver):
        super().__init__(action, instruction, driver)
        self.action = action
        self.instruction = instruction

    def sleep(self, current_tick=1):
        seconds = random.randint(3, 7)
        timeout = time.time() + seconds
        while time.time() <= timeout:
            time.sleep(1)
            print(f"Sleeping to replicate user.... tick {current_tick}/{seconds}")
            current_tick += 1

    def navigate(self, url):
        print(f"Agent navigating to {url}...")
        return self.driver.get(url)

    def click(self, xpath):
        print(f"Agent clicking in '{xpath}'...")
        return self.driver.find_element_by_xpath(xpath).click()

    def write(self, args):
        xpath = args[0]
        phrase = args[1]
        print(f"Agent writing in '{xpath}' the phrase '{phrase}'...")
        return self.driver.find_element_by_xpath(xpath).send_keys(phrase)

    def main(self, **kwargs):
        self.action = kwargs.get('action', self.action)
        self.instruction = kwargs.get('instruction', self.instruction)
        self.actions[self.action](self.instruction)
        self.sleep()

Finally, I've created a function that updates the parameters of the class whenever there is a set of actions and instructions that need to be executed under the same chrome driver. And I've created a function that takes a script of actions and executes them.

def update_driver_parameters(driver, values):
    params = driver.parameters
    params['action'] = values[0]
    params['instruction'] = values[1]
    return params


def run_script(script):
    for script_line, script_values in SCRIPT.items():
        chrome = Agent(None, None, None)

        for instructions in script_values:
            params = update_driver_parameters(chrome, instructions)
            chrome.main(**params)
        chrome.sleep()


USER = os.environ["USERNAME"]
SECRET = os.environ["SECRET"]

SCRIPT = {
    'login': [
        ('navigate', UrlsContainer.teams_login_page),
        ('click', XPathsContainer.teams_login_button),
        ('write', (XPathsContainer.teams_login_user_button, USER)),
        ('click', XPathsContainer.teams_login_next_button),
        ('write', (XPathsContainer.teams_login_pwd_button, SECRET)),
        ('click', XPathsContainer.teams_sign_in_button),
        ('click', XPathsContainer.teams_sign_in_keep_logged_in),
        ('click', XPathsContainer.teams_sign_in_button),

    ]
}
run_script(SCRIPT)

Concerns

Right now, I think the code has several major concerns, mostly related to being inexperienced in design patterns:

  • I rely too much on XPaths to make the bot do something, which will result in an enormous data class if there are many steps;
  • Relying on XPaths could also be fragile: if the page is updated, I will have to retrace my steps, but this is probably a necessary evil;
  • I am not sure whether my implementation of an immutable class is the correct one. I've used dataclass for this;
  • I have the feeling that the inheritance I've implemented is quite clunky. I want to share the same driver across multiple classes. I don't want to create a new driver per action; I always want to continue from the driver's latest context, but if a new agent is created, then a new driver must be assigned to that agent;
  • Maybe the kwargs arguments could be implemented differently. I am never sure of the correct way to parse them without using kwargs.get;
  • My use of args and kwargs is inconsistent. Could this be implemented differently?
\$\endgroup\$

1 Answer

\$\begingroup\$

Bug: on the first line of run_script, SCRIPT.items() should be script.items(). As written, it executes the global SCRIPT and not the argument to the function.
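To make the difference concrete, here is a minimal sketch of the fix. RecordingAgent and the make_agent parameter are hypothetical stand-ins (not part of your code) so the example runs without Selenium; the only point is that the loop iterates over the script argument rather than the global.

```python
def run_script(script, make_agent):
    """Run each named block of (action, instruction) steps on a fresh agent."""
    for name, steps in script.items():   # the parameter, not the global SCRIPT
        agent = make_agent()
        for action, instruction in steps:
            agent.main(action=action, instruction=instruction)


class RecordingAgent:
    """Hypothetical stand-in for Agent: records calls instead of driving Chrome."""
    calls = []

    def main(self, action, instruction):
        RecordingAgent.calls.append((action, instruction))


# A global with the same name as in the question, plus a different script
# passed as the argument. Only the argument should be executed.
SCRIPT = {'login': [('navigate', 'https://example.com/global')]}
other = {'demo': [('navigate', 'https://example.com/argument')]}

run_script(other, RecordingAgent)
print(RecordingAgent.calls)
```

With the original SCRIPT.items() loop, the recorded call would have come from the global dictionary no matter what was passed in.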

It doesn't seem like Agent should inherit from Driver

If you research Selenium best practices, you will find a few that make sense for your use case (most are geared toward testing). Two of them are Page Objects and preferred selector order.

The idea behind Page Objects is to create a class for each page of the web application (or at least the pages you are using). The class encapsulates the data and methods needed to interact with that page. Your automation script then calls the methods on the Page Objects to automate a task. For example, a class for a login page might have methods for getting the login page, for entering a username, entering a password, clicking a remember me checkbox, and clicking a login button. A login method then calls these methods in the right order to do a login.

This lets you isolate page specifics in one place. For example, the current design seems to suggest that if you automate another task, you would need to duplicate the login portion of SCRIPT. Then, if the login process changes, every script needs to be updated. Using a Page Object, only the login page class needs to be changed.

In practice, the most reliable and robust way to select an element is by ID, then by name, then by CSS selector; XPath is the least robust. It looks like most of your targets have IDs, so use those.
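As a rough illustration, several of the XPaths in your XPathsContainer only test an id attribute, so they could be rewritten as ID locators mechanically. The sketch below uses a hypothetical helper, prefer_id, and plain strings so it runs without Selenium installed; "id" and "xpath" are the string values behind Selenium's By.ID and By.XPATH.

```python
import re

# String values of selenium.webdriver.common.by.By.ID and By.XPATH,
# spelled out here so the sketch has no Selenium dependency.
BY_ID = "id"
BY_XPATH = "xpath"

# Matches XPaths of the exact form //*[@id="something"] and nothing longer.
_ID_ONLY = re.compile(r'^//\*\[@id="([^"]+)"\]$')


def prefer_id(xpath):
    """Return an ID locator when the XPath only tests an id, else keep the XPath."""
    match = _ID_ONLY.match(xpath)
    if match:
        return (BY_ID, match.group(1))
    return (BY_XPATH, xpath)


print(prefer_id('//*[@id="i0116"]'))
print(prefer_id('//*[@id="mectrl_main_trigger"]/div/div[1]'))
```

The first locator becomes an ID lookup; the second still needs XPath because it walks below the element with the ID.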

Structure the project something like this:

project
    pages
        __init__.py   # can be empty
        base.py       # shared BasePage class
        home.py       # one module for each page
        login.py
        time.py
        ...etc...     # add whatever other pages you use

    entertime.py      # the script

Then

base.py
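import random
import time

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager


class BasePage: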
    URL = None

    def __init__(self, driver=None):
        if driver is None:
            # Selenium 3 style call; Selenium 4 expects a Service object instead of a path
            driver = webdriver.Chrome(ChromeDriverManager().install())
        
        self.driver = driver
        
    def click(self, locator, mu=1.5, sigma=0.3):
        """simulate human speed and click a page element."""
        self.dally(mu, sigma)
        self.driver.find_element(*locator).click()
        return self

    def dally(self, mu=1, sigma=0.2):
        pause = random.gauss(mu, sigma)
        while pause > 0:
            delta = min(1, pause)
            pause -= delta
            time.sleep(delta)
        return self

    def navigate(self):
        if self.URL:
            self.driver.get(self.URL)
            return self
            
        raise ValueError("Nowhere to go: no URL")
        
    def send_keys(self, locator, keys):
        self.driver.find_element(*locator).send_keys(keys)
        return self
login.py
from selenium import webdriver
from selenium.webdriver.common.by import By

from .base import BasePage
from .home import HomePage   # the page object returned after a successful login
        
class LoginPage(BasePage):
    URL = 'https://www.microsoft.com/en-in/microsoft-365/microsoft-teams/group-chat-software'
    
    #locators for elements of the page
    LOGIN_BUTTON = (By.XPATH, '//*[@id="mectrl_main_trigger"]/div/div[1]')
    USERNAME_FIELD = (By.ID, "i0116")
    NEXT_BUTTON = (By.ID, "idSIButton9")
    PASSWORD_FIELD = (By.ID, "i0118")
    STAY_LOGGED_IN = (By.ID, "KmsiCheckboxField")
    
    def click_next(self):
        # pass the locator tuple whole; BasePage.click unpacks it
        self.click(self.NEXT_BUTTON)
        return self
        
    def start_login(self):
        self.click(self.LOGIN_BUTTON)
        return self

    def enter_username(self, username):
        self.send_keys(self.USERNAME_FIELD, username)
        self.click_next()
        return self
        
    def enter_password(self, password):
        self.send_keys(self.PASSWORD_FIELD, password)
        self.click_next()
        return self
        
    def toggle_stay_logged_in(self):
        self.driver.find_element(*self.STAY_LOGGED_IN).click()
        return self
        
    def login(self, username, password):
        self.navigate()
        self.start_login()
        self.enter_username(username)
        self.enter_password(password)
        self.toggle_stay_logged_in()
        self.click_next()
    
        return HomePage(self.driver)   # or whatever page comes after a login
entertime.py
import os

from pages.login import LoginPage   # import whatever pages the script needs

USER = os.environ["USERNAME"]
SECRET = os.environ["SECRET"]

homepage = LoginPage().login(USER, SECRET)

timepage = homepage.navigate_to_time_entry()  # <== whatever method you define
timepage.entertime()                          # <== whatever method you define

I don't have MS Teams to test this on, so this hasn't been tested. It is merely a suggestion on how to structure your project to make it easier to update, expand, etc.
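One way to exercise the structure without a browser is to hand the page objects a stub driver that records what it is asked to do. The sketch below is only that: StubDriver and StubElement are hypothetical test doubles, the URL is a placeholder, and the page classes are trimmed-down versions of the ones above (no pauses, no real Selenium).

```python
class StubElement:
    """Hypothetical element that logs actions instead of touching a page."""

    def __init__(self, log, locator):
        self.log = log
        self.locator = locator

    def click(self):
        self.log.append(("click", self.locator))

    def send_keys(self, keys):
        self.log.append(("send_keys", self.locator, keys))


class StubDriver:
    """Hypothetical driver exposing only the calls the page objects use."""

    def __init__(self):
        self.log = []

    def get(self, url):
        self.log.append(("get", url))

    def find_element(self, by, value):
        return StubElement(self.log, (by, value))


class BasePage:
    URL = None

    def __init__(self, driver):
        self.driver = driver

    def navigate(self):
        self.driver.get(self.URL)
        return self

    def click(self, locator):
        self.driver.find_element(*locator).click()
        return self

    def send_keys(self, locator, keys):
        self.driver.find_element(*locator).send_keys(keys)
        return self


class LoginPage(BasePage):
    URL = "https://example.com/login"      # placeholder, not the real Teams URL
    USERNAME_FIELD = ("id", "i0116")       # "id" is the value of By.ID
    NEXT_BUTTON = ("id", "idSIButton9")

    def enter_username(self, username):
        return self.send_keys(self.USERNAME_FIELD, username).click(self.NEXT_BUTTON)


driver = StubDriver()
LoginPage(driver).navigate().enter_username("alice")
for entry in driver.log:
    print(entry)
```

Because every page method goes through the driver, the recorded log shows the exact order of interactions, which is enough to catch mistakes such as passing a locator in the wrong shape.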

\$\endgroup\$
  • 1
    Nice answer. The import pattern in entertime.py is one I used to use; I found it good until I was more comfortable with correctly setting up a __main__.py. I'd personally rename entertime.py to pages/__main__.py (with some import changes, from . import LoginPage, ...) and run the package with python -m pages rather than python entertime.py. Note that using a __main__.py can be quite finicky at times, so you (anyone) may prefer this much easier approach.
    – Peilonrayz
    Commented Feb 23, 2021 at 2:15
  • 1
    @Peilonrayz, I'm presuming that there will be multiple scripts like entertime.py to do different tasks. So a __main__.py wouldn't work unless it took arguments to tell it what to do, e.g., something like python -m teams entertime would cause __main__.py to execute entertime.py.
    – RootTwo
    Commented Feb 23, 2021 at 3:59
  • Oh, good point. Yeah, using .py files would be simpler in that regard; I hadn't thought of that.
    – Peilonrayz
    Commented Feb 23, 2021 at 4:30
