An enhanced WebDriver: manen’s Browser

The module manen.browser defines several Browser classes, such as ChromeBrowser. These classes inherit from WebDriver and are designed to extend the original classes without overriding their base methods. The additional methods cover instantiation, DOM exploration and a few utilities. This guide complements the documentation of these methods, which can also be read by getting the help on the class manen.browser.BrowserMixin.

Note that not all browsers are currently supported. To see the list of available browser classes, inspect the value of manen.browser.__all__.

[1]:
from manen.browser import __all__
__all__
[1]:
('BraveBrowser', 'ChromeBrowser')

Instantiating a browser

Each browser class implemented in manen provides the same additional methods on top of those of a WebDriver. One of them is initialize, which instantiates a new browser with some common options:

  • proxy: the proxy to use in the browser for each request.

  • headless: whether to launch the browser in headless mode (without a graphical interface).

  • driver_path: the path to the binary of the browser driver. If no driver path is specified, manen automatically gets a driver compatible with the browser in use (through the method {browser}.driver.get, which downloads the driver file if needed).

  • window_size: the dimensions, in pixels, of the created browser window.

[2]:
from manen.browser import ChromeBrowser
browser = ChromeBrowser.initialize(proxy=None, headless=False, window_size=(1152, 864))
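
To make these options more concrete, the sketch below shows how such keyword arguments could be translated into Chrome command-line switches. It is an illustration only, not manen's implementation: the helper name build_chrome_arguments is hypothetical, and the mapping to the switches --headless, --proxy-server and --window-size is an assumption about what may happen under the hood.

```python
from typing import List, Optional, Tuple


def build_chrome_arguments(
    proxy: Optional[str] = None,
    headless: bool = False,
    window_size: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Translate high-level options into Chrome command-line switches.

    Hypothetical helper: manen may build its options differently.
    """
    arguments = []
    if headless:
        arguments.append("--headless")
    if proxy is not None:
        arguments.append(f"--proxy-server={proxy}")
    if window_size is not None:
        width, height = window_size
        arguments.append(f"--window-size={width},{height}")
    return arguments


# Same options as in the call to ChromeBrowser.initialize above.
print(build_chrome_arguments(proxy=None, headless=False, window_size=(1152, 864)))
```

Options left at their neutral value (no proxy, headless disabled) produce no switch at all, so only the window size appears in the printed list.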

Since a browser class simply inherits from a WebDriver, all the usual methods such as get, save_screenshot, find_element and find_elements remain available.

[3]:
browser.get("https://kodaho.github.io/manen")
[4]:
from selenium.webdriver.common.by import By
for elt in browser.find_elements(By.CSS_SELECTOR, "div.sidebar-tree>ul>li.toctree-l1>a"):
    print(f"{elt.get_attribute('innerText'):25} -> {elt.get_property('href')}")
About the project         -> https://kodaho.github.io/manen/about.html
Installation              -> https://kodaho.github.io/manen/installation.html
User Guide                -> https://kodaho.github.io/manen/user_guide.html
Going deeper into manen   -> https://kodaho.github.io/manen/manen/manen.html
Changelog                 -> https://kodaho.github.io/manen/changelog.html
Contributing              -> https://kodaho.github.io/manen/contributing.html

Exploring the DOM

Retrieving web elements is usually done with the methods find_element or find_elements. Although these methods cover many use cases, retrieving web elements sometimes requires handling more situations; the function manen.finder.find() has been designed to address this problem and has been integrated into the browser class. Please refer to the documentation of this function for the exhaustive list of what it can do. The find method of a browser and the function in manen.finder have approximately the same signature: the only difference is that the browser method retrieves, by default, all the elements matching the given selectors.
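
The relationship between the method and the function can be pictured as a thin wrapper that fills in defaults before delegating. The snippet below is a simplified model under stated assumptions: the find function here is a stand-in that only echoes its arguments (including an assumed default of many=False), and SketchBrowser is a hypothetical class, not manen's BrowserMixin.

```python
def find(selector, inside=None, many=False, default=None):
    """Stand-in for manen.finder.find: it only reports the arguments it received."""
    return {"selector": selector, "inside": inside, "many": many, "default": default}


class SketchBrowser:
    """Hypothetical browser exposing find() as a thin wrapper around the function."""

    def find(self, selector, **kwargs):
        # Search inside the browser itself unless the caller narrows the scope.
        kwargs.setdefault("inside", self)
        # Return every match unless the caller asks for a single element.
        kwargs.setdefault("many", True)
        return find(selector, **kwargs)


sketch_browser = SketchBrowser()
call = sketch_browser.find("a")
print(call["many"], call["inside"] is sketch_browser)
print(sketch_browser.find("a", many=False)["many"])
```

Using setdefault rather than hard-coded values means that any keyword argument passed explicitly by the caller always wins over the wrapper's defaults.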

Let’s dig into the find() method by exploring the results matching the query “selenium” on PyPI.

[5]:
browser.get("https://pypi.org/search/?q=selenium")
[6]:
NAME_SELECTOR = "ul[aria-label='Search results']>li>a>h3>span.package-snippet__name"
elements = browser.find(NAME_SELECTOR) # equivalent to browser.find(NAME_SELECTOR, many=True)
for element in elements:
    print(element.get_attribute('innerText'))
selenium
selenium-wire
selenium-generator
Selenium-Screenshot
auto-selenium
django-selenium
selenium-robot
selenium_wrapper
datetime-selenium
fasttest-selenium
selenium-findtext
selenium-pinterest
selenium-elements
nose-selenium
ctreport-selenium
masonite-selenium
play-selenium
amazon-selenium
selenium-configurator
selenium-firefox

If you want to get only one hit, you can set the keyword argument many to False.

[7]:
browser.find("xpath://*[@id='content']//form/div[1]/div[1]/p", many=False).get_attribute('innerText')
[7]:
'1,436 projects for "selenium"'
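
The call above also uses a prefixed selector (xpath:…), whereas earlier calls passed bare CSS selectors. A minimal sketch of how such strings could be split into a locating strategy and a value is given below. The helper split_selector is hypothetical, and treating unprefixed selectors as CSS is an assumption made for this illustration; refer to the documentation of manen.finder for the actual rules.

```python
from typing import Tuple

# Values of selenium's By.CSS_SELECTOR and By.XPATH, inlined to avoid the import.
CSS_SELECTOR = "css selector"
XPATH = "xpath"

PREFIXES = {"css:": CSS_SELECTOR, "xpath:": XPATH}


def split_selector(selector: str) -> Tuple[str, str]:
    """Return (strategy, value) for a selector string with an optional prefix.

    Hypothetical helper; assumes unprefixed selectors are CSS selectors.
    """
    for prefix, strategy in PREFIXES.items():
        if selector.startswith(prefix):
            return strategy, selector[len(prefix):]
    return CSS_SELECTOR, selector


print(split_selector("xpath://*[@id='content']//form/div[1]/div[1]/p"))
print(split_selector("css:fake-selector"))
print(split_selector("h3 span.package-snippet__name"))
```

The resulting pair has the shape expected by Selenium's find_element(by, value), which is why the strategy strings mirror the constants of the By class.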

The function also lets you narrow the search scope: instead of searching inside the whole page, you can search inside a specific element.

[8]:
results = browser.find("ul[aria-label='Search results'] li")

Let’s look for the name of the first package in the results.

[9]:
browser.find("h3 span.package-snippet__name", many=False, inside=results[0]).get_property('innerText')
[9]:
'selenium'

If the inside keyword argument is a list instead of a single element, it will return one result for each element in the list.

[10]:
elements_name = browser.find("h3 span.package-snippet__name", many=False, inside=results)
elements_name[:5]
[10]:
[<selenium.webdriver.remote.webelement.WebElement (session="a49cbf2183c8851614afed4cc983838f", element="2031bbcb-bf43-47a4-afde-3e5020003bd6")>,
 <selenium.webdriver.remote.webelement.WebElement (session="a49cbf2183c8851614afed4cc983838f", element="a2aa299d-6159-45d9-8b9c-fb026b517782")>,
 <selenium.webdriver.remote.webelement.WebElement (session="a49cbf2183c8851614afed4cc983838f", element="a7686b0d-7494-450a-865e-8f2fca2a93c8")>,
 <selenium.webdriver.remote.webelement.WebElement (session="a49cbf2183c8851614afed4cc983838f", element="394fab67-20fb-407b-8da2-2d5a0010d9e7")>,
 <selenium.webdriver.remote.webelement.WebElement (session="a49cbf2183c8851614afed4cc983838f", element="ca69129b-bf8e-4854-a707-f05a3d9972b5")>]
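
The "one result per container" behaviour can be modelled in a few lines. In the sketch below, containers are plain dictionaries mapping a selector to a value, and both helper names are hypothetical; it only illustrates the dispatch on the type of inside, not how manen queries the DOM.

```python
def find_one(selector, inside):
    """Stand-in for a single-element lookup: a container is a dict of selector -> value."""
    return inside[selector]


def find_in_each(selector, inside):
    """Return one result per container when `inside` is a list, else a single result."""
    if isinstance(inside, list):
        return [find_one(selector, container) for container in inside]
    return find_one(selector, inside)


# Fake search results standing in for the <li> elements of the page.
fake_results = [{"h3": "selenium"}, {"h3": "selenium-wire"}]

print(find_in_each("h3", fake_results[0]))
print(find_in_each("h3", fake_results))
```

The returned list keeps the order of the containers, so the i-th result always belongs to the i-th container.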

By default, manen raises an ElementNotFound exception if the specified selectors match no element in the area to inspect.

[11]:
browser.find("css:fake-selector")
---------------------------------------------------------------------------
ElementNotFound                           Traceback (most recent call last)
/Users/kodjo/Documents/Projects/manen/docs/source/user_guide/browser.ipynb Cell 23' in <module>
----> 1 browser.find("css:fake-selector")

File ~/Documents/Projects/manen/manen/browser.py:172, in BrowserMixin.find(self, selector, **kwargs)
    170 kwargs.setdefault("inside", self)
    171 kwargs.setdefault("many", True)
--> 172 return find(selector, **kwargs)

File ~/Documents/Projects/manen/manen/finder.py:237, in find(selector, wait, default, inside, many)
    234     return default
    236 driver = inside if isinstance(inside, WebDriver) else inside.parent
--> 237 raise ElementNotFound(selectors=selectors, driver=driver)

ElementNotFound: Unable to find inside document an element matching the selectors :
> css:fake-selector

Context of the exception:
- Title page: Search results · PyPI
- URL: https://pypi.org/search/?q=selenium

To avoid raising an error, you can specify a default value to be returned if no element is found.

[12]:
browser.find('css:fake-selector', default=None)
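
Since None is itself an acceptable default, a function with this behaviour has to tell "no default given" apart from default=None. One common way to do so is a sentinel object, sketched below. This is a generic pattern and not necessarily how manen implements it: lookup_or_raise and _NOT_SET are hypothetical names, and ElementNotFound is redefined locally so that the example runs on its own.

```python
class ElementNotFound(Exception):
    """Local stand-in for manen's exception of the same name."""


# Unique marker meaning "the caller did not pass a default".
_NOT_SET = object()


def lookup_or_raise(matches, default=_NOT_SET):
    """Return the first match, the default if one was given, or raise."""
    if matches:
        return matches[0]
    if default is not _NOT_SET:
        return default
    raise ElementNotFound("no element matches the selectors")


print(lookup_or_raise(["<a>"]))
print(lookup_or_raise([], default=None))
try:
    lookup_or_raise([])
except ElementNotFound as error:
    print(f"raised: {error}")
```

With this pattern, passing default=None explicitly suppresses the exception, while omitting the argument keeps the strict behaviour.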

Note that the browser also has a lookup() method, equivalent to find() except that it never raises an error: when no element is found, it returns a default value, which is None unless specified otherwise.

By specifying the wait keyword argument, you can set the number of seconds to wait for the element(s) to be located before giving up.

[13]:
%%time
browser.find('css:fake-selector', wait=3, default=None)
CPU times: user 13.8 ms, sys: 3.33 ms, total: 17.1 ms
Wall time: 3.08 s
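
The timing above is consistent with a polling loop that keeps retrying until a deadline is reached. The sketch below shows the general idea with a hypothetical poll helper; the locate callable, the interval parameter and its value are assumptions made for this illustration, not part of manen's API.

```python
import time


def poll(locate, wait=0.0, interval=0.05):
    """Call `locate` until it returns a non-empty result or `wait` seconds elapse."""
    deadline = time.monotonic() + wait
    while True:
        result = locate()
        if result:
            return result
        if time.monotonic() >= deadline:
            # Nothing was found before the deadline: give up.
            return None
        time.sleep(interval)


start = time.monotonic()
print(poll(lambda: [], wait=0.2))        # nothing ever appears
print(time.monotonic() - start >= 0.2)   # the full wait elapsed before giving up
```

When the element is already present, the first attempt succeeds and the function returns immediately, so a generous wait value only costs time in the failure case.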

Another use case supported by manen is trying several selectors to locate an element: simply pass a list of selectors instead of a single one. The selectors are tried in order, and the result is returned as soon as one of them matches.

[14]:
browser.find(['fake-link-selector', 'a'], many=False).get_property('href')
[14]:
'https://pypi.org/search/?q=selenium#content'
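
The fallback logic amounts to a loop that stops at the first selector producing a match. Here is a minimal sketch with a fake page, modelled as a dictionary from selector to matching elements; first_hit and fake_page are hypothetical names used only for this illustration.

```python
def first_hit(selectors, locate):
    """Try each selector in order; return (selector, elements) for the first that matches."""
    for selector in selectors:
        elements = locate(selector)
        if elements:
            return selector, elements
    return None, []


# Fake page: only the "a" selector matches something.
fake_page = {"a": ["<a href='#content'>"]}


def locate(selector):
    return fake_page.get(selector, [])


print(first_hit(["fake-link-selector", "a"], locate))
print(first_hit(["fake-link-selector"], locate))
```

Ordering the list from the most specific selector to the most generic one gives the precise match priority while keeping a safety net.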

To help you while developing your project, the browser has a highlight method that draws a frame around each element matching the specified selectors.

[15]:
from tempfile import NamedTemporaryFile
from IPython.display import Image

browser.highlight(NAME_SELECTOR)

with NamedTemporaryFile() as f:
    browser.save_screenshot(f.name + '.png')
    highlight_screenshot = Image(f.name + '.png')
highlight_screenshot
[15]:
../_images/user_guide_browser_31_0.png

Cookies management

One of the features offered by manen’s Browser is an interface for cookies. With a Selenium WebDriver, you have to use the methods get_cookies, add_cookie and delete_all_cookies. With a Browser, you can interact with cookies through the cookies property.

[16]:
browser.get("https://httpbin.org/cookies/set/test/manen")
browser.cookies
[16]:
[{'domain': 'httpbin.org',
  'httpOnly': False,
  'name': 'test',
  'path': '/',
  'secure': False,
  'value': 'manen'}]
[17]:
# browser.cookies = list_of_cookies_dict
[18]:
del browser.cookies
assert len(browser.get_cookies()) == 0
[19]:
browser.quit()
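
For readers curious about how a single cookies attribute can support reading, assignment and deletion, the sketch below wires a Python property to the three WebDriver methods named above. It is a simplified model, not manen's code: FakeDriver, CookiesMixin and CookieBrowser are hypothetical classes, and having the setter add cookies one by one (rather than replace the existing ones) is an assumption.

```python
class FakeDriver:
    """Minimal stand-in exposing the three WebDriver cookie methods used here."""

    def __init__(self):
        self._cookies = []

    def get_cookies(self):
        return list(self._cookies)

    def add_cookie(self, cookie_dict):
        self._cookies.append(cookie_dict)

    def delete_all_cookies(self):
        self._cookies.clear()


class CookiesMixin:
    """Hypothetical mixin exposing cookies as a property."""

    @property
    def cookies(self):
        return self.get_cookies()

    @cookies.setter
    def cookies(self, cookies):
        # Assumption: assigning a list adds each cookie to the session.
        for cookie in cookies:
            self.add_cookie(cookie)

    @cookies.deleter
    def cookies(self):
        self.delete_all_cookies()


class CookieBrowser(CookiesMixin, FakeDriver):
    """Combine the mixin with the driver, as a browser class would."""


cookie_browser = CookieBrowser()
cookie_browser.cookies = [{"name": "test", "value": "manen"}]
print(cookie_browser.cookies)
del cookie_browser.cookies
print(cookie_browser.cookies)
```

Because the mixin only calls get_cookies, add_cookie and delete_all_cookies, the same property would work unchanged on top of any class providing these three methods.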