
Undetected ChromeDriver: Stay Below the Radar

This article was first published on Python - datawookie, and kindly contributed to python-bloggers.

There’s one major problem with ChromeDriver: anti-bot services are able to detect that a browser session is being automated (as opposed to being used by a regular meat sack) and will often impose restrictions or deny connections altogether. The Undetected ChromeDriver (undetected-chromedriver) Python package is a patched version of ChromeDriver which avoids triggering a selection of anti-bot services, allowing it to glide under the anti-bot radar.

What is ChromeDriver?

ChromeDriver is used for testing websites and apps, as well as for web scraping. It is often used via Selenium, which provides a consistent, high-level interface for controlling a browser. It helps to understand how the pieces fit together: the client programming language, Selenium, ChromeDriver and the browser being controlled.

Browsers

It’s useful to be able to choose from a selection of browsers. If you’re testing an app or website then you’ll want to be confident that it works on a variety of browsers. If you’re web scraping then your choice of browser might be based on subtle changes in the way that a site is rendered on different browsers, differences in performance and memory footprint, or just personal preference.

WebDriver

The WebDriver specification defines a protocol for remotely inspecting and controlling user agents (which in this context is just a general term for “browsers”). It’s a general specification, which means that it is language and browser agnostic. ChromeDriver and GeckoDriver are implementations of WebDriver for browsers built on the Chromium and Mozilla codebases respectively. They provide the mechanism for controlling a specific browser.

Selenium

The WebDriver specification provides a low-level protocol for communicating with a browser. Using this protocol directly would be hard work. Selenium provides a high-level interface to WebDriver, which makes writing client code easier and more efficient.
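To get a feel for what Selenium is shielding you from, here's a minimal sketch that builds (but doesn't send) a few raw WebDriver commands using only the Python standard library. The endpoints follow the W3C WebDriver specification. The base URL (ChromeDriver's default port, 9515), the session ID and the helper names are assumptions for illustration.

```python
import json
from urllib import request

BASE_URL = "http://127.0.0.1:9515"  # Assumption: a locally running ChromeDriver.


def build_command(base_url, method, path, payload=None):
    """Build (but do not send) the HTTP request for one WebDriver command."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return request.Request(
        base_url.rstrip("/") + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


def new_session(base_url=BASE_URL):
    """New Session: ask the driver to launch a browser."""
    payload = {"capabilities": {"alwaysMatch": {"browserName": "chrome"}}}
    return build_command(base_url, "POST", "/session", payload)


def navigate(session_id, url, base_url=BASE_URL):
    """Navigate To: load a URL in an existing session."""
    return build_command(base_url, "POST", f"/session/{session_id}/url", {"url": url})


def take_screenshot(session_id, base_url=BASE_URL):
    """Take Screenshot: the response carries a base64 encoded PNG."""
    return build_command(base_url, "GET", f"/session/{session_id}/screenshot")
```

Sending one of these with urllib.request.urlopen() returns a JSON document that you'd have to unpack yourself, and you'd also need to carry the session ID from call to call. Selenium takes care of that bookkeeping.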

Clients

There are wrappers for the Selenium library which make it accessible from a variety of languages. Possibly the most frequently used languages for this purpose are (IMHO) Java, Python and R, but you could also use C#, Ruby or JavaScript.

Undetected ChromeDriver in Docker

You can install the undetected-chromedriver package using pip.

pip install undetected-chromedriver

Many applications get wrapped up in a Docker image, so it’s rather useful to have Python, the undetected-chromedriver package, ChromeDriver and a browser all neatly enclosed in a single image.

There’s an Undetected ChromeDriver Docker image. However, the corresponding Dockerfile is not available and I like to understand what’s gone into an image. So I rolled my own, published as datawookie/undetected-chromedriver.

Example

We’re going to access two sites, both of which are protected by anti-bot measures: https://nowsecure.nl and https://datadome.co/.

💡 If you’re trying this out yourself then you might want to run the Undetected ChromeDriver examples first and come back to Selenium afterwards, because the Selenium examples will likely result in your IP address being flagged.

Using Selenium

To run these examples I first launched a Selenium Docker container exposing VNC on port 5900 and the Selenium hub on port 4444.

docker run -p 4444:4444 -p 5900:5900 selenium/standalone-chrome-debug:3.141.59

First we’ll visit https://nowsecure.nl and take a screenshot.

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Connect to the Selenium hub running in the Docker container.
driver = webdriver.Remote("http://127.0.0.1:4444/wd/hub", DesiredCapabilities.CHROME)

driver.get("https://nowsecure.nl")

# Set the window size before capturing the screenshot.
driver.set_window_size(1000, 900)
driver.save_screenshot("selenium-nowsecure.png")
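The snippet above passes DesiredCapabilities as a positional argument, which is the Selenium 3 calling convention. If you have a Selenium 4 client installed then the connection is configured via browser options instead. A rough equivalent, assuming a Selenium 4 client (the function name is hypothetical), might look like this:

```python
HUB_URL = "http://127.0.0.1:4444/wd/hub"  # Same hub address as in the example above.


def connect_remote_chrome(hub_url=HUB_URL):
    """Open a remote Chrome session using the Selenium 4 calling convention."""
    # Imported here so that the function can be defined without Selenium installed.
    from selenium import webdriver

    # Browser options take the place of the DesiredCapabilities dictionary.
    options = webdriver.ChromeOptions()
    return webdriver.Remote(command_executor=hub_url, options=options)
```

The driver returned by connect_remote_chrome() is then used in the same way: get(), set_window_size() and save_screenshot(). This hasn't been checked against the 3.141.59 image used here, so treat it as a starting point.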

This is what the screenshot looks like.

It’s a little underwhelming, but it indicates that one of the anti-bot mechanisms on the site is blocking us. Hang on, you’ll see shortly what it should look like. Or just visit the site now. If you experience a sensory assault then it’s confirmation that you’re not a bot.

Now let’s take a swing at https://datadome.co/. We’ll take another screenshot to record the result.

driver.get("https://datadome.co/")

driver.save_screenshot('selenium-datadome.png')

Aha! It looks like we’ve been spotted. A CAPTCHA indicates that the site regards the request as suspicious and would normally scupper our attempts to browse the site.
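If you're running a script rather than watching over VNC, then you'll want to spot a block programmatically. Here's a rough sketch, assuming that a blocked page mentions one of a few tell-tale phrases. The marker strings and the function name are guesses for illustration, not something published by either site, so tune them against the pages you actually get back.

```python
# Assumption: phrases that tend to show up on challenge or block pages.
BLOCK_MARKERS = ("captcha", "checking your browser", "access denied")


def looks_blocked(page_source, markers=BLOCK_MARKERS):
    """Return True if the page source contains any of the block markers."""
    text = page_source.lower()
    return any(marker in text for marker in markers)
```

Calling looks_blocked(driver.page_source) after each driver.get() gives a crude signal that it's time to back off. It's a heuristic: a legitimate page that happens to mention CAPTCHAs will trigger a false alarm.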

Using Undetected ChromeDriver

Now we’ll try the same sites using Undetected ChromeDriver. These examples were run in an interactive session using the Undetected ChromeDriver Docker image. Again VNC is exposed on port 5900.

docker run -it -p 5900:5900 datawookie/undetected-chromedriver:3.9

Let’s visit https://nowsecure.nl.

import undetected_chromedriver as uc

driver = uc.Chrome()
driver.get("https://nowsecure.nl")

A screenshot indicates that we have penetrated the anti-bot measures.
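The screenshot can be captured in the same way as with Selenium, because the Undetected ChromeDriver object exposes the same methods. A small sketch of a helper (the function name and output filename are hypothetical) that works with either driver:

```python
def snapshot(driver, url, path, size=(1000, 900)):
    """Load a page, fix the window size and save a screenshot to path.

    The driver can be any object with the Selenium WebDriver methods
    get(), set_window_size() and save_screenshot().
    """
    driver.get(url)
    driver.set_window_size(*size)
    # Selenium's save_screenshot() returns True on success.
    return driver.save_screenshot(path)
```

With the driver created above, snapshot(driver, "https://nowsecure.nl", "uc-nowsecure.png") would write the image to the working directory inside the container.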

What about DataDome?

driver.get("https://datadome.co/")

Looks good!

🚨 If your IP address has already been flagged by an anti-bot mechanism then using Undetected ChromeDriver is probably not going to help you. Well, not from the compromised IP address. If you can get a fresh IP address then you’re back in business.
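One way to get a fresh IP address is to route the browser through a proxy. Here's a sketch, assuming that you have access to a proxy (the host and port are placeholders and the helper names are hypothetical). It relies on Chrome's --proxy-server switch, passed in via the package's ChromeOptions.

```python
def proxy_argument(host, port, scheme="http"):
    """Format Chrome's --proxy-server command line switch."""
    return f"--proxy-server={scheme}://{host}:{port}"


def make_driver_via_proxy(host, port):
    """Start Undetected ChromeDriver with traffic routed through a proxy."""
    # Imported here so that proxy_argument() works without the package installed.
    import undetected_chromedriver as uc

    options = uc.ChromeOptions()
    options.add_argument(proxy_argument(host, port))
    return uc.Chrome(options=options)
```

For example, make_driver_via_proxy("203.0.113.10", 8080) would start a browser whose requests leave via that (placeholder) address. If the proxy's IP address has itself been flagged then you're no better off.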

Extending the Undetected ChromeDriver Docker Image

The benefit of having a Docker image with the Undetected ChromeDriver functionality is that you can easily create a derived image with additional capabilities. Suppose, for example, that I wanted an Undetected ChromeDriver script that also used the pyjokes package (because why wouldn’t you?). The script, doit.py, might look like this:

import undetected_chromedriver as uc
import pyjokes

driver = uc.Chrome()
driver.get("https://nowsecure.nl")

print(driver.page_source)

print(pyjokes.get_joke())

And the corresponding Dockerfile would be:

FROM datawookie/undetected-chromedriver:3.9

RUN pip3 install pyjokes

COPY doit.py .

CMD ["python", "doit.py"]

This is based on the Undetected ChromeDriver image but adds the pyjokes package and includes the script itself (a container will automatically run the script).
