Scraper Template with Selenium #2: Docker Bridge Network
In the previous post we set up a scraper template which used Selenium on Docker via the host network. Now we’re going to do essentially the same thing but using a bridge network.
Default Network
We’ll start by using Docker’s default bridge network.
docker network ls
NETWORK ID     NAME      DRIVER    SCOPE
ea5ebd23a086   bridge    bridge    local
bb80a2809880   host      host      local
00b74ecbf970   none      null      local
These three networks will always be available: bridge, host and none. We’re only interested in the first one.
Let’s create a Selenium container.
docker run -d --rm --name selenium selenium/standalone-chrome:3.141
cede2a2e6fc279fcb2014f290cc5e324d86f2033d04cca1b2da59c03e121aec5
Now if we inspect the bridge network we’ll see that the selenium container is connected.
docker network inspect bridge
[
  {
    "Name": "bridge",
    "IPAM": {
      "Config": [
        {
          "Subnet": "172.17.0.0/16",
          "Gateway": "172.17.0.1"
        }
      ]
    },
    "Containers": {
      "cede2a2e6fc279fcb2014f290cc5e324d86f2033d04cca1b2da59c03e121aec5": {
        "Name": "selenium",
        "MacAddress": "02:42:ac:11:00:02",
        "IPv4Address": "172.17.0.2/16",
        "IPv6Address": ""
      }
    }
  }
]
The above output has been abridged for clarity.
We can see that the gateway between the host and the bridge network has an IP of 172.17.0.1 and that the selenium container is at 172.17.0.2.
Launching a shell inside the selenium container we can see what the network looks like from its perspective.
root@cede2a2e6fc2:/# ip -br -c a
lo               UNKNOWN        127.0.0.1/8
eth0@if68        UP             172.17.0.2/16
To get this to work you’ll need to install the iproute2 package on the container.
Okay, now let’s try connecting to the selenium container via the default bridge network. In order to do this we need to use its IP address.
from selenium import webdriver

# Address of the selenium container on the default bridge network.
SELENIUM_URL = "172.17.0.2:4444"

# Connect to the remote Selenium server and request a Chrome session.
browser = webdriver.Remote(f"http://{SELENIUM_URL}/wd/hub", {'browserName': 'chrome'})
browser.get("https://www.google.com")
print(f"Retrieved URL: {browser.current_url}.")
browser.close()
We have to explicitly specify the IP for the selenium container. Obviously this is not ideal. There is no guarantee that the selenium container will always be at the same IP address, so this will become hard to maintain.
Stop the existing selenium container.
docker stop selenium
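If you do need to stay on the default bridge, one workaround is to look the address up at run time rather than hard-coding it. The following is only a minimal sketch: the function name container_ipv4 and the SAMPLE string are illustrative (not part of the original scraper), and the sample is an abridged copy of the inspect output shown earlier with a shortened container ID.

```python
import ipaddress
import json


def container_ipv4(inspect_output: str, container_name: str) -> str:
    """Return the IPv4 address of a named container from `docker network inspect` JSON."""
    networks = json.loads(inspect_output)
    for network in networks:
        for details in network.get("Containers", {}).values():
            if details.get("Name") == container_name:
                # Addresses are reported in CIDR form, e.g. "172.17.0.2/16".
                return str(ipaddress.ip_interface(details["IPv4Address"]).ip)
    raise LookupError(f"No container named {container_name!r} on this network.")


# Abridged sample mirroring the inspect output above (container ID shortened).
SAMPLE = """
[{"Name": "bridge",
  "Containers": {
    "cede2a2e6fc2": {"Name": "selenium", "IPv4Address": "172.17.0.2/16"}
  }}]
"""

address = container_ipv4(SAMPLE, "selenium")
print(f"http://{address}:4444/wd/hub")
```

In a real script the JSON would come from running docker network inspect bridge (for example via subprocess.run with capture_output=True) instead of a literal string. It works, but it is extra machinery just to find a container.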
User-Defined Network
We’re able to build a more robust setup if we create a user-defined network.
docker network create --driver bridge google
57a868c4124e4339a35b13dd6125f36835530e561e48a85e26de02e31d44460b
List the Docker networks again.
docker network ls
NETWORK ID     NAME      DRIVER    SCOPE
ea5ebd23a086   bridge    bridge    local
57a868c4124e   google    bridge    local
bb80a2809880   host      host      local
00b74ecbf970   none      null      local
The google network has been added to the list.
Now launch the Selenium container again, but this time using the --network argument to connect it to the google network.
docker run -d --rm --name selenium --network google selenium/standalone-chrome:3.141
f8a0a0dd21f4f27773c5ce260df21cb6d509815b56638ccfd6be5f05dbb8172b
If we inspect the google network then we’ll see the details of the selenium container.
docker network inspect google
[
  {
    "Name": "google",
    "IPAM": {
      "Driver": "default",
      "Options": {},
      "Config": [
        {
          "Subnet": "172.21.0.0/16",
          "Gateway": "172.21.0.1"
        }
      ]
    },
    "Containers": {
      "f8a0a0dd21f4f27773c5ce260df21cb6d509815b56638ccfd6be5f05dbb8172b": {
        "Name": "selenium",
        "MacAddress": "02:42:ac:15:00:02",
        "IPv4Address": "172.21.0.2/16",
        "IPv6Address": ""
      }
    }
  }
]
The above output has been abridged for clarity.
On a user-defined network, containers can be located either via IP address or by name (where the name is internally resolved to an IP address via the automatic service discovery capability). This means that, rather than address the selenium container by its IP address, we can simply refer to it by name. This is a much more robust setup, provided we consistently use the same name for this container.
from selenium import webdriver

# The container name is resolved to an IP address by Docker.
SELENIUM_URL = "selenium:4444"

# Connect to the remote Selenium server and request a Chrome session.
browser = webdriver.Remote(f"http://{SELENIUM_URL}/wd/hub", {'browserName': 'chrome'})
browser.get("https://www.google.com")
print(f"Retrieved URL: {browser.current_url}.")
browser.close()
Scraper Template in Docker with User-Defined Bridge Network
Let’s wrap this up by putting our little scraper into a Docker image.
FROM python:3.8.5-slim AS base

RUN pip3 install selenium==3.141.0

COPY google-selenium-bridge-user-defined.py /

CMD python3 google-selenium-bridge-user-defined.py
Now build the image.
docker build -t google-selenium-bridge-user-defined .
And run it.
docker run --net=google google-selenium-bridge-user-defined
Retrieved URL: https://www.google.com/.
We specified --net=google to ensure that this container is launched onto the google network.
Our setup now has everything covered in the previous post but also keeps all of the networking within Docker, so everything is isolated from the host.
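When both containers are started in quick succession, the scraper may try to connect before Selenium is accepting sessions. A hedged sketch of a readiness check is shown below. It assumes that the standalone server exposes a status endpoint at /wd/hub/status which returns JSON containing a value.ready flag; verify this against the image you are running. The function names is_ready and wait_for_selenium are illustrative.

```python
import json
import time
import urllib.request


def is_ready(status_payload: str) -> bool:
    """Interpret a status response; assumes a {"value": {"ready": ...}} shape."""
    try:
        return bool(json.loads(status_payload)["value"]["ready"])
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not the shape we assumed: treat as not ready.
        return False


def wait_for_selenium(url: str, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll the status URL until it reports ready or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if is_ready(response.read().decode("utf-8")):
                    return True
        except OSError:
            pass  # The container is not accepting connections yet.
        time.sleep(delay)
    return False
```

Under those assumptions, the scraper could call wait_for_selenium("http://selenium:4444/wd/hub/status") before creating the webdriver.Remote session and exit with an error if it returns False.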
Cleaning Up
It’s always good practice to mop up: stop the selenium container and remove the google network.
docker stop selenium
docker network rm google