Articles by Python | datawookie

Scrapy with a Rotating Tor Proxy

June 9, 2021 | Python | datawookie

This post shows an approach to using a rotating Tor proxy with Scrapy. I’m using the scrapy-rotating-proxies download middleware package to rotate through a set of proxies, ensuring that my requests are originating from a selection of IP addresses. However, I need to have those IP addresses evolve over ...
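For orientation, here is a minimal sketch of what enabling scrapy-rotating-proxies in a Scrapy project's settings.py typically looks like. The setting names and middleware paths follow the package's documented configuration. The proxy addresses are placeholders I've assumed for illustration and are not taken from the post.

```python
# settings.py (sketch): enable the scrapy-rotating-proxies middleware.

# ASSUMPTION: placeholder addresses for three local HTTP proxies.
# Substitute whatever proxies you actually run.
ROTATING_PROXY_LIST = [
    "127.0.0.1:8118",
    "127.0.0.1:8119",
    "127.0.0.1:8120",
]

# The rotating middleware picks a proxy for each request; the ban
# detection middleware marks proxies as dead when responses look banned.
DOWNLOADER_MIDDLEWARES = {
    "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
    "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
}
```

This only rotates across a fixed list of proxies. It does not by itself make the underlying IP addresses change over time, which is the problem the post goes on to address.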

Selenium Template #4: Deploying to ECS

April 25, 2021 | Python | datawookie

This is part of a series of posts:

Part 1: Selenium Template — Docker Host Network
Part 2: Selenium Template — Docker Bridge Network
Part 3: Selenium Template — Docker Compose
Part 4: Selenium Template — Deploying to ECS

In the last few posts we’ve looked at a few ways to set up the infrastructure for a ...

Selenium Crawler #3: Docker Compose

April 19, 2021 | Python | datawookie

In two previous posts we’ve looked at how to set up a simple scraper which uses Selenium in Docker, communicating via the host network and bridge network. Both of those setups have involved launching separate containers for the scraper and Selenium. In this post we’ll see how to ...

Selenium Template #3: Docker Compose

April 19, 2021 | Python | datawookie

This is part of a series of posts:

Part 1: Selenium Template — Docker Host Network
Part 2: Selenium Template — Docker Bridge Network
Part 3: Selenium Template — Docker Compose
Part 4: Selenium Template — Deploying to ECS

In two previous posts we’ve looked at how to set up a simple scraper which uses Selenium in ...

Selenium Template #2: Docker Bridge Network

April 18, 2021 | Python | datawookie

This is part of a series of posts:

Part 1: Selenium Template — Docker Host Network
Part 2: Selenium Template — Docker Bridge Network
Part 3: Selenium Template — Docker Compose
Part 4: Selenium Template — Deploying to ECS

In the previous post we set up a scraper template which used Selenium on Docker via the host network. ...