Articles by Python - datawookie

Scrapy with a Rotating Tor Proxy

June 9, 2021 | Python - datawookie

This post shows an approach to using a rotating Tor proxy with Scrapy. I’m using the scrapy-rotating-proxies downloader middleware package to rotate through a set of proxies, ensuring that my requests originate from a selection of IP addresses. However, I need to have those IP addresses evolve over ...

Selenium Crawler #3: Docker Compose

April 19, 2021 | Python - datawookie

In two previous posts we’ve looked at how to set up a simple scraper that uses Selenium in Docker, communicating via the host network and the bridge network respectively. Both of those setups involved launching separate containers for the scraper and Selenium. In this post we’ll see how to ...

Beautiful Data

October 15, 2015 | Python - datawookie

I’ve just finished reading Beautiful Data (published by O’Reilly in 2009), a collection of essays edited by Toby Segaran and Jeff Hammerbacher. The 20 essays from 39 contributors address a diverse array of topics relating to data a...