Articles by Python - datawookie

Persisting Data with Pickle & S3

July 28, 2022 | Python - datawookie

I occasionally write scripts where I need to persist some information between runs. These scripts are often wrapped in a Docker image and deployed on Amazon ECS. This means that there is no persistent storage. I could use a database, but this would be overkill for the volume of data ...
[...Read more...]

Firing Up Firestore

March 20, 2022 | Python - datawookie

I’ve just started collaborating on a new project, Votela, with Luke. We’re going to be using Firestore for stashing our data. I’ve never worked with Firestore before, so one of my first tasks was just figuring out how to get connected and how to shift some data ...
[...Read more...]

Scrapy with a Rotating Tor Proxy

June 9, 2021 | Python - datawookie

This post shows an approach to using a rotating Tor proxy with Scrapy. I’m using the scrapy-rotating-proxies download middleware package to rotate through a set of proxies, ensuring that my requests are originating from a selection of IP addresses. However, I need to have those IP addresses evolve over ... [...Read more...]

Selenium Crawler #3: Docker Compose

April 19, 2021 | Python - datawookie

In two previous posts we’ve looked at how to set up a simple scraper which uses Selenium in Docker, communicating via the host network and bridge network. Both of those setups have involved launching separate containers for the scraper and Selenium. In this post we’ll see how to ...
[...Read more...]

Beautiful Data

October 15, 2015 | Python - datawookie

I’ve just finished reading Beautiful Data (published by O’Reilly in 2009), a collection of essays edited by Toby Segaran and Jeff Hammerbacher. The 20 essays from 39 contributors address a diverse array of topics relating to data a...
[...Read more...]