I occasionally write scripts where I need to persist some information between runs. These scripts are often wrapped in a Docker image and deployed on Amazon ECS. This means that there is no persistent storage. I could use a database, but this would be overkill for the volume of data involved. This post describes a simple approach to storing these data on S3 using a pickle file.
Load the pickle, boto3 and botocore packages (the last is only required for handling exceptions).
import pickle
import boto3
import botocore
Create an S3 client object.
s3 = boto3.client("s3")
How does authentication work? I store credentials for multiple AWS accounts in
~/.aws/credentials, each identified by a unique profile name. I set the
AWS_PROFILE environment variable to choose a specific account and the
AWS_DEFAULT_REGION environment variable to select a suitable region.
export AWS_PROFILE=fathom
export AWS_DEFAULT_REGION=eu-west-1
Now store the S3 bucket name and a name for the pickle file.
BUCKET = "state-persist"
PICKLE = "state.pkl"
First try to load the data. On the first iteration this won’t work because there’s nothing persisted yet. But after you’ve been through the process once, these steps will load the data from the previous iteration.
Attempt to download the pickle file from S3. If it’s not there, handle the error gracefully.
try:
    s3.download_file(BUCKET, PICKLE, PICKLE)
except botocore.exceptions.ClientError:
    # You'll arrive here on the first iteration.
    pass
Read the pickle file. On failure, set the data to None (or some other appropriate default value).
try:
    with open(PICKLE, "rb") as file:
        data = pickle.load(file)
except (FileNotFoundError, EOFError):
    # You'll arrive here on the first iteration.
    data = None
Since both of the first two steps will normally fail together, it might make sense to place the second step in an
else clause of the first exception handler.
As the script runs, the state information is assigned to (or updated in)
data. At the end we need to persist those data.
Create or update the pickle file.
with open(PICKLE, "wb") as file:
    pickle.dump(data, file)
Write that file to S3.
s3.upload_file(PICKLE, BUCKET, PICKLE)
A simple procedure for persisting information between jobs.
This approach is vulnerable to race conditions if there are multiple instances of the script running simultaneously. You could handle this with a lock file (also stored on S3) or by just being careful to avoid simultaneous execution.