I occasionally write scripts where I need to persist some information between runs. These scripts are often wrapped in a Docker image and deployed on Amazon ECS. This means that there is no persistent storage. I could use a database, but this would be overkill for the volume of data involved. This post describes a simple approach to storing these data on S3 using a pickle file.
Setup
Import the boto3 and botocore packages (the latter is only required for the ClientError exception), along with pickle from the standard library.

import pickle

import boto3
import botocore.exceptions
Create an S3 client object.
s3 = boto3.client("s3")
How does authentication work? I store my credentials in ~/.aws/credentials with multiple AWS accounts, each identified by a unique profile name. I set the AWS_PROFILE environment variable to choose a specific account, and I specify a suitable value for the AWS_DEFAULT_REGION environment variable.

export AWS_PROFILE=fathom
export AWS_DEFAULT_REGION=eu-west-1
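For reference, a profile in ~/.aws/credentials looks something like this (the key values here are placeholders, and the profile name matches AWS_PROFILE above):

[fathom]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx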
Now store the S3 bucket name and a name for the pickle file.
BUCKET = "state-persist" PICKLE = "state.pkl"
Retrieve
First try to load the data. On the first iteration this won’t work because there’s nothing persisted yet. But after you’ve been through the process once, these steps will load the data from the previous iteration.
Attempt to download the pickle file from S3. If it’s not there, handle the error gracefully.
try:
    s3.download_file(BUCKET, PICKLE, PICKLE)
except botocore.exceptions.ClientError:
    # You'll arrive here on the first iteration.
    pass
Read the pickle file. On failure, set data to None (or some other appropriate default value).
try:
    with open(PICKLE, "rb") as file:
        data = pickle.load(file)
except (FileNotFoundError, EOFError):
    # You'll arrive here on the first iteration.
    data = None
Since both of the first two steps will normally fail together, it might make sense to place the second step in an else clause of the first exception handler, as sketched below.
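Here's a minimal sketch of that combined form, using the same names as above:

try:
    s3.download_file(BUCKET, PICKLE, PICKLE)
except botocore.exceptions.ClientError:
    # Nothing persisted yet: fall back to the default value.
    data = None
else:
    # The download succeeded, so the local pickle file must exist.
    with open(PICKLE, "rb") as file:
        data = pickle.load(file)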
Store
As the script runs, the state information is assigned to (or updated in) data. At the end we need to persist those data.
Create or update the pickle file.
with open(PICKLE, "wb") as file:
    pickle.dump(data, file)
Write that file to S3.
s3.upload_file(PICKLE, BUCKET, PICKLE)
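Putting the pieces together, here's a rough end-to-end sketch of a single run. The run-counter update in the middle is purely illustrative; in practice that's where your script's real work would go.

import pickle

import boto3
import botocore.exceptions

BUCKET = "state-persist"
PICKLE = "state.pkl"

s3 = boto3.client("s3")

# Retrieve: pull the previous state from S3 (if any).
try:
    s3.download_file(BUCKET, PICKLE, PICKLE)
except botocore.exceptions.ClientError:
    data = None
else:
    with open(PICKLE, "rb") as file:
        data = pickle.load(file)

# Do some work that updates the state (illustrative: count the runs).
data = {"runs": 1} if data is None else {"runs": data["runs"] + 1}

# Store: write the updated state back to S3 for the next run.
with open(PICKLE, "wb") as file:
    pickle.dump(data, file)
s3.upload_file(PICKLE, BUCKET, PICKLE)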
Conclusion
This is a simple procedure for persisting information between jobs, without the overhead of a database.
This approach is vulnerable to race conditions if there are multiple instances of the script running simultaneously. You could handle this with a lock file (also stored on S3; see the sketch below) or by simply being careful to avoid simultaneous execution.
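To illustrate the lock file idea, here's a rough sketch (not from the original post). It assumes a recent boto3, since it relies on S3 conditional writes via the IfNoneMatch parameter to put_object to make lock creation atomic; the lock key name is made up.

LOCK = "state.lock"

def acquire_lock():
    """Try to create the lock object, failing if it already exists."""
    try:
        # IfNoneMatch="*" makes the put fail with a 412 error if the key
        # already exists (requires S3 conditional-write support).
        s3.put_object(Bucket=BUCKET, Key=LOCK, Body=b"", IfNoneMatch="*")
        return True
    except botocore.exceptions.ClientError:
        # Another instance holds the lock.
        return False

def release_lock():
    s3.delete_object(Bucket=BUCKET, Key=LOCK)

A run would then acquire the lock before the retrieve step and release it after the store step, skipping (or retrying) if acquisition fails.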