MinIO is an object storage server compatible with the Amazon S3 API, released under the Apache License v2. It is a very convenient tool for data scientists and machine learning engineers to easily collaborate and share data and machine learning models. As an object store, MinIO can hold unstructured data such as photos, videos, log files, backups and container images; the maximum size of a single object is 5TB.
In this tutorial, I will show you how to build a simple machine learning model, connect to a MinIO server, and upload and retrieve saved models. What this tutorial will not cover is installing MinIO, as the documentation on the website is well written:
https://min.io/download#/linux
1. Import Iris data set from sklearn
from sklearn.datasets import load_iris
# load the iris data set
iris = load_iris()
type(iris)
sklearn.utils.Bunch
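Before splitting the data it can help to peek inside the Bunch object. A minimal sketch; the fields shown, such as feature_names and target_names, are standard attributes of scikit-learn's bundled iris data set and not something introduced above:

# inspect what the Bunch object contains
print(iris.feature_names)  # names of the four measurement columns
print(iris.target_names)   # the three iris species
print(iris.data.shape, iris.target.shape)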
2. Split the data set into train and test
# define features and class labels
x = iris.data
y = iris.target

from sklearn.model_selection import train_test_split

# split the data into train and test
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.5)
3. Define a decision tree classifier and train the model
from sklearn import tree
# define a classifier
classifier = tree.DecisionTreeClassifier()

# fit the classifier
classifier.fit(x_train, y_train)
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, presort=False, random_state=None, splitter='best')
# perform predictions and save the predictions in an object
predictions = classifier.predict(x_test)

from sklearn.metrics import accuracy_score

# estimate the accuracy of the model
print(accuracy_score(y_test, predictions))
0.9466666666666667
4. Save the model to a local file system
# use the joblib library to do the exporting
from sklearn.externals import joblib
from joblib import dump
/usr/local/lib/python3.7/site-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+. warnings.warn(msg, category=DeprecationWarning)
filename = 'decisionTree.sav'
joblib.dump(classifier, filename)
['decisionTree.sav']
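As the deprecation warning above suggests, on newer scikit-learn versions you can skip sklearn.externals entirely and import joblib directly. A minimal sketch of the equivalent export, assuming joblib has been installed with pip install joblib:

import joblib

# serialize the trained classifier with the standalone joblib package
joblib.dump(classifier, 'decisionTree.sav')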
5. Export the model to MinIO database or object storage
from minio import Minio
- make – used to make or create a bucket
- put – used to put an object to a bucket
- get – used to get an object from the bucket
- copy – copy objects from one bucket to another
- list – used to list objects or buckets
- remove – used to either remove an object or a bucket
from minio.error import ResponseError

# create a connection to the server
minioClient = Minio('192.168.1.1:8080',
                    access_key='test',
                    secret_key='test123',
                    secure=False)
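Note that put_object fails if the target bucket does not exist yet. A minimal sketch of the make operation from the list above, creating the example bucket only when it is missing (it reuses the minioClient connection just defined):

# create the example bucket if it does not already exist
if not minioClient.bucket_exists('example'):
    minioClient.make_bucket('example')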
Now we can export the model, decisionTree.sav, to a bucket called example. The file_data and file_stat parameters are required for the upload.
import os

# open the file and put the object in the bucket called example
with open('decisionTree.sav', 'rb') as file_data:
    file_stat = os.stat('decisionTree.sav')
    minioClient.put_object('example', 'decisionTree.sav', file_data, file_stat.st_size)
We can now list the objects in the bucket to check that the file was uploaded. Below we can see the bucket name (example), the file name (decisionTree.sav) and the time it was uploaded.
# list all objects in the example bucket
objects = minioClient.list_objects('example', recursive=True)
for obj in objects:
    print(obj.bucket_name, obj.object_name.encode('utf-8'), obj.last_modified,
          obj.etag, obj.size, obj.content_type)
example b'decisionTree.sav' 2020-01-13 16:26:35.982000+00:00 895f7dd35c0723a74338825e78a8d7d3-1 2022 None
# get the object from MinIO and save it as newfile
print(minioClient.fget_object('example', 'decisionTree.sav', "newfile"))
<Object: bucket_name: example object_name: b'decisionTree.sav' last_modified: time.struct_time(tm_year=2020, tm_mon=1, tm_mday=13, tm_hour=16, tm_min=26, tm_sec=35, tm_wday=0, tm_yday=13, tm_isdst=0) etag: 895f7dd35c0723a74338825e78a8d7d3-1 size: 2022 content_type: application/octet-stream, is_dir: False, metadata: {'Content-Type': 'application/octet-stream'}>
# use the downloaded object to do the predictions and print the result
filename = 'newfile'
loaded_model = joblib.load(filename)
result = loaded_model.score(x_test, y_test)
print(result)
0.9466666666666667
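As an alternative to fget_object, the model can also be loaded without writing a temporary file: get_object returns a readable response whose bytes joblib can consume through an in-memory buffer. A minimal sketch of that alternative; the io.BytesIO buffer is an addition for illustration and not part of the original workflow:

import io

# stream the object from MinIO and load the model without a temporary file
response = minioClient.get_object('example', 'decisionTree.sav')
loaded_model = joblib.load(io.BytesIO(response.read()))
print(loaded_model.score(x_test, y_test))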
If you are an R person, I have rewritten a package for MinIO. It is based on aws.s3 from cloudyR.