How to save Machine Learning models in Python using Pickle and Joblib

Tunde Wey
4 min read · Dec 8, 2020

When building a Machine Learning model, there’s always a need to save the trained model to a file to minimize or completely avoid the stress of training the model all over again when required. This saved model can be used and reused whenever required in a program.

Solving a machine learning problem consists of 2 basic steps; training the model and making predictions with the trained model.
This article covers a step-by-step approach on how to save a Machine Learning model in Python using Pickle and Joblib.

Content Outline:

1. Getting the data
2. Data preprocessing
3. Visualizing the data on a pair plot
4. Training the model
5. Using Pickle and Joblib to save the trained model
6. Making predictions with the saved model.

Getting the data

Usually in Machine Learning, train/test sets are quite large, with many rows and columns, which helps the trained model predict more accurately.
In this article, we shall be working on a small multivariate dataset, ‘lekki_house_pricing.csv’, obtained by scraping a property-sales website in Nigeria.
Here’s the link to the dataset.

Viewing the data
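The loading step might look like the sketch below. Since the scraped CSV is not reproduced in the article, the column names and rows here are stand-in assumptions.

```python
import pandas as pd

# The article loads the scraped file:
# df = pd.read_csv('lekki_house_pricing.csv')
# Stand-in rows (column names are assumptions) so the snippet runs without the CSV:
df = pd.DataFrame({
    'bedrooms':      [4, 4, 4, 4],
    'bathrooms':     [4, 5, 3, 6],
    'toilets':       [5, 5, 5, 5],
    'parking_space': [4, 2, 6, 5],
    'price':         [56_000_000, 50_000_000, 63_000_000, 70_000_000],
})
print(df.head())
```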

Data preprocessing

Computing the summary of statistics pertaining to the DataFrame columns.

Summary of statistics
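A sketch of the summary step on the same stand-in data: `describe()` reports count, mean, standard deviation, and quartiles per column.

```python
import pandas as pd

df = pd.DataFrame({
    'bedrooms':      [4, 4, 4, 4],   # constant column -> std of 0
    'bathrooms':     [4, 5, 3, 6],
    'toilets':       [5, 5, 5, 5],   # constant column -> std of 0
    'parking_space': [4, 2, 6, 5],
    'price':         [56_000_000, 50_000_000, 63_000_000, 70_000_000],
})
print(df.describe())
```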

Here, the standard deviation for the bedrooms and toilets features is zero, which indicates that every row holds the same value. Since constant features carry no information for prediction, we remove them from the data.

Data cleaning
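The cleaning step can be done with `DataFrame.drop`; the column names here follow the stand-in data above and are assumptions about the real CSV.

```python
import pandas as pd

df = pd.DataFrame({
    'bedrooms':      [4, 4, 4, 4],
    'bathrooms':     [4, 5, 3, 6],
    'toilets':       [5, 5, 5, 5],
    'parking_space': [4, 2, 6, 5],
    'price':         [56_000_000, 50_000_000, 63_000_000, 70_000_000],
})

# Drop the zero-variance features identified in the summary of statistics
df = df.drop(columns=['bedrooms', 'toilets'])
print(df.columns.tolist())
```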

This leaves us with three columns: bathrooms, parking space, and price.

Visualizing the data on a pairplot

To get an idea of the distribution of data points, we make a visualization using the seaborn pairplot.

Visualizing the data

Training the model

To train the model, there’s a need to separate the data set into features and label — X and y respectively.

Feature Columns
Label Column
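Separating features from the label might look like this (stand-in data, assumed column names):

```python
import pandas as pd

df = pd.DataFrame({
    'bathrooms':     [4, 5, 3, 6],
    'parking_space': [4, 2, 6, 5],
    'price':         [56_000_000, 50_000_000, 63_000_000, 70_000_000],
})

X = df[['bathrooms', 'parking_space']]  # feature columns
y = df['price']                         # label column
```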

We shall be training our model to predict home prices using the Linear Regression technique.

Training the model
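The training step can be sketched with scikit-learn's `LinearRegression`; the data here is the stand-in used throughout, not the article's real CSV.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    'bathrooms':     [4, 5, 3, 6],
    'parking_space': [4, 2, 6, 5],
    'price':         [56_000_000, 50_000_000, 63_000_000, 70_000_000],
})
X = df[['bathrooms', 'parking_space']]
y = df['price']

# Fit an ordinary least-squares linear model
model = LinearRegression()
model.fit(X, y)
print(model.coef_, model.intercept_)
```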

Predicting the prices of two houses with different numbers of bathrooms and parking spaces using our trained model.

House Price Prediction

We can see our model predicts a price of about 56 million naira for 5 bathrooms with 4 parking spaces, while for 4 bathrooms with 5 parking spaces it predicts a price of about 63.5 million naira.
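A sketch of the prediction call; since the model here is trained on stand-in data, the outputs will not match the article's figures.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    'bathrooms':     [4, 5, 3, 6],
    'parking_space': [4, 2, 6, 5],
    'price':         [56_000_000, 50_000_000, 63_000_000, 70_000_000],
})
model = LinearRegression().fit(df[['bathrooms', 'parking_space']], df['price'])

# Two hypothetical houses: 5 bathrooms/4 parking spaces and 4 bathrooms/5 parking spaces
new_houses = pd.DataFrame({'bathrooms': [5, 4], 'parking_space': [4, 5]})
print(model.predict(new_houses))
```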

Using Pickle and Joblib modules to save the trained model

Using Pickle

Pickle is a Python module for serializing and de-serializing Python object structures. The process of converting Python objects such as lists, dictionaries, etc. into byte streams so they can be stored in a file or database is referred to as ‘pickling’ or ‘serialization’.

Saving the model using Pickle
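Saving with Pickle might look like the following; the filename 'pickle_model' is an assumption, and the model is trained on stand-in data so the sketch is self-contained.

```python
import pickle

import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    'bathrooms':     [4, 5, 3, 6],
    'parking_space': [4, 2, 6, 5],
    'price':         [56_000_000, 50_000_000, 63_000_000, 70_000_000],
})
model = LinearRegression().fit(df[['bathrooms', 'parking_space']], df['price'])

# 'wb' = write binary, since pickled objects are byte streams
with open('pickle_model', 'wb') as f:
    pickle.dump(model, f)
```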

Running the code snippet above saves the model as a binary file, which can be seen in the working directory. We can see this in the image below.

Locating the Pickle file in the working directory

Using Joblib

Technically, Joblib does essentially the same thing as the Pickle module. Note that Joblib is a standalone library rather than part of scikit-learn, although older scikit-learn versions exposed it as sklearn.externals.joblib.

Saving the model using Joblib
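The Joblib equivalent is a one-liner. The filename ‘joblib_model’ comes from the article; the training data is the same stand-in as before.

```python
import joblib
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    'bathrooms':     [4, 5, 3, 6],
    'parking_space': [4, 2, 6, 5],
    'price':         [56_000_000, 50_000_000, 63_000_000, 70_000_000],
})
model = LinearRegression().fit(df[['bathrooms', 'parking_space']], df['price'])

# joblib.dump handles opening and writing the file itself
joblib.dump(model, 'joblib_model')
```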

The code snippet above saves the model into a file called ‘joblib_model’. Just like with the Pickle module, the file can be seen in the working directory, as shown in the image below.

Locating the Joblib file in the working directory

Making predictions with the saved model

Using the Pickle model

To use the saved model, we open the file in binary read mode (‘rb’) and load it into a model object; here we call the model object ‘pickle_file’. This model object is used to make predictions.

House price predictions with the Pickle file
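Loading and reusing the pickled model might look like this. The sketch writes the file first so it runs on its own; filename and data are stand-in assumptions.

```python
import pickle

import pandas as pd
from sklearn.linear_model import LinearRegression

# Train and save a model so there is a file to load
df = pd.DataFrame({
    'bathrooms':     [4, 5, 3, 6],
    'parking_space': [4, 2, 6, 5],
    'price':         [56_000_000, 50_000_000, 63_000_000, 70_000_000],
})
model = LinearRegression().fit(df[['bathrooms', 'parking_space']], df['price'])
with open('pickle_model', 'wb') as f:
    pickle.dump(model, f)

# 'rb' = read binary mode; the loaded object behaves like the original model
with open('pickle_model', 'rb') as f:
    pickle_file = pickle.load(f)

new_houses = pd.DataFrame({'bathrooms': [5, 4], 'parking_space': [4, 5]})
print(pickle_file.predict(new_houses))
```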

Using the Joblib model

Just like with Pickle, we load the file into a model object, here called ‘Joblib_file’, and use it for predictions.

House price predictions using the Joblib file
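The Joblib round trip is symmetric: `joblib.load` takes the filename directly. As before, the file is written first so the sketch is self-contained, with stand-in data.

```python
import joblib
import pandas as pd
from sklearn.linear_model import LinearRegression

# Train and save a model so there is a file to load
df = pd.DataFrame({
    'bathrooms':     [4, 5, 3, 6],
    'parking_space': [4, 2, 6, 5],
    'price':         [56_000_000, 50_000_000, 63_000_000, 70_000_000],
})
model = LinearRegression().fit(df[['bathrooms', 'parking_space']], df['price'])
joblib.dump(model, 'joblib_model')

# Load the saved model and predict with it
Joblib_file = joblib.load('joblib_model')
new_houses = pd.DataFrame({'bathrooms': [5, 4], 'parking_space': [4, 5]})
print(Joblib_file.predict(new_houses))
```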

Conclusion

Joblib is generally reported to be significantly faster than Pickle on objects that contain large NumPy arrays. I have not done any profiling myself to ascertain which is better, so I advise you to figure out which works for you based on the needs of your project, as neither is an optimal solution in every case.

If this was helpful, feel free to connect with me on Twitter and LinkedIn.
Happy Data Science-ing!

Reference

https://nigeriapropertycentre.com
