33  Save data and objects

Author

Andres Patrignani

Published

February 14, 2024

Imagine you have Python code that downloads large datasets from the web or performs computationally expensive tasks that you don’t want to repeat. While saving data in CSV files is common, Python offers more flexible options for saving objects and data structures. In this tutorial, we will explore two modules from the Python standard library: pickle and json. These modules allow you to serialize and deserialize Python objects, making it easy to save your work and load it later without rerunning time-consuming code.

# Create a sample list of dictionaries with metadata for some stations of the Kansas Mesonet
data = [
    {'name': 'Ashland Bottoms', 'latitude': 39.125773, 'longitude': -96.63653},
    {'name': 'Colby', 'latitude': 39.39247, 'longitude': -101.06864},
    {'name': 'Garden City', 'latitude': 37.99733, 'longitude': -100.81514},
    {'name': 'Manhattan', 'latitude': 39.20857, 'longitude': -96.59169},
    {'name': 'Parsons', 'latitude': 37.36875, 'longitude': -95.28771},
    {'name': 'Tribune 6NE', 'latitude': 38.53041, 'longitude': -101.66434},
]
Note

In this particular case we are using a list of dictionaries as an example, so that the same dataset can be used with both the pickle and the json modules. But you can also pickle other objects and data structures, like pandas DataFrames.

Pickle module

The pickle module is used for serializing and deserializing Python object structures, also called “pickling” and “unpickling”. Serialization is the process of converting a Python object into a byte stream, and deserialization is the inverse process, converting a byte stream back into an object.

The pickle module lets you save Python objects in a binary format, which is efficient and suitable for complex data types, but the resulting file is not human-readable.
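
To make the idea of a byte stream concrete, here is a minimal sketch that pickles an object in memory with pickle.dumps and pickle.loads, without writing a file. The station variable is a hypothetical record modeled on the dataset above.

```python
import pickle

# Hypothetical record, modeled on the station metadata used in this tutorial
station = {'name': 'Manhattan', 'latitude': 39.20857, 'longitude': -96.59169}

# Serialize the object to a bytes object held in memory (no file involved)
payload = pickle.dumps(station)
print(type(payload))         # <class 'bytes'>

# Deserialize the bytes back into a new, equal Python object
restored = pickle.loads(payload)
print(restored == station)   # True
print(restored is station)   # False: unpickling builds a new object
```

Because unpickling can execute arbitrary code, only load pickle files that come from a source you trust.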

# Import module
import pickle
# Save the dataset using pickle
# Open file in write binary mode (data will not be written as text)
with open('../datasets/data.pkl', 'wb') as f:
    pickle.dump(data, f)
# Load the dataset using pickle
# Read file in binary mode
with open('../datasets/data.pkl', 'rb') as f:
    data_pickle = pickle.load(f)

# Print first entry of the list
print(data_pickle[0])
{'name': 'Ashland Bottoms', 'latitude': 39.125773, 'longitude': -96.63653}
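
One way to avoid repeating an expensive step is to check whether a pickle file already exists before recomputing. The sketch below is only an illustration: expensive_task and load_or_compute are hypothetical names, and a temporary directory stands in for your own datasets folder.

```python
import pickle
import tempfile
from pathlib import Path

def expensive_task():
    """Hypothetical stand-in for a slow download or computation."""
    return [{'name': 'Parsons', 'latitude': 37.36875, 'longitude': -95.28771}]

def load_or_compute(cache_file):
    """Return the cached result if the pickle file exists; otherwise compute and save it."""
    cache_file = Path(cache_file)
    if cache_file.exists():
        with open(cache_file, 'rb') as f:
            return pickle.load(f)
    result = expensive_task()
    with open(cache_file, 'wb') as f:
        pickle.dump(result, f)
    return result

# Use a temporary directory so the example does not touch your project files
with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / 'result.pkl'
    first = load_or_compute(cache)    # computes the result and writes the cache
    second = load_or_compute(cache)   # reads the cache instead of recomputing
    print(first == second)            # True
```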

JSON module

The json module provides a way to encode and decode data in JavaScript Object Notation (JSON) format. JSON is a lightweight format that is easy for humans to read and write, and easy for machines to parse and generate.

This format is very similar to Python dictionaries (both use a key-value pair structure), is interoperable with other programming languages, and is ideal for web applications. The JSON format is limited to a small set of data types: strings, numbers, booleans, null (None in Python), lists, and dictionaries with string keys.
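
The short sketch below shows what these type limits mean in practice, using a hypothetical record: tuples come back as lists, non-string keys come back as strings, and objects without a JSON equivalent (such as sets) cannot be saved at all.

```python
import json

# Hypothetical record with types that JSON does not have: a tuple and an integer key
record = {'name': 'Colby', 'coords': (39.39247, -101.06864), 1: 'first'}

text = json.dumps(record)
print(text)
# {"name": "Colby", "coords": [39.39247, -101.06864], "1": "first"}

decoded = json.loads(text)
print(type(decoded['coords']))  # <class 'list'>
print(list(decoded.keys()))     # ['name', 'coords', '1']

# Objects without a JSON equivalent, such as a set, raise TypeError
try:
    json.dumps({'stations': {'Colby', 'Parsons'}})
except TypeError as err:
    print('Not serializable:', err)
```

If you need the data back exactly as it was, including tuples, sets, or custom objects, pickle is usually the better fit.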

# Import module
import json
# Save the dataset using JSON
# Open file in write mode (data will be written as text)
with open('../datasets/data.json', 'w') as f:
    json.dump(data, f)
# Load the dataset using JSON
with open('../datasets/data.json', 'r') as f:
    data_json = json.load(f)
# Print first entry of the list loaded from the JSON file
print(data_json[0])
{'name': 'Ashland Bottoms', 'latitude': 39.125773, 'longitude': -96.63653}
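
By default, the json module writes everything on a single line. If you plan to open the file in a text editor, the indent argument produces a more readable layout. A minimal sketch, using a hypothetical one-station list:

```python
import json

# Hypothetical one-station list, modeled on the dataset used in this tutorial
stations = [{'name': 'Manhattan', 'latitude': 39.20857, 'longitude': -96.59169}]

# indent=2 places each key on its own line, indented by two spaces per level
pretty = json.dumps(stations, indent=2)
print(pretty)
# [
#   {
#     "name": "Manhattan",
#     "latitude": 39.20857,
#     "longitude": -96.59169
#   }
# ]
```

The same indent argument is accepted by json.dump when writing directly to a file.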