# Import Numpy and Pandas modules
import numpy as np
import pandas as pd
30 Plotting
Plotting is an essential component of data analysis that enables researchers to effectively communicate data insights through visualizations. Python has several powerful libraries for creating a wide range of figures. Matplotlib, renowned for its vast gallery and versatility, is ideal for creating static and publication-quality figures. Bokeh excels in interactive and web-ready visualizations, making it well suited for dynamic data exploration. Seaborn, built on top of Matplotlib, specializes in statistical graphics and provides a higher-level interface for creating sophisticated plots.
Dataset
To keep this plotting notebook simple, we will start by reading some daily environmental data recorded in a tallgrass prairie in the Kings Creek watershed, which is located within the Konza Prairie Biological Station near Manhattan, KS. The dataset includes the following variables:
Variable Name | Units | Description | Sensor |
---|---|---|---|
datetime | - | Timestamp of the data record | |
pressure | kPa | Atmospheric pressure | Atmos 41 |
tmin | °C | Minimum temperature | Atmos 41 |
tmax | °C | Maximum temperature | Atmos 41 |
tavg | °C | Average temperature | Atmos 41 |
rmin | % | Minimum relative humidity | Atmos 41 |
rmax | % | Maximum relative humidity | Atmos 41 |
prcp | mm | Precipitation amount | Atmos 41 |
srad | MJ/m² | Solar radiation | Atmos 41 |
wspd | m/s | Wind speed | Atmos 41 |
wdir | degrees | Wind direction | Atmos 41 |
vpd | kPa | Vapor pressure deficit | Atmos 41 |
vwc_5cm | m³/m³ | Volumetric water content at 5 cm depth | Teros 12 |
vwc_20cm | m³/m³ | Volumetric water content at 20 cm depth | Teros 12 |
vwc_40cm | m³/m³ | Volumetric water content at 40 cm depth | Teros 12 |
soiltemp_5cm | °C | Soil temperature at 5 cm depth | Teros 12 |
soiltemp_20cm | °C | Soil temperature at 20 cm depth | Teros 12 |
soiltemp_40cm | °C | Soil temperature at 40 cm depth | Teros 12 |
battv | millivolts | Battery voltage of the datalogger | AA Batt. |
discharge | m³/s | Streamflow | USGS gauge |
# Read some tabulated weather data
df = pd.read_csv('../datasets/kings_creek_2022_2023_daily.csv',
                 parse_dates=['datetime'])

# Display a few rows to inspect column headers and data
df.head(3)
datetime | pressure | tmin | tmax | tavg | rmin | rmax | prcp | srad | wspd | wdir | vpd | vwc_5cm | vwc_20cm | vwc_40cm | soiltemp_5cm | soiltemp_20cm | soiltemp_40cm | battv | discharge | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2022-01-01 | 96.838 | -14.8 | -4.4 | -9.6 | 78.475 | 98.012 | 0.25 | 2.098 | 5.483 | 0.969 | 0.028 | 0.257 | 0.307 | 0.359 | 2.996 | 5.392 | 7.425 | 8714.833 | 0.0 |
1 | 2022-01-02 | 97.995 | -20.4 | -7.2 | -13.8 | 50.543 | 84.936 | 0.25 | 9.756 | 2.216 | 2.023 | 0.072 | 0.256 | 0.307 | 0.358 | 2.562 | 4.250 | 6.692 | 8890.042 | 0.0 |
2 | 2022-01-03 | 97.844 | -9.4 | 8.8 | -0.3 | 40.622 | 82.662 | 0.50 | 9.681 | 2.749 | 5.667 | 0.262 | 0.255 | 0.307 | 0.358 | 2.454 | 3.917 | 6.208 | 8924.833 | 0.0 |
Matplotlib module
Matplotlib is a powerful and widely used Python library for creating high-quality static and animated visualizations suitable for scientific research, often with just a few lines of code. It integrates well with other libraries like Numpy and Pandas, can generate a wide range of graphs, and has an extensive gallery of examples. In this tutorial we will go over a few examples to learn the syntax, properties, and methods available to customize figures. To learn more, visit Matplotlib’s official documentation.
Components of Matplotlib figures
- Figure: The entire window that everything is drawn on. The top-level container for all the elements.
- Axes: The part of the figure where the data is plotted, including any axes labeling, ticks, and tick labels. It’s the area that contains the plot elements.
- Plotting area: The space where the data points are visualized. It’s contained within the axes.
- Axis: These are the line-like objects and take care of setting the graph limits and generating the ticks and tick labels.
- Ticks and Tick Labels: The marks on the axis to denote data points and the labels assigned to these ticks.
- Labels: Descriptive text added to the x axis and y axis to identify what each represents.
- Title: A text label placed at the top of the axes to provide a summary or comment about the plot.
- Legend: A small area describing the different elements or data series of the plot. It’s used to identify plots represented by different colors, markers, or line styles.
Essentially, a Matplotlib figure is an assembly of interconnected objects, each customizable through various properties and functions. When a figure is created, attributes such as the figure dimensions, axes properties, tick marks, font size of labels, and more come with pre-set default values. Understanding this object hierarchy is key to customizing figures to your visualization needs.
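The object hierarchy described above can be inspected directly. The following is a minimal sketch, assuming a small made-up dataset, that creates one figure with one axes and then reaches into the nested objects (figure, axes, axis, labels, title, legend).

```python
import matplotlib.pyplot as plt

# Create a figure with a single axes and some sample data
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([1, 2, 3, 4], [10, 20, 25, 30], label='Sample data')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_title('Object hierarchy')
ax.legend()

# The figure is the top-level container and keeps a list of its axes
print(type(fig).__name__)        # Figure
print(fig.axes[0] is ax)         # True

# Each axes owns an x-axis and a y-axis object that manage limits and ticks
print(type(ax.xaxis).__name__)   # XAxis
print(ax.get_xlim())

# Labels, title, and legend are objects too, each with its own properties
print(ax.xaxis.label.get_text())
print(ax.title.get_text())
print(ax.get_legend() is not None)
```

Every object reached this way exposes getter and setter methods, which is what makes each element of the figure individually customizable.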
Matplotlib syntax
Matplotlib has two syntax styles or interfaces for creating figures:
- function-based interface (easy) that resembles Matlab’s plotting syntax. This interface relies on using the plt.<function> construction for adding/modifying each component of a figure. It is simpler and more straightforward than the object-based interface (see below), so the function-based style is sometimes easier for beginners or students with a background in Matlab. The main disadvantage of this method is that it is implicit, meaning that the axes object (with all its attributes and methods) remains in the background since we are not saving it into a variable. This means that if we want to add/remove/modify something later on in one axes, we don’t have that object available to us to implement the changes. One option is to get the current axes using plt.gca(), but we need to do this before adding another axes (say another subplot) to the figure. If you don’t need to create sophisticated figures, then this method usually works just fine.
- object-based interface (advanced) that offers more flexibility and control, particularly when dealing with multiple axes. In this interface, the figure and axes objects are explicit, meaning that each figure and axes object is stored as a regular variable that gives the programmer access to all configuration options at any point in the script. The downside is that this syntax is a bit more verbose and sometimes less intuitive to beginners compared to the function-based approach. The official documentation typically favors the object-based syntax, so it is good to become familiar with it.
But don’t panic, the syntax between these two methods is not that different. Below I added more syntax details, a cheat sheet to help you understand some of the differences, and several examples using real data. In this article you can learn more about the pros and cons of each style.
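To make the idea of the implicit "current axes" concrete, here is a small sketch with made-up data. It shows that plt.gca() returns a different axes after a second subplot is added, which is why the reference has to be captured beforehand. The variable names ax_left and ax_right are only for illustration.

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(6, 3))

# First subplot: the implicit current axes is the one just created
plt.subplot(1, 2, 1)
plt.plot([1, 2, 3], [1, 4, 9])
ax_left = plt.gca()           # keep a reference before moving on

# Second subplot: the current axes changes
plt.subplot(1, 2, 2)
plt.plot([1, 2, 3], [9, 4, 1])
ax_right = plt.gca()

print(ax_left is ax_right)    # False

# plt.title() acts on the current axes (the right one) ...
plt.title('Right')
# ... while the saved reference still lets us modify the left one
ax_left.set_title('Left')
```

Saving the result of plt.gca() in a variable is effectively a step toward the object-based style.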
Function-based syntax
# Sample data
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

# Create figure and plot
plt.figure(figsize=(4,4))
plt.plot(x, y)
plt.title("Simple Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
Object-based syntax
# Sample data
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

# Create figure and axes
fig, ax = plt.subplots(figsize=(4,4))
ax.plot(x, y)
ax.set_title("Simple Line Plot")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
plt.show()
Matplotlib Cheat Sheet
Operation | function-based syntax | object-based syntax |
---|---|---|
Create figure | plt.figure() | fig,ax = plt.subplots() or fig,ax = plt.subplots(1,1) |
Simple line or scatter plot | plt.plot(x, y) or plt.scatter(x, y) | ax.plot(x, y) or ax.scatter(x, y) |
Add axis labels | plt.xlabel('label', fontsize=size); plt.ylabel('label', fontsize=size) | ax.set_xlabel('label', fontsize=size); ax.set_ylabel('label', fontsize=size) |
Change font size of tick labels | plt.xticks(fontsize=size); plt.yticks(fontsize=size) | ax.tick_params(axis='both', labelsize=size) |
Add a legend | plt.legend() | ax.legend() |
Remove tick marks and labels | plt.tick_params(axis='both', which='both', bottom=False, top=False, labelbottom=False) | ax.tick_params(axis='both', which='both', bottom=False, top=False, labelbottom=False) |
Remove tick labels only | plt.gca().tick_params(axis='x', labelbottom=False) | ax.tick_params(axis='x', labelbottom=False) |
Add a title | plt.title('title') | ax.set_title('title') |
Add a secondary axis | plt.twinx() | ax_secondary = ax.twinx() |
Rotate tick labels | plt.xticks(rotation=angle); plt.yticks(rotation=angle) | ax.tick_params(axis='x', rotation=angle); ax.tick_params(axis='y', rotation=angle) |
Change scale | plt.xscale('log'); plt.yscale('log') | ax.set_xscale('log'); ax.set_yscale('log') |
Change axis limits | plt.xlim([xmin, xmax]); plt.ylim([ymin, ymax]) | ax.set_xlim([xmin, xmax]); ax.set_ylim([ymin, ymax]) |
Create subplots | plt.subplot(1, 2, 1) or plt.subplot(2, 2, 1) | fig, (ax1, ax2) = plt.subplots(1, 2) or fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2) or fig, axs = plt.subplots(2, 2); axs[0, 0].plot(x, y) |
Change xaxis dateformat | fmt = mdates.DateFormatter('%b-%y'); plt.gca().xaxis.set_major_formatter(fmt) | fmt = mdates.DateFormatter('%b-%y'); ax.xaxis.set_major_formatter(fmt) |
# Import matplotlib modules
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
Access and modify plot configuration properties globally
We can use the code below to print the default value of all the properties within Matplotlib. You can also use this to set global properties.
# Print all default properties (warning output is long!)
plt.rcParams
# Inspect some default properties
print(plt.rcParams['font.family'])
print(plt.rcParams['font.size'])
print(plt.rcParams['axes.labelsize'])
print(plt.rcParams['xtick.labelsize'])
print(plt.rcParams['ytick.labelsize'])
# Remove comment to update the value of these properties
# Changes will affect all charts in this notebook
# plt.rcParams.update({'font.family':'Verdana'})
# plt.rcParams.update({'font.size':11})
# plt.rcParams.update({'axes.labelsize':14})
# plt.rcParams.update({'xtick.labelsize':12})
# plt.rcParams.update({'ytick.labelsize':12})
# Reset to default configuration values (this will undo the previous line)
# plt.rcdefaults()
['sans-serif']
10.0
medium
medium
medium
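If a global change is too broad, properties can also be changed temporarily. The sketch below uses plt.rc_context, which applies the given settings only inside the with block and restores the previous values afterwards. The chosen font sizes are arbitrary.

```python
import matplotlib.pyplot as plt

default_size = plt.rcParams['font.size']

# Settings passed to rc_context only apply inside the "with" block
with plt.rc_context({'font.size': 14, 'axes.labelsize': 16}):
    inside_size = plt.rcParams['font.size']
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot([1, 2, 3], [2, 4, 8])
    ax.set_xlabel('X-axis')

# Outside the block the previous values are restored
print(inside_size)
print(plt.rcParams['font.size'] == default_size)
```

This is handy when one figure in a notebook needs larger fonts while all the others should keep the defaults.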
Line plot
A common plot when working with meteorological data is to show maximum and minimum air temperature.
# Create figure
plt.figure(figsize=(8,4)) # If you set dpi=300 the figure is much better quality

# Add lines to axes
plt.plot(df['datetime'], df['tmax'], color='tomato', linewidth=1, label='Tmax')
plt.plot(df['datetime'], df['tmin'], color='navy', linewidth=1, label='Tmin')

# Customize chart attributes
plt.title('Kings Creek Watershed',
          fontdict={'family':'Verdana', 'color':'black', 'weight':'bold', 'size':14})
plt.xlabel('Time', fontsize=10)
plt.ylabel('Air Temperature (Celsius)', fontsize=10)
plt.xticks(fontsize=10, rotation=20)
plt.yticks(fontsize=10)
plt.legend(fontsize=10)

# Create custom dateformat for x-axis
date_format = mdates.DateFormatter('%b-%y')

# We don't have the axes object saved into a variable, so to set the date format
# we need to get the current axes (gca). If we were adding more axes to this figure,
# then gca() will return the current axes
plt.gca().xaxis.set_major_formatter(date_format)
# Save figure. Call before plt.show()
#plt.savefig('line_plot.jpg', dpi=300, facecolor='w', pad_inches=0.1)
# Render figure
plt.show()
Object-based syntax
fig, ax = plt.subplots(1,1,figsize=(8,4), dpi=300)
ax.plot(df['datetime'], df['tmax'], label='Tmax')
ax.plot(df['datetime'], df['tmin'], label='Tmin')
ax.set_xlabel('Time', fontsize=10)
ax.set_ylabel('Air Temperature (Celsius)', fontsize=10)
ax.tick_params(axis='both', labelsize=10)
ax.legend(fontsize=10)

date_format = mdates.DateFormatter('%b-%y')

# Here we have the axes object saved into the `ax` variable, so it is explicit and
# we just need to access the method within the object.
# We could do this step later, even if we create other axes with different variable names.
ax.xaxis.set_major_formatter(date_format)
# Save figure. Call before plt.show()
plt.savefig('line_plot.jpg', dpi=300, facecolor='w', pad_inches=0.1)

# Render figure
plt.show()
Scatter plot
Let’s inspect soil temperature data at 5 and 40 cm depth and see how similar or different these two variables are. A 1:1 line will serve as the reference of perfect equality.
# Scatter plot
plt.figure(figsize=(5,4))
plt.scatter(df['soiltemp_5cm'], df['soiltemp_40cm'],
            marker='o', facecolor=(0.8, 0.1, 0.1, 0.3),
            edgecolor='k', linewidth=0.5, label='Observations')
plt.plot([-5, 35], [-5, 35], linestyle='--', color='k', label='1:1 line') # 1:1 line
plt.title('Scatter plot', fontsize=12, fontweight='normal')
# Raw strings (r'...') keep the LaTeX backslashes from being read as escape sequences
plt.xlabel(r'Soil temperature 5 cm $\mathrm{\degree{C}}$', size=12)
plt.ylabel(r'Soil temperature 40 cm $\mathrm{\degree{C}}$', size=12)
plt.xticks(fontsize=10)
plt.yticks(fontsize=10)
plt.xlim([-5, 35])
plt.ylim([-5, 35])
plt.legend(fontsize=10)

plt.grid()
# Use the following lines to hide the axis tick marks and labels
#plt.xticks([])
#plt.yticks([])
plt.show()
To remove the italics style in your units and equations use \mathrm{ } within your LaTeX text.
Object-based syntax
# Scatter plot
fig, ax = plt.subplots(1, 1, figsize=(6,5), edgecolor='k')
ax.scatter(df['soiltemp_5cm'], df['soiltemp_40cm'],
           marker='o', facecolor=(0.8, 0.1, 0.1, 0.3),
           edgecolor='k', linewidth=0.5, label='Observations')
ax.plot([-5, 35], [-5, 35], linestyle='--', color='k', label='1:1 line')
ax.set_title('Scatter plot', fontsize=12, fontweight='normal')
# Raw strings (r"...") keep the LaTeX backslashes from being read as escape sequences
ax.set_xlabel(r"Soil temperature 5 cm $^\degree{C}$", size=12)
ax.set_ylabel(r"Soil temperature 40 cm $^\degree{C}$", size=12)
ax.set_xlim([-5, 35])
ax.set_ylim([-5, 35])
ax.tick_params(axis='both', labelsize=12)
ax.grid(True)
plt.show()
Histogram
One of the most common and useful charts to describe the distribution of a dataset is the histogram.
# Histogram
plt.figure(figsize=(6,5))
plt.hist(df['vwc_5cm'], bins='scott', density=False,
         facecolor='g', alpha=0.75, edgecolor='black', linewidth=1.2)
plt.title('Soil moisture distribution', fontsize=12)
plt.xlabel('Soil moisture $cm^3 cm^{-3}$', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.xticks(fontsize=10)
plt.yticks(fontsize=10)

# Annotate the mean value next to a vertical reference line
avg = df['vwc_5cm'].mean()
ann_val = f"Mean = {avg:.3f} "
ann_units = r"$\mathrm{cm^3 cm^{-3}}$"

plt.text(avg-0.01, 85, ann_val + ann_units,
         size=10, rotation=90, color='black')
plt.axvline(df['vwc_5cm'].mean(), linestyle='--', color='k')
plt.show()
Object-based syntax
# Histogram
fig, ax = plt.subplots(figsize=(6,5))
ax.hist(df['vwc_5cm'], bins='scott', density=False,
        facecolor='g', alpha=0.75, edgecolor='black', linewidth=1.2)

# Annotate the mean value next to a vertical reference line
avg = df['vwc_5cm'].mean()
ann_val = f"Mean = {avg:.3f} "
ann_units = r"$\mathrm{cm^3 cm^{-3}}$"

ax.text(avg-0.01, 85, ann_val + ann_units, size=10, rotation=90, color='black')
ax.set_xlabel('Volumetric water content $cm^3 cm^{-3}$', fontsize=12)
ax.set_ylabel('Count', fontsize=12)
ax.tick_params('both', labelsize=12)
ax.set_title('Soil moisture distribution', fontsize=12)
ax.axvline(df['vwc_5cm'].mean(), linestyle='--', color='k')
plt.show()
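The histograms above let Matplotlib choose the bins with bins='scott'. As a rough sketch of what that option does, the snippet below uses NumPy's histogram_bin_edges on synthetic values (random numbers standing in for the soil moisture column, not the real data) to compare Scott's rule against a fixed number of bins.

```python
import numpy as np

# Synthetic soil moisture values (hypothetical, for illustration only)
rng = np.random.default_rng(0)
vwc = rng.normal(loc=0.25, scale=0.05, size=500)

# Bin edges chosen by Scott's rule, the rule requested with bins='scott'
edges_scott = np.histogram_bin_edges(vwc, bins='scott')

# Bin edges for a fixed number of equal-width bins
edges_fixed = np.histogram_bin_edges(vwc, bins=10)

print(len(edges_scott) - 1, 'bins with Scott rule')
print(len(edges_fixed) - 1, 'bins with a fixed count')

# Counts per bin, equivalent to the bar heights when density=False
counts, _ = np.histogram(vwc, bins=edges_scott)
print(counts.sum())
```

Rule-based binning adapts the bin width to the spread and size of the sample, which is convenient when the same plotting code is reused for variables with different ranges.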
Subplots
In fields like agronomy, environmental science, hydrology, and meteorology sometimes we want to show multiple variables in one figure, but in different charts. Other times we want to show the same variable, but in separate charts for different locations or sites. In Matplotlib we can achieve this using subplots.
# Subplots with all labels and axis tick marks

# Define date format (the locator must be an instance, hence the parentheses)
date_format = mdates.ConciseDateFormatter(mdates.AutoDateLocator())

# Create figure
plt.figure(figsize=(10,6))

# Set width and height spacing between subplots
plt.subplots_adjust(wspace=0.3, hspace=0.4)

# Add superior title for entire figure
plt.suptitle('Kings Creek temperatures 2022-2023')

# Subplot 1 of 4
plt.subplot(2, 2, 1)
plt.plot(df["datetime"], df["tavg"])
plt.title('Air temperature', fontsize=12)
plt.ylabel('Temperature', fontsize=12)
plt.ylim([-20, 40])
plt.gca().xaxis.set_major_formatter(date_format)
plt.text( df['datetime'].iloc[0], 32, 'A', fontsize=14)

# Hide tick labels on the x-axis. Add bottom=False to remove the ticks
plt.gca().tick_params(axis='x', labelbottom=False)

# Subplot 2 of 4
plt.subplot(2, 2, 2)
plt.plot(df["datetime"], df["soiltemp_5cm"])
plt.title('Soil temperature 5 cm', size=12)
plt.ylabel('Temperature', size=12)
plt.ylim([-20, 40])
plt.gca().xaxis.set_major_formatter(date_format)
plt.text( df['datetime'].iloc[0], 32, 'B', fontsize=14)
plt.gca().tick_params(axis='x', labelbottom=False)

# Subplot 3 of 4
plt.subplot(2, 2, 3)
plt.plot(df["datetime"], df["soiltemp_20cm"])
plt.title('Soil temperature 20 cm', size=12)
plt.ylabel('Temperature', size=12)
plt.ylim([-20, 40])
plt.gca().xaxis.set_major_formatter(date_format)
plt.text( df['datetime'].iloc[0], 32, 'C', fontsize=14)

# Subplot 4 of 4
plt.subplot(2, 2, 4)
plt.plot(df["datetime"], df["soiltemp_40cm"])
plt.title('Soil temperature 40 cm', size=12)
plt.ylabel('Temperature', size=12)
plt.ylim([-20, 40])
plt.gca().xaxis.set_major_formatter(date_format)
plt.text( df['datetime'].iloc[0], 32, 'D', fontsize=14)

# Adjust height padding (hspace) and width padding (wspace)
# between subplots using fractions of the average axes height and width
plt.subplots_adjust(hspace=0.3, wspace=0.3)

# Render figure
plt.show()
Object-based syntax
# Subplots

# Define date format (the locator must be an instance, hence the parentheses)
date_format = mdates.ConciseDateFormatter(mdates.AutoDateLocator())

# Create figure (each row is returned as a tuple of axes)
fig, ((ax1,ax2),(ax3,ax4)) = plt.subplots(2, 2, figsize=(10,6))

# Set width and height spacing between subplots
fig.subplots_adjust(wspace=0.3, hspace=0.4)

# Add superior title for entire figure
fig.suptitle('Kings Creek temperatures 2022-2023')

ax1.set_title('Air temperature', size=12)
ax1.set_ylabel('Temperature', size=12)
ax1.set_ylim([-20, 40])
ax1.xaxis.set_major_formatter(date_format)
ax1.text( df['datetime'].iloc[0], 32, 'A', fontsize=14)
ax1.tick_params(axis='x', labelbottom=False)

ax2.set_title('Soil temperature 5 cm', size=12)
ax2.set_ylabel('Temperature', size=12)
ax2.set_ylim([-20, 40])
ax2.xaxis.set_major_formatter(date_format)
ax2.text( df['datetime'].iloc[0], 32, 'B', fontsize=14)
ax2.tick_params(axis='x', labelbottom=False)

ax3.set_title('Soil temperature 20 cm', size=12)
ax3.set_ylabel('Temperature', size=12)
ax3.set_ylim([-20, 40])
ax3.xaxis.set_major_formatter(date_format)
ax3.text( df['datetime'].iloc[0], 32, 'C', fontsize=14)

ax4.set_title('Soil temperature 40 cm', size=12)
ax4.set_ylabel('Temperature', size=12)
ax4.set_ylim([-20, 40])
ax4.xaxis.set_major_formatter(date_format)
ax4.text( df['datetime'].iloc[0], 32, 'D', fontsize=14)

# ------ ADDING ALL LINES AT THE END -----
# To illustrate the power of the object-based notation I set
# all the line plots here at the end. In the function-based syntax you are forced
# to set all the elements and attributes within the block of code for that subplot
ax1.plot(df["datetime"], df["tavg"])
ax2.plot(df["datetime"], df["soiltemp_5cm"])
ax3.plot(df["datetime"], df["soiltemp_20cm"])
ax4.plot(df["datetime"], df["soiltemp_40cm"])

plt.show()
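Because each axes is a regular object, the repeated per-panel code can also be written as a loop. The sketch below is one possible way to do this, using synthetic sinusoidal series as a stand-in for the station data (the values are made up, only the column names mirror the dataset). It also uses the sharex and sharey options of plt.subplots so that inner tick labels are hidden automatically.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic daily data (hypothetical stand-in for the station dataset)
dates = pd.date_range('2022-01-01', periods=365, freq='D')
doy = np.arange(365)
data = pd.DataFrame({
    'datetime': dates,
    'tavg': 12 - 15 * np.cos(2 * np.pi * doy / 365),
    'soiltemp_5cm': 13 - 12 * np.cos(2 * np.pi * (doy - 10) / 365),
    'soiltemp_20cm': 13 - 10 * np.cos(2 * np.pi * (doy - 20) / 365),
    'soiltemp_40cm': 13 - 8 * np.cos(2 * np.pi * (doy - 30) / 365),
})

columns = ['tavg', 'soiltemp_5cm', 'soiltemp_20cm', 'soiltemp_40cm']
titles = ['Air temperature', 'Soil temperature 5 cm',
          'Soil temperature 20 cm', 'Soil temperature 40 cm']

# sharex/sharey keep limits in sync and hide redundant inner tick labels
fig, axs = plt.subplots(2, 2, figsize=(10, 6), sharex=True, sharey=True)

# axs is a 2D array of axes; axs.flat iterates over it row by row
for ax, col, title, letter in zip(axs.flat, columns, titles, 'ABCD'):
    ax.plot(data['datetime'], data[col])
    ax.set_title(title, size=12)
    ax.set_ylabel('Temperature', size=12)
    ax.set_ylim([-20, 40])
    ax.text(data['datetime'].iloc[0], 32, letter, fontsize=14)

fig.autofmt_xdate()  # rotate the date labels on the bottom row
```

The loop trades a little explicitness for less repetition, so a change such as a new y-axis range only has to be made in one place.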
Fill area plots
To illustrate the use of filled area charts we will use a time series of drought conditions obtained from the U.S. Drought Monitor.
# Read U.S. Drought Monitor data
df_usdm = pd.read_csv('../datasets/riley_usdm_20210701_20220916.csv',
                      parse_dates=['MapDate'], date_format='%Y%m%d')
df_usdm.head(3)
MapDate | FIPS | County | State | None | D0 | D1 | D2 | D3 | D4 | ValidStart | ValidEnd | StatisticFormatID | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2022-09-13 | 20161 | Riley County | KS | 0.00 | 100.00 | 0.0 | 0.0 | 0.0 | 0.0 | 2022-09-13 | 2022-09-19 | 1 |
1 | 2022-09-06 | 20161 | Riley County | KS | 0.00 | 100.00 | 0.0 | 0.0 | 0.0 | 0.0 | 2022-09-06 | 2022-09-12 | 1 |
2 | 2022-08-30 | 20161 | Riley County | KS | 81.02 | 18.98 | 0.0 | 0.0 | 0.0 | 0.0 | 2022-08-30 | 2022-09-05 | 1 |
# Fill area plot
fig = plt.figure(figsize=(8,3))
plt.title('Drought Conditions for Riley County, KS')
plt.fill_between(df_usdm['MapDate'], df_usdm['D0'],
                 color='yellow', edgecolor='k', label='D0-D4')
plt.fill_between(df_usdm['MapDate'], df_usdm['D1'],
                 color='navajowhite', edgecolor='k', label='D1-D4')
plt.fill_between(df_usdm['MapDate'], df_usdm['D2'],
                 color='orange', edgecolor='k', label='D2-D4')
plt.fill_between(df_usdm['MapDate'], df_usdm['D3'],
                 color='red', edgecolor='k', label='D3-D4')
plt.fill_between(df_usdm['MapDate'], df_usdm['D4'],
                 color='maroon', edgecolor='k', label='D4')

plt.ylim(0,105)
plt.legend(bbox_to_anchor=(1.18, 1.05))
plt.ylabel('Area (%)', fontsize=12)
plt.show()
Object-based syntax
# Fill area plot
fig, ax = plt.subplots(figsize=(8,3))
ax.set_title('Drought Conditions for Riley County, KS')
ax.fill_between(df_usdm['MapDate'], df_usdm['D0'],
                color='yellow', edgecolor='k', label='D0-D4')
ax.fill_between(df_usdm['MapDate'], df_usdm['D1'],
                color='navajowhite', edgecolor='k', label='D1-D4')
ax.fill_between(df_usdm['MapDate'], df_usdm['D2'],
                color='orange', edgecolor='k', label='D2-D4')
ax.fill_between(df_usdm['MapDate'], df_usdm['D3'],
                color='red', edgecolor='k', label='D3-D4')
ax.fill_between(df_usdm['MapDate'], df_usdm['D4'],
                color='maroon', edgecolor='k', label='D4')

ax.set_ylim(0,105)
ax.legend(bbox_to_anchor=(1.18, 1.05))
ax.set_ylabel('Area (%)', fontsize=12)
plt.show()
Secondary Y axis plots
Sometimes we want to show two related variables with different range or entirely different units in the same chart. In this case, two chart axes can share the same x-axis but have two different y-axes. A typical example of this consists of displaying soil moisture variations together with precipitation. While less common, it is also possible for two charts to share the same y-axis and have two different x-axes.
# Creating plot with secondary y-axis
plt.figure(figsize=(8,4))

plt.plot(df["datetime"], df["vwc_5cm"], '-k')
plt.xlabel('Time', size=12)
plt.ylabel('Volumetric water content', color='k', size=12)

plt.twinx()

plt.bar(df["datetime"], df["prcp"], width=2, color='tomato', linestyle='-')
plt.ylabel('Precipitation (mm)', color='tomato', size=12)
plt.ylim([0, 50])

plt.show()
Object-based syntax
# Creating plot with secondary y-axis
fig, ax = plt.subplots(figsize=(8,4), facecolor='w')

ax.plot(df["datetime"], df["vwc_5cm"], '-k')
ax.set_xlabel('Time', size=12)
ax.set_ylabel('Volumetric water content', color='k', size=12)

ax2 = ax.twinx()

ax2.bar(df["datetime"], df["prcp"], width=2, color='tomato', linestyle='-')
ax2.set_ylabel('Precipitation (mm)', color='tomato', size=12)
ax2.set_ylim([0, 50])

plt.show()
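The less common case mentioned above, one shared y-axis with two different x-axes, follows the same pattern using ax.twiny(). The sketch below assumes a made-up soil profile (the depth, temperature, and water content values are hypothetical) with depth on the shared y-axis.

```python
import matplotlib.pyplot as plt

# Hypothetical soil profile values, for illustration only
depth = [5, 20, 40]                 # cm
soil_temp = [18.2, 16.5, 15.1]      # Celsius
vwc = [0.22, 0.28, 0.33]            # m^3/m^3

fig, ax = plt.subplots(figsize=(4, 5))
ax.plot(soil_temp, depth, '-ok')
ax.set_xlabel('Soil temperature (Celsius)', color='k')
ax.set_ylabel('Depth (cm)')
ax.invert_yaxis()                   # show depth increasing downward

# Second x-axis on top that shares the same y-axis
ax_top = ax.twiny()
ax_top.plot(vwc, depth, '-s', color='tomato')
ax_top.set_xlabel('Volumetric water content', color='tomato')
```

As with twinx(), matching the label color to the line color helps the reader tell which axis belongs to which series.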
Donut charts
In agronomy, we often need to represent complex cropping systems with multiple crops and fallow periods. In a single figure, donut charts can display:
- the crop sequence,
- the duration of each crop in the sequence,
- the time of the year in which each crop is in the field from planting to harvesting,
- and the number of years of the rotation
Here is an example for a typical two-year crop rotation in the U.S. Midwest that consists of winter wheat, double crop soybeans, a winter fallow period, and corn.
# Crops of the rotation
crops = ['W','SB','F','C'] # Crops labels
crop_values = [9, 5, 4, 6] # Duration of each crop in the rotation
crop_colors = ['wheat','forestgreen','whitesmoke','orange']

# Months of the year
months = ['J', 'F', 'M', 'A', 'M', 'J', 'J', 'A', 'S', 'O', 'N', 'D'] * 2
month_values = [1] * len(months) # Each month is given a value of 1 for equal distribution

# Here are other alternatives for coloring the month tiles
#month_colors = plt.cm.tab20c(np.linspace(0, 1, 12))
#month_colors = ['tomato']*12 + ['lightgreen']*12

# Years of the rotation
years = ['Y1','Y2']
year_values = [12] * 2
year_colors = ['hotpink','lightblue']

# Create figure and axis
plt.figure(figsize=(5,5))
plt.title('Wheat-Soybean-Corn Rotation')

# Crops donut chart
plt.pie(crop_values, radius=1.65, labels=crops,
        wedgeprops=dict(width=0.35, edgecolor='k', linewidth=0.5),
        startangle=90, labeldistance=0.83, rotatelabels=True,
        counterclock=False, colors=crop_colors)

# Month donut chart
plt.pie(month_values, radius=1.2, labels=months,
        wedgeprops=dict(width=0.3, edgecolor='k', linewidth=0.5),
        startangle=45, labeldistance=0.8, rotatelabels=True,
        counterclock=False, colors='w')

# Years donut chart
plt.pie(year_values, radius=0.8, labels=years,
        wedgeprops=dict(width=0.25, edgecolor='k', linewidth=0.5),
        startangle=45, labeldistance=0.65, rotatelabels=True, counterclock=False,
        colors=year_colors)

# Add annotation (83% of the rotation time is with crops)
plt.text(-0.25, -0.05, f"{83:.0f}%", fontsize=20)

# Equal aspect ratio ensures that pie charts are drawn as circles
plt.axis('equal')

plt.tight_layout()
plt.show()
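The 83% annotation in the center of the donut is typed in by hand. A small sketch of how it could be derived from the crop durations instead, assuming that 'F' is the only non-crop label in the rotation:

```python
# Duration (months) of each phase of the rotation, as defined above
crops = ['W', 'SB', 'F', 'C']
crop_values = [9, 5, 4, 6]

# Total length of the rotation and months with a crop in the field
total_months = sum(crop_values)
cropped_months = sum(v for c, v in zip(crops, crop_values) if c != 'F')

pct_cropped = 100 * cropped_months / total_months
print(f"{pct_cropped:.0f}%")  # 83%
```

Computing the value keeps the annotation consistent if the crop durations are edited for a different rotation.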
Themes
In addition to the default style, Matplotlib also offers a variety of pre-defined themes. To see some examples visit the following websites:
Gallery 1 at: https://matplotlib.org/gallery/style_sheets/style_sheets_reference.html
Gallery 2 at: https://tonysyu.github.io/raw_content/matplotlib-style-gallery/gallery.html
# Run this line to see all the styling themes available
print(plt.style.available)
['Solarize_Light2', '_classic_test_patch', '_mpl-gallery', '_mpl-gallery-nogrid', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn-v0_8', 'seaborn-v0_8-bright', 'seaborn-v0_8-colorblind', 'seaborn-v0_8-dark', 'seaborn-v0_8-dark-palette', 'seaborn-v0_8-darkgrid', 'seaborn-v0_8-deep', 'seaborn-v0_8-muted', 'seaborn-v0_8-notebook', 'seaborn-v0_8-paper', 'seaborn-v0_8-pastel', 'seaborn-v0_8-poster', 'seaborn-v0_8-talk', 'seaborn-v0_8-ticks', 'seaborn-v0_8-white', 'seaborn-v0_8-whitegrid', 'tableau-colorblind10']
# Change plot defaults to ggplot style (similar to ggplot R language library)
# Use plt.style.use('default') to revert style
plt.style.use('ggplot')

plt.figure(figsize=(8,4))
plt.plot(df["datetime"], df["srad"], '-g')
plt.ylabel(r"Solar radiation $MJ \ m^{-2} \ day^{-1}$")
plt.show()
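Since plt.style.use() changes the style for every figure that follows, it can be convenient to apply a theme to a single figure only. The sketch below uses plt.style.context for that purpose with made-up data. The axes.grid setting is read only to show that the style is active inside the block (the ggplot theme turns the grid on) and is restored afterwards.

```python
import matplotlib.pyplot as plt

grid_before = plt.rcParams['axes.grid']

# The style is applied only to figures created inside the "with" block
with plt.style.context('ggplot'):
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot([1, 2, 3, 4], [10, 20, 25, 30], '-g')
    ax.set_title('ggplot style')
    grid_inside = plt.rcParams['axes.grid']

# Settings revert to their previous values afterwards
grid_after = plt.rcParams['axes.grid']
print(grid_inside, grid_after == grid_before)
```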
Bokeh module
The Bokeh plotting library was designed for creating interactive visualizations for modern web browsers. Compared to Matplotlib, which excels in creating static plots, Bokeh emphasizes interactivity, offering tools to create dynamic and interactive graphics. Additionally, Bokeh integrates well with the Pandas library and provides a consistent and standardized syntax. This focus on interactivity and ease of use makes Bokeh well suited for web-based data visualizations and applications.
Unlike Matplotlib, where you typically import the entire library with a single command, Bokeh is organized into various sub-modules catered to different functionalities. This structure means that you import specific components from their respective modules, which aligns with the functionality you intend to use. While this might require a bit more upfront learning about the library, it also means that you are only importing what you need.
Interactive line plot
# Import Bokeh modules
from bokeh.plotting import figure, show
from bokeh.models import HoverTool

# Create figure
p = figure(width=700, height=350, title='Kings Creek', x_axis_type='datetime')

# Add line to the figure
p.line(df['datetime'], df['tmin'], line_color='blue', line_width=2, legend_label='Tmin')

# Add another line. In this case I used a different, but equivalent, syntax
# This syntax leverages the dataframe and its column names
p.line(source=df, x='datetime', y='tmax', line_color='tomato', line_width=2, legend_label='Tmax')

# Customize figure properties
p.xaxis.axis_label = 'Time'
p.yaxis.axis_label = 'Air Temperature (Celsius)'

# Set the font size of the x-axis and y-axis labels
p.yaxis.axis_label_text_font_size = '12pt' # Defined as a string using points similar to Word
p.xaxis.axis_label_text_font_size = '12pt'

# Set up the size of the labels in the major ticks
p.xaxis.major_label_text_font_size = '12pt'
p.yaxis.major_label_text_font_size = '12pt'

# Add legend
p.legend.location = "top_left"
p.legend.title = "Legend"
p.legend.label_text_font_style = "italic"
p.legend.label_text_color = "black"
p.legend.border_line_width = 1
p.legend.border_line_color = "navy"
p.legend.border_line_alpha = 0.8
p.legend.background_fill_color = "white"
p.legend.background_fill_alpha = 0.9

# Hover tool for interactive tooltips on mouse hover over the plot
p.add_tools(HoverTool(tooltips=[("Date:", "$x{%F}"),("Temperature:","$y{%0.1f} Celsius")],
                      formatters={'$x':'datetime', '$y':'printf'},
                      mode='mouse'))

# Display figure
show(p)
Seaborn module
Seaborn is a plotting library based on Matplotlib, specifically tailored for statistical data visualization. It stands out in scientific research for its ability to create informative and attractive statistical graphics with ease. Seaborn integrates well with Pandas DataFrames and its default styles and color palettes are designed to be aesthetically pleasing and ready for publication. Seaborn offers complex visualizations like heatmaps, violin plots, boxplots, and matrix scatter plots.
# Import Seaborn module
import seaborn as sns
# Use the following style for a matplotlib-like style
# Comment line to have a ggplot-like style
sns.set_theme(style="ticks")
Line plot
# Basic line plot
df_subset = df[['datetime', 'vwc_5cm', 'vwc_20cm', 'vwc_40cm']].copy()
df_subset['year'] = df_subset['datetime'].dt.year

plt.figure(figsize=(8,4))
sns.lineplot(data=df_subset, x='datetime', y='vwc_5cm',
             hue='year', palette=['tomato','navy']) # Can also use palette='Set1'
plt.ylabel('Volumetric water content')
plt.show()
Correlation matrix
# Compute the correlation matrix
corr = df.corr(numeric_only=True)

# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap='Spectral', vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})
plt.show()
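The mask passed to the heatmap is easier to follow on a small example. The sketch below builds a 3 by 3 correlation matrix from made-up columns (a, b, and c are hypothetical) and prints the boolean mask, where True marks the cells that are hidden.

```python
import numpy as np
import pandas as pd

# Small synthetic table (hypothetical values, for illustration only)
data = pd.DataFrame({'a': [1.0, 2.0, 3.0, 4.0],
                     'b': [2.0, 4.1, 5.9, 8.2],
                     'c': [4.0, 3.0, 2.5, 1.0]})
corr = data.corr()

# True marks the cells to hide: the diagonal and everything above it
mask = np.triu(np.ones_like(corr, dtype=bool))
print(mask)

# Keeping only the unmasked cells leaves the lower triangle
print(corr.mask(mask).round(2))
```

Since a correlation matrix is symmetric and its diagonal is always 1, hiding the upper triangle removes only redundant cells.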
Scatterplot matrix
# Create subset of main dataframe
df_subset = df[['datetime','tavg', 'vpd', 'srad', 'wspd']].copy()
df_subset['year'] = df_subset['datetime'].dt.year

# pairplot is a figure-level function that creates its own figure,
# so a preceding plt.figure() call would only produce an extra empty figure
sns.pairplot(data=df_subset, hue="year", palette="Set2")
plt.show()
Heatmap
Visualization of air and soil temperature at 5, 20, and 40 cm depths on a weekly basis.
# Summarize data by week
df_subset = df[['datetime', 'tavg', 'soiltemp_5cm', 'soiltemp_20cm', 'soiltemp_40cm']].copy()
df_subset['week'] = df['datetime'].dt.isocalendar().week

# Average the values for both years on a weekly basis
df_subset = df_subset.groupby(["week"]).mean(numeric_only=True).round(2)

# Create Heatmap
plt.figure(figsize=(15,3))
sns.heatmap(df_subset.T, annot=False, linewidths=1, cmap="RdBu_r")
plt.show()
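The weekly aggregation behind this heatmap can be checked on its own. The following sketch uses a synthetic two-year temperature series (a made-up sinusoid, not the station data) to show how the ISO week number is extracted and how the grouped table is shaped before it is transposed for the heatmap.

```python
import numpy as np
import pandas as pd

# Two years of synthetic daily temperatures (hypothetical values)
dates = pd.date_range('2022-01-01', '2023-12-31', freq='D')
doy = dates.dayofyear.to_numpy()
data = pd.DataFrame({'datetime': dates,
                     'tavg': 12 - 15 * np.cos(2 * np.pi * doy / 365)})

# ISO week number for each date
data['week'] = data['datetime'].dt.isocalendar().week

# Average both years week by week; the datetime column is dropped
# because numeric_only=True keeps only numeric columns
weekly = data.groupby('week').mean(numeric_only=True).round(2)

# Transposing puts weeks on the columns and variables on the rows
print(weekly.T.shape)
print(weekly.head(3))
```

Note that ISO weeks do not align exactly with calendar years, so the first days of January can belong to the last week of the previous year.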
Boxplot
# Summarize data by month
df_subset = df[['datetime', 'vwc_5cm']].copy()
df_subset['month'] = df['datetime'].dt.month

# Draw a boxplot of soil moisture for each month
plt.figure(figsize=(8,6))
sns.boxplot(data=df_subset, x="month", y="vwc_5cm", hue='month')
sns.despine(offset=10, trim=True)
plt.show()