# Import numpy module
import numpy as np
import matplotlib.pyplot as plt # We will need this for some figures
28 Numpy module
Numpy is a Python library for efficient numerical computing that is widely used in data science and engineering. Numpy provides support for large, multi-dimensional arrays and matrices, along with a collection of high-level functions to operate on these arrays. The true power of Numpy lies in the ability to vectorize computations, which allow for element-wise operations on arrays without the need for explicit loops. This feature enhances performance and efficiency, particularly in data-intensive tasks. Numpy Documentation
Element-wise computations using Numpy arrays
We will start this tutorial by learning about some of the limitations of traditional Python lists. Then, the power of Numpy element-wise (or vectorized) computations will become evident. As much as I like agronomic examples, for the first few exercises I will use some trivial arrays of numbers to keep it simple.
Vectorized or element-wise computations refer to operations that are performed on arrays (vectors, matrices) directly and simultaneously. Instead of processing elements one by one using loops, vectorized operations apply a single action to each element in the array without explicit iteration. This leads to more concise code and often improved performance. Both vectorized
and element-wise
temrs are correct and often are used interchangeably.
Let’s begin by importing the Numpy module, so that we already have it for the entire tutorial.
Product of a regular list by a scalar
# Create a list of elements
= [1,2,3,4]
A
# Multiply the list by a scalar
print(A * 3)
[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
Product of two regular lists with the same shape
# Create list B, with the same size as A
= [5,6,7,8]
B
# Multiply the two lists together. Heads up! This will not work!
print(A*B)
TypeError: can't multiply sequence by non-int of type 'list'
The first operation repeats the list three times, which is probably not exactly what you were expecting. The second example results in an error. Now let’s import the Numpy module and try these operations again to see what happens.
Product of a Numpy array by a scalar
# Create a list of elements
= np.array([1,2,3,4])
A
# Check type of newly created Numpy array
print(type(A))
# Multiply array by a scalar
print(A * 3)
<class 'numpy.ndarray'>
[ 3 6 9 12]
Product of two Numpy arrays with the same shape
Notice that the operation occurs for corresponding elements of each array.
# Re-define the previous B array as a numpy array
= np.array([5,6,7,8])
B
# Multiply the two vectors
print(A * B)
[ 5 12 21 32]
Other operations with Numpy arrays
print(A * 3) # Vector times a scalar
print(A + B)
print(A - B)
print(A * B)
print(A / B)
print(np.sqrt(A**2 + B**2)) # Exponentiation Calculate hypotenuse of multiple rectangle triangle
print(A.sum())
print(B.sum())
[ 3 6 9 12]
[ 6 8 10 12]
[-4 -4 -4 -4]
[ 5 12 21 32]
[0.2 0.33333333 0.42857143 0.5 ]
[5.09901951 6.32455532 7.61577311 8.94427191]
10
26
Example 1: Compute soil water storage for a single field
Soil water storage directly influences plant growth and crop yield since it is an essential processes for transpiration and nutrient uptake. In irrigated cropping systems, monitoring of soil water storage is important for irrigation scheduling and effective water management and conservation.
Assume a field that has a soil profile with five horizons, each measuring: 10, 20, 30, 30, and 30
cm in thickness (so about 1.2 m depth). The volumetric water content for each horizon was determined by soil moisture sensors located at the center of each horizon, hence providing an average moisture value of: 0.350, 0.280, 0.255, 0.210, 0.137
cm^3/cm^3. Based on this information, compute the soil water storage for each soil horizon and the total for the entire soil profile. Recall that the volumetric water content represents the volume of water per unit volume of soil, so 0.350 cm^3/cm^3 is the same as 35% moisture by volume or by thickness of the soil horizon. How much irrigation water does a farmer need to add to reach a field capacity of 420 mm?
# Define variables
= np.array([0.350, 0.280, 0.255, 0.210, 0.137]) # cm^3/cm^3
theta_v = np.array([10, 20, 30, 30, 30]) * 10 # horizons in mm
depths
# Compute water storage for each layer
= theta_v * depths # mm of water per layer
storage_per_layer print('Storage for each layer:', np.round(storage_per_layer,1))
# Compute soil water storage for the entire profile
= np.sum(storage_per_layer)
profile_storage print(f'Total water storage in profile is: {profile_storage:.1f} mm')
= 420 # mm
field_capacity = field_capacity - profile_storage
irrigation_req print(f'Required irrigation: {irrigation_req:.1f} mm')
Storage for each layer: [35. 56. 76.5 63. 41.1]
Total water storage in profile is: 271.6 mm
Required irrigation: 148.4 mm
Since the water storage for the entire soil profile is the weighted sum of the volumetric water content by the thickness of each layer, we can also use the dot
product:
Let’s review a few operations:
depths = np.array([10, 20, 30, 30, 30]) * 10
is the product of a vector and a scalar
storage_per_layer = theta_v * depths
is a vector times another vector
# dot product
print(np.dot(theta_v, depths), 'mm')
271.6 mm
Example 2: Compute soil water storage for multiple fields
Now imagine that we are managing three irrigated fields in the region? Assume that all the fields are nearby and have the same soil horizons, but farmers have different crops and irrigation strategies so they also have different soil moisture contents across the profile. What is the soil water storage of each field? How much water do we need to apply in each field to bring them back to field capacity?
# Example soil moisture for the 5 horizons and three fields
# The first field is the same as in the previous exercise
= np.array([[0.350, 0.280, 0.255, 0.210, 0.137],
theta_v 0.250, 0.380, 0.355, 0.110, 0.250],
[0.150, 0.180, 0.155, 0.110, 0.320]]) # cm^3/cm^3
[
# Compute storage for all horizons
= theta_v * depths
storage_per_layer print(storage_per_layer)
# Compute
= np.sum(storage_per_layer, axis=1) # axis=1 means add along columns
storage_profiles
# Alternatively we can do the dot product
# storage_profiles = np.dot(theta_v, depths)
# Show storage values
print(storage_profiles)
# Irrigation requirements
= field_capacity - storage_profiles
irrigation_req print('Irrigation for each field:', irrigation_req, 'mm')
[[ 35. 56. 76.5 63. 41.1]
[ 25. 76. 106.5 33. 75. ]
[ 15. 36. 46.5 33. 96. ]]
[271.6 315.5 226.5]
Irrigation for each field: [148.4 104.5 193.5] mm
Example 3: Determine the CEC of a soil
The cation exchange capacity (CEC, meq/100 g of soil) of a soil is determined by the nature and amount of clay minerals and organic matter. Compute the CEC of a soil that has 32% clay and 3% organic matter. The clay fraction is represented by 30% kaolinite, 50% montmorillonite, and 20% vermiculite. The CEC for clay minerals and organic matter can be found in most soil fertility textbooks.
# Determine percentage of each clay mineral
= np.array([4]) # percent
om = np.array([200]) # meq/100 g
om_cec
= 32 * np.array([30, 50, 20])/100 # This is the % of each clay type
clay = np.array([10, 100, 140]) # meq/100 g
clay_cec
# Merge the fractions and CEC together into a single aray
= np.concatenate((om, clay))/100 # percent to fraction
all_fractions = np.concatenate((om_cec, clay_cec))
all_cec
print(all_fractions)
print(all_cec)
[0.04 0.096 0.16 0.064]
[200 10 100 140]
# Compute soil CEC as the weighed-sum of its components
= np.sum(all_cec * all_fractions)
soil_cec print(f'The soil CEC is: {soil_cec:.1f} meq/100 g of soil')
The soil CEC is: 33.9 meq/100 g of soil
Let’s review a few operations:
np.array([30, 50, 20])/100
is a vector divided by a scalar.
all_cec * all_fractions
is a vector times another vector
Create arrays with specific data types
# An alternative by specifying the data type
print(np.array([1,2,3,4], dtype="int64"))
print(np.array([1,2,3,4], dtype="float64"))
[1 2 3 4]
[1. 2. 3. 4.]
A common pitfall among beginners is to create a numpy array only using parentheses like this: array = np.array(1,2,3,4)
. This will not work.
Operations with two-dimensional arrays
# Define arrays
# The values in M and v were arbitrarily selected
# so that the operations result in round numbers for clarity. You can change them.
= np.array([ [10,2,1], [25,6,55] ]) # 2D matrix
M = np.array([0.2, 0.5, 1]) # 1D vector v
# Access Numpy array shape and size properties
print(M.shape) # rows and columns
print(M.size) # Total number of elements
(2, 3)
6
# Element-wise multiplication
print('Matrix by a scalar')
print(M*2)
print('Matrix by a vector with same number of columns')
print(M * v)
print('Matrix by matrix of the same size')
print(M*M)
Matrix by a scalar
[[ 20 4 2]
[ 50 12 110]]
Matrix by a vector with same number of columns
[[ 2. 1. 1.]
[ 5. 3. 55.]]
Matrix by matrix of the same size
[[ 100 4 1]
[ 625 36 3025]]
# Dot product (useful for linear algebra operations)
# In this case the dot product is row-wise (sum of Matrix by a vector, see above)
print('Dot product operation')
np.dot(M,v)
Dot product operation
array([ 4., 63.])
Reshape arrays
# Reshape M array (original 2 rows 3 columns)
print(M)
# Reshape to 3 rows and 2 columns
print(M.reshape(3,2))
# Reshape to 1 row and 0 columns (1D array)
print(M.reshape(6,1))
print(M.reshape(6,1).shape) # Check the shape
# Similar
[[10 2 1]
[25 6 55]]
[[10 2]
[ 1 25]
[ 6 55]]
[[10]
[ 2]
[ 1]
[25]
[ 6]
[55]]
(6, 1)
Numpy boolean operations
= np.array([1,2,3,4]) == 3
expr1 print(expr1)
= np.array([1,2,3,4]) == 2
expr2 print(expr2)
[False False True False]
[False True False False]
# Elements in both vectors need to match to be considered True
print(expr1 & expr2) # print(expr1 and expr2)
# It is sufficient with a single match in one of the vectors
print(expr1 | expr2) # print(expr1 or expr2)
[False False False False]
[False True True False]
Flattening
Sometimes we want to serialize our 2D or 3D matrix into a long one-dimensional vector. This operation is called “flattening” and Numpy has a specific method to carry this operation that is called flattening and array.
# Flatten two-dimensional array M
M.flatten()
array([10, 2, 1, 25, 6, 55])
Use the Numpy random module to create a random image
To show some of the power of working with Numpy arrays we will create a random image. Images in the RGB (red-green-blue) color space are represented by three matrices that encode the color of each pixel. Colors are represented with an integer between 0 and 255. This is called an 8-bit integer, and there are only 2^8 = 256 possible integers. Because each image has three bands (one for red, one for green, and one for blue) there is a total of 17 million (256x256x256 or 256^3) possible colors for each pixel.
# Define image size (keep it small so that you can see each pixel)
= 20
rows = 30
cols
# Set seed for reproducibility.
# Everyone will obtain the same random numbers
1)
np.random.seed(
# Create image bands
# uint8 means unsigned interger of 8-bits
= np.random.randint(0, 255, (rows, cols), dtype='uint8')
R = np.random.randint(0, 255, (rows, cols), dtype='uint8')
G = np.random.randint(0, 255, (rows, cols), dtype='uint8')
B
# Stack image bands (axis=2 means along the third dimension, or on top of each other)
= np.stack( (R,G,B), axis=2)
RGB print('Image size:', RGB.shape) # Shape of the RGB variable
# Display image using the matplotlib library (we imported this at the top)
=(8,8))
plt.figure(figsize
plt.imshow(RGB)'off') # Mute this line to see what the image looks like without it.
plt.axis( plt.show()
Image size: (20, 30, 3)
Numpy handy functions
# Generate a range of integers
# np.arange(start,stop,step)
print('range()')
print(np.arange(0,100,10))
# Generate linear range
# numpy.linspace(start, stop, num=50, endpoint=True)
print('')
print('linspace()')
print(np.linspace(0, 10, 5))
# Array of zeros
print('')
print('zeros()')
print(np.zeros([5,3]))
# Array of ones
print('')
print('ones()')
print(np.ones([4,3]))
# Array of NaN values
print('')
print('full()')
print(np.full([4,3], np.nan)) # This also worksprint(np.ones([4,3])*np.nan)
# Meshgrid (first create 1D vectors, then create a 2D mesh)
= 5
N = np.linspace(36, 40, N)
lat = np.linspace(-102, -98, N)
lon = np.meshgrid(lat,lon)
LAT,LON print('')
print('Grid of latitudes')
print(LAT)
print('')
print('Grid of longitudes')
print(LON)
range()
[ 0 10 20 30 40 50 60 70 80 90]
linspace()
[ 0. 2.5 5. 7.5 10. ]
zeros()
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
ones()
[[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]
[1. 1. 1.]]
full()
[[nan nan nan]
[nan nan nan]
[nan nan nan]
[nan nan nan]]
Grid of latitudes
[[36. 37. 38. 39. 40.]
[36. 37. 38. 39. 40.]
[36. 37. 38. 39. 40.]
[36. 37. 38. 39. 40.]
[36. 37. 38. 39. 40.]]
Grid of longitudes
[[-102. -102. -102. -102. -102.]
[-101. -101. -101. -101. -101.]
[-100. -100. -100. -100. -100.]
[ -99. -99. -99. -99. -99.]
[ -98. -98. -98. -98. -98.]]
Create a noisy wave
With Numpy we can easily implement models, create timeseries, add noise, and perform trigonometric operations. In this example we will create a synthetic timeseries of air temperature using a cosine wave. To make this more realistic we will also add some noise.
# Set random seed for reproducibility
1)
np.random.seed(
# Define wave inputs
= 15 # Annual average in Celsius
T_avg = 10 # Annual amplitude [Celsius]
A = np.arange(1,366) # Vector of days of the year
doy
# Generate x and y axis
= 2 * np.pi * doy/365 # Convert doy into pi-radians
x = T_avg - A*np.cos(x) # Sine wave
y
# Add random noise
= np.random.normal(0, 3, x.size) # White noise having zero mean
noise = y + noise
y_noisy
# Visualize wave using Matplotlib
=(6,3))
plt.figure(figsize'Noisy temperature timeseries')
plt.title('-k', label="Wave with noise")
plt.plot(doy, y_noisy, '-r', linewidth=2, label="Wave without noise")
plt.plot(doy, y, 'Day of the Year', size=12)
plt.xlabel('Air Temperature (Celsius)', size=12)
plt.ylabel(
plt.legend() plt.show()
Descriptive stats
To finish our tutorial, let’s inpsect Numpy methods to obtain some descriptive statistics for the wave we created earlier.
# Descriptive stats
print('Mean:', y.mean()) # Arithmetic average
print('Standard deviation:', y.std()) # Standard deviation
print('Variance:', y.var()) # Variance
print('Median:', np.median(y)) # Median
print('Minimum:', y.min()) # Minimum
print('Maximum:', y.max()) # Maximum
print('Index of minimum:', y.argmin()) # Position of minimum value
print('Index of maximum:', y.argmax()) # Position of maximum value
print('50th percentile:', np.percentile(y, 50)) # 50th percentile (should equal to the median)
print('5th and 95th percentiles:', np.percentile(y, [5,95])) # 5th and 95th percentile
Mean: 15.0
Standard deviation: 7.0710678118654755
Variance: 50.0
Median: 15.000000000000007
Minimum: 5.000092602638098
Maximum: 24.999907397361902
Index of minimum: 273
Index of maximum: 90
50th percentile: 15.000000000000007
5th and 95th percentiles: [ 5.12930816 24.87069184]
Reference
Walt, S.V.D., Colbert, S.C. and Varoquaux, G., 2011. The NumPy array: a structure for efficient numerical computation. Computing in science & engineering, 13(2), pp.22-30.