# Import modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gamma
from scipy.optimize import curve_fit77 Rainfall generator
rainfall, weather generator, stochastic
Rainfall weather generators are tools designed to simulate realistic precipitation events based on statistical properties of historical actual rainfall data. These generators often rely on probability distributions to model the frequency, intensity, and duration of rain events within a specific region. By fitting historical rainfall patterns to known statistical distributions, such as the Poisson distribution for occurrence and the Gamma or Log-Normal distributions for rainfall amounts, weather generators can create synthetic rainfall series that mimic the natural variability observed in real-world precipitation records.
Rainfall generators are useful in agricultural and hydrological modeling by providing a way to generate plausible rainfall scenarios to assist researchers and practitioners in planning and decision-making processes, particularly in contexts where understanding the variability and extremes of rainfall is essential.
Read and inspect dataset
# Load sample data
df = pd.read_csv("../datasets/KS_Manhattan_6_SSW.csv",
                 na_values=[-9999], parse_dates=['LST_DATE'], date_format='%Y%m%d')
# Inspect a few rows
df.head(3)| WBANNO | LST_DATE | CRX_VN | LONGITUDE | LATITUDE | T_DAILY_MAX | T_DAILY_MIN | T_DAILY_MEAN | T_DAILY_AVG | P_DAILY_CALC | ... | SOIL_MOISTURE_5_DAILY | SOIL_MOISTURE_10_DAILY | SOIL_MOISTURE_20_DAILY | SOIL_MOISTURE_50_DAILY | SOIL_MOISTURE_100_DAILY | SOIL_TEMP_5_DAILY | SOIL_TEMP_10_DAILY | SOIL_TEMP_20_DAILY | SOIL_TEMP_50_DAILY | SOIL_TEMP_100_DAILY | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 53974 | 2003-10-01 | 1.201 | -96.61 | 39.1 | NaN | NaN | NaN | NaN | NaN | ... | -99.0 | -99.0 | -99.0 | -99 | -99 | NaN | NaN | NaN | NaN | NaN | 
| 1 | 53974 | 2003-10-02 | 1.201 | -96.61 | 39.1 | 18.9 | 2.5 | 10.7 | 11.7 | 0.0 | ... | -99.0 | -99.0 | -99.0 | -99 | -99 | NaN | NaN | NaN | NaN | NaN | 
| 2 | 53974 | 2003-10-03 | 1.201 | -96.61 | 39.1 | 22.6 | 8.1 | 15.4 | 14.8 | 0.0 | ... | -99.0 | -99.0 | -99.0 | -99 | -99 | NaN | NaN | NaN | NaN | NaN | 
3 rows × 28 columns
# Add year, month, and day of the year to summarize data in future steps.
df["YEAR"] = df["LST_DATE"].dt.year
df["MONTH"] = df["LST_DATE"].dt.month
df["DOY"] = df["LST_DATE"].dt.dayofyear
df.head(3)| WBANNO | LST_DATE | CRX_VN | LONGITUDE | LATITUDE | T_DAILY_MAX | T_DAILY_MIN | T_DAILY_MEAN | T_DAILY_AVG | P_DAILY_CALC | ... | SOIL_MOISTURE_50_DAILY | SOIL_MOISTURE_100_DAILY | SOIL_TEMP_5_DAILY | SOIL_TEMP_10_DAILY | SOIL_TEMP_20_DAILY | SOIL_TEMP_50_DAILY | SOIL_TEMP_100_DAILY | YEAR | MONTH | DOY | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 53974 | 2003-10-01 | 1.201 | -96.61 | 39.1 | NaN | NaN | NaN | NaN | NaN | ... | -99 | -99 | NaN | NaN | NaN | NaN | NaN | 2003 | 10 | 274 | 
| 1 | 53974 | 2003-10-02 | 1.201 | -96.61 | 39.1 | 18.9 | 2.5 | 10.7 | 11.7 | 0.0 | ... | -99 | -99 | NaN | NaN | NaN | NaN | NaN | 2003 | 10 | 275 | 
| 2 | 53974 | 2003-10-03 | 1.201 | -96.61 | 39.1 | 22.6 | 8.1 | 15.4 | 14.8 | 0.0 | ... | -99 | -99 | NaN | NaN | NaN | NaN | NaN | 2003 | 10 | 276 | 
3 rows × 31 columns
# Observe trends in precipitation data (2004 to 2016)
plt.figure(figsize=(5,4))
plt.xlim(0,400)
plt.xlabel('Day of the year')
plt.ylabel('Precipitation (mm)')
for year in range(2004,2017):
    idx_year = df["YEAR"] == year
    P_cum = df["P_DAILY_CALC"][idx_year].cumsum()
    plt.plot(range(1,len(P_cum)+1), P_cum)
    
# Observe histograms of precipitation
plt.figure(figsize=(5,4))
plt.xlabel('Annual precipitation (mm)')
plt.ylabel('Density')
for year in range(2004,2017):
    idx_year = df["YEAR"] == year
    rainfall_year = df["P_DAILY_CALC"][idx_year]
    rainfall_year = rainfall_year[rainfall_year > 1]
    plt.hist(rainfall_year, bins=25, density=True)
Modeling rainfall as a Markov chain.
This type of rainfall generators is kwnon as the Richardson-type models (Richardson and Wright, 1984). The model computes the rainfall at time t as a function of time t-1 in three steps. To be consistent with the original manuscript by Richardson and Wright (1984), we will use the term “wet” for a day with measurable rainfall and “dry” for a day without measurable rainfall.
Step 1: Find if the previous day was dry or wet
Step 2: Compute the probability of having a rainfall event on day t given the condition on the previous day. So, here we need to have two different probability distributions, one that we will use if yesterday was dry and another if yesterday was wet.
- What is the probability of having a wet day if yesterday was dry? (i.e. P(W/D) ) 
- What is the probability of having a wet day if yesterday was wet? (i.e. P(W/W) ) 
To accurately simulate seasonal rainfall trends, the model relies on Wet-Dry and Wet-Wet distributions on a monthly basis.
Steps 1 and 2 deal with the probability of rainfall occurrence, not the amount of rainfall. The next step will help us determine the amout of rainfall on wet days.
Step 3: If as a consequence of the random process we obtain that on day t there is a rainfall event (i.e., day t is a wet day), then we have to compute the amount. To do this we will use a gamma distribution, which is suited to heavily skewed distributions, such as those resulting from histograms of daily rainfall (see previous histogram figure). To complete this step were need to have some parameters that describe this rainfall amount distribution on a monthly basis. These parameters are often determined by fitting the gamma distribution to historical rainfall data.
We will first run an example using the loaded dataset to learn how to compute the Wet-Dry and Wet-Wet probabilities for the entire year. It’s important that we first learn how to compute the simplest step before moving into a more detailed characterization of the rainfall process on a monthly basis.
Determine probability of rainfall occurrence by month
The first step of this process is to characterize the probability functions using observed data.
Probability of Wet-Dry and Wet-Wet days
# Initialize empty arrays
WW = []
WD = []
month_WW = []
month_WD = []
# Create shorter variable names
months = df["MONTH"].values
rainfall = df["P_DAILY_CALC"].values
for i in range(1,df.shape[0]):
    
    # Determine if yesterday was Wet or Dry
    # Append "1" if today was Wet, else append "0"
    if rainfall[i-1] > 0:
        
        month_WW.append(months[i])
        if rainfall[i] > 0:
            WW.append(1)
        else:
            WW.append(0)
    
    elif rainfall[i-1] == 0:
        
        month_WD.append(months[i])
        if rainfall[i] > 0:
            WD.append(1)
        else:
            WD.append(0)
print(sum(WW)/len(WW))
print(sum(WD)/len(WD))0.4166666666666667
0.20590604026845638Create table with probabilities by month
# Wet/Wet table
df_WW = pd.DataFrame({'month':month_WW, 'WW':WW})
# Wet/Dry
df_WD = pd.DataFrame({'month':month_WD, 'WD':WD})
# Compute monthly probabilities per month
monthly_WW = df_WW.groupby("month").sum() / df_WW.groupby("month").count()
monthly_WD = df_WD.groupby("month").sum() / df_WD.groupby("month").count()
# Create single table of parameters
coeff = pd.concat([monthly_WW, monthly_WD], axis=1)
coeff| WW | WD | |
|---|---|---|
| month | ||
| 1 | 0.338028 | 0.141176 | 
| 2 | 0.320513 | 0.161184 | 
| 3 | 0.401709 | 0.219048 | 
| 4 | 0.503448 | 0.273063 | 
| 5 | 0.480769 | 0.290909 | 
| 6 | 0.442953 | 0.293680 | 
| 7 | 0.415254 | 0.228571 | 
| 8 | 0.412214 | 0.245847 | 
| 9 | 0.362745 | 0.201258 | 
| 10 | 0.409524 | 0.193353 | 
| 11 | 0.388889 | 0.129794 | 
| 12 | 0.381579 | 0.144092 | 
Determine probability of daily rainfall amount
# Fit gamma distribution to each month
monthly_shape = []
monthly_scale = []
for m in range(1,13):
    idx_month = df["MONTH"] == m
    x = df["P_DAILY_CALC"].loc[idx_month]
    x = x[~np.isnan(x)] # Save only non NAN values
    x = x[x>0] # Save only positive rainfall values
    shape, loc, scale = gamma.fit(x, floc=0)
    monthly_shape.append(shape)
    monthly_scale.append(scale)
# Append amount parameters to monthly lookup table
coeff["shape"] = monthly_shape
coeff["scale"] = monthly_scale
coeff.head(12)| WW | WD | shape | scale | |
|---|---|---|---|---|
| month | ||||
| 1 | 0.338028 | 0.141176 | 0.748257 | 4.166760 | 
| 2 | 0.320513 | 0.161184 | 0.678656 | 8.142081 | 
| 3 | 0.401709 | 0.219048 | 0.632519 | 11.092325 | 
| 4 | 0.503448 | 0.273063 | 0.589697 | 15.483557 | 
| 5 | 0.480769 | 0.290909 | 0.630033 | 13.661336 | 
| 6 | 0.442953 | 0.293680 | 0.599585 | 22.291664 | 
| 7 | 0.415254 | 0.228571 | 0.677182 | 16.648971 | 
| 8 | 0.412214 | 0.245847 | 0.625811 | 21.885339 | 
| 9 | 0.362745 | 0.201258 | 0.585308 | 15.593047 | 
| 10 | 0.409524 | 0.193353 | 0.659315 | 11.230858 | 
| 11 | 0.388889 | 0.129794 | 0.628394 | 10.430029 | 
| 12 | 0.381579 | 0.144092 | 0.668791 | 9.900732 | 
Check that our gamma function can generate sound rainfall distributions. Note that the figure below is the probability density function (pdf), not the actual rainfall amount. The y-axis contains density information while the x-axis contains rainfall data.
# Show frequency of rainfall amount for last month of the fitting
a,b,c = gamma.fit(x, floc=0)
plt.figure(figsize=(5,4))
plt.hist(x, bins='scott', density=True)
rainfall_range = np.linspace(1, 50, 1000)
plt.plot(rainfall_range, gamma.pdf(rainfall_range,a,b,c))
plt.xlabel('Daily precipitation (mm)')
plt.ylabel('Density')
plt.show()
Build rainfall generator
Now that we have all the distribution properties we can implement the rainfall generator.
# Define rainfall simulator
def rainfall_gen(dates,coeff):
    
    P = np.ones(dates.shape[0])*np.nan
    P[0] = 0
    for t in range(1,dates.shape[0]):
        month = dates.month[t]
        if P[t-1] == 0:
            if np.random.rand() > coeff["WD"][month]:
                P[t] = 0
            else:
                P[t] = gamma.rvs(coeff["shape"][month],0,coeff["scale"][month])
                P[t] = np.round(P[t]*10)/10
        elif P[t-1] > 0:
            if np.random.rand() > coeff["WW"][month]:
                P[t] = 0
            else:
                P[t] = gamma.rvs(coeff["shape"][month],0,coeff["scale"][month])
                P[t] = np.round(P[t]*10)/10
    P_total = P.sum()
    return PCreate rainfall scenarios with the generator
# Create an example set of dates
dates = pd.date_range("2018-01-01","2018-12-31",freq="D")
# Call rainfall generator multiple times and plot cumulative rainfall
plt.figure(figsize=(5,4))
for i in range(20):
    P = rainfall_gen(dates,coeff)
    plt.plot(P.cumsum())
    
plt.title("Rainfall generator")
plt.xlabel("DOY")
plt.ylabel("Rainfall amount (mm)")
plt.show()
# Plot the last iteration
plt.figure(figsize=(10,3))
plt.bar(dates,P)
plt.title("Daily rainfall") # for last year of the previous simulation
plt.ylabel("Rainfall amount (mm)")
plt.show()
References
Intergernmental Panel on Climate Change Data Distribution Centre: https://www.ipcc-data.org/guidelines/pages/weather_generators.html
Jones, P., Harpham, C., Kilsby, C., Glenis, V. and Burton, A., 2010. UK Climate Projections science report: Projections of future daily climate for the UK from the Weather Generator.
Kilsby, C.G., Jones, P.D., Burton, A., Ford, A.C., Fowler, H.J., Harpham, C., James, P., Smith, A. and Wilby, R.L., 2007. A daily weather generator for use in climate change studies. Environmental Modelling & Software, 22(12), pp.1705-1719.
Richardson, C.W., 1982. Dependence structure of daily temperature and solar radiation. Transactions of the ASAE, 25(3), pp.735-0739.
Richardson, C.W. and Wright, D.A., 1984. WGEN: A model for generating daily weather variables.