# Import modules
import numpy as np
57 Basic statistical tests
Power analysis
A power analysis is an important concept in experimental design and hypothesis testing, as it helps researchers determine the appropriate sample size needed to detect a significant effect, if it exists. Without adequate power, studies may fail to detect genuine effects, leading to inconclusive or misleading results.
In this context, pilot experiments are essential to quantify the approximate variability of the response variable. In the example below, our goal is to identify how many downward-facing images we need to collect in a field to compute the percentage of green canopy cover within a 5% margin of error at 95% confidence. The provided values are the result of an exploratory field data collection to quantify the variability of the field. The power analysis rests on the notion that:
n = \Bigg( \frac{1.96 \sigma}{\delta} \Bigg)^2
where n is the number of samples to be collected, \delta is the margin of error at 95% confidence, 1.96 is the corresponding z-score, and \sigma is the standard deviation (which must be known in advance, e.g., from a pilot experiment). For more insights, check the manuscript by Patrignani and Ochsner (2015).
# Values from a pilot experiment
canopy_cover = np.array([25, 16, 19, 23, 12]) # Percent canopy cover

# Define parameters
z_score = 1.96 # z-score for a 95% confidence level
margin_of_error = 5 # Margin of error
standard_deviation = np.std(canopy_cover)

# Calculate sample size
sample_size = (z_score * standard_deviation / margin_of_error) ** 2
print("Sample size needed:", round(sample_size))
Sample size needed: 3
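Since the computed n is a minimum sample size, a more conservative convention is to round up with `np.ceil()` rather than to the nearest integer; rounding down can leave the study slightly under-powered. A minimal sketch reusing the pilot values above:

```python
import numpy as np

# Pilot values from the example above
canopy_cover = np.array([25, 16, 19, 23, 12])
z_score = 1.96        # z-score for a 95% confidence level
margin_of_error = 5   # percent canopy cover

sample_size = (z_score * np.std(canopy_cover) / margin_of_error) ** 2

# Rounding up guarantees the target margin of error is met
print(int(np.ceil(sample_size)))  # 4
```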
One-sample T-test
The scipy.stats module includes functions like ttest_1samp that enable researchers to determine whether the mean of a sample significantly differs from a known population mean. For instance, suppose a farmer wants to assess whether new wheat varieties significantly increase the average yield compared to the historical average yield of the farm. By collecting yield data from the different wheat varieties and using ttest_1samp, the farmer can infer whether the observed yield increase in the fields with the new varieties is statistically significant.
Question: Do newer wheat varieties have a statistically significant performance compared to the existing varieties?
# Import modules
import pandas as pd
from scipy import stats
# Read dataset
df_new_varieties = pd.read_csv('../datasets/wheat_variety_trial_2023_greeley_county.csv', skiprows=[0])
df_new_varieties.head(3)
| | brand | name | yield_bu_ac |
|---|---|---|---|
| 0 | LIMAGRAIN | LCH19DH-152-6 | 50.7 |
| 1 | POLANSKY | ROCKSTAR | 45.2 |
| 2 | KWA | KS WESTERN STAR | 44.5 |
# Define historical average yield of farm
historical_mean_yield = 8 # bu/ac
# Estimate mean of all new varieties
df_new_varieties['yield_bu_ac'].mean().round()
34.0
Two-sided test
Null hypothesis: the mean yield of the new varieties is the same as the historical field yield (popmean).
Alternative hypothesis: the mean yield of the new varieties is different from the historical field yield (popmean).
# Perform a one-sample t-test for all new varieties combined against the historical mean
results_two_sided = stats.ttest_1samp(df_new_varieties['yield_bu_ac'],
                                      historical_mean_yield,
                                      alternative='two-sided')
print(f"t_statistic: {results_two_sided[0]}")
print(f"p_value: {results_two_sided[1]}")
t_statistic: 22.10361576051108
p_value: 5.9931027593421575e-21
# Define significance level
alpha = 0.05
# Find whether we accept or reject the null hypothesis
if results_two_sided[1] < alpha:
    print("""We reject the null hypothesis in favor of the alternative hypothesis that
the mean yield of new varieties is statistically different from the historical yield.""")
else:
    print("""We accept the null hypothesis. There is no statistically significant evidence that the combined
mean yield of new varieties differs from the historical yield.""")
We reject the null hypothesis in favor of the alternative hypothesis that
the mean yield of new varieties is statistically different from the historical yield.
Two-sided confidence intervals
# Compute 95% confidence intervals
ci = results_two_sided.confidence_interval(confidence_level=0.95)
print(ci)
ConfidenceInterval(low=31.6727760618944, high=36.47873908962076)
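The two-sided decision can equivalently be read from the confidence interval: if the hypothesized population mean falls outside the 95% interval, the test rejects the null hypothesis at alpha=0.05. Here 8 bu/ac is well below the lower bound of about 31.7 bu/ac. A self-contained sketch with hypothetical yield values (not the trial data):

```python
from scipy import stats
import numpy as np

# Hypothetical yield sample in bu/ac (for illustration only)
yields = np.array([50.7, 45.2, 44.5, 38.1, 41.9])
popmean = 8

res = stats.ttest_1samp(yields, popmean, alternative='two-sided')
ci = res.confidence_interval(confidence_level=0.95)

# Rejecting at alpha=0.05 is equivalent to popmean lying outside the 95% CI
rejects = res.pvalue < 0.05
outside_ci = (popmean < ci.low) or (popmean > ci.high)
print(rejects, outside_ci)  # True True
```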
One-sided test (less option)
Null hypothesis: The mean of all new varieties is not less than the historical yield.
Alternative hypothesis: The mean of all new varieties is less than the historical yield.
results_one_sided = stats.ttest_1samp(df_new_varieties['yield_bu_ac'],
                                      popmean=historical_mean_yield,
                                      alternative='less')

print(results_one_sided)
TtestResult(statistic=22.10361576051108, pvalue=1.0, df=32)
if results_one_sided[1] < alpha:
    print("""We reject the null hypothesis in favor of the alternative hypothesis that
the mean of the new varieties is less than the historical yield""")
else:
    print("""We accept the null hypothesis that the mean of the new varieties
is not less than the historical yield""")
We accept the null hypothesis that the mean of the new varieties
is not less than the historical yield
One-sided test (greater)
Null hypothesis: The mean of all new varieties is not greater than the historical yield.
Alternative hypothesis: The mean of all new varieties is greater than the historical yield.
results_one_sided = stats.ttest_1samp(df_new_varieties['yield_bu_ac'],
                                      popmean=historical_mean_yield,
                                      alternative='greater')

print(results_one_sided)
TtestResult(statistic=22.10361576051108, pvalue=2.9965513796710787e-21, df=32)
if results_one_sided[1] < alpha:
    print("""We reject the null hypothesis in favor of the alternative hypothesis that
the mean of the new varieties is greater than the historical yield""")
else:
    print("""We accept the null hypothesis that the mean of the new varieties
is not greater than the historical yield""")
We reject the null hypothesis in favor of the alternative hypothesis that
the mean of the new varieties is greater than the historical yield
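Note how the one-sided p-values relate to the two-sided one: because the t-statistic is positive, the 'greater' p-value is half the two-sided p-value (2.9966e-21 = 5.9931e-21 / 2), while the 'less' p-value is essentially 1. A quick check with a hypothetical sample (not the trial data):

```python
from scipy import stats
import numpy as np

# Hypothetical sample with mean clearly above the hypothesized popmean
sample = np.array([12.1, 11.8, 12.5, 13.0, 12.2, 11.9])
popmean = 10

p_two = stats.ttest_1samp(sample, popmean, alternative='two-sided').pvalue
p_greater = stats.ttest_1samp(sample, popmean, alternative='greater').pvalue

# For a positive t-statistic, the 'greater' p-value is half the two-sided one
print(np.isclose(p_greater, p_two / 2))  # True
```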
Two-sample T-test
This analysis calculates the t-test for the means of two independent samples. In this exercise we will evaluate whether the mean grain yields of two different corn hybrids are statistically different from each other.
Null hypothesis: The mean yields of the two corn hybrids are not statistically different from each other.
Alternative hypothesis: The mean yields of the two corn hybrids are statistically different.
# Import modules
import pandas as pd
from scipy import stats
# Read dataset
df = pd.read_csv('../datasets/corn_dryland_trial.csv', skiprows=[0])
df.head(3)
| | brand | replication | yield_bu_ac | lodging_perc |
|---|---|---|---|---|
| 0 | LEWIS | 1 | 172.9 | 47 |
| 1 | LEWIS | 2 | 218.3 | 20 |
| 2 | LEWIS | 3 | 196.8 | 40 |
# Find mean for each brand
df.groupby(by='brand').mean()
| brand | replication | yield_bu_ac | lodging_perc |
|---|---|---|---|
| LEWIS | 3.0 | 194.46 | 43.4 |
| NK | 3.0 | 170.40 | 61.0 |
# Get yield values for each group
idx_group_1 = df['brand'] == 'LEWIS'
values_group_1 = df.loc[idx_group_1, 'yield_bu_ac']

idx_group_2 = df['brand'] == 'NK'
values_group_2 = df.loc[idx_group_2, 'yield_bu_ac']
# Print statistical results
t_statistic, p_value = stats.ttest_ind(values_group_1, values_group_2)
print(t_statistic, p_value)
2.1525722332699373 0.06352024499469316
# Define significance level
alpha = 0.05

if p_value < alpha:
    print("""We reject the null hypothesis in favor of the alternative hypothesis that
the mean of the two hybrids are statistically different from each other.""")
else:
    print("""We accept the null hypothesis that the mean of the two hybrids
are equal.""")
We accept the null hypothesis that the mean of the two hybrids
are equal.
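By default, ttest_ind assumes the two groups have equal variances (Student's t-test). When the variances differ, passing equal_var=False runs Welch's t-test, which does not make that assumption and is often the safer choice. A sketch with hypothetical yield values (not the trial data):

```python
from scipy import stats
import numpy as np

# Hypothetical yields in bu/ac for two groups (for illustration only)
group_a = np.array([172.9, 218.3, 196.8, 188.4, 195.9])
group_b = np.array([160.2, 175.4, 168.1, 171.5, 176.8])

# Welch's t-test: equal_var=False drops the equal-variance assumption
t_welch, p_welch = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t_welch, p_welch)
```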
We can plot the data using a boxplot and inspect if the notches (which represent the 95% confidence interval of the median) overlap. If they do, then that suggests that the median values are not statistically different.
df.boxplot(figsize=(4,5), column='yield_bu_ac', by='brand',
           notch=True, showmeans=True, ylabel='Yield (bu/ac)');
58 Analysis of Variance
The goal of analysis of variance (ANOVA) is to assess whether there are statistically significant differences between the means of three or more groups. It helps to determine whether any observed differences in the group means are likely to be genuine or simply due to random variation.
A one-way ANOVA is used to analyze the effect of a single categorical factor on a continuous variable.
A two-way ANOVA is used to assess the influence of two categorical factors simultaneously on a continuous variable. It allows for examining both main effects of each factor as well as their interaction effect.
In this exercise we will use a dataset of corn yield for different nitrogen fertilizer treatments across multiple US states. The dataset is a subset of the study published by Tremblay et al. (2012) (see references for more details) and was obtained from http://www.nue.okstate.edu/.
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.multicomp import MultiComparison
# Load data
df = pd.read_csv("../datasets/corn_nue_multiple_locs.csv")
df.head()
| | Year | State | Site | Textural_class | Replications | Treatments | N_Planting_kg_ha | N_Sidedress_kg_ha | N_Total_kg_ha | Yield_T_ha |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2006 | Illinois | Pad | Silt loam | 1 | 1 | 0 | 0 | 0 | 3.26 |
| 1 | 2006 | Illinois | Pad | Silt loam | 1 | 3 | 36 | 0 | 36 | 4.15 |
| 2 | 2006 | Illinois | Pad | Silt loam | 1 | 5 | 36 | 54 | 90 | 8.64 |
| 3 | 2006 | Illinois | Pad | Silt loam | 1 | 7 | 36 | 107 | 143 | 10.52 |
| 4 | 2006 | Illinois | Pad | Silt loam | 1 | 9 | 36 | 161 | 197 | 11.47 |
# Print some useful properties of the dataset
print(df['Site'].unique()) # Locations
print(df['Treatments'].unique()) # Treatments
print(df['Replications'].unique()) # Replications
print(df.shape)
['Pad' 'Dixon' 'Manhattan' 'Copeland' 'Diederick']
[1 3 5 7 9]
[1 2 3 4]
(120, 10)
# Examine yield data using boxplots for all locations combined
df.boxplot(figsize=(5,5), column='Yield_T_ha', by='N_Total_kg_ha')
plt.show()
# Examine yield by state
df.boxplot(figsize=(4,5), column='Yield_T_ha', by='State')
plt.show()
# Examine yield by site
df.boxplot(figsize=(4,5), column='Yield_T_ha', by='Site')
plt.show()
ANOVA assumptions
Samples drawn from a population are normally distributed. Test: Shapiro-Wilk
Samples drawn from all populations have (approximately) the same variance. This property is called homoscedasticity or homogeneity of variances. Tests: Bartlett's and Levene's tests.
Samples are independent of each other. Test: No test. Here we rely on the nature of the variable being observed and the experimental design.
# Test the assumption of normality
# Shapiro-Wilk's null hypothesis: data was obtained from a normal distribution
stats.shapiro(df['Yield_T_ha'])
ShapiroResult(statistic=0.9829135537147522, pvalue=0.1326330304145813)
# Test for homogeneity of variance
# Bartlett's null hypothesis: all the groups have equal variance
D = {}
for tmt in df['Treatments'].unique():
    idx_tmt = df['Treatments'] == tmt
    D[tmt] = df.loc[idx_tmt, 'Yield_T_ha'].values

stats.bartlett(D[1], D[3], D[5], D[7], D[9])
BartlettResult(statistic=6.054092711026625, pvalue=0.19514485256182393)
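Bartlett's test is sensitive to departures from normality; Levene's test (listed among the assumptions above) shares the same null hypothesis but is more robust to non-normal data. With the dictionary above it could be run as stats.levene(D[1], D[3], D[5], D[7], D[9]). A self-contained sketch with hypothetical groups (not the trial data):

```python
from scipy import stats
import numpy as np

# Hypothetical yield groups in T/ha, standing in for D[1], D[3], ...
g1 = np.array([3.3, 4.1, 5.0, 3.8])
g2 = np.array([6.2, 7.1, 6.8, 7.5])
g3 = np.array([9.0, 8.4, 9.6, 10.1])

# Levene's null hypothesis: all the groups have equal variance
stat, pvalue = stats.levene(g1, g2, g3)
print(stat, pvalue)
```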
One-way ANOVA
Here we will analyze the response variable using a single predictor. The predictor N_Total_kg_ha will be used as a categorical variable. Alternatively, we could use the Treatments column, but the table is easier to read if we present the values using the actual nitrogen rates, so that we can quickly identify which rates show statistical differences.
Below we will explore the one-way ANOVA using both SciPy (simpler code) and Statsmodels (more complete output).
Using SciPy module
# One-way test
stats.f_oneway(D[1], D[3], D[5], D[7], D[9])
F_onewayResult(statistic=11.431258827879908, pvalue=7.582810948341893e-08)
# Tukey test
print(stats.tukey_hsd(D[1], D[3], D[5], D[7], D[9]))
Tukey's HSD Pairwise Group Comparisons (95.0% Confidence Interval)
Comparison Statistic p-value Lower CI Upper CI
(0 - 1) -1.317 0.519 -3.642 1.007
(0 - 2) -3.036 0.004 -5.361 -0.711
(0 - 3) -4.310 0.000 -6.634 -1.985
(0 - 4) -4.745 0.000 -7.070 -2.421
(1 - 0) 1.317 0.519 -1.007 3.642
(1 - 2) -1.718 0.250 -4.043 0.607
(1 - 3) -2.992 0.005 -5.317 -0.667
(1 - 4) -3.428 0.001 -5.753 -1.103
(2 - 0) 3.036 0.004 0.711 5.361
(2 - 1) 1.718 0.250 -0.607 4.043
(2 - 3) -1.274 0.553 -3.599 1.051
(2 - 4) -1.710 0.255 -4.034 0.615
(3 - 0) 4.310 0.000 1.985 6.634
(3 - 1) 2.992 0.005 0.667 5.317
(3 - 2) 1.274 0.553 -1.051 3.599
(3 - 4) -0.436 0.985 -2.761 1.889
(4 - 0) 4.745 0.000 2.421 7.070
(4 - 1) 3.428 0.001 1.103 5.753
(4 - 2) 1.710 0.255 -0.615 4.034
(4 - 3) 0.436 0.985 -1.889 2.761
Using Statsmodels module
# ANOVA table with statsmodels
formula = 'Yield_T_ha ~ C(N_Total_kg_ha)'
anova_lm(ols(formula, data=df).fit())
| | df | sum_sq | mean_sq | F | PR(>F) |
|---|---|---|---|---|---|
| C(N_Total_kg_ha) | 4.0 | 386.085542 | 96.521385 | 11.431259 | 7.582811e-08 |
| Residual | 115.0 | 971.018108 | 8.443636 | NaN | NaN |
The ANOVA table shows that there are significant differences between treatments. The catch is that we don't know which groups are different. The ANOVA table only tells us that at least one group has a mean value that is significantly different from the rest.
The F column is the F-statistic testing the null hypothesis that the corresponding coefficient is zero. The goal is to compare the variability between groups to the variability within groups, and the F-statistic is simply the ratio of the two (96.52/8.44 = 11.43), where 96.52 is the mean square between groups and 8.44 is the mean square within groups.
The p-value of the F-statistic indicates whether the factor is significant at the 5% significance level, given the other terms in the model.
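The ratio can be verified directly from the mean squares in the ANOVA table above:

```python
# Mean squares from the ANOVA table above
mean_sq_between = 96.521385  # C(N_Total_kg_ha)
mean_sq_within = 8.443636    # Residual

# The F-statistic is the ratio of between-group to within-group variability
F = mean_sq_between / mean_sq_within
print(round(F, 2))  # 11.43, matching the F column of the table
```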
A multiple comparison test of means can help us identify which treatments show significant differences.
# Multicomparison test
groups = MultiComparison(df["Yield_T_ha"], df['N_Total_kg_ha']).tukeyhsd(alpha=0.05)
print(groups)
Multiple Comparison of Means - Tukey HSD, FWER=0.05
===================================================
group1 group2 meandiff p-adj lower upper reject
---------------------------------------------------
0 36 1.3175 0.5192 -1.0073 3.6423 False
0 90 3.0358 0.004 0.711 5.3607 True
0 143 4.3096 0.0 1.9847 6.6344 True
0 197 4.7454 0.0 2.4206 7.0703 True
36 90 1.7183 0.2499 -0.6065 4.0432 False
36 143 2.9921 0.0047 0.6672 5.3169 True
36 197 3.4279 0.0008 1.1031 5.7528 True
90 143 1.2738 0.5527 -1.0511 3.5986 False
90 197 1.7096 0.2547 -0.6153 4.0344 False
143 197 0.4358 0.9852 -1.889 2.7607 False
---------------------------------------------------
# Visualize significantly different groups relative to a specific group
groups.plot_simultaneous(figsize=(5,4), comparison_name=90)
plt.xlabel('Corn yield in T/ha', size=12)
plt.ylabel('Nitrogen Treatments in kg/ha', size=12)
plt.show()
Two-way ANOVA
In this case we will add two predictor variables: soil textural class and total nitrogen applied. In many cases researchers add Location as a proxy for local environmental conditions (including soil) and Year as a proxy for the particular weather conditions during each growing season. Here we have soil textural class available, so we will use that first, since it gives our results broader applicability that, in principle, can be related to soil types elsewhere.
# Two predictors
formula = 'Yield_T_ha ~ C(N_Total_kg_ha) + C(Textural_class)'
anova_lm(ols(formula, data=df).fit())
| | df | sum_sq | mean_sq | F | PR(>F) |
|---|---|---|---|---|---|
| C(N_Total_kg_ha) | 4.0 | 386.085542 | 96.521385 | 11.750059 | 5.022955e-08 |
| C(Textural_class) | 1.0 | 34.560000 | 34.560000 | 4.207172 | 4.254546e-02 |
| Residual | 114.0 | 936.458108 | 8.214545 | NaN | NaN |
Soil textural class was barely significant at the 0.05 level considering all locations and years.
# Two predictors with interaction
formula = 'Yield_T_ha ~ C(N_Total_kg_ha) * C(Textural_class)'
anova_lm(ols(formula, data=df).fit())
| | df | sum_sq | mean_sq | F | PR(>F) |
|---|---|---|---|---|---|
| C(N_Total_kg_ha) | 4.0 | 386.085542 | 96.521385 | 12.530231 | 1.960683e-08 |
| C(Textural_class) | 1.0 | 34.560000 | 34.560000 | 4.486516 | 3.641510e-02 |
| C(N_Total_kg_ha):C(Textural_class) | 4.0 | 89.119178 | 22.279795 | 2.892322 | 2.547188e-02 |
| Residual | 110.0 | 847.338930 | 7.703081 | NaN | NaN |
The interaction of nitrogen rate and textural class was also statistically significant at the 0.05 level.
# Classical ANOVA with Treatment, Location, and Year interactions
formula = 'Yield_T_ha ~ C(N_Total_kg_ha) * State * C(Year)'
anova_lm(ols(formula, data=df).fit())
| | df | sum_sq | mean_sq | F | PR(>F) |
|---|---|---|---|---|---|
| C(N_Total_kg_ha) | 4.0 | 386.085542 | 96.521385 | 20.522835 | 3.270251e-12 |
| State | 2.0 | 234.432785 | 117.216393 | 24.923106 | 1.989675e-09 |
| C(Year) | 2.0 | 112.504702 | 56.252351 | 11.960642 | 2.328310e-05 |
| C(N_Total_kg_ha):State | 8.0 | 140.688973 | 17.586122 | 3.739245 | 7.649526e-04 |
| C(N_Total_kg_ha):C(Year) | 8.0 | 36.595110 | 4.574389 | 0.972628 | 4.621589e-01 |
| State:C(Year) | 4.0 | 3.745613 | 0.936403 | 0.199103 | 9.382617e-01 |
| C(N_Total_kg_ha):State:C(Year) | 16.0 | 14.832950 | 0.927059 | 0.197116 | 9.996634e-01 |
| Residual | 95.0 | 446.796537 | 4.703121 | NaN | NaN |
In a multi-treatment, multi-year, and multi-state study, it is not surprising that treatment, state, and year were highly significant (P<0.01).
Note that the interactions N_Total_kg_ha:Year, State:Year, and N_Total_kg_ha:State:Year were not significant at the P<0.05 level.
References
Patrignani, A., & Ochsner, T. E. (2015). Canopeo: A powerful new tool for measuring fractional green canopy cover. Agronomy journal, 107(6), 2312-2320.
Tremblay, N., Y.M. Bouroubi, C. Bélec, R.W. Mullen, N.R. Kitchen, W.E. Thomason, S. Ebelhar, D.B. Mengel, W.R. Raun, D.D. Francis, E.D. Vories, and I. Ortiz-Monasterio. 2012. Corn response to nitrogen is influenced by soil texture and weather. Agron. J. 104:1658–1671. doi:10.2134/agronj2012.018.