For loops are essential in programming for executing a block of code multiple times, which automates repetitive tasks. They are particularly useful in data science for iterating through data structures like lists and dictionaries. A Python for loop visits the elements of a sequence in order, beginning with the first element. Note that Python numbers positions from 0 rather than 1, so the first element is at index 0.
Syntax
for item in iterable:
    # Code block to execute for each item
Example 1: Basic For loop
Suppose we have a list of soil nitrogen levels from different test sites and we want to print each value. Here’s how you can do it:
# Example of a for loop

# List of soil nitrogen levels in mg/kg
nitrogen_levels = [15, 20, 10, 25, 18]

# Iterating through the list
for level in nitrogen_levels:
    print(f"Soil Nitrogen Level: {level} mg/kg")
Example 2: For loop using enumerate
The enumerate function adds a counter to the loop, providing the index position along with the value. This is helpful when you need to access the position of the elements as you iterate.
Let’s modify the previous example to include the sample number using enumerate:
# Iterating through the list with enumerate
for index, level in enumerate(nitrogen_levels):
    print(f"Sample {index + 1}: Soil Nitrogen Level = {level} mg/kg")
In this example, index represents the position of each element in the list (starting from 0), and level is the nitrogen level. We use index + 1 in the print statement to start the sample numbering from 1 instead of 0.
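As a small variation on the loop above, enumerate also accepts an optional start argument, which removes the need for index + 1. The sketch below reuses the same nitrogen values; the names sample_number and labels are introduced here only for illustration.

```python
# Soil nitrogen levels in mg/kg (same values as the earlier example)
nitrogen_levels = [15, 20, 10, 25, 18]

# start=1 makes the counter begin at 1, so no "+ 1" is needed in the f-string
labels = []  # illustrative list that collects the formatted lines
for sample_number, level in enumerate(nitrogen_levels, start=1):
    labels.append(f"Sample {sample_number}: Soil Nitrogen Level = {level} mg/kg")

print("\n".join(labels))
```

Only the counter changes with start=1. The elements themselves are still visited in the same order.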
Example 3: Combine for loop with if statement
Combining a for loop with if statements gives precise control over data processing and decision-making during iteration. The for loop provides a structured way to iterate over the elements of a collection, such as a list, tuple, or string. When an if statement is nested within the loop, it introduces conditional logic, so the program executes specific blocks of code only when certain criteria are met. This combination is versatile: it can be used for filtering data, for conditional aggregation, and for applying different operations to elements based on specific conditions.
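A minimal sketch of filtering and conditional aggregation in a single loop is shown here. It reuses the soil nitrogen values from Example 1; the threshold of 16 mg/kg is a hypothetical cutoff chosen only for illustration, not an agronomic recommendation.

```python
# Soil nitrogen levels in mg/kg (same values as Example 1)
nitrogen_levels = [15, 20, 10, 25, 18]

# Hypothetical cutoff, assumed only for this illustration
threshold = 16

low_sites = []    # filtering: keep the values below the cutoff
total_above = 0   # conditional aggregation: sum the remaining values

for level in nitrogen_levels:
    if level < threshold:
        low_sites.append(level)
    else:
        total_above += level

print("Levels below threshold:", low_sites)
print("Sum of levels at or above threshold:", total_above)
```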
Example 3a
In this short example we will combine a for loop with if statements to generate the complementary DNA strand by iterating over each nucleotide. The code also detects any character that is not a valid base and reports its position.
# Example of DNA strand
strand = 'ACCTTATCGGC'

# Create an empty complementary strand
strand_c = ''

# Iterate over each base in the DNA strand (a string)
for k, base in enumerate(strand):
    if base == 'A':
        strand_c += 'T'
    elif base == 'T':
        strand_c += 'A'
    elif base == 'C':
        strand_c += 'G'
    elif base == 'G':
        strand_c += 'C'
    else:
        print('Incorrect base', base, 'in position', k+1)

print(strand_c)
TGGAATAGCCG
Try inserting a character that does not represent a DNA nucleotide, or replacing one of the bases in the sequence with such a character.
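One way to try this is sketched below with a strand that contains a deliberately invalid character. This version swaps the if/elif chain for a dictionary lookup, which is an alternative technique rather than the approach used in the example above. The names complement and invalid are introduced here for illustration.

```python
# Lookup table mapping each base to its complement (alternative to if/elif)
complement = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

# Same strand as above, with an 'X' inserted that is not a valid base
strand = 'ACCTXATCGGC'
strand_c = ''
invalid = []  # (position, character) pairs, with positions counted from 1

for k, base in enumerate(strand):
    if base in complement:
        strand_c += complement[base]
    else:
        invalid.append((k + 1, base))
        print('Incorrect base', base, 'in position', k + 1)

print(strand_c)
```

Because the invalid character is skipped, the complementary strand ends up one base shorter than the input.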
Example 3b
In this example we will compute the total number of growing degree days for corn over the period of one week based on daily average air temperatures.
# Define daily temperatures for a week
T_daily = [6, 12, 18, 8, 22, 19, 16]  # degrees Celsius

# Define base temperature for corn
T_base = 8  # degrees Celsius

# Initialize growing degree days accumulator
gdd = 0

# Loop through each day of the week
for T in T_daily:
    if T > T_base:
        gdd_daily = T - T_base
    else:
        gdd_daily = 0

    # Accumulate daily growing degree days
    gdd += gdd_daily

# Output total growing degree days for the week
print(f"Total Growing Degree Days for the Week: {gdd} Celsius-Days")
Total Growing Degree Days for the Week: 47 Celsius-Days
Example 4: For loop using a dictionary
# Record air temperatures for a few cities in Kansas
kansas_weather = {
    "Topeka": {"Record High Temperature": 40, "Date": "July 20, 2003"},
    "Wichita": {"Record High Temperature": 42, "Date": "August 8, 2010"},
    "Lawrence": {"Record High Temperature": 39, "Date": "June 15, 2006"},
    "Manhattan": {"Record High Temperature": 41, "Date": "July 18, 2003"}
}

# Iterating through the dictionary
for city, weather_details in kansas_weather.items():
    print(f"Record Weather in {city}:")
    print(f" High Temperature: {weather_details['Record High Temperature']}°C")
    print(f" Date of Occurrence: {weather_details['Date']}")
Record Weather in Topeka:
High Temperature: 40°C
Date of Occurrence: July 20, 2003
Record Weather in Wichita:
High Temperature: 42°C
Date of Occurrence: August 8, 2010
Record Weather in Lawrence:
High Temperature: 39°C
Date of Occurrence: June 15, 2006
Record Weather in Manhattan:
High Temperature: 41°C
Date of Occurrence: July 18, 2003
The .items() method of a dictionary returns a view object that yields the dictionary’s key-value pairs as tuples. So, we can assign the key to one variable and the value to another variable when defining the for loop.
Think of a view object as a window into the original data structure. It doesn’t create a new copy of the data. View objects are useful because they allow you to work with the data in a flexible and memory-efficient way, and they are especially handy for working with large datasets.
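The "window" behavior can be seen in a short sketch. The records dictionary below is a simplified, hypothetical stand-in that maps each city directly to a temperature. The view is created once and still reflects a key added afterwards.

```python
# Simplified stand-in dictionary: city -> record high temperature in °C
records = {"Topeka": 40, "Wichita": 42}

# Create the view once; no copy of the data is made
items_view = records.items()
count_before = len(items_view)

# Modify the dictionary after the view was created
records["Lawrence"] = 39

# The same view object now sees the new entry
count_after = len(items_view)
print(count_before, count_after)
```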
View objects do not support indexing directly like lists or tuples. If you need to access specific elements by index frequently, you should consider converting the view object to a list or tuple first.
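A minimal sketch of this limitation and the workaround follows. It again uses a simplified, hypothetical records dictionary; the names indexing_worked, items_list, first_city, and first_temp are illustrative.

```python
# Simplified stand-in dictionary: city -> record high temperature in °C
records = {"Topeka": 40, "Wichita": 42, "Lawrence": 39}
items_view = records.items()

# Indexing the view directly raises a TypeError
try:
    items_view[0]
    indexing_worked = True
except TypeError:
    indexing_worked = False

# Converting to a list makes a copy that supports indexing
items_list = list(items_view)
first_city, first_temp = items_list[0]  # unpack the (key, value) tuple

print(indexing_worked)
print(first_city, first_temp)
```

Unlike the view, the list is a snapshot: later changes to the dictionary do not appear in it.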
# Show the content returned by .items()
print(kansas_weather.items())
dict_items([('Topeka', {'Record High Temperature': 40, 'Date': 'July 20, 2003'}), ('Wichita', {'Record High Temperature': 42, 'Date': 'August 8, 2010'}), ('Lawrence', {'Record High Temperature': 39, 'Date': 'June 15, 2006'}), ('Manhattan', {'Record High Temperature': 41, 'Date': 'July 18, 2003'})])
First item: ('Topeka', {'Record High Temperature': 40, 'Date': 'July 20, 2003'})
key: 'Topeka'
value: {'Record High Temperature': 40, 'Date': 'July 20, 2003'}
Example 5: Nested for loops
Imagine we are analyzing soil samples from different fields. Each field has multiple samples, and each sample has various measurements. We’ll use nested for loops to iterate through the fields and then through each measurement in the samples.
# Soil data from multiple fields
soil_data = {
    "Field 1": [
        {"pH": 6.5, "Moisture": 20, "Nitrogen": 3},
        {"pH": 6.8, "Moisture": 22, "Nitrogen": 3.2}
    ],
    "Field 2": [
        {"pH": 7.0, "Moisture": 18, "Nitrogen": 2.8},
        {"pH": 7.1, "Moisture": 19, "Nitrogen": 2.9}
    ]
}

# Iterating through each field
for field, samples in soil_data.items():
    print(f"Data for {field}:")

    # Nested loop to iterate through each sample in the field
    for sample in samples:
        print(f" Sample - pH: {sample['pH']}, Moisture: {sample['Moisture']}%, Nitrogen: {sample['Nitrogen']}%")
Data for Field 1:
Sample - pH: 6.5, Moisture: 20%, Nitrogen: 3%
Sample - pH: 6.8, Moisture: 22%, Nitrogen: 3.2%
Data for Field 2:
Sample - pH: 7.0, Moisture: 18%, Nitrogen: 2.8%
Sample - pH: 7.1, Moisture: 19%, Nitrogen: 2.9%
In this example, soil_data is a dictionary where each key is a field, and the value is a list of soil samples (each sample is a dictionary of measurements). The first for loop iterates over the fields, and the nested loop iterates over the samples within each field, printing out the pH, Moisture, and Nitrogen content for each sample.
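The nesting can go one level deeper when each individual measurement needs to be visited. The sketch below is an illustrative extension, not part of the original example. It adds a third loop over the measurements of each sample and computes the mean of every measurement per field; the names field_means and totals are assumptions made for this illustration.

```python
# Same soil data as in the example above
soil_data = {
    "Field 1": [
        {"pH": 6.5, "Moisture": 20, "Nitrogen": 3},
        {"pH": 6.8, "Moisture": 22, "Nitrogen": 3.2},
    ],
    "Field 2": [
        {"pH": 7.0, "Moisture": 18, "Nitrogen": 2.8},
        {"pH": 7.1, "Moisture": 19, "Nitrogen": 2.9},
    ],
}

field_means = {}  # field name -> {measurement name: mean value}

for field, samples in soil_data.items():          # loop 1: fields
    totals = {}
    for sample in samples:                        # loop 2: samples in the field
        for name, value in sample.items():        # loop 3: measurements in the sample
            totals[name] = totals.get(name, 0) + value
    # Divide each running total by the number of samples in this field
    field_means[field] = {name: total / len(samples) for name, total in totals.items()}

for field, means in field_means.items():
    print(field, means)
```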
Example 6: For loop using break and continue
Imagine we are evaluating crop yields from different fields. We want to stop processing if we encounter a field with exceptionally low yield (signifying a possible data error or a major issue with the field) and skip over fields with average yields to focus on fields with exceptionally high or low yields.
# Crop yield data (in tons per hectare) for different fields
crop_yields = {"Field 1": 2.5, "Field 2": 3.2, "Field 3": 1.0, "Field 4": 3.8, "Field 5": 0.8}

# Thresholds for yield consideration
low_yield_threshold = 1.5
high_yield_threshold = 3.0

for field, yield_data in crop_yields.items():
    if (yield_data < low_yield_threshold) or (yield_data > high_yield_threshold):
        print(f"{field} is a potential outlier: {yield_data} tons/ha")
        break  # Stop processing further as this could indicate a major issue
    else:
        continue
Field 2 is a potential outlier: 3.2 tons/ha
We use break to stop the iteration when we encounter a yield below the low_yield_threshold or above high_yield_threshold, which could indicate an outlier that requires immediate attention.
We use continue to skip to the next iteration without executing any additional code in the loop.
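A variant is sketched here under a different assumption: only an exceptionally low yield stops the loop, while high yields are reported and processing goes on. In this form continue does visible work, because it skips the reporting code for fields with average yields. The flagged list is introduced only for illustration.

```python
# Same crop yield data (in tons per hectare) and thresholds as above
crop_yields = {"Field 1": 2.5, "Field 2": 3.2, "Field 3": 1.0, "Field 4": 3.8, "Field 5": 0.8}
low_yield_threshold = 1.5
high_yield_threshold = 3.0

flagged = []  # illustrative list of fields reported as potential outliers

for field, yield_data in crop_yields.items():
    if low_yield_threshold <= yield_data <= high_yield_threshold:
        continue  # average yield: skip the reporting code below

    flagged.append(field)
    print(f"{field} is a potential outlier: {yield_data} tons/ha")

    if yield_data < low_yield_threshold:
        break  # exceptionally low yield: stop processing further fields
```

With this data the loop skips Field 1, reports Field 2, then reports Field 3 and stops, so Field 4 and Field 5 are never examined.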