14  Data types

Author

Andres Patrignani

Published

January 4, 2024

In Python, data types are categories that determine the kind of value an object can have and the operations that can be performed on it. The most basic data types include integers (int), floating-point numbers (float), strings (str), and booleans (bool). Each of these types serves a specific purpose: integers for whole numbers, floats for decimal numbers, strings for text, and booleans for true/false values. Understanding these data types is crucial as they form the building blocks of Python coding, influencing how you store, handle, and interact with data in your programs.

Integers

plants_per_m2 = 8
print(plants_per_m2)
print(type(plants_per_m2)) # This is class int
8
<class 'int'>

Floating point

rainfall_amount = 3.4 # inches
print(rainfall_amount)
print(type(rainfall_amount)) # This is class float (numbers with decimal places)
3.4
<class 'float'>

Strings

# Strings are defined using single quotes ...
common_name = 'Winter wheat'

# ... or double quotes
scientific_name = "Triticum aestivum"

# ... but do not mix them

print(common_name)
print(scientific_name)

print(type(common_name))
print(type(scientific_name))
Winter wheat
Triticum aestivum
<class 'str'>
<class 'str'>
# For longer blocks that span multiple lines we can use triple quotes (''' or """)

soil_definition = """The layer(s) of generally loose mineral and/or organic material 
that are affected by physical, chemical, and/or biological processes
at or near the planetary surface and usually hold liquids, gases, 
and biota and support plants."""

print(soil_definition)
The layer(s) of generally loose mineral and/or organic material 
that are affected by physical, chemical, and/or biological processes
at or near the planetary surface and usually hold liquids, gases, 
and biota and support plants.
Note

The multi-line string appears as separate lines because of hidden line breaks at the end of each line, like those created when we press the Enter key. These breaks are represented by \n, which is not displayed but can be used to split the long string into individual lines. Try the following line: soil_definition.splitlines(), which for this string returns the same result as soil_definition.split('\n')
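As a minimal sketch of the note above, the following lines use a short, hypothetical two-line string (not the full soil definition) to reveal the hidden line break and to compare both splitting methods.

```python
# Hypothetical short multi-line string used only for illustration
soil_note = """Silt loam
Well drained"""

# repr() shows the hidden line break as \n
print(repr(soil_note))

# Both approaches return the same list for this string
lines_a = soil_note.splitlines()
lines_b = soil_note.split('\n')
print(lines_a)
print(lines_a == lines_b)
```

The two methods differ when the string ends with a line break: 'a\n'.splitlines() returns ['a'], while 'a\n'.split('\n') returns ['a', ''].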

# Split a string
filename = 'corn_riley_2024.csv' # Filename with corn yield data for Riley county in 2024

# Split the string and assign the result to different variables
# This only works if the number of variables on the LHS matches the number of outputs
# Try running filename.split('.') on its own to see the resulting list
base_filename, ext_filename = filename.split('.')
print(base_filename)

# Now we can do the same, but splitting at the underscore
crop, county, year = base_filename.split('_')

print(crop)
corn_riley_2024
corn
Note

The command base_filename, ext_filename = filename.split('.') splits the string at the ., and automatically assigns each of the resulting elements ('corn_riley_2024' and 'csv') to each variable on the left-hand side. If you only run filename.split('.') the result is a list with two elements: ['corn_riley_2024', 'csv']
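The sketch below illustrates what can happen when the number of variables does not match the number of pieces. The filename with the extra period is hypothetical, and rsplit() is shown only as one possible way to handle it.

```python
filename = 'corn_riley_2024.csv'

# split() always returns a list, regardless of how many pieces result
parts = filename.split('.')
print(parts)

# A hypothetical filename with an extra period produces three pieces,
# so unpacking into two variables raises a ValueError
other_filename = 'corn_riley_2024.backup.csv'
try:
    base, ext = other_filename.split('.')
except ValueError as err:
    print(f'Unpacking failed: {err}')

# rsplit() with a maximum of one split divides only at the last period
base, ext = other_filename.rsplit('.', 1)
print(base)
print(ext)
```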

# Replace characters
print(filename.replace('_', '-'))
corn-riley-2024.csv
# Join strings using the `+` operator
filename = "myfile"
extension = ".csv"
path = "/User/Documents/Datasets/"

fullpath_file = path + filename + extension
print(fullpath_file)
/User/Documents/Datasets/myfile.csv
# Find if word starts with one of the following sequences
print(base_filename.startswith(('corn','maize'))) # Note that the input is a tuple

# Find if word ends with one of the following sequences
print(base_filename.endswith(('2022','2023')))   # Note that the input is a tuple
True
False
# Passing variables into strings
station = 'Manhattan'
precip_amount = 25
precip_units = 'mm'

# Option 1 (preferred): f-string format (note the leading f)
option_1 = f"Today's Precipitation at the {station} station was {precip_amount} {precip_units}."
print(option_1)

# Option 2: %-string
# Note how much longer this syntax is. It also requires keeping track of the order of the variables
option_2 = "Today's Precipitation at the %s station was %s %s." % (station, precip_amount, precip_units)
print(option_2)

# ... however this syntax can sometimes be handy.
# Say you want to report parameter values using a label for one of your plots
par_values = [0.3, 0.1, 120] # Three parameter values, a list typically obtained by curve fitting
label = 'fit: a=%5.3f, b=%5.3f, c=%5.1f' % tuple(par_values)
print(label)
Today's Precipitation at the Manhattan station was 25 mm.
Today's Precipitation at the Manhattan station was 25 mm.
fit: a=0.300, b=0.100, c=120.0
# Formatting of values in strings
soil_pH = 6.7832
crop_name = "corn"

# Using an f-string to embed variables and format the pH value
message = f"The soil pH suitable for growing {crop_name} is {soil_pH:.2f}."
print(message)
The soil pH suitable for growing corn is 6.78.

To specify the number of decimals for a value in an f-string, you can use the colon : followed by a format specifier inside the curly braces {}. For example, {variable:.2f} will format the variable to two decimal places. In this example, {soil_pH:.2f} within the f-string takes the soil_pH variable and formats it to two decimal places. The :.2f part is the format specifier, where . indicates precision, 2 is the number of decimal places, and f denotes floating-point number formatting. This approach is highly efficient and readable, making f-strings a favorite among Python programmers.
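To see how the precision part of the format specifier changes the result, here is a short sketch. The grain yield value and the thousands separator (the comma before the period) are additions for illustration and are not part of the original example.

```python
soil_pH = 6.7832

# Different precisions for the same value
print(f"{soil_pH:.0f}")
print(f"{soil_pH:.1f}")
print(f"{soil_pH:.3f}")

# Hypothetical yield value to illustrate the thousands separator
grain_yield = 12543.678  # kg/ha
formatted_yield = f"{grain_yield:,.1f}"
print(formatted_yield)
```

Formatting only changes the text that is displayed. The value stored in soil_pH remains 6.7832.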

# Say that you want to download data using the url from the NASA-MODIS satellite for a specific day of the year
doy = 1
print(f"DOY:A{doy:03d}")
DOY:A001

In the f-string f"DOY:A{doy:03d}", {doy:03d} formats the variable doy. The 03d specifier means that the number should be padded with zeros to make it three digits long (d stands for ‘decimal integer’ and 03 means ‘three digits wide, padded with zeros’). The prefix ‘DOY:A’ is added directly in the string. So, if doy is 1, it gets formatted as 001, and the complete string becomes DOY:A001.
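The same specifier can be applied to several days at once. The sketch below assumes a hypothetical list of days of the year and builds one label per day.

```python
# Hypothetical days of the year used only for illustration
days = [1, 45, 365]

# Zero-pad each day to three digits and add the prefix
labels = [f"DOY:A{doy:03d}" for doy in days]
print(labels)
```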

# Compare strings
print('loam' == 'Loam') # Returns False since case matters

print('loam' == 'Loam'.lower()) # Returns True since we convert the second word to lower case
False
True
Note

When comparing strings, using either lower() or upper() helps standardize the strings before the boolean operation. This is particularly useful if you need to request information from users or deal with messy datasets that are not consistent.
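As a minimal sketch of this idea, the following lines clean a hypothetical list of soil texture entries before comparing them. The strip() method, which removes leading and trailing whitespace, is an addition that often goes together with lower() when dealing with messy inputs.

```python
# Hypothetical user entries with inconsistent case and extra spaces
entries = ['Loam', 'LOAM ', ' loam', 'Silt Loam']

# strip() removes leading/trailing whitespace and lower() standardizes the case
clean_entries = [entry.strip().lower() for entry in entries]
print(clean_entries)

# Compare each cleaned entry against the target texture
matches = [entry == 'loam' for entry in clean_entries]
print(matches)
```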

Booleans

Boolean data types represent one of two values: True or False. Booleans are particularly powerful when used with conditional statements like if. By evaluating a boolean expression in an if statement, you can control the flow of your program, allowing it to make decisions and execute different code based on certain conditions. For instance, an if statement can check if a condition is True, and only then execute a specific block of code. Check the section about if statements for some examples.
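A minimal sketch of a boolean controlling an if statement could look like this. The variable name and the messages are hypothetical and only meant to illustrate the idea.

```python
# Hypothetical boolean flag used only for illustration
adequate_moisture = False

# The block under `if` runs only when the condition is True
if adequate_moisture:
    decision = 'No irrigation needed'
else:
    decision = 'Schedule irrigation'

print(decision)
```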

Boolean logical operators
or: Will evaluate to True if at least one of the statements (not necessarily both) is True
and: Will evaluate to True only if both statements are True
not: Reverses the result of the statement

Boolean comparison operators
==: equal to
!=: not equal to
>=: greater than or equal to
<=: less than or equal to
>: greater than
<: less than

Note

Python evaluates the operands of and and or from left to right. The evaluation halts as soon as the outcome is determined, and the value of the last operand evaluated is returned. Python does not evaluate subsequent operands unless it is necessary to resolve the result.
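The following sketch shows why this behavior matters in practice. The variables are hypothetical, and the example assumes a case with zero plots, where a division would normally fail.

```python
# Hypothetical values used only for illustration
total_yield = 0
num_plots = 0

# The division is never evaluated because num_plots > 0 is already False,
# so no ZeroDivisionError occurs
has_high_mean = (num_plots > 0) and (total_yield / num_plots > 5)
print(has_high_mean)

# With `or`, evaluation stops at the first True operand
is_empty = (num_plots == 0) or (total_yield / num_plots < 1)
print(is_empty)
```

Placing the safety check first is a common way to protect the expression that follows it.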

# Example boolean variable

adequate_moisture = True
print(adequate_moisture)
print(type(adequate_moisture))
True
<class 'bool'>
# Example boolean comparison operators
optimal_moisture_level = 30  # optimal soil moisture level as a percentage
current_moisture_level = 25  # current soil moisture level as a percentage

is_moisture_optimal = current_moisture_level >= optimal_moisture_level
print(is_moisture_optimal)
False
chance_rain_tonight = 10  # probability of rainfall as a percentage

# Water only if moisture is below the optimal level and rain is unlikely
water_plants = (current_moisture_level < optimal_moisture_level) and (chance_rain_tonight < 50)
print(water_plants)
True

Conversion between data types

In Python, converting between different data types, a process known as type casting, is a common and straightforward operation. You can convert data types using built-in functions like int(), float(), str(), and bool(). For instance, int() can change a floating-point number or a string into an integer, float() can turn an integer or string into a floating-point number, and str() can convert an integer or float into a string. These conversions are especially useful when you need to perform operations that require specific data types, such as mathematical calculations or text manipulation. However, it’s important to be mindful that attempting to convert incompatible types (like trying to turn a non-numeric string into a number) can lead to errors.
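The sketch below illustrates the kind of errors mentioned above, using a few hypothetical input strings. It is one possible way to explore these conversions, not the only one.

```python
# A numeric string converts without problems
plants = int('8')
print(plants + 2)

# A non-numeric string raises a ValueError
try:
    float('eight')
except ValueError as err:
    print(f'Conversion failed: {err}')

# A string with a decimal point cannot go directly to int()
try:
    int('3.7')
except ValueError:
    print(int(float('3.7')))  # Convert to float first, then to int

# bool() returns False for zero and empty strings, True otherwise
print(bool(0), bool(''), bool(25), bool('corn'))
```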

# Integers to string

int_num = 8
print(int_num)
print(type(int_num)) # Print data type before conversion

int_str = str(int_num)
print(int_str)
print(type(int_str)) # Print resulting data type 
8
<class 'int'>
8
<class 'str'>
# Floats to string

float_num = 3.1415
print(float_num)
print(type(float_num)) # Print data type before conversion

float_str = str(float_num)
print(float_str)
print(type(float_str)) # Print resulting data type 
3.1415
<class 'float'>
3.1415
<class 'str'>
# Strings to integers/floats

float_str = '3'
float_num = float(float_str)
print(float_num)
print(type(float_num))

# Check if string is numeric
float_str.isnumeric()
3.0
<class 'float'>
True
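Note that isnumeric() checks whether every character in the string is numeric, so it returns False for strings with a decimal point or a minus sign. The helper function below, is_float(), is hypothetical and sketches one alternative: attempt the conversion and catch the error.

```python
# isnumeric() only returns True when every character is numeric
print('3'.isnumeric())
print('3.4'.isnumeric())   # The decimal point is not a numeric character
print('-3'.isnumeric())    # Neither is the minus sign

# Hypothetical helper that tests the conversion itself
def is_float(text):
    """Return True if text can be converted to a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

print(is_float('3.4'))
print(is_float('-3'))
print(is_float('loam'))
```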
# Floats to integers
float_num = 4.9
int_num = int(float_num)

print(int_num)
print(type(int_num))
4
<class 'int'>
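The result above shows that int() discards the decimal part instead of rounding. The sketch below compares int() with round() and with the floor() and ceil() functions from the math module, which are additions to the original example.

```python
import math

float_num = 4.9

print(int(float_num))         # Truncates toward zero
print(round(float_num))       # Rounds to the nearest integer
print(math.floor(float_num))  # Rounds down
print(math.ceil(float_num))   # Rounds up

# For negative values, truncation and floor differ
print(int(-4.9))
print(math.floor(-4.9))
```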

In some cases Python will change the data type according to the operation. For instance, the following code starts from two integers and results in a floating-point number, because the / operator always returns a float.

numerator = 5
denominator = 2
print(type(numerator))
print(type(denominator))

answer = numerator / denominator  # Two integers
print(answer)
print(type(answer))   # Result is a float
<class 'int'>
<class 'int'>
2.5
<class 'float'>
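For comparison, the sketch below shows operations on the same two integers that keep the integer type, and one that mixes an integer with a float. The floor division (//) and modulo (%) operators are additions to the original example.

```python
numerator = 5
denominator = 2

quotient = numerator // denominator   # Floor division
remainder = numerator % denominator   # Modulo
print(quotient, type(quotient))
print(remainder, type(remainder))

# Mixing an integer with a float results in a float
mixed = numerator + 2.0
print(mixed, type(mixed))
```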