In the realm of programming and data science, data structures act as containers where information is stored for later use. Python offers a variety of built-in data structures like lists, tuples, sets, and dictionaries, each with its own unique properties and use cases.
The choice of the right data structure often depends on factors like scalability, data format, data complexity, and the programmer’s preference. Consider devoting some time before starting a new script to test and select an appropriate data structure for your program.
Tip
Lists are created using [], dictionaries using {}, and tuples using (), but to access the content inside all of them we use [].
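A minimal sketch of the tip above. The variable names and values are hypothetical and chosen only to illustrate the bracket conventions:

```python
# Hypothetical values, chosen only to illustrate the bracket conventions
textures = ["Sand", "Loam", "Silt"]         # list: square brackets
coords = (39.208722, -96.592248)            # tuple: parentheses
sample = {"id": "Sample_001", "pH": 6.5}    # dictionary: curly braces

# Content is always accessed with square brackets
first_texture = textures[0]   # by position
latitude = coords[0]          # by position
ph_value = sample["pH"]       # by key

print(first_texture, latitude, ph_value)
```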
Lists
Lists are versatile data structures defined by square brackets [ ] that are ideal for storing sequences of elements, such as strings, numbers, or a mix of different data types. Lists are mutable, meaning that you can modify their content. Lists also support nesting, where a list can contain other lists. A key feature of lists is the ability to access elements through indexing (for a single item) or slicing (for multiple items). While similar to arrays in other languages, like MATLAB, it’s important to note that Python lists do not natively support element-wise operations, a functionality that is characteristic of NumPy arrays, which are provided by a more advanced module that we will explore later.
# List with same data type
soil_texture = ["Sand", "Loam", "Silty clay", "Silt loam", "Silt"] # Strings (soil textural classes)
mean_sand = [92, 40, 5, 20, 5] # Integers (percent sand for each soil textural class)

print(soil_texture)
print(type(soil_texture)) # Print type of data structure

# List with mixed data types (strings, floats, and an entire dictionary)
# Sample ID, soil texture, pH value, and multiple nutrient concentration in ppm
soil_sample = ["Sample_001", "Loam", 6.5, {"N": 20, "P": 15, "K": 5}]
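The indexing, slicing, nesting, and mutability described above can be sketched as follows. The two lists are repeated from the example so the snippet runs on its own; the other variable names (`first`, `middle`, `nested`, `repeated`, `sand_fraction`) are introduced here for illustration only.

```python
# Values repeated from the example above so this snippet runs on its own
soil_texture = ["Sand", "Loam", "Silty clay", "Silt loam", "Silt"]
mean_sand = [92, 40, 5, 20, 5]

# Indexing returns a single item (positions start at 0)
first = soil_texture[0]      # 'Sand'
last = soil_texture[-1]      # 'Silt'

# Slicing returns a new list (the end position is excluded)
middle = soil_texture[1:3]   # ['Loam', 'Silty clay']

# Nesting: a list of lists, accessed with chained indexes
nested = [soil_texture, mean_sand]
loam_sand = nested[1][1]     # 40

# Lists are mutable: replace an item in place
soil_texture[0] = "Coarse sand"

# Multiplying a list repeats it; it does not multiply each element
repeated = mean_sand * 2     # a list with 10 items

# Element-wise work on a plain list needs a loop or a list comprehension
sand_fraction = [value / 100 for value in mean_sand]
print(sand_fraction)
```

The `repeated` line is the reason NumPy arrays are usually preferred for numerical work: with a plain list, `* 2` concatenates the list with itself rather than doubling each value.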
Appending multiple items using the append() method results in a nested list, while using the extend() method results in a merged list. Give it a try and see if you can observe the difference.
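As a hedged illustration of that difference, the snippet below applies both methods to two short lists. The item names in `new_textures` are hypothetical.

```python
new_textures = ["Clay", "Sandy loam"]   # hypothetical items to add

appended = ["Sand", "Loam"]
appended.append(new_textures)   # adds the whole list as ONE new element

extended = ["Sand", "Loam"]
extended.extend(new_textures)   # adds each item as a separate element

print(appended)   # ['Sand', 'Loam', ['Clay', 'Sandy loam']]
print(extended)   # ['Sand', 'Loam', 'Clay', 'Sandy loam']
```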
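# Remove list element
# remove() raises a ValueError when the item is missing, so check membership first
if "Clay" in soil_texture:
    soil_texture.remove("Clay")
print(soil_texture)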
Tuples
Tuples are an efficient data structure defined by parentheses ( ), and are especially useful for storing fixed sets of elements like coordinates in a two-dimensional plane (e.g., a point (x, y)) or triplets of color values in the RGB color space (e.g., (r, g, b)). While tuples can be nested within lists and support operations similar to lists, like indexing and slicing, the main difference is that tuples are immutable. Once a tuple is created, its content cannot be changed. This makes tuples particularly valuable for storing critical information that must remain constant in your code.
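The snippet below is a small sketch of these properties. The `point` and `colors` values are assumptions made for illustration (pure red, green, and blue as RGB triplets), not data taken from the text.

```python
# Hypothetical values for illustration
point = (39.208722, -96.592248)                    # (latitude, longitude)
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]   # list of (r, g, b) tuples

latitude = point[0]           # indexing works as with lists
green_and_blue = colors[1:]   # slicing the list returns the last two tuples
blue_channel = colors[2][2]   # third tuple, third value

# Assigning to an item of a tuple raises a TypeError
try:
    point[0] = 40.0
    mutation_succeeded = True
except TypeError as error:
    mutation_succeeded = False
    print("Tuples are immutable:", error)

# The list that holds the tuples is still mutable: replace a whole tuple instead
colors[0] = (128, 0, 0)
print(colors)
```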
What happens if we want to change the first element of the third tuple from 0 to 255? Hint: colors[2][0] = 255
Dictionaries
Dictionaries are a highly versatile and popular data structure that stores and retrieves data using key-value pairs, defined within curly braces { } or with the dict() function. This means that you can access, add, or modify data using unique keys, making dictionaries incredibly efficient for organizing and handling data using named references.
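A minimal sketch of creating a dictionary both ways and then accessing, adding, and modifying entries by key. The keys and values are hypothetical.

```python
# Hypothetical weather record
record = {"temperature": 5.6, "humidity": 80}      # curly braces
same_record = dict(temperature=5.6, humidity=80)   # dict() function

temperature = record["temperature"]   # access a value by its key
record["wind_speed"] = 3.2            # add a new key-value pair
record["humidity"] = 75               # modify an existing value

# get() returns None (or a default) instead of raising KeyError for a missing key
wind_gust = record.get("wind_gust")

print(record)
```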
Dictionaries are particularly useful in situations where data doesn’t fit neatly into a matrix or table format and has multiple attributes, such as weather data, where you might store various weather parameters (temperature, humidity, wind speed) using descriptive keys. Unlike lists or tuples, dictionaries are not indexed by position (although since Python 3.7 they do preserve the order in which keys were inserted), but they excel in scenarios where each piece of data needs to be associated with a specific identifier. This structure provides a straightforward and intuitive way to manage complex, unstructured data.
# Weather data is often stored in dictionary or dictionary-like data structures.
# Note: the 'units' key is spelled the same way for every variable
D = {'city': 'Manhattan',
     'state': 'Kansas',
     'coords': (39.208722, -96.592248, 350),
     'data': [{'date': '20220101',
               'precipitation': {'value': 12.5, 'units': 'mm', 'instrument': 'TE525'},
               'air_temperature': {'value': 5.6, 'units': 'Celsius', 'instrument': 'ATMOS14'}},
              {'date': '20220102',
               'precipitation': {'value': 0, 'units': 'mm', 'instrument': 'TE525'},
               'air_temperature': {'value': 1.3, 'units': 'Celsius', 'instrument': 'ATMOS14'}}]
     }

print(D)
print(type(D))
The example above has several interesting features:
- The city and state names are ordinary strings.
- The geographic coordinates (latitude, longitude, and elevation) are grouped using a tuple.
- Weather data for each day is a list of dictionaries.
- In a single dictionary we have observations for a given timestamp together with the associated metadata including units, sensors, and location.

Personally I think that dictionaries are ideal data structures in the context of reproducible science.
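To show how values are retrieved from a nested structure like this one, here is a short sketch that uses a trimmed copy of the weather dictionary so that it runs on its own. The variable names `first_temp` and `temperatures` are introduced here for illustration.

```python
# Trimmed copy of the weather dictionary so the snippet runs on its own
D = {'city': 'Manhattan',
     'coords': (39.208722, -96.592248, 350),
     'data': [{'date': '20220101',
               'air_temperature': {'value': 5.6, 'units': 'Celsius'}},
              {'date': '20220102',
               'air_temperature': {'value': 1.3, 'units': 'Celsius'}}]}

# Unpack the tuple of coordinates into three variables
latitude, longitude, elevation = D['coords']

# Chain keys and indexes to reach a nested value
first_temp = D['data'][0]['air_temperature']['value']

# Loop over the list of daily dictionaries
temperatures = []
for day in D['data']:
    obs = day['air_temperature']
    temperatures.append(obs['value'])
    print(day['date'], obs['value'], obs['units'])
```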
Note
The structure of the dictionary above depends on programmer preferences. For instance, rather than grouping all three coordinates into a tuple, a different programmer may prefer to store the values under individual name:value pairs, such as latitude: 39.208722, longitude: -96.592248, and altitude: 350.
Sets
Sets are a unique and somewhat less commonly used data structure compared to lists, tuples, and dictionaries. Sets are defined with curly braces { } (without defining key-value pairs) or the set() function and are similar to mathematical sets, meaning they store unordered collections of unique items. In other words, sets don’t allow duplicate items, items cannot be changed in place (although items can be added and removed), and items are not indexed. This makes sets ideal for operations like determining membership, eliminating duplicates, and performing mathematical set operations such as unions, intersections, and differences. In scenarios like database querying or data analysis where you need to compare different datasets, sets can be used to find common elements (intersection), all elements (union), or differences between datasets.
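The operations named above can be sketched with two small sets. The soil textures recorded for `field_a` and `field_b` are hypothetical.

```python
# Hypothetical soil textures recorded in two fields
field_a = {"Loam", "Silt loam", "Sand", "Loam"}   # the duplicate "Loam" is dropped
field_b = set(["Silt loam", "Clay", "Loam"])      # set() converts a list into a set

# Note: empty braces {} create an empty dictionary; use set() for an empty set
empty = set()

has_sand = "Sand" in field_a     # membership test
common = field_a & field_b       # intersection
combined = field_a | field_b     # union
only_in_a = field_a - field_b    # difference

# Items can be added and removed, but not modified in place
field_a.add("Silt")
field_a.discard("Sand")

print(sorted(common), sorted(combined), sorted(only_in_a))
```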
For instance, you could leverage a set data structure to easily compare field notes from multiple agronomists collecting information across farmer fields in a given region and quickly determine dominant weed species.
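One way to sketch this idea is shown below. The field notes are made up for illustration, and "dominant" is interpreted here, as an assumption, as the species reported by every agronomist.

```python
# Hypothetical field notes: weed species recorded by three agronomists
notes_agronomist_1 = ["Palmer amaranth", "Kochia", "Waterhemp", "Kochia"]
notes_agronomist_2 = ["Kochia", "Palmer amaranth", "Marestail"]
notes_agronomist_3 = ["Palmer amaranth", "Kochia", "Foxtail"]

# Converting each list to a set removes repeated entries
species_sets = [set(notes) for notes in
                (notes_agronomist_1, notes_agronomist_2, notes_agronomist_3)]

# Species reported by every agronomist (assumed meaning of "dominant")
reported_by_all = set.intersection(*species_sets)

# Every species reported by at least one agronomist
reported_by_any = set.union(*species_sets)

print(sorted(reported_by_all))
print(sorted(reported_by_any))
```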
Practice
Create a list with the scientific names of four common grasses in the US Great Plains: big bluestem, switchgrass, Indian grass, and little bluestem.
Using a periodic table, store in a dictionary the name, symbol, atomic mass, melting point, and boiling point of oxygen, nitrogen, phosphorus, and hydrogen. Then, write two separate Python statements to retrieve the boiling point of oxygen and hydrogen. Combined, these two elements can form water, which has a boiling point of 100 degrees Celsius. How does this value compare to the boiling point of the individual elements?
Without editing the dictionary that you created in the previous point, append the properties for a new element: carbon.
Create a list of tuples encoding the latitude, longitude, and altitude of three national parks of your choice.