32 SQLite Database

Author

Andres Patrignani

Published

January 15, 2024

Keywords

database, sql, data

SQLite is a lightweight, disk-based database that doesn’t require a separate server. Python comes with a built-in sqlite3 module, which allows you to work with SQLite databases. SQL stands for Structured Query Language, which is a standardized programming language that is used to manage relational databases. To learn more about SQLite daabases, in this tutorial we will:

create a simple database by adding and removing a few entries. This example illustrates how to start a new database from scratch without prior tabulated data.
create a more advanced database using a real-world dataset and the Pandas library. This example illustrates how to convert an existing spreadsheet with data into a database that you can then continue to add, update, modify, or delete data entries.

Additional software

To access and inspect the database I recommend using an open source tool like sqlitebrowser. With this tool you can also create, design, and edit sqlite databases, but we will do some of these steps using Python.

Key commands

CREATE TABLE: A SQL command used to create a new table in a database.
INSERT: A SQL command used to add new rows of data to a table in the database.
SELECT: A SQL command used to query data from a table, returning rows that match the specified criteria.
UPDATE: A SQL command used to modify existing data in a table.
DELETE: A SQL command used to remove rows from a table in the database.

Data types

SQLite supports a variety of data types:

TEXT: For storing character data. SQLite supports UTF-8, UTF-16BE, and UTF-16LE encodings.
INTEGER: For storing integer values. The size can be from 1 byte to 8 bytes, depending on the magnitude of the value.
REAL: For storing floating-point values. It is a double-precision (8-byte) floating point number.
BLOB: Stands for Binary Large Object. Used to store data exactly as it was input, such as images, files, or binary data.
NUMERIC: This type can be used for both integers and floating-point numbers. SQLite decides whether to use integer or real based on the value’s nature.
BOOLEAN: SQLite does not have a separate boolean storage class. Instead, boolean values are stored as integers 0 (false) and 1 (true).
DATE and TIME: SQLite does not have a storage class set aside for storing dates and/or times. Instead, they are stored as TEXT (as ISO8601 strings), REAL (as Julian day numbers), or INTEGER (as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC).

In SQLite the datatype you specify for a column acts more like a hint than a strict enforcement, allowing for flexibility in the types of data that can be inserted into a column. This is a distinctive feature compared to more rigid type systems in other database management systems.

Set up a simple database

After importing the sqlite3 module we create a connection to a new SQLite database. If the file doesn’t exist, the SQLite module will create it. This is convenient since we don’t have to be constantly checking whether the database exists or worry about overwriting the database.

# Import modules
import sqlite3

# Connect to the database
conn = sqlite3.connect('soils.db')

# Create a cursor object using the cursor() method
cursor = conn.cursor()

# Create table
cursor.execute('''CREATE TABLE soils
                 (id INTEGER PRIMARY KEY, date TEXT, lat REAL, lon REAL, vwc INTEGER);''')

# Save (commit) the changes
conn.commit()

Add Data

In SQLite databases, the construction (?,?,?,?) is used as a placeholder for parameter substitution in SQL statements, especially with the INSERT, UPDATE, and SELECT commands. This construction offers the following advantages:

SQL Injection Prevention: By using placeholders, you prevent SQL injection, a common web security vulnerability where attackers can interfere with the queries that an application makes to its database.
Data Handling: It automatically handles the quoting of strings and escaping of special characters, reducing errors in SQL query syntax due to data.
Query Efficiency: When running similar queries multiple times, parameterized queries can improve performance as the database engine can reuse the query plan and execution path.

Each ? is a placeholder that is replaced with provided data values in a tuple when the execute method is called. This ensures that the values are properly formatted and inserted into the database, enhancing security and efficiency.

# Insert a row of data
obs = ('2024-01-02', 37.54, -98.78, 38)
cursor.execute("INSERT INTO soils (date, lat, lon, vwc) VALUES (?,?,?,?)", obs1)

# Save (commit) the changes
conn.commit()

Add with multiple entries

You can insert multiple entries at once using executemany()

# A list of multiple crop records
new_data = [('2024-01-02', 36.54, -98.12, 18),
            ('2024-04-14', 38.46, -99.78, 21),
            ('2024-05-23', 38.35, -98.01, 29)]

# Inserting multiple records at a time
cursor.executemany("INSERT INTO soils (date, lat, lon, vwc) VALUES (?,?,?,?)", new_data)

# Save (commit) the changes
conn.commit()

# Retrieve all data
cursor.execute('SELECT * FROM soils;')
for row in cursor.fetchall():
    print(row)

(1, '2024-01-02', 37.54, -98.78, 38)
(2, '2024-01-02', 36.54, -98.12, 18)
(3, '2024-04-14', 38.46, -99.78, 21)
(4, '2024-05-23', 38.35, -98.01, 29)

Query data

To query specific data from a table in an SQLite database using the SELECT statement, you can specify conditions using the WHERE clause. Here’s a basic syntax:

SELECT col1, col2, ... FROM table_name WHERE condition1 AND condition2;

To execute these queries remember to establish a connection, create a cursor object, execute the query using cursor.execute(query), and then use cursor.fetchall() to retrieve the results. Close the connection to the database once you’re done.

# Retrieve all data
cursor.execute('SELECT * FROM soils;')
for row in cursor.fetchall():
    print(row)

(1, '2024-01-02', 37.54, -98.78, 38)
(2, '2024-01-02', 36.54, -98.12, 18)
(3, '2024-04-14', 38.46, -99.78, 21)
(4, '2024-05-23', 38.35, -98.01, 29)

# Retrieve specific data
cursor.execute('SELECT date FROM soils WHERE vwc >= 30;')
for row in cursor.fetchall():
    print(row)

('2024-01-02',)

# Retrieve specific data (note that we need spacify the date as a string)
cursor.execute('SELECT lat,lon FROM soils WHERE date == "2024-03-07";')
for row in cursor.fetchall():
    print(row)

cursor.execute('SELECT * FROM soils WHERE vwc >=30 AND vwc < 40;')
for row in cursor.fetchall():
    print(row)

(1, '2024-01-02', 37.54, -98.78, 38)

Modify data

You can update records that match certain criteria.

# Update quantity for specific date note that this will update both rows with the same date)
cursor.execute("UPDATE soils SET vwc = 15 WHERE date = '2024-01-02';")

# Save (commit) the changes
conn.commit()

Update SQLite

Note that the previous command updates both rows with the same date. To only specify a single row we need to be more specific in our command.

# Retrieve all data
cursor.execute('SELECT * FROM soils;')
for row in cursor.fetchall():
    print(row)

(1, '2024-01-02', 37.54, -98.78, 15)
(2, '2024-01-02', 36.54, -98.12, 15)
(3, '2024-04-14', 38.46, -99.78, 21)
(4, '2024-05-23', 38.35, -98.01, 29)

# Update quantity for specific date note that this will update both rows with the same date)
cursor.execute("UPDATE soils SET vwc = 5 WHERE date = '2024-01-02' AND id = 2;")

# Save (commit) the changes
conn.commit()

# Retrieve all data
cursor.execute('SELECT * FROM soils;')
for row in cursor.fetchall():
    print(row)

(1, '2024-01-02', 37.54, -98.78, 15)
(2, '2024-01-02', 36.54, -98.12, 5)
(3, '2024-04-14', 38.46, -99.78, 21)
(4, '2024-05-23', 38.35, -98.01, 29)

Remove data

To remove records, use the DELETE statement.

# Delete the Wheat record
cursor.execute("DELETE FROM soils WHERE id = 3")

# Save (commit) the changes
conn.commit()

# Retrieve all data
cursor.execute('SELECT * FROM soils;')
for row in cursor.fetchall():
    print(row)

(1, '2024-01-02', 37.54, -98.78, 15)
(2, '2024-01-02', 36.54, -98.12, 5)
(4, '2024-05-23', 38.35, -98.01, 29)

Add new column/header

# Add a new column
# Use the DEFAULT construction like: DEFAULT 'Unknown' 
# to populate new column with custom value
cursor.execute("ALTER TABLE soils ADD COLUMN soil_type TEXT;")

# Retrieve data one more time before we close the database
cursor.execute('SELECT * FROM soils')
for row in cursor.fetchall():
    print(row)

(1, '2024-01-02', 37.54, -98.78, 15, None)
(2, '2024-01-02', 36.54, -98.12, 5, None)
(4, '2024-05-23', 38.35, -98.01, 29, None)

Closing the connection

Once done with the operations, close the connection to the database.

conn.close()

Use Pandas to set up database

For this exercise we will use a table of sorghum yields for Franklin county, KS obtained in 2023. The dataset contains breeder brand, hybrid name, yield, moisture, and total weight. The spreadsheet contains metadata on the first line (which we are going to skip) and in the last few rows. From these last rows, we will use functions to match strings and retrieve the planting and harvest dates of the trial. We will then add this information to the dataframe, and then create an SQLite database.

Pandas provides all the methods to read, clean, and export the Dataframe to a SQLite database.

# Import modules
import pandas as pd
import sqlite3

df = pd.read_csv('../datasets/sorghum_franklin_county_2023.csv',
                skiprows=[0,1,3])
# Inspect first few rows
df.head()

	BRAND	NAME	YIELD	PAVG	MOIST	TW
0	POLANSKY	5719	137.2	107.8	14.7	59.1
1	DYNA-GRO	M60GB88	135.3	106.3	14.0	57.9
2	DYNA-GRO	GX22936	134.7	105.8	13.9	58.7
3	POLANSKY	5522	132.3	103.9	13.9	58.4
4	DYNA-GRO	GX22932	131.8	103.6	14.5	59.1

# Inspect last few rows
df.tail(10)

	BRAND	NAME	YIELD	PAVG	MOIST	TW
18	DYNA-GRO	M67GB87	120.5	94.7	13.9	56.1
19	DYNA-GRO	M59GB94	117.7	92.5	13.8	57.7
20	NaN	AVERAGE	127.2	100.0	14.1	58.3
21	NaN	CV (%)	8.4	8.4	0.3	0.8
22	NaN	LSD (0.05)	5.2	4.1	0.3	0.3
23	*Yields must differ by more than the LSD value...	NaN	NaN	NaN	NaN	NaN
24	different.	NaN	NaN	NaN	NaN	NaN
25	Planted 5-24-23	NaN	NaN	NaN	NaN	NaN
26	Harvested 11-15-23	NaN	NaN	NaN	NaN	NaN
27	Fertility 117-38-25-20 Strip till	NaN	NaN	NaN	NaN	NaN

df.dropna(subset='BRAND', inplace=True) 
df.tail(10)

	BRAND	NAME	YIELD	PAVG	MOIST	TW
15	DYNA-GRO	M63GB78	122.7	96.4	13.9	58.0
16	DYNA-GRO	GX22937	121.3	95.4	14.2	58.4
17	DYNA-GRO	GX22923	121.2	95.2	13.7	55.8
18	DYNA-GRO	M67GB87	120.5	94.7	13.9	56.1
19	DYNA-GRO	M59GB94	117.7	92.5	13.8	57.7
23	*Yields must differ by more than the LSD value...	NaN	NaN	NaN	NaN	NaN
24	different.	NaN	NaN	NaN	NaN	NaN
25	Planted 5-24-23	NaN	NaN	NaN	NaN	NaN
26	Harvested 11-15-23	NaN	NaN	NaN	NaN	NaN
27	Fertility 117-38-25-20 Strip till	NaN	NaN	NaN	NaN	NaN

# Extract planting date
idx = df['BRAND'].str.contains("Planted")
planting_date_str = df.loc[idx, 'BRAND'].values[0]
planting_date = planting_date_str.split(' ')[1]
print(planting_date)

5-24-23

# Extract harvest date
idx = df['BRAND'].str.contains("Harvested")
harvest_date_str = df.loc[idx, 'BRAND'].values[0]
harvest_date = harvest_date_str.split(' ')[1]
print(harvest_date)

11-15-23

# Once we are done extracting metadata, let's remove the last few rows
df = df.iloc[:-5]

# Convert header names to lower case to avoid conflict with SQL syntax
df.rename(str.lower, axis='columns', inplace=True)
df.head()

	brand	name	yield	pavg	moist	tw	planting_date	harvest_date
0	POLANSKY	5719	137.2	107.8	14.7	59.1	5-24-23	11-15-23
1	DYNA-GRO	M60GB88	135.3	106.3	14.0	57.9	5-24-23	11-15-23
2	DYNA-GRO	GX22936	134.7	105.8	13.9	58.7	5-24-23	11-15-23
3	POLANSKY	5522	132.3	103.9	13.9	58.4	5-24-23	11-15-23
4	DYNA-GRO	GX22932	131.8	103.6	14.5	59.1	5-24-23	11-15-23

# Add planting and harvest date to Dataframe to make it more complete
df['planting_date'] = planting_date
df['harvest_date'] = harvest_date
df.head()

	brand	name	yield	pavg	moist	tw	planting_date	harvest_date
0	POLANSKY	5719	137.2	107.8	14.7	59.1	5-24-23	11-15-23
1	DYNA-GRO	M60GB88	135.3	106.3	14.0	57.9	5-24-23	11-15-23
2	DYNA-GRO	GX22936	134.7	105.8	13.9	58.7	5-24-23	11-15-23
3	POLANSKY	5522	132.3	103.9	13.9	58.4	5-24-23	11-15-23
4	DYNA-GRO	GX22932	131.8	103.6	14.5	59.1	5-24-23	11-15-23

# Use Pandas to turn DataFrame into a SQL Database

# Connect to SQLite database (if it doesn't exist, it will be created)
conn = sqlite3.connect('sorghum_trial.db')

# Write the data to a sqlite table
df.to_sql('sorghum_trial', conn, index=False, if_exists='replace') # to overwrite use option if_exists='replace'

# Close the connection
conn.close()

Connect, access all data, and close database

# Connect to SQLite database (if it doesn't exist, it will be created)
conn = sqlite3.connect('sorghum_trial.db')

# Create cursor
cursor = conn.cursor()

# Access all data
cursor.execute('SELECT * FROM sorghum_trial')
for row in cursor.fetchall():
    print(row)
    
# Access all data
print('') # Add some white space
cursor.execute('SELECT brand, name FROM sorghum_trial WHERE yield > 130')
for row in cursor.fetchall():
    print(row)
    
# Close the connection
conn.close()

('POLANSKY', '5719', 137.2, 107.8, 14.7, 59.1, '5-24-23', '11-15-23')
('DYNA-GRO', 'M60GB88', 135.3, 106.3, 14.0, 57.9, '5-24-23', '11-15-23')
('DYNA-GRO', 'GX22936', 134.7, 105.8, 13.9, 58.7, '5-24-23', '11-15-23')
('POLANSKY', '5522', 132.3, 103.9, 13.9, 58.4, '5-24-23', '11-15-23')
('DYNA-GRO', 'GX22932', 131.8, 103.6, 14.5, 59.1, '5-24-23', '11-15-23')
('DYNA-GRO', 'M72GB71', 130.9, 102.9, 14.5, 59.0, '5-24-23', '11-15-23')
('MATURITY CHECK', 'MED', 128.7, 101.1, 14.0, 58.2, '5-24-23', '11-15-23')
('DYNA-GRO', 'M71GR91', 128.7, 101.1, 14.4, 59.3, '5-24-23', '11-15-23')
('PIONEER', '86920', 128.1, 100.7, 13.9, 57.9, '5-24-23', '11-15-23')
('MATURITY CHECK', 'EARLY', 127.9, 100.5, 14.2, 58.8, '5-24-23', '11-15-23')
('POLANSKY', '5629', 126.5, 99.4, 13.8, 57.2, '5-24-23', '11-15-23')
('MATURITY CHECK', 'LATE', 126.2, 99.2, 14.2, 58.3, '5-24-23', '11-15-23')
('PIONEER', '84980', 125.5, 98.6, 14.1, 58.8, '5-24-23', '11-15-23')
('DYNA-GRO', 'M60GB31', 124.9, 98.2, 14.2, 59.2, '5-24-23', '11-15-23')
('DYNA-GRO', 'GX22934', 122.8, 96.5, 14.6, 59.5, '5-24-23', '11-15-23')
('DYNA-GRO', 'M63GB78', 122.7, 96.4, 13.9, 58.0, '5-24-23', '11-15-23')
('DYNA-GRO', 'GX22937', 121.3, 95.4, 14.2, 58.4, '5-24-23', '11-15-23')
('DYNA-GRO', 'GX22923', 121.2, 95.2, 13.7, 55.8, '5-24-23', '11-15-23')
('DYNA-GRO', 'M67GB87', 120.5, 94.7, 13.9, 56.1, '5-24-23', '11-15-23')
('DYNA-GRO', 'M59GB94', 117.7, 92.5, 13.8, 57.7, '5-24-23', '11-15-23')

('POLANSKY', '5719')
('DYNA-GRO', 'M60GB88')
('DYNA-GRO', 'GX22936')
('POLANSKY', '5522')
('DYNA-GRO', 'GX22932')
('DYNA-GRO', 'M72GB71')