Open In Colab

Using SQL in Python

After we have explored the most important pieces of SQL Syntax in the previous chapters, we'll now have a look at how we can use pyhton to work with SQL (specifically SQLite) databases.

Luckily, this is rather easy once we have understood the underlying SQL syntax.

In a first step, we need to install two libraries: sqlite3 and pandas. The first one helps us with connecting to and manipluating databases whereas pandas makes reading from a database very easy.

In [ ]:
import sqlite3
import pandas as pd # it is convention to install pandas as pd

Loading a SQL Database in Python

Once we have imported the libraries above, we can connect our Python kernel to an existing database. This is done with the following command:

DB = sqlite3.connect('<our database>')

The sqlite3 module provides a straightforward interface for interacting with SQLite databases. A connection object is created using sqlite3.connect(). Once our session has ended, we must close the connection as follows:

DB.close()

While it is possible to directly read from the database using the sqlite3 library, using pandas is way easier.

Fetching Information via pd.read_sql

As stated above, it is preferable to use the pandas library, when trying to read data from an existing database. This is because pandas directly transforms the received information in a so called DataFrame (which will be discussed in the Data Science chapters). Habdlibg a DataFrame is much easier than reading the tuples or lists that are returned by the sqlite3 library.

To read a database we will use the pd.read_sql() function, which consists of the following syntax.


pandas.read_sql(sql, DB, index_col=None)

where

  • sql contains a string of the applicable SQL syntax (by convention we do triple strings ("""))
  • DB specifies our Database object from above
  • index_col specifies the column name (as a string), which will be used as the index for the returned DataFrame object. If index_col is not passed, index values starting from 0 appear to the left of the DataFrame’s rows.

As stated before, this function returns a DataFrame object.

Let's consider our STUDENT example from before.


ROLL_NO NAME AGE BRANCH CITY
10001 Kate 22 CSE Mumbai
10002 Richard 21 ECE Delhi
10003 Rick 33 ME Chennai
10004 Peter 26 CSE Delhi
10005 Peter 22 CSE Noida
10006 Mark 22 ECE Jaipur
10007 Brad 23 ME Rampur

Let's say we wanted to

  1. Open the database in python
  2. Fetch the first four columns (we do not care about the city in this case) of students that are in the CSE branch, ordered according to their age.
  3. Return the fetched information as a pandas DataFrame and print it

This would look as follows:

import sqlite3
import pandas as pd

DB = sqlite3.connect(STUDENT.db)

DataFrame = pd.read_sql("""
                          SELECT ROLL_NO, NAME, AGE, BRANCH
                          FROM 
                            STUDENT
                          WHERE
                            BRANCH = 'CSE'
                          ORDER BY
                            AGE ASC;
                            """, 
                            DB,
                            index_col = ['ROLL_NO'])



print(DataFrame)

DB.close()

Writing to a database via sqlite3

To manipulate an existing Database, pandas is, unfortunately, of little use. In such a case, we have to do the following:

First we connect the database to Python as explained above.

DB = sqlite3.connect('<our database>')

Once this is done, we open a so-called cursor object. A cursor can be understood as an iterable (a tuple or a list) that shows theresults from Database queries.

cursor = DB.cursor()

Using the .execute() method of the newly created cursor object, we are then able to directly pass SQL syntax to our Database.

cursor = cursor.execute("""UPDATE STUDENT SET Name='Brad'
                            WHERE NAME='Kate'""")

# we rename Kate to Brad

So far so good. If we want to print out the updated DataBase, we must consider that the cursor object, is an iterable. A print(cursor)would therefore merely result in an ugly string. This can be solved by printing out the individual rows of our Database.

for row in cursor:
  print(row)

Lastly, we need to close our Database - and thus save the changes to it.

DB.close()

The full documentation of sqlite3`s cursor object, can be found here (not relevant for the exam)