< Previous Page | Home Page | Next Page >
Databases are logically modelled clusters of information, or data. Any collection of data is a database, regardless of how or where it is stored. Even a file cabinet containing payroll information is a database, as is a stack of hospital patient forms, or a company’s collection of customer information spread across multiple locations.
As was discussed in the lecture, Database management systems (DBMS) are computer programs that allow users to interact with a specific database. A DBMS allows users to control access to a database, write data, run queries, and perform any other tasks related to database management.
In order to perform any of these tasks, though, the DBMS must have some kind of underlying model that defines how the data is organized. The relational model is one approach for organizing data that has found wide use in database software.
A relational model can be described much like a spreadsheet:
Data is organized into tables (relations) that represent a collection of similar objects (e.g. employees or sales). The columns of the table represent the attributes that members of the collection share (first name, last name, department, order). Each row in the table represents an individual member of the collection (one employee or one individual sale). And the values in the row represent the attributes of that individual (Anna, Wrigley, Sales Department).
Much of the power of a relational database lies in the ability to query these relations, both within a table (give me all employees that work in the sales department) and among tables (give me a list of customers that were serviced by Thea Comiskey)
In the relational model, each table contains at least one column that can be used to uniquely identify each row, called a primary key. This is important, because it means that users don’t need to know where their data is physically stored on a machine; instead, their DBMS can keep track of each record and return them on an ad hoc basis. In turn, this means that records have no defined logical order, and users have the ability to return their data in whatever order or through whatever filters they wish.
If you have two tables that you’d like to associate with one another, one way you can do so is with a foreign key. A foreign key is essentially a copy of one table’s (the “parent” table) primary key (e.g. the employees) inserted into a column in another table (the “child”, e.g. the sales done by an employee).
The relational model’s structural elements help to keep data stored in an organized way, but storing data is only useful if you can retrieve it. To retrieve information from an RDBMS, you can issue a query, or a structured request for a set of information.
Most relational databases use SQL to manage and query data. SQL allows you to filter and manipulate query results with a variety of clauses, predicates, and expressions, giving you fine control over what data will appear in the result set.
Before we have a look at how SQL can be used with python, we'll quickly look at the syntax of SQL in the next chapter.
SQL statements fall into two primary categories: data definition language (DDL) and data manipulation language (DML). We use DDL statements to alter the structure of our database. For example, a CREATE TABLE statement is an example of a DDL statement. DML statements, on the other hand, are used to manipulate the data in the database. INSERT and UPDATE statements are examples of DML statements.
In this course, you will mostly be looking at how to interact with, retrieve info from or change existing databases. This is why we’re going to focus on DML statements.
While the way SQL is written depends on the specific DBMS you use, the following two rules are pretty universal:
In SQLite (a version of the SQL Language used in this course) several keywords exist - a full list can be found here. For your exam the following keywords are of importance (as seen in the lecture).
SQL keyword | Description |
---|---|
SELECT |
Retrieves data from one or more tables. |
FROM |
Tables involved in the query. Required in every SELECT. |
WHERE |
Criteria for selection that determine the rows to be retrieved, deleted or updated. Optional in a SQL statement. |
GROUP BY |
Criteria for grouping rows. Optional in a SELECT query. |
ORDER BY |
Criteria for ordering rows. Optional in a SELECT query. |
LIMIT |
Criteria for constraining the number of rows returned . Optional in a SELECT query. |
INNER JOIN |
Merge rows from multiple tables. |
INSERT |
Insert rows into a specified table. |
UPDATE |
Update rows in a specified table. |
DELETE |
Delete rows from a specified table. |
Let's have a more detailed look at each keyword below.
< Previous Page | Home Page | Next Page >