SQL Basics
As part of the coursework for the Back-End Engineer program at Turing School of Software & Design, I compiled the following reference guide to the basics of SQL and PostreSQL. The information is largely derived from the JumpstartLab SQL Tutorial and these PostgreSQL Exercises, which I highly recommend reading and completing for yourself. But if you’d like a more condensed summary, read on!
-
What is a database?
Databases are a means of storing, fetching, calculating, and sorting data.
-
What is SQL?
SQL (Structured Query Language) is a programming language with which one can create, modify, and arrange tables in a database, update or remove data on those tables, and retrieve data from the database using complex conditions and run calculations on that data.
-
What is SQLite3?
SQLite3 is one of many database systems based on SQL, a relatively simple one. It is pre-installed in Mac OSX and stores its data in plaintext files, whereas other systems would store their data in a separate space in the filesystem and only allow access through its tools.
-
What is a Table?
A table is a database object which has columns, each of which takes a specified data type and typically has one column that is its primary key, which rows of data can be stored on.
-
What is a primary key?
A primary key is a unique identifier for each row on a table, frequently set to autoincrement so that each new row is automatically assigned the next unused value and no values are reused.
-
What is a foreign key?
A foreign key is column on a table which allows rows to refer to the primary key on another table, in order to allow for relationships between tables and their rows.
-
Explanations of the most important SQL commands:
-
insert
INSERT
allows for the addition of new rows to a table. Its basic syntax looks like this:INSERT INTO table(column_name1, column_name2) VALUES ('String Value', 0);
-
select
SELECT
opens a query statement used to display data according to the criteria and conditions that follow it. Its basic syntax looks like the following:SELECT fruits.name, sales.created_at FROM fruits INNER JOIN sales ON fruits.id=sales.fruit_id;
The arguments that immediately follow
SELECT
specify which columns to retrieve data from; the argument followingFROM
specifies which table or tables to retrieve that data from, usingINNER JOIN
(or anotherJOIN
type) to specify how to relate those tables; and the argument followingON
specifies which columns should match each other for a specificJOIN
. Multiple joins can be made to the first table followingFROM
, each with its ownON
criteria, like this:SELECT customers.name, fruits.name, sales.created_at FROM fruits INNER JOIN sales ON fruits.id=sales.fruit_id INNER JOIN customers ON sales.customer_id=customers.id;
Alternatively,
*
can be used as the single argument ofSELECT
to return all the columns in a givenFROM
table. -
where
WHERE
further restricts which rows are returned by aSELECT
statement or altered by anUPDATE
orDELETE
statement by providing additional criteria. Its basic syntax looks like this:UPDATE fruits SET quantity=17 WHERE name='bananas';
DELETE FROM fruits WHERE name='oranges';
SELECT * FROM fruits WHERE name='apples';
In addition to comparing specific values,
WHERE
can use SQL methods to perform calculations, such as looking at the length of a string:SELECT * FROM fruits WHERE LENGTH(name)=7;
-
order by
ORDER BY
can be used in conjunction withSELECT
to order the results of the query by the values in a specific column, either by ascending (the default) or descending order. Its basic syntax looks like this:SELECT * from fruits ORDER BY name DESC;
-
inner join
As described above under
SELECT
,INNER JOIN
is used to specify the relationship between tables when retrieving data from a combination of two or more tables. It is an “inner join” in the sense that it returns only what the two tables have in common, like the internal overlapping space in a Venn diagram. The commonality in question is specified by what follows theON
keyword, which specifies which columns are being compared and how. Its basic syntax looks like this:SELECT fruits.name, sales.created_at FROM fruits INNER JOIN sales ON fruits.id=sales.fruit_id;
With this and other kinds of joins, an alias for any table can be given by simply inserting it one space after the table. E.g.,
SELECT frt.name, sls.created_at FROM fruits frt INNER JOIN sales sls ON frt.id=sls.fruit_id;
-
-
How can you limit which columns you select from a table?
By providing specific column names after
SELECT
and beforeFROM
, only the data in those columns is retrieve from rows in the table specified byFROM
. E.g.,SELECT name, membercost FROM cd.facilities
-
How can you limit which rows you select from a table?
The
WHERE
keyword can be used to filter which rows are returned from those candidates specified bySELECT
andFROM
. It can take one or more conditions by which to filter them. E.g.,SELECT * FROM cd.facilities WHERE membercost > 0
Comparisons can be made against dates if the column uses a date-related datatype by referring to dates using the YYYY-MM-DD format as a string. E.g.,
SELECT memid, surname, firstname, joindate FROM cd.members WHERE joindate > '2012-08-31';
Multiple
WHERE
conditions are combined using theAND
andOR
keywords. E.g.,SELECT facid, name, membercost, monthlymaintenance FROM cd.facilities WHERE membercost > 0 AND membercost < monthlymaintenance / 50;
Additionally, the
LIKE
keyword can be used in place of a comparison operator (i.e., =, <, >, >=, <=, etc.) to perform pattern matching. It is provided a string in quotations, wherein%
can take the place of any number of characters and_
takes the place of any single character. E.g.,SELECT * FROM cd.facilities WHERE name LIKE '%Tennis%';
Finally, the
IN
keyword can be used to look for the existence of matching values in a given column or list of values in parentheses. E.g. the following example selects only those rows where itsfacid
value exists in the list(1, 5)
:SELECT * FROM cd.facilities WHERE facid IN (1,5);
The results of two or more
SELECT
statements can be combined into one output usingUNION
between them, as long as the number of columns and their respective datatypes match. E.g.,SELECT surname FROM cd.members UNION SELECT name FROM cd.facilities;
-
How can you give a selected column a different name in your output?
Using the
AS
keyword, a column given as an argument forSELECT
can be renamed. E.g.,SELECT memid AS member_id FROM cd.facilities;
Additionally, using the
CASE...WHEN...THEN...END
keywords which SQL uses for conditional statements, a new column can be made with conditional values. E.g.,SELECT name, CASE WHEN (monthlymaintenance > 100) THEN 'expensive' ELSE 'cheap' END AS cost FROM cd.facilities;
-
How can you sort your output from a SQL statement?
The
ORDER BY
keyword sorts the retrieved rows by a specified column; the sort direction can optionally be specified after the column name withASC
for ascending (the default order) andDESC
for descending. E.g.,SELECT surname FROM cd.members ORDER BY surname;
The number of results from a SQL statement can be limitied using the
LIMIT
keyword at the end of the statement followed by an integer specifying the max number of rows to display.SELECT surname FROM cd.members ORDER BY surname LIMIT 10;
And the results can further by modified by using the
SELECT DISCTINCT
keyword, which does not display additional duplicate rows (that is, rows with identical values across all selected columns) beyond the first.SELECT DISTINCT surname FROM cd.members ORDER BY surname LIMIT 10;
-
What is joining? When do you need to join?
Joining combines two (or more) tables into one by the use of shared values (i.e., the primary key on one table matches the foreign key on another), resulting in one table with rows that have been combined where those values are shared. Joining is needed when data across different tables needs to be compared or simply retrieved and displayed together where they correlate. The
INNER JOIN
keyword introduces the additional tables beyond the first after itsFROM
keyword, followed by theON
keyword which specifies which values should equal one another. E.g.,SELECT starttime FROM cd.bookings INNER JOIN cd.members ON cd.bookings.memid=cd.members.memid WHERE cd.members.firstname='David' AND cd.members.surname='Farrell';
Outer joins come in the varieties of
LEFT JOIN
,RIGHT JOIN
, andFULL JOIN
.LEFT JOIN
produces all of the rows on the “left” side of the join (i.e., the table introduced byFROM
) whether or not they match a foreign key on the “right” side of the join. In other words, in a left outer join the rows from the right side are optional rather than mandatory.RIGHT JOIN
works the same but in the other direction, andFULL JOIN
produces all rows on both sides and is rarely used. E.g.,SELECT mems.firstname AS memfname, mems.surname AS memsname, refs.firstname AS recfname, refs.surname AS recsname FROM cd.members mems LEFT JOIN cd.members refs ON mems.recommendedby = refs.memid ORDER BY mems.surname, mems.firstname;
-
What is an aggregate function?
An aggregate function takes all the possible results from a group and performs some kind of calculation on them, returning a scalar (i.e., single) value.
-
List three aggregate functions and what they do.
MAX()
takes a group of rows and returns the one that outputs the highest value. E.g., in the following line all the values in thejoindate
column on thecd.members
table are compared and the highest value is returned:SELECT MAX(joindate) AS latest FROM cd.members;
-
COUNT()
takes a group of rows and returns the total number of them.COUNT(*)
returns the total number of rows in a given result set (i.e., afterWHERE
andJOIN
keywords have filtered it).COUNT(column_name)
returns the total number of non-null values in that column.COUNT(DISTINCT column_name)
counts the total number of unique non-null values. SUM()
takes a column and returns the sum of the values in that column, provided the data type is numeric.
-
What does the
group
statement do?The
GROUP BY
keywords batch rows into groups, one for each distinct/unique value in the given column or, if more than one is provided, for each unique combination of those columns. E.g.,SELECT department FROM employees GROUP BY department
-
How does the
group
statement relate to aggregates?The
GROUP BY
keywords are particularly suited to allowing aggregates to be executed on groups of rows and then used as column values, so that aggregate values can be displayed alongside the group to which they apply. For example:SELECT names.recommendedby, COUNT(recommendedby) FROM cd.members names WHERE names.recommendedby IS NOT null GROUP BY names.recommendedby ORDER BY names.recommendedby
GROUP BY
groups rows after anyWHERE
statements, but before anyHAVING
statements, soHAVING
is useful for filtering which groups are retrieved and displayed by a query and it can take aggregates as arguments whileWHERE
cannot.