SQL Basics
As part of the coursework for the Back-End Engineer program at Turing School of Software & Design, I compiled the following reference guide to the basics of SQL and PostreSQL. The information is largely derived from the JumpstartLab SQL Tutorial and these PostgreSQL Exercises, which I highly recommend reading and completing for yourself. But if you’d like a more condensed summary, read on!
-
What is a database?
Databases are a means of storing, fetching, calculating, and sorting data.
-
What is SQL?
SQL (Structured Query Language) is a programming language with which one can create, modify, and arrange tables in a database, update or remove data on those tables, and retrieve data from the database using complex conditions and run calculations on that data.
-
What is SQLite3?
SQLite3 is one of many database systems based on SQL, a relatively simple one. It is pre-installed in Mac OSX and stores its data in plaintext files, whereas other systems would store their data in a separate space in the filesystem and only allow access through its tools.
-
What is a Table?
A table is a database object which has columns, each of which takes a specified data type and typically has one column that is its primary key, which rows of data can be stored on.
-
What is a primary key?
A primary key is a unique identifier for each row on a table, frequently set to autoincrement so that each new row is automatically assigned the next unused value and no values are reused.
-
What is a foreign key?
A foreign key is column on a table which allows rows to refer to the primary key on another table, in order to allow for relationships between tables and their rows.
-
Explanations of the most important SQL commands:
-
insert
INSERTallows for the addition of new rows to a table. Its basic syntax looks like this:INSERT INTO table(column_name1, column_name2) VALUES ('String Value', 0); -
select
SELECTopens a query statement used to display data according to the criteria and conditions that follow it. Its basic syntax looks like the following:SELECT fruits.name, sales.created_at FROM fruits INNER JOIN sales ON fruits.id=sales.fruit_id;The arguments that immediately follow
SELECTspecify which columns to retrieve data from; the argument followingFROMspecifies which table or tables to retrieve that data from, usingINNER JOIN(or anotherJOINtype) to specify how to relate those tables; and the argument followingONspecifies which columns should match each other for a specificJOIN. Multiple joins can be made to the first table followingFROM, each with its ownONcriteria, like this:SELECT customers.name, fruits.name, sales.created_at FROM fruits INNER JOIN sales ON fruits.id=sales.fruit_id INNER JOIN customers ON sales.customer_id=customers.id;Alternatively,
*can be used as the single argument ofSELECTto return all the columns in a givenFROMtable. -
where
WHEREfurther restricts which rows are returned by aSELECTstatement or altered by anUPDATEorDELETEstatement by providing additional criteria. Its basic syntax looks like this:UPDATE fruits SET quantity=17 WHERE name='bananas';DELETE FROM fruits WHERE name='oranges';SELECT * FROM fruits WHERE name='apples';In addition to comparing specific values,
WHEREcan use SQL methods to perform calculations, such as looking at the length of a string:SELECT * FROM fruits WHERE LENGTH(name)=7; -
order by
ORDER BYcan be used in conjunction withSELECTto order the results of the query by the values in a specific column, either by ascending (the default) or descending order. Its basic syntax looks like this:SELECT * from fruits ORDER BY name DESC; -
inner join
As described above under
SELECT,INNER JOINis used to specify the relationship between tables when retrieving data from a combination of two or more tables. It is an “inner join” in the sense that it returns only what the two tables have in common, like the internal overlapping space in a Venn diagram. The commonality in question is specified by what follows theONkeyword, which specifies which columns are being compared and how. Its basic syntax looks like this:SELECT fruits.name, sales.created_at FROM fruits INNER JOIN sales ON fruits.id=sales.fruit_id;With this and other kinds of joins, an alias for any table can be given by simply inserting it one space after the table. E.g.,
SELECT frt.name, sls.created_at FROM fruits frt INNER JOIN sales sls ON frt.id=sls.fruit_id;
-
-
How can you limit which columns you select from a table?
By providing specific column names after
SELECTand beforeFROM, only the data in those columns is retrieve from rows in the table specified byFROM. E.g.,SELECT name, membercost FROM cd.facilities -
How can you limit which rows you select from a table?
The
WHEREkeyword can be used to filter which rows are returned from those candidates specified bySELECTandFROM. It can take one or more conditions by which to filter them. E.g.,SELECT * FROM cd.facilities WHERE membercost > 0Comparisons can be made against dates if the column uses a date-related datatype by referring to dates using the YYYY-MM-DD format as a string. E.g.,
SELECT memid, surname, firstname, joindate FROM cd.members WHERE joindate > '2012-08-31';Multiple
WHEREconditions are combined using theANDandORkeywords. E.g.,SELECT facid, name, membercost, monthlymaintenance FROM cd.facilities WHERE membercost > 0 AND membercost < monthlymaintenance / 50;Additionally, the
LIKEkeyword can be used in place of a comparison operator (i.e., =, <, >, >=, <=, etc.) to perform pattern matching. It is provided a string in quotations, wherein%can take the place of any number of characters and_takes the place of any single character. E.g.,SELECT * FROM cd.facilities WHERE name LIKE '%Tennis%';Finally, the
INkeyword can be used to look for the existence of matching values in a given column or list of values in parentheses. E.g. the following example selects only those rows where itsfacidvalue exists in the list(1, 5):SELECT * FROM cd.facilities WHERE facid IN (1,5);The results of two or more
SELECTstatements can be combined into one output usingUNIONbetween them, as long as the number of columns and their respective datatypes match. E.g.,SELECT surname FROM cd.members UNION SELECT name FROM cd.facilities; -
How can you give a selected column a different name in your output?
Using the
ASkeyword, a column given as an argument forSELECTcan be renamed. E.g.,SELECT memid AS member_id FROM cd.facilities;Additionally, using the
CASE...WHEN...THEN...ENDkeywords which SQL uses for conditional statements, a new column can be made with conditional values. E.g.,SELECT name, CASE WHEN (monthlymaintenance > 100) THEN 'expensive' ELSE 'cheap' END AS cost FROM cd.facilities; -
How can you sort your output from a SQL statement?
The
ORDER BYkeyword sorts the retrieved rows by a specified column; the sort direction can optionally be specified after the column name withASCfor ascending (the default order) andDESCfor descending. E.g.,SELECT surname FROM cd.members ORDER BY surname;The number of results from a SQL statement can be limitied using the
LIMITkeyword at the end of the statement followed by an integer specifying the max number of rows to display.SELECT surname FROM cd.members ORDER BY surname LIMIT 10;And the results can further by modified by using the
SELECT DISCTINCTkeyword, which does not display additional duplicate rows (that is, rows with identical values across all selected columns) beyond the first.SELECT DISTINCT surname FROM cd.members ORDER BY surname LIMIT 10; -
What is joining? When do you need to join?
Joining combines two (or more) tables into one by the use of shared values (i.e., the primary key on one table matches the foreign key on another), resulting in one table with rows that have been combined where those values are shared. Joining is needed when data across different tables needs to be compared or simply retrieved and displayed together where they correlate. The
INNER JOINkeyword introduces the additional tables beyond the first after itsFROMkeyword, followed by theONkeyword which specifies which values should equal one another. E.g.,SELECT starttime FROM cd.bookings INNER JOIN cd.members ON cd.bookings.memid=cd.members.memid WHERE cd.members.firstname='David' AND cd.members.surname='Farrell';Outer joins come in the varieties of
LEFT JOIN,RIGHT JOIN, andFULL JOIN.LEFT JOINproduces all of the rows on the “left” side of the join (i.e., the table introduced byFROM) whether or not they match a foreign key on the “right” side of the join. In other words, in a left outer join the rows from the right side are optional rather than mandatory.RIGHT JOINworks the same but in the other direction, andFULL JOINproduces all rows on both sides and is rarely used. E.g.,SELECT mems.firstname AS memfname, mems.surname AS memsname, refs.firstname AS recfname, refs.surname AS recsname FROM cd.members mems LEFT JOIN cd.members refs ON mems.recommendedby = refs.memid ORDER BY mems.surname, mems.firstname; -
What is an aggregate function?
An aggregate function takes all the possible results from a group and performs some kind of calculation on them, returning a scalar (i.e., single) value.
-
List three aggregate functions and what they do.
MAX()takes a group of rows and returns the one that outputs the highest value. E.g., in the following line all the values in thejoindatecolumn on thecd.memberstable are compared and the highest value is returned:SELECT MAX(joindate) AS latest FROM cd.members;-
COUNT()takes a group of rows and returns the total number of them.COUNT(*)returns the total number of rows in a given result set (i.e., afterWHEREandJOINkeywords have filtered it).COUNT(column_name)returns the total number of non-null values in that column.COUNT(DISTINCT column_name)counts the total number of unique non-null values. SUM()takes a column and returns the sum of the values in that column, provided the data type is numeric.
-
What does the
groupstatement do?The
GROUP BYkeywords batch rows into groups, one for each distinct/unique value in the given column or, if more than one is provided, for each unique combination of those columns. E.g.,SELECT department FROM employees GROUP BY department -
How does the
groupstatement relate to aggregates?The
GROUP BYkeywords are particularly suited to allowing aggregates to be executed on groups of rows and then used as column values, so that aggregate values can be displayed alongside the group to which they apply. For example:SELECT names.recommendedby, COUNT(recommendedby) FROM cd.members names WHERE names.recommendedby IS NOT null GROUP BY names.recommendedby ORDER BY names.recommendedbyGROUP BYgroups rows after anyWHEREstatements, but before anyHAVINGstatements, soHAVINGis useful for filtering which groups are retrieved and displayed by a query and it can take aggregates as arguments whileWHEREcannot.