A Comprehensive SQL Notebook for Students and Professionals
Generated by Gemini (Reviewed by Alex, Brenda, and Charles)
September 19, 2025
Contents
1 Module 1: SQL Fundamentals (Expert Refresher)
    1.1 Core Concept 1: Logical Query Processing Order
    1.2 Core Concept 2: The Power of Aliases
    1.3 Syntax in Practice: A Business Question
    1.4 Practice Questions
2 Module 2: Advanced Sorting and Filtering
    2.1 Core Concepts
    2.2 Practice Questions
3 Module 3: Aggregation and Grouping
    3.1 Core Concepts
    3.2 Practice Questions
4 Module 4: The Power of Joins
    4.1 Visualizing Joins with Venn Diagrams
    4.2 The Join Types
    4.3 Practice Questions
5 Module 5: Subqueries & Common Table Expressions (CTEs)
    5.1 Core Concepts
    5.2 Syntax in Practice: Chained CTEs
    5.3 Practice Questions
6 Module 6: Window Functions
    6.1 Core Concept 1: Row Preservation
    6.2 Core Concept 2: The OVER() Clause
    6.3 Categories of Window Functions
    6.4 Practice Questions
7 Module 7: Data Manipulation & Definition (DDL & DML)
    7.1 Core Concepts
    7.2 DDL - Defining Structure
    7.3 DML - Manipulating Data
    7.4 Practice Questions
1 Module 1: SQL Fundamentals (Expert Refresher)
This module is a high-level review of the foundational concepts. It’s designed to ensure everyone
is on the same page before we tackle more advanced topics.
1.1 Core Concept 1: Logical Query Processing Order
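You write a query in one order, but the database evaluates its clauses in a different logical
order. Understanding this order is the key to debugging complex queries. (The optimizer is free
to execute the plan differently, as long as the result matches this logical order.) The logical
processing order is: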
1. FROM and JOINs
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. DISTINCT
7. ORDER BY
8. LIMIT / FETCH FIRST
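A practical consequence of this order is that a column alias defined in SELECT is not yet visible in WHERE, but it is visible in ORDER BY. The sketch below illustrates this on the employees table; the 10% bonus calculation and the alias annual_bonus are hypothetical and exist only for illustration.

```sql
-- Fails in both MySQL and PostgreSQL: WHERE is evaluated before SELECT,
-- so the alias annual_bonus does not exist yet.
-- SELECT first_name, salary * 0.10 AS annual_bonus
-- FROM employees
-- WHERE annual_bonus > 5000;

-- Works: repeat the expression in WHERE. ORDER BY is evaluated after SELECT,
-- so it can refer to the alias.
SELECT
    first_name,
    salary * 0.10 AS annual_bonus   -- hypothetical 10% bonus
FROM
    employees
WHERE
    salary * 0.10 > 5000
ORDER BY
    annual_bonus DESC;
```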
1.2 Core Concept 2: The Power of Aliases
Column Aliases: Used to make the output headers of your query results more readable.
Table Aliases: Essential for JOINs and subqueries. It is a strong best practice to use them
in any query with more than one table.
1.3 Syntax in Practice: A Business Question
Task: Find the 3 most recently hired employees from the ’Engineering’ department.
    -- For both MySQL and PostgreSQL
    SELECT
        first_name,
        last_name,
        hire_date
    FROM
        employees
    WHERE
        department = 'Engineering'
    ORDER BY
        hire_date DESC
    LIMIT 3;
Tips & Tricks to Remember
• The SELECT * Trap: Avoid SELECT * in production code. It increases network traffic,
can break code if the schema changes, and prevents the database from using certain
optimizations such as index-only scans.
• Look Under the Hood with EXPLAIN: The EXPLAIN command asks the database
"How will you run this query?" without actually running it. It is the most important
tool for understanding query performance. In PostgreSQL (and MySQL 8.0.18 or later),
EXPLAIN ANALYZE goes further: it also executes the query and reports actual timings,
so be careful when using it on statements that modify data.
• Self-Documenting Queries: Good aliases make queries easier to read for you
and your colleagues six months from now.
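As a minimal sketch of the EXPLAIN tip, the statements below inspect the query from Section 1.3. The plan output differs between MySQL and PostgreSQL and depends on your data and indexes, so none is shown here.

```sql
-- Show the planned execution strategy without running the query
EXPLAIN
SELECT first_name, last_name, hire_date
FROM employees
WHERE department = 'Engineering'
ORDER BY hire_date DESC
LIMIT 3;

-- PostgreSQL, and MySQL 8.0.18 or later: actually run the query
-- and report real row counts and timings alongside the plan
EXPLAIN ANALYZE
SELECT first_name, last_name, hire_date
FROM employees
WHERE department = 'Engineering'
ORDER BY hire_date DESC
LIMIT 3;
```

A plan that reports a full scan of employees is a hint that an index on the filtered or sorted columns may help.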
1.4 Practice Questions
1. Write a query that returns the full name and the yearly salary of all employees, but
displays the column headers as "Employee Name" and "Annual Salary". Use a table
alias.
Solution:
    SELECT
        CONCAT(e.first_name, ' ', e.last_name) AS "Employee Name",
        e.salary AS "Annual Salary"
    FROM
        employees AS e;
2. From the entire company, find the 5 employees with the lowest salaries.
Solution:
    SELECT
        first_name,
        last_name,
        salary
    FROM
        employees
    ORDER BY
        salary ASC
    LIMIT 5;
2 Module 2: Advanced Sorting and Filtering
Here we move beyond simple WHERE clauses to isolate the exact data you need to answer specific
and complex business questions.
2.1 Core Concepts
• Compound Filtering: Master the logical operators AND, OR, and NOT. Use parentheses ()
to make the order of operations explicit, because AND is evaluated before OR.
• Advanced Predicates: Use IN for checking against a list, BETWEEN for filtering within
a range (inclusive on both ends), and LIKE for pattern matching in strings (% matches any
sequence of zero or more characters, _ matches exactly one character).
• Handling the "Unknown": NULL represents a missing or unknown value. You must
use IS NULL or IS NOT NULL to filter for it. A comparison such as = NULL never evaluates
to true, so it matches no rows.
• Multi-Level Sorting: Sorting by multiple columns with ORDER BY col1, col2 sorts by
col1, then uses col2 to break any ties within the col1 groups.
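The concepts above can be combined in a single query. The sketch below assumes the employees table used elsewhere in this notebook; the salary thresholds and the name pattern are arbitrary values chosen for illustration.

```sql
SELECT
    first_name,
    last_name,
    department,
    salary,
    hire_date
FROM
    employees
WHERE
    -- Parentheses make the grouping explicit: without them, AND would bind
    -- more tightly than OR and change the meaning of the filter
    (department IN ('Engineering', 'Sales') OR salary > 100000)
    AND salary BETWEEN 60000 AND 150000   -- inclusive on both ends
    AND last_name LIKE 'S_%'              -- 'S', one character, then anything
    AND hire_date IS NOT NULL             -- never write hire_date != NULL
ORDER BY
    department ASC,   -- primary sort key
    salary DESC;      -- breaks ties within each department
```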
Tips & Tricks to Remember
• The LIKE Performance Trap: A pattern with a leading wildcard (LIKE '%text')
is usually slow because it prevents the database from using an ordinary B-tree index,
forcing a full table scan. A pattern with only a trailing wildcard (LIKE 'text%') can
use the index and is typically much faster.
• NULL is Not Equal to Anything: The expression NULL = NULL does not evaluate to
true; it evaluates to NULL. Since NULL means "unknown," the database cannot say for
certain that one unknown value is the same as another.
• IN vs. Multiple ORs: Using IN is cleaner and more readable than stringing together
multiple OR conditions, and it usually performs at least as well.
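The NULL behaviour described above is easy to check directly. The statements below are a small sketch that runs in both MySQL and PostgreSQL; the second pair assumes that some rows in employees have no department recorded.

```sql
-- Comparisons with NULL yield NULL (unknown), not true or false
SELECT
    NULL = NULL   AS equals_null,      -- NULL
    NULL <> NULL  AS not_equals_null,  -- NULL
    NULL IS NULL  AS is_null;          -- true (displayed as 1 in MySQL)

-- Returns no rows, because department = NULL is never true
SELECT first_name FROM employees WHERE department = NULL;

-- Returns the employees whose department is missing
SELECT first_name FROM employees WHERE department IS NULL;
```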
2.2 Practice Questions
1. Write a query to find all employees who are not in the ’Engineering’ department and were
hired in 2022. The results should be sorted by their hire date, from oldest to newest.
Solution:
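    SELECT
        first_name,
        department,
        hire_date
    FROM
        employees
    WHERE
        -- Note: rows with a NULL department are also excluded by this comparison
        department != 'Engineering'
        -- BETWEEN is inclusive; this assumes hire_date is a DATE, not a timestamp
        AND hire_date BETWEEN '2022-01-01' AND '2022-12-31'
    ORDER BY
        hire_date ASC;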
2. Find all products where the SKU starts with ’ABC’, has exactly three characters in the
middle, and ends with ’-X’.
Solution:
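    SELECT product_sku
    FROM products
    -- 'ABC', then exactly three characters (one _ each), then '-X'.
    -- If your SKUs also have a hyphen after the prefix, use 'ABC-___-X' instead.
    WHERE product_sku LIKE 'ABC___-X';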
3 Module 3: Aggregation and Grouping
This module is the heart of data analysis in SQL, where you learn to calculate meaningful
summaries like totals, averages, and counts.
3.1 Core Concepts
• Aggregate Functions: Functions that take a set of values and return a single summary
value, such as COUNT(), SUM(), AVG(), MIN(), and MAX().
• GROUP BY: Organizes rows into "buckets" so that aggregate functions can operate on each
bucket individually.
• The Nuances of COUNT:
– COUNT(*): Counts all rows in the group.
– COUNT(column): Counts all non-NULL values in that column.
– COUNT(DISTINCT column): Counts unique non-NULL values.
• HAVING: Filters the results after aggregation. It’s the WHERE clause for your GROUP BY
results.
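The three forms of COUNT can be compared side by side. This sketch assumes that salary may be NULL for some employees; the minimum group size of 5 is an arbitrary illustrative threshold.

```sql
SELECT
    department,
    COUNT(*)               AS total_employees,    -- every row in the group
    COUNT(salary)          AS with_known_salary,  -- rows where salary is not NULL
    COUNT(DISTINCT salary) AS distinct_salaries   -- unique non-NULL salary values
FROM
    employees
GROUP BY
    department
HAVING
    COUNT(*) >= 5;   -- keep only departments with at least 5 employees
```

If total_employees and with_known_salary differ for a department, that department has rows with a missing salary.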
Tips & Tricks to Remember
• WHERE Filters Rows, HAVING Filters Groups: This is the golden rule. WHERE is
processed before GROUP BY, and HAVING is processed after.
• Filter Early for Performance: If you can filter a row out with WHERE, do it.
Don’t wait to filter it out with HAVING, as that forces the database to aggregate
more data than necessary.
• The SELECT/GROUP BY Consistency Rule: Any non-aggregated column in your
SELECT list must also be in your GROUP BY clause.
3.2 Practice Questions
1. Find the highest and lowest salary for each department.
Solution:
    SELECT
        department,
        MAX(salary) AS max_salary,
        MIN(salary) AS min_salary
    FROM
        employees
    GROUP BY
        department;
2. For employees hired in 2022, find the average salary in each department. Only show
departments where this average salary is greater than $75,000.
Solution:
    SELECT
        department,
        ROUND(AVG(salary), 2) AS average_salary
    FROM
        employees
    WHERE
        -- Row filter: keep only 2022 hires before grouping
        hire_date BETWEEN '2022-01-01' AND '2022-12-31'
    GROUP BY
        department
    HAVING
        -- Group filter: applied to the aggregated result
        AVG(salary) > 75000;
4 Module 4: The Power of Joins
JOINs are the commands we use to connect related tables and create a single, meaningful result
set. A deep understanding of JOINs is arguably the most important skill in SQL.
4.1 Visualizing Joins with Venn Diagrams
The best way to understand JOINs is with Venn diagrams. Imagine the left table is the left
circle and the right table is the right circle.
Figure 1: Venn Diagrams for SQL Joins.
4.2 The Join Types
• INNER JOIN: The intersection. Only rows that have a match in both tables.
• LEFT JOIN: The entire left circle. All rows from the left table are returned, along with
any matching data from the right table. If there’s no match, the columns from the right
table will be NULL.
• The LEFT JOIN ... IS NULL Pattern: A classic and critical pattern to find all records
in the left table that do not have a match in the right table.
• SELF JOIN: A pattern where a table is joined to itself (using aliases) to find relationships
within the same dataset (e.g., finding an employee’s manager).
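The manager lookup mentioned above can be sketched as a self join combined with a LEFT JOIN. This assumes the employees table has an emp_id key and a manager_id column that points at another employee's emp_id; manager_id is a hypothetical column that does not appear elsewhere in this notebook.

```sql
SELECT
    e.first_name AS employee,
    m.first_name AS manager   -- NULL for employees with no manager
FROM
    employees AS e            -- "e" plays the role of the employee
LEFT JOIN
    employees AS m            -- "m" is the same table, read as the manager
    ON e.manager_id = m.emp_id;
```

Using LEFT JOIN rather than INNER JOIN keeps employees at the top of the hierarchy, who have no manager row to match.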
Tips & Tricks to Remember
• Always Use Table Aliases: A non-negotiable best practice for readability and
preventing "ambiguous column" errors.
• Join on Indexed Keys: For large performance gains, the columns in your ON
clause should almost always be indexed. Primary keys are indexed automatically.
Foreign keys are indexed automatically by MySQL (InnoDB) but not by PostgreSQL,
where you need to create the index yourself.
• The ON vs. WHERE "Gotcha": In a LEFT JOIN, a condition on the right table placed
in the ON clause is applied while matching, so unmatched left rows are still returned.
Placing it in the WHERE clause filters after the join, which discards the NULL-extended
rows and often turns your LEFT JOIN into an INNER JOIN by mistake.
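The ON versus WHERE difference is easiest to see with two otherwise identical queries. This sketch uses the products and categories tables from this notebook; the price threshold of 100 is arbitrary.

```sql
-- Filter in ON: every category is returned. Only products priced above 100
-- are attached; categories without such a product show NULL for product_name.
SELECT c.category_name, p.product_name
FROM categories AS c
LEFT JOIN products AS p
    ON c.id = p.category_id
   AND p.price > 100;

-- Filter in WHERE: categories without such a product disappear, because
-- p.price is NULL on unmatched rows and NULL > 100 is not true.
SELECT c.category_name, p.product_name
FROM categories AS c
LEFT JOIN products AS p
    ON c.id = p.category_id
WHERE p.price > 100;
```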
4.3 Practice Questions
1. You have a products table and a categories table. Write a query to list the names of
all products alongside their category name.
Solution:
    SELECT
        p.product_name,
        c.category_name
    FROM
        products AS p
    INNER JOIN
        categories AS c ON p.category_id = c.id;
2. Using the same tables, write a query to find all product categories that do not have any
products associated with them.
Solution:
    SELECT
        c.category_name
    FROM
        categories AS c
    LEFT JOIN
        products AS p ON c.id = p.category_id
    WHERE
        -- The join column is NULL only when no product matched this category
        p.category_id IS NULL;
5 Module 5: Subqueries & Common Table Expressions (CTEs)
These tools are essential for breaking down complex problems into manageable, readable steps.
5.1 Core Concepts
• Subqueries (Inner Queries): A SELECT statement nested inside another statement.
• Correlated vs. Non-Correlated Subqueries:
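– Non-Correlated: The inner query is independent of the outer query and can be
evaluated once. This is usually fast.
– Correlated: The inner query references the outer query and is conceptually
re-evaluated for every row of the outer query. On large tables this can be a
performance killer, although some optimizers can rewrite it internally.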
• Common Table Expressions (CTEs): A temporary, named result set defined using a
WITH clause. CTEs are the modern, preferred way to handle multi-step logic because they
are far more readable.
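The two kinds of subquery can be illustrated on the employees table. Both queries below are sketches: the first compares each salary to the company-wide average, the second to the average of the employee's own department.

```sql
-- Non-correlated: the inner query never refers to the outer query,
-- so its single result can be reused for every row
SELECT first_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Correlated: the inner query refers to e.department from the outer row,
-- so its result depends on which row is being tested
SELECT e.first_name, e.department, e.salary
FROM employees AS e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees AS e2
    WHERE e2.department = e.department
);
```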
5.2 Syntax in Practice: Chained CTEs
Question: "Find our top 2 customers by total revenue."
    WITH CustomerTotals AS (
        -- Step 1: Calculate total revenue for each customer
        SELECT
            customer_id,
            SUM(order_value) AS total_revenue
        FROM orders
        GROUP BY customer_id
    ),
    CustomerRanks AS (
        -- Step 2: Use the first CTE to rank customers
        SELECT
            customer_id,
            total_revenue,
            RANK() OVER (ORDER BY total_revenue DESC) AS rev_rank
        FROM CustomerTotals
    )
    -- Final Step: Select the top-ranked customers
    -- (RANK() can return more than 2 rows if customers tie on revenue)
    SELECT customer_id, total_revenue
    FROM CustomerRanks
    WHERE rev_rank <= 2;
Tips & Tricks to Remember
• Prefer CTEs for Readability: If your logic has more than one step, a CTE is
almost always cleaner than a nested subquery.
• The Correlated Subquery Performance Trap: Be very careful when your
inner query references a column from your outer query. Often, this can be rewritten
as a faster JOIN.
• Subquery vs. JOIN: Many WHERE ... IN (SELECT ...) subqueries can be expressed
as a JOIN. The JOIN is sometimes more performant because it gives the optimizer more
ways to execute the query, though modern optimizers often produce the same plan for
both. Check with EXPLAIN, and remember that a JOIN can duplicate rows where IN does not.
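As a sketch of the JOIN rewrite, consider the question "which employees earn more than the average salary of their own department?". A correlated subquery would recompute the department average for each employee; the version below computes every average once in a CTE and then joins to it. The CTE name DepartmentAverages is illustrative.

```sql
WITH DepartmentAverages AS (
    -- One row per department, computed once
    SELECT
        department,
        AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department
)
SELECT
    e.first_name,
    e.department,
    e.salary
FROM
    employees AS e
INNER JOIN
    DepartmentAverages AS d ON e.department = d.department
WHERE
    e.salary > d.avg_salary;
```

Because the CTE has exactly one row per department, the join cannot duplicate employee rows.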
5.3 Practice Questions
1. Find all orders placed by customers who have placed more than 5 orders in total.
Solution:
    SELECT *
    FROM orders
    WHERE customer_id IN (
        SELECT customer_id
        FROM orders
        GROUP BY customer_id
        HAVING COUNT(*) > 5
    );
2. Using a CTE, find the department with the highest average employee salary.
Solution:
    WITH DepartmentAverages AS (
        SELECT
            department,
            AVG(salary) AS avg_salary
        FROM employees
        GROUP BY department
    )
    SELECT department, avg_salary
    FROM DepartmentAverages
    ORDER BY avg_salary DESC
    LIMIT 1;
6 Module 6: Window Functions
Window functions perform calculations on a set of rows while keeping the detail of each
individual row, unlocking a new level of analytical power.
6.1 Core Concept 1: Row Preservation
This is the key difference from GROUP BY. Window functions do not collapse rows; they return
a value for every single row based on the defined window.
6.2 Core Concept 2: The OVER() Clause
The OVER() clause is what makes a regular function a window function. It has two key parts:
• PARTITION BY: Divides the rows into groups (the "window"). This is the window function
equivalent of GROUP BY.
• ORDER BY: Sorts the rows within each partition, which is essential for ranking and
positional functions.
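Both parts of OVER() can be seen in one query. The sketch below assumes the employees table from earlier modules and requires window function support (PostgreSQL, or MySQL 8.0 and later); the output column names are illustrative.

```sql
SELECT
    first_name,
    department,
    salary,
    -- PARTITION BY only: the same department average repeats on every row
    AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary,
    -- PARTITION BY + ORDER BY: numbers employees by hire date within a department
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY hire_date) AS hire_sequence
FROM
    employees;
```

Unlike GROUP BY department, this returns one row per employee, with the department-level value alongside the individual details.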
6.3 Categories of Window Functions
1. Ranking Functions: Assign a rank to each row within a partition (RANK(), DENSE_RANK(),
ROW_NUMBER()).
2. Aggregate Window Functions: Use standard aggregates like SUM(), AVG(), COUNT()
over a partition.
3. Positional Functions: Access data from a previous (LAG()) or following (LEAD()) row.
Tips & Tricks to Remember
• RANK vs. DENSE_RANK vs. ROW_NUMBER: This shows how each handles ties on a set
of values (100, 90, 90, 80):
– ROW_NUMBER(): 1, 2, 3, 4 (Always unique)
– RANK(): 1, 2, 2, 4 (Skips a rank after a tie)
– DENSE_RANK(): 1, 2, 2, 3 (Doesn't skip a rank after a tie)
• When to Use Window Functions vs. GROUP BY: If you need to show both
individual row details and an aggregate value on the same line, you need a window
function. If you only need the final summary, use GROUP BY.
• LAG/LEAD for Time-Series Gold: These are the go-to tools for easily calculating
period-over-period changes, a cornerstone of business analytics.
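The tie-handling comparison for the values (100, 90, 90, 80) can be reproduced without creating a table. This sketch builds the four values inline with a CTE; the names scores, row_num, rnk, and dense_rnk are illustrative (rank itself is a reserved word in MySQL 8.0, hence rnk).

```sql
WITH scores AS (
    SELECT 100 AS score
    UNION ALL SELECT 90
    UNION ALL SELECT 90
    UNION ALL SELECT 80
)
SELECT
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
    RANK()       OVER (ORDER BY score DESC) AS rnk,
    DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
FROM
    scores
ORDER BY
    score DESC, row_num;

-- score | row_num | rnk | dense_rnk
--   100 |       1 |   1 |         1
--    90 |       2 |   2 |         2
--    90 |       3 |   2 |         2
--    80 |       4 |   4 |         3
```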
6.4 Practice Questions
1. Write a query that ranks products by their price within each product category, from most
expensive to least expensive.
Solution:
    SELECT
        category,
        product_name,
        price,
        RANK() OVER (PARTITION BY category ORDER BY price DESC) AS price_rank
    FROM products;
2. You have a table of user activity with user_id and login_date. Write a query to show
each user's login date and the date of their previous login.
Solution:
    SELECT
        user_id,
        login_date,
        LAG(login_date, 1) OVER (PARTITION BY user_id ORDER BY login_date) AS previous_login
    FROM user_activity;
7 Module 7: Data Manipulation & Definition (DDL & DML)
This module moves beyond querying and into the commands that create, modify, and delete
both data and the tables themselves.
7.1 Core Concepts
• DML (Data Manipulation Language): For working with the data inside tables
(INSERT, UPDATE, DELETE).
• DDL (Data Definition Language): For working with the table structures themselves
(CREATE, ALTER, DROP).
7.2 DDL - Defining Structure
CREATE TABLE defines a new table, its columns, data types, and constraints (rules like PRIMARY
KEY, FOREIGN KEY, NOT NULL, UNIQUE, and DEFAULT).
    CREATE TABLE products (
        product_id INT PRIMARY KEY,
        product_sku VARCHAR(20) UNIQUE NOT NULL,
        product_name VARCHAR(100) NOT NULL,
        category_id INT,
        price DECIMAL(10, 2),
        in_stock_count INT DEFAULT 0,
        FOREIGN KEY (category_id) REFERENCES categories (id)
    );
ALTER TABLE modifies a table, and DROP TABLE deletes it completely.
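A brief sketch of both commands follows. The column is_discontinued and the table products_archive are hypothetical names used only for illustration; the statements use syntax accepted by both MySQL and PostgreSQL.

```sql
-- Add a new column to an existing table
ALTER TABLE products
    ADD COLUMN is_discontinued BOOLEAN DEFAULT FALSE;

-- Remove that column again
ALTER TABLE products
    DROP COLUMN is_discontinued;

-- Delete a table, its structure, and all of its data.
-- IF EXISTS avoids an error when the table is not there.
DROP TABLE IF EXISTS products_archive;
```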
7.3 DML - Manipulating Data
These commands add, change, and remove rows.
    -- Insert a single row
    INSERT INTO employees (emp_id, first_name) VALUES (106, 'Frank');

    -- Update a specific row
    UPDATE employees SET salary = 82000 WHERE emp_id = 106;

    -- Delete a specific row
    DELETE FROM employees WHERE emp_id = 106;
Tips & Tricks to Remember
• Your WHERE Clause is Your Only Friend: If you run an UPDATE or DELETE
command without a WHERE clause, you will change or delete every single row
in the table.
• Transactions are Your Safety Net: Wrap critical operations in a transaction
(BEGIN; ... then COMMIT; or ROLLBACK;) so you can inspect the changes before making
them permanent.
• DELETE vs. TRUNCATE vs. DROP:
– DELETE: DML, removes rows one at a time and accepts a WHERE clause (slower on
large tables, can be rolled back).
– TRUNCATE: DDL, removes all rows at once without scanning them (fast). It cannot be
rolled back in MySQL, where it causes an implicit commit, but it can be rolled back
inside a transaction in PostgreSQL.
– DROP: DDL, removes the entire table and its structure.
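The transaction tip can be sketched as follows. START TRANSACTION is accepted by both MySQL and PostgreSQL; the 5% raise is a hypothetical change, and in MySQL the table must use a transactional storage engine such as InnoDB for ROLLBACK to have any effect.

```sql
START TRANSACTION;

-- Make the change inside the transaction
UPDATE employees
SET salary = salary * 1.05
WHERE department = 'Engineering';

-- Inspect the result before deciding
SELECT emp_id, first_name, salary
FROM employees
WHERE department = 'Engineering';

-- If the numbers look wrong, undo everything since START TRANSACTION
ROLLBACK;

-- If they look right, run COMMIT instead of ROLLBACK to make the change permanent
-- COMMIT;
```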
7.4 Practice Questions
1. CREATE a table named courses with three columns: course_id (integer, primary key),
course_name (varchar, not null), and credits (integer).
Solution:
    CREATE TABLE courses (
        course_id INT PRIMARY KEY,
        course_name VARCHAR(100) NOT NULL,
        credits INT
    );
2. INSERT a new course: ’Advanced SQL’ with 3 credits and an ID of 101.
Solution:
    INSERT INTO courses (course_id, course_name, credits)
    VALUES (101, 'Advanced SQL', 3);