SQL Joins and Subqueries – Complete Guide for DP-900
Why SQL Joins and Subqueries Matter
Relational databases store data across multiple tables to reduce redundancy and maintain data integrity — a principle known as normalization. Because related information is spread across separate tables, you need reliable mechanisms to combine and filter that data when you query it. This is precisely where SQL Joins and Subqueries come in. Understanding them is essential not only for real-world database work but also for the Microsoft DP-900: Azure Data Fundamentals exam, which tests your ability to describe core data concepts and how relational data is queried on Azure.
What Are SQL Joins?
A JOIN is a SQL clause that combines rows from two or more tables based on a related column (usually a primary key–foreign key relationship). Joins let you retrieve meaningful, combined datasets in a single query.
Types of Joins You Should Know
1. INNER JOIN
Returns only the rows where there is a match in both tables.
Example: Retrieve customers and their orders — only customers who have placed at least one order will appear.
2. LEFT (OUTER) JOIN
Returns all rows from the left table and the matching rows from the right table. If there is no match, NULL values are returned for the right table's columns.
Example: Retrieve all customers and their orders — customers without orders will still appear, with NULL in the order columns.
3. RIGHT (OUTER) JOIN
Returns all rows from the right table and the matching rows from the left table. If there is no match, NULL values are returned for the left table's columns.
4. FULL (OUTER) JOIN
Returns all rows from both tables. Where there is no match, NULLs fill the missing side.
5. CROSS JOIN
Returns the Cartesian product — every row from the first table paired with every row from the second table. No join condition is specified. This can produce very large result sets.
6. SELF JOIN
A table is joined with itself. This is useful for hierarchical data, such as finding an employee's manager when both are stored in the same table.
How Joins Work — Step by Step
1. The query engine identifies the tables listed in the FROM and JOIN clauses.
2. It evaluates the ON condition to find matching rows between the tables.
3. Depending on the join type (INNER, LEFT, RIGHT, FULL), it decides which unmatched rows to include or exclude.
4. The combined result set is then filtered further by any WHERE, GROUP BY, HAVING, or ORDER BY clauses.
Example Syntax:
SELECT c.CustomerName, o.OrderDate
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID;
This returns only customers who have matching orders.
What Are Subqueries?
A subquery (also called an inner query or nested query) is a query embedded inside another SQL statement. The outer query uses the result of the subquery to complete its own operation.
Types of Subqueries
1. Scalar Subquery
Returns a single value (one row, one column). Often used in SELECT lists or WHERE clauses with comparison operators (=, >, <).
Example: SELECT ProductName FROM Products WHERE Price > (SELECT AVG(Price) FROM Products);
2. Multi-Row Subquery
Returns multiple rows. Used with operators like IN, ANY, or ALL.
Example: SELECT CustomerName FROM Customers WHERE CustomerID IN (SELECT CustomerID FROM Orders WHERE OrderDate > '2024-01-01');
3. Correlated Subquery
References a column from the outer query, so it is executed once for each row processed by the outer query. These can be less efficient but are very powerful.
Example: SELECT e.EmployeeName FROM Employees e WHERE e.Salary > (SELECT AVG(e2.Salary) FROM Employees e2 WHERE e2.DepartmentID = e.DepartmentID);
4. Subquery in FROM clause (Derived Table)
The subquery acts as a temporary table.
Example: SELECT DeptAvg.DepartmentID, DeptAvg.AvgSalary FROM (SELECT DepartmentID, AVG(Salary) AS AvgSalary FROM Employees GROUP BY DepartmentID) AS DeptAvg;
Joins vs. Subqueries — Key Differences
- Joins combine columns from multiple tables side by side into one result set. They are generally more efficient for combining data.
- Subqueries return a result set that the outer query uses for filtering or computation. They are ideal for conditions that depend on aggregated or derived data.
- Many subqueries can be rewritten as joins and vice versa, but clarity and performance may favor one approach over the other.
- The DP-900 exam expects you to understand the conceptual difference — joins merge tables, subqueries nest queries.
How These Concepts Apply on Azure
On the Azure platform, SQL Joins and Subqueries are used across multiple relational services:
- Azure SQL Database — Fully managed relational database supporting T-SQL, including all join types and subqueries.
- Azure SQL Managed Instance — Near-100% SQL Server compatibility, supporting complex joins and nested queries.
- Azure Synapse Analytics (SQL Pool) — Uses T-SQL for analytical queries that frequently involve multi-table joins on large datasets.
- Azure Database for MySQL / PostgreSQL — These managed services also support standard SQL joins and subqueries using their respective SQL dialects.
Understanding joins and subqueries is fundamental to writing effective queries in any of these services.
Exam Tips: Answering Questions on SQL Joins and Subqueries
Tip 1: Know the Visual Model
Many exam questions use Venn diagram-style illustrations. Remember: INNER JOIN = overlapping center; LEFT JOIN = entire left circle plus overlap; FULL JOIN = both circles entirely.
Tip 2: Focus on INNER JOIN and LEFT JOIN
These are the most commonly tested join types on DP-900. Be absolutely clear on the difference — INNER excludes unmatched rows; LEFT includes all rows from the left table regardless of match.
Tip 3: Understand NULLs in Outer Joins
When a row from one table has no match in the other, the missing columns are filled with NULL. If a question asks what value appears for a non-matching row, the answer is NULL.
Tip 4: Recognize Subquery Patterns
If a question describes filtering data based on an aggregate (e.g., 'products with a price above average'), the answer likely involves a subquery. Look for keywords like nested query or inner query.
Tip 5: Know When to Use IN vs. = with Subqueries
Use = when the subquery returns exactly one value. Use IN when the subquery returns multiple values. Exam questions may test whether you can identify the correct operator.
Tip 6: Don't Confuse Joins with UNION
A JOIN combines columns from different tables (horizontal combination). A UNION combines rows from different queries (vertical combination). This distinction appears frequently on the exam.
Tip 7: Read the Scenario Carefully
DP-900 questions often present a business scenario and ask which type of query would produce the desired result. Pay close attention to whether all records from one table are needed (LEFT/RIGHT JOIN) or only matching records (INNER JOIN).
Tip 8: Remember the ON Clause
Joins require an ON clause specifying the relationship between tables (except CROSS JOIN). If a question shows a join without an ON clause and it is not a CROSS JOIN, it is likely incorrect syntax.
Tip 9: Performance is Secondary for DP-900
The DP-900 exam is a fundamentals exam. You are unlikely to be tested on query optimization. Focus on conceptual understanding rather than execution plans or index strategies.
Tip 10: Practice with Simple Examples
Create mental models with small tables — Customers (3 rows) and Orders (4 rows) — and trace through what each join type returns. This builds the intuition needed to answer quickly under exam pressure.
Summary
SQL Joins and Subqueries are foundational concepts for working with relational data on Azure. Joins allow you to combine data from multiple tables using related keys, while subqueries let you nest one query inside another for filtering and computation. For the DP-900 exam, focus on understanding the different join types (especially INNER and LEFT), recognizing when a subquery is appropriate, and distinguishing joins from other SQL operations like UNION. Mastering these concepts will help you confidently tackle relational data questions on the exam and build a strong foundation for more advanced Azure data certifications.