
SQL Interview Questions for Senior Data Engineers

The document contains a comprehensive list of SQL interview questions tailored for senior data engineers with over five years of experience. It covers various topics including window functions, JOIN types, handling NULL values, CTEs, subqueries, and transaction management, providing explanations and SQL examples for each topic. Additionally, it addresses performance optimization techniques and best practices for working with large datasets.

Uploaded by Jeet Modi
Copyright © All Rights Reserved

SQL Interview Questions for Senior Data Engineer (5+ Years Experience)
1. What are Window Functions and how do they differ from GROUP BY?
Answer: Window functions perform calculations across a set of rows related to the current row
without collapsing the result set, unlike GROUP BY which reduces rows. They use the OVER() clause
to define the window of rows.

Example:

sql
-- Window Function - keeps all rows
SELECT
employee_id,
department,
salary,
AVG(salary) OVER (PARTITION BY department) as dept_avg_salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as salary_rank
FROM employees;

-- GROUP BY - reduces rows


SELECT
department,
AVG(salary) as dept_avg_salary
FROM employees
GROUP BY department;
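A self-contained version makes the difference in row counts easy to check. This sketch assumes SQLite 3.25 or later, the first release with window functions, and uses a hypothetical emp_demo table with made-up salaries. The window query returns all five input rows, and the GROUP BY query returns one row per department.

```sql
-- Hypothetical sample data (names and values are illustrative only)
CREATE TABLE emp_demo (
    employee_id INTEGER PRIMARY KEY,
    department  TEXT    NOT NULL,
    salary      INTEGER NOT NULL
);

INSERT INTO emp_demo (employee_id, department, salary) VALUES
    (1, 'Sales', 50000),
    (2, 'Sales', 70000),
    (3, 'Sales', 60000),
    (4, 'IT',    90000),
    (5, 'IT',    80000);

-- Window function: 5 rows in, 5 rows out; each row carries its department average
SELECT
    employee_id,
    department,
    salary,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM emp_demo
ORDER BY department, salary_rank;

-- GROUP BY: 5 rows in, 2 rows out (one per department)
SELECT department, AVG(salary) AS dept_avg_salary
FROM emp_demo
GROUP BY department
ORDER BY department;
```

Both queries read the same five rows. Only the window version keeps per-row detail such as employee_id next to the aggregate.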

2. Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()
Answer:

ROW_NUMBER(): Assigns unique sequential numbers, no ties

RANK(): Assigns same rank to ties, skips subsequent ranks

DENSE_RANK(): Assigns same rank to ties, doesn't skip ranks


Example:

sql

SELECT
name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) as row_num,
RANK() OVER (ORDER BY score DESC) as rank_val,
DENSE_RANK() OVER (ORDER BY score DESC) as dense_rank_val
FROM students;

-- Results:
-- Alice, 95, 1, 1, 1
-- Bob, 90, 2, 2, 2
-- Charlie, 90, 3, 2, 2 -- Same score as Bob (which of the two gets row_num 2 is arbitrary without a tiebreaker)
-- David, 85, 4, 4, 3 -- RANK skips 3, DENSE_RANK doesn't
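Bob and Charlie tie in the results above, so the engine can give either of them row_num 2. If you add a unique column as a tiebreaker, ROW_NUMBER() becomes deterministic, and RANK() and DENSE_RANK() are unaffected. This sketch assumes SQLite 3.25+ and a hypothetical students_demo table in which name is unique.

```sql
CREATE TABLE students_demo (
    name  TEXT PRIMARY KEY,
    score INTEGER NOT NULL
);

INSERT INTO students_demo (name, score) VALUES
    ('Alice', 95), ('Bob', 90), ('Charlie', 90), ('David', 85);

SELECT
    name,
    score,
    ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,  -- name breaks ties
    RANK()       OVER (ORDER BY score DESC)       AS rank_val,
    DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rank_val
FROM students_demo
ORDER BY row_num;
-- Alice   95  1  1  1
-- Bob     90  2  2  2
-- Charlie 90  3  2  2
-- David   85  4  4  3
```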

3. How do you handle NULL values in SQL? Explain COALESCE vs ISNULL


Answer:

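COALESCE: ANSI standard, accepts two or more arguments, returns the first non-NULL value

ISNULL: SQL Server specific, accepts exactly two arguments; the result takes the data type of the first argument, so a longer replacement value can be truncated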

Example:
sql

-- COALESCE (works across databases)


SELECT
customer_id,
COALESCE(mobile_phone, home_phone, work_phone, 'No Phone') as contact_phone
FROM customers;

-- ISNULL (SQL Server specific)


SELECT
customer_id,
ISNULL(mobile_phone, 'No Phone') as contact_phone
FROM customers;

-- NULL handling in calculations


SELECT
product_id,
price,
discount,
price * (1 - COALESCE(discount, 0)) as final_price
FROM products;
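NULL propagates through arithmetic and comparisons, which is why the final_price expression above wraps discount in COALESCE. This sketch assumes SQLite and a hypothetical products_demo table. The product without a discount gets a NULL price unless the NULL is replaced. The "= NULL" comparison matches nothing, because it evaluates to UNKNOWN.

```sql
CREATE TABLE products_demo (
    product_id INTEGER PRIMARY KEY,
    price      NUMERIC NOT NULL,
    discount   NUMERIC            -- NULL means "no discount recorded"
);

INSERT INTO products_demo (product_id, price, discount) VALUES
    (1, 100, 0.25),
    (2, 200, NULL);

SELECT
    product_id,
    price * (1 - discount)              AS naive_price,   -- NULL for product 2
    price * (1 - COALESCE(discount, 0)) AS final_price    -- 75 and 200
FROM products_demo
ORDER BY product_id;

-- A comparison with NULL yields UNKNOWN, so this counts 0 rows...
SELECT COUNT(*) AS matches_eq FROM products_demo WHERE discount = NULL;
-- ...while IS NULL finds product 2
SELECT COUNT(*) AS matches_is FROM products_demo WHERE discount IS NULL;
```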

4. What are CTEs (Common Table Expressions) and when would you use them?
Answer: CTEs are named temporary result sets, defined with WITH, that exist only for the duration of a single statement. They improve readability, enable recursion, and can be referenced multiple times within the same query. In SQL Server, each reference is evaluated again rather than materialized once.

Example:

sql
-- Basic CTE
WITH high_performers AS (
SELECT
employee_id,
department,
salary,
performance_score
FROM employees
WHERE performance_score > 8.5
),
dept_stats AS (
SELECT
department,
AVG(salary) as avg_salary,
COUNT(*) as emp_count
FROM high_performers
GROUP BY department
)
SELECT
hp.employee_id,
hp.department,
hp.salary,
ds.avg_salary,
hp.salary - ds.avg_salary as salary_diff
FROM high_performers hp
JOIN dept_stats ds ON hp.department = ds.department;

-- Recursive CTE for hierarchical data
WITH employee_hierarchy AS (
-- Anchor member
SELECT employee_id, manager_id, name, 1 as level
FROM employees
WHERE manager_id IS NULL

UNION ALL

-- Recursive member
SELECT e.employee_id, e.manager_id, e.name, eh.level + 1
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy;
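The recursive CTE above uses SQL Server syntax, where a plain WITH is enough. PostgreSQL and MySQL 8 require WITH RECURSIVE. The runnable sketch below uses that form and assumes SQLite and a made-up four-person org_demo table.

```sql
CREATE TABLE org_demo (
    employee_id INTEGER PRIMARY KEY,
    manager_id  INTEGER REFERENCES org_demo (employee_id),
    name        TEXT NOT NULL
);

INSERT INTO org_demo (employee_id, manager_id, name) VALUES
    (1, NULL, 'CEO'),
    (2, 1,    'CTO'),
    (3, 2,    'Engineer'),
    (4, 1,    'CFO');

WITH RECURSIVE employee_hierarchy AS (
    -- Anchor member: the root(s) of the tree
    SELECT employee_id, manager_id, name, 1 AS level
    FROM org_demo
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive member: direct reports of the rows found so far
    SELECT e.employee_id, e.manager_id, e.name, eh.level + 1
    FROM org_demo e
    JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT employee_id, name, level
FROM employee_hierarchy
ORDER BY level, employee_id;
-- CEO at level 1, CTO and CFO at level 2, Engineer at level 3
```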

5. Explain different types of JOINs with examples


Answer: JOINs combine rows from multiple tables based on related columns.

Example:

sql
-- Sample tables
CREATE TABLE customers (id INT, name VARCHAR(50));
CREATE TABLE orders (id INT, customer_id INT, amount DECIMAL(10,2));
CREATE TABLE products (id INT, product_name VARCHAR(50));

-- INNER JOIN - only customers with at least one matching order
SELECT c.name, o.amount
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;

-- LEFT JOIN - all customers; order columns are NULL when there is no match
SELECT c.name, COALESCE(o.amount, 0) as amount
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id;

-- RIGHT JOIN - all orders, even those whose customer_id matches no customer
SELECT c.name, o.amount
FROM customers c
RIGHT JOIN orders o ON c.id = o.customer_id;

-- FULL OUTER JOIN - all rows from both tables, matched where possible
SELECT c.name, o.amount
FROM customers c
FULL OUTER JOIN orders o ON c.id = o.customer_id;

-- CROSS JOIN - Cartesian product (every customer paired with every product)
SELECT c.name, p.product_name
FROM customers c
CROSS JOIN products p;
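A small data set shows which rows each join keeps. This sketch assumes SQLite and hypothetical customers_demo and orders_demo tables. RIGHT and FULL OUTER JOIN are left out because SQLite only supports them from version 3.39 onward. The last query is the common LEFT JOIN ... IS NULL anti-join pattern, which finds customers without orders.

```sql
CREATE TABLE customers_demo (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders_demo (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER,
    amount      NUMERIC NOT NULL
);

INSERT INTO customers_demo (id, name) VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cal');
INSERT INTO orders_demo (id, customer_id, amount) VALUES
    (10, 1, 50), (11, 1, 20), (12, 2, 30);

-- INNER JOIN: 3 rows (Ann twice, Ben once); Cal has no orders and disappears
SELECT c.name, o.amount
FROM customers_demo c
INNER JOIN orders_demo o ON c.id = o.customer_id;

-- LEFT JOIN: 4 rows; Cal is kept with amount defaulted to 0
SELECT c.name, COALESCE(o.amount, 0) AS amount
FROM customers_demo c
LEFT JOIN orders_demo o ON c.id = o.customer_id;

-- Anti-join: customers with no orders (only Cal)
SELECT c.name
FROM customers_demo c
LEFT JOIN orders_demo o ON c.id = o.customer_id
WHERE o.id IS NULL;
```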

6. What is the difference between HAVING and WHERE clauses?


Answer:

WHERE: Filters rows before grouping, cannot use aggregate functions

HAVING: Filters groups after grouping, can use aggregate functions

Example:

sql

-- WHERE filters individual rows


SELECT
department,
COUNT(*) as emp_count,
AVG(salary) as avg_salary
FROM employees
WHERE salary > 50000 -- Filters rows before grouping
GROUP BY department
HAVING COUNT(*) > 5; -- Filters groups after grouping

-- This would be WRONG:


-- WHERE AVG(salary) > 60000 -- ERROR: Cannot use aggregate in WHERE
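The aggregate condition that fails in WHERE belongs in HAVING. This sketch assumes SQLite and a hypothetical staff_demo table, and it also shows that the order of filtering changes the answer. With the WHERE clause, Sales averages 66000 over its two remaining rows and passes. Without it, Sales averages 54000 and HAVING drops it.

```sql
CREATE TABLE staff_demo (
    employee_id INTEGER PRIMARY KEY,
    department  TEXT    NOT NULL,
    salary      INTEGER NOT NULL
);

INSERT INTO staff_demo (employee_id, department, salary) VALUES
    (1, 'Sales', 30000), (2, 'Sales', 62000), (3, 'Sales', 70000),
    (4, 'IT',    70000), (5, 'IT',    90000);

-- The aggregate condition goes in HAVING, not WHERE
SELECT department, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM staff_demo
WHERE salary > 50000            -- row filter: drops the 30000 salary first
GROUP BY department
HAVING AVG(salary) > 60000      -- group filter: IT (80000) and Sales (66000) pass
ORDER BY department;

-- Same HAVING without the WHERE: Sales averages 54000 and is filtered out
SELECT department, AVG(salary) AS avg_salary
FROM staff_demo
GROUP BY department
HAVING AVG(salary) > 60000;
```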
7. How do you find duplicate records and remove them?
Answer: Identify duplicates with GROUP BY ... HAVING COUNT(*) > 1, or rank the rows in each duplicate group with ROW_NUMBER(). Then delete every row except the one you want to keep.

Example:

sql
-- Find duplicates
SELECT
email,
COUNT(*) as duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

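-- Remove duplicates using ROW_NUMBER()
-- (SQL Server: a DELETE through the CTE removes the underlying table rows)
WITH duplicate_cte AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_date DESC) as rn
FROM users
)
DELETE FROM duplicate_cte WHERE rn > 1; -- keeps the most recent row per email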

-- Alternative: Keep only unique records


WITH unique_users AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_date DESC) as rn
FROM users
)
SELECT * FROM unique_users WHERE rn = 1;
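Deleting through a CTE, as in the DELETE above, works in SQL Server but not in every engine. A portable variant ranks the rows in a subquery and deletes the lower-ranked ones by primary key. This sketch assumes SQLite 3.25+ and a hypothetical users_demo table with a unique user_id, which also breaks ties between rows with the same created_date.

```sql
CREATE TABLE users_demo (
    user_id      INTEGER PRIMARY KEY,
    email        TEXT NOT NULL,
    created_date TEXT NOT NULL      -- ISO-8601 strings sort chronologically
);

INSERT INTO users_demo (user_id, email, created_date) VALUES
    (1, 'a@example.com', '2024-01-01'),
    (2, 'a@example.com', '2024-03-01'),   -- newest copy of a@example.com: kept
    (3, 'b@example.com', '2024-02-01'),
    (4, 'a@example.com', '2024-02-01');

-- Keep the newest row per email and delete the rest
DELETE FROM users_demo
WHERE user_id IN (
    SELECT user_id
    FROM (
        SELECT
            user_id,
            ROW_NUMBER() OVER (
                PARTITION BY email
                ORDER BY created_date DESC, user_id DESC   -- tiebreaker for equal dates
            ) AS rn
        FROM users_demo
    ) ranked
    WHERE rn > 1
);

SELECT user_id, email FROM users_demo ORDER BY user_id;  -- rows 2 and 3 remain
```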
8. Explain UNION vs UNION ALL
Answer:

UNION: Combines results and removes duplicates (slower)

UNION ALL: Combines results keeping duplicates (faster)

Example:

sql
-- UNION removes duplicates (here only within each table, because the customer_type literal differs between branches)
SELECT customer_id, 'Premium' as customer_type FROM premium_customers
UNION
SELECT customer_id, 'Standard' as customer_type FROM standard_customers;

-- UNION ALL keeps duplicates (faster)


SELECT customer_id, 'Premium' as customer_type FROM premium_customers
UNION ALL
SELECT customer_id, 'Standard' as customer_type FROM standard_customers;

-- Performance consideration: the quarter literal already makes every row distinct,
-- so UNION's duplicate-removal step would be wasted work
SELECT 'Q1' as quarter, SUM(sales) as total_sales FROM sales_q1
UNION ALL
SELECT 'Q2' as quarter, SUM(sales) as total_sales FROM sales_q2
UNION ALL
SELECT 'Q3' as quarter, SUM(sales) as total_sales FROM sales_q3
UNION ALL
SELECT 'Q4' as quarter, SUM(sales) as total_sales FROM sales_q4;
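UNION removes duplicates across the whole combined result, including repeats inside a single input. That extra step is what makes it costlier than UNION ALL. This sketch assumes SQLite and two hypothetical single-column tables that have no distinguishing literal column.

```sql
CREATE TABLE premium_demo  (customer_id INTEGER NOT NULL);
CREATE TABLE standard_demo (customer_id INTEGER NOT NULL);

INSERT INTO premium_demo  (customer_id) VALUES (1), (2), (2);
INSERT INTO standard_demo (customer_id) VALUES (2), (3);

-- UNION ALL: 5 rows, every input row kept
SELECT customer_id FROM premium_demo
UNION ALL
SELECT customer_id FROM standard_demo;

-- UNION: 3 rows (1, 2, 3); duplicates removed within and across inputs
SELECT customer_id FROM premium_demo
UNION
SELECT customer_id FROM standard_demo;
```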

9. How do you perform pagination in SQL?


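Answer: Use OFFSET ... FETCH NEXT, the ANSI standard syntax supported by SQL Server 2012+, PostgreSQL, and Oracle 12c+. Alternatively, use LIMIT ... OFFSET in MySQL, PostgreSQL, and SQLite. Pages are only predictable when the ORDER BY is unique.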

Example:
sql

-- SQL Server pagination


DECLARE @PageSize INT = 10;
DECLARE @PageNumber INT = 3;

SELECT
product_id,
product_name,
price
FROM products
ORDER BY product_name, product_id -- product_id makes the order unique so rows don't shift between pages
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;

-- MySQL/PostgreSQL pagination
SELECT
product_id,
product_name,
price
FROM products
ORDER BY product_name, product_id
LIMIT 10 OFFSET 20; -- Page 3, 10 records per page: OFFSET = (3 - 1) * 10

-- Performance tip: index the ORDER BY columns. Large OFFSETs still read and discard
-- every skipped row, so deep pages get progressively slower
CREATE INDEX idx_products_name ON products(product_name);
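The OFFSET for page n with page size s is (n - 1) * s, so page 3 with 10 rows per page skips 20 rows. The sort key should also be unique; otherwise rows with equal names can move between pages. This sketch assumes SQLite and a hypothetical catalog_demo table filled with 25 generated rows whose names repeat.

```sql
CREATE TABLE catalog_demo (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL
);

-- Generate 25 products; names repeat every 5 ids, which creates sort-key ties
WITH RECURSIVE n(i) AS (
    SELECT 1
    UNION ALL
    SELECT i + 1 FROM n WHERE i < 25
)
INSERT INTO catalog_demo (product_id, product_name)
SELECT i, 'item_' || (i % 5) FROM n;

-- Page 3 with 10 rows per page: OFFSET (3 - 1) * 10 = 20, so the last 5 rows are returned
SELECT product_id, product_name
FROM catalog_demo
ORDER BY product_name, product_id   -- product_id breaks ties deterministically
LIMIT 10 OFFSET 20;
```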
10. What are indexes and how do they improve query performance?
Answer: Indexes are auxiliary data structures, typically B-trees, that give the engine fast access paths to rows matching a predicate or sort order, much like a book's index. The cost is extra storage and slower writes.

Example:

sql
-- Create indexes for common query patterns
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
CREATE INDEX idx_products_category ON products(category);

-- Query that benefits from index


SELECT order_id, total_amount
FROM orders
WHERE customer_id = 123
AND order_date >= '2024-01-01'
ORDER BY order_date DESC;

-- Composite index usage


CREATE INDEX idx_sales_region_date_product ON sales(region, sale_date, product_id);

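-- region (equality) and sale_date (range) can drive an index seek; product_id follows
-- the range column, so it is typically checked against index entries rather than narrowing the seek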
SELECT SUM(amount)
FROM sales
WHERE region = 'North'
AND sale_date BETWEEN '2024-01-01' AND '2024-12-31'
AND product_id IN (1, 2, 3);

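11. Explain the concept of SQL execution plan and how to optimize queries
Answer: An execution plan shows how the database engine will run (or ran) a query. It lists access methods such as index seeks or scans, the join order and join algorithms, and estimated row counts and costs. Reading it is the main way to locate performance bottlenecks.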
Example:

sql
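-- View the estimated execution plan (SQL Server).
-- SET SHOWPLAN_ALL must be the only statement in its batch; while it is ON,
-- queries are not executed and plan rows are returned instead.
SET SHOWPLAN_ALL ON;
GO
SELECT * FROM orders WHERE customer_id = 123;
GO
SET SHOWPLAN_ALL OFF;
GO

-- Or use EXPLAIN in other databases (PostgreSQL, MySQL; SQLite uses EXPLAIN QUERY PLAN)
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;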

-- Optimization techniques
-- 1. Use appropriate indexes
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- 2. Avoid SELECT *
SELECT order_id, order_date, total_amount
FROM orders
WHERE customer_id = 123;

-- 3. Use WHERE clauses to filter early


SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= '2024-01-01' -- Selective predicates let the optimizer reduce rows before or during the join
AND c.status = 'Active';

-- 4. Prefer EXISTS for existence checks (many optimizers plan IN and EXISTS alike; NOT EXISTS is safer than NOT IN when NULLs are possible)


SELECT customer_id, customer_name
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
);
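The strongest argument for EXISTS appears in the negated form. If the subquery returns a NULL, NOT IN evaluates to UNKNOWN for every outer row without a match and returns nothing, while NOT EXISTS returns the expected rows. This sketch assumes SQLite and hypothetical cust_demo and ord_demo tables in which one order has no customer.

```sql
CREATE TABLE cust_demo (customer_id INTEGER PRIMARY KEY, customer_name TEXT NOT NULL);
CREATE TABLE ord_demo  (order_id INTEGER PRIMARY KEY, customer_id INTEGER);  -- nullable

INSERT INTO cust_demo (customer_id, customer_name) VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO ord_demo  (order_id, customer_id) VALUES (10, 1), (11, NULL);

-- Returns no rows: 2 NOT IN (1, NULL) is UNKNOWN, not TRUE
SELECT customer_name
FROM cust_demo
WHERE customer_id NOT IN (SELECT customer_id FROM ord_demo);

-- Returns Ben, the customer without orders
SELECT c.customer_name
FROM cust_demo c
WHERE NOT EXISTS (
    SELECT 1 FROM ord_demo o WHERE o.customer_id = c.customer_id
);
```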

12. How do you handle hierarchical data in SQL?


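Answer: Store the hierarchy as an adjacency list, where each row references its parent, and traverse it with recursive CTEs. Alternatives are the nested set and materialized path models, which make writes more complex in exchange for faster reads.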

Example:

sql
-- Recursive CTE for organization hierarchy
WITH org_hierarchy AS (
-- Find all top-level managers
SELECT
employee_id,
employee_name,
manager_id,
1 as level,
CAST(employee_name AS VARCHAR(1000)) as hierarchy_path
FROM employees
WHERE manager_id IS NULL

UNION ALL

-- Find all subordinates
SELECT
e.employee_id,
e.employee_name,
e.manager_id,
oh.level + 1,
CAST(oh.hierarchy_path + ' -> ' + e.employee_name AS VARCHAR(1000))
FROM employees e
JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
)
SELECT
employee_id,
employee_name,
level,
hierarchy_path
FROM org_hierarchy
ORDER BY level, employee_name;

-- Find all subordinates of a specific manager


WITH subordinates AS (
SELECT employee_id, employee_name, manager_id
FROM employees
WHERE employee_id = 5 -- Starting manager

UNION ALL

SELECT e.employee_id, e.employee_name, e.manager_id
FROM employees e
JOIN subordinates s ON e.manager_id = s.employee_id
)
SELECT * FROM subordinates;

13. What are the different types of subqueries?


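Answer: Subqueries can be non-correlated, meaning evaluated independently of the outer query, or correlated, meaning they reference columns of the outer row. They can return a single value (scalar), a list of values, or a table (derived table).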

Example:
sql
-- Scalar subquery (returns single value)
SELECT
product_name,
price,
price - (SELECT AVG(price) FROM products) as price_diff_from_avg
FROM products;

-- Multiple-row subquery with IN


SELECT customer_name
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
WHERE order_date >= '2024-01-01'
);

-- Correlated subquery
SELECT
c.customer_name,
(SELECT COUNT(*)
FROM orders o
WHERE o.customer_id = c.customer_id) as order_count
FROM customers c;

-- EXISTS subquery (efficient for large datasets)


SELECT customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
AND o.order_date >= '2024-01-01'
);

-- Table subquery in FROM clause


SELECT
monthly_sales.month,
monthly_sales.total_sales,
100.0 * monthly_sales.total_sales / yearly_total.total as percentage -- share of the year's sales, in percent
FROM (
SELECT
MONTH(order_date) as month,
SUM(total_amount) as total_sales
FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01' -- range predicate can use an index on order_date
GROUP BY MONTH(order_date)
) monthly_sales
CROSS JOIN (
SELECT SUM(total_amount) as total
FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'
) yearly_total;
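The correlated subquery that counts orders per customer can usually be rewritten as a LEFT JOIN plus GROUP BY. Many optimizers run that form as a single join instead of a per-row lookup. Using COUNT(o.order_id) rather than COUNT(*) keeps customers without orders at 0. This sketch assumes SQLite and hypothetical sub_customers and sub_orders tables; both queries return Ann 2, Ben 1, Cal 0.

```sql
CREATE TABLE sub_customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT NOT NULL);
CREATE TABLE sub_orders    (order_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);

INSERT INTO sub_customers (customer_id, customer_name) VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cal');
INSERT INTO sub_orders (order_id, customer_id) VALUES (10, 1), (11, 1), (12, 2);

-- Correlated subquery: logically evaluated once per customer row
SELECT
    c.customer_name,
    (SELECT COUNT(*) FROM sub_orders o WHERE o.customer_id = c.customer_id) AS order_count
FROM sub_customers c
ORDER BY c.customer_id;

-- Join + aggregate rewrite: same result
SELECT c.customer_name, COUNT(o.order_id) AS order_count  -- COUNT(column) ignores the NULLs from unmatched rows
FROM sub_customers c
LEFT JOIN sub_orders o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name
ORDER BY c.customer_id;
```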
14. How do you handle date and time operations in SQL?
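Answer: Use built-in date functions for manipulation, formatting, and calculations. The examples below use SQL Server functions such as DATEADD, DATEDIFF, and EOMONTH; names and signatures differ in other engines.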

Example:

sql
-- Date arithmetic and functions
SELECT
order_date,
YEAR(order_date) as order_year,
MONTH(order_date) as order_month,
DATEPART(week, order_date) as week_number,
DATEDIFF(day, order_date, GETDATE()) as days_since_order,
DATEADD(day, 30, order_date) as due_date,
FORMAT(order_date, 'yyyy-MM-dd') as formatted_date
FROM orders;

-- Date range queries


SELECT *
FROM sales
WHERE sale_date >= DATEADD(month, -3, GETDATE()) -- Last 3 months
AND sale_date < DATEADD(day, 1, CAST(GETDATE() AS DATE)); -- Before tomorrow (adding an integer directly to a DATE value raises an error)

-- First and last day of month


SELECT
DATEFROMPARTS(YEAR(GETDATE()), MONTH(GETDATE()), 1) as first_day_month,
EOMONTH(GETDATE()) as last_day_month;

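-- Time zone handling (SQL Server 2016+): treat the stored value as UTC, then convert to US Eastern time (the zone applies daylight saving)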
SELECT
order_date,
order_date AT TIME ZONE 'UTC' AT TIME ZONE 'Eastern Standard Time' as est_time
FROM orders;

-- Age calculation
SELECT
employee_name,
birth_date,
DATEDIFF(year, birth_date, GETDATE()) as naive_age, -- counts year boundaries crossed; one too high before the birthday
CASE
WHEN DATEADD(year, DATEDIFF(year, birth_date, GETDATE()), birth_date) > GETDATE()
THEN DATEDIFF(year, birth_date, GETDATE()) - 1
ELSE DATEDIFF(year, birth_date, GETDATE())
END as accurate_age -- subtracts 1 if this year's birthday has not happened yet
FROM employees;

15. What are transactions and how do you handle them in SQL?
Answer: Transactions ensure data consistency by grouping multiple operations that must succeed
or fail together.

Example:

sql
-- Basic transaction
BEGIN TRANSACTION;

BEGIN TRY
UPDATE accounts SET balance = balance - 1000 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 1000 WHERE account_id = 2;

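INSERT INTO transaction_log (from_account, to_account, amount, transaction_date)
VALUES (1, 2, 1000, GETDATE());

COMMIT TRANSACTION;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION; -- roll back only if a transaction is still open
THROW; -- re-raise the original error to the caller
END CATCH;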

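-- Transaction with savepoints
DECLARE @order_id INT;

BEGIN TRANSACTION;

INSERT INTO orders (customer_id, order_date) VALUES (1, GETDATE());
SET @order_id = SCOPE_IDENTITY(); -- unlike @@IDENTITY, ignores identity values generated by triggers

SAVE TRANSACTION order_created;

BEGIN TRY
INSERT INTO order_items (order_id, product_id, quantity)
VALUES (@order_id, 1, 5);
UPDATE products SET stock_quantity = stock_quantity - 5 WHERE product_id = 1;

COMMIT TRANSACTION;
END TRY
BEGIN CATCH
IF XACT_STATE() = 1
BEGIN
ROLLBACK TRANSACTION order_created; -- undo only the item insert and stock update
COMMIT TRANSACTION; -- the order row remains
END
ELSE IF XACT_STATE() = -1
ROLLBACK TRANSACTION; -- uncommittable transaction: only a full rollback is possible
END CATCH;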

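-- Isolation levels (set before BEGIN TRANSACTION; applies to the session)
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- or READ UNCOMMITTED, REPEATABLE READ, SNAPSHOT, SERIALIZABLE
-- (SNAPSHOT requires ALLOW_SNAPSHOT_ISOLATION ON for the database)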
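SQL Server's SAVE TRANSACTION, and ROLLBACK TRANSACTION with a savepoint name, correspond to the standard SAVEPOINT and ROLLBACK TO SAVEPOINT statements in PostgreSQL, MySQL (InnoDB), and SQLite. This sketch assumes SQLite and hypothetical tx_orders and tx_order_items tables, and it simulates the failure with an explicit rollback. The order row is committed and the item row is not.

```sql
CREATE TABLE tx_orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
CREATE TABLE tx_order_items (
    order_id   INTEGER NOT NULL,
    product_id INTEGER NOT NULL,
    quantity   INTEGER NOT NULL CHECK (quantity > 0)
);

BEGIN;

INSERT INTO tx_orders (order_id, customer_id) VALUES (1, 42);

SAVEPOINT order_created;

INSERT INTO tx_order_items (order_id, product_id, quantity) VALUES (1, 7, 5);

-- Suppose a later step fails: undo only the work done after the savepoint
ROLLBACK TO SAVEPOINT order_created;

COMMIT;   -- the order row is committed, the item row is not

SELECT
    (SELECT COUNT(*) FROM tx_orders)      AS orders_kept,  -- 1
    (SELECT COUNT(*) FROM tx_order_items) AS items_kept;   -- 0
```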

16. How do you optimize queries for large datasets?


Answer: Use indexing, partitioning, query optimization techniques, and appropriate data types.

Example:

sql
-- Partitioning for large tables
CREATE PARTITION FUNCTION pf_sales_date (DATE)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01', '2025-01-01');

CREATE PARTITION SCHEME ps_sales_date
AS PARTITION pf_sales_date TO (fg1, fg2, fg3, fg4); -- 3 boundaries -> 4 partitions; filegroups fg1-fg4 must already exist

CREATE TABLE sales_partitioned (


sale_id INT IDENTITY(1,1),
sale_date DATE,
amount DECIMAL(10,2),
customer_id INT
) ON ps_sales_date(sale_date);

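-- ROW_NUMBER()-based paging (the pre-2012 SQL Server pattern). Like OFFSET, it still
-- numbers every preceding row, so deep pages get slower; for very deep paging, seek on
-- the last (sale_date, sale_id) already returned instead of skipping rows
WITH ordered_results AS (
SELECT
ROW_NUMBER() OVER (ORDER BY sale_date DESC, sale_id DESC) as rn,
sale_id,
sale_date,
amount
FROM sales
WHERE sale_date >= '2024-01-01'
)
SELECT sale_id, sale_date, amount
FROM ordered_results
WHERE rn BETWEEN 10001 AND 10100; -- Page 101, 100 records per page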

-- Use appropriate data types


CREATE TABLE optimized_sales (
sale_id INT, -- Instead of BIGINT if not needed
sale_date DATE, -- Instead of DATETIME if time not needed
amount DECIMAL(10,2), -- Specific precision
customer_id INT,
status TINYINT, -- Instead of VARCHAR for status codes
notes VARCHAR(500) -- Limited length instead of TEXT
);

-- Covering indexes for query optimization


CREATE INDEX idx_sales_covering
ON sales (customer_id, sale_date)
INCLUDE (amount, product_id);

-- Query that uses covering index


SELECT customer_id, sale_date, amount, product_id
FROM sales
WHERE customer_id = 123
AND sale_date >= '2024-01-01'
ORDER BY sale_date;

17. Explain different types of database constraints


Answer: Constraints enforce data integrity rules at the database level.
Example:

sql
-- Primary Key constraint
CREATE TABLE customers (
customer_id INT IDENTITY(1,1) PRIMARY KEY,
customer_name VARCHAR(100) NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
phone VARCHAR(20),
created_date DATE DEFAULT GETDATE()
);

-- Foreign Key constraint


CREATE TABLE orders (
order_id INT IDENTITY(1,1) PRIMARY KEY,
customer_id INT,
order_date DATE NOT NULL,
total_amount DECIMAL(10,2),
CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
ON DELETE CASCADE ON UPDATE CASCADE
);

-- Check constraints
ALTER TABLE orders
ADD CONSTRAINT chk_positive_amount CHECK (total_amount > 0);

ALTER TABLE customers
ADD CONSTRAINT chk_valid_email CHECK (email LIKE '%@%.%'); -- basic shape check only, not full address validation

-- Unique constraint on multiple columns
ALTER TABLE order_items
ADD CONSTRAINT uk_order_product UNIQUE (order_id, product_id);

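-- Custom constraint with a scalar function (evaluated per row, only when a row is inserted or updated)
GO
CREATE FUNCTION fn_validate_age(@birth_date DATE)
RETURNS BIT
AS
BEGIN
DECLARE @result BIT = 0;
-- Born on or before the date 18 years ago (DATEDIFF(year, ...) counts year boundaries, not full years)
IF @birth_date <= DATEADD(year, -18, CAST(GETDATE() AS DATE))
SET @result = 1;
RETURN @result;
END;
GO

ALTER TABLE employees
ADD CONSTRAINT chk_adult_age CHECK (dbo.fn_validate_age(birth_date) = 1);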

18. How do you handle JSON data in SQL?


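Answer: Modern databases provide functions to parse, query, and modify JSON stored in regular columns. The examples use SQL Server 2016+ functions (ISJSON, JSON_VALUE, JSON_QUERY, JSON_MODIFY, OPENJSON); PostgreSQL and MySQL have their own JSON operators and functions.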

Example:

sql
-- JSON data storage and querying (SQL Server)
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
specifications NVARCHAR(MAX) CHECK (ISJSON(specifications) = 1)
);

INSERT INTO products VALUES


(1, 'Laptop', '{"brand": "Dell", "ram": "16GB", "storage": "512GB SSD", "features": ["WiFi", "Bluetooth", "USB-C"]}'),
(2, 'Phone', '{"brand": "Samsung", "storage": "128GB", "camera": "64MP", "features": ["5G", "Wireless Charging"]}');

-- Query JSON data


SELECT
product_id,
product_name,
JSON_VALUE(specifications, '$.brand') as brand,
JSON_VALUE(specifications, '$.ram') as ram,
JSON_VALUE(specifications, '$.storage') as storage
FROM products;

-- Query JSON arrays


SELECT
product_id,
product_name,
feature.value as feature
FROM products
CROSS APPLY OPENJSON(specifications, '$.features') feature;

-- Update JSON data


UPDATE products
SET specifications = JSON_MODIFY(specifications, '$.price', 999.99)
WHERE product_id = 1;

-- Complex JSON queries
SELECT
product_id,
product_name
FROM products
WHERE JSON_VALUE(specifications, '$.brand') = 'Dell'
AND JSON_VALUE(specifications, '$.ram') LIKE '%16%'
AND EXISTS ( -- exact match on an array element instead of LIKE over the raw JSON text
SELECT 1
FROM OPENJSON(specifications, '$.features')
WHERE value = 'WiFi'
);
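The functions above belong to SQL Server 2016 and later. For comparison, SQLite offers json_extract() for scalar values and the json_each() table-valued function for expanding arrays. Both are built in since SQLite 3.38 and were available earlier through the JSON1 extension. This sketch uses a hypothetical json_products table. The last query matches an array element exactly instead of searching the raw JSON text with LIKE.

```sql
CREATE TABLE json_products (
    product_id     INTEGER PRIMARY KEY,
    product_name   TEXT NOT NULL,
    specifications TEXT NOT NULL CHECK (json_valid(specifications))
);

INSERT INTO json_products (product_id, product_name, specifications) VALUES
    (1, 'Laptop', '{"brand": "Dell", "ram": "16GB", "features": ["WiFi", "Bluetooth", "USB-C"]}'),
    (2, 'Phone',  '{"brand": "Samsung", "storage": "128GB", "features": ["5G", "Wireless Charging"]}');

-- Scalar extraction (similar role to JSON_VALUE)
SELECT product_id, json_extract(specifications, '$.brand') AS brand
FROM json_products;

-- One row per array element (similar role to CROSS APPLY OPENJSON)
SELECT p.product_name, f.value AS feature
FROM json_products p, json_each(p.specifications, '$.features') AS f;

-- Exact element match
SELECT p.product_name
FROM json_products p
WHERE EXISTS (
    SELECT 1 FROM json_each(p.specifications, '$.features') AS f
    WHERE f.value = 'WiFi'
);
```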

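19. What are stored procedures and functions? When would you use each?
Answer:

Stored Procedures: Can perform multiple operations, modify data, manage transactions, and return results through result sets or OUTPUT parameters; they are not required to return a value
Functions: Must return a value (scalar or table), cannot modify database state, and can be used inside SELECT statements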
Example:

sql
-- Stored Procedure
CREATE PROCEDURE sp_process_order
@customer_id INT,
@product_id INT,
@quantity INT,
@order_id INT OUTPUT
AS
BEGIN
SET NOCOUNT ON;

BEGIN TRANSACTION;

BEGIN TRY
-- Insert order
INSERT INTO orders (customer_id, order_date, status)
VALUES (@customer_id, GETDATE(), 'Pending');

SET @order_id = SCOPE_IDENTITY();

-- Insert order items


INSERT INTO order_items (order_id, product_id, quantity, unit_price)
SELECT @order_id, @product_id, @quantity, price
FROM products WHERE product_id = @product_id;

-- Update inventory
UPDATE products
SET stock_quantity = stock_quantity - @quantity
WHERE product_id = @product_id;

COMMIT TRANSACTION;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION; -- roll back only if the transaction is still open
THROW;
END CATCH;
END;

-- Usage
DECLARE @new_order_id INT;
EXEC sp_process_order
@customer_id = 1,
@product_id = 5,
@quantity = 2,
@order_id = @new_order_id OUTPUT;

-- Function
CREATE FUNCTION fn_calculate_age(@birth_date DATE)
RETURNS INT
AS
BEGIN
DECLARE @age INT;
SET @age = DATEDIFF(year, @birth_date, GETDATE());
-- Adjust for birthday not yet occurred this year
IF (MONTH(@birth_date) > MONTH(GETDATE()) OR
(MONTH(@birth_date) = MONTH(GETDATE()) AND DAY(@birth_date) > DAY(GETDATE())))
SET @age = @age - 1;

RETURN @age;
END;

-- Usage in SELECT
SELECT
employee_name,
birth_date,
dbo.fn_calculate_age(birth_date) as age
FROM employees;

-- Table-valued function
CREATE FUNCTION fn_get_customer_orders(@customer_id INT)
RETURNS TABLE
AS
RETURN (
SELECT
o.order_id,
o.order_date,
o.total_amount,
oi.product_id,
oi.quantity
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.customer_id = @customer_id
);

-- Usage
SELECT * FROM dbo.fn_get_customer_orders(123);

20. How do you handle recursive queries and what are their performance
considerations?
Answer: Recursive queries use CTEs to traverse hierarchical data. Performance depends on the depth and fan-out of the hierarchy, an index on the parent key used in the recursive join, and a reliable termination condition.

Example:

sql
-- Employee hierarchy with performance optimization
WITH employee_hierarchy AS (
-- Anchor: Find top-level managers
SELECT
employee_id,
manager_id,
employee_name,
salary,
1 as level,
CAST(employee_id AS VARCHAR(MAX)) as path
FROM employees
WHERE manager_id IS NULL

UNION ALL

-- Recursive: Find direct reports
SELECT
e.employee_id,
e.manager_id,
e.employee_name,
e.salary,
eh.level + 1,
eh.path + ',' + CAST(e.employee_id AS VARCHAR(MAX))
FROM employees e
INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
WHERE eh.level < 10 -- Depth cap: stops at level 10 (also guards against cycles in bad data)
)
SELECT
employee_id,
employee_name,
level,
path,
salary
FROM employee_hierarchy
ORDER BY level, employee_name
OPTION (MAXRECURSION 10); -- Raise an error if recursion exceeds 10 levels (the default limit is 100)

-- Bill of Materials (BOM) explosion


WITH bom_explosion AS (
-- Anchor: Top-level products
SELECT
product_id,
component_id,
quantity_needed,
1 as level,
CAST(product_id AS VARCHAR(100)) as product_path
FROM bill_of_materials
WHERE product_id = 1001 -- Specific product

UNION ALL

-- Recursive: Sub-components
SELECT
bom.product_id,
bom.component_id,
bom.quantity_needed * be.quantity_needed as quantity_needed, -- cumulative quantity per top-level unit
be.level + 1,
CAST(be.product_path + '->' + CAST(bom.component_id AS VARCHAR(100)) AS VARCHAR(100)) -- must match the anchor column type
FROM bill_of_materials bom
INNER JOIN bom_explosion be ON bom.product_id = be.component_id
WHERE be.level < 5
)
-- Total quantity of each component across all branches of the tree
SELECT
be.component_id,
p.product_name as component_name,
SUM(be.quantity_needed) as total_quantity_needed,
MIN(be.level) as shallowest_level
FROM bom_explosion be
JOIN products p ON be.component_id = p.product_id
GROUP BY be.component_id, p.product_name
ORDER BY shallowest_level, be.component_id;

-- Performance optimization for recursive queries


-- 1. Create appropriate indexes
CREATE INDEX idx_employees_manager ON employees(manager_id);
CREATE INDEX idx_bom_product ON bill_of_materials(product_id);

-- 2. Use OPTION (MAXRECURSION) to prevent runaway queries


-- 3. Add level limits in WHERE clause
-- 4. Consider materialized path approach for frequently queried hierarchies

-- Alternative: Materialized path approach


ALTER TABLE employees ADD hierarchy_path VARCHAR(500);

-- Update hierarchy path (one-time or scheduled)


WITH hierarchy_update AS (
SELECT
employee_id,
manager_id,
employee_name,
CAST(employee_id AS VARCHAR(500)) as path
FROM employees
WHERE manager_id IS NULL

UNION ALL

SELECT
e.employee_id,
e.manager_id,
e.employee_name,
CAST(hu.path + '/' + CAST(e.employee_id AS VARCHAR(20)) AS VARCHAR(500)) -- same type as the anchor column
FROM employees e
INNER JOIN hierarchy_update hu ON e.manager_id = hu.employee_id
)
UPDATE e
SET hierarchy_path = hu.path
FROM employees e
INNER JOIN hierarchy_update hu ON e.employee_id = hu.employee_id;

-- Fast hierarchical queries using materialized path
SELECT * FROM employees
WHERE hierarchy_path LIKE '1/%' -- All subordinates of root employee 1; for a deeper manager, match that manager's full path, e.g. '1/5/%'
ORDER BY hierarchy_path;
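A pattern like '1/%' only finds the subtree of a root. For any manager, match on that manager's own stored path followed by the delimiter; the trailing '/' is what stops 1/2 from also matching 1/20. This sketch assumes SQLite and a hypothetical mp_employees table. It builds the paths with a recursive CTE and then queries the subtree of employee 2.

```sql
CREATE TABLE mp_employees (
    employee_id    INTEGER PRIMARY KEY,
    manager_id     INTEGER,
    hierarchy_path TEXT
);

INSERT INTO mp_employees (employee_id, manager_id) VALUES
    (1, NULL), (2, 1), (3, 2), (4, 1), (20, 1), (21, 20);

-- Build "1/2/3"-style paths from the adjacency list and store them
WITH RECURSIVE paths(employee_id, path) AS (
    SELECT employee_id, CAST(employee_id AS TEXT)
    FROM mp_employees
    WHERE manager_id IS NULL
    UNION ALL
    SELECT e.employee_id, p.path || '/' || e.employee_id
    FROM mp_employees e
    JOIN paths p ON e.manager_id = p.employee_id
)
UPDATE mp_employees
SET hierarchy_path = (SELECT path FROM paths WHERE paths.employee_id = mp_employees.employee_id);

-- Subtree of employee 2, excluding 2 itself: prefix match on its own path plus '/'
SELECT employee_id, hierarchy_path
FROM mp_employees
WHERE hierarchy_path LIKE ((SELECT hierarchy_path FROM mp_employees WHERE employee_id = 2) || '/%')
ORDER BY hierarchy_path;
-- returns only employee 3 ('1/2/3'); '1/20/21' does not match '1/2/%'
```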

Performance Tips for SQL Interviews:

1. Always consider indexing when discussing query optimization
2. Use appropriate data types to minimize storage and improve performance
3. Avoid SELECT * in production queries
4. Prefer EXISTS / NOT EXISTS for subqueries when possible (NOT IN returns no rows if the subquery yields a NULL)
5. Check the query execution plan when optimizing
6. Use UNION ALL instead of UNION when duplicates are acceptable or impossible
7. Implement proper error handling in stored procedures
8. Use parameterized queries to prevent SQL injection
9. Consider partitioning for very large tables
10. Choose isolation levels for transactions deliberately

These questions cover the essential SQL concepts for a senior data engineer role, focusing on
practical applications and performance considerations that are crucial in real-world scenarios.
