SQL Interview Questions for Senior Data Engineer (5+ Years Experience)
1. What are Window Functions and how do they differ from GROUP BY?
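Answer: Window functions compute a value for each row from a set of related rows (the window) without collapsing the result set. GROUP BY, by contrast, collapses each group into a single output row. The OVER() clause defines the window: PARTITION BY splits the rows into groups, and ORDER BY sets the row order within each partition.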
Example:
-- Window Function - keeps all rows
SELECT
employee_id,
department,
salary,
AVG(salary) OVER (PARTITION BY department) as dept_avg_salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) as salary_rank
FROM employees;
-- GROUP BY - reduces rows
SELECT
department,
AVG(salary) as dept_avg_salary
FROM employees
GROUP BY department;
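To see the row-count difference concretely, here is a minimal sketch using Python's standard-library sqlite3 module, assuming SQLite 3.25 or later (the first version with window functions). The five employee rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; window functions need SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Eng", 120000), (2, "Eng", 100000), (3, "Eng", 80000),
     (4, "Sales", 70000), (5, "Sales", 50000)],
)

# Window function: one output row per input row, with the average attached.
window_rows = conn.execute(
    "SELECT employee_id, department, "
    "AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary "
    "FROM employees ORDER BY employee_id"
).fetchall()

# GROUP BY: one output row per department.
grouped_rows = conn.execute(
    "SELECT department, AVG(salary) AS dept_avg_salary "
    "FROM employees GROUP BY department ORDER BY department"
).fetchall()

print(len(window_rows), len(grouped_rows))  # 5 2
print(grouped_rows)  # [('Eng', 100000.0), ('Sales', 60000.0)]
```

The window query keeps all five rows and repeats each department's average on every row, while GROUP BY returns one row per department.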
2. Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()
Answer:
ROW_NUMBER(): Assigns unique sequential numbers, no ties (the order among tied rows is arbitrary unless ORDER BY includes a tiebreaker)
RANK(): Assigns same rank to ties, skips subsequent ranks
DENSE_RANK(): Assigns same rank to ties, doesn't skip ranks
Example:
SELECT
name,
score,
ROW_NUMBER() OVER (ORDER BY score DESC) as row_num,
RANK() OVER (ORDER BY score DESC) as rank_val,
DENSE_RANK() OVER (ORDER BY score DESC) as dense_rank_val
FROM students;
-- Results:
-- Alice, 95, 1, 1, 1
-- Bob, 90, 2, 2, 2
-- Charlie, 90, 3, 2, 2 -- Same score as Bob
-- David, 85, 4, 4, 3 -- RANK skips 3, DENSE_RANK doesn't
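The tie behaviour above can be checked with a small sqlite3 sketch using the same four hypothetical students. Adding name as a tiebreaker is an assumption made here so that ROW_NUMBER() is deterministic for Bob and Charlie; without it, either of them could receive 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [("Alice", 95), ("Bob", 90), ("Charlie", 90), ("David", 85)])

# "name" breaks the Bob/Charlie tie for ROW_NUMBER only; RANK and
# DENSE_RANK order by score alone so the tie is visible.
rows = conn.execute("""
    SELECT name, score,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,
           RANK()       OVER (ORDER BY score DESC)       AS rank_val,
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rank_val
    FROM students
    ORDER BY row_num
""").fetchall()

for row in rows:
    print(row)
```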
3. How do you handle NULL values in SQL? Explain COALESCE vs ISNULL
Answer:
COALESCE: ANSI standard, accepts multiple parameters, returns first non-null value
ISNULL: SQL Server specific, accepts only 2 parameters
Example:
-- COALESCE (works across databases)
SELECT
customer_id,
COALESCE(mobile_phone, home_phone, work_phone, 'No Phone') as contact_phone
FROM customers;
-- ISNULL (SQL Server specific)
SELECT
customer_id,
ISNULL(mobile_phone, 'No Phone') as contact_phone
FROM customers;
-- NULL handling in calculations
SELECT
product_id,
price,
discount,
price * (1 - COALESCE(discount, 0)) as final_price
FROM products;
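A quick way to show why COALESCE matters in calculations: any arithmetic involving NULL yields NULL. This sqlite3 sketch uses two hypothetical products, one with no discount recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER, price REAL, discount REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, 100.0, 0.25), (2, 50.0, None)])

rows = conn.execute("""
    SELECT product_id,
           price * (1 - discount)              AS naive_price,  -- NULL if discount is NULL
           price * (1 - COALESCE(discount, 0)) AS final_price
    FROM products
    ORDER BY product_id
""").fetchall()
print(rows)  # [(1, 75.0, 75.0), (2, None, 50.0)]
```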
4. What are CTEs (Common Table Expressions) and when would you use them?
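Answer: CTEs are named, temporary result sets defined with WITH and scoped to the single statement that follows. They improve readability, enable recursion, and can be referenced multiple times within that statement (in SQL Server, for example, a CTE referenced twice is typically evaluated twice rather than cached).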
Example:
-- Basic CTE
WITH high_performers AS (
SELECT
employee_id,
department,
salary,
performance_score
FROM employees
WHERE performance_score > 8.5
),
dept_stats AS (
SELECT
department,
AVG(salary) as avg_salary,
COUNT(*) as emp_count
FROM high_performers
GROUP BY department
)
SELECT
hp.employee_id,
hp.department,
hp.salary,
ds.avg_salary,
hp.salary - ds.avg_salary as salary_diff
FROM high_performers hp
JOIN dept_stats ds ON hp.department = ds.department;
-- Recursive CTE for hierarchical data (SQL Server syntax; PostgreSQL and MySQL require WITH RECURSIVE)
WITH employee_hierarchy AS (
-- Anchor member
SELECT employee_id, manager_id, name, 1 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive member
SELECT e.employee_id, e.manager_id, e.name, eh.level + 1
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy;
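Here is the same recursive pattern as a runnable SQLite sketch, written with the RECURSIVE keyword that PostgreSQL and MySQL require (SQL Server omits it). The four-person org chart is hypothetical; the anchor produces level 1 and each recursive pass adds one level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, manager_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, None, "CEO"), (2, 1, "VP Eng"), (3, 1, "VP Sales"), (4, 2, "Engineer"),
])

rows = conn.execute("""
    WITH RECURSIVE employee_hierarchy AS (
        -- Anchor member: employees with no manager
        SELECT employee_id, manager_id, name, 1 AS level
        FROM employees WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: direct reports of rows found so far
        SELECT e.employee_id, e.manager_id, e.name, eh.level + 1
        FROM employees e
        JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
    )
    SELECT name, level FROM employee_hierarchy ORDER BY level, name
""").fetchall()
print(rows)
```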
5. Explain different types of JOINs with examples
Answer: JOINs combine rows from multiple tables based on related columns.
Example:
-- Sample tables
CREATE TABLE customers (id INT, name VARCHAR(50));
CREATE TABLE orders (id INT, customer_id INT, amount DECIMAL(10,2));
-- INNER JOIN - only matching records
SELECT c.name, o.amount
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id;
-- LEFT JOIN - all records from left table (NULLs where no order matches)
SELECT c.name, COALESCE(o.amount, 0) as amount
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id;
-- RIGHT JOIN - all records from right table
SELECT c.name, o.amount
FROM customers c
RIGHT JOIN orders o ON c.id = o.customer_id;
-- FULL OUTER JOIN - all records from both tables (not supported in MySQL)
SELECT c.name, o.amount
FROM customers c
FULL OUTER JOIN orders o ON c.id = o.customer_id;
-- CROSS JOIN - Cartesian product (assumes a products table exists)
SELECT c.name, p.product_name
FROM customers c
CROSS JOIN products p;
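The difference between INNER and LEFT JOIN is easiest to see with a customer who has no orders. This sqlite3 sketch uses hypothetical rows, including an order whose customer does not exist; RIGHT and FULL OUTER JOIN are left out because SQLite only supports them from version 3.39.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Ben")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(10, 1, 25.0), (11, 1, 40.0), (12, 99, 5.0)])  # 99: no such customer

# INNER JOIN: only pairs that match on both sides (Ben and order 12 drop out).
inner = conn.execute("""
    SELECT c.name, o.amount FROM customers c
    INNER JOIN orders o ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

# LEFT JOIN: every customer, with NULL (here COALESCEd to 0) when no order matches.
left = conn.execute("""
    SELECT c.name, COALESCE(o.amount, 0) AS amount FROM customers c
    LEFT JOIN orders o ON c.id = o.customer_id
    ORDER BY c.id, o.id
""").fetchall()
print(inner)
print(left)
```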
6. What is the difference between HAVING and WHERE clauses?
Answer:
WHERE: Filters rows before grouping, cannot use aggregate functions
HAVING: Filters groups after grouping, can use aggregate functions
Example:
-- WHERE filters individual rows
SELECT
department,
COUNT(*) as emp_count,
AVG(salary) as avg_salary
FROM employees
WHERE salary > 50000 -- Filters rows before grouping
GROUP BY department
HAVING COUNT(*) > 5; -- Filters groups after grouping
-- This would be WRONG:
-- WHERE AVG(salary) > 60000 -- ERROR: Cannot use aggregate in WHERE
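Here is a small sqlite3 sketch with hypothetical rows showing WHERE removing rows before grouping, HAVING removing a whole group afterwards, and the engine rejecting an aggregate inside WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", [
    ("Eng", 90000), ("Eng", 80000), ("Eng", 40000),
    ("Ops", 60000), ("Ops", 30000),
])

# WHERE keeps Eng 90000, Eng 80000, Ops 60000; HAVING then drops the
# Ops group because it has only one remaining row.
rows = conn.execute("""
    SELECT department, COUNT(*) AS emp_count
    FROM employees
    WHERE salary > 50000
    GROUP BY department
    HAVING COUNT(*) > 1
""").fetchall()

# An aggregate in WHERE is rejected when the statement is prepared.
try:
    conn.execute("SELECT department FROM employees WHERE AVG(salary) > 60000")
    aggregate_in_where_failed = False
except sqlite3.OperationalError:
    aggregate_in_where_failed = True

print(rows, aggregate_in_where_failed)  # [('Eng', 2)] True
```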
7. How do you find duplicate records and remove them?
Answer: Use GROUP BY ... HAVING COUNT(*) > 1 to find duplicate keys, then use ROW_NUMBER() partitioned by the duplicate key to choose one row to keep and remove the rest.
Example:
-- Find duplicates
SELECT
email,
COUNT(*) as duplicate_count
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
-- Remove duplicates using ROW_NUMBER() (SQL Server: a DELETE can target a CTE over a single table; most other engines do not allow this)
WITH duplicate_cte AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_date DESC) as rn
FROM users
)
DELETE FROM duplicate_cte WHERE rn > 1;
-- Alternative: select only the latest row per email (read-only, nothing is deleted)
WITH unique_users AS (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_date DESC) as rn
FROM users
)
SELECT * FROM unique_users WHERE rn = 1;
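Deleting through a CTE, as in the example above, is SQL Server behaviour. A pattern that does not rely on it is to compute ROW_NUMBER() in a subquery and delete by a unique key. The sketch below assumes a user_id primary key (not part of the original table) and runs on SQLite 3.25+; engines still differ in what they allow inside DELETE subqueries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT, created_date TEXT)")
conn.executemany("INSERT INTO users (email, created_date) VALUES (?, ?)", [
    ("a@x.com", "2024-01-01"), ("a@x.com", "2024-03-01"),
    ("b@x.com", "2024-02-01"),
])

# Rank rows within each email (newest first), then delete every row ranked > 1.
conn.execute("""
    DELETE FROM users
    WHERE user_id IN (
        SELECT user_id FROM (
            SELECT user_id,
                   ROW_NUMBER() OVER (PARTITION BY email
                                      ORDER BY created_date DESC) AS rn
            FROM users
        ) AS ranked
        WHERE rn > 1
    )
""")
remaining = conn.execute(
    "SELECT email, created_date FROM users ORDER BY email").fetchall()
print(remaining)  # [('a@x.com', '2024-03-01'), ('b@x.com', '2024-02-01')]
```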
8. Explain UNION vs UNION ALL
Answer:
UNION: Combines results and removes duplicate rows, which needs an extra sort or hash step (slower)
UNION ALL: Combines results keeping duplicates (faster)
Example:
-- UNION removes duplicates
SELECT customer_id, 'Premium' as customer_type FROM premium_customers
UNION
SELECT customer_id, 'Standard' as customer_type FROM standard_customers;
-- UNION ALL keeps duplicates (faster)
SELECT customer_id, 'Premium' as customer_type FROM premium_customers
UNION ALL
SELECT customer_id, 'Standard' as customer_type FROM standard_customers;
-- Performance consideration
SELECT 'Q1' as quarter, SUM(sales) FROM sales_q1
UNION ALL -- Safe here: the distinct quarter labels mean no two rows can be duplicates
SELECT 'Q2' as quarter, SUM(sales) FROM sales_q2
UNION ALL
SELECT 'Q3' as quarter, SUM(sales) FROM sales_q3
UNION ALL
SELECT 'Q4' as quarter, SUM(sales) FROM sales_q4;
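This sqlite3 sketch, with hypothetical IDs, selects only customer_id so that customer 2, present in both tables, really is a duplicate row. In the example above, the differing customer_type literal makes a customer in both tables produce two distinct rows, so UNION would only remove duplicates that occur within a single table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE premium_customers (customer_id INTEGER)")
conn.execute("CREATE TABLE standard_customers (customer_id INTEGER)")
conn.executemany("INSERT INTO premium_customers VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO standard_customers VALUES (?)", [(2,), (3,)])

# UNION de-duplicates the combined result; UNION ALL just appends.
union_rows = conn.execute(
    "SELECT customer_id FROM premium_customers "
    "UNION SELECT customer_id FROM standard_customers").fetchall()
union_all_rows = conn.execute(
    "SELECT customer_id FROM premium_customers "
    "UNION ALL SELECT customer_id FROM standard_customers").fetchall()
print(len(union_rows), len(union_all_rows))  # 3 4
```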
9. How do you perform pagination in SQL?
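Answer: Use OFFSET ... FETCH NEXT (ANSI standard; SQL Server 2012+, PostgreSQL, Oracle 12c+) or LIMIT ... OFFSET (MySQL, PostgreSQL, SQLite) for pagination. The ORDER BY must be deterministic (unique), or rows can repeat or go missing between pages.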
Example:
-- SQL Server pagination
DECLARE @PageSize INT = 10;
DECLARE @PageNumber INT = 3;
SELECT
product_id,
product_name,
price
FROM products
ORDER BY product_name
OFFSET (@PageNumber - 1) * @PageSize ROWS
FETCH NEXT @PageSize ROWS ONLY;
-- MySQL/PostgreSQL pagination
SELECT
product_id,
product_name,
price
FROM products
ORDER BY product_name
LIMIT 10 OFFSET 20; -- Page 3, 10 records per page
-- Performance tip: index the ORDER BY columns; a large OFFSET still reads and discards every skipped row
CREATE INDEX idx_products_name ON products(product_name);
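The page arithmetic (OFFSET = (page number - 1) * page size) can be wrapped in a small helper. This sqlite3 sketch uses a hypothetical fetch_page function and 35 generated products; product_id is added to ORDER BY as a tiebreaker so page boundaries stay stable even if two products share a name.

```python
import sqlite3

def fetch_page(conn, page_number, page_size):
    """Return one page of products (page_number starts at 1)."""
    offset = (page_number - 1) * page_size
    return conn.execute(
        "SELECT product_id, product_name FROM products "
        "ORDER BY product_name, product_id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT)")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(i, f"item{i:03d}") for i in range(1, 36)])

page3 = fetch_page(conn, page_number=3, page_size=10)  # rows 21-30
print(page3[0], page3[-1])  # (21, 'item021') (30, 'item030')
```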
10. What are indexes and how do they improve query performance?
Answer: Indexes are data structures (usually B-trees) that give the engine fast access paths to rows, similar to a book's index. They speed up filtering, joining, and sorting at the cost of extra storage and slower writes.
Example:
-- Create indexes for common query patterns
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
CREATE INDEX idx_products_category ON products(category);
-- Query that benefits from index
SELECT order_id, total_amount
FROM orders
WHERE customer_id = 123
AND order_date >= '2024-01-01'
ORDER BY order_date DESC;
-- Composite index usage
CREATE INDEX idx_sales_region_date_product ON sales(region, sale_date, product_id);
-- Seeks on region and the sale_date range; product_id is then checked as a residual filter within that range
SELECT SUM(amount)
FROM sales
WHERE region = 'North'
AND sale_date BETWEEN '2024-01-01' AND '2024-12-31'
AND product_id IN (1, 2, 3);
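To confirm that an index is actually used, ask the engine for its plan. This sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; its wording differs from SQL Server's plans, and the empty orders table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, "
             "order_date TEXT, total_amount REAL)")
conn.execute("CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date)")

def plan(sql):
    """Return the detail text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Equality on the leading column plus a range on the second column.
details = plan("SELECT order_id, total_amount FROM orders "
               "WHERE customer_id = 123 AND order_date >= '2024-01-01' "
               "ORDER BY order_date DESC")
for line in details:
    print(line)
```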
11. Explain the concept of SQL execution plan and how to optimize queries
Answer: An execution plan shows the steps the database engine chooses to run a query (table scans vs. index seeks, join algorithms, estimated row counts), which helps identify performance bottlenecks.
Example:
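-- View the estimated plan (SQL Server): SET SHOWPLAN_ALL must be alone in its batch,
-- and queries are not executed while it is ON
SET SHOWPLAN_ALL ON;
GO
SELECT * FROM orders WHERE customer_id = 123;
GO
SET SHOWPLAN_ALL OFF;
GO
-- Or use EXPLAIN in PostgreSQL/MySQL (EXPLAIN ANALYZE also runs the query)
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;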
-- Optimization techniques
-- 1. Use appropriate indexes
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- 2. Avoid SELECT *
SELECT order_id, order_date, total_amount
FROM orders
WHERE customer_id = 123;
-- 3. Filter as selectively as possible (most optimizers push these predicates below the join)
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= '2024-01-01'
AND c.status = 'Active';
-- 4. Prefer EXISTS / NOT EXISTS for existence checks (NOT IN returns no rows if the subquery yields any NULL)
SELECT customer_id, customer_name
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
);
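Whether EXISTS is faster than IN depends on the optimizer (many produce the same plan for both), but NOT IN and NOT EXISTS differ in meaning when the subquery returns a NULL. This sqlite3 sketch uses hypothetical rows, including one order with a NULL customer_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, customer_name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Ben")])
# Order 11 has no customer_id (for example, a guest checkout).
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, None)])

# 2 NOT IN (1, NULL) evaluates to UNKNOWN, so Ben is filtered out too.
not_in = conn.execute("""
    SELECT customer_name FROM customers
    WHERE customer_id NOT IN (SELECT customer_id FROM orders)
""").fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs do not interfere.
not_exists = conn.execute("""
    SELECT customer_name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
""").fetchall()
print(not_in, not_exists)  # [] [('Ben',)]
```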
12. How do you handle hierarchical data in SQL?
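Answer: Store the hierarchy as an adjacency list (each row points to its parent) and traverse it with recursive CTEs, or use alternative models such as materialized paths or nested sets, which trade more complex writes for faster reads.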
Example:
-- Recursive CTE for organization hierarchy
WITH org_hierarchy AS (
-- Find all top-level managers
SELECT
employee_id,
employee_name,
manager_id,
1 as level,
CAST(employee_name AS VARCHAR(1000)) as hierarchy_path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Find all subordinates
SELECT
e.employee_id,
e.employee_name,
e.manager_id,
oh.level + 1,
CAST(oh.hierarchy_path + ' -> ' + e.employee_name AS VARCHAR(1000))
FROM employees e
JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
)
SELECT
employee_id,
employee_name,
level,
hierarchy_path
FROM org_hierarchy
ORDER BY level, employee_name;
-- Find all subordinates of a specific manager
WITH subordinates AS (
SELECT employee_id, employee_name, manager_id
FROM employees
WHERE employee_id = 5 -- Starting manager
UNION ALL
SELECT e.employee_id, e.employee_name, e.manager_id
FROM employees e
JOIN subordinates s ON e.manager_id = s.employee_id
)
SELECT * FROM subordinates;
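The "subordinates of a specific manager" query can carry a readable path as well. This sketch runs it in SQLite, where string concatenation uses || instead of SQL Server's +; the table contents and the subordinates helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_id INTEGER, employee_name TEXT, manager_id INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, "Ada", None), (5, "Raj", 1), (6, "Lin", 5), (7, "Sam", 6), (8, "Kim", 1),
])

def subordinates(manager_id):
    """Return the manager plus all direct and indirect reports, with a path."""
    return conn.execute("""
        WITH RECURSIVE subordinates AS (
            SELECT employee_id, employee_name, employee_name AS path
            FROM employees WHERE employee_id = ?          -- starting manager
            UNION ALL
            SELECT e.employee_id, e.employee_name,
                   s.path || ' -> ' || e.employee_name    -- extend the path
            FROM employees e
            JOIN subordinates s ON e.manager_id = s.employee_id
        )
        SELECT employee_id, path FROM subordinates ORDER BY employee_id
    """, (manager_id,)).fetchall()

print(subordinates(5))
```

Calling subordinates(5) returns Raj, Lin, and Sam with paths such as 'Raj -> Lin -> Sam'; Kim reports to Ada, not Raj, so Kim is excluded.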
13. What are the different types of subqueries?
Answer: Subqueries can be correlated/non-correlated, and can return single values, multiple
values, or tables.
Example:
-- Scalar subquery (returns single value)
SELECT
product_name,
price,
price - (SELECT AVG(price) FROM products) as price_diff_from_avg
FROM products;
-- Multiple-row subquery with IN
SELECT customer_name
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
WHERE order_date >= '2024-01-01'
);
-- Correlated subquery
SELECT
c.customer_name,
(SELECT COUNT(*)
FROM orders o
WHERE o.customer_id = c.customer_id) as order_count
FROM customers c;
-- EXISTS subquery (can stop at the first matching row)
SELECT customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
AND o.order_date >= '2024-01-01'
);
-- Table subquery in FROM clause
SELECT
monthly_sales.month,
monthly_sales.total_sales,
100.0 * monthly_sales.total_sales / yearly_total.total as percentage
FROM (
SELECT
MONTH(order_date) as month,
SUM(total_amount) as total_sales
FROM orders
WHERE YEAR(order_date) = 2024
GROUP BY MONTH(order_date)
) monthly_sales
CROSS JOIN (
SELECT SUM(total_amount) as total
FROM orders
WHERE YEAR(order_date) = 2024
) yearly_total;
14. How do you handle date and time operations in SQL?
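Answer: Use built-in date functions for manipulation, formatting, and calculations. Function names vary by engine; the examples below use SQL Server (T-SQL) functions such as DATEADD, DATEDIFF, and EOMONTH.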
Example:
-- Date arithmetic and functions
SELECT
order_date,
YEAR(order_date) as order_year,
MONTH(order_date) as order_month,
DATEPART(week, order_date) as week_number,
DATEDIFF(day, order_date, GETDATE()) as days_since_order,
DATEADD(day, 30, order_date) as due_date,
FORMAT(order_date, 'yyyy-MM-dd') as formatted_date
FROM orders;
-- Date range queries
SELECT *
FROM sales
WHERE sale_date >= DATEADD(month, -3, GETDATE()) -- Last 3 months
AND sale_date < DATEADD(day, 1, CAST(GETDATE() AS DATE)); -- Before tomorrow (DATE + 1 raises an operand type clash in SQL Server)
-- First and last day of month
SELECT
DATEFROMPARTS(YEAR(GETDATE()), MONTH(GETDATE()), 1) as first_day_month,
EOMONTH(GETDATE()) as last_day_month;
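-- Time zone handling (SQL Server 2016+; assumes order_date is stored in UTC)
SELECT
order_date,
order_date AT TIME ZONE 'UTC' AT TIME ZONE 'Eastern Standard Time' as eastern_time -- Windows zone name; applies daylight saving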
FROM orders;
-- Age calculation
SELECT
employee_name,
birth_date,
DATEDIFF(year, birth_date, GETDATE()) as naive_age, -- counts year boundaries, so overstates age before the birthday
CASE
WHEN DATEADD(year, DATEDIFF(year, birth_date, GETDATE()), birth_date) > GETDATE()
THEN DATEDIFF(year, birth_date, GETDATE()) - 1
ELSE DATEDIFF(year, birth_date, GETDATE())
END as accurate_age
FROM employees;
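The gap between naive_age and accurate_age comes from DATEDIFF(year, ...) counting calendar-year boundaries crossed rather than full years elapsed. The Python functions below model that logic; datediff_year is a hypothetical stand-in for the T-SQL function, not a library call.

```python
from datetime import date

def datediff_year(start: date, end: date) -> int:
    """Model of DATEDIFF(year, start, end): calendar-year boundaries crossed."""
    return end.year - start.year

def accurate_age(birth: date, today: date) -> int:
    """Subtract one when this year's birthday has not happened yet."""
    age = datediff_year(birth, today)
    if (today.month, today.day) < (birth.month, birth.day):
        age -= 1
    return age

birth = date(1990, 12, 31)
today = date(2024, 6, 15)
print(datediff_year(birth, today), accurate_age(birth, today))  # 34 33
```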
15. What are transactions and how do you handle them in SQL?
Answer: Transactions ensure data consistency by grouping multiple operations that must succeed
or fail together.
Example:
-- Basic transaction
BEGIN TRANSACTION;
BEGIN TRY
UPDATE accounts SET balance = balance - 1000 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 1000 WHERE account_id = 2;
INSERT INTO transaction_log (from_account, to_account, amount, transaction_date)
VALUES (1, 2, 1000, GETDATE());
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
THROW;
END CATCH;
-- Transaction with savepoints
BEGIN TRANSACTION;
INSERT INTO orders (customer_id, order_date) VALUES (1, GETDATE());
SAVE TRANSACTION order_created;
BEGIN TRY
INSERT INTO order_items (order_id, product_id, quantity)
VALUES (SCOPE_IDENTITY(), 1, 5); -- SCOPE_IDENTITY() ignores identity values generated by triggers, unlike @@IDENTITY
UPDATE products SET stock_quantity = stock_quantity - 5 WHERE product_id = 1;
COMMIT TRANSACTION;
END TRY
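BEGIN CATCH
IF XACT_STATE() = 1
BEGIN
ROLLBACK TRANSACTION order_created; -- undo only the item and stock changes
COMMIT TRANSACTION; -- the order row remains
END
ELSE IF XACT_STATE() = -1
ROLLBACK TRANSACTION; -- doomed transaction: only a full rollback is possible
END CATCH;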
-- Isolation levels
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- or READ UNCOMMITTED, REPEATABLE READ, SERIALIZABLE
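The same all-or-nothing behaviour can be tried locally with SQLite. In this sketch, a hypothetical CHECK (balance >= 0) constraint makes the second transfer fail partway through, and ROLLBACK also undoes the credit that had already been applied.

```python
import sqlite3

# isolation_level=None turns off the module's implicit BEGIN, so the
# BEGIN/COMMIT/ROLLBACK statements below are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, "
             "balance REAL NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 500.0), (2, 100.0)])

def transfer(src, dst, amount):
    """Move amount from src to dst atomically; return True if committed."""
    conn.execute("BEGIN")
    try:
        # Credit first so a failed debit shows the rollback undoing real work.
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE account_id = ?",
                     (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE account_id = ?",
                     (amount, src))  # raises IntegrityError if src would go negative
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # also reverts the credit to dst
        return False

first_ok = transfer(1, 2, 200.0)    # 500 -> 300, 100 -> 300
second_ok = transfer(1, 2, 1000.0)  # would leave account 1 at -700
balances = conn.execute(
    "SELECT account_id, balance FROM accounts ORDER BY account_id").fetchall()
print(first_ok, second_ok, balances)  # True False [(1, 300.0), (2, 300.0)]
```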
16. How do you optimize queries for large datasets?
Answer: Combine covering indexes, partitioning (so queries can skip irrelevant partitions), compact data types, and queries that read only the rows and columns they need.
Example:
-- Partitioning for large tables (SQL Server: 3 boundary values create 4 partitions, mapped to 4 filegroups)
CREATE PARTITION FUNCTION pf_sales_date (DATE)
AS RANGE RIGHT FOR VALUES ('2023-01-01', '2024-01-01', '2025-01-01');
CREATE PARTITION SCHEME ps_sales_date
AS PARTITION pf_sales_date TO (fg1, fg2, fg3, fg4);
CREATE TABLE sales_partitioned (
sale_id INT IDENTITY(1,1),
sale_date DATE,
amount DECIMAL(10,2),
customer_id INT
) ON ps_sales_date(sale_date);
-- ROW_NUMBER()-based pagination (alternative to OFFSET/FETCH on pre-2012 SQL Server; deep pages still read every preceding row)
WITH ordered_results AS (
SELECT
ROW_NUMBER() OVER (ORDER BY sale_date DESC) as rn,
sale_id,
sale_date,
amount
FROM sales
WHERE sale_date >= '2024-01-01'
)
SELECT sale_id, sale_date, amount
FROM ordered_results
WHERE rn BETWEEN 10001 AND 10100; -- Page 101, 100 records per page
-- Use appropriate data types
CREATE TABLE optimized_sales (
sale_id INT, -- Instead of BIGINT if not needed
sale_date DATE, -- Instead of DATETIME if time not needed
amount DECIMAL(10,2), -- Specific precision
customer_id INT,
status TINYINT, -- Instead of VARCHAR for status codes
notes VARCHAR(500) -- Limited length instead of TEXT
);
-- Covering indexes for query optimization
CREATE INDEX idx_sales_covering
ON sales (customer_id, sale_date)
INCLUDE (amount, product_id);
-- Query that uses covering index
SELECT customer_id, sale_date, amount, product_id
FROM sales
WHERE customer_id = 123
AND sale_date >= '2024-01-01'
ORDER BY sale_date;
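SQLite has no INCLUDE clause, so this sketch approximates the covering index by appending amount and product_id to the index key, then uses EXPLAIN QUERY PLAN to check that the query can be answered from the index alone. The schema is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, "
             "sale_date TEXT, amount REAL, product_id INTEGER, notes TEXT)")
# No INCLUDE in SQLite: the extra columns go at the end of the index key.
conn.execute("CREATE INDEX idx_sales_covering "
             "ON sales (customer_id, sale_date, amount, product_id)")

plan = [row[-1] for row in conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT customer_id, sale_date, amount, product_id
    FROM sales
    WHERE customer_id = 123 AND sale_date >= '2024-01-01'
    ORDER BY sale_date
""")]
# The detail string should name the index as a COVERING INDEX.
print(plan)
```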
17. Explain different types of database constraints
Answer: Constraints enforce data integrity rules at the database level.
Example:
-- Primary Key constraint
CREATE TABLE customers (
customer_id INT IDENTITY(1,1) PRIMARY KEY,
customer_name VARCHAR(100) NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
phone VARCHAR(20),
created_date DATE DEFAULT GETDATE()
);
-- Foreign Key constraint
CREATE TABLE orders (
order_id INT IDENTITY(1,1) PRIMARY KEY,
customer_id INT,
order_date DATE NOT NULL,
total_amount DECIMAL(10,2),
CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
ON DELETE CASCADE ON UPDATE CASCADE
);
-- Check constraints
ALTER TABLE orders
ADD CONSTRAINT chk_positive_amount CHECK (total_amount > 0);
ALTER TABLE customers
ADD CONSTRAINT chk_valid_email CHECK (email LIKE '%@%.%');
-- Unique constraint on multiple columns
ALTER TABLE order_items
ADD CONSTRAINT uk_order_product UNIQUE (order_id, product_id);
-- Custom constraint with a scalar function (SQL Server; it runs for every inserted or updated row, so use sparingly)
CREATE FUNCTION fn_validate_age(@birth_date DATE)
RETURNS BIT
AS
BEGIN
DECLARE @result BIT = 0;
IF DATEADD(year, 18, @birth_date) <= CAST(GETDATE() AS DATE) -- exact 18th birthday; DATEDIFF(year) only counts year boundaries
SET @result = 1;
RETURN @result;
END;
ALTER TABLE employees
ADD CONSTRAINT chk_adult_age CHECK (dbo.fn_validate_age(birth_date) = 1);
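Most of these constraints behave the same way in SQLite, which makes them easy to test. This sketch (hypothetical schema and rows) checks that each violation raises sqlite3.IntegrityError and that ON DELETE CASCADE removes child rows; SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("""CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    email TEXT UNIQUE NOT NULL CHECK (email LIKE '%@%.%'))""")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id) ON DELETE CASCADE,
    total_amount REAL CHECK (total_amount > 0))""")
conn.execute("INSERT INTO customers VALUES (1, 'ann@example.com')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0)")

def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

rejected = {
    "duplicate email": violates("INSERT INTO customers VALUES (2, 'ann@example.com')"),
    "bad email": violates("INSERT INTO customers VALUES (3, 'not-an-email')"),
    "negative amount": violates("INSERT INTO orders VALUES (11, 1, -5)"),
    "unknown customer": violates("INSERT INTO orders VALUES (12, 99, 5)"),
}
conn.execute("DELETE FROM customers WHERE customer_id = 1")  # cascades to orders
orders_left = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rejected, orders_left)
```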
18. How do you handle JSON data in SQL?
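Answer: Most modern engines provide functions to parse, query, and modify JSON, but the syntax differs (SQL Server: JSON_VALUE, OPENJSON, JSON_MODIFY; PostgreSQL: jsonb operators such as ->>; MySQL: JSON_EXTRACT). The examples below use SQL Server.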
Example:
-- JSON data storage and querying (SQL Server)
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
specifications NVARCHAR(MAX) CHECK (ISJSON(specifications) = 1)
);
INSERT INTO products VALUES
(1, 'Laptop', '{"brand": "Dell", "ram": "16GB", "storage": "512GB SSD", "features": ["WiFi", "Bluetooth", "USB-C"]}'),
(2, 'Phone', '{"brand": "Samsung", "storage": "128GB", "camera": "64MP", "features": ["5G", "Wireless Charging"]}');
-- Query JSON data
SELECT
product_id,
product_name,
JSON_VALUE(specifications, '$.brand') as brand,
JSON_VALUE(specifications, '$.ram') as ram,
JSON_VALUE(specifications, '$.storage') as storage
FROM products;
-- Query JSON arrays
SELECT
product_id,
product_name,
feature.value as feature
FROM products
CROSS APPLY OPENJSON(specifications, '$.features') feature;
-- Update JSON data
UPDATE products
SET specifications = JSON_MODIFY(specifications, '$.price', 999.99)
WHERE product_id = 1;
-- Complex JSON queries
SELECT
product_id,
product_name
FROM products
WHERE JSON_VALUE(specifications, '$.brand') = 'Dell'
AND JSON_VALUE(specifications, '$.ram') LIKE '%16%'
AND JSON_QUERY(specifications, '$.features') LIKE '%WiFi%';
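SQLite offers comparable JSON functions: json_extract plays the role of JSON_VALUE, json_each of OPENJSON, and json_set of JSON_MODIFY. This sketch assumes an SQLite build with JSON support (built in since 3.38 and commonly compiled in before that) and reuses trimmed versions of the sample products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT, "
             "specifications TEXT CHECK (json_valid(specifications)))")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    (1, "Laptop", '{"brand": "Dell", "ram": "16GB", "features": ["WiFi", "Bluetooth", "USB-C"]}'),
    (2, "Phone", '{"brand": "Samsung", "storage": "128GB", "features": ["5G", "Wireless Charging"]}'),
])

# Scalar extraction (like JSON_VALUE).
brands = conn.execute(
    "SELECT product_name, json_extract(specifications, '$.brand') "
    "FROM products ORDER BY product_id").fetchall()

# json_each expands the array into one row per element (like OPENJSON),
# which avoids string-matching the raw array text with LIKE.
wifi_products = conn.execute("""
    SELECT p.product_name
    FROM products p, json_each(p.specifications, '$.features') AS f
    WHERE f.value = 'WiFi'
""").fetchall()

# Add a key in place (like JSON_MODIFY).
conn.execute("UPDATE products SET specifications = "
             "json_set(specifications, '$.price', 999.99) WHERE product_id = 1")
price = conn.execute("SELECT json_extract(specifications, '$.price') "
                     "FROM products WHERE product_id = 1").fetchone()[0]
print(brands, wifi_products, price)
```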
19. What are stored procedures and functions? When would you use each?
Answer:
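Stored Procedures: Can perform multiple operations and modify data; returning a value is optional (results come back through result sets or OUTPUT parameters)
Functions: Must return a value (a scalar or a table), cannot modify database state, and can be used in SELECT statements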
Example:
-- Stored Procedure
CREATE PROCEDURE sp_process_order
@customer_id INT,
@product_id INT,
@quantity INT,
@order_id INT OUTPUT
AS
BEGIN
SET NOCOUNT ON;
BEGIN TRANSACTION;
BEGIN TRY
-- Insert order
INSERT INTO orders (customer_id, order_date, status)
VALUES (@customer_id, GETDATE(), 'Pending');
SET @order_id = SCOPE_IDENTITY();
-- Insert order items
INSERT INTO order_items (order_id, product_id, quantity, unit_price)
SELECT @order_id, @product_id, @quantity, price
FROM products WHERE product_id = @product_id;
-- Update inventory
UPDATE products
SET stock_quantity = stock_quantity - @quantity
WHERE product_id = @product_id;
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
THROW;
END CATCH;
END;
-- Usage
DECLARE @new_order_id INT;
EXEC sp_process_order
@customer_id = 1,
@product_id = 5,
@quantity = 2,
@order_id = @new_order_id OUTPUT;
-- Function
CREATE FUNCTION fn_calculate_age(@birth_date DATE)
RETURNS INT
AS
BEGIN
DECLARE @age INT;
SET @age = DATEDIFF(year, @birth_date, GETDATE());
-- Adjust for birthday not yet occurred this year
IF (MONTH(@birth_date) > MONTH(GETDATE()) OR
(MONTH(@birth_date) = MONTH(GETDATE()) AND DAY(@birth_date) > DAY(GETDATE())))
SET @age = @age - 1;
RETURN @age;
END;
-- Usage in SELECT
SELECT
employee_name,
birth_date,
dbo.fn_calculate_age(birth_date) as age
FROM employees;
-- Table-valued function
CREATE FUNCTION fn_get_customer_orders(@customer_id INT)
RETURNS TABLE
AS
RETURN (
SELECT
o.order_id,
o.order_date,
o.total_amount,
oi.product_id,
oi.quantity
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.customer_id = @customer_id
);
-- Usage
SELECT * FROM dbo.fn_get_customer_orders(123);
20. How do you handle recursive queries and what are their performance considerations?
Answer: Recursive queries use CTEs to traverse hierarchical data. Performance depends on data
structure and termination conditions.
Example:
-- Employee hierarchy with performance optimization
WITH employee_hierarchy AS (
-- Anchor: Find top-level managers
SELECT
employee_id,
manager_id,
employee_name,
salary,
1 as level,
CAST(employee_id AS VARCHAR(MAX)) as path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: Find direct reports
SELECT
e.employee_id,
e.manager_id,
e.employee_name,
e.salary,
eh.level + 1,
eh.path + ',' + CAST(e.employee_id AS VARCHAR(MAX))
FROM employees e
INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
WHERE eh.level < 10 -- Depth cap; also stops runaway recursion if the data contains a cycle
)
SELECT
employee_id,
employee_name,
level,
path,
salary
FROM employee_hierarchy
ORDER BY level, employee_name
OPTION (MAXRECURSION 10); -- Safety net: the query fails with an error (rather than truncating) if recursion exceeds 10 levels
-- Bill of Materials (BOM) explosion
WITH bom_explosion AS (
-- Anchor: Top-level products
SELECT
product_id,
component_id,
CAST(quantity_needed AS DECIMAL(18,4)) as quantity_needed,
1 as level,
CAST(CAST(product_id AS VARCHAR(20)) + '->' + CAST(component_id AS VARCHAR(20)) AS VARCHAR(1000)) as product_path
FROM bill_of_materials
WHERE product_id = 1001 -- Specific product
UNION ALL
-- Recursive: Sub-components (column types must match the anchor exactly, hence the CASTs)
SELECT
bom.product_id,
bom.component_id,
CAST(bom.quantity_needed * be.quantity_needed AS DECIMAL(18,4)),
be.level + 1,
CAST(be.product_path + '->' + CAST(bom.component_id AS VARCHAR(20)) AS VARCHAR(1000))
FROM bill_of_materials bom
INNER JOIN bom_explosion be ON bom.product_id = be.component_id
WHERE be.level < 5
)
-- Total quantity of each component across every path in the tree
SELECT
be.component_id,
p.product_name as component_name,
SUM(be.quantity_needed) as total_quantity_needed,
MIN(be.level) as shallowest_level
FROM bom_explosion be
JOIN products p ON be.component_id = p.product_id
GROUP BY be.component_id, p.product_name
ORDER BY shallowest_level, be.component_id;
-- Performance optimization for recursive queries
-- 1. Create appropriate indexes
CREATE INDEX idx_employees_manager ON employees(manager_id);
CREATE INDEX idx_bom_product ON bill_of_materials(product_id);
-- 2. Use OPTION (MAXRECURSION) to prevent runaway queries
-- 3. Add level limits in WHERE clause
-- 4. Consider materialized path approach for frequently queried hierarchies
-- Alternative: Materialized path approach
ALTER TABLE employees ADD hierarchy_path VARCHAR(500);
-- Update hierarchy path (one-time or scheduled)
WITH hierarchy_update AS (
SELECT
employee_id,
manager_id,
employee_name,
CAST(employee_id AS VARCHAR(500)) as path
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT
e.employee_id,
e.manager_id,
e.employee_name,
CAST(hu.path + '/' + CAST(e.employee_id AS VARCHAR(20)) AS VARCHAR(500)) -- must match the anchor's VARCHAR(500)
FROM employees e
INNER JOIN hierarchy_update hu ON e.manager_id = hu.employee_id
)
UPDATE e
SET hierarchy_path = hu.path
FROM employees e
INNER JOIN hierarchy_update hu ON e.employee_id = hu.employee_id;
-- Fast hierarchical queries using materialized path
SELECT * FROM employees
WHERE hierarchy_path LIKE '1/%' -- All subordinates of employee 1 (assumes employee 1 is a root, so its path is '1'; otherwise match on its own path + '/%')
ORDER BY hierarchy_path;
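The key step in a BOM explosion is multiplying quantities down the tree and then summing per component. This SQLite sketch uses three hypothetical BOM rows: product 1001 needs two of sub-assembly 2001 and one part 3001 directly, and each 2001 needs four 3001s, so the total for part 3001 should be 1 + 2 * 4 = 9.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bill_of_materials (product_id INTEGER, component_id INTEGER, "
             "quantity_needed INTEGER)")
conn.executemany("INSERT INTO bill_of_materials VALUES (?, ?, ?)", [
    (1001, 2001, 2),  # product 1001 needs 2 x sub-assembly 2001
    (1001, 3001, 1),  # ...and 1 x part 3001 directly
    (2001, 3001, 4),  # each 2001 needs 4 x part 3001
])

totals = conn.execute("""
    WITH RECURSIVE bom_explosion AS (
        SELECT component_id, quantity_needed, 1 AS level
        FROM bill_of_materials WHERE product_id = 1001
        UNION ALL
        SELECT bom.component_id,
               bom.quantity_needed * be.quantity_needed,  -- multiply down the tree
               be.level + 1
        FROM bill_of_materials bom
        JOIN bom_explosion be ON bom.product_id = be.component_id
        WHERE be.level < 5                                -- depth guard
    )
    SELECT component_id, SUM(quantity_needed) AS total_quantity_needed
    FROM bom_explosion
    GROUP BY component_id
    ORDER BY component_id
""").fetchall()
print(totals)  # [(2001, 2), (3001, 9)]
```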
Performance Tips for SQL Interviews:
1. Always consider indexing when discussing query optimization
2. Use appropriate data types to minimize storage and improve performance
3. Avoid SELECT * in production queries
4. Prefer EXISTS / NOT EXISTS over IN / NOT IN for subqueries, especially when the subquery column can be NULL
5. Consider query execution plans when optimizing
6. Use UNION ALL instead of UNION when duplicates are acceptable or impossible
7. Implement proper error handling in stored procedures
8. Use parameters to prevent SQL injection
9. Consider partitioning for very large tables
10. Use appropriate isolation levels for transactions
These questions cover the essential SQL concepts for a senior data engineer role, focusing on
practical applications and performance considerations that are crucial in real-world scenarios.