SQL 5

The document contains a series of SQL interview questions and solutions aimed at data professionals, specifically for product-based companies. It covers various topics such as customer revenue analysis, order tracking, cohort analysis, and data quality checks, providing SQL queries along with explanations for each. The questions range from finding top customers to calculating average order values and detecting anomalies, catering to intermediate to advanced difficulty levels.

SQL Interview Query Questions & Answers for Product-Based Companies


Difficulty Level: Intermediate to Advanced
Target Audience: Data Engineers, Data Analysts, Product Analysts
Time to Practice: 2-3 hours

Question 1: Find Top N Customers by Revenue


Write a SQL query to find the top 5 customers by total order amount in the last 90 days.
Your output should include customer name, total amount spent, and the number of orders
placed.

Database Schema
customers (id, name, email, registration_date)
orders (id, customer_id, order_date, amount)

Solution
SELECT
    c.id,
    c.name,
    COUNT(o.id) AS num_orders,
    ROUND(SUM(o.amount), 2) AS total_amount_spent
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.order_date >= DATE_SUB(CURDATE(), INTERVAL 90 DAY)
GROUP BY c.id, c.name
ORDER BY total_amount_spent DESC
LIMIT 5;

Explanation
• INNER JOIN: Combines customer and order data
• WHERE clause: Filters orders from the last 90 days
• GROUP BY: Aggregates data per customer
• ORDER BY DESC: Sorts by total amount in descending order
• LIMIT 5: Returns only top 5 customers
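
The same join, filter, group, and limit pattern can be tried on a small dataset. The sketch below is an illustration only: it assumes SQLite through Python's standard sqlite3 module, so DATE_SUB(CURDATE(), ...) is swapped for SQLite's date() function, and the sample rows and the fixed reference date are hypothetical.

```python
import sqlite3

# Hypothetical sample data; the reference date is fixed so the result is reproducible.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         order_date TEXT, amount REAL);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES
        (1, 1, '2024-05-01', 120.00),
        (2, 1, '2024-05-20', 80.50),
        (3, 2, '2024-04-15', 300.00),
        (4, 3, '2023-12-01', 999.00);  -- older than 90 days, must be excluded
""")

top_customers = conn.execute("""
    SELECT c.name,
           COUNT(o.id)             AS num_orders,
           ROUND(SUM(o.amount), 2) AS total_amount_spent
    FROM customers c
    INNER JOIN orders o ON c.id = o.customer_id
    WHERE o.order_date >= date(:today, '-90 days')  -- SQLite stand-in for DATE_SUB
    GROUP BY c.id, c.name
    ORDER BY total_amount_spent DESC
    LIMIT :n
""", {"today": "2024-06-01", "n": 2}).fetchall()

print(top_customers)
```

Chen's large order falls outside the 90-day window, so it does not appear even though it is the biggest single amount in the table.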

DEVIKRISHNA R +91 6235526324


Question 2: Identify Customers with No Orders
Write a query to find all customers who have registered but have never placed an order.

Solution
SELECT
    c.id,
    c.name,
    c.email,
    c.registration_date
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.id IS NULL
ORDER BY c.registration_date DESC;

Explanation
• LEFT JOIN: Keeps all customers even if they have no orders
• WHERE o.id IS NULL: Filters for customers without matching orders
• This helps identify inactive users who need engagement campaigns

Business Use Case


Product managers use this to identify users who signed up but never converted to paying customers,
informing onboarding or re-engagement strategies.
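
The "customers without orders" pattern is often written in two equivalent ways: LEFT JOIN with an IS NULL filter, or NOT EXISTS. The sketch below runs both on hypothetical sample rows, assuming SQLite through Python's sqlite3 module, to show that they return the same customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);  -- Ben never ordered
""")

# Anti-join: keep every customer, then keep only rows where no order matched.
left_join_rows = conn.execute("""
    SELECT c.id, c.name
    FROM customers c
    LEFT JOIN orders o ON c.id = o.customer_id
    WHERE o.id IS NULL
    ORDER BY c.id
""").fetchall()

# Same question phrased with a correlated NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT c.id, c.name
    FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
    ORDER BY c.id
""").fetchall()

print(left_join_rows, not_exists_rows)
```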

Question 3: Cohort Analysis - Monthly Retention


Write a query to calculate monthly retention rates. Show for each cohort (first purchase month), what
percentage of customers returned in the following months.

Solution
WITH first_purchase AS (
    SELECT
        customer_id,
        DATE_TRUNC(MIN(order_date), MONTH) AS cohort_month
    FROM orders
    GROUP BY customer_id
),
customer_activity AS (
    SELECT
        fp.customer_id,
        fp.cohort_month,
        DATE_TRUNC(o.order_date, MONTH) AS activity_month,
        -- Months elapsed since the first purchase (0 = cohort month)
        DATE_DIFF(DATE_TRUNC(o.order_date, MONTH), fp.cohort_month, MONTH) AS month_number
    FROM first_purchase fp
    LEFT JOIN orders o ON fp.customer_id = o.customer_id
)
SELECT
    cohort_month,
    month_number,
    COUNT(DISTINCT customer_id) AS retained_customers,
    ROUND(
        100.0 * COUNT(DISTINCT customer_id) /
        (SELECT COUNT(DISTINCT customer_id) FROM first_purchase WHERE cohort_month = ca.cohort_month),
        2
    ) AS retention_percentage
FROM customer_activity ca
GROUP BY cohort_month, month_number
ORDER BY cohort_month, month_number;

Key Concepts
• CTE (Common Table Expression): WITH clause defines reusable subqueries
• DATE_TRUNC: Standardizes dates to month level (the DATE_TRUNC(date, MONTH) form shown here is BigQuery syntax)
• DATE_DIFF: Counts the calendar months between the cohort month and each activity month, so month_number 0 is the first-purchase month and repeat orders within one month share the same number
• Retention percentage shows how many customers from a cohort returned each month
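
A small worked example can make the cohort table easier to read. The sketch below is an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module, so month truncation is done with strftime() instead of DATE_TRUNC, month_number is computed as a calendar-month offset from the cohort month, and the sample orders and the cohort_size helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    INSERT INTO orders (customer_id, order_date) VALUES
        (1, '2024-01-05'), (1, '2024-01-20'), (1, '2024-02-10'),
        (2, '2024-01-15'), (2, '2024-03-02'),
        (3, '2024-02-07');
""")

retention = conn.execute("""
    WITH first_purchase AS (
        SELECT customer_id, MIN(order_date) AS first_date
        FROM orders
        GROUP BY customer_id
    ),
    cohort_size AS (
        SELECT strftime('%Y-%m', first_date) AS cohort_month, COUNT(*) AS customers
        FROM first_purchase
        GROUP BY strftime('%Y-%m', first_date)
    ),
    activity AS (
        -- DISTINCT collapses several orders in one month into one active month
        SELECT DISTINCT
            fp.customer_id,
            strftime('%Y-%m', fp.first_date) AS cohort_month,
            (CAST(strftime('%Y', o.order_date) AS INTEGER)
               - CAST(strftime('%Y', fp.first_date) AS INTEGER)) * 12
              + CAST(strftime('%m', o.order_date) AS INTEGER)
              - CAST(strftime('%m', fp.first_date) AS INTEGER) AS month_number
        FROM first_purchase fp
        JOIN orders o ON o.customer_id = fp.customer_id
    )
    SELECT a.cohort_month,
           a.month_number,
           COUNT(*) AS retained_customers,
           ROUND(100.0 * COUNT(*) / cs.customers, 2) AS retention_percentage
    FROM activity a
    JOIN cohort_size cs ON cs.cohort_month = a.cohort_month
    GROUP BY a.cohort_month, a.month_number, cs.customers
    ORDER BY a.cohort_month, a.month_number
""").fetchall()

for row in retention:
    print(row)
```

In this data the January cohort has two customers: both are active in month 0, one returns in month 1, and the other returns in month 2, so each later month shows 50% retention.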

Question 4: Running Total of Sales


Write a query to calculate the running total of sales for each product, ordered by sale date. Include
columns for product_id, sale_date, amount, and running_total.

Solution
SELECT
    product_id,
    sale_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY product_id
        ORDER BY sale_date, sale_id
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total
FROM sales
ORDER BY product_id, sale_date, sale_id;

Explanation
• WINDOW FUNCTION: SUM() OVER () calculates cumulative sum
• PARTITION BY product_id: Calculates running total separately for each product
• ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Includes all
rows from start to current row
• Useful for tracking cumulative sales metrics and performance trends
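
The effect of the ROWS frame and the sale_id tie-breaker is easiest to see when two sales share a date. The sketch below assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module, with hypothetical sample rows. It compares the explicit ROWS frame with the default frame that applies when only ORDER BY sale_date is given.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                        sale_date TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        (1, 100, '2024-01-01', 10),
        (2, 100, '2024-01-01', 5),   -- same date as sale 1
        (3, 100, '2024-01-02', 20),
        (4, 200, '2024-01-01', 7);
""")

# Explicit ROWS frame with a tie-breaker: one cumulative step per row.
running = conn.execute("""
    SELECT product_id, sale_date, amount,
           SUM(amount) OVER (
               PARTITION BY product_id
               ORDER BY sale_date, sale_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM sales
    ORDER BY product_id, sale_date, sale_id
""").fetchall()

# Default frame (RANGE): rows with the same sale_date are peers and share a total.
range_totals = [r[0] for r in conn.execute("""
    SELECT SUM(amount) OVER (PARTITION BY product_id ORDER BY sale_date)
    FROM sales
    WHERE product_id = 100
    ORDER BY sale_date, sale_id
""")]

print(running)
print(range_totals)
```

With the default frame, both sales on 2024-01-01 report 15, which is usually not what a row-by-row running total should show.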

Question 5: Find Duplicate Records


Write a query to find all customers who have multiple records in the customers table (duplicates).

Solution

SELECT
    name,
    email,
    COUNT(*) AS duplicate_count
FROM customers
GROUP BY name, email
HAVING COUNT(*) > 1
ORDER BY duplicate_count DESC;

Explanation
• GROUP BY: Groups records by name and email
• HAVING: Filters groups with more than 1 record
• COUNT(*): Counts duplicates per group
• Essential for data quality checks and data cleaning
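
As a quick check of the GROUP BY and HAVING combination, the sketch below runs the duplicate query on hypothetical sample rows, assuming SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO customers (name, email) VALUES
        ('Asha', 'asha@example.com'),
        ('Asha', 'asha@example.com'),
        ('Ben',  'ben@example.com'),
        ('Asha', 'asha@example.com'),
        ('Ben',  'ben@example.com'),
        ('Chen', 'chen@example.com');
""")

# Groups with a single row are dropped by HAVING, so Chen does not appear.
duplicates = conn.execute("""
    SELECT name, email, COUNT(*) AS duplicate_count
    FROM customers
    GROUP BY name, email
    HAVING COUNT(*) > 1
    ORDER BY duplicate_count DESC
""").fetchall()

print(duplicates)
```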

Question 6: Calculate Average Order Value by Segment


Write a query to calculate average order value (AOV) for each customer segment (new, returning, loyal).
Define segments based on total orders: new (1 order), returning (2-3 orders), loyal (4+ orders).

Solution
WITH customer_segments AS (
    SELECT
        c.id,
        c.name,
        COUNT(o.id) AS total_orders,
        CASE
            WHEN COUNT(o.id) = 1 THEN 'New'
            WHEN COUNT(o.id) BETWEEN 2 AND 3 THEN 'Returning'
            WHEN COUNT(o.id) >= 4 THEN 'Loyal'
        END AS segment
    FROM customers c
    LEFT JOIN orders o ON c.id = o.customer_id
    GROUP BY c.id, c.name
)
SELECT
    segment,
    COUNT(DISTINCT id) AS num_customers,
    ROUND(AVG(total_orders), 2) AS avg_orders_per_customer,
    (
        SELECT ROUND(AVG(amount), 2)
        FROM orders o
        WHERE o.customer_id IN (
            SELECT id FROM customer_segments WHERE segment = cs.segment
        )
    ) AS avg_order_value
FROM customer_segments cs
WHERE segment IS NOT NULL  -- customers with zero orders have no segment
GROUP BY segment
ORDER BY avg_orders_per_customer DESC;

Key Concepts
• CASE WHEN: Creates conditional logic for segmentation
• Nested subquery: Calculates AOV for each segment
• Segmentation helps tailor marketing strategies and pricing
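
The same segmentation can also be expressed by joining the segment CTE back to orders instead of using a nested subquery. The sketch below shows that variant as one possible alternative, not as the document's solution. It assumes SQLite through Python's sqlite3 module, uses hypothetical sample rows, and segments directly from the orders table for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO orders (customer_id, amount) VALUES
        (1, 50.0),                                   -- 1 order  -> New
        (2, 20.0), (2, 40.0),                        -- 2 orders -> Returning
        (3, 10.0), (3, 10.0), (3, 30.0), (3, 30.0);  -- 4 orders -> Loyal
""")

aov = conn.execute("""
    WITH customer_segments AS (
        SELECT customer_id,
               CASE
                   WHEN COUNT(*) = 1 THEN 'New'
                   WHEN COUNT(*) BETWEEN 2 AND 3 THEN 'Returning'
                   ELSE 'Loyal'
               END AS segment
        FROM orders
        GROUP BY customer_id
    )
    SELECT cs.segment,
           COUNT(DISTINCT cs.customer_id) AS num_customers,
           ROUND(AVG(o.amount), 2)        AS avg_order_value
    FROM customer_segments cs
    JOIN orders o ON o.customer_id = cs.customer_id
    GROUP BY cs.segment
    ORDER BY cs.segment
""").fetchall()

print(aov)
```

AVG(o.amount) here averages over individual orders in the segment, which matches the usual definition of average order value.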

Question 7: Find the Nth Highest Salary (or Order Value)


Write a query to find the 3rd highest order amount in the orders table without using LIMIT.

Solution
SELECT DISTINCT amount
FROM orders o1
WHERE (
    SELECT COUNT(DISTINCT amount)
    FROM orders o2
    WHERE o2.amount >= o1.amount
) = 3;

Alternative Solution (Using DENSE_RANK)
WITH ranked_orders AS (
    SELECT
        amount,
        DENSE_RANK() OVER (ORDER BY amount DESC) AS rank_num
    FROM orders
)
SELECT DISTINCT amount
FROM ranked_orders
WHERE rank_num = 3;

Explanation
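• First approach: Counts how many distinct amounts are greater than or equal to the current amount; the 3rd highest is the amount for which that count equals 3
• DENSE_RANK(): Assigns ranks without gaps, treating equal values as the same rank
• Both approaches treat tied amounts as a single value, so they return the same result; RANK() or ROW_NUMBER() would handle ties differently, so choose based on requirements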
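
Tie handling is the part of this question that is easiest to get wrong, so a small experiment helps. The sketch below assumes SQLite 3.25 or later through Python's sqlite3 module and hypothetical sample amounts containing ties. It runs the counting approach, the DENSE_RANK approach, and a ROW_NUMBER variant for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO orders (amount) VALUES (500), (500), (400), (300), (300), (100);
""")

# Correlated count of distinct amounts >= the current one.
by_count = conn.execute("""
    SELECT DISTINCT amount
    FROM orders o1
    WHERE (SELECT COUNT(DISTINCT amount)
           FROM orders o2
           WHERE o2.amount >= o1.amount) = 3
""").fetchall()

# DENSE_RANK: ties share a rank and no rank numbers are skipped.
by_dense_rank = conn.execute("""
    WITH ranked AS (
        SELECT amount, DENSE_RANK() OVER (ORDER BY amount DESC) AS rank_num
        FROM orders
    )
    SELECT DISTINCT amount FROM ranked WHERE rank_num = 3
""").fetchall()

# ROW_NUMBER: every row gets its own number, so the duplicate 500 uses up a slot.
by_row_number = conn.execute("""
    WITH numbered AS (
        SELECT amount, ROW_NUMBER() OVER (ORDER BY amount DESC) AS row_num
        FROM orders
    )
    SELECT amount FROM numbered WHERE row_num = 3
""").fetchall()

print(by_count, by_dense_rank, by_row_number)
```

The first two queries agree on 300 as the third highest distinct amount, while ROW_NUMBER returns 400 because it counts the two 500 rows separately.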

Question 8: Window Function - Percentage of Total


Write a query to show each product's sales amount and what percentage it represents of total sales for
that month.

Solution
WITH product_monthly AS (
    -- Step 1: total sales per product per month
    SELECT
        DATE_TRUNC(order_date, MONTH) AS month,
        product_id,
        SUM(amount) AS product_sales
    FROM orders
    GROUP BY DATE_TRUNC(order_date, MONTH), product_id
)
-- Step 2: divide each product total by the month total
SELECT
    month,
    product_id,
    product_sales,
    ROUND(
        100.0 * product_sales / SUM(product_sales) OVER (PARTITION BY month),
        2
    ) AS percentage_of_monthly_sales
FROM product_monthly
ORDER BY month, product_sales DESC;

Simplified Solution
SELECT
    DATE_TRUNC(order_date, MONTH) AS month,
    product_id,
    SUM(amount) AS product_sales,
    ROUND(
        100.0 * SUM(amount) / SUM(SUM(amount)) OVER (
            PARTITION BY DATE_TRUNC(order_date, MONTH)
        ),
        2
    ) AS percentage_of_monthly_sales
FROM orders
GROUP BY month, product_id
ORDER BY month, product_sales DESC;

Explanation
• Window function with PARTITION BY: Calculates totals at different granularities
• Useful for understanding product mix and sales composition by time period
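
To see the share calculation with concrete numbers, the sketch below aggregates per product and month first and then applies a window SUM over each month. It assumes SQLite 3.25 or later through Python's sqlite3 module, replaces DATE_TRUNC with strftime('%Y-%m', ...), and uses hypothetical sample orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INTEGER,
                         order_date TEXT, amount REAL);
    INSERT INTO orders (product_id, order_date, amount) VALUES
        (1, '2024-01-03', 60.0), (1, '2024-01-17', 15.0),
        (2, '2024-01-09', 25.0),
        (1, '2024-02-01', 40.0), (2, '2024-02-11', 40.0);
""")

shares = conn.execute("""
    WITH product_monthly AS (
        SELECT strftime('%Y-%m', order_date) AS sales_month,
               product_id,
               SUM(amount) AS product_sales
        FROM orders
        GROUP BY strftime('%Y-%m', order_date), product_id
    )
    SELECT sales_month,
           product_id,
           product_sales,
           ROUND(100.0 * product_sales
                 / SUM(product_sales) OVER (PARTITION BY sales_month), 2) AS pct_of_month
    FROM product_monthly
    ORDER BY sales_month, product_sales DESC, product_id
""").fetchall()

for row in shares:
    print(row)
```

Within each month the percentages add up to 100, which is a useful sanity check for any percentage-of-total query.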

Question 9: Detect Anomalies - Days Since Last Order


Write a query to find customers whose last order was more than 30 days ago (potentially churned).
Include their name, last order date, and days since last order.

Solution
WITH last_orders AS (
    SELECT
        customer_id,
        MAX(order_date) AS last_order_date
    FROM orders
    GROUP BY customer_id
)
SELECT
    c.id,
    c.name,
    c.email,
    lo.last_order_date,
    DATEDIFF(CURDATE(), lo.last_order_date) AS days_since_last_order
FROM customers c
INNER JOIN last_orders lo ON c.id = lo.customer_id
WHERE DATEDIFF(CURDATE(), lo.last_order_date) > 30
ORDER BY days_since_last_order DESC;

Explanation
• MAX(order_date): Finds most recent purchase date per customer
• DATEDIFF(): Calculates days between last order and today
• WHERE > 30: Identifies potential churn risk
• Critical for churn prediction and retention campaigns
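
The churn filter can be exercised with a fixed "today" so that the output does not change from run to run. The sketch below is an illustration assuming SQLite through Python's sqlite3 module: DATEDIFF(CURDATE(), ...) is replaced by a difference of julianday() values, and the sample rows and the reference date are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO orders (customer_id, order_date) VALUES
        (1, '2024-03-01'), (1, '2024-05-25'),   -- Asha ordered recently
        (2, '2024-04-01');                      -- Ben has been quiet
""")

churn_risk = conn.execute("""
    WITH last_orders AS (
        SELECT customer_id, MAX(order_date) AS last_order_date
        FROM orders
        GROUP BY customer_id
    )
    SELECT c.name,
           lo.last_order_date,
           CAST(julianday(:today) - julianday(lo.last_order_date) AS INTEGER)
               AS days_since_last_order
    FROM customers c
    INNER JOIN last_orders lo ON c.id = lo.customer_id
    WHERE julianday(:today) - julianday(lo.last_order_date) > 30
    ORDER BY days_since_last_order DESC
""", {"today": "2024-06-01"}).fetchall()

print(churn_risk)
```

Asha's older March order does not matter because only her most recent order (7 days before the reference date) is compared against the 30-day threshold.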

Question 10: Join Multiple Conditions - Complex Filtering


Write a query to find products that were purchased by customers in specific cities, where the customer
has spent more than $500 in the last 6 months, and exclude products already on sale.

Schema Extension
customers (id, name, city)
orders (id, customer_id, order_date, amount)
order_items (id, order_id, product_id, quantity, price)
products (id, name, is_on_sale)

Solution
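WITH high_spenders AS (
    -- Customers in the target cities who spent more than $500 in the last 6 months
    SELECT c.id
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    WHERE c.city IN ('New York', 'Los Angeles', 'Chicago')
      AND o.order_date >= DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
    GROUP BY c.id
    HAVING SUM(o.amount) > 500
)
SELECT
    p.id,
    p.name,
    COUNT(DISTINCT o.id) AS num_purchases,
    SUM(oi.quantity * oi.price) AS total_sales  -- assumes price is the unit price
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id
INNER JOIN orders o ON oi.order_id = o.id
INNER JOIN high_spenders hs ON o.customer_id = hs.id
WHERE p.is_on_sale = FALSE
  AND o.order_date >= DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
GROUP BY p.id, p.name
ORDER BY total_sales DESC;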

Explanation
• Multiple INNER JOINs: Connects related tables
• Complex WHERE clause: Filters on multiple conditions across tables
• HAVING: Filters aggregated results (applied after GROUP BY)
• Shows understanding of data relationships and complex filtering

Question 11: Self-Join - Manager Hierarchy


Write a query to show each employee and their manager's name. Include employees who don't have a
manager assigned.

Schema
employees (id, name, manager_id)
Solution
SELECT
    e.id AS employee_id,
    e.name AS employee_name,
    m.id AS manager_id,
    m.name AS manager_name,
    COALESCE(m.name, 'No Manager') AS manager_name_display
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id
ORDER BY e.id;

Explanation
• Self-Join: The employees table joins with itself
• LEFT JOIN: Includes employees without managers (manager_id IS NULL)
• COALESCE(): Replaces NULL with 'No Manager' for readability
• Common in hierarchical data: org structures, referral chains, etc.
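
A three-row table is enough to see how the self-join resolves each manager_id. The sketch below assumes SQLite through Python's sqlite3 module and hypothetical employee names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Dana', NULL),  -- top of the hierarchy, no manager
        (2, 'Eli',  1),
        (3, 'Fay',  2);
""")

# "e" is the employee row, "m" is the same table read again as the manager row.
org_chart = conn.execute("""
    SELECT e.id,
           e.name,
           COALESCE(m.name, 'No Manager') AS manager_name_display
    FROM employees e
    LEFT JOIN employees m ON e.manager_id = m.id
    ORDER BY e.id
""").fetchall()

print(org_chart)
```

With an INNER JOIN instead of a LEFT JOIN, Dana would disappear from the result because her manager_id is NULL.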

Question 12: Unpivot/Pivot Operations


Write a query to show total sales by product category and month (pivot table format). Months should be
columns.

Solution (Using CASE)


SELECT
    pc.category_name,
    SUM(CASE WHEN MONTH(o.order_date) = 1 THEN o.amount ELSE 0 END) AS January,
    SUM(CASE WHEN MONTH(o.order_date) = 2 THEN o.amount ELSE 0 END) AS February,
    SUM(CASE WHEN MONTH(o.order_date) = 3 THEN o.amount ELSE 0 END) AS March,
    SUM(CASE WHEN MONTH(o.order_date) = 4 THEN o.amount ELSE 0 END) AS April,
    SUM(CASE WHEN MONTH(o.order_date) = 5 THEN o.amount ELSE 0 END) AS May,
    SUM(CASE WHEN MONTH(o.order_date) = 6 THEN o.amount ELSE 0 END) AS June,
    -- Repeat the same pattern for July through December
    SUM(o.amount) AS total_sales
FROM orders o
INNER JOIN products p ON o.product_id = p.id
INNER JOIN product_categories pc ON p.category_id = pc.id
WHERE YEAR(o.order_date) = YEAR(CURDATE())
GROUP BY pc.category_name
ORDER BY total_sales DESC;

Explanation
• CASE WHEN: Conditionally aggregates by month
• Creates a pivot table with months as columns
• Useful for cross-tabulation and trend analysis
• Alternative: Some databases have native PIVOT syntax
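
Conditional aggregation is simple to verify on a handful of rows. The sketch below is a reduced illustration assuming SQLite through Python's sqlite3 module: MONTH() and YEAR() are replaced by strftime(), the three tables are collapsed into one hypothetical sales table, and only two month columns are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (category TEXT, order_date TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('Books', '2024-01-10', 30), ('Books', '2024-02-05', 20),
        ('Toys',  '2024-01-22', 50), ('Toys',  '2024-01-28', 10);
""")

# Each CASE routes an amount into its month column and contributes 0 elsewhere.
pivot = conn.execute("""
    SELECT category,
           SUM(CASE WHEN strftime('%m', order_date) = '01' THEN amount ELSE 0 END) AS january,
           SUM(CASE WHEN strftime('%m', order_date) = '02' THEN amount ELSE 0 END) AS february,
           SUM(amount) AS total_sales
    FROM sales
    WHERE strftime('%Y', order_date) = '2024'
    GROUP BY category
    ORDER BY total_sales DESC
""").fetchall()

print(pivot)
```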

Question 13: String Matching and Pattern Analysis


Write a query to find all products whose names start with 'S' and have been ordered at least 5 times.
Solution
SELECT
    p.id,
    p.name,
    COUNT(oi.id) AS num_orders,
    ROUND(AVG(oi.quantity), 2) AS avg_quantity_per_order
FROM products p
INNER JOIN order_items oi ON p.id = oi.product_id
WHERE p.name LIKE 'S%'
GROUP BY p.id, p.name
HAVING COUNT(oi.id) >= 5
ORDER BY num_orders DESC;

Explanation
• LIKE 'S%': Pattern matching (S at start, any characters after)
• HAVING COUNT(): Filters groups with 5+ orders
• Other patterns: '%S' (ends with S), '%S%' (contains S), 'S_t' (3 chars, starts with S, ends with t)

Question 14: Subquery vs JOIN Performance


Find customers who placed orders with amounts greater than the average order amount. Write solutions
using both subquery and JOIN approaches.

Solution 1: Subquery Approach


SELECT
    c.id,
    c.name,
    o.order_date,
    o.amount
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.amount > (SELECT AVG(amount) FROM orders)
ORDER BY o.amount DESC;

Solution 2: JOIN Approach
SELECT
    c.id,
    c.name,
    o.order_date,
    o.amount,
    avg_stats.avg_amount
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
INNER JOIN (
    SELECT AVG(amount) AS avg_amount FROM orders
) avg_stats ON o.amount > avg_stats.avg_amount
ORDER BY o.amount DESC;

Explanation
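• Subquery: Simpler syntax; the subquery is uncorrelated, so the average is computed once rather than per row
• JOIN: Exposes the average as a column (avg_stats.avg_amount) that can be displayed or reused in further conditions
• Performance note: Optimizers often produce similar plans for both forms, so measure rather than assume; use EXPLAIN (EXPLAIN PLAN in Oracle) to compare
• Context matters: a scalar subquery suits a single value computed once, while a JOIN suits derived values that are reused or filtered further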

Question 15: Date Functions and Interval Calculations


Write a query to find the average time (in days) between a customer's first order and second order. Only
include customers who have placed at least 2 orders.

Solution
WITH customer_orders AS (
    SELECT
        customer_id,
        order_date,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_num
    FROM orders
),
first_and_second AS (
    SELECT
        customer_id,
        MAX(CASE WHEN order_num = 1 THEN order_date END) AS first_order_date,
        MAX(CASE WHEN order_num = 2 THEN order_date END) AS second_order_date
    FROM customer_orders
    GROUP BY customer_id
    HAVING MAX(order_num) >= 2
)
SELECT
    AVG(DATEDIFF(second_order_date, first_order_date)) AS avg_days_to_second_order,
    MIN(DATEDIFF(second_order_date, first_order_date)) AS min_days,
    MAX(DATEDIFF(second_order_date, first_order_date)) AS max_days,
    PERCENTILE_CONT(0.5) WITHIN GROUP (
        ORDER BY DATEDIFF(second_order_date, first_order_date)
    ) AS median_days
FROM first_and_second;

Explanation
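• ROW_NUMBER(): Assigns sequence numbers within each customer
• CASE WHEN: Extracts specific order dates
• DATEDIFF(): Calculates interval between dates
• PERCENTILE_CONT(): Calculates median (50th percentile)
• Dialect note: the two-argument DATEDIFF() is MySQL syntax, but MySQL has no PERCENTILE_CONT; in PostgreSQL, which supports PERCENTILE_CONT ... WITHIN GROUP, replace DATEDIFF(a, b) with (a - b) for DATE columns
• Insights: Time-to-repeat purchase helps optimize marketing campaigns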
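
The first-to-second-order gap can be checked on a few customers. The sketch below is an illustration assuming SQLite 3.25 or later through Python's sqlite3 module: DATEDIFF is replaced by a difference of julianday() values, the median is computed in Python with statistics.median because PERCENTILE_CONT may not be available in a given SQLite build, and the order id is added as a hypothetical tie-breaker for same-day orders.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
    INSERT INTO orders (customer_id, order_date) VALUES
        (1, '2024-01-01'), (1, '2024-01-11'), (1, '2024-03-01'),
        (2, '2024-02-01'), (2, '2024-02-21'),
        (3, '2024-02-15');  -- single order, excluded by HAVING
""")

gaps = conn.execute("""
    WITH customer_orders AS (
        SELECT customer_id, order_date,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY order_date, id) AS order_num
        FROM orders
    ),
    first_and_second AS (
        SELECT customer_id,
               MAX(CASE WHEN order_num = 1 THEN order_date END) AS first_order_date,
               MAX(CASE WHEN order_num = 2 THEN order_date END) AS second_order_date
        FROM customer_orders
        GROUP BY customer_id
        HAVING MAX(order_num) >= 2
    )
    SELECT customer_id,
           CAST(julianday(second_order_date) - julianday(first_order_date) AS INTEGER)
               AS days_to_second_order
    FROM first_and_second
    ORDER BY customer_id
""").fetchall()

days = [d for _, d in gaps]
avg_days = statistics.mean(days)
median_days = statistics.median(days)
print(gaps, avg_days, median_days)
```

Customer 1's third order does not affect the result, since only order numbers 1 and 2 are extracted.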
