Interview Preparation: Top SQL Queries and Solutions for Data Engineers and Data Analyst

Pratik Barjatiya
4 min readMay 12, 2023

--

Photo by Christina @ wocintechchat.com on Unsplash

Basic Level:

  • Write a SQL query to find the number of unique customers in the sales table.
SELECT COUNT(DISTINCT customer_id) FROM sales;
  • Write a SQL query to find the total revenue generated from the sales table.
SELECT SUM(revenue) FROM sales;
  • Write a SQL query to find the top 5 products by revenue in the sales table.
SELECT product_name, SUM(revenue) as total_revenue
FROM sales
GROUP BY product_name
ORDER BY total_revenue DESC
LIMIT 5;

Intermediate Level:

  • Write a SQL query to find the average revenue per customer in the sales table.
SELECT AVG(revenue) FROM
(SELECT customer_id, SUM(revenue) as revenue
FROM sales
GROUP BY customer_id) as customer_revenue;
  • Write a SQL query to find the month with the highest revenue in the sales table.
SELECT YEAR(sale_date) as year, MONTH(sale_date) as month, SUM(revenue) as total_revenue
FROM sales
GROUP BY year, month
ORDER BY total_revenue DESC
LIMIT 1;
  • Write a SQL query to find the top 10 customers with the highest number of purchases in the sales table.
SELECT customer_id, COUNT(*) as total_purchases
FROM sales
GROUP BY customer_id
ORDER BY total_purchases DESC
LIMIT 10;

Advanced Level:

  • Write a SQL query to find the monthly revenue growth rate in the sales table.
SELECT
DATE_FORMAT(sale_date, '%Y-%m') AS month,
SUM(revenue) AS revenue,
(SUM(revenue) - LAG(SUM(revenue), 1) OVER (ORDER BY sale_date)) / LAG(SUM(revenue), 1) OVER (ORDER BY sale_date) AS revenue_growth_rate
FROM
sales
GROUP BY
DATE_FORMAT(sale_date, '%Y-%m')
ORDER BY
sale_date;
  • Write a SQL query to find the customer retention rate in the sales table.
SELECT
COUNT(DISTINCT b.customer_id) / COUNT(DISTINCT a.customer_id) AS customer_retention_rate
FROM
sales a
LEFT JOIN sales b ON a.customer_id = b.customer_id AND a.sale_date < b.sale_date AND DATEDIFF(b.sale_date, a.sale_date) <= 30
WHERE
a.sale_date BETWEEN '2022-01-01' AND '2022-12-31';
  • Write a SQL query to find the top 5 performing sales regions based on revenue in the sales table.
SELECT
region,
SUM(revenue) AS revenue
FROM
sales
GROUP BY
region
ORDER BY
revenue DESC
LIMIT 5;
  • Write a SQL query to find the top 10% of customers based on revenue in the sales table.
SELECT
customer_id,
revenue
FROM (
SELECT
customer_id,
SUM(revenue) AS revenue,
RANK() OVER (ORDER BY SUM(revenue) DESC) AS rank
FROM
sales
GROUP BY
customer_id
) AS subquery
WHERE
rank <= (SELECT COUNT(*) FROM sales) * 0.1
ORDER BY
revenue DESC;
  • Write a query to find the nth highest salary from a table.
SELECT salary
FROM employee
ORDER BY salary DESC
LIMIT n-1,1;
  • Write a query to find the employees who have a salary greater than their manager’s salary.
SELECT e.employee_name
FROM employee e, employee m
WHERE e.manager_id = m.employee_id
AND e.salary > m.salary;
  • Write a query to find the second highest salary and the department name of the employee.
SELECT d.department_name,
MAX(e.salary) as SecondHighestSalary
FROM employee e
JOIN department d ON e.department_id = d.department_id
WHERE e.salary < (SELECT MAX(salary) FROM employee)
GROUP BY d.department_name
ORDER BY SecondHighestSalary DESC
LIMIT 1;
  • Write a query to find the total sales for each month of the year.
SELECT MONTH(order_date) AS month,
YEAR(order_date) AS year,
SUM(total_sales) AS total_sales
FROM sales_table
GROUP BY YEAR(order_date), MONTH(order_date);
  • Write a query to find the running total of sales for each day.
SELECT order_date, SUM(total_sales) OVER (ORDER BY order_date) AS running_total
FROM sales_table;
  • Write a query to find the top 5% of customers based on their total purchases.
SELECT customer_name, total_purchases
FROM (SELECT customer_name, SUM(purchase_amount) AS total_purchases,
PERCENT_RANK() OVER (ORDER BY SUM(purchase_amount) DESC) AS percentile_rank
FROM purchases
GROUP BY customer_name)
WHERE percentile_rank <= 0.05;
  • Write a query to find the customers who have made purchases on consecutive days.
SELECT DISTINCT a.customer_name
FROM purchases a
JOIN purchases b
ON a.customer_name = b.customer_name
AND DATEDIFF(a.purchase_date, b.purchase_date) = 1;
  • Write a query to find the customers who have made purchases in all categories.
SELECT customer_name
FROM purchases
GROUP BY customer_name
HAVING COUNT(DISTINCT category) = (SELECT COUNT(DISTINCT category) FROM purchases);
  • Write a query to find the average revenue per customer per month.
SELECT customer_id,
DATE_TRUNC('month', purchase_date) AS month,
AVG(purchase_amount) AS average_revenue
FROM purchases
GROUP BY customer_id, month;
  • Write a query to find the top 3 products in each category based on their sales.
SELECT category, product_name, total_sales
FROM (SELECT category, product_name, SUM(purchase_amount) AS total_sales,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY SUM(purchase_amount) DESC) AS row_num
FROM purchases
GROUP BY category, product_name)
WHERE row_num <= 3;

Window function based questions:

  • Write a query to find the running total of sales for each day, resetting the total for each month.
SELECT order_date,
SUM(total_sales) OVER (PARTITION BY MONTH(order_date) ORDER BY order_date) AS monthly_running_total
FROM sales_table;
  • Calculate the running total of sales for each product category in a given time period.
SELECT
category,
date,
sales,
SUM(sales) OVER (PARTITION BY category ORDER BY date) as running_total
FROM
sales_table
WHERE
date >= '2022-01-01' AND date < '2023-01-01'
  • Rank employees within each department based on their salary.
SELECT
department,
employee_name,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank
FROM
employee_table
  • Calculate the moving average of sales for each product category over the past 7 days.
SELECT
category,
date,
sales,
AVG(sales) OVER (PARTITION BY category ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_avg
FROM
sales_table
WHERE
date >= '2022-01-01' AND date < '2023-01-01'
  • Calculate the percentage of total sales contributed by each product within its category.
SELECT
category,
product,
sales,
sales / SUM(sales) OVER (PARTITION BY category) as pct_total_sales
FROM
sales_table
WHERE
date >= '2022-01-01' AND date < '2023-01-01'
  • Calculate the difference in sales between each day and the previous day for each product category.
SELECT
category,
date,
sales,
sales - LAG(sales) OVER (PARTITION BY category ORDER BY date) as daily_sales_diff
FROM
sales_table
WHERE
date >= '2022-01-01' AND date < '2023-01-01'

I hope this can be useful for you. In case of questions/comments, do not hesitate to write in the comments below or reach me directly through LinkedIn or Twitter.

You can also subscribe to my new articles.

--

--

Pratik Barjatiya
Pratik Barjatiya

Written by Pratik Barjatiya

Data Engineer | Big Data Analytics | Data Science Practitioner | MLE | Disciplined Investor | Fitness & Traveller

No responses yet