Interview Preparation: Top SQL Queries and Solutions for Data Engineers and Data Analyst

Pratik Barjatiya
4 min readMay 12, 2023

Photo by Christina @ wocintechchat.com on Unsplash

Basic Level:

  • Write a SQL query to find the number of unique customers in the sales table.
SELECT COUNT(DISTINCT customer_id) FROM sales;
  • Write a SQL query to find the total revenue generated from the sales table.
SELECT SUM(revenue) FROM sales;
  • Write a SQL query to find the top 5 products by revenue in the sales table.
SELECT product_name, SUM(revenue) as total_revenue
FROM sales
GROUP BY product_name
ORDER BY total_revenue DESC
LIMIT 5;

Intermediate Level:

  • Write a SQL query to find the average revenue per customer in the sales table.
SELECT AVG(revenue) FROM
(SELECT customer_id, SUM(revenue) as revenue
FROM sales
GROUP BY customer_id) as customer_revenue;
  • Write a SQL query to find the month with the highest revenue in the sales table.
SELECT YEAR(sale_date) as year, MONTH(sale_date) as month, SUM(revenue) as total_revenue
FROM sales
GROUP BY year, month
ORDER BY total_revenue DESC
LIMIT 1;
  • Write a SQL query to find the top 10 customers with the highest number of purchases in the sales table.
SELECT customer_id, COUNT(*) as total_purchases
FROM sales
GROUP BY customer_id
ORDER BY total_purchases DESC
LIMIT 10;

Advanced Level:

  • Write a SQL query to find the monthly revenue growth rate in the sales table.
SELECT
DATE_FORMAT(sale_date, '%Y-%m') AS month,
SUM(revenue) AS revenue,
(SUM(revenue) - LAG(SUM(revenue), 1) OVER (ORDER BY sale_date)) / LAG(SUM(revenue), 1) OVER (ORDER BY sale_date) AS revenue_growth_rate
FROM
sales
GROUP BY
DATE_FORMAT(sale_date, '%Y-%m')
ORDER BY
sale_date;
  • Write a SQL query to find the customer retention rate in the sales table.
SELECT
COUNT(DISTINCT b.customer_id) / COUNT(DISTINCT a.customer_id) AS customer_retention_rate
FROM
sales a
LEFT JOIN sales b ON a.customer_id = b.customer_id AND a.sale_date < b.sale_date AND DATEDIFF(b.sale_date, a.sale_date) <= 30
WHERE
a.sale_date BETWEEN '2022-01-01' AND '2022-12-31';
  • Write a SQL query to find the top 5 performing sales regions based on revenue in the sales table.
SELECT
region,
SUM(revenue) AS revenue
FROM
sales
GROUP BY
region
ORDER BY
revenue DESC
LIMIT 5;
  • Write a SQL query to find the top 10% of customers based on revenue in the sales table.
SELECT
customer_id,
revenue
FROM (
SELECT
customer_id,
SUM(revenue) AS revenue,
RANK() OVER (ORDER BY SUM(revenue) DESC) AS rank
FROM
sales
GROUP BY
customer_id
) AS subquery
WHERE
rank <= (SELECT COUNT(*) FROM sales) * 0.1
ORDER BY
revenue DESC;
  • Write a query to find the nth highest salary from a table.
SELECT salary
FROM employee
ORDER BY salary DESC
LIMIT n-1,1;
  • Write a query to find the employees who have a salary greater than their manager’s salary.
SELECT e.employee_name
FROM employee e, employee m
WHERE e.manager_id = m.employee_id
AND e.salary > m.salary;
  • Write a query to find the second highest salary and the department name of the employee.
SELECT d.department_name,
MAX(e.salary) as SecondHighestSalary
FROM employee e
JOIN department d ON e.department_id = d.department_id
WHERE e.salary < (SELECT MAX(salary) FROM employee)
GROUP BY d.department_name
ORDER BY SecondHighestSalary DESC
LIMIT 1;
  • Write a query to find the total sales for each month of the year.
SELECT MONTH(order_date) AS month,
YEAR(order_date) AS year,
SUM(total_sales) AS total_sales
FROM sales_table
GROUP BY YEAR(order_date), MONTH(order_date);
  • Write a query to find the running total of sales for each day.
SELECT order_date, SUM(total_sales) OVER (ORDER BY order_date) AS running_total
FROM sales_table;
  • Write a query to find the top 5% of customers based on their total purchases.
SELECT customer_name, total_purchases
FROM (SELECT customer_name, SUM(purchase_amount) AS total_purchases,
PERCENT_RANK() OVER (ORDER BY SUM(purchase_amount) DESC) AS percentile_rank
FROM purchases
GROUP BY customer_name)
WHERE percentile_rank <= 0.05;
  • Write a query to find the customers who have made purchases on consecutive days.
SELECT DISTINCT a.customer_name
FROM purchases a
JOIN purchases b
ON a.customer_name = b.customer_name
AND DATEDIFF(a.purchase_date, b.purchase_date) = 1;
  • Write a query to find the customers who have made purchases in all categories.
SELECT customer_name
FROM purchases
GROUP BY customer_name
HAVING COUNT(DISTINCT category) = (SELECT COUNT(DISTINCT category) FROM purchases);
  • Write a query to find the average revenue per customer per month.
SELECT customer_id,
DATE_TRUNC('month', purchase_date) AS month,
AVG(purchase_amount) AS average_revenue
FROM purchases
GROUP BY customer_id, month;
  • Write a query to find the top 3 products in each category based on their sales.
SELECT category, product_name, total_sales
FROM (SELECT category, product_name, SUM(purchase_amount) AS total_sales,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY SUM(purchase_amount) DESC) AS row_num
FROM purchases
GROUP BY category, product_name)
WHERE row_num <= 3;

Window function based questions:

  • Write a query to find the running total of sales for each day, resetting the total for each month.
SELECT order_date,
SUM(total_sales) OVER (PARTITION BY MONTH(order_date) ORDER BY order_date) AS monthly_running_total
FROM sales_table;
  • Calculate the running total of sales for each product category in a given time period.
SELECT
category,
date,
sales,
SUM(sales) OVER (PARTITION BY category ORDER BY date) as running_total
FROM
sales_table
WHERE
date >= '2022-01-01' AND date < '2023-01-01'
  • Rank employees within each department based on their salary.
SELECT
department,
employee_name,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) as rank
FROM
employee_table
  • Calculate the moving average of sales for each product category over the past 7 days.
SELECT
category,
date,
sales,
AVG(sales) OVER (PARTITION BY category ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as moving_avg
FROM
sales_table
WHERE
date >= '2022-01-01' AND date < '2023-01-01'
  • Calculate the percentage of total sales contributed by each product within its category.
SELECT
category,
product,
sales,
sales / SUM(sales) OVER (PARTITION BY category) as pct_total_sales
FROM
sales_table
WHERE
date >= '2022-01-01' AND date < '2023-01-01'
  • Calculate the difference in sales between each day and the previous day for each product category.
SELECT
category,
date,
sales,
sales - LAG(sales) OVER (PARTITION BY category ORDER BY date) as daily_sales_diff
FROM
sales_table
WHERE
date >= '2022-01-01' AND date < '2023-01-01'

I hope this can be useful for you. In case of questions/comments, do not hesitate to write in the comments below or reach me directly through LinkedIn or Twitter.

You can also subscribe to my new articles.

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Pratik Barjatiya
Pratik Barjatiya

Written by Pratik Barjatiya

Data Engineer | Big Data Analytics | Data Science Practitioner | MLE | Disciplined Investor | Fitness & Traveller

No responses yet

Write a response