Data Analysis Interview Questions (30+ Questions)
Last Updated: June 8, 2026 • 30 Questions • Real Company Interviews
Are you getting ready for a data analysis interview? Data analysis involves examining, cleaning, transforming, and modeling data to uncover useful insights, draw conclusions, and support decision-making. It requires expertise in various tools and techniques to analyze and interpret complex data sets. Whether you're a beginner or a seasoned professional, mastering data analysis is crucial in today's data-centric job market.
Table of Contents
- Managing High I/O Processes (easy)
- Average Order Value (easy)
- Find Cheapest Product (easy)
- New vs. Returning Customer Report (easy)
- Above Average Price Products (medium)
- Frequent Price Change Detector (medium)
- Self-Join for Manager Names (easy)
- Join Employees and Departments (easy)
- Find Highest Salary (easy)
- Aggregate Employee Salaries (medium)
- Filter Orders by Date Range (easy)
- Count Distinct Product Categories (medium)
- Join Three Tables (easy)
- Use CASE in SELECT (medium)
- Calculate Cumulative Sales (medium)
- Pivot Daily Sales (easy)
- Find Customers Without Orders (easy)
- Group and Filter with HAVING (medium)
- Find Overlapping Date Ranges (medium)
- Use COALESCE for Null Handling (easy)
- Self-Join for Hierarchical Data (easy)
- Time Difference Between Order Phases (easy)
- Find Most Recent Order per Customer (easy)
- Monthly Hiring Trend by Department (easy)
- Aggregate Data with CUBE (medium)
- Calculate Percentage Contribution (easy)
- Over-Budget Project Detection (medium)
- Eligible Bonus Calculation with CASE (medium)
- Set Operation: INTERSECT (medium)
- Extract Year and Month (medium)
In this article, we cover Data Analysis Interview Questions with answers often asked in data analyst interviews at top-paying companies. Whether you're preparing for your first data-related position or seeking to advance your career, this guide will help you tackle the most commonly asked data analysis interview questions, giving you an edge in competitive job interviews.
💡 Pro Tips for Data Analysis Interviews
- Practice each question and understand the underlying concepts
- Review company-specific technologies and methodologies
- Prepare follow-up questions and edge cases
- Practice explaining your solutions clearly and concisely
Interview Questions & Answers
1. Managing High I/O Processes
Learn how to identify and manage high I/O consuming processes on a Linux server using command-line tools. This guide covers sorting running processes by disk activity and mitigating disk bottlenecks through throttling, stopping, or rescheduling heavy jobs, improving overall system responsiveness and application performance.
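As a rough illustration of sorting processes by disk activity, the sketch below reads the per-process counters that Linux exposes in /proc/<pid>/io. It is a minimal example, not the article's own method: the helper names parse_proc_io and top_io_processes are hypothetical, the counters are cumulative since process start rather than a live rate, and other users' processes are only readable with elevated privileges. Dedicated tools such as iotop (monitoring) and ionice (lowering I/O priority) are the usual choices on a real server.

```python
import os


def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def top_io_processes(limit=5, proc_root="/proc"):
    """Return up to `limit` (total_bytes, pid, name) tuples, largest first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "io")) as fh:
                io_counters = parse_proc_io(fh.read())
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                name = fh.read().strip()
        except OSError:
            continue  # process exited, or we lack permission to read it
        total = io_counters.get("read_bytes", 0) + io_counters.get("write_bytes", 0)
        rows.append((total, int(entry), name))
    return sorted(rows, reverse=True)[:limit]


if __name__ == "__main__" and os.path.isdir("/proc"):
    # Print PID, cumulative bytes read + written, and command name.
    for total, pid, name in top_io_processes():
        print(f"{pid:>7}  {total:>15}  {name}")
```

Once a heavy process is identified this way, it can be throttled, stopped, or rescheduled as the description above suggests.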
2. Average Order Value
Objective
To answer the SQL interview question effectively, you need to craft a query that calculates and displays the average expenditure per customer from a table named Orders. This table records customer purchases. The results should display each customer_id alongside their avg_order_value, rounded to two decimal places. Finally, ensure the output is sorted in ascending order based on customer_id.
Additional Information
Table Schema:
- Orders:
  - order_id (INTEGER): Unique identifier for each order.
  - customer_id (INTEGER): Identifier for the customer who made the order.
  - total_amount (DECIMAL): Total amount spent on the order.
  - order_date (DATE): The date when the order was placed.
Constraints:
- Each customer_id may have multiple associated orders.
- There is at least one order present in the table.
Output Requirements:
- Columns: customer_id, avg_order_value
- avg_order_value must be rounded to two decimal places.
- Results must be ordered by customer_id in ascending order.
Detailed Query Explanation
The query you need to write should meet the specific output requirements and constraints mentioned. Here's the detailed SQL query designed to solve the problem:
SELECT
customer_id,
ROUND(AVG(total_amount), 2) AS avg_order_value
FROM
Orders
GROUP BY
customer_id
ORDER BY
customer_id;
3. Find Cheapest Product
Objective
Retrieve the name and price of the product(s) with the lowest price from the products table.
Additional Information
- The products table contains the following columns:
  - id (integer): Unique identifier for each product.
  - name (string): Name of the product.
  - price (decimal): Price of the product.
- There may be multiple products sharing the same lowest price.
- Write an SQL query that returns the name and price of the cheapest product(s).
In order to tackle this SQL interview question effectively, you need to identify the product or products with the lowest price from the products table. Here's the step-by-step process to achieve this:
- First, determine the lowest price among all the products.
- Then, retrieve the name and price of the product(s) with this minimum price.
Now, let's write the precise SQL query to accomplish this objective:
SELECT
name,
price
FROM
products
WHERE
price = (SELECT MIN(price) FROM products);
This SQL query efficiently retrieves the name and price of the product(s) with the lowest price from the products table by leveraging a subquery to find the minimum price. This approach ensures that even if there are multiple products with the same lowest price, all of them will be included in the result set.
4. New vs. Returning Customer Report
Generating a Detailed Customer Order Report in SQL
The objective of this interview question is to write an SQL query that lists each order with its order date, customer ID, customer name, and a customer type indicating whether the order is the customer's first ('New') or a subsequent one ('Returning'). This requires combining two tables, orders and customers.
Here’s how you can achieve this:
SQL Query to Generate the Report
WITH first_orders AS (
SELECT
customer_id,
MIN(order_date) AS first_order_date
FROM
orders
GROUP BY
customer_id
)
SELECT
o.order_date,
o.customer_id,
c.customer_name,
CASE
WHEN o.order_date = fo.first_order_date THEN 'New'
ELSE 'Returning'
END AS customer_type
FROM
orders o
JOIN
customers c ON o.customer_id = c.id
LEFT JOIN
first_orders fo ON o.customer_id = fo.customer_id
ORDER BY
o.order_date,
o.customer_id;
Explanation of the SQL Query Structure
The query begins with a WITH clause, also known as a Common Table Expression (CTE), named first_orders. This CTE identifies the earliest order date for each customer using SQL aggregation:
- Columns Selected: customer_id and first_order_date (calculated as the minimum order_date for each customer_id)
- Grouped By: customer_id
The main SELECT statement retrieves the following details:
- order_date: The date when the order was made (orders.order_date).
- customer_id: Customer identifier (orders.customer_id).
- customer_name: Name of the customer (customers.customer_name).
- customer_type: A CASE expression determining if the order is the customer's first or subsequent order. If orders.order_date matches first_orders.first_order_date, it is marked as 'New'; otherwise, it is marked as 'Returning'.
Join Operations:
- The orders table is joined with the customers table on the customer identifier (o.customer_id = c.id) to get the customer name.
- The CTE first_orders is LEFT JOINED with the orders table to determine if the order is the first order or not.
Ordering:
- The results are ordered first by order_date and then by customer_id for better readability and to match expected output structure.
Additional Notes
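- Aggregating with MIN() and GROUP BY reduces the orders table to one row per customer, so the first-order date is computed once per customer rather than once per order.
- Because first_orders is derived from the orders table itself, every order has a matching row in the CTE. The LEFT JOIN therefore returns the same rows as an inner JOIN here; it is a defensive choice rather than a requirement.
- If a customer placed several orders on their first order date, all of them are labeled 'New', since the comparison is made on order_date alone.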
This SQL query is designed to provide all necessary details in an ordered and efficient manner to analyze customer purchasing patterns effectively.
5. Above Average Price Products
SQL Interview Question: Fetching Available Products Above Average Price
Objective
Develop an SQL query to fetch all products currently available in stock and priced above the average price of all available products. The dataset should be organized primarily based on price in descending order and secondarily by rating in descending order. Products lacking a rating should be placed at the end of the sorted list.
Additional Information
- Include only products with stock_quantity greater than 0.
- The average price must be computed solely based on the products that are in stock.
- When sorting, products with a NULL rating should appear after those with non-null ratings.
- The query should return all columns from the Products table.
- Account for the possibility that some products might not have a rating value.
This detailed explanation contributes valuable insights into constructing a sophisticated SQL query that efficiently retrieves and sorts products based on inventory status and price, offering critical functionality for database management. Understanding and mastering such queries are essential for optimizing data handling and retrieval processes in various business applications.
Sample Query
SELECT *
FROM Products
WHERE stock_quantity > 0
AND price > (SELECT AVG(price) FROM Products WHERE stock_quantity > 0)
ORDER BY price DESC, rating DESC
NULLS LAST;
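This query returns the in-stock products priced above the in-stock average, sorted as required. Two details are worth mentioning in an interview. First, NULLS LAST is supported by PostgreSQL, Oracle, and SQLite 3.30 or later; MySQL and SQL Server do not accept it, but both treat NULL as the lowest value, so rating DESC already places unrated products last there. Second, NULLS LAST applies to the secondary sort key, so an unrated product is placed after rated products of the same price, not at the very end of the result.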
6. Frequent Price Change Detector
Identifying Products with Multiple Price Changes Using SQL
Objective
In this interview question, you are provided with two tables: products and price_history. Your task is to write an SQL query that identifies all products that have experienced at least two price changes. A price change is defined as a difference in price from one entry to the next for the same product, based on the chronological order of dates.
Requirements
Tables:
- products:
  - id (integer): The unique identifier for each product.
  - name (string): The name of the product.
  - category (string): The category to which the product belongs.
- price_history:
  - product_id (integer): The identifier linking to the products table.
  - price (decimal): The price of the product at a specific date.
  - date (date): The date when the price was recorded.
Constraints:
- Each product_id in price_history corresponds to an id in products.
- Prices are recorded in chronological order for each product.
Output:
- The product’s name.
- The category of the product.
- The total number of price changes, returned as a string.
Conditions:
- Only consider consecutive price entries for determining a price change.
- A product must have at least two price changes to be included in the result.
Order:
- Results should be ordered by the number of price changes in descending order.
- In case of ties, results should be ordered by the product ID in ascending order.
This query helps in understanding the dynamics of product pricing over time and can be a valuable insight for inventory management, pricing strategies, and market analysis. Performing well on such questions requires a meticulous approach to grouping and ordering, which is fundamental in SQL querying.
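No query is given for this question, so here is one possible sketch. It compares each price with the previous one for the same product using the LAG() window function, counts the rows where the two differ, and keeps products with at least two such changes. The SQL is wrapped in Python's sqlite3 module only so that it can be run against a few made-up sample rows; the sample data and the output column name price_changes are assumptions, not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE price_history (product_id INTEGER, price REAL, date TEXT);
INSERT INTO products VALUES
  (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture'), (3, 'Mouse', 'Electronics');
-- Hypothetical sample data: Laptop changes twice, Desk once, Mouse three times.
INSERT INTO price_history VALUES
  (1, 1000, '2023-01-01'), (1, 950, '2023-02-01'), (1, 950, '2023-03-01'), (1, 900, '2023-04-01'),
  (2, 200, '2023-01-01'), (2, 210, '2023-02-01'),
  (3, 20, '2023-01-01'), (3, 25, '2023-02-01'), (3, 20, '2023-03-01'), (3, 22, '2023-04-01');
""")

PRICE_CHANGES_SQL = """
WITH ordered AS (
    -- Pair every price with the previous price of the same product.
    SELECT product_id,
           price,
           LAG(price) OVER (PARTITION BY product_id ORDER BY date) AS prev_price
    FROM price_history
)
SELECT p.name,
       p.category,
       CAST(COUNT(*) AS TEXT) AS price_changes   -- returned as a string
FROM ordered o
JOIN products p ON p.id = o.product_id
WHERE o.prev_price IS NOT NULL      -- skip each product's first entry
  AND o.price <> o.prev_price       -- count only real changes
GROUP BY p.id, p.name, p.category
HAVING COUNT(*) >= 2
ORDER BY COUNT(*) DESC, p.id ASC;
"""

price_change_rows = conn.execute(PRICE_CHANGES_SQL).fetchall()
for row in price_change_rows:
    print(row)
```

The numeric COUNT(*) is used for sorting and the cast is applied only in the SELECT list, so the descending order is not affected by string comparison. The text type name in the cast varies by database (for example VARCHAR or CHAR).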
7. Self-Join for Manager Names
SQL Query to Retrieve Employee and Manager Names Ordered by Employee ID
In a typical SQL job interview, you might encounter questions designed to evaluate your ability to manipulate and retrieve data using SQL queries. One common question involves retrieving each employee's name alongside their respective manager's name from a given employees table.
To solve this, you'll need to create a SQL query that:
- Retrieves all employees.
- Includes each employee's manager name (if one exists).
- Ensures that employees without managers are still listed with a NULL value for the manager's name.
- Orders the results by the employee's unique identifier in ascending order.
Sample SQL Query
Here’s a detailed SQL query that meets the requirements:
SELECT
e.id AS employee_id,
e.name AS employee_name,
m.name AS manager_name
FROM
employees e
LEFT JOIN
employees m ON e.manager_id = m.id
ORDER BY
e.id ASC;
Explanation of the Query
- SELECT: Specifies the columns you want to retrieve. Here we select the id and name from the employees table (aliased as e) as well as the name from the manager.
- e.id AS employee_id: Retrieves the employee's ID and labels it as employee_id.
- e.name AS employee_name: Retrieves the employee's name and labels it as employee_name.
- m.name AS manager_name: Retrieves the manager's name (if it exists) and labels it as manager_name.
- FROM employees e: Specifies the primary table we are querying from (employees), aliasing it as e.
- LEFT JOIN employees m ON e.manager_id = m.id: Performs a left join on the employees table itself to get the manager's details. The join condition matches e.manager_id with m.id, ensuring managers are matched to their subordinates.
- ORDER BY e.id ASC: Orders the result by the id of the employees in ascending order.
This query will yield a list of all employees with their respective manager names, if available. Employees without a manager will have NULL in the manager_name column, and the results will be neatly ordered by employee ID.
Conclusion
Answering SQL interview questions effectively often involves not just solving the problem but doing so in a clear, efficient, and accurate manner. This SQL query example demonstrates your ability to handle self-joins, NULL values, and sorting, making it a comprehensive answer to the posed question.
By fully understanding and implementing the core requirements—retrieving employees and their managers, handling employees without managers, and ordering the results by employee ID—you’ll showcase your SQL proficiency and logical problem-solving skills, critical assets for any data-centric role.
8. Join Employees and Departments
Understanding the Question
Objective
The goal is to write an SQL query to retrieve and display a list of employees who earn more than $50,000. For each employee that meets the specified salary condition, the output should include several details: employee ID, first name, last name, job title, hire date, and the name of their department. If an employee does not belong to any department, the department name should be presented as NULL. Furthermore, the employees should be listed in descending order based on their hire date, showcasing the most recently hired employees first.
Additional Information
Tables:
- employees table:
  - employee_id (Integer): Unique identifier for each employee.
  - first_name (String): Employee's first name.
  - last_name (String): Employee's last name.
  - department_id (Integer): Identifier for the department to which the employee belongs (can be NULL).
  - job_title (String): The title of the employee's job.
  - salary (Integer): The employee's salary.
  - hire_date (Date): The date the employee was hired.
- departments table:
  - department_id (Integer): Unique identifier for each department.
  - department_name (String): The name of the department.
  - location (String): The location of the department.
Requirements
- Use a LEFT JOIN to merge the employees and departments tables on the department_id.
- Filter to only include employees with a salary greater than 50000.
- Display department_name as NULL for employees who are not assigned to any department.
- Order the results by hire_date in descending order.
Example
Input:
employees table:
| employee_id | first_name | last_name | department_id | job_title | salary | hire_date |
|---|---|---|---|---|---|---|
| 1 | John | Smith | 1 | Senior Developer | 85000 | 2020-01-15 |
| 2 | Mary | Johnson | 2 | Project Manager | 75000 | 2021-03-20 |
| 3 | Peter | Brown | NULL | Consultant | 65000 | 2022-06-10 |
| 4 | Sarah | Davis | 1 | Developer | 45000 | 2021-09-01 |
departments table:
| department_id | department_name | location |
|---|---|---|
| 1 | Engineering | New York |
| 2 | Project Management | Boston |
| 3 | Marketing | Chicago |
Expected Output:
| employee_id | first_name | last_name | department_name | job_title | hire_date |
|---|---|---|---|---|---|
| 3 | Peter | Brown | NULL | Consultant | 2022-06-10 |
| 2 | Mary | Johnson | Project Management | Project Manager | 2021-03-20 |
| 1 | John | Smith | Engineering | Senior Developer | 2020-01-15 |
SQL Query:
SELECT
e.employee_id,
e.first_name,
e.last_name,
d.department_name,
e.job_title,
e.hire_date
FROM
employees e
LEFT JOIN
departments d ON e.department_id = d.department_id
WHERE
e.salary > 50000
ORDER BY
e.hire_date DESC;
9. Find Highest Salary
Objective
Write an SQL query to retrieve the names and salaries of the highest-paid employee(s) from the employees table.
Additional Information
- The employees table consists of the following columns:
  - id (integer): The unique identifier for each employee.
  - name (string): The name of the employee.
  - salary (integer): The salary of the employee.
- If multiple employees share the highest salary, include all of them in the result.
- The output should contain two columns: name and salary.
- The results can be returned in any order.
This SQL query requires identifying the highest salary first and then selecting the names and salaries of all employees who earn this highest salary.
To start with, the query can use a subquery to determine the maximum salary from the employees table. Subsequently, a main query can be employed to fetch the names and salaries of employees whose salaries match the maximum salary obtained from the subquery.
Here is the complete SQL query to achieve this objective:
SQL Query
SELECT name, salary
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);
This query effectively retrieves the names and salaries of all employees who have the highest salary in the employees table, satisfying the given requirements.
10. Aggregate Employee Salaries
Objective
Given two tables, employees and departments, construct a SQL query to display each department's name, the total number of employees in that department, the aggregate salary of all its employees, and the average salary rounded to two decimal places. The output should be sorted alphabetically by the department name.
Additional Information
Tables:
- employees:
  - id (INTEGER): Unique identifier for each employee.
  - name (VARCHAR): Name of the employee.
  - department_id (INTEGER): Identifier linking to the employee's department.
  - salary (INTEGER): Salary of the employee.
- departments:
  - id (INTEGER): Unique identifier for each department.
  - name (VARCHAR): Name of the department.
  - location (VARCHAR): Location of the department.
Requirements:
- Use appropriate JOIN operations to link employees with departments.
- Calculate the total number of employees (employee_count) in each department.
- Compute the sum of salaries (total_salary) for each department.
- Determine the average salary (average_salary) for each department, rounded to two decimal places.
- Alias the department name as department.
- Order the final results by the department name in ascending order.
Output Columns:
- department (VARCHAR): Name of the department.
- employee_count (INTEGER): Number of employees in the department.
- total_salary (INTEGER): Combined salaries of all employees in the department.
- average_salary (DECIMAL(10,2)): Average salary within the department, rounded to two decimal places.
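The question does not include a solution, so the following is one possible sketch. It joins departments to employees, groups by department, and computes the three aggregates. The query is run through Python's sqlite3 module against invented sample rows purely to show that it executes. An inner JOIN is assumed here, which leaves out departments with no employees; the requirement only says "appropriate JOIN operations", so a LEFT JOIN from departments would be the alternative if empty departments should appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT, location TEXT);
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT,
                        department_id INTEGER, salary INTEGER);
-- Hypothetical sample data; 'Legal' has no employees.
INSERT INTO departments VALUES
  (1, 'Sales', 'Boston'), (2, 'Engineering', 'Austin'), (3, 'Legal', 'Denver');
INSERT INTO employees VALUES
  (1, 'Ann', 2, 90000), (2, 'Bob', 2, 80000), (3, 'Cy', 2, 85001), (4, 'Dee', 1, 60000);
""")

DEPARTMENT_SALARY_SQL = """
SELECT d.name                  AS department,
       COUNT(e.id)             AS employee_count,
       SUM(e.salary)           AS total_salary,
       ROUND(AVG(e.salary), 2) AS average_salary
FROM departments d
JOIN employees e ON e.department_id = d.id
GROUP BY d.id, d.name
ORDER BY d.name ASC;
"""

department_rows = conn.execute(DEPARTMENT_SALARY_SQL).fetchall()
for row in department_rows:
    print(row)
```

Grouping by d.id as well as d.name keeps two departments that happen to share a name from being merged into one row.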
11. Filter Orders by Date Range
How to Construct an SQL Query to Fetch Order Details within a Specific Date Range
If you're preparing for an interview, crafting a precise SQL query to fetch order details is a skill you need to master. The task involves fetching order_id, customer_name, order_date, and total_amount from the orders table, only for orders placed between January 1, 2023, and June 30, 2023. The results must be sorted by order_date in ascending order. Here’s a step-by-step guide on how to nail this:
SQL Query Example
To achieve this, you need to follow these pointers:
- Select the desired columns: order_id, customer_name, order_date, and total_amount.
- Filter the results to only include orders where the order_date falls inclusively between '2023-01-01' and '2023-06-30'.
- Order the results by order_date in ascending order to get the earliest orders at the top.
Here is the SQL query you need:
SELECT order_id, customer_name, order_date, total_amount
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-06-30'
ORDER BY order_date ASC;
Explanation
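- FROM orders: Indicates the data source, which is the orders table.
- SELECT order_id, customer_name, order_date, total_amount: Specifies the columns you want to retrieve.
- WHERE order_date BETWEEN '2023-01-01' AND '2023-06-30': Filters the results to include only those orders placed within the defined date range. BETWEEN is inclusive at both ends. If order_date is stored as a timestamp rather than a date, orders placed after midnight on June 30 would be missed; in that case use order_date >= '2023-01-01' AND order_date < '2023-07-01'.
- ORDER BY order_date ASC: Ensures that the output is sorted by order_date in ascending order.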
This query effectively pulls the required data while maintaining the specified conditions and ordering.
Conclusion
Being able to craft an SQL query that accurately fetches order details based on specific criteria is crucial for data analysis and reporting. This particular query showcases your ability to filter data by date and sort it effectively, ensuring that you can produce meaningful and organized insights from your data tables. Practice writing and running similar queries to enhance your SQL skills and be interview-ready.
12. Count Distinct Product Categories
Analyzing Primary Categories and Subcategories in a Products Table: A Detailed Guide
Objective
In this analysis, we aim to examine the products table to identify and count all primary categories that do not have a parent category. Additionally, we will calculate the total number of unique subcategories directly associated with these primary categories.
Key Points
- Primary Category: Defined as a category with no parent_category.
- Subcategory: A distinct category whose parent_category matches a primary category.
- Output: The analysis should yield two specific metrics:
  - total_root_categories: The count of distinct primary categories.
  - total_subcategories: The count of unique subcategories linked to the primary categories.
Table Description
The products table consists of these essential columns:
- id (integer): Unique identifier for each product.
- name (string): Name of the product.
- category (string): The category to which the product belongs.
- parent_category (string or null): The parent category of the product's category, if applicable.
Analysis Goals
- Identify primary categories with parent_category as null.
- Determine the distinct subcategories that have their parent_category set to a primary category.
This analysis provides valuable insights into the category hierarchy within the products table and supports better categorization strategies.
By performing this analysis, businesses can understand and organize their product categories more effectively, ensuring a well-structured product categorization system that enhances searchability and user experience.
Conclusion
The aim is to extract the total_root_categories and total_subcategories to comprehend the structure and hierarchy of the product categories. This process is pivotal in optimizing product categorization for better inventory management and improved customer navigation.
This detailed guide serves as a basis for extracting meaningful insights from the products table, ensuring accurate and efficient data analysis.
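The guide describes the two metrics but does not show a query, so here is a minimal sketch of one way to compute them. A CTE collects the distinct root categories, and two scalar subqueries produce the counts. The SQL is executed through Python's sqlite3 module against made-up rows; the sample data and the CTE name roots are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT,
                       category TEXT, parent_category TEXT);
-- Hypothetical sample data: two roots (Electronics, Furniture);
-- 'Accessories' sits under 'Mobile', which is not a root.
INSERT INTO products VALUES
  (1, 'TV',     'Electronics', NULL),
  (2, 'Radio',  'Electronics', NULL),
  (3, 'Sofa',   'Furniture',   NULL),
  (4, 'Phone',  'Mobile',      'Electronics'),
  (5, 'Tablet', 'Mobile',      'Electronics'),
  (6, 'Laptop', 'Computers',   'Electronics'),
  (7, 'Desk',   'Office',      'Furniture'),
  (8, 'Case',   'Accessories', 'Mobile');
""")

CATEGORY_COUNTS_SQL = """
WITH roots AS (
    -- Primary categories: those with no parent.
    SELECT DISTINCT category
    FROM products
    WHERE parent_category IS NULL
)
SELECT
    (SELECT COUNT(*) FROM roots) AS total_root_categories,
    (SELECT COUNT(DISTINCT category)
       FROM products
      WHERE parent_category IN (SELECT category FROM roots)) AS total_subcategories;
"""

category_counts = conn.execute(CATEGORY_COUNTS_SQL).fetchone()
print(category_counts)
```

Only categories whose parent is itself a root are counted as subcategories, which matches the "directly associated" wording in the objective; deeper levels such as Accessories in the sample data are left out. Some databases (for example Oracle) need a FROM clause such as FROM dual on the outer SELECT.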
13. Join Three Tables
Objective
Construct a SQL query to retrieve a comprehensive list of customer orders. For each order, display the customer's name, the date the order was placed, the name of the product ordered, the quantity of the product, and the unit price. Ensure that the results are organized first by the customer's name in ascending order and then by the order date in ascending order.
Additional Information
- Tables and Schemas:
  - Customers
    - customer_id (INTEGER): Unique identifier for each customer.
    - customer_name (VARCHAR): Name of the customer.
    - email (VARCHAR): Email address of the customer.
  - Orders
    - order_id (INTEGER): Unique identifier for each order.
    - customer_id (INTEGER): Identifier linking the order to a customer.
    - order_date (DATE): Date when the order was placed.
  - OrderDetails
    - order_id (INTEGER): Identifier linking the detail to an order.
    - product_name (VARCHAR): Name of the product ordered.
    - quantity (INTEGER): Quantity of the product ordered.
    - unit_price (DECIMAL): Price per unit of the product.
Constraints
- Each customer can have multiple orders.
- Each order can include multiple products.
- Ensure that all relevant records from the three tables are accurately joined based on their relationships.
Output Requirements
- Columns to include in the result:
  - customer_name
  - order_date
  - product_name
  - quantity
  - unit_price
- The result should be sorted by customer_name (ascending) and then by order_date (ascending).
Example SQL Query
SELECT
c.customer_name,
o.order_date,
od.product_name,
od.quantity,
od.unit_price
FROM
Customers c
JOIN
Orders o ON c.customer_id = o.customer_id
JOIN
OrderDetails od ON o.order_id = od.order_id
ORDER BY
c.customer_name ASC,
o.order_date ASC;
14. Use CASE in SELECT
SQL Interview Question: Classifying Products by Price
Objective
The goal of this SQL interview question is to craft a query that retrieves the name and price of each product from the products table. Additionally, the query should classify each product into a specific price category:
- 'Low': if the product price is below 50,
- 'Medium': if the product price is between 50 and 100 inclusive,
- 'High': if the product price is above 100.
The results should include the product name, price, and its corresponding price category, and they should be ordered by price in ascending order.
Additional Information
- The products table contains the following columns:
  - id: integer, unique identifier for each product.
  - name: string, the name of the product.
  - price: decimal, the price of the product.
- Price categories must be labeled exactly as 'Low', 'Medium', and 'High'.
- Ensure the output is ordered by the price column from lowest to highest.
With these details in mind, you can craft the perfect SQL query to solve this task. Executing this query proficiently can demonstrate your ability to handle basic SQL operations, perform conditional logic with CASE statements, and order results efficiently.
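One way the query could look is sketched below. A CASE expression assigns the category, with the branches evaluated top to bottom so that the second branch only sees prices of 50 or more. The SQL is run through Python's sqlite3 module on invented sample rows chosen to hit the boundaries at 50 and 100; the sample data and the column alias price_category are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
-- Hypothetical sample data, including the boundary prices 50 and 100.
INSERT INTO products VALUES
  (1, 'Pen', 2.5), (2, 'Bag', 50), (3, 'Lamp', 100),
  (4, 'Chair', 100.01), (5, 'Desk', 250);
""")

PRICE_CATEGORY_SQL = """
SELECT name,
       price,
       CASE
           WHEN price < 50   THEN 'Low'
           WHEN price <= 100 THEN 'Medium'   -- reached only when price >= 50
           WHEN price > 100  THEN 'High'
       END AS price_category
FROM products
ORDER BY price ASC;
"""

category_rows = conn.execute(PRICE_CATEGORY_SQL).fetchall()
for row in category_rows:
    print(row)
```

Writing the last branch as an explicit condition instead of ELSE 'High' means a product with a NULL price gets a NULL category rather than being labeled 'High'.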
We hope this example of an SQL interview question helps you ace your next interview. Good luck!
15. Calculate Cumulative Sales
Example of How to Answer the Interview Question
The given scenario involves working with sales data from a sales_records table, comprising details such as sale dates, product names, and units sold. The task is to craft an SQL query that will yield each record alongside a new column named total_units_sold, which indicates the cumulative number of units sold for each product up to and including the current sale date. The final results should be ordered by the product name and the sale date. Below is a detailed explanation of how to handle this interview question:
SQL Solution
To tackle this task, you can utilize the SQL Window Functions, especially the SUM() function combined with the PARTITION BY clause to calculate running totals. The query will leverage window functions to maintain the total units sold for each product chronologically up to the given date.
Here's the SQL query for this solution:
SELECT
sale_date,
item_name,
units_sold,
SUM(units_sold) OVER (PARTITION BY item_name ORDER BY sale_date) AS total_units_sold
FROM
sales_records
ORDER BY
item_name,
sale_date;
Explanation
- SELECT Clause: We choose the relevant columns sale_date, item_name, units_sold, and generate a new column total_units_sold.
- SUM() Function with OVER Clause: The SUM(units_sold) OVER (PARTITION BY item_name ORDER BY sale_date) computes the cumulative sum of units sold for each product (item_name), ordered by the sale date (sale_date).
- Result Ordering: ORDER BY item_name, sale_date ensures the output is first ordered by the product name and subsequently by the sale date for each product.
By structuring the SQL query in this manner, you ensure that the running total for each product is correctly calculated and effortlessly retrieve the desired data layout.
Implementation Context
This query will be effective in scenarios such as generating daily sales reports, monitoring product performance over time, or analyzing sales trends. It can be adapted to various relational database management systems (RDBMS) that support window functions, such as PostgreSQL, MySQL (8.0 or later), SQL Server, and Oracle.
Final Thoughts
Providing a detailed yet concise solution to SQL interview questions can help demonstrate proficiency in SQL and understanding of advanced functions. When crafting SQL solutions, always consider the readability and efficiency of your queries to ensure they perform well even with larger datasets.
16. Pivot Daily Sales
How to Transform Sales Data for Specific Dates with SQL
Objective: In this task, you are required to utilize your SQL skills to transform sales data from a table named Sales. The table includes three significant attributes: product_name, sale_date, and sales_amount. Your goal is to craft a SQL query that restructures this data to show each product's sales amounts for the dates '2023-01-01', '2023-01-02', and '2023-01-03'. If sales are unavailable for any product on a given date, replace the value with NULL. The resulting table should be alphabetically ordered by product_name.
Table Schema:
- Sales
  - product_name (VARCHAR)
  - sale_date (DATE: 'YYYY-MM-DD')
  - sales_amount (INT)
Steps to Achieve the Desired Output:
- Select the required columns: Extract product_name and the sales data corresponding to the dates '2023-01-01', '2023-01-02', and '2023-01-03'.
- Conditional Aggregation: Use conditional aggregation to pivot the sale_date column and align the sales data for the specific dates.
- Handle NULLs for missing sales: Ensure that if a product has no sales on a particular date, the output value for that date should be NULL.
- Ordering: Ensure the final result set is ordered alphabetically by product_name.
SQL Query
Here is a comprehensive SQL query to achieve the required output:
SELECT
product_name,
MAX(CASE WHEN sale_date = '2023-01-01' THEN sales_amount END) AS "2023-01-01",
MAX(CASE WHEN sale_date = '2023-01-02' THEN sales_amount END) AS "2023-01-02",
MAX(CASE WHEN sale_date = '2023-01-03' THEN sales_amount END) AS "2023-01-03"
FROM
Sales
WHERE
sale_date IN ('2023-01-01', '2023-01-02', '2023-01-03')
GROUP BY
product_name
ORDER BY
product_name;
Explanation of the Query
- SELECT Clause: Here, we select the product_name and use conditional aggregation to get the sales amount for each of the specified dates.
- MAX with CASE WHEN: This construct is used to pivot the table. Each CASE expression returns sales_amount only for rows matching its date and NULL otherwise, and MAX collapses those values to one per product. If there is no sale on a date, the result is NULL. This assumes at most one row per product and date; if there can be several, SUM returns the daily total where MAX would return only the largest single sale.
- FROM Sales: Indicates the table from which data is being retrieved.
- WHERE Clause: Filters records to include only those with sale_date in the specified list of dates.
- GROUP BY: Groups the results by product_name to ensure each product appears only once in the output.
- ORDER BY: Sorts the results alphabetically by product_name.
This approach ensures that you obtain a well-structured and readable table displaying each product's sales for January 1st, 2nd, and 3rd of 2023, handling any missing sale data gracefully.
17. Find Customers Without Orders
SQL Interview Question: Listing Customers Who Have Never Placed an Order
Objective
Construct an SQL query to identify and list the names of all customers who have never placed an order. Ensure the resulting list is ordered alphabetically by the customers' names.
Additional Information
You are provided with two tables:
customers
- customer_id (integer): Unique identifier for each customer.
- customer_name (string): Name of the customer.
orders
- order_id (integer): Unique identifier for each order.
- customer_id (integer): Identifier linking the order to a customer.
- order_date (date): The date on which the order was placed.
Your query should return a table with a single column, customer_name, containing the names of customers who have no corresponding entries in the orders table. If every customer has placed at least one order, the query should return an empty result set.
Example SQL Query
SELECT customer_name
FROM customers
WHERE customer_id NOT IN (SELECT DISTINCT customer_id FROM orders)
ORDER BY customer_name;
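This query lists the names of all customers who have never placed an order by filtering out the customer_ids found in the orders table, then orders the list alphabetically by customer_name. One caveat: if orders.customer_id can contain NULL, NOT IN matches no rows at all, so either add WHERE customer_id IS NOT NULL to the subquery or use NOT EXISTS (or a LEFT JOIN with a WHERE orders.customer_id IS NULL filter) instead.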
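The behaviour of NOT IN when the subquery returns a NULL is a frequent follow-up question. The sketch below compares NOT IN with NOT EXISTS on hypothetical data that includes one order with a NULL customer_id; it assumes Python's built-in sqlite3 module, and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT);
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO orders VALUES (10, 1, '2023-01-05'), (11, NULL, '2023-01-06');
""")

# NOT IN: comparing against a list containing NULL never evaluates to TRUE.
NOT_IN_QUERY = """
    SELECT customer_name
    FROM customers
    WHERE customer_id NOT IN (SELECT customer_id FROM orders)
    ORDER BY customer_name;
"""

# NOT EXISTS: only checks whether a matching order row exists.
NOT_EXISTS_QUERY = """
    SELECT c.customer_name
    FROM customers c
    WHERE NOT EXISTS (
        SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
    )
    ORDER BY c.customer_name;
"""

not_in_rows = conn.execute(NOT_IN_QUERY).fetchall()
not_exists_rows = conn.execute(NOT_EXISTS_QUERY).fetchall()
print("NOT IN:    ", not_in_rows)
print("NOT EXISTS:", not_exists_rows)
```

With the NULL present, the NOT IN version returns nothing, while NOT EXISTS still returns Bob and Carol.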
18. Group and Filter with HAVING
Crafting a SQL Query to Identify Departments with High Average Salaries
Introduction:
When preparing for an SQL interview, a common question revolves around manipulating and extracting data from multiple related tables. A notable example involves using SQL commands to identify departments where the average employee salary surpasses a certain threshold. For this particular scenario, we will focus on departments with an average salary exceeding $70,000 and order the results from highest to lowest average salary.
Requirements:
To tackle this query, we have to join two tables, employees and departments, and extract the necessary details:
- Retrieve the average salary of employees in each department.
- Filter departments with an average salary greater than $70,000.
- Display the results in descending order of average salary.
Table Structures:
employees:
- id: Integer, unique identifier for each employee.
- name: String, employee's name.
- department_id: Integer, identifies the employee's department.
- salary: Integer, employee's salary.
departments:
- id: Integer, unique identifier for each department.
- name: String, department's name.
- location: String, department's location.
Step-by-step Guide:
Perform a Join:
- Utilize an INNER JOIN to combine the employees and departments tables based on department_id.
Calculate Average Salary:
- Use the AVG() function to compute the average salary per department.
Filter by Salary:
- Implement a HAVING clause to filter departments where the average salary exceeds $70,000.
Select Required Columns:
- Select departments.name and the computed average salary.
Order Results:
- Sort the output by average salary in descending order using ORDER BY.
Final SQL Query:
SELECT
d.name AS department_name,
AVG(e.salary) AS avg_salary
FROM
employees e
JOIN
departments d
ON
e.department_id = d.id
GROUP BY
d.name
HAVING
AVG(e.salary) > 70000
ORDER BY
avg_salary DESC;
Conclusion:
Mastering such SQL queries not only helps in successfully navigating through SQL interviews but is also essential for generating insightful data reports. This knowledge empowers you to derive significant business insights by efficiently querying and manipulating relational databases, thus playing a crucial role in data-driven decision-making processes.
19. Find Overlapping Date Ranges
Objective
Write a SQL query that identifies all project assignments where an employee is allocated to multiple projects with overlapping timeframes. The result should include the project ID, employee ID, start date, and end date of each such assignment. Ensure that the output is ordered by project ID and employee ID.
Additional Information
You are provided with a table named assignments with the following columns:
- project_id (INTEGER): The unique identifier for each project.
- employee_id (INTEGER): The unique identifier for each employee.
- start_date (DATE): The start date of the employee's assignment to the project.
- end_date (DATE): The end date of the employee's assignment to the project.
An employee has overlapping assignments if they are assigned to different projects where the date ranges intersect. Specifically, two assignments overlap if:
- The projects are different (project_id differs).
- The start date of one assignment is on or before the end date of the other assignment.
- The end date of one assignment is on or after the start date of the other assignment.
The query should return distinct records without duplicates.
Order the final result set first by project_id in ascending order and then by employee_id in ascending order.
Assume that all dates are valid and start_date is always on or before end_date for each assignment.
SQL Query
SELECT DISTINCT a1.project_id, a1.employee_id, a1.start_date, a1.end_date
FROM assignments a1
JOIN assignments a2
ON a1.employee_id = a2.employee_id
AND a1.project_id <> a2.project_id
AND a1.start_date <= a2.end_date
AND a1.end_date >= a2.start_date
ORDER BY a1.project_id, a1.employee_id;
This SQL query is designed to find all instances where an employee is involved in overlapping project assignments and returns the necessary fields in an ordered format.
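A small run on sample data makes the overlap condition easier to check. The sketch below assumes Python's built-in sqlite3 module and hypothetical rows; dates are stored as ISO 'YYYY-MM-DD' strings, which compare in the same order as the dates they represent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE assignments (
        project_id INTEGER,
        employee_id INTEGER,
        start_date TEXT,
        end_date TEXT
    );
    INSERT INTO assignments VALUES
        (1, 100, '2023-01-01', '2023-01-31'),
        (2, 100, '2023-01-15', '2023-02-15'),
        (3, 100, '2023-03-01', '2023-03-31'),
        (4, 200, '2023-01-01', '2023-01-31');
""")

# Self-join: pair each assignment with the same employee's other projects
# and keep it when the two date ranges intersect.
OVERLAP_QUERY = """
    SELECT DISTINCT a1.project_id, a1.employee_id, a1.start_date, a1.end_date
    FROM assignments a1
    JOIN assignments a2
      ON a1.employee_id = a2.employee_id
     AND a1.project_id <> a2.project_id
     AND a1.start_date <= a2.end_date
     AND a1.end_date >= a2.start_date
    ORDER BY a1.project_id, a1.employee_id;
"""

overlap_rows = conn.execute(OVERLAP_QUERY).fetchall()
for row in overlap_rows:
    print(row)
```

Projects 1 and 2 overlap for employee 100 in the second half of January, so both rows are returned. Project 3 starts after both have ended, and employee 200 has only one assignment, so neither appears.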
20. Use COALESCE for Null Handling
How to Replace NULL with 0 in SQL and Retrieve All Orders from the orders Table
Writing clean and efficient SQL queries is an essential skill for database management and data analysis. If you're asked to retrieve all orders from an orders table, ensuring that any NULL values in the discount column are replaced with 0, you need to follow specific steps to structure your query correctly. Below is a comprehensive guide on achieving this task.
Steps to Write the SQL Query
- Identify the Columns: The orders table contains the columns order_id, customer_name, discount, and total_amount.
- Handle NULL Values: Ensure that the discount column does not contain any NULL values by using the COALESCE function, which allows you to replace NULL values with 0.
- Select All Required Columns: Ensure that the query retrieves all the columns in the specified order: order_id, customer_name, discount, and total_amount.
- Order the Results: Use the ORDER BY clause to sort the results by order_id in ascending order.
Sample SQL Query
SELECT
order_id,
customer_name,
COALESCE(discount, 0) AS discount,
total_amount
FROM
orders
ORDER BY
order_id ASC;
Breaking Down the Query
SELECT statement: This part retrieves the columns you need.
SELECT order_id, customer_name,
COALESCE Function: Use COALESCE(discount, 0) to replace NULL values in the discount column with 0.
COALESCE(discount, 0) AS discount,
FROM clause: Specifies the table from which to fetch the data.
FROM orders
ORDER BY clause: Ensures the result set is ordered by order_id in ascending sequence, making it easier to read and analyze.
ORDER BY order_id ASC;
Best Practices
- Readability: Write clear and readable queries. Using aliases (like COALESCE(discount, 0) AS discount) makes it easier to interpret results.
- Performance: Ensure your database has indexes on columns commonly used in ORDER BY clauses, like order_id, to optimize query performance.
Leveraging these structured steps will help you efficiently write the required SQL query to retrieve and process the orders data from the orders table, replacing NULL discounts with 0, and ordering by order_id. This method ensures clean data handling and a structured output, which is a critical aspect of database queries and analysis.
21. Self-Join for Hierarchical Data
Interview Question: Hierarchical Category List Generation in SQL
Objective
This interview question assesses your ability to generate a hierarchical list of categories from a given table using SQL. You are required to use recursive queries to determine the level of each category in the hierarchy. Here's the specific task:
Given a table named categories with the following columns:
- id
- name
- parent_id
Write an SQL query that generates a hierarchical list of all categories. For each category, you need to determine its level in the hierarchy, where:
- Root categories (those with parent_id equal to NULL) have a level of 1.
- Direct children of root categories have a level of 2.
- The pattern continues similarly for subsequent levels.
The result set should contain the following columns:
- id
- name
- parent_id
- level
The output should be ordered first by level in ascending order, and within each level, by id in ascending order.
Additional Information
- The parent_id of a root category is NULL.
- The hierarchy is acyclic, meaning there are no cycles.
- Recursive queries should be used to traverse the hierarchy effectively.
- Ensure that all categories are included in the results with the correct level assigned.
To solve this task, you would typically use a common table expression (CTE) with recursion to traverse the category hierarchy and determine the levels. This challenge tests both your SQL skills and understanding of hierarchical data structures.
By mastering this interview question, you will demonstrate your ability to handle complex data retrieval scenarios using SQL, showcasing how well you can design queries to navigate hierarchical information seamlessly. This skill is particularly valuable in scenarios involving organizational structures, nested comments, or any other system with parent-child relationships.
Example SQL Query
Below is an exemplary SQL query that addresses the interview question requirements:
WITH RECURSIVE CategoryHierarchy AS (
SELECT
id,
name,
parent_id,
1 AS level
FROM
categories
WHERE
parent_id IS NULL
UNION ALL
SELECT
c.id,
c.name,
c.parent_id,
ch.level + 1 AS level
FROM
categories c
INNER JOIN
CategoryHierarchy ch
ON
c.parent_id = ch.id
)
SELECT
id,
name,
parent_id,
level
FROM
CategoryHierarchy
ORDER BY
level ASC,
id ASC;
This query employs a recursive CTE to determine each category's level in the hierarchy, starting from the root categories and proceeding to their children, recursively. The final SELECT statement retrieves the data and orders it as specified.
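To trace how the recursion assigns levels, here is a minimal sketch on a three-level sample tree. It assumes Python's built-in sqlite3 module, which supports WITH RECURSIVE; the category names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    INSERT INTO categories VALUES
        (1, 'Electronics', NULL),
        (2, 'Books', NULL),
        (3, 'Laptops', 1),
        (4, 'Phones', 1),
        (5, 'Gaming Laptops', 3);
""")

HIERARCHY_QUERY = """
    WITH RECURSIVE CategoryHierarchy AS (
        -- Anchor member: root categories start at level 1.
        SELECT id, name, parent_id, 1 AS level
        FROM categories
        WHERE parent_id IS NULL
        UNION ALL
        -- Recursive member: each child is one level below its parent.
        SELECT c.id, c.name, c.parent_id, ch.level + 1
        FROM categories c
        INNER JOIN CategoryHierarchy ch ON c.parent_id = ch.id
    )
    SELECT id, name, parent_id, level
    FROM CategoryHierarchy
    ORDER BY level ASC, id ASC;
"""

hierarchy_rows = conn.execute(HIERARCHY_QUERY).fetchall()
for row in hierarchy_rows:
    print(row)
```

The anchor member picks up the two roots, the first recursive pass adds Laptops and Phones at level 2, and the second pass adds Gaming Laptops at level 3. Recursion stops when a pass produces no new rows.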
22. Time Difference Between Order Phases
Objective
Determine the mean number of hours taken to ship orders. Calculate the time elapsed between each order's placement and its shipping, then find the average of these durations.
Additional Information
- The result should be rounded to two decimal places.
- The data is stored in a table named order_status with the following columns:
  - order_id (integer)
  - order_time (timestamp)
  - shipping_time (timestamp)
- All timestamps are in the same timezone.
- Use SQL to write your query.
SQL Query
To determine the mean number of hours taken to ship orders, you need to calculate the time difference between shipping_time and order_time for each order in hours, and then find the average of these durations. The following SQL query effectively fulfills the objective:
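SELECT ROUND(
-- CAST to NUMERIC: PostgreSQL's two-argument ROUND accepts NUMERIC, not DOUBLE PRECISION
CAST(AVG(
EXTRACT(EPOCH FROM (shipping_time - order_time)) / 3600
) AS NUMERIC), 2) AS avg_shipping_hours
FROM order_status;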
Explanation
This query extracts the number of seconds between shipping_time and order_time, converts this duration into hours by dividing by 3600 (the number of seconds in an hour), and then calculates the average of these hourly durations. The final result is rounded to two decimal places, as required. EXTRACT(EPOCH FROM ...) is PostgreSQL syntax; other databases provide their own functions for this, such as TIMESTAMPDIFF in MySQL or DATEDIFF in SQL Server.
Using this approach ensures that your SQL query is optimized for determining the average shipping time with clear and concise logic, making it ideal for both interview scenarios and practical applications in managing and analyzing shipping data.
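Because the date arithmetic is dialect-specific, it helps to see the same calculation in another engine. The following is a sketch for SQLite only, run through Python's built-in sqlite3 module: it swaps EXTRACT(EPOCH ...) for strftime('%s', ...), which returns Unix seconds. The sample orders are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_status (order_id INTEGER, order_time TEXT, shipping_time TEXT);
    INSERT INTO order_status VALUES
        (1, '2023-01-01 10:00:00', '2023-01-02 10:00:00'),
        (2, '2023-01-01 08:00:00', '2023-01-01 20:30:00');
""")

# strftime('%s', ts) yields Unix seconds as text, so cast before subtracting.
# Dividing by 3600.0 (a float) avoids integer division.
AVG_HOURS_QUERY = """
    SELECT ROUND(
        AVG(
            (CAST(strftime('%s', shipping_time) AS INTEGER)
             - CAST(strftime('%s', order_time) AS INTEGER)) / 3600.0
        ), 2) AS avg_shipping_hours
    FROM order_status;
"""

avg_hours = conn.execute(AVG_HOURS_QUERY).fetchone()[0]
print(avg_hours)
```

The two sample orders take 24 and 12.5 hours to ship, so the average is 18.25 hours.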
23. Find Most Recent Order per Customer
Interview Question: Retrieving Each Customer's Latest Order Details
Objective
Write an SQL query to retrieve each customer's latest order details. For every customer, fetch their name, the ID of their most recent order, the date of that order, and that order's total amount. The results should be sorted alphabetically by the customer's name.
Additional Information
- The database consists of two tables:
  - Customers
    - customer_id: Unique identifier for each customer.
    - customer_name: Name of the customer.
  - Orders
    - order_id: Unique identifier for each order.
    - customer_id: Identifier linking the order to a customer.
    - order_date: Date when the order was placed.
    - total_amount: Total amount for the order.
- Assume that each customer has placed at least one order.
- If a customer has multiple orders on the same latest date, return any one of those orders.
- The output should include the following columns: customer_name, order_id, order_date, total_amount, ordered by customer_name in ascending order.
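One possible answer ranks each customer's orders with ROW_NUMBER() and keeps the first. The sketch below is an illustration rather than a reference solution: it assumes Python's built-in sqlite3 module with SQLite 3.25 or later (needed for window functions), uses hypothetical rows, and breaks same-date ties by the highest order_id, which is an arbitrary choice since the question allows any one of the tied orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (customer_id INTEGER, customer_name TEXT);
    CREATE TABLE Orders (
        order_id INTEGER, customer_id INTEGER, order_date TEXT, total_amount REAL
    );
    INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Orders VALUES
        (101, 1, '2023-01-10', 50.0),
        (102, 1, '2023-02-20', 75.0),
        (103, 2, '2023-01-15', 20.0),
        (104, 2, '2023-01-15', 30.0);
""")

LATEST_ORDER_QUERY = """
    WITH ranked AS (
        SELECT
            order_id, customer_id, order_date, total_amount,
            -- rn = 1 marks the newest order per customer; order_id breaks ties.
            ROW_NUMBER() OVER (
                PARTITION BY customer_id
                ORDER BY order_date DESC, order_id DESC
            ) AS rn
        FROM Orders
    )
    SELECT c.customer_name, r.order_id, r.order_date, r.total_amount
    FROM Customers c
    JOIN ranked r ON r.customer_id = c.customer_id
    WHERE r.rn = 1
    ORDER BY c.customer_name;
"""

latest_rows = conn.execute(LATEST_ORDER_QUERY).fetchall()
for row in latest_rows:
    print(row)
```

Bob has two orders on his latest date, and exactly one of them is returned. Joining on MAX(order_date) instead would return both tied rows, which is why a ranking function is a convenient fit here.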
24. Monthly Hiring Trend by Department
Mastering the Interview Question:
Counting Employee Hires per Month by Department in SQL
Objective
To tackle this SQL interview question, we're given two tables: employees and departments. Our goal is to determine the number of employees hired each month for every department.
Expected Output
Our ideal output should display three columns:
- department: The name of the department.
- month: The hiring month formatted as YYYY-MM.
- hires: The total number of employees hired in that month for the department.
The result must be ordered by department name and hiring month in ascending order.
Data Structure
Employees Table
- id: Integer, primary key.
- name: String, name of the employee.
- department_id: Integer, foreign key referencing departments(id).
- hire_date: Date, the date the employee was hired.
- salary: Integer, the salary of the employee.
Departments Table
- id: Integer, primary key.
- name: String, name of the department.
Constraints
- The hire_date should be considered only up to the month and year.
- If no employees are hired in a particular month for a department, that month should not appear in the result.
Example SQL Query
SELECT
d.name AS department,
DATE_FORMAT(e.hire_date, '%Y-%m') AS month,
COUNT(e.id) AS hires
FROM
employees e
JOIN
departments d ON e.department_id = d.id
GROUP BY
d.name, DATE_FORMAT(e.hire_date, '%Y-%m')
ORDER BY
d.name, month ASC;
Explanation
This query joins the employees and departments tables on the department ID, groups the records by department name and hiring month, and counts the hires in each group. It orders the results first by department name and then by hiring month in ascending order. DATE_FORMAT is MySQL syntax; in PostgreSQL the equivalent expression is TO_CHAR(e.hire_date, 'YYYY-MM').
Because the query uses an inner join and groups only rows that exist, a month with no hires for a department never appears in the result, which satisfies the constraint above.
25. Aggregate Data with CUBE
Mastering the 'sales_data' Table for Total Sales Calculation in SQL: Detailed Guide
Objective
Crafting an SQL query to compute total sales for region and product category combinations is a common requirement in database management and business analysis. By leveraging the sales_data table, which encompasses fields such as region, product_category, and sales_amount, your task is to generate an insightful summary of sales data. This includes providing subtotals by regions, product categories, and a comprehensive grand total.
Steps for the SQL Query:
Select the Necessary Fields:
- Begin by selecting the region and product_category fields.
- Apply the SQL aggregation function SUM to compute total_sales.
Handle NULL Values:
- In subtotal and grand-total rows, the column that was aggregated away is returned as NULL. Utilize the COALESCE function to replace NULL values in the region column with the string "All Regions".
- Similarly, replace NULL values in the product_category column with "All Categories".
Group and Aggregate Data:
- Implement the GROUP BY clause to group your data by region and product_category.
- Use the CUBE grouping technique to generate subtotals for every combination of the grouping columns plus a grand total. ROLLUP alone is not enough here: ROLLUP on (region, product_category) yields the per-region subtotals and the grand total, but no per-category subtotals.
Format the Result Set:
- Ensure the final output includes columns for region, product_category, and total_sales.
- Include specific rows such as:
  - Grand total displaying "All Regions" and "All Categories".
  - Subtotals for each region using "All Categories".
  - Subtotals for each product category using "All Regions".
Order the Results:
- Order the results to display first by region and then by product_category, with subtotal rows after the detail rows they summarize.
The SQL Query:
The query below uses GROUP BY CUBE, which is available in PostgreSQL, SQL Server, and Oracle. MySQL supports only WITH ROLLUP.
SELECT
COALESCE(region, 'All Regions') AS region,
COALESCE(product_category, 'All Categories') AS product_category,
SUM(sales_amount) AS total_sales
FROM
sales_data
GROUP BY
CUBE (region, product_category)
ORDER BY
GROUPING(region), region,
GROUPING(product_category), product_category;
By executing this SQL query, you can obtain a comprehensive summary of total sales organized by region and product category. This query helps in clear visualization of overall sales performance, while maintaining structured and ordered results essential for business analysis and reporting.
This approach ensures you meet all the requirements specified for the sales data analysis, providing valuable insights for strategic decision-making within your organization.
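On engines without CUBE support, the same subtotal and grand-total rows can be assembled with UNION ALL, one branch per grouping level. The sketch below shows this workaround, not the CUBE operator itself. It assumes Python's built-in sqlite3 module (SQLite has neither CUBE nor ROLLUP) and hypothetical sales rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_data (region TEXT, product_category TEXT, sales_amount INTEGER);
    INSERT INTO sales_data VALUES
        ('East', 'Toys', 100),
        ('East', 'Books', 50),
        ('West', 'Toys', 70);
""")

# Four branches reproduce the four grouping sets a CUBE over two columns yields:
# (region, category), (region), (category), and the grand total ().
CUBE_EMULATION_QUERY = """
    SELECT region, product_category, SUM(sales_amount) AS total_sales
    FROM sales_data
    GROUP BY region, product_category
    UNION ALL
    SELECT region, 'All Categories', SUM(sales_amount)
    FROM sales_data
    GROUP BY region
    UNION ALL
    SELECT 'All Regions', product_category, SUM(sales_amount)
    FROM sales_data
    GROUP BY product_category
    UNION ALL
    SELECT 'All Regions', 'All Categories', SUM(sales_amount)
    FROM sales_data
    ORDER BY 1, 2;
"""

cube_rows = conn.execute(CUBE_EMULATION_QUERY).fetchall()
for row in cube_rows:
    print(row)
```

In this sketch the rows are sorted as plain text, so the "All ..." labels happen to sort first. Placing totals last would need an extra sort key in each branch.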
26. Calculate Percentage Contribution
Guide to Answer the SQL Interview Question: Calculate Product Sales as a Percentage of Total Sales
Creating an SQL query to determine each product's sales as a percentage of the total sales involves several key steps. The query should return the product name, its sales amount, and the corresponding percentage, all ordered from the highest to the lowest percentage. Follow these instructions to craft an effective SQL query that meets all the specified requirements.
Objective:
Calculate each product's sales as a percentage of total sales using SQL.
Table Structure:
- The products table includes:
  - product_id (INTEGER): Unique identifier for each product.
  - product_name (VARCHAR): Name of the product.
  - sales (INTEGER): Total sales for the product.
Query Requirements:
- Compute the total sales for all products.
- For each product, find its sales as a percentage of the total sales.
- Return the product_name, sales amount, and the calculated percentage.
- Round the percentage to two decimal places.
- Order the results by the percentage in descending order.
SQL Query:
SELECT
product_name,
sales,
ROUND(CAST(sales AS DECIMAL) * 100 / (SELECT SUM(sales) FROM products), 2) AS percentage
FROM
products
ORDER BY
percentage DESC;
- Explanation:
- CAST Function: Converts the sales to a decimal for precise division.
- SUM Function: Calculates the total sales.
- ROUND Function: Rounds the percentage to two decimal places.
- ORDER BY Clause: Sorts the results by the percentage in descending order.
By following this guide, you can write an SQL query that accurately calculates and presents each product's sales percentage relative to the total sales, rounded to two decimal places, and orders the output by the highest percentage first. This method ensures that your results are precise, clear, and well-organized, aligning perfectly with the interview question requirements.
27. Over-Budget Project Detection
Objective
In a SQL-based database scenario, the interviewer asks candidates to craft an SQL query that pinpoints projects which have exceeded their allocated budgets. The given data is stored in two specific tables: Projects and Expenditures. The query's purpose is to identify projects where spending has surpassed the budget and display relevant details. The output should include the project name, assigned budget, total expenses, and the amount overspent. Moreover, the output results need to be sorted by the excess amount spent, arranged in descending order.
Additional Information
Tables:
Projects:
- id (INTEGER): A unique identifier for each project.
- project_name (VARCHAR): The name of the project.
- budget (INTEGER): The total budget allocated for the project.
Expenditures:
- id (INTEGER): A unique identifier for each expenditure record.
- project_id (INTEGER): The identifier that relates each expenditure to a specific project.
- amount (INTEGER): The amount spent in each expenditure.
- description (VARCHAR): A short description of the expenditure.
Constraints:
- Include only projects whose total expenditures exceed their budgets.
- Compute total_spent as the sum of all amount values across each project.
- Determine over_budget by subtracting the budget from total_spent for each project.
- Display the resulting data ordered by the over_budget value in descending order.
Output Columns:
- project_name
- budget
- total_spent
- over_budget
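One way to satisfy these constraints is to aggregate expenditures per project and filter the groups with HAVING. The following is a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module, and the project names and amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Projects (id INTEGER, project_name TEXT, budget INTEGER);
    CREATE TABLE Expenditures (
        id INTEGER, project_id INTEGER, amount INTEGER, description TEXT
    );
    INSERT INTO Projects VALUES
        (1, 'Website', 1000),
        (2, 'Mobile App', 5000),
        (3, 'Data Lake', 2000);
    INSERT INTO Expenditures VALUES
        (1, 1, 800, 'Design'),
        (2, 1, 500, 'Hosting'),
        (3, 2, 3000, 'Development'),
        (4, 3, 2500, 'Storage'),
        (5, 3, 700, 'Compute');
""")

# HAVING filters on the aggregated total, which WHERE cannot do.
OVER_BUDGET_QUERY = """
    SELECT
        p.project_name,
        p.budget,
        SUM(e.amount) AS total_spent,
        SUM(e.amount) - p.budget AS over_budget
    FROM Projects p
    JOIN Expenditures e ON e.project_id = p.id
    GROUP BY p.id, p.project_name, p.budget
    HAVING SUM(e.amount) > p.budget
    ORDER BY over_budget DESC;
"""

over_budget_rows = conn.execute(OVER_BUDGET_QUERY).fetchall()
for row in over_budget_rows:
    print(row)
```

An inner join is sufficient because a project with no expenditures cannot be over budget. Grouping by p.id as well as the name keeps two projects that share a name from being merged.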
28. Eligible Bonus Calculation with CASE
How to Retrieve and Calculate Employee Bonuses with SQL
To tackle the problem of constructing a SQL query that calculates and categorizes employee bonuses, follow these steps based on the given criteria and tables:
Objective:
Construct a SQL query to retrieve each employee's name, their corresponding department name, salary, bonus category, and the calculated bonus amount based on the following criteria:
- High Priority - 15%: Employees in the "Sales" department with a salary of 80,000 or more.
- Medium Priority - 10%: Employees with a salary of 70,000 or more, regardless of department.
- Standard - 5%: All other employees.
Calculate the bonus amount by applying the specified percentage to the employee's salary and round the result to the nearest integer. The final output should be ordered by the bonus amount in descending order.
Steps to Create the SQL Query
Join the Employees and Departments Tables: Use an INNER JOIN to combine the employees and departments tables on the department_id.
Calculate Bonus Amount and Determine Bonus Category:
- Use a CASE statement to determine the bonus percentage and category based on the criteria provided.
- Calculate the bonus amount by applying the percentage to the employee's salary.
Round the Bonus Amount: Use the ROUND function to round the calculated bonus amount to the nearest integer.
Select the Required Columns: Ensure the query selects the employee's name, department name, salary, bonus category, and calculated bonus amount.
Sort the Results: Sort the final output by the bonus amount in descending order.
SQL Query Example:
SELECT
e.name,
d.name AS department,
e.salary,
CASE
WHEN d.name = 'Sales' AND e.salary >= 80000 THEN 'High Priority'
WHEN e.salary >= 70000 THEN 'Medium Priority'
ELSE 'Standard'
END AS bonus_category,
ROUND(
CASE
WHEN d.name = 'Sales' AND e.salary >= 80000 THEN e.salary * 0.15
WHEN e.salary >= 70000 THEN e.salary * 0.10
ELSE e.salary * 0.05
END
) AS bonus_amount
FROM
employees e
INNER JOIN
departments d ON e.department_id = d.id
ORDER BY
bonus_amount DESC;
This query efficiently calculates and categorizes the bonuses for employees based on the specified criteria, ensuring that the output is both accurate and well-formatted. It uses SQL functions and conditional logic to achieve the desired result, making the process streamlined and efficient.
29. Set Operation: INTERSECT
Objective
Retrieve the list of customers who meet both of the following criteria:
- They are new customers with a monthly spending greater than 1000.
- They are loyal customers with at least 3 years of membership and have a 'Premium' tier status.
Provide the customer_id and name of these customers, sorted in ascending order by customer_id.
Additional information
Tables Description:
NewCustomers:
- customer_id (INTEGER): Unique identifier for each customer.
- name (VARCHAR): Name of the customer.
- monthly_spend (INTEGER): Amount spent by the customer monthly.
- join_date (DATE): Date when the customer joined.
LoyalCustomers:
- customer_id (INTEGER): Unique identifier for each customer.
- name (VARCHAR): Name of the customer.
- membership_years (INTEGER): Number of years the customer has been a member.
- tier (VARCHAR): Membership tier of the customer (e.g., 'Premium').
Constraints:
- Each customer appears only once in each table.
- membership_years and monthly_spend are positive integers.
- The tier field contains single-word strings without special characters.
Output Requirements:
- The result should include only the customer_id and name columns.
- The final output must be ordered by customer_id in ascending order.
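As the section title suggests, the two criteria map naturally onto two SELECT statements combined with INTERSECT. Below is a minimal sketch of that approach, assuming Python's built-in sqlite3 module and hypothetical customer rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE NewCustomers (
        customer_id INTEGER, name TEXT, monthly_spend INTEGER, join_date TEXT
    );
    CREATE TABLE LoyalCustomers (
        customer_id INTEGER, name TEXT, membership_years INTEGER, tier TEXT
    );
    INSERT INTO NewCustomers VALUES
        (1, 'Alice', 1500, '2023-01-01'),
        (2, 'Bob', 800, '2023-02-01'),
        (3, 'Carol', 2000, '2023-03-01');
    INSERT INTO LoyalCustomers VALUES
        (1, 'Alice', 4, 'Premium'),
        (2, 'Bob', 5, 'Premium'),
        (3, 'Carol', 2, 'Premium'),
        (4, 'Dan', 6, 'Premium');
""")

# INTERSECT keeps only the (customer_id, name) pairs returned by both SELECTs.
INTERSECT_QUERY = """
    SELECT customer_id, name
    FROM NewCustomers
    WHERE monthly_spend > 1000
    INTERSECT
    SELECT customer_id, name
    FROM LoyalCustomers
    WHERE membership_years >= 3 AND tier = 'Premium'
    ORDER BY customer_id;
"""

intersect_rows = conn.execute(INTERSECT_QUERY).fetchall()
for row in intersect_rows:
    print(row)
```

Bob fails the spending test and Carol fails the membership test, so only Alice satisfies both. Both branches must select the same columns in the same order, because INTERSECT compares whole rows.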
30. Extract Year and Month
Objective
Develop a SQL query to extract the year and month from each order_date in the orders table. Your query should return the order_id, total_amount, along with the derived year and month for each order. Ensure that the results are sorted in ascending order based on the order_date.
Additional Information
- The orders table includes the following columns:
  - order_id (integer): Unique identifier for each order.
  - order_date (date): The date when the order was placed.
  - total_amount (decimal): The total amount for the order.
SQL Query
The following query addresses the above requirements.
SELECT
order_id,
total_amount,
EXTRACT(YEAR FROM order_date) AS order_year,
EXTRACT(MONTH FROM order_date) AS order_month
FROM
orders
ORDER BY
order_date ASC;
Explanation
This SQL query leverages the EXTRACT() function to obtain the year and month from each order_date. We select the order_id, total_amount, and the derived year (as order_year) and month (as order_month). The results are then sorted in ascending order based on the order_date, ensuring that the earliest orders are listed first.
Conclusion
By following the SQL query provided above, you can accurately and efficiently retrieve the necessary details from the orders table. This approach ensures a clear and organized dataset that is ideal for further analysis or reporting.