Data Analysis Interview Questions (30+ Questions)

Last Updated: June 8, 2026 • 30 Questions • Real Company Interviews

Are you getting ready for a data analysis interview? Data analysis involves examining, cleaning, transforming, and modeling data to uncover useful insights, draw conclusions, and support decision-making. It requires expertise in various tools and techniques to analyze and interpret complex data sets. Whether you're a beginner or a seasoned professional, mastering data analysis is crucial in today's data-centric job market.

30 Interview Questions • 26 Companies • 2 Difficulty Levels

In this article, we cover Data Analysis Interview Questions with answers often asked in data analyst interviews at top-paying companies. Whether you're preparing for your first data-related position or seeking to advance your career, this guide will help you tackle the most commonly asked data analysis interview questions, giving you an edge in competitive job interviews.

💡 Pro Tips for Data Analysis Interviews

  • Practice each question and understand the underlying concepts
  • Review company-specific technologies and methodologies
  • Prepare follow-up questions and edge cases
  • Practice explaining your solutions clearly and concisely

Interview Questions & Answers

1. Managing High I/O Processes

Company: Revolut Difficulty: easy

Learn how to identify and manage high I/O consuming processes on a Linux server using command-line tools. This guide covers sorting running processes by disk activity and mitigating disk bottlenecks through throttling, stopping, or rescheduling heavy jobs, improving overall system responsiveness and application performance.

2. Average Order Value

Company: Accenture Difficulty: easy

Objective

Write a SQL query that calculates the average order value for each customer in the Orders table, which records customer purchases. The result should show each customer_id alongside its avg_order_value, rounded to two decimal places, sorted by customer_id in ascending order.

Additional Information

Table Schema:

  • Orders:
    • order_id (INTEGER): Unique identifier for each order.
    • customer_id (INTEGER): Identifier for the customer who made the order.
    • total_amount (DECIMAL): Total amount spent on the order.
    • order_date (DATE): The date when the order was placed.

Constraints:

  • Each customer_id may have multiple associated orders.
  • There is at least one order present in the table.

Output Requirements:

  • Columns: customer_id, avg_order_value
  • avg_order_value must be rounded to two decimal places.
  • Results must be ordered by customer_id in ascending order.

Detailed Query Explanation

The query you need to write should meet the specific output requirements and constraints mentioned. Here's the detailed SQL query designed to solve the problem:

SELECT 
    customer_id, 
    ROUND(AVG(total_amount), 2) AS avg_order_value 
FROM 
    Orders 
GROUP BY 
    customer_id 
ORDER BY 
    customer_id;

3. Find Cheapest Product

Company: Apple Difficulty: easy

Objective

Retrieve the name and price of the product(s) with the lowest price from the products table.

Additional Information

  • The products table contains the following columns:
    • id (integer): Unique identifier for each product.
    • name (string): Name of the product.
    • price (decimal): Price of the product.
  • There may be multiple products sharing the same lowest price.
  • Write an SQL query that returns the name and price of the cheapest product(s).

In order to tackle this SQL interview question effectively, you need to identify the product or products with the lowest price from the products table. Here's the step-by-step process to achieve this:

  1. First, determine the lowest price among all the products.
  2. Then, retrieve the name and price of the product(s) with this minimum price.

Now, let's write the precise SQL query to accomplish this objective:

SELECT 
    name,
    price
FROM 
    products
WHERE 
    price = (SELECT MIN(price) FROM products);

This SQL query efficiently retrieves the name and price of the product(s) with the lowest price from the products table by leveraging a subquery to find the minimum price. This approach ensures that even if there are multiple products with the same lowest price, all of them will be included in the result set.

4. New vs. Returning Customer Report

Company: RedHat Difficulty: easy

Generating a Detailed Customer Order Report in SQL

The objective is to write an SQL query that lists each order together with its order date, customer ID, customer name, and a customer type indicating whether the order is the customer's first ('New') or a subsequent one ('Returning'). This requires combining two tables, orders and customers.

Here’s how you can achieve this:

SQL Query to Generate the Report

WITH first_orders AS (
   SELECT
      customer_id,
      MIN(order_date) AS first_order_date
   FROM
      orders
   GROUP BY
      customer_id
)
SELECT
   o.order_date,
   o.customer_id,
   c.customer_name,
   CASE
      WHEN o.order_date = fo.first_order_date THEN 'New'
      ELSE 'Returning'
   END AS customer_type
FROM
   orders o
JOIN
   customers c ON o.customer_id = c.id
LEFT JOIN
   first_orders fo ON o.customer_id = fo.customer_id
ORDER BY
   o.order_date,
   o.customer_id;

Explanation of the SQL Query Structure

  1. The query begins with a WITH clause, also known as a Common Table Expression (CTE), named first_orders. This CTE identifies the earliest order date for each customer by using the SQL aggregation functions:

    • Columns Selected:
      • customer_id
      • first_order_date (calculated as the minimum order_date for each customer_id)
    • Grouped By: customer_id
  2. The main SELECT statement retrieves the following details:

    • order_date : The date when the order was made (orders.order_date).
    • customer_id : Customer identifier (orders.customer_id).
    • customer_name : Name of the customer (customers.customer_name).
    • customer_type : A CASE expression that determines whether the order is the customer's first or a subsequent order. If orders.order_date matches first_orders.first_order_date, the order is marked ‘New’; otherwise, it is marked ‘Returning’.
  3. Join Operations:

    • The orders table is joined with the customers table (orders.customer_id = customers.id) to get the customer name.
    • The CTE first_orders is LEFT JOINed to orders on customer_id so that each order can be compared with the customer's first order date.
  4. Ordering:

    • The results are ordered first by order_date and then by customer_id for better readability and to match expected output structure.

Additional Notes

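  • MIN() with GROUP BY collapses the order data to one row per customer, so the first-order lookup is computed once rather than once per order.
  • Every row in orders has a matching row in first_orders, so the LEFT JOIN behaves exactly like an INNER JOIN here; it is a defensive choice rather than a requirement.
  • If a customer placed several orders on their first order date, all of them are marked 'New', because the comparison is made on the date alone.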

This SQL query is designed to provide all necessary details in an ordered and efficient manner to analyze customer purchasing patterns effectively.
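A common follow-up is what to do when a customer places more than one order on their first day. The sketch below is one possible variant, not the reference answer: it labels only the very first order as 'New' by ranking orders with ROW_NUMBER(). It assumes an order_id column on orders (not stated in the question) to break same-day ties, uses made-up sample rows, and runs against an in-memory SQLite database (window functions need SQLite 3.25 or later).

```python
import sqlite3

# Hypothetical sample data. The order_id column is an assumption used as a tie-breaker.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT);
INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO orders VALUES
    (1, 1, '2023-01-01'),
    (2, 1, '2023-01-01'),  -- second order on Alice's first day
    (3, 2, '2023-01-15'),
    (4, 1, '2023-02-01');
""")

QUERY = """
SELECT
    o.order_id,
    o.order_date,
    o.customer_id,
    c.customer_name,
    CASE
        -- Rank each customer's orders by date, then by order_id for same-day ties.
        WHEN ROW_NUMBER() OVER (
                 PARTITION BY o.customer_id
                 ORDER BY o.order_date, o.order_id
             ) = 1 THEN 'New'
        ELSE 'Returning'
    END AS customer_type
FROM orders o
JOIN customers c ON o.customer_id = c.id
ORDER BY o.order_date, o.customer_id, o.order_id;
"""

report = conn.execute(QUERY).fetchall()
for row in report:
    print(row)
```

Whether same-day orders should all count as 'New' depends on how the interviewer defines a new customer, so it is worth asking before choosing between the two versions.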

5. Above Average Price Products

Company: Hulu Difficulty: medium

SQL Interview Question: Fetching Available Products Above Average Price

Objective

Develop an SQL query to fetch all products currently available in stock and priced above the average price of all available products. The dataset should be organized primarily based on price in descending order and secondarily by rating in descending order. Products lacking a rating should be placed at the end of the sorted list.

Additional Information

  • Include only products with stock_quantity greater than 0.
  • The average price must be computed solely based on the products that are in stock.
  • When sorting, products with a NULL rating should appear after those with non-null ratings.
  • The query should return all columns from the Products table.
  • Account for the possibility that some products might not have a rating value.

The two points to get right are that the in-stock filter applies both to the outer query and to the subquery that computes the average, and that products with a NULL rating must sort after rated products.

Sample Query

SELECT * 
FROM Products 
WHERE stock_quantity > 0 
AND price > (SELECT AVG(price) FROM Products WHERE stock_quantity > 0) 
ORDER BY price DESC, rating DESC NULLS LAST;

NULLS LAST is supported by PostgreSQL, Oracle, and SQLite 3.30 or later, but not by MySQL or SQL Server. Those two engines treat NULL as the lowest value, so rating DESC already places unrated products last and the NULLS LAST keywords should simply be dropped there.
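If the target dialect is unknown, one dialect-neutral way to push NULL ratings to the end is an explicit CASE sort key. The following is a minimal sketch of that idea, run against an in-memory SQLite database. The id and name columns and all sample values are assumptions made for illustration; only stock_quantity, price, and rating come from the question.

```python
import sqlite3

# Hypothetical sample rows; only stock_quantity, price and rating are named in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (
    id INTEGER PRIMARY KEY, name TEXT, price REAL,
    stock_quantity INTEGER, rating REAL
);
INSERT INTO Products VALUES
    (1, 'A', 100,  5, 4.5),
    (2, 'B', 100,  3, NULL),
    (3, 'C', 100,  2, 4.9),
    (4, 'D',  20, 10, 3.0),
    (5, 'E', 500,  0, 5.0),  -- out of stock: excluded from result and average
    (6, 'F',  30,  1, NULL),
    (7, 'G', 150,  4, NULL);
""")

QUERY = """
SELECT *
FROM Products
WHERE stock_quantity > 0
  AND price > (SELECT AVG(price) FROM Products WHERE stock_quantity > 0)
ORDER BY
    price DESC,
    CASE WHEN rating IS NULL THEN 1 ELSE 0 END,  -- rated products first
    rating DESC;
"""

names_in_order = [row[1] for row in conn.execute(QUERY)]
print(names_in_order)
```

With these rows the in-stock average is about 83.33, so the four products priced at 100 or more qualify, and among the three priced at 100 the unrated one sorts last.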

6. Frequent Price Change Detector

Company: Atlassian Difficulty: medium

Identifying Products with Multiple Price Changes Using SQL

Objective

In this interview question, you are provided with two tables: products and price_history. Your task is to write an SQL query that identifies all products that have experienced at least two price changes. A price change is defined as a difference in price from one entry to the next for the same product, based on the chronological order of dates.

Requirements

  1. Tables:

    • products:
      • id (integer): The unique identifier for each product.
      • name (string): The name of the product.
      • category (string): The category to which the product belongs.
    • price_history:
      • product_id (integer): The identifier linking to the products table.
      • price (decimal): The price of the product at a specific date.
      • date (date): The date when the price was recorded.
  2. Constraints:

    • Each product_id in price_history corresponds to an id in products.
    • Prices are recorded in chronological order for each product.
  3. Output:

    • The product’s name.
    • The category of the product.
    • The total number of price changes, returned as a string.
  4. Conditions:

    • Only consider consecutive price entries for determining a price change.
    • A product must have at least two price changes to be included in the result.
  5. Order:

    • Results should be ordered by the number of price changes in descending order.
    • In case of ties, results should be ordered by the product ID in ascending order.

This query helps in understanding the dynamics of product pricing over time and can be a valuable insight for inventory management, pricing strategies, and market analysis. Performing well on such questions requires a meticulous approach to grouping and ordering, which is fundamental in SQL querying.
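No reference query is given for this question, so here is a minimal sketch of one possible approach. It uses LAG() to compare each price with the previous one for the same product, counts the rows where the two differ, and casts the count to text as the output spec asks. The sample rows are made up, and the demo runs on an in-memory SQLite database (window functions need SQLite 3.25 or later); other dialects may spell the cast differently.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE price_history (product_id INTEGER, price REAL, date TEXT);
INSERT INTO products VALUES
    (1, 'Laptop', 'Electronics'),
    (2, 'Mouse',  'Electronics'),
    (3, 'Desk',   'Furniture');
INSERT INTO price_history VALUES
    (1, 1000, '2023-01-01'), (1, 950, '2023-02-01'),
    (1,  950, '2023-03-01'), (1, 900, '2023-04-01'),
    (2,   20, '2023-01-01'), (2,  25, '2023-02-01'),
    (3,  300, '2023-01-01'), (3, 280, '2023-02-01'),
    (3,  300, '2023-03-01'), (3, 310, '2023-04-01');
""")

QUERY = """
WITH ordered AS (
    SELECT
        product_id,
        price,
        -- Previous price for the same product in date order (NULL for the first entry).
        LAG(price) OVER (PARTITION BY product_id ORDER BY date) AS prev_price
    FROM price_history
)
SELECT
    p.name,
    p.category,
    CAST(COUNT(*) AS TEXT) AS price_changes
FROM ordered o
JOIN products p ON p.id = o.product_id
WHERE o.prev_price IS NOT NULL
  AND o.price <> o.prev_price
GROUP BY p.id, p.name, p.category
HAVING COUNT(*) >= 2
ORDER BY COUNT(*) DESC, p.id ASC;
"""

frequent_changers = conn.execute(QUERY).fetchall()
for row in frequent_changers:
    print(row)
```

In this sample the Laptop's repeated 950 entry does not count as a change, and the Mouse is dropped because it changed price only once.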

7. Self-Join for Manager Names

Company: Backblaze Difficulty: easy

SQL Query to Retrieve Employee and Manager Names Ordered by Employee ID

In a typical SQL job interview, you might encounter questions designed to evaluate your ability to manipulate and retrieve data using SQL queries. One common question involves retrieving each employee's name alongside their respective manager's name from a given employees table.

To solve this, you'll need to create a SQL query that:

  1. Retrieves all employees.
  2. Includes each employee's manager name (if one exists).
  3. Ensures that employees without managers are still listed with a NULL value for the manager's name.
  4. Orders the results by the employee's unique identifier in ascending order.

Sample SQL Query

Here’s a detailed SQL query that meets the requirements:

SELECT 
    e.id AS employee_id,
    e.name AS employee_name,
    m.name AS manager_name
FROM 
    employees e
LEFT JOIN 
    employees m ON e.manager_id = m.id
ORDER BY 
    e.id ASC;

Explanation of the Query

  • SELECT: Specifies the columns you want to retrieve. Here we select the id and name from the employees table (aliased as e) as well as the name from the manager.

  • e.id AS employee_id: Retrieves the employee's ID and labels it as employee_id.

  • e.name AS employee_name: Retrieves the employee's name and labels it as employee_name.

  • m.name AS manager_name: Retrieves the manager's name (if it exists) and labels it as manager_name.

  • FROM employees e: Specifies the primary table we are querying from (employees), aliasing it as e.

  • LEFT JOIN employees m ON e.manager_id = m.id: Performs a left join on the employees table itself to get the manager's details. The join condition matches e.manager_id with m.id, ensuring managers are matched to their subordinates.

  • ORDER BY e.id ASC: Orders the result by the id of the employees in ascending order.

This query will yield a list of all employees with their respective manager names, if available. Employees without a manager will have NULL in the manager_name column, and the results will be neatly ordered by employee ID.

Conclusion

Answering SQL interview questions effectively often involves not just solving the problem but doing so in a clear, efficient, and accurate manner. This SQL query example demonstrates your ability to handle self-joins, NULL values, and sorting, making it a comprehensive answer to the posed question.

By fully understanding and implementing the core requirements—retrieving employees and their managers, handling employees without managers, and ordering the results by employee ID—you’ll showcase your SQL proficiency and logical problem-solving skills, critical assets for any data-centric role.

8. Join Employees and Departments

Company: Adobe Difficulty: easy

Understanding the Question

Objective

The goal is to write an SQL query to retrieve and display a list of employees who earn more than $50,000. For each employee that meets the specified salary condition, the output should include several details: employee ID, first name, last name, job title, hire date, and the name of their department. If an employee does not belong to any department, the department name should be presented as NULL. Furthermore, the employees should be listed in descending order based on their hire date, showcasing the most recently hired employees first.

Additional Information

Tables:

  • employees table:

    • employee_id (Integer): Unique identifier for each employee.
    • first_name (String): Employee's first name.
    • last_name (String): Employee's last name.
    • department_id (Integer): Identifier for the department to which the employee belongs (can be NULL).
    • job_title (String): The title of the employee's job.
    • salary (Integer): The employee's salary.
    • hire_date (Date): The date the employee was hired.
  • departments table:

    • department_id (Integer): Unique identifier for each department.
    • department_name (String): The name of the department.
    • location (String): The location of the department.

Requirements

  • Use a LEFT JOIN to merge the employees and departments tables on the department_id.
  • Filter to only include employees with a salary greater than 50000.
  • Display department_name as NULL for employees who are not assigned to any department.
  • Order the results by hire_date in descending order.

Example

Input:

  • employees table:

    employee_id | first_name | last_name | department_id | job_title        | salary | hire_date
    1           | John       | Smith     | 1             | Senior Developer | 85000  | 2020-01-15
    2           | Mary       | Johnson   | 2             | Project Manager  | 75000  | 2021-03-20
    3           | Peter      | Brown     | NULL          | Consultant       | 65000  | 2022-06-10
    4           | Sarah      | Davis     | 1             | Developer        | 45000  | 2021-09-01
  • departments table:

    department_id | department_name    | location
    1             | Engineering        | New York
    2             | Project Management | Boston
    3             | Marketing          | Chicago

Expected Output:

employee_id | first_name | last_name | department_name    | job_title        | hire_date
3           | Peter      | Brown     | NULL               | Consultant       | 2022-06-10
2           | Mary       | Johnson   | Project Management | Project Manager  | 2021-03-20
1           | John       | Smith     | Engineering        | Senior Developer | 2020-01-15

SQL Query:

SELECT
    e.employee_id,
    e.first_name,
    e.last_name,
    d.department_name,
    e.job_title,
    e.hire_date
FROM
    employees e
LEFT JOIN
    departments d ON e.department_id = d.department_id
WHERE
    e.salary > 50000
ORDER BY
    e.hire_date DESC;

9. Find Highest Salary

Company: JaneStreet Difficulty: easy

Objective

Write an SQL query to retrieve the names and salaries of the highest-paid employee(s) from the employees table.

Additional Information

  • The employees table consists of the following columns:
    • id (integer): The unique identifier for each employee.
    • name (string): The name of the employee.
    • salary (integer): The salary of the employee.
  • If multiple employees share the highest salary, include all of them in the result.
  • The output should contain two columns: name and salary.
  • The results can be returned in any order.

This SQL query requires identifying the highest salary first and then selecting the names and salaries of all employees who earn this highest salary.

To start with, the query can use a subquery to determine the maximum salary from the employees table. Subsequently, a main query can be employed to fetch the names and salaries of employees whose salaries match the maximum salary obtained from the subquery.

Here is the complete SQL query to achieve this objective:

SQL Query

SELECT name, salary
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);

This query effectively retrieves the names and salaries of all employees who have the highest salary in the employees table, satisfying the given requirements.

10. Aggregate Employee Salaries

Company: RedHat Difficulty: medium

Objective

Given two tables, employees and departments, construct a SQL query to display each department's name, the total number of employees in that department, the aggregate salary of all its employees, and the average salary rounded to two decimal places. The output should be sorted alphabetically by the department name.

Additional Information

Tables:

employees:

  • id (INTEGER): Unique identifier for each employee.
  • name (VARCHAR): Name of the employee.
  • department_id (INTEGER): Identifier linking to the employee's department.
  • salary (INTEGER): Salary of the employee.

departments:

  • id (INTEGER): Unique identifier for each department.
  • name (VARCHAR): Name of the department.
  • location (VARCHAR): Location of the department.

Requirements:

  • Use appropriate JOIN operations to link employees with departments.
  • Calculate the total number of employees (employee_count) in each department.
  • Compute the sum of salaries (total_salary) for each department.
  • Determine the average salary (average_salary) for each department, rounded to two decimal places.
  • Alias the department name as department.
  • Order the final results by the department name in ascending order.

Output Columns:

  • department (VARCHAR): Name of the department.
  • employee_count (INTEGER): Number of employees in the department.
  • total_salary (INTEGER): Combined salaries of all employees in the department.
  • average_salary (DECIMAL(10,2)): Average salary within the department, rounded to two decimal places.
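The question does not include a reference query, so the following is a minimal sketch of one way to meet the requirements. It assumes an inner join is acceptable, which means departments with no employees are left out; the sample rows are made up, and the demo runs on an in-memory SQLite database.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT, location TEXT);
CREATE TABLE employees (
    id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER, salary INTEGER
);
INSERT INTO departments VALUES
    (1, 'Engineering', 'New York'),
    (2, 'Sales',       'Boston'),
    (3, 'Legal',       'Chicago');   -- no employees in this sample
INSERT INTO employees VALUES
    (1, 'Ana', 1, 90000), (2, 'Ben', 1, 80000), (3, 'Cy', 1, 85001),
    (4, 'Di',  2, 60000), (5, 'Ed',  2, 70000);
""")

QUERY = """
SELECT
    d.name                   AS department,
    COUNT(e.id)              AS employee_count,
    SUM(e.salary)            AS total_salary,
    ROUND(AVG(e.salary), 2)  AS average_salary
FROM departments d
JOIN employees e ON e.department_id = d.id
GROUP BY d.id, d.name
ORDER BY d.name ASC;
"""

department_stats = conn.execute(QUERY).fetchall()
for row in department_stats:
    print(row)
```

If empty departments should appear with a count of 0, switching to LEFT JOIN would be the usual adjustment; COUNT(e.id) then returns 0 for them, while SUM and AVG return NULL.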

11. Filter Orders by Date Range

Company: Google Difficulty: easy

How to Construct an SQL Query to Fetch Order Details within a Specific Date Range

If you're preparing for an interview, crafting a precise SQL query to fetch order details is a skill you need to master. The task involves fetching order_id, customer_name, order_date, and total_amount from the orders table, only for orders placed between January 1, 2023, and June 30, 2023. The results must be sorted by order_date in ascending order. Here’s a step-by-step guide on how to nail this:

SQL Query Example

To achieve this, you need to follow these pointers:

  • Select the desired columns: order_id, customer_name, order_date, and total_amount.
  • Filter the results to only include orders where the order_date falls inclusively between '2023-01-01' and '2023-06-30'.
  • Order the results by order_date in ascending order to get the earliest orders at the top.

Here is the SQL query you need:

SELECT order_id, customer_name, order_date, total_amount
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-06-30'
ORDER BY order_date ASC;

Explanation

  1. FROM orders: Indicates the data source, which is the orders table.
  2. SELECT order_id, customer_name, order_date, total_amount: Specifies the columns you want to retrieve.
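  3. WHERE order_date BETWEEN '2023-01-01' AND '2023-06-30': Filters the results to include only those orders placed within the defined date range. BETWEEN is inclusive at both ends. If order_date stores a time component, use order_date >= '2023-01-01' AND order_date < '2023-07-01' instead so that orders placed later in the day on June 30 are not missed.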
  4. ORDER BY order_date ASC: Ensures that the output is sorted by order_date in ascending order.

This query effectively pulls the required data while maintaining the specified conditions and ordering.

Conclusion

Being able to craft an SQL query that accurately fetches order details based on specific criteria is crucial for data analysis and reporting. This particular query showcases your ability to filter data by date and sort it effectively, ensuring that you can produce meaningful and organized insights from your data tables. Practice writing and running similar queries to enhance your SQL skills and be interview-ready.


12. Count Distinct Product Categories

Company: AMD Difficulty: medium

Analyzing Primary Categories and Subcategories in a Products Table: A Detailed Guide

Objective

In this analysis, we aim to examine the products table to identify and count all primary categories that do not have a parent category. Additionally, we will calculate the total number of unique subcategories directly associated with these primary categories.

Key Points

  • Primary Category: Defined as a category with no parent_category.
  • Subcategory: A distinct category whose parent_category matches a primary category.
  • Output: The analysis should yield two specific metrics:
    • total_root_categories: The count of distinct primary categories.
    • total_subcategories: The count of unique subcategories linked to the primary categories.

Table Description

The products table consists of these essential columns:

  • id (integer): Unique identifier for each product.
  • name (string): Name of the product.
  • category (string): The category to which the product belongs.
  • parent_category (string or null): The parent category of the product's category, if applicable.

Analysis Goals

  • Identify primary categories with parent_category as null.
  • Determine the distinct subcategories that have their parent_category set to a primary category.
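The two goals above can be sketched as a single query with two scalar subqueries. This is one possible reading of the task rather than a reference answer: it counts distinct category values where parent_category is NULL, then counts distinct category values whose parent_category is one of those roots. The sample rows are made up, and the demo runs on an in-memory SQLite database.

```python
import sqlite3

# Hypothetical sample data; column names follow the table description.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    id INTEGER PRIMARY KEY, name TEXT, category TEXT, parent_category TEXT
);
INSERT INTO products VALUES
    (1, 'TV',      'Electronics', NULL),
    (2, 'Radio',   'Electronics', NULL),
    (3, 'Sofa',    'Furniture',   NULL),
    (4, 'Phone A', 'Phones',      'Electronics'),
    (5, 'Phone B', 'Phones',      'Electronics'),
    (6, 'Laptop',  'Computers',   'Electronics'),
    (7, 'Chair',   'Chairs',      'Furniture'),
    (8, 'Case',    'Phone Cases', 'Phones');  -- parent is not a root category
""")

QUERY = """
SELECT
    (SELECT COUNT(DISTINCT category)
       FROM products
      WHERE parent_category IS NULL) AS total_root_categories,
    (SELECT COUNT(DISTINCT category)
       FROM products
      WHERE parent_category IN (
                SELECT category FROM products WHERE parent_category IS NULL
            )) AS total_subcategories;
"""

total_root_categories, total_subcategories = conn.execute(QUERY).fetchone()
print(total_root_categories, total_subcategories)
```

In this sample, 'Phone Cases' is not counted because its parent, 'Phones', is itself a subcategory rather than a root.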

This analysis provides valuable insights into the category hierarchy within the products table and supports better categorization strategies.

By performing this analysis, businesses can understand and organize their product categories more effectively, ensuring a well-structured product categorization system that enhances searchability and user experience.

Conclusion

The aim is to extract the total_root_categories and total_subcategories to comprehend the structure and hierarchy of the product categories. This process is pivotal in optimizing product categorization for better inventory management and improved customer navigation.

This detailed guide serves as a basis for extracting meaningful insights from the products table, ensuring accurate and efficient data analysis.

13. Join Three Tables

Company: AMD Difficulty: easy

Objective

Construct a SQL query to retrieve a comprehensive list of customer orders. For each order, display the customer's name, the date the order was placed, the name of the product ordered, the quantity of the product, and the unit price. Ensure that the results are organized first by the customer's name in ascending order and then by the order date in ascending order.

Additional Information

  • Tables and Schemas:
    • Customers

      • customer_id (INTEGER): Unique identifier for each customer.
      • customer_name (VARCHAR): Name of the customer.
      • email (VARCHAR): Email address of the customer.
    • Orders

      • order_id (INTEGER): Unique identifier for each order.
      • customer_id (INTEGER): Identifier linking the order to a customer.
      • order_date (DATE): Date when the order was placed.
    • OrderDetails

      • order_id (INTEGER): Identifier linking the detail to an order.
      • product_name (VARCHAR): Name of the product ordered.
      • quantity (INTEGER): Quantity of the product ordered.
      • unit_price (DECIMAL): Price per unit of the product.

Constraints

  • Each customer can have multiple orders.
  • Each order can include multiple products.
  • Ensure that all relevant records from the three tables are accurately joined based on their relationships.

Output Requirements

  • Columns to include in the result:
    • customer_name
    • order_date
    • product_name
    • quantity
    • unit_price
  • The result should be sorted by customer_name (ascending) and then by order_date (ascending).

Example SQL Query

SELECT 
    c.customer_name,
    o.order_date,
    od.product_name,
    od.quantity,
    od.unit_price
FROM 
    Customers c
JOIN 
    Orders o ON c.customer_id = o.customer_id
JOIN 
    OrderDetails od ON o.order_id = od.order_id
ORDER BY 
    c.customer_name ASC,
    o.order_date ASC;

14. Use CASE in SELECT

Company: Yahoo Difficulty: medium

SQL Interview Question: Classifying Products by Price

Objective

The goal of this SQL interview question is to craft a query that retrieves the name and price of each product from the products table. Additionally, the query should classify each product into a specific price category:

  • 'Low': if the product price is below 50,
  • 'Medium': if the product price is between 50 and 100 inclusive,
  • 'High': if the product price is above 100.

The results should include the product name, price, and its corresponding price category, and they should be ordered by price in ascending order.

Additional Information

  • The products table contains the following columns:

    • id: integer, unique identifier for each product.
    • name: string, the name of the product.
    • price: decimal, the price of the product.
  • Price categories must be labeled exactly as 'Low', 'Medium', and 'High'.

  • Ensure the output is ordered by the price column from lowest to highest.

With these details in mind, you can craft the perfect SQL query to solve this task. Executing this query proficiently can demonstrate your ability to handle basic SQL operations, perform conditional logic with CASE statements, and order results efficiently.
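No reference query is shown for this question, so here is a minimal sketch of one way to write it. The CASE branches are evaluated in order, so the second branch only needs an upper bound. Sample rows are made up, and the demo runs on an in-memory SQLite database.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL);
INSERT INTO products VALUES
    (1, 'Pen',    2.50),
    (2, 'Bag',   50.00),
    (3, 'Lamp', 100.00),
    (4, 'Desk', 250.00),
    (5, 'Mug',   49.99);
""")

QUERY = """
SELECT
    name,
    price,
    CASE
        WHEN price < 50   THEN 'Low'
        WHEN price <= 100 THEN 'Medium'  -- reached only when price >= 50
        ELSE 'High'
    END AS price_category
FROM products
ORDER BY price ASC;
"""

classified = conn.execute(QUERY).fetchall()
for row in classified:
    print(row)
```

One edge case worth raising in an interview: a NULL price fails both comparisons and would fall into the ELSE branch as 'High', so a nullable column would need an explicit WHEN price IS NULL branch.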


15. Calculate Cumulative Sales

Company: Uber Difficulty: medium

Example of How to Answer the Interview Question

The scenario involves sales data in a sales_records table, with sale dates, product names, and units sold. The task is to write an SQL query that returns each record alongside a new column named total_units_sold: the cumulative number of units sold for that product up to and including the current sale date. The final results should be ordered by product name and then by sale date. A detailed explanation follows:

SQL Solution

To tackle this task, you can utilize the SQL Window Functions, especially the SUM() function combined with the PARTITION BY clause to calculate running totals. The query will leverage window functions to maintain the total units sold for each product chronologically up to the given date.

Here's the SQL query for this solution:

SELECT
    sale_date,
    item_name,
    units_sold,
    SUM(units_sold) OVER (PARTITION BY item_name ORDER BY sale_date) AS total_units_sold
FROM
    sales_records
ORDER BY
    item_name,
    sale_date;

Explanation

  • SELECT Clause: We choose the relevant columns sale_date, item_name, units_sold, and generate a new column total_units_sold.
  • SUM() Function with OVER Clause: The SUM(units_sold) OVER (PARTITION BY item_name ORDER BY sale_date) computes the cumulative sum of units sold for each product (item_name), ordered by the sale date (sale_date).
  • Result Ordering: ORDER BY item_name, sale_date ensures the output is first ordered by the product name and subsequently by the sale date for each product.

By structuring the SQL query in this manner, you ensure that the running total for each product is correctly calculated and effortlessly retrieve the desired data layout.

Implementation Context

This query will be effective in scenarios such as generating daily sales reports, monitoring product performance over time, or analyzing sales trends. It works in any relational database management system (RDBMS) that supports window functions, including PostgreSQL, MySQL 8.0 or later, SQL Server, and Oracle.

Final Thoughts

Providing a detailed yet concise solution to SQL interview questions can help demonstrate proficiency in SQL and understanding of advanced functions. When crafting SQL solutions, always consider the readability and efficiency of your queries to ensure they perform well even with larger datasets.

16. Pivot Daily Sales

Company: Google Difficulty: easy

How to Transform Sales Data for Specific Dates with SQL

Objective: In this task, you are required to utilize your SQL skills to transform sales data from a table named Sales. The table includes three significant attributes: product_name, sale_date, and sales_amount. Your goal is to craft a SQL query that restructures this data to show each product's sales amounts for the dates '2023-01-01', '2023-01-02', and '2023-01-03'. If sales are unavailable for any product on a given date, replace the value with NULL. The resulting table should be alphabetically ordered by product_name.

Table Schema:

  • Sales
    • product_name (VARCHAR)
    • sale_date (DATE: 'YYYY-MM-DD')
    • sales_amount (INT)

Steps to Achieve the Desired Output:

  1. Select the required columns: Extract product_name and the sales data corresponding to the dates '2023-01-01', '2023-01-02', and '2023-01-03'.
  2. Conditional Aggregation: Use conditional aggregation to pivot the sales_date column and align the sales data for the specific dates.
  3. Handle NULLs for missing sales: Ensure that if a product has no sales on a particular date, the output value for that date should be NULL.
  4. Ordering: Ensure the final result set is ordered alphabetically by product_name.

SQL Query

Here is a comprehensive SQL query to achieve the required output:

SELECT 
    product_name,
    MAX(CASE WHEN sale_date = '2023-01-01' THEN sales_amount END) AS "2023-01-01",
    MAX(CASE WHEN sale_date = '2023-01-02' THEN sales_amount END) AS "2023-01-02",
    MAX(CASE WHEN sale_date = '2023-01-03' THEN sales_amount END) AS "2023-01-03"
FROM 
    Sales
WHERE 
    sale_date IN ('2023-01-01', '2023-01-02', '2023-01-03')
GROUP BY 
    product_name
ORDER BY 
    product_name;

Explanation of the Query

  • SELECT Clause: Here, we select the product_name and use conditional aggregation to get the sales amount for each of the specified dates.
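  • MAX with CASE WHEN: This construct is used to pivot the table. For each date, the CASE expression yields sales_amount on matching rows and NULL otherwise, and MAX collapses those values into one per product. If a product has no sale on a date, the result is NULL. MAX assumes at most one row per product and date; if several rows can exist and should be totaled, use SUM instead.
  • FROM Sales: Indicates the table from which data is being retrieved.
  • WHERE Clause: Filters records to include only those whose sale_date is one of the three specified dates.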
  • GROUP BY: Groups the results by product_name to ensure each product appears only once in the output.
  • ORDER BY: Sorts the results alphabetically by product_name.

This approach ensures that you obtain a well-structured and readable table displaying each product's sales for January 1st, 2nd, and 3rd of 2023, handling any missing sale data gracefully.
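The pivot can be checked end to end with a small script. The following is a minimal sketch that runs the same conditional-aggregation query against an in-memory SQLite database through Python's built-in sqlite3 module; the sample rows and the helper name pivot_sales are hypothetical and only serve the illustration.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the task.
SAMPLE_SALES = [
    ("Widget", "2023-01-01", 100),
    ("Widget", "2023-01-03", 150),
    ("Gadget", "2023-01-02", 200),
    ("Gadget", "2023-01-04", 999),  # outside the three target dates
]

PIVOT_SQL = """
SELECT
    product_name,
    MAX(CASE WHEN sale_date = '2023-01-01' THEN sales_amount END) AS "2023-01-01",
    MAX(CASE WHEN sale_date = '2023-01-02' THEN sales_amount END) AS "2023-01-02",
    MAX(CASE WHEN sale_date = '2023-01-03' THEN sales_amount END) AS "2023-01-03"
FROM Sales
WHERE sale_date IN ('2023-01-01', '2023-01-02', '2023-01-03')
GROUP BY product_name
ORDER BY product_name;
"""


def pivot_sales(rows):
    """Load rows into an in-memory Sales table and return the pivoted result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Sales (product_name TEXT, sale_date TEXT, sales_amount INTEGER)"
    )
    conn.executemany("INSERT INTO Sales VALUES (?, ?, ?)", rows)
    result = conn.execute(PIVOT_SQL).fetchall()
    conn.close()
    return result


for row in pivot_sales(SAMPLE_SALES):
    print(row)  # missing dates come back as None (SQL NULL)
```

One caveat worth raising in an interview: because of the WHERE filter, a product with no sales on any of the three dates does not appear in the output at all.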

17. Find Customers Without Orders

Company: LinkedIn Difficulty: easy

SQL Interview Question: Listing Customers Who Have Never Placed an Order

Objective

Construct an SQL query to identify and list the names of all customers who have never placed an order. Ensure the resulting list is ordered alphabetically by the customers' names.

Additional Information

You are provided with two tables:

  • customers

    • customer_id (integer): Unique identifier for each customer.
    • customer_name (string): Name of the customer.
  • orders

    • order_id (integer): Unique identifier for each order.
    • customer_id (integer): Identifier linking the order to a customer.
    • order_date (date): The date on which the order was placed.

Your query should return a table with a single column, customer_name, containing the names of customers who have no corresponding entries in the orders table. If every customer has placed at least one order, the query should return an empty result set.

Example SQL Query

SELECT c.customer_name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
)
ORDER BY c.customer_name;

This query lists every customer for whom no matching row exists in the orders table, then sorts the names alphabetically. NOT EXISTS is used rather than NOT IN because NOT IN returns no rows at all when the subquery produces even one NULL customer_id.
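The difference between NOT IN and NOT EXISTS only shows up when orders.customer_id contains a NULL. The sketch below, which assumes an in-memory SQLite database and hypothetical sample rows, runs both forms side by side so the behavior can be observed directly.

```python
import sqlite3

# Hypothetical sample data: one order row has a NULL customer_id.
CUSTOMERS = [(1, "Ana"), (2, "Ben"), (3, "Cara")]
ORDERS = [(10, 1, "2024-01-05"), (11, None, "2024-01-06")]


def customers_without_orders(customers, orders):
    """Return (NOT IN result, NOT EXISTS result) for the same data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER, customer_name TEXT)")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)

    # x NOT IN (1, NULL) evaluates to NULL for x = 2 or 3, so no row passes.
    not_in = conn.execute(
        "SELECT customer_name FROM customers "
        "WHERE customer_id NOT IN (SELECT customer_id FROM orders) "
        "ORDER BY customer_name"
    ).fetchall()

    # NOT EXISTS only asks whether a matching row exists, so NULLs are harmless.
    not_exists = conn.execute(
        "SELECT c.customer_name FROM customers c "
        "WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id) "
        "ORDER BY c.customer_name"
    ).fetchall()
    conn.close()
    return not_in, not_exists


not_in_rows, not_exists_rows = customers_without_orders(CUSTOMERS, ORDERS)
print("NOT IN:    ", not_in_rows)
print("NOT EXISTS:", not_exists_rows)
```

A LEFT JOIN with a WHERE o.customer_id IS NULL filter is another common way to express the same anti-join.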

18. Group and Filter with HAVING

Company: GitLab Difficulty: medium

Crafting a SQL Query to Identify Departments with High Average Salaries

Introduction:
When preparing for an SQL interview, a common question revolves around manipulating and extracting data from multiple related tables. A notable example involves using SQL commands to identify departments where the average employee salary surpasses a certain threshold. For this particular scenario, we will focus on departments with an average salary exceeding $70,000 and order the results from highest to lowest average salary.

Requirements:
To tackle this query, we have to join two tables, employees and departments, and extract the necessary details:

  1. Retrieve the average salary of employees in each department.
  2. Filter departments with an average salary greater than $70,000.
  3. Display the results in descending order of average salary.

Table Structures:

  • employees:

    • id: Integer, unique identifier for each employee.
    • name: String, employee's name.
    • department_id: Integer, identifies the employee's department.
    • salary: Integer, employee's salary.
  • departments:

    • id: Integer, unique identifier for each department.
    • name: String, department's name.
    • location: String, department's location.

Step-by-step Guide:

  1. Perform a Join:

    • Utilize the INNER JOIN to combine employees and departments tables based on department_id.
  2. Calculate Average Salary:

    • Use the AVG() function to compute the average salary per department.
  3. Filter by Salary:

    • Implement a HAVING clause to filter departments where the average salary exceeds $70,000.
  4. Select Required Columns:

    • Select departments.name and the computed average salary.
  5. Order Results:

    • Sort the output by average salary in descending order using ORDER BY.

Final SQL Query:

SELECT 
    d.name AS department_name, 
    AVG(e.salary) AS avg_salary
FROM 
    employees e
JOIN 
    departments d
ON 
    e.department_id = d.id
GROUP BY 
    d.id, d.name
HAVING 
    AVG(e.salary) > 70000
ORDER BY 
    avg_salary DESC;

Conclusion:
Mastering such SQL queries not only helps in successfully navigating through SQL interviews but is also essential for generating insightful data reports. This knowledge empowers you to derive significant business insights by efficiently querying and manipulating relational databases, thus playing a crucial role in data-driven decision-making processes.

19. Find Overlapping Date Ranges

Company: X Difficulty: medium

Objective

To write a SQL query that identifies all project assignments where an employee is allocated to multiple projects with overlapping timeframes. The result should include the project ID, employee ID, start date, and end date of each such assignment. Ensure that the output is ordered by project ID and employee ID.

Additional Information

  • You are provided with a table named assignments with the following columns:

    • project_id (INTEGER): The unique identifier for each project.
    • employee_id (INTEGER): The unique identifier for each employee.
    • start_date (DATE): The start date of the employee's assignment to the project.
    • end_date (DATE): The end date of the employee's assignment to the project.
  • An employee has overlapping assignments if they are assigned to different projects where the date ranges intersect. Specifically, two assignments overlap if:

    • The projects are different (project_id differs).
    • The start date of one assignment is on or before the end date of the other assignment.
    • The end date of one assignment is on or after the start date of the other assignment.
  • The query should return distinct records without duplicates.

  • Order the final result set first by project_id in ascending order and then by employee_id in ascending order.

  • Assume that all dates are valid and start_date is always on or before end_date for each assignment.

SQL Query

SELECT DISTINCT a1.project_id, a1.employee_id, a1.start_date, a1.end_date
FROM assignments a1
JOIN assignments a2 
ON a1.employee_id = a2.employee_id
AND a1.project_id <> a2.project_id
AND a1.start_date <= a2.end_date
AND a1.end_date >= a2.start_date
ORDER BY a1.project_id, a1.employee_id;

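The self-join pairs each assignment (a1) with every other assignment of the same employee on a different project (a2). The two date comparisons together form the standard interval-overlap test, and DISTINCT collapses an assignment that overlaps several others into a single row.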
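To see the overlap test in action, the sketch below runs the self-join against an in-memory SQLite database. The sample assignments and the helper name find_overlaps are hypothetical; dates are stored as ISO 'YYYY-MM-DD' text, which compares correctly as strings.

```python
import sqlite3

# Hypothetical assignments: employee 100 overlaps on projects 1 and 2 only.
ASSIGNMENTS = [
    (1, 100, "2024-01-01", "2024-01-31"),
    (2, 100, "2024-01-15", "2024-02-15"),
    (3, 100, "2024-03-01", "2024-03-31"),  # no overlap with 1 or 2
    (4, 200, "2024-01-01", "2024-01-10"),  # different employee
]

OVERLAP_SQL = """
SELECT DISTINCT a1.project_id, a1.employee_id, a1.start_date, a1.end_date
FROM assignments a1
JOIN assignments a2
  ON a1.employee_id = a2.employee_id
 AND a1.project_id <> a2.project_id
 AND a1.start_date <= a2.end_date
 AND a1.end_date >= a2.start_date
ORDER BY a1.project_id, a1.employee_id;
"""


def find_overlaps(rows):
    """Return every assignment that overlaps another one for the same employee."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assignments "
        "(project_id INTEGER, employee_id INTEGER, start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO assignments VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(OVERLAP_SQL).fetchall()
    conn.close()
    return result


for row in find_overlaps(ASSIGNMENTS):
    print(row)
```

Because the comparisons use <= and >=, two assignments that merely touch (one ends on the day the other starts) count as overlapping, which matches the "on or before" wording of the task.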

20. Use COALESCE for Null Handling

Company: Samsung Difficulty: easy

How to Replace NULL with 0 in SQL and Retrieve All Orders from the orders Table

Writing clean and efficient SQL queries is an essential skill for database management and data analysis. If you're asked to retrieve all orders from an orders table, ensuring that any NULL values in the discount column are replaced with 0, you need to follow specific steps to structure your query correctly. Below is a comprehensive guide on achieving this task.

Steps to Write the SQL Query

  1. Identify the Columns: The orders table contains the columns order_id, customer_name, discount, and total_amount.
  2. Handle NULL Values: Ensure that the discount column does not contain any NULL values by using the COALESCE function, which allows you to replace NULL values with 0.
  3. Select All Required Columns: Ensure that the query retrieves all the columns in the specified order - order_id, customer_name, discount, and total_amount.
  4. Order the Results: Use the ORDER BY clause to sort the results by order_id in ascending order.

Sample SQL Query

SELECT 
    order_id,
    customer_name,
    COALESCE(discount, 0) AS discount,
    total_amount
FROM 
    orders
ORDER BY 
    order_id ASC;

Breaking Down the Query

  • SELECT statement: This part retrieves the columns you need.

    SELECT 
        order_id,
        customer_name,
    
  • COALESCE Function: COALESCE(discount, 0) returns its first non-NULL argument, so NULL values in the discount column become 0. The total_amount column completes the select list.

        COALESCE(discount, 0) AS discount,
        total_amount
    
  • FROM clause: Specifies the table from which to fetch the data.

    FROM 
        orders
    
  • ORDER BY clause: Ensures the result set is ordered by order_id in ascending sequence, making it easier to read and analyze.

    ORDER BY 
        order_id ASC;
    

Best Practices

  • Readability: Write clear and readable queries. Using aliases (like COALESCE(discount, 0) AS discount) makes it easier to interpret results.
  • Performance: Ensure your database has indexes on columns commonly used in ORDER BY clauses, like order_id, to optimize query performance.

Leveraging these structured steps will help you efficiently write the required SQL query to retrieve and process the orders data from the orders table, replacing NULL discounts with 0, and ordering by order_id. This method ensures clean data handling and a structured output, which is a critical aspect of database queries and analysis.

21. Self-Join for Hierarchical Data

Company: Broadcom Difficulty: easy

Interview Question: Hierarchical Category List Generation in SQL

Objective

This interview question assesses your ability to generate a hierarchical list of categories from a given table using SQL. You are required to use recursive queries to determine the level of each category in the hierarchy. Here's the specific task:

Given a table named categories with the following columns:

  • id
  • name
  • parent_id

Write an SQL query that generates a hierarchical list of all categories. For each category, you need to determine its level in the hierarchy, where:

  • Root categories (those with parent_id equal to NULL) have a level of 1.
  • Direct children of root categories have a level of 2.
  • The pattern continues similarly for subsequent levels.

The result set should contain the following columns:

  • id
  • name
  • parent_id
  • level

The output should be ordered first by level in ascending order, and within each level, by id in ascending order.

Additional Information

  • The parent_id of a root category is NULL.
  • The hierarchy is acyclic, meaning there are no cycles.
  • Recursive queries should be used to traverse the hierarchy effectively.
  • Ensure that all categories are included in the results with the correct level assigned.

To solve this task, you would typically use a common table expression (CTE) with recursion to traverse the category hierarchy and determine the levels. This challenge tests both your SQL skills and understanding of hierarchical data structures.

By mastering this interview question, you will demonstrate your ability to handle complex data retrieval scenarios using SQL, showcasing how well you can design queries to navigate hierarchical information seamlessly. This skill is particularly valuable in scenarios involving organizational structures, nested comments, or any other system with parent-child relationships.

Example SQL Query

Below is an exemplary SQL query that addresses the interview question requirements:

WITH RECURSIVE CategoryHierarchy AS (
    SELECT 
        id,
        name,
        parent_id,
        1 AS level
    FROM 
        categories
    WHERE 
        parent_id IS NULL
    
    UNION ALL
    
    SELECT 
        c.id,
        c.name,
        c.parent_id,
        ch.level + 1 AS level
    FROM 
        categories c
    INNER JOIN 
        CategoryHierarchy ch
    ON 
        c.parent_id = ch.id
)
SELECT 
    id,
    name,
    parent_id,
    level
FROM 
    CategoryHierarchy
ORDER BY 
    level ASC,
    id ASC;

This query employs a recursive CTE to determine each category's level in the hierarchy, starting from the root categories and proceeding to their children, recursively. The final SELECT statement retrieves the data and orders it as specified.
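The recursion is easier to follow on a concrete tree. The sketch below runs the same recursive CTE on an in-memory SQLite database; the five sample categories and the helper name category_levels are hypothetical.

```python
import sqlite3

# Hypothetical category tree: two roots, one of them three levels deep.
CATEGORIES = [
    (1, "Electronics", None),
    (2, "Phones", 1),
    (3, "Laptops", 1),
    (4, "Smartphones", 2),
    (5, "Books", None),
]

HIERARCHY_SQL = """
WITH RECURSIVE CategoryHierarchy AS (
    -- Anchor member: root categories start at level 1.
    SELECT id, name, parent_id, 1 AS level
    FROM categories
    WHERE parent_id IS NULL

    UNION ALL

    -- Recursive member: each child is one level below its parent.
    SELECT c.id, c.name, c.parent_id, ch.level + 1
    FROM categories c
    INNER JOIN CategoryHierarchy ch ON c.parent_id = ch.id
)
SELECT id, name, parent_id, level
FROM CategoryHierarchy
ORDER BY level ASC, id ASC;
"""


def category_levels(rows):
    """Return (id, name, parent_id, level) for every category."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE categories (id INTEGER, name TEXT, parent_id INTEGER)")
    conn.executemany("INSERT INTO categories VALUES (?, ?, ?)", rows)
    result = conn.execute(HIERARCHY_SQL).fetchall()
    conn.close()
    return result


for row in category_levels(CATEGORIES):
    print(row)
```

The task guarantees an acyclic hierarchy; on data that might contain cycles, the recursive member would need a depth limit or a visited-path check to terminate.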

22. Time Difference Between Order Phases

Company: Spotify Difficulty: easy

Objective

Determine the mean number of hours taken to ship orders. Calculate the time elapsed between each order's placement and its shipping, then find the average of these durations.

Additional Information

  • The result should be rounded to two decimal places.
  • The data is stored in a table named order_status with the following columns:
    • order_id (integer)
    • order_time (timestamp)
    • shipping_time (timestamp)
  • All timestamps are in the same timezone.
  • Use SQL to write your query.

SQL Query

To determine the mean number of hours taken to ship orders, calculate the difference between shipping_time and order_time for each order in hours, then average those durations. The query below uses PostgreSQL syntax. The cast to numeric is needed because PostgreSQL's two-argument ROUND accepts only numeric, and before version 14 EXTRACT returned double precision:

SELECT ROUND(
  CAST(
    AVG(
      EXTRACT(EPOCH FROM (shipping_time - order_time)) / 3600
    ) AS numeric
  ), 2) AS avg_shipping_hours
FROM order_status;

Explanation

This query extracts the number of seconds between shipping_time and order_time, converts this duration into hours by dividing by 3600 (the number of seconds in an hour), and then calculates the average of these hourly durations. Rounding the final result to two decimal places provides a precise answer.

Using this approach ensures that your SQL query is optimized for determining the average shipping time with clear and concise logic, making it ideal for both interview scenarios and practical applications in managing and analyzing shipping data.
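EXTRACT(EPOCH FROM ...) is PostgreSQL-specific, so other engines need a different way to obtain the duration. As one illustration, the sketch below does the same calculation in SQLite, where strftime('%s', ...) returns a timestamp as Unix seconds; the sample orders and the helper name avg_shipping_hours are hypothetical.

```python
import sqlite3

# Hypothetical orders: shipped after 6 hours and after 7.5 hours.
ORDER_STATUS = [
    (1, "2024-01-01 08:00:00", "2024-01-01 14:00:00"),
    (2, "2024-01-02 09:00:00", "2024-01-02 16:30:00"),
]

AVG_HOURS_SQL = """
SELECT ROUND(
    AVG(
        (CAST(strftime('%s', shipping_time) AS INTEGER)
         - CAST(strftime('%s', order_time) AS INTEGER)) / 3600.0
    ), 2) AS avg_shipping_hours
FROM order_status;
"""


def avg_shipping_hours(rows):
    """Return the mean order-to-ship duration in hours, rounded to 2 places."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_status (order_id INTEGER, order_time TEXT, shipping_time TEXT)"
    )
    conn.executemany("INSERT INTO order_status VALUES (?, ?, ?)", rows)
    (value,) = conn.execute(AVG_HOURS_SQL).fetchone()
    conn.close()
    return value


print(avg_shipping_hours(ORDER_STATUS))
```

Dividing by 3600.0 rather than 3600 matters here: with two integers SQLite would perform integer division and drop the fractional hours.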

23. Find Most Recent Order per Customer

Company: Crypto.Com Difficulty: easy

Interview Question: Retrieving Each Customer's Latest Order Details

Objective

Write an SQL query to retrieve each customer's latest order details. For every customer, fetch their name, the ID of their most recent order, the date of that order, and the total amount spent. The results should be sorted alphabetically by the customer's name.

Additional Information

  • The database consists of two tables:
    • Customers
      • customer_id: Unique identifier for each customer.
      • customer_name: Name of the customer.
    • Orders
      • order_id: Unique identifier for each order.
      • customer_id: Identifier linking the order to a customer.
      • order_date: Date when the order was placed.
      • total_amount: Total amount for the order.
  • Assume that each customer has placed at least one order.
  • If a customer has multiple orders on the same latest date, return any one of those orders.
  • The output should include the following columns: customer_name, order_id, order_date, total_amount, ordered by customer_name in ascending order.
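No reference query is given for this question. One possible approach, sketched below, ranks each customer's orders with ROW_NUMBER() and keeps the top-ranked row. The example runs on an in-memory SQLite database (window functions require SQLite 3.25 or later); the sample rows, the helper name latest_orders, and the choice of order_id as the tie-breaker are assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample data.
CUSTOMERS = [(1, "Alice"), (2, "Bob")]
ORDERS = [
    (101, 1, "2024-01-05", 50.0),
    (102, 1, "2024-02-10", 75.0),
    (103, 2, "2024-01-20", 20.0),
]

LATEST_ORDER_SQL = """
SELECT customer_name, order_id, order_date, total_amount
FROM (
    SELECT
        c.customer_name,
        o.order_id,
        o.order_date,
        o.total_amount,
        -- rn = 1 marks the newest order; order_id breaks same-date ties.
        ROW_NUMBER() OVER (
            PARTITION BY o.customer_id
            ORDER BY o.order_date DESC, o.order_id DESC
        ) AS rn
    FROM Customers c
    JOIN Orders o ON o.customer_id = c.customer_id
) ranked
WHERE rn = 1
ORDER BY customer_name;
"""


def latest_orders(customers, orders):
    """Return one (name, order_id, order_date, total_amount) row per customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (customer_id INTEGER, customer_name TEXT)")
    conn.execute(
        "CREATE TABLE Orders "
        "(order_id INTEGER, customer_id INTEGER, order_date TEXT, total_amount REAL)"
    )
    conn.executemany("INSERT INTO Customers VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?, ?)", orders)
    result = conn.execute(LATEST_ORDER_SQL).fetchall()
    conn.close()
    return result


for row in latest_orders(CUSTOMERS, ORDERS):
    print(row)
```

ROW_NUMBER() always returns exactly one row per customer, which fits the "return any one of those orders" rule; RANK() would return every order tied on the latest date.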

24. Monthly Hiring Trend by Department

Company: Amazon Difficulty: easy

Mastering the Interview Question: Counting Employee Hires per Month by Department in SQL

Objective

To tackle this SQL interview question, we're given two tables: employees and departments. Our goal is to determine the number of employees hired each month for every department.

Expected Output

Our ideal output should display three columns:

  • department: The name of the department.
  • month: The hiring month formatted as YYYY-MM.
  • hires: The total number of employees hired in that month for the department.

The result must be ordered by department name and hiring month in ascending order.

Data Structure

Employees Table

  • id: Integer, primary key.
  • name: String, name of the employee.
  • department_id: Integer, foreign key referencing departments(id).
  • hire_date: Date, the date the employee was hired.
  • salary: Integer, the salary of the employee.

Departments Table

  • id: Integer, primary key.
  • name: String, name of the department.

Constraints

  • The hire_date should be considered only up to the month and year.
  • If no employees are hired in a particular month for a department, that month should not appear in the result.

Example SQL Query

SELECT 
    d.name AS department,
    DATE_FORMAT(e.hire_date, '%Y-%m') AS month,
    COUNT(e.id) AS hires
FROM 
    employees e
JOIN 
    departments d ON e.department_id = d.id
GROUP BY 
    d.name, DATE_FORMAT(e.hire_date, '%Y-%m')
ORDER BY 
    d.name, month ASC;

Explanation

This query joins the employees and departments tables on the department ID, groups the records by department name and hiring month, and counts the hires in each group. It orders the results first by department name and then by hiring month in ascending order. Because the join is an inner join and only existing hire dates form groups, department and month combinations with no hires are omitted, as the constraints require.

DATE_FORMAT is MySQL syntax. Other engines format the month differently, for example TO_CHAR(e.hire_date, 'YYYY-MM') in PostgreSQL or strftime('%Y-%m', e.hire_date) in SQLite.

25. Aggregate Data with CUBE

Company: Broadcom Difficulty: medium

Mastering the 'sales_data' Table for Total Sales Calculation in SQL: Detailed Guide

Objective

Crafting an SQL query to compute total sales for region and product category combinations is a common requirement in database management and business analysis. By leveraging the sales_data table, which encompasses fields such as region, product_category, and sales_amount, your task is to generate an insightful summary of sales data. This includes providing subtotals by regions, product categories, and a comprehensive grand total.

Steps for the SQL Query:

  1. Select the Necessary Fields:

    • Begin by selecting the region and product_category fields.
    • Apply the SQL aggregation function SUM to compute the total_sales.
  2. Handle NULL Values:

    • Utilize the COALESCE function to replace NULL values in the region column with the string "All Regions".
    • Similarly, replace NULL values in the product_category column with "All Categories".
  3. Group and Aggregate Data:

    • Implement the GROUP BY clause to group your data by region and product_category.
    • Use GROUP BY CUBE rather than ROLLUP. ROLLUP on (region, product_category) produces only the per-region subtotals and the grand total, whereas CUBE also produces the per-category subtotals required below. CUBE is available in PostgreSQL, SQL Server, and Oracle, but not in MySQL.
  4. Format the Result Set:

    • Ensure the final output includes columns for region, product_category, and total_sales.
    • Include specific rows such as:
      • Grand total displaying "All Regions" and "All Categories".
      • Subtotals for each region using "All Categories".
      • Subtotals for each product category using "All Regions".
  5. Order the Results:

    • Order the results to display first by region and then by product_category, with each subtotal row after the detail rows it summarizes.

The SQL Query:

SELECT
    COALESCE(region, 'All Regions') AS region,
    COALESCE(product_category, 'All Categories') AS product_category,
    SUM(sales_amount) AS total_sales
FROM
    sales_data s
GROUP BY
    CUBE (s.region, s.product_category)
ORDER BY
    -- GROUPING() returns 1 on subtotal rows, so they sort after the detail rows.
    GROUPING(s.region), s.region,
    GROUPING(s.product_category), s.product_category;

By executing this SQL query, you can obtain a comprehensive summary of total sales organized by region and product category. This query helps in clear visualization of overall sales performance, while maintaining structured and ordered results essential for business analysis and reporting.

This approach ensures you meet all the requirements specified for the sales data analysis, providing valuable insights for strategic decision-making within your organization.
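On engines without CUBE, the same four grouping levels (detail rows, per-region subtotals, per-category subtotals, and the grand total) can be written out by hand with UNION ALL. The sketch below does this on an in-memory SQLite database, which supports neither CUBE nor ROLLUP; the sample rows, the helper name sales_cube, and the r_all and c_all sort flags are hypothetical.

```python
import sqlite3

# Hypothetical sample rows.
SALES_DATA = [
    ("East", "Toys", 100),
    ("East", "Books", 50),
    ("West", "Toys", 70),
]

# r_all / c_all flag subtotal rows so they sort after the detail rows.
CUBE_SQL = """
SELECT region, product_category, total_sales
FROM (
    SELECT region, product_category, SUM(sales_amount) AS total_sales,
           0 AS r_all, 0 AS c_all
    FROM sales_data GROUP BY region, product_category
    UNION ALL
    SELECT region, 'All Categories', SUM(sales_amount), 0, 1
    FROM sales_data GROUP BY region
    UNION ALL
    SELECT 'All Regions', product_category, SUM(sales_amount), 1, 0
    FROM sales_data GROUP BY product_category
    UNION ALL
    SELECT 'All Regions', 'All Categories', SUM(sales_amount), 1, 1
    FROM sales_data
) AS combined
ORDER BY r_all, region, c_all, product_category;
"""


def sales_cube(rows):
    """Return detail rows, both kinds of subtotal, and the grand total."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_data (region TEXT, product_category TEXT, sales_amount INTEGER)"
    )
    conn.executemany("INSERT INTO sales_data VALUES (?, ?, ?)", rows)
    result = conn.execute(CUBE_SQL).fetchall()
    conn.close()
    return result


for row in sales_cube(SALES_DATA):
    print(row)
```

The per-category subtotal rows ('All Regions' paired with a specific category) are exactly the rows a two-column ROLLUP would leave out.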

26. Calculate Percentage Contribution

Company: Okta Difficulty: easy

Guide to Answer the SQL Interview Question: Calculate Product Sales as a Percentage of Total Sales

Creating an SQL query to determine each product's sales as a percentage of the total sales involves several key steps. The query should return the product name, its sales amount, and the corresponding percentage, all ordered from the highest to the lowest percentage. Follow these instructions to craft an effective SQL query that meets all the specified requirements.

  1. Objective:
    Calculate each product's sales as a percentage of total sales using SQL.

  2. Table Structure:

    • The products table includes:
      • product_id (INTEGER): Unique identifier for each product.
      • product_name (VARCHAR): Name of the product.
      • sales (INTEGER): Total sales for the product.
  3. Query Requirements:

    • Compute the total sales for all products.
    • For each product, find its sales as a percentage of the total sales.
    • Return the product_name, sales amount, and the calculated percentage.
    • Round the percentage to two decimal places.
    • Order the results by the percentage in descending order.
  4. SQL Query:

SELECT 
    product_name, 
    sales,
    ROUND(CAST(sales AS DECIMAL) * 100 / (SELECT SUM(sales) FROM products), 2) AS percentage
FROM 
    products
ORDER BY 
    percentage DESC;
  5. Explanation:
    • CAST Function: Converts the sales to a decimal for precise division.
    • SUM Function: Calculates the total sales.
    • ROUND Function: Rounds the percentage to two decimal places.
    • ORDER BY Clause: Sorts the results by the percentage in descending order.

By following this guide, you can write an SQL query that accurately calculates and presents each product's sales percentage relative to the total sales, rounded to two decimal places, and orders the output by the highest percentage first. This method ensures that your results are precise, clear, and well-organized, aligning perfectly with the interview question requirements.
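A quick way to confirm the rounding and ordering is to run the calculation on a tiny table. The sketch below uses an in-memory SQLite database and multiplies by 100.0 to force decimal division, which plays the role of the CAST in the query above; the sample products and the helper name sales_percentages are hypothetical.

```python
import sqlite3

# Hypothetical products whose shares do not divide evenly.
PRODUCTS = [(1, "Alpha", 1), (2, "Beta", 2)]

PERCENT_SQL = """
SELECT
    product_name,
    sales,
    ROUND(sales * 100.0 / (SELECT SUM(sales) FROM products), 2) AS percentage
FROM products
ORDER BY percentage DESC;
"""


def sales_percentages(rows):
    """Return (product_name, sales, percentage) sorted by percentage, highest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (product_id INTEGER, product_name TEXT, sales INTEGER)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
    result = conn.execute(PERCENT_SQL).fetchall()
    conn.close()
    return result


for row in sales_percentages(PRODUCTS):
    print(row)
```

If total sales could be zero, wrapping the divisor in NULLIF(..., 0) avoids a division-by-zero error on engines that raise one.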

27. Over-Budget Project Detection

Company: Qualcomm Difficulty: medium

Objective

In a SQL-based database scenario, the interviewer asks candidates to craft an SQL query that pinpoints projects which have exceeded their allocated budgets. The given data is stored in two specific tables: Projects and Expenditures. The query's purpose is to identify projects where spending has surpassed the budget and display relevant details. The output should include the project name, assigned budget, total expenses, and the amount overspent. Moreover, the output results need to be sorted by the excess amount spent, arranged in descending order.

Additional Information

Tables:

  • Projects:
    • id (INTEGER): A unique identifier for each project.
    • project_name (VARCHAR): The name of the project.
    • budget (INTEGER): The total budget allocated for the project.
  • Expenditures:
    • id (INTEGER): A unique identifier for each expenditure record.
    • project_id (INTEGER): The identifier that relates each expenditure to a specific project.
    • amount (INTEGER): The amount spent in each expenditure.
    • description (VARCHAR): A short description of the expenditure.

Constraints:

  • Include only projects whose total expenditures exceed their budgets.
  • Compute total_spent as the sum of all amount values across each project.
  • Determine over_budget by subtracting the budget from total_spent for each project.
  • Display the resulting data ordered by the over_budget value in descending order.

Output Columns:

  • project_name
  • budget
  • total_spent
  • over_budget
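No reference query is given for this question. One way to meet the constraints, sketched below, is to join the two tables, sum the expenditures per project, and filter the groups with HAVING. The example runs on an in-memory SQLite database; the sample rows and the helper name over_budget_projects are hypothetical.

```python
import sqlite3

# Hypothetical sample data: Apollo and Cobalt overspend, Borealis does not.
PROJECTS = [(1, "Apollo", 1000), (2, "Borealis", 500), (3, "Cobalt", 800)]
EXPENDITURES = [
    (1, 1, 700, "hardware"),
    (2, 1, 600, "travel"),
    (3, 2, 300, "licenses"),
    (4, 3, 900, "contractors"),
]

OVER_BUDGET_SQL = """
SELECT
    p.project_name,
    p.budget,
    SUM(e.amount) AS total_spent,
    SUM(e.amount) - p.budget AS over_budget
FROM Projects p
JOIN Expenditures e ON e.project_id = p.id
GROUP BY p.id, p.project_name, p.budget
HAVING SUM(e.amount) > p.budget
ORDER BY over_budget DESC;
"""


def over_budget_projects(projects, expenditures):
    """Return (project_name, budget, total_spent, over_budget) for overspent projects."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Projects (id INTEGER, project_name TEXT, budget INTEGER)")
    conn.execute(
        "CREATE TABLE Expenditures "
        "(id INTEGER, project_id INTEGER, amount INTEGER, description TEXT)"
    )
    conn.executemany("INSERT INTO Projects VALUES (?, ?, ?)", projects)
    conn.executemany("INSERT INTO Expenditures VALUES (?, ?, ?, ?)", expenditures)
    result = conn.execute(OVER_BUDGET_SQL).fetchall()
    conn.close()
    return result


for row in over_budget_projects(PROJECTS, EXPENDITURES):
    print(row)
```

An inner join is sufficient here: a project with no expenditure rows has spent nothing and therefore cannot be over budget.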

28. Eligible Bonus Calculation with CASE

Company: IBM Difficulty: medium

How to Retrieve and Calculate Employee Bonuses with SQL

To tackle the problem of constructing a SQL query that calculates and categorizes employee bonuses, follow these steps based on the given criteria and tables:

Objective:

Construct a SQL query to retrieve each employee's name, their corresponding department name, salary, bonus category, and the calculated bonus amount based on the following criteria:

  • High Priority - 15%: Employees in the "Sales" department with a salary of 80,000 or more.
  • Medium Priority - 10%: Employees with a salary of 70,000 or more, regardless of department.
  • Standard - 5%: All other employees.

Calculate the bonus amount by applying the specified percentage to the employee's salary and round the result to the nearest integer. The final output should be ordered by the bonus amount in descending order.

Steps to Create the SQL Query

  1. Join the Employees and Departments Tables: Use an INNER JOIN to combine the employees and departments tables on the department_id.

  2. Calculate Bonus Amount and Determine Bonus Category:

    • Use a CASE statement to determine the bonus percentage and category based on the criteria provided.
    • Calculate the bonus amount by applying the percentage to the employee's salary.
  3. Round the Bonus Amount: Use the ROUND function to round the calculated bonus amount to the nearest integer.

  4. Select the Required Columns: Ensure the query selects the employee's name, department name, salary, bonus category, and calculated bonus amount.

  5. Sort the Results: Sort the final output by the bonus amount in descending order.

SQL Query Example:

SELECT 
    e.name, 
    d.name AS department, 
    e.salary, 
    CASE
        WHEN d.name = 'Sales' AND e.salary >= 80000 THEN 'High Priority'
        WHEN e.salary >= 70000 THEN 'Medium Priority'
        ELSE 'Standard'
    END AS bonus_category,
    ROUND(
        CASE
            WHEN d.name = 'Sales' AND e.salary >= 80000 THEN e.salary * 0.15
            WHEN e.salary >= 70000 THEN e.salary * 0.10
            ELSE e.salary * 0.05
        END
    ) AS bonus_amount
FROM 
    employees e
INNER JOIN 
    departments d ON e.department_id = d.id
ORDER BY 
    bonus_amount DESC;

This query joins each employee to their department and applies the bonus rules with two parallel CASE expressions. CASE returns the first branch whose condition is true, so the Sales check must come before the general 70,000 check; otherwise a Sales employee earning 80,000 or more would be labeled Medium Priority.

29. Set Operation: INTERSECT

Company: DoorDash Difficulty: medium

Objective

Retrieve the list of customers who meet both of the following criteria:

  1. They are new customers with a monthly spending greater than 1000.
  2. They are loyal customers with at least 3 years of membership and have a 'Premium' tier status.

Provide the customer_id and name of these customers, sorted in ascending order by customer_id.

Additional information

Tables Description:

  • NewCustomers:

    • customer_id (INTEGER): Unique identifier for each customer.
    • name (VARCHAR): Name of the customer.
    • monthly_spend (INTEGER): Amount spent by the customer monthly.
    • join_date (DATE): Date when the customer joined.
  • LoyalCustomers:

    • customer_id (INTEGER): Unique identifier for each customer.
    • name (VARCHAR): Name of the customer.
    • membership_years (INTEGER): Number of years the customer has been a member.
    • tier (VARCHAR): Membership tier of the customer (e.g., 'Premium').

Constraints:

  • Each customer appears only once in each table.
  • membership_years and monthly_spend are positive integers.
  • The tier field contains single-word strings without special characters.

Output Requirements:

  • The result should include only the customer_id and name columns.
  • The final output must be ordered by customer_id in ascending order.
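No reference query is given for this question. Since both criteria must hold for the same customer, one natural approach is to filter each table separately and combine the results with INTERSECT, as sketched below on an in-memory SQLite database. The sample rows and the helper name premium_big_spenders are hypothetical.

```python
import sqlite3

# Hypothetical sample data: only Ana satisfies both criteria.
NEW_CUSTOMERS = [
    (1, "Ana", 1500, "2024-01-01"),
    (2, "Ben", 900, "2024-02-01"),    # spend too low
    (3, "Cara", 2000, "2024-03-01"),
]
LOYAL_CUSTOMERS = [
    (1, "Ana", 4, "Premium"),
    (2, "Ben", 5, "Premium"),
    (3, "Cara", 2, "Premium"),        # membership too short
    (4, "Dev", 6, "Premium"),         # not in NewCustomers
]

INTERSECT_SQL = """
SELECT customer_id, name
FROM NewCustomers
WHERE monthly_spend > 1000
INTERSECT
SELECT customer_id, name
FROM LoyalCustomers
WHERE membership_years >= 3 AND tier = 'Premium'
ORDER BY customer_id;
"""


def premium_big_spenders(new_rows, loyal_rows):
    """Return (customer_id, name) pairs that pass both filters."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE NewCustomers "
        "(customer_id INTEGER, name TEXT, monthly_spend INTEGER, join_date TEXT)"
    )
    conn.execute(
        "CREATE TABLE LoyalCustomers "
        "(customer_id INTEGER, name TEXT, membership_years INTEGER, tier TEXT)"
    )
    conn.executemany("INSERT INTO NewCustomers VALUES (?, ?, ?, ?)", new_rows)
    conn.executemany("INSERT INTO LoyalCustomers VALUES (?, ?, ?, ?)", loyal_rows)
    result = conn.execute(INTERSECT_SQL).fetchall()
    conn.close()
    return result


print(premium_big_spenders(NEW_CUSTOMERS, LOYAL_CUSTOMERS))
```

INTERSECT compares whole rows, so a customer whose name is spelled differently in the two tables would be dropped; joining on customer_id alone is an alternative when that is a concern.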

30. Extract Year and Month

Company: TikTok Difficulty: medium

Objective

Develop a SQL query to extract the year and month from each order_date in the orders table. Your query should return the order_id, total_amount, along with the derived year and month for each order. Ensure that the results are sorted in ascending order based on the order_date.

Additional Information

  • The orders table includes the following columns:
    • order_id (integer): Unique identifier for each order.
    • order_date (date): The date when the order was placed.
    • total_amount (decimal): The total amount for the order.

SQL Query

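The following query addresses the requirements above. EXTRACT is standard SQL and works in PostgreSQL, MySQL, and Oracle; SQL Server uses YEAR(order_date) and MONTH(order_date) instead.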

SELECT
  order_id,
  total_amount,
  EXTRACT(YEAR FROM order_date) AS order_year,
  EXTRACT(MONTH FROM order_date) AS order_month
FROM
  orders
ORDER BY
  order_date ASC;

Explanation

This SQL query leverages the EXTRACT() function to obtain the year and month from each order_date. We select the order_id, total_amount, and the derived year (as order_year) and month (as order_month). The results are then sorted in ascending order based on the order_date, ensuring that the earliest orders are listed first.

Conclusion

By following the SQL query provided above, you can accurately and efficiently retrieve the necessary details from the orders table. This approach ensures a clear and organized dataset that is ideal for further analysis or reporting.



