Data Engineering Interview Questions (30+ Questions)

Last Updated: June 8, 2026 • 30 QuestionsReal Company Interviews

Are you preparing for a data engineering interview? Data engineering involves creating, building, and maintaining the infrastructure and systems that facilitate data collection, storage, and analysis. It focuses on developing robust data pipelines and ensuring seamless data flow from source to destination. For both newcomers and experienced professionals, mastering data engineering is vital in today's data-focused world.

30
Interview Questions
15
Companies
3
Difficulty Levels

Table of Contents

In this article, we present Data Engineering Interview Questions with answers typically asked in data engineer interviews at high-paying companies. Whether you're preparing for your first data engineering role or looking to progress in your career, this guide will help you navigate the most frequently asked data engineering interview questions and stand out in competitive job interviews.

💡 Pro Tips for Data Engineering Interviews

  • Practice each question and understand the underlying concepts
  • Review company-specific technologies and methodologies
  • Prepare follow-up questions and edge cases
  • Practice explaining your solutions clearly and concisely

Interview Questions & Answers

1. Top K Frequent Elements in Stream

Company: JPMorgan Difficulty: medium

import heapq
from collections import defaultdict

class TopKFrequent:
def init(self, k):
self.k = k
self.freq = defaultdict(int)

def add(self, num):
    self.freq[num] += 1

def topK(self):
    items = [(-count, num) for num, count in self.freq.items()]
    top = heapq.nsmallest(self.k, items)
    return [num for _, num in top]

2. LFU Cache

Company: Amazon Difficulty: hard

from collections import defaultdict, OrderedDict

class LFUCache:
def init(self, capacity):
self.cap = capacity
self.min_freq = 0
self.key_val = {}
self.key_freq = {}
self.freq_keys = defaultdict(OrderedDict)

def _update(self, key):
    freq = self.key_freq[key]
    self.freq_keys[freq].pop(key)
    if not self.freq_keys[freq] and self.min_freq == freq:
        self.min_freq += 1
    self.key_freq[key] = freq + 1
    self.freq_keys[freq + 1][key] = None

def get(self, key):
    if key not in self.key_val:
        return -1
    self._update(key)
    return self.key_val[key]

def put(self, key, value):
    if self.cap <= 0:
        return
    if key in self.key_val:
        self.key_val[key] = value
        self._update(key)
        return
    if len(self.key_val) >= self.cap:
        evict_key, _ = self.freq_keys[self.min_freq].popitem(last=False)
        del self.key_val[evict_key]
        del self.key_freq[evict_key]
    self.key_val[key] = value
    self.key_freq[key] = 1
    self.freq_keys[1][key] = None
    self.min_freq = 1

3. Event Stream Deduplicator

Company: Google Difficulty: medium

class Deduplicator:
def init(self, ttl):
self.ttl = ttl
self.seen = {}

def process(self, timestamp, eventId):
    expired = [k for k, t in self.seen.items() if timestamp - t > self.ttl]
    for k in expired:
        del self.seen[k]

    if eventId in self.seen:
        self.seen[eventId] = timestamp
        return False
    self.seen[eventId] = timestamp
    return True

4. Moving Average from Data Stream

Company: Amazon Difficulty: easy

from collections import deque

class MovingAverage:
def init(self, size):
self.size = size
self.queue = deque()
self.window_sum = 0

def next(self, val):
    self.queue.append(val)
    self.window_sum += val
    if len(self.queue) > self.size:
        self.window_sum -= self.queue.popleft()
    return self.window_sum / len(self.queue)

5. Skew-Aware Key Partitioner

Company: Okta Difficulty: medium

class SkewHandler:
def init(self, numBuckets, hotThreshold, splitFactor):
self.n = numBuckets
self.threshold = hotThreshold
self.split = splitFactor
self.counts = {}
self.robin = {}
self.load = [0] * numBuckets

def _hash(self, key):
    h = 0
    for c in key:
        h = h * 31 + ord(c)
    return h % self.n

def assign(self, key):
    self.counts[key] = self.counts.get(key, 0) + 1
    base = self._hash(key)

    if self.counts[key] > self.threshold:
        idx = self.robin.get(key, 0)
        bucket = (base + idx) % self.n
        self.robin[key] = (idx + 1) % self.split
    else:
        bucket = base

    self.load[bucket] += 1
    return bucket

def getLoad(self):
    return list(self.load)

6. Hash Join Simulator

Company: Apple Difficulty: medium

from collections import defaultdict

class HashJoin:
def init(self):
self.table = defaultdict(list)

def build(self, rows, keyIndex):
    self.table.clear()
    for row in rows:
        self.table[row[keyIndex]].append(row)

def probe(self, rows, keyIndex):
    result = []
    for row in rows:
        for build_row in self.table.get(row[keyIndex], []):
            result.append(build_row + row)
    result.sort()
    return result

7. Log Aggregator

Company: Microsoft Difficulty: medium

from collections import defaultdict
import bisect

class LogAggregator:
def init(self):
self.logs = defaultdict(list)

def record(self, timestamp, key):
    self.logs[key].append(timestamp)

def count(self, key, start, end):
    if key not in self.logs:
        return 0
    timestamps = self.logs[key]
    left = bisect.bisect_left(timestamps, start)
    right = bisect.bisect_right(timestamps, end)
    return right - left

8. Find Median from Data Stream

Company: Okta Difficulty: hard

import heapq

class MedianFinder:
def init(self):
self.small = [] # max-heap (negated values)
self.large = [] # min-heap

def addNum(self, num):
    heapq.heappush(self.small, -num)
    heapq.heappush(self.large, -heapq.heappop(self.small))
    if len(self.large) > len(self.small):
        heapq.heappush(self.small, -heapq.heappop(self.large))

def findMedian(self):
    if len(self.small) > len(self.large):
        return float(-self.small[0])
    return (-self.small[0] + self.large[0]) / 2.0

9. Min Stack

Company: Google Difficulty: medium

class MinStack:
def init(self):
self.stack = []
self.min_stack = []

def push(self, val: int) -> None:
    self.stack.append(val)
    if self.min_stack:
        self.min_stack.append(min(val, self.min_stack[-1]))
    else:
        self.min_stack.append(val)

def pop(self) -> None:
    self.stack.pop()
    self.min_stack.pop()

def top(self) -> int:
    return self.stack[-1]

def getMin(self) -> int:
    return self.min_stack[-1]

10. Design Twitter

Company: Google Difficulty: medium

class Twitter:
def init(self):
self.count = 0
self.tweetMap = {}
self.followMap = {}

def postTweet(self, userId: int, tweetId: int) -> None:
    if userId not in self.tweetMap:
        self.tweetMap[userId] = []
    self.tweetMap[userId].append([self.count, tweetId])
    self.count -= 1

def getNewsFeed(self, userId: int) -> list[int]:
    res = []
    minHeap = []
    
    if userId not in self.followMap:
        self.followMap[userId] = set()
        
    self.followMap[userId].add(userId)
    
    for followeeId in self.followMap[userId]:
        if followeeId in self.tweetMap and len(self.tweetMap[followeeId]) > 0:
            index = len(self.tweetMap[followeeId]) - 1
            count, tweetId = self.tweetMap[followeeId][index]
            heapq.heappush(minHeap, [count, tweetId, followeeId, index - 1])
            
    while minHeap and len(res) < 10:
        count, tweetId, followeeId, index = heapq.heappop(minHeap)
        res.append(tweetId)
        
        if index >= 0:
            next_count, next_tweetId = self.tweetMap[followeeId][index]
            heapq.heappush(minHeap, [next_count, next_tweetId, followeeId, index - 1])
            
    self.followMap[userId].remove(userId)
    return res

def follow(self, followerId: int, followeeId: int) -> None:
    if followerId not in self.followMap:
        self.followMap[followerId] = set()
    self.followMap[followerId].add(followeeId)

def unfollow(self, followerId: int, followeeId: int) -> None:
    if followerId in self.followMap and followeeId in self.followMap[followerId]:
        self.followMap[followerId].remove(followeeId)

11. Design Add and Search Words Data Structure

Company: Google Difficulty: medium

class TrieNode:
def init(self):
self.children = {}
self.is_end = False

class WordDictionary:
def init(self):
self.root = TrieNode()

def addWord(self, word: str) -> None:
    curr = self.root
    for char in word:
        if char not in curr.children:
            curr.children[char] = TrieNode()
        curr = curr.children[char]
    curr.is_end = True

def search(self, word: str) -> bool:
    def dfs(j, root):
        curr = root
        for i in range(j, len(word)):
            char = word[i]
            if char == '.':
                for child in curr.children.values():
                    if dfs(i + 1, child):
                        return True
                return False
            else:
                if char not in curr.children:
                    return False
                curr = curr.children[char]
        return curr.is_end
        
    return dfs(0, self.root)

12. Implement Trie (Prefix Tree)

Company: Google Difficulty: medium

class TrieNode:
def init(self):
self.children = {}
self.is_end_of_word = False

class Trie:
def init(self):
self.root = TrieNode()

def insert(self, word: str) -> None:
    curr = self.root
    for char in word:
        if char not in curr.children:
            curr.children[char] = TrieNode()
        curr = curr.children[char]
    curr.is_end_of_word = True

def search(self, word: str) -> bool:
    curr = self.root
    for char in word:
        if char not in curr.children:
            return False
        curr = curr.children[char]
    return curr.is_end_of_word

def startsWith(self, prefix: str) -> bool:
    curr = self.root
    for char in prefix:
        if char not in curr.children:
            return False
        curr = curr.children[char]
    return True

13. LRU Cache

Company: Google Difficulty: medium

class Node:
def init(self, key=0, val=0):
self.key = key
self.val = val
self.prev = None
self.next = None

class LRUCache:
def init(self, capacity: int):
self.cap = capacity
self.cache = {}

    self.left = Node()
    self.right = Node()
    self.left.next = self.right
    self.right.prev = self.left

def remove(self, node):
    prev_node = node.prev
    next_node = node.next
    prev_node.next = next_node
    next_node.prev = prev_node

def insert(self, node):
    prev_mru = self.right.prev
    prev_mru.next = node
    self.right.prev = node
    node.prev = prev_mru
    node.next = self.right

def get(self, key: int) -> int:
    if key in self.cache:
        self.remove(self.cache[key])
        self.insert(self.cache[key])
        return self.cache[key].val
    return -1

def put(self, key: int, value: int) -> None:
    if key in self.cache:
        self.remove(self.cache[key])
        
    self.cache[key] = Node(key, value)
    self.insert(self.cache[key])
    
    if len(self.cache) > self.cap:
        lru = self.left.next
        self.remove(lru)
        del self.cache[lru.key]

14. Continuous Subarray Sum

Company: Google Difficulty: medium

def check_subarray_sum(nums: list[int], k: int) -> bool:
remainder_map = {0: -1}
prefix_sum = 0

for i in range(len(nums)):
    prefix_sum += nums[i]
    remainder = prefix_sum % k
    
    if remainder in remainder_map:
        if i - remainder_map[remainder] >= 2:
            return True
    else:
        remainder_map[remainder] = i
        
return False

15. Accounts Merge

Company: Google Difficulty: medium

class UnionFind:
def init(self):
self.parent = {}

def find(self, x):
    if x not in self.parent:
        self.parent[x] = x
    if self.parent[x] != x:
        self.parent[x] = self.find(self.parent[x])
    return self.parent[x]
    
def union(self, x, y):
    rootX = self.find(x)
    rootY = self.find(y)
    if rootX != rootY:
        self.parent[rootY] = rootX

def accounts_merge(accounts: list[list[str]]) -> list[list[str]]:
uf = UnionFind()
email_to_name = {}

for acc in accounts:
    name = acc[0]
    first_email = acc[1]
    for i in range(1, len(acc)):
        email = acc[i]
        email_to_name[email] = name
        uf.union(first_email, email)
        
merged_emails = {}
for email in email_to_name:
    root = uf.find(email)
    if root not in merged_emails:
        merged_emails[root] = []
    merged_emails[root].append(email)
    
res = []
for root, emails in merged_emails.items():
    res.append([email_to_name[root]] + sorted(emails))
    
return res

16. Subarray Sum Equals K

Company: Google Difficulty: medium

def subarray_sum(nums: list[int], k: int) -> int:
count = 0
prefix_sum = 0
prefix_map = {0: 1}

for num in nums:
    prefix_sum += num
    diff = prefix_sum - k
    
    if diff in prefix_map:
        count += prefix_map[diff]
        
    prefix_map[prefix_sum] = prefix_map.get(prefix_sum, 0) + 1
    
return count

17. Sliding Window Median

Company: Google Difficulty: hard

def median_sliding_window(nums: list[int], k: int) -> list[float]:
small = []
large = []
lazy = {}

for i in range(k):
    heapq.heappush(small, -nums[i])
    
for i in range(k // 2):
    heapq.heappush(large, -heapq.heappop(small))
    
def get_median():
    if k % 2 == 1:
        return float(-small[0])
    return (-small[0] + large[0]) / 2.0
    
res = [get_median()]

for i in range(k, len(nums)):
    out_num = nums[i - k]
    in_num = nums[i]
    
    lazy[out_num] = lazy.get(out_num, 0) + 1
    
    balance = 0
    
    if out_num <= -small[0]:
        balance -= 1
    else:
        balance += 1
        
    if small and in_num <= -small[0]:
        balance += 1
        heapq.heappush(small, -in_num)
    else:
        balance -= 1
        heapq.heappush(large, in_num)
        
    if balance < 0:
        heapq.heappush(small, -heapq.heappop(large))
    elif balance > 0:
        heapq.heappush(large, -heapq.heappop(small))
        
    while small and lazy.get(-small[0], 0) > 0:
        lazy[-small[0]] -= 1
        heapq.heappop(small)
        
    while large and lazy.get(large[0], 0) > 0:
        lazy[large[0]] -= 1
        heapq.heappop(large)
        
    res.append(get_median())
    
return res

18. Managing High I/O Processes

Company: Revolut Difficulty: easy

Learn how to identify and manage high I/O consuming processes on a Linux server using command-line tools. This guide covers sorting running processes by disk activity and mitigating disk bottlenecks through throttling, stopping, or rescheduling heavy jobs, improving overall system responsiveness and application performance.

19. Average Order Value

Company: Accenture Difficulty: easy

Objective

To answer the SQL interview question effectively, you need to craft a query that calculates and displays the average expenditure per customer from a table named Orders. This table records customer purchases. The results should display each customer_id alongside their avg_order_value, rounded to two decimal places. Finally, ensure the output is sorted in ascending order based on customer_id.

Additional Information

Table Schema:

  • Orders:
    • order_id (INTEGER): Unique identifier for each order.
    • customer_id (INTEGER): Identifier for the customer who made the order.
    • total_amount (DECIMAL): Total amount spent on the order.
    • order_date (DATE): The date when the order was placed.

Constraints:

  • Each customer_id may have multiple associated orders.
  • There is at least one order present in the table.

Output Requirements:

  • Columns: customer_id, avg_order_value
  • avg_order_value must be rounded to two decimal places.
  • Results must be ordered by customer_id in ascending order.

Detailed Query Explanation

The query you need to write should meet the specific output requirements and constraints mentioned. Here's the detailed SQL query designed to solve the problem:

SELECT 
    customer_id, 
    ROUND(AVG(total_amount), 2) AS avg_order_value 
FROM 
    Orders 
GROUP BY 
    customer_id 
ORDER BY 
    customer_id;

Keyword Optimization

When formulating the answer:

  • "SQL query to determine the average expenditure per customer"
  • "display customer_id alongside their avg_order_value"
  • "rounded to two decimal places"
  • "sorted in ascending order based on customer_id"
  • "table schema includes order_id, customer_id, total_amount, and order_date"

These phrases and keywords are naturally embedded in the explanation and query to ensure the content is both informative and optimized for SEO.

20. Find Cheapest Product

Company: Apple Difficulty: easy

Objective

Retrieve the name and price of the product(s) with the lowest price from the products table.

Additional Information

  • The products table contains the following columns:
    • id (integer): Unique identifier for each product.
    • name (string): Name of the product.
    • price (decimal): Price of the product.
  • There may be multiple products sharing the same lowest price.
  • Write an SQL query that returns the name and price of the cheapest product(s).

In order to tackle this SQL interview question effectively, you need to identify the product or products with the lowest price from the products table. Here's the step-by-step process to achieve this:

  1. First, determine the lowest price among all the products.
  2. Then, retrieve the name and price of the product(s) with this minimum price.

Now, let's write the precise SQL query to accomplish this objective:

SELECT 
    name,
    price
FROM 
    products
WHERE 
    price = (SELECT MIN(price) FROM products);

This SQL query efficiently retrieves the name and price of the product(s) with the lowest price from the products table by leveraging a subquery to find the minimum price. This approach ensures that even if there are multiple products with the same lowest price, all of them will be included in the result set.

21. New vs. Returning Customer Report

Company: RedHat Difficulty: easy

Generating a Detailed Customer Order Report in SQL

The given objective of this interview question is to write an SQL query to generate a comprehensive report that lists each order along with specific associated details such as order date, customer ID, customer name, and customer type, which indicates whether the order is the customer's first or a subsequent order. This can be achieved by linking information from two database tables, orders and customers.

Here’s how you can achieve this:

SQL Query to Generate the Report

WITH first_orders AS (
   SELECT
      customer_id,
      MIN(order_date) AS first_order_date
   FROM
      orders
   GROUP BY
      customer_id
)
SELECT
   o.order_date,
   o.customer_id,
   c.customer_name,
   CASE
      WHEN o.order_date = fo.first_order_date THEN 'New'
      ELSE 'Returning'
   END AS customer_type
FROM
   orders o
JOIN
   customers c ON o.customer_id = c.id
LEFT JOIN
   first_orders fo ON o.customer_id = fo.customer_id
ORDER BY
   o.order_date,
   o.customer_id;

Explanation of the SQL Query Structure

  1. The query begins with a WITH clause, also known as a Common Table Expression (CTE), named first_orders. This CTE identifies the earliest order date for each customer by using the SQL aggregation functions:

    • Columns Selected:
      • customer_id
      • first_order_date (calculated as the minimum order_date for each customer_id)
    • Grouped By: customer_id
  2. The main SELECT statement retrieves the following details:

    • order_date : The date when the order was made (orders.order_date).
    • customer_id : Customer identifier (orders.customer_id).
    • customer_name : Name of the customer (customers.customer_name).
    • customer_type : A CASE statement determining if the order is the customer's first or subsequent order. If orders.order_date matches first_orders.first_order_date, it is marked as ‘New’; otherwise, it is marked as ‘Returning’.
  3. Join Operations:

    • The orders table is joined with the customers table on customer_id to get the customer name.
    • The CTE first_orders is LEFT JOINED with the orders table to determine if the order is the first order or not.
  4. Ordering:

    • The results are ordered first by order_date and then by customer_id for better readability and to match expected output structure.

Additional Notes

  • The use of SQL functions such as MIN() and GROUP BY ensures that the query efficiently handles large datasets by summarizing order data per customer.
  • The LEFT JOIN is crucial to ensure that even if there are no orders yet, customers who have placed future orders can be correctly identified.

This SQL query is designed to provide all necessary details in an ordered and efficient manner to analyze customer purchasing patterns effectively.

22. Above Average Price Products

Company: Hulu Difficulty: medium

SQL Interview Question: Fetching Available Products Above Average Price

Objective

Develop an SQL query to fetch all products currently available in stock and priced above the average price of all available products. The dataset should be organized primarily based on price in descending order and secondarily by rating in descending order. Products lacking a rating should be placed at the end of the sorted list.

Additional Information

  • Include only products with stock_quantity greater than 0.
  • The average price must be computed solely based on the products that are in stock.
  • When sorting, products with a NULL rating should appear after those with non-null ratings.
  • The query should return all columns from the Products table.
  • Account for the possibility that some products might not have a rating value.

This detailed explanation contributes valuable insights into constructing a sophisticated SQL query that efficiently retrieves and sorts products based on inventory status and price, offering critical functionality for database management. Understanding and mastering such queries are essential for optimizing data handling and retrieval processes in various business applications.

Sample Query

SELECT * 
FROM Products 
WHERE stock_quantity > 0 
AND price > (SELECT AVG(price) FROM Products WHERE stock_quantity > 0) 
ORDER BY price DESC, rating DESC 
NULLS LAST;

By employing the above query, you can ensure that you are extracting relevant product data effectively, adhering to the given criteria, and enhancing your SQL skill set for advanced data manipulation and querying tasks.

23. Frequent Price Change Detector

Company: Atlassian Difficulty: medium

Identifying Products with Multiple Price Changes Using SQL

Objective

In this interview question, you are provided with two tables: products and price_history. Your task is to write an SQL query that identifies all products that have experienced at least two price changes. A price change is defined as a difference in price from one entry to the next for the same product, based on the chronological order of dates.

Requirements

  1. Tables:

    • products:
      • id (integer): The unique identifier for each product.
      • name (string): The name of the product.
      • category (string): The category to which the product belongs.
    • price_history:
      • product_id (integer): The identifier linking to the products table.
      • price (decimal): The price of the product at a specific date.
      • date (date): The date when the price was recorded.
  2. Constraints:

    • Each product_id in price_history corresponds to an id in products.
    • Prices are recorded in chronological order for each product.
  3. Output:

    • The product’s name.
    • The category of the product.
    • The total number of price changes, returned as a string.
  4. Conditions:

    • Only consider consecutive price entries for determining a price change.
    • A product must have at least two price changes to be included in the result.
  5. Order:

    • Results should be ordered by the number of price changes in descending order.
    • In case of ties, results should be ordered by the product ID in ascending order.

This query helps in understanding the dynamics of product pricing over time and can be a valuable insight for inventory management, pricing strategies, and market analysis. Performing well on such questions requires a meticulous approach to grouping and ordering, which is fundamental in SQL querying.

24. Self-Join for Manager Names

Company: Backblaze Difficulty: easy

SQL Query to Retrieve Employee and Manager Names Ordered by Employee ID

In a typical SQL job interview, you might encounter questions designed to evaluate your ability to manipulate and retrieve data using SQL queries. One common question involves retrieving each employee's name alongside their respective manager's name from a given employees table.

To solve this, you'll need to create a SQL query that:

  1. Retrieves all employees.
  2. Includes each employee's manager name (if one exists).
  3. Ensures that employees without managers are still listed with a NULL value for the manager's name.
  4. Orders the results by the employee's unique identifier in ascending order.

Sample SQL Query

Here’s a detailed SQL query that meets the requirements:

SELECT 
    e.id AS employee_id,
    e.name AS employee_name,
    m.name AS manager_name
FROM 
    employees e
LEFT JOIN 
    employees m ON e.manager_id = m.id
ORDER BY 
    e.id ASC;

Explanation of the Query

  • SELECT: Specifies the columns you want to retrieve. Here we select the id and name from the employees table (aliased as e) as well as the name from the manager.

  • e.id AS employee_id: Retrieves the employee's ID and labels it as employee_id.

  • e.name AS employee_name: Retrieves the employee's name and labels it as employee_name.

  • m.name AS manager_name: Retrieves the manager's name (if it exists) and labels it as manager_name.

  • FROM employees e: Specifies the primary table we are querying from (employees), aliasing it as e.

  • LEFT JOIN employees m ON e.manager_id = m.id: Performs a left join on the employees table itself to get the manager's details. The join condition matches e.manager_id with m.id, ensuring managers are matched to their subordinates.

  • ORDER BY e.id ASC: Orders the result by the id of the employees in ascending order.

This query will yield a list of all employees with their respective manager names, if available. Employees without a manager will have NULL in the manager_name column, and the results will be neatly ordered by employee ID.

Conclusion

Answering SQL interview questions effectively often involves not just solving the problem but doing so in a clear, efficient, and accurate manner. This SQL query example demonstrates your ability to handle self-joins, NULL values, and sorting, making it a comprehensive answer to the posed question.

By fully understanding and implementing the core requirements—retrieving employees and their managers, handling employees without managers, and ordering the results by employee ID—you’ll showcase your SQL proficiency and logical problem-solving skills, critical assets for any data-centric role.

25. Join Employees and Departments

Company: Adobe Difficulty: easy

Understanding the Question

Objective

The goal is to write an SQL query to retrieve and display a list of employees who earn more than $50,000. For each employee that meets the specified salary condition, the output should include several details: employee ID, first name, last name, job title, hire date, and the name of their department. If an employee does not belong to any department, the department name should be presented as NULL. Furthermore, the employees should be listed in descending order based on their hire date, showcasing the most recently hired employees first.

Additional Information

Tables:

  • employees table:

    • employee_id (Integer): Unique identifier for each employee.
    • first_name (String): Employee's first name.
    • last_name (String): Employee's last name.
    • department_id (Integer): Identifier for the department to which the employee belongs (can be NULL).
    • job_title (String): The title of the employee's job.
    • salary (Integer): The employee's salary.
    • hire_date (Date): The date the employee was hired.
  • departments table:

    • department_id (Integer): Unique identifier for each department.
    • department_name (String): The name of the department.
    • location (String): The location of the department.

Requirements

  • Use a LEFT JOIN to merge the employees and departments tables on the department_id.
  • Filter to only include employees with a salary greater than 50000.
  • Display department_name as NULL for employees who are not assigned to any department.
  • Order the results by hire_date in descending order.

Example

Input:

  • employees table:

    employee_id first_name last_name department_id job_title salary hire_date
    1 John Smith 1 Senior Developer 85000 2020-01-15
    2 Mary Johnson 2 Project Manager 75000 2021-03-20
    3 Peter Brown NULL Consultant 65000 2022-06-10
    4 Sarah Davis 1 Developer 45000 2021-09-01
  • departments table:

    department_id department_name location
    1 Engineering New York
    2 Project Management Boston
    3 Marketing Chicago

Expected Output:

employee_id first_name last_name department_name job_title hire_date
3 Peter Brown NULL Consultant 2022-06-10
2 Mary Johnson Project Management Project Manager 2021-03-20
1 John Smith Engineering Senior Developer 2020-01-15

SQL Query:

SELECT
    e.employee_id,
    e.first_name,
    e.last_name,
    d.department_name,
    e.job_title,
    e.hire_date
FROM
    employees e
LEFT JOIN
    departments d ON e.department_id = d.department_id
WHERE
    e.salary > 50000
ORDER BY
    e.hire_date DESC;

26. Find Highest Salary

Company: JaneStreet Difficulty: easy

Objective

Write an SQL query to retrieve the names and salaries of the highest-paid employee(s) from the employees table.

Additional Information

  • The employees table consists of the following columns:
    • id (integer): The unique identifier for each employee.
    • name (string): The name of the employee.
    • salary (integer): The salary of the employee.
  • If multiple employees share the highest salary, include all of them in the result.
  • The output should contain two columns: name and salary.
  • The results can be returned in any order.

This SQL query requires identifying the highest salary first and then selecting the names and salaries of all employees who earn this highest salary.

To start with, the query can use a subquery to determine the maximum salary from the employees table. Subsequently, a main query can be employed to fetch the names and salaries of employees whose salaries match the maximum salary obtained from the subquery.

Here is the complete SQL query to achieve this objective:

SQL Query

SELECT name, salary
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);

This query effectively retrieves the names and salaries of all employees who have the highest salary in the employees table, satisfying the given requirements.

27. Aggregate Employee Salaries

Company: RedHat Difficulty: medium

Objective

Given two tables, employees and departments, construct a SQL query to display each department's name, the total number of employees in that department, the aggregate salary of all its employees, and the average salary rounded to two decimal places. The output should be sorted alphabetically by the department name.

Additional Information

Tables:

employees:

  • id (INTEGER): Unique identifier for each employee.
  • name (VARCHAR): Name of the employee.
  • department_id (INTEGER): Identifier linking to the employee's department.
  • salary (INTEGER): Salary of the employee.

departments:

  • id (INTEGER): Unique identifier for each department.
  • name (VARCHAR): Name of the department.
  • location (VARCHAR): Location of the department.

Requirements:

  • Use appropriate JOIN operations to link employees with departments.
  • Calculate the total number of employees (employee_count) in each department.
  • Compute the sum of salaries (total_salary) for each department.
  • Determine the average salary (average_salary) for each department, rounded to two decimal places.
  • Alias the department name as department.
  • Order the final results by the department name in ascending order.

Output Columns:

  • department (VARCHAR): Name of the department.
  • employee_count (INTEGER): Number of employees in the department.
  • total_salary (INTEGER): Combined salaries of all employees in the department.
  • average_salary (DECIMAL(10,2)): Average salary within the department, rounded to two decimal places.

28. Filter Orders by Date Range

Company: Google Difficulty: easy

How to Construct an SQL Query to Fetch Order Details within a Specific Date Range

If you're preparing for an interview, crafting a precise SQL query to fetch order details is a skill you need to master. The task involves fetching order_id, customer_name, order_date, and total_amount from the orders table, only for orders placed between January 1, 2023, and June 30, 2023. The results must be sorted by order_date in ascending order. Here’s a step-by-step guide on how to nail this:

SQL Query Example

To achieve this, you need to follow these pointers:

  • Select the desired columns: order_id, customer_name, order_date, and total_amount.
  • Filter the results to only include orders where the order_date falls inclusively between '2023-01-01' and '2023-06-30'.
  • Order the results by order_date in ascending order to get the earliest orders at the top.

Here is the SQL query you need:

SELECT order_id, customer_name, order_date, total_amount
FROM orders
WHERE order_date BETWEEN '2023-01-01' AND '2023-06-30'
ORDER BY order_date ASC;

Explanation

  1. FROM orders: Indicates the data source, which is the orders table.
  2. SELECT order_id, customer_name, order_date, total_amount: Specifies the columns you want to retrieve.
  3. WHERE order_date BETWEEN '2023-01-01' AND '2023-06-30': Filters the results to include only those orders placed within the defined date range.
  4. ORDER BY order_date ASC: Ensures that the output is sorted by order_date in ascending order.

This query effectively pulls the required data while maintaining the specified conditions and ordering.

Conclusion

Being able to craft an SQL query that accurately fetches order details based on specific criteria is crucial for data analysis and reporting. This particular query showcases your ability to filter data by date and sort it effectively, ensuring that you can produce meaningful and organized insights from your data tables. Practice writing and running similar queries to enhance your SQL skills and be interview-ready.


29. Count Distinct Product Categories

Company: AMD Difficulty: medium

Analyzing Primary Categories and Subcategories in a Products Table: A Detailed Guide

Objective

In this analysis, we aim to examine the products table to identify and count all primary categories that do not have a parent category. Additionally, we will calculate the total number of unique subcategories directly associated with these primary categories.

Key Points

  • Primary Category: Defined as a category with no parent_category.
  • Subcategory: A distinct category whose parent_category matches a primary category.
  • Output: The analysis should yield two specific metrics:
    • total_root_categories: The count of distinct primary categories.
    • total_subcategories: The count of unique subcategories linked to the primary categories.

Table Description

The products table consist of these essential columns:

  • id (integer): Unique identifier for each product.
  • name (string): Name of the product.
  • category (string): The category to which the product belongs.
  • parent_category (string or null): The parent category of the product's category, if applicable.

Analysis Goals

  • Identify primary categories with parent_category as null.
  • Determine the distinct subcategories that have their parent_category set to a primary category.

This analysis provides valuable insights into the category hierarchy within the products table and supports better categorization strategies.

By performing this analysis, businesses can understand and organize their product categories more effectively, ensuring a well-structured product categorization system that enhances searchability and user experience.

Conclusion

The aim is to extract the total_root_categories and total_subcategories to comprehend the structure and hierarchy of the product categories. This process is pivotal in optimizing product categorization for better inventory management and improved customer navigation.

This detailed guide serves as a basis for extracting meaningful insights from the products table, ensuring accurate and efficient data analysis.

30. Join Three Tables

Company: AMD Difficulty: easy

Objective

Construct a SQL query to retrieve a comprehensive list of customer orders. For each order, display the customer's name, the date the order was placed, the name of the product ordered, the quantity of the product, and the unit price. Ensure that the results are organized first by the customer's name in ascending order and then by the order date in ascending order.

Additional Information

  • Tables and Schemas:
    • Customers

      • customer_id (INTEGER): Unique identifier for each customer.
      • customer_name (VARCHAR): Name of the customer.
      • email (VARCHAR): Email address of the customer.
    • Orders

      • order_id (INTEGER): Unique identifier for each order.
      • customer_id (INTEGER): Identifier linking the order to a customer.
      • order_date (DATE): Date when the order was placed.
    • OrderDetails

      • order_id (INTEGER): Identifier linking the detail to an order.
      • product_name (VARCHAR): Name of the product ordered.
      • quantity (INTEGER): Quantity of the product ordered.
      • unit_price (DECIMAL): Price per unit of the product.

Constraints

  • Each customer can have multiple orders.
  • Each order can include multiple products.
  • Ensure that all relevant records from the three tables are accurately joined based on their relationships.

Output Requirements

  • Columns to include in the result:
    • customer_name
    • order_date
    • product_name
    • quantity
    • unit_price
  • The result should be sorted by customer_name (ascending) and then by order_date (ascending).

Example SQL Query

SELECT 
    c.customer_name,
    o.order_date,
    od.product_name,
    od.quantity,
    od.unit_price
FROM 
    Customers c
JOIN 
    Orders o ON c.customer_id = o.customer_id
JOIN 
    OrderDetails od ON o.order_id = od.order_id
ORDER BY 
    c.customer_name ASC,
    o.order_date ASC;

Ready to Practice More?

Explore interview questions from specific companies or try our hands-on labs to build practical experience.