Scenario
A pharmaceutical company tracks its lab equipment and associated maintenance records in two separate tables.
Task
Write a Snowflake SQL query that:
- Joins
{{ ref("equipment") }} with {{ ref("maintenance_logs") }} to include only equipment that has maintenance records
- Aggregates each piece of equipment to find the most recent maintenance date and the sum of all maintenance costs
- Applies a dense rank based on total maintenance cost in descending order (rank 1 = highest cost)
- Returns only equipment ranked in the top 3 using
QUALIFY
- Returns the columns:
equipment_id, equipment_name, purchase_date, latest_maintenance_date, maintenance_cost_rank (cast as INTEGER)
Schema
equipment
| Column |
Type |
Description |
| equipment_id |
String |
Unique identifier for each piece of equipment |
| equipment_name |
String |
Name of the equipment |
| purchase_date |
Date |
Date when the equipment was purchased |
maintenance_logs
| Column |
Type |
Description |
| equipment_id |
String |
Foreign key referencing equipment |
| maintenance_date |
Date |
Date of the maintenance event |
| maintenance_cost |
Float |
Cost of the maintenance event |
Example
equipment:
| equipment_id |
equipment_name |
purchase_date |
| E101 |
Centrifuge |
2021-03-15 |
| E102 |
Spectrometer |
2021-06-20 |
| E103 |
Incubator |
2021-09-10 |
| E104 |
Autoclave |
2022-01-05 |
| E105 |
Chromatograph |
2022-04-18 |
maintenance_logs:
| equipment_id |
maintenance_date |
maintenance_cost |
| E101 |
2023-01-10 |
1200.00 |
| E101 |
2023-07-22 |
850.00 |
| E102 |
2023-03-05 |
2100.00 |
| E103 |
2023-05-14 |
950.00 |
| E103 |
2023-11-30 |
1050.00 |
| E104 |
2023-08-19 |
600.00 |
Expected Output:
| equipment_id |
equipment_name |
purchase_date |
latest_maintenance_date |
maintenance_cost_rank |
| E102 |
Spectrometer |
2021-06-20 |
2023-03-05 |
1 |
| E101 |
Centrifuge |
2021-03-15 |
2023-07-22 |
2 |
| E103 |
Incubator |
2021-09-10 |
2023-11-30 |
3 |
Note: E102 has the highest total maintenance cost (2100), followed by E101 (2050) and E103 (2000). E104 is excluded because its rank (4) falls outside the top 3. E105 has no maintenance records and is therefore excluded.
WITH aggregated AS (
SELECT
e.equipment_id,
e.equipment_name,
e.purchase_date,
MAX(m.maintenance_date) AS latest_maintenance_date,
SUM(m.maintenance_cost) AS total_cost
FROM {{ ref("equipment") }} AS e
INNER JOIN {{ ref("maintenance_logs") }} AS m
ON e.equipment_id = m.equipment_id
GROUP BY e.equipment_id, e.equipment_name, e.purchase_date
)
SELECT
equipment_id,
equipment_name,
purchase_date,
latest_maintenance_date,
CAST(DENSE_RANK() OVER (ORDER BY total_cost DESC) AS INTEGER) AS maintenance_cost_rank
FROM aggregated
QUALIFY DENSE_RANK() OVER (ORDER BY total_cost DESC) <= 3
Explanation
Step 1: Join and Aggregate in a CTE
WITH aggregated AS (
SELECT
e.equipment_id,
e.equipment_name,
e.purchase_date,
MAX(m.maintenance_date) AS latest_maintenance_date,
SUM(m.maintenance_cost) AS total_cost
FROM {{ ref("equipment") }} AS e
INNER JOIN {{ ref("maintenance_logs") }} AS m
ON e.equipment_id = m.equipment_id
GROUP BY e.equipment_id, e.equipment_name, e.purchase_date
)
The INNER JOIN ensures that only equipment with at least one maintenance record is included. GROUP BY collapses each piece of equipment into a single row, allowing MAX() to find the latest maintenance date and SUM() to compute the total maintenance cost. The total_cost column is computed here so it can be referenced by the window function in the outer query. Snowflake does not allow referencing an alias defined in the same SELECT inside a window function, so the CTE is necessary.
Step 2: DENSE_RANK for Ranking
CAST(DENSE_RANK() OVER (ORDER BY total_cost DESC) AS INTEGER) AS maintenance_cost_rank
DENSE_RANK() assigns consecutive ranks with no gaps for tied values. If two pieces of equipment share the same total cost, they receive the same rank, and the next distinct cost receives the next consecutive rank. This differs from RANK(), which would skip rank numbers after ties, and from ROW_NUMBER(), which would assign unique numbers even for ties. The result is cast to INTEGER because DENSE_RANK() returns a NUMBER type by default.
Step 3: QUALIFY for Filtering Window Results
QUALIFY DENSE_RANK() OVER (ORDER BY total_cost DESC) <= 3
QUALIFY is a Snowflake clause that filters rows based on the result of a window function, similar to how HAVING filters after GROUP BY. Without QUALIFY, you would need an additional subquery or CTE to filter on the rank. This keeps only equipment ranked in the top 3 by total maintenance cost. Because DENSE_RANK is used, ties at rank 3 are included (potentially returning more than 3 rows).
Alternative: Using a Subquery Instead of QUALIFY
If your SQL dialect does not support QUALIFY, you can achieve the same result with a wrapping subquery:
WITH aggregated AS (
SELECT
e.equipment_id,
e.equipment_name,
e.purchase_date,
MAX(m.maintenance_date) AS latest_maintenance_date,
SUM(m.maintenance_cost) AS total_cost
FROM {{ ref("equipment") }} AS e
INNER JOIN {{ ref("maintenance_logs") }} AS m
ON e.equipment_id = m.equipment_id
GROUP BY e.equipment_id, e.equipment_name, e.purchase_date
)
SELECT *
FROM (
SELECT
equipment_id,
equipment_name,
purchase_date,
latest_maintenance_date,
CAST(DENSE_RANK() OVER (ORDER BY total_cost DESC) AS INTEGER) AS maintenance_cost_rank
FROM aggregated
) ranked
WHERE maintenance_cost_rank <= 3
The QUALIFY approach is more concise and avoids the extra nesting.