Scenario
You work for a mining company that tracks rare mineral extraction across multiple global sites.
Task
Write a Snowflake SQL query that:
- Joins
{{ ref("mines") }} with {{ ref("extractions") }} on the mine identifier
- Calculates
total_quantity as the sum of quantity grouped by location and mineral
- Returns only the
location, mineral, and total_quantity columns
- Sorts results by
location ascending, then by mineral ascending
Schema
mines
| Column |
Type |
Description |
| id |
Integer |
Unique identifier for the mine |
| name |
String |
Name of the mine |
| location |
String |
Country where the mine operates |
extractions
| Column |
Type |
Description |
| mine_id |
Integer |
References the mine's id |
| date |
Date |
Date of the extraction |
| mineral |
String |
Name of the extracted mineral |
| quantity |
Integer |
Quantity extracted in kilograms |
Example
mines:
| id |
name |
location |
| 1 |
Ridgeback Mine |
Peru |
| 2 |
Sunstone Mine |
Norway |
| 3 |
Deepcore Mine |
Peru |
| 4 |
Ironveil Mine |
Ghana |
extractions:
| mine_id |
date |
mineral |
quantity |
| 1 |
2024-03-10 |
Copper |
450 |
| 2 |
2024-03-10 |
Titanium |
300 |
| 3 |
2024-03-10 |
Copper |
550 |
| 4 |
2024-03-11 |
Bauxite |
700 |
| 1 |
2024-03-11 |
Copper |
600 |
| 2 |
2024-03-11 |
Titanium |
400 |
Expected Output:
| location |
mineral |
total_quantity |
| Ghana |
Bauxite |
700 |
| Norway |
Titanium |
700 |
| Peru |
Copper |
1600 |
Note: Peru has two mines (Ridgeback and Deepcore) both extracting Copper, so their quantities are summed together. Norway's two Titanium extractions on different dates are also combined.
WITH aggregated AS (
SELECT
m.location,
e.mineral,
SUM(e.quantity) AS total_quantity
FROM {{ ref("mines") }} m
INNER JOIN {{ ref("extractions") }} e
ON m.id = e.mine_id
GROUP BY m.location, e.mineral
)
SELECT *
FROM aggregated
ORDER BY location ASC, mineral ASC
Explanation
Step 1: Joining the two tables
FROM {{ ref("mines") }} m
INNER JOIN {{ ref("extractions") }} e
ON m.id = e.mine_id
An INNER JOIN combines rows from mines and extractions where the mine identifier matches. Each extraction record gets paired with its mine's location and name. Since we only need rows that exist in both tables, INNER JOIN is the right choice over LEFT JOIN.
Step 2: Aggregating with GROUP BY
SELECT
m.location,
e.mineral,
SUM(e.quantity) AS total_quantity
...
GROUP BY m.location, e.mineral
GROUP BY collapses all rows that share the same location and mineral into a single row. SUM(e.quantity) adds up the extraction quantities within each group. Every non-aggregated column in the SELECT must appear in the GROUP BY clause.
Step 3: Sorting the results
ORDER BY location ASC, mineral ASC
ORDER BY with multiple columns sorts by location first, then by mineral within each location. ASC is the default direction but writing it explicitly makes the intent clear. You could also write ORDER BY 1, 2 using column positions, though named columns are more readable.
Step 4: Alternative without a CTE
You can write the same query as a flat statement without a CTE:
SELECT
m.location,
e.mineral,
SUM(e.quantity) AS total_quantity
FROM {{ ref("mines") }} m
INNER JOIN {{ ref("extractions") }} e
ON m.id = e.mine_id
GROUP BY m.location, e.mineral
ORDER BY location ASC, mineral ASC
Both versions produce identical results. The CTE approach separates the aggregation logic from the final ordering, which becomes more useful when you need to chain multiple transformation steps.