Scenario
You work as a data analyst for a government agency that tracks annual budget allocations and actual spending across departments.
Task
Write a Snowflake SQL query that:
- Joins
{{ ref("budgets") }} with {{ ref("spending") }} on both department and fiscal_year
- Calculates
budget_variance as CAST(VARIANCE(amount_budgeted) AS INTEGER) grouped by department
- Calculates
spending_variance as CAST(VARIANCE(amount_spent) AS INTEGER) grouped by department
- Returns
department, budget_variance, and spending_variance
Schema
budgets
| Column |
Type |
Description |
| department |
String |
Name of the government department |
| fiscal_year |
Integer |
Fiscal year of the allocation |
| amount_budgeted |
Float |
Total budget allocated (in millions) |
spending
| Column |
Type |
Description |
| department |
String |
Name of the government department |
| fiscal_year |
Integer |
Fiscal year of the expenditure |
| amount_spent |
Float |
Total amount spent (in millions) |
Example
budgets:
| department |
fiscal_year |
amount_budgeted |
| Engineering |
2022 |
400.0 |
| Engineering |
2023 |
460.0 |
| Marketing |
2022 |
300.0 |
| Marketing |
2023 |
350.0 |
spending:
| department |
fiscal_year |
amount_spent |
| Engineering |
2022 |
380.0 |
| Engineering |
2023 |
440.0 |
| Marketing |
2022 |
280.0 |
| Marketing |
2023 |
320.0 |
Expected Output:
| department |
budget_variance |
spending_variance |
| Engineering |
1800 |
1800 |
| Marketing |
1250 |
800 |
Note: Variance here uses sample variance (divides by N-1, not N). The result is a whole number with no decimals. Results are ordered alphabetically by department.
WITH combined AS (
SELECT
b.department,
b.amount_budgeted,
s.amount_spent
FROM {{ ref("budgets") }} b
INNER JOIN {{ ref("spending") }} s
ON b.department = s.department
AND b.fiscal_year = s.fiscal_year
),
aggregated AS (
SELECT
department,
CAST(VARIANCE(amount_budgeted) AS INTEGER) AS budget_variance,
CAST(VARIANCE(amount_spent) AS INTEGER) AS spending_variance
FROM combined
GROUP BY department
)
SELECT * FROM aggregated
ORDER BY department
Explanation
Step 1: Join the two tables on department and fiscal year
FROM {{ ref("budgets") }} b
INNER JOIN {{ ref("spending") }} s
ON b.department = s.department
AND b.fiscal_year = s.fiscal_year
The INNER JOIN matches each budget row with its corresponding spending row for the same department and year. Rows that exist in only one table (for example, a department that appears in spending but not in budgets) are excluded. Joining on both columns ensures you pair the correct budget and spending figures before aggregating.
Step 2: Compute sample variance with VARIANCE()
CAST(VARIANCE(amount_budgeted) AS INTEGER) AS budget_variance,
CAST(VARIANCE(amount_spent) AS INTEGER) AS spending_variance
Snowflake's VARIANCE() function computes sample variance, which divides the sum of squared deviations by N-1 (where N is the number of values in the group). This is the standard unbiased estimator for population variance from a sample. The alias VAR_SAMP() is equivalent.
Step 3: Cast to INTEGER
CAST(VARIANCE(...) AS INTEGER)
VARIANCE() returns a float. Wrapping it in CAST(... AS INTEGER) truncates the decimal portion to produce a whole number. For example, a variance of 1733.33 becomes 1733. This matches the expected output schema.
Step 4: Group by department
GROUP BY department
Grouping by department produces one row per department with the aggregated variance values across all matched fiscal years.
Alternative: Single CTE approach
You can combine the join and aggregation into a single CTE:
WITH result AS (
SELECT
b.department,
CAST(VARIANCE(b.amount_budgeted) AS INTEGER) AS budget_variance,
CAST(VARIANCE(s.amount_spent) AS INTEGER) AS spending_variance
FROM {{ ref("budgets") }} b
INNER JOIN {{ ref("spending") }} s
ON b.department = s.department
AND b.fiscal_year = s.fiscal_year
GROUP BY b.department
)
SELECT * FROM result
ORDER BY department
This is more concise and produces the same result. The two-CTE version separates the join logic from the aggregation, which can be easier to read and debug for more complex queries.