Government Budgeting Variance

Objective

You are a data scientist working for the federal government. Your task is to analyze budget and spending data across departments to identify financial volatility. You have been given two DataFrames, one holding budgets and one holding spending.

Task

Write a PySpark function that calculates the sample variance of the budget and of the spending for each department across the years.

Combine the two DataFrames, group the result by Department, and compute the sample variance of both Budget and Spending. Cast both variance columns to Integer.

Save the resulting DataFrame as result_df. Its columns must appear in the order given in the expected output schema, and its rows must be sorted alphabetically by Department.

File Path

  • Budget Dataset: /home/interview/budget.csv
  • Spending Dataset: /home/interview/spending.csv
  • Starter script: /home/interview/gov_budget.py

Schema

budget.csv

Column Name    Data Type
Department     String
Year           Integer
Budget         Double

spending.csv

Column Name    Data Type
Department     String
Year           Integer
Spending       Double

Expected Output Schema

Column Name          Data Type
Department           String
Budget_Variance      Integer
Spending_Variance    Integer

Example

Given this sample input:

budget_df

Department    Year    Budget
Health        2019    750.0
Education     2019    500.0
Health        2020    800.0
Education     2020    550.0

spending_df

Department    Year    Spending
Health        2019    700.0
Education     2019    450.0
Health        2020    780.0
Education     2020    540.0

The expected output would be:

Department    Budget_Variance    Spending_Variance
Education     1250               4050
Health        1250               3200

Explanation: For the Health department's spending, the mean of 700 and 780 is 740, so the sample variance is ((700 - 740)^2 + (780 - 740)^2) / (2 - 1) = (1600 + 1600) / 1 = 3200.
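
The same arithmetic can be checked for all four figures without Spark, using the standard library's `statistics.variance`, which also divides by n - 1. This is only a sanity check on the example numbers; the dictionary layout and the name `expected_rows` are illustrative.

```python
from statistics import variance

# Values copied from the example budget_df and spending_df tables.
example = {
    "Education": {"Budget": [500.0, 550.0], "Spending": [450.0, 540.0]},
    "Health": {"Budget": [750.0, 800.0], "Spending": [700.0, 780.0]},
}

# One (Department, Budget_Variance, Spending_Variance) tuple per department,
# sorted alphabetically and truncated to integers like the expected output.
expected_rows = [
    (dept, int(variance(values["Budget"])), int(variance(values["Spending"])))
    for dept, values in sorted(example.items())
]

for row in expected_rows:
    print(row)
```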
