Tracking Model Usage

Objective

As an AI engineer at an innovative technology company, you are tracking the performance and utilization of various AI models developed over the years. You are given two DataFrames: df_models, containing model metadata, and df_usage, containing daily usage logs.

Task

Write a PySpark function that joins df_models to df_usage on Model_ID, keeping every model even when it has no usage rows (a left join). For each model, compute Total_Uses, the sum of Uses across all dates, using 0 for models with no usage logs. Also compute Average_Accuracy, the mean Accuracy of all models that share the same Model_Type.

Save the resulting DataFrame as result_df, with its columns in exactly the order listed under Expected Output Schema.

File Paths

  • Models Dataset: /home/interview/models.csv
  • Usage Dataset: /home/interview/usage.csv
  • Starter script: /home/interview/ml_metrics.py

Schema

models.csv

Column Name   Data Type
Model_ID      String
Model_Name    String
Model_Type    String
Accuracy      Float

usage.csv

Column Name   Data Type
Model_ID      String
Date          Date
Uses          Integer

Expected Output Schema

Column Name        Data Type
Model_ID           String
Model_Name         String
Model_Type         String
Accuracy           Float
Total_Uses         Integer
Average_Accuracy   Float

Example

Given this sample input:

df_models

Model_ID   Model_Name   Model_Type   Accuracy
M1         ModelA       Type1        0.85
M2         ModelB       Type2        0.78
M3         ModelC       Type1        0.88
M4         ModelD       Type3        0.92
M5         ModelE       Type2        0.82

df_usage

Model_ID   Date         Uses
M1         2023-01-01   100
M1         2023-01-02   120
M2         2023-01-01   200
M3         2023-01-01   150
M4         2023-01-02   130

The output would be:

Model_ID   Model_Name   Model_Type   Accuracy   Total_Uses   Average_Accuracy
M1         ModelA       Type1        0.85       220          0.865
M2         ModelB       Type2        0.78       200          0.8
M3         ModelC       Type1        0.88       150          0.865
M4         ModelD       Type3        0.92       130          0.92
M5         ModelE       Type2        0.82       0            0.8

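Note that M5 has no usage logs in df_usage, yet it still appears in the output with Total_Uses = 0: df_models is left-joined to df_usage, and the missing total is filled with 0. Its Average_Accuracy (0.8) is the average of all Type2 models (M2 at 0.78 and M5 at 0.82). Likewise, M1 has two usage rows (100 and 120), which sum to 220.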
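
The figures in the example can be checked by hand without Spark. The plain-Python sketch below reproduces the same arithmetic on the sample rows. It is only a sanity check of the expected values, not a substitute for the PySpark solution, and the variable names are illustrative.

```python
from collections import defaultdict

# Sample rows copied from the df_models and df_usage tables in the example.
models = [
    ("M1", "ModelA", "Type1", 0.85),
    ("M2", "ModelB", "Type2", 0.78),
    ("M3", "ModelC", "Type1", 0.88),
    ("M4", "ModelD", "Type3", 0.92),
    ("M5", "ModelE", "Type2", 0.82),
]
usage = [
    ("M1", "2023-01-01", 100),
    ("M1", "2023-01-02", 120),
    ("M2", "2023-01-01", 200),
    ("M3", "2023-01-01", 150),
    ("M4", "2023-01-02", 130),
]

# Total uses per model, summed over all dates.
total_uses = defaultdict(int)
for model_id, _date, uses in usage:
    total_uses[model_id] += uses

# Mean accuracy per model type.
accuracies_by_type = defaultdict(list)
for _model_id, _name, model_type, accuracy in models:
    accuracies_by_type[model_type].append(accuracy)
average_accuracy = {
    model_type: sum(values) / len(values)
    for model_type, values in accuracies_by_type.items()
}

# One output row per model; models with no usage default to 0 uses.
# Averages are rounded to 3 decimals to hide floating-point noise.
result_rows = [
    (
        model_id,
        name,
        model_type,
        accuracy,
        total_uses.get(model_id, 0),
        round(average_accuracy[model_type], 3),
    )
    for model_id, name, model_type, accuracy in models
]

for row in result_rows:
    print(row)
```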
