Scenario
You work for a video streaming company and have two tables: one recording daily watch activity and another tracking subscription periods for each user.
Task
Write a Snowflake SQL query that:
- Handles
NULL values in the end_date column of {{ ref("subscriptions") }} (NULL means the subscription is still active) by replacing them with a far-future date
- Joins
{{ ref("watch_history") }} with the normalized subscriptions on user_id, keeping only rows where watch_date falls between start_date and the normalized end_date (inclusive)
- Groups by
user_id and sums watch_duration as total_watch_time
- Returns
user_id and total_watch_time
Schema
watch_history
| Column |
Type |
Description |
| user_id |
String |
Unique identifier for the user |
| watch_duration |
Integer |
Minutes watched on that date |
| watch_date |
String |
Date content was watched (YYYY-MM-DD) |
subscriptions
| Column |
Type |
Description |
| user_id |
String |
Unique identifier for the user |
| start_date |
String |
Subscription start date (YYYY-MM-DD) |
| end_date |
Date |
Subscription end date, or NULL if still active |
Example
watch_history:
| user_id |
watch_duration |
watch_date |
| A1 |
30 |
2024-06-01 |
| A1 |
55 |
2024-06-05 |
| A2 |
40 |
2024-06-03 |
| A2 |
25 |
2024-06-10 |
| A3 |
60 |
2024-06-15 |
subscriptions:
| user_id |
start_date |
end_date |
| A1 |
2024-06-01 |
2024-06-07 |
| A2 |
2024-06-01 |
NULL |
| A3 |
2024-06-10 |
2024-06-20 |
Expected Output:
| user_id |
total_watch_time |
| A1 |
85 |
| A2 |
65 |
| A3 |
60 |
Note: A1 watched 30 + 55 = 85 minutes, both dates within Jun 1 to Jun 7. A2's subscription has no end date (still active), so all watch dates qualify (40 + 25 = 65). A3 watched on Jun 15, which falls within Jun 10 to Jun 20 (60 minutes).
WITH normalized_subs AS (
SELECT
user_id,
start_date,
COALESCE(end_date, '9999-12-31'::DATE) AS end_date
FROM {{ ref("subscriptions") }}
),
active_watches AS (
SELECT
w.user_id,
w.watch_duration
FROM {{ ref("watch_history") }} w
INNER JOIN normalized_subs s
ON w.user_id = s.user_id
WHERE w.watch_date >= s.start_date
AND w.watch_date <= s.end_date
)
SELECT
user_id,
SUM(watch_duration) AS total_watch_time
FROM active_watches
GROUP BY user_id
Explanation
Step 1: Handling NULL end dates for active subscriptions
COALESCE(end_date, '9999-12-31'::DATE) AS end_date
The end_date column is NULL for users whose subscription is still active. Since you cannot compare a watch date against NULL (any comparison with NULL returns NULL, not true/false), replace it with a far-future date like 9999-12-31. This ensures any realistic watch date will fall before the normalized end date. COALESCE returns the first non-NULL argument.
Step 2: Joining watch records with subscription periods
FROM {{ ref("watch_history") }} w
INNER JOIN normalized_subs s
ON w.user_id = s.user_id
WHERE w.watch_date >= s.start_date
AND w.watch_date <= s.end_date
The INNER JOIN pairs each watch record with all of that user's subscription periods. The WHERE clause then filters to only the rows where the watch date falls within the subscription window (inclusive on both boundaries). If a user has multiple subscriptions, a single watch record can match multiple periods, but since the periods should not overlap, each record will match at most one subscription.
Step 3: Aggregating total watch time per user
SELECT
user_id,
SUM(watch_duration) AS total_watch_time
FROM active_watches
GROUP BY user_id
After filtering to active-subscription watches only, SUM(watch_duration) grouped by user_id produces the total minutes. Users with no qualifying watch records (all their watch dates fall outside their subscription windows) do not appear in the result because the inner join and filter eliminate all their rows.
Step 4: Alternative using IFF
Instead of COALESCE, you can use IFF to check for NULL explicitly:
IFF(end_date IS NULL, '9999-12-31'::DATE, end_date) AS end_date
Both produce identical results. COALESCE is more concise for simple NULL replacement, while IFF (or CASE WHEN) supports more complex conditional logic.