Vercel: Customer Churn Across Three Sources — Data Analysis Interview Q&A (2026)

Customer Churn Across Three Sources

Vercel 🧠 Hard Snowflake Deduplication Multi-Table Joins Window Functions

Beginner Mode

Scenario

A company tracks customer churn across three separate data sources: user accounts, activity logs, and exit surveys. The activity logs contain duplicate records that must be cleaned before combining.

Task

Write a Snowflake SQL query that:

Deduplicates {{ ref("activities") }} by removing rows with identical user_id, activity_date, and activity_type (keep one row per unique combination)
Left joins the deduplicated activities and {{ ref("exit_surveys") }} onto {{ ref("accounts") }} using user_id
Returns all seven columns and sorts by user_id ascending, then activity_date descending

Schema

accounts

Column	Type	Description
user_id	String	Unique user identifier
account_created_date	Date	Date the account was created
location	String	User's city

activities

Column	Type	Description
user_id	String	User identifier
activity_date	Date	Date of the activity
activity_type	String	Type of activity performed

exit_surveys

Column	Type	Description
user_id	String	User identifier
exit_date	Date	Date the user churned
exit_reason	String	Reason given for leaving

Example

accounts:

user_id	account_created_date	location
A01	2024-01-10	Portland
A02	2024-01-18	Denver
A03	2024-02-03	Atlanta
A04	2024-02-20	Seattle
A05	2024-03-01	Miami

activities (with duplicates):

user_id	activity_date	activity_type
A01	2024-03-12	Login
A01	2024-03-12	Login
A02	2024-03-18	Export
A02	2024-03-18	Export
A03	2024-04-01	Deploy
A04	2024-04-10	Login
A01	2024-04-15	Deploy

exit_surveys:

user_id	exit_date	exit_reason
A01	2024-05-01	Switched provider
A03	2024-05-10	Too expensive
A04	2024-05-15	Missing features

Expected Output:

user_id	account_created_date	location	activity_date	activity_type	exit_date	exit_reason
A01	2024-01-10	Portland	2024-04-15	Deploy	2024-05-01	Switched provider
A01	2024-01-10	Portland	2024-03-12	Login	2024-05-01	Switched provider
A02	2024-01-18	Denver	2024-03-18	Export	NULL	NULL
A03	2024-02-03	Atlanta	2024-04-01	Deploy	2024-05-10	Too expensive
A04	2024-02-20	Seattle	2024-04-10	Login	2024-05-15	Missing features
A05	2024-03-01	Miami	NULL	NULL	NULL	NULL

Note: Duplicate activity rows are reduced to one per unique combination. A01 has two distinct activities so appears twice. A02 has no exit survey (NULLs). A05 has no activities and no exit survey (NULLs in both). Results are sorted by user_id ASC, then activity_date DESC.

WITH deduped_activities AS (
    SELECT
        user_id,
        activity_date,
        activity_type
    FROM {{ ref("activities") }}
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY user_id, activity_date, activity_type
        ORDER BY activity_date
    ) = 1
),
combined AS (
    SELECT
        a.user_id,
        a.account_created_date,
        a.location,
        act.activity_date,
        act.activity_type,
        e.exit_date,
        e.exit_reason
    FROM {{ ref("accounts") }} a
    LEFT JOIN deduped_activities act
        ON a.user_id = act.user_id
    LEFT JOIN {{ ref("exit_surveys") }} e
        ON a.user_id = e.user_id
)
SELECT *
FROM combined
ORDER BY user_id ASC, activity_date DESC

Explanation

Step 1: Deduplicate activities with QUALIFY and ROW_NUMBER

SELECT
    user_id,
    activity_date,
    activity_type
FROM {{ ref("activities") }}
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY user_id, activity_date, activity_type
    ORDER BY activity_date
) = 1

Snowflake's QUALIFY clause filters on the result of a window function without needing a subquery or CTE wrapper. ROW_NUMBER() assigns a sequential number within each partition of identical rows. Keeping only = 1 removes all duplicates. This is equivalent to SELECT DISTINCT on those three columns, but QUALIFY with ROW_NUMBER is the standard Snowflake pattern for deduplication and extends naturally to cases where you need to pick a specific row (e.g., the most recent).

Step 2: LEFT JOIN all three sources

FROM {{ ref("accounts") }} a
LEFT JOIN deduped_activities act
    ON a.user_id = act.user_id
LEFT JOIN {{ ref("exit_surveys") }} e
    ON a.user_id = e.user_id

Using {{ ref("accounts") }} as the base table with LEFT JOIN ensures every account appears in the output. Users without any activity records will have NULLs in the activity columns, and users without an exit survey will have NULLs in the survey columns. If a user has multiple distinct activities after deduplication, the LEFT JOIN produces one output row per activity.

Step 3: Sort the result

ORDER BY user_id ASC, activity_date DESC

The primary sort groups all rows for the same user together. The secondary sort (descending on activity_date) places the most recent activity first within each user. NULL activity dates (for users with no activities) naturally sort last in descending order.

Alternative: Using SELECT DISTINCT instead of QUALIFY

WITH deduped_activities AS (
    SELECT DISTINCT
        user_id,
        activity_date,
        activity_type
    FROM {{ ref("activities") }}
)
SELECT
    a.user_id,
    a.account_created_date,
    a.location,
    da.activity_date,
    da.activity_type,
    es.exit_date,
    es.exit_reason
FROM {{ ref("accounts") }} a
LEFT JOIN deduped_activities da
    ON a.user_id = da.user_id
LEFT JOIN {{ ref("exit_surveys") }} es
    ON a.user_id = es.user_id
ORDER BY a.user_id ASC, da.activity_date DESC

For this particular case, SELECT DISTINCT produces the same result since we want exactly one row per unique combination and do not need to choose between non-duplicate columns. The QUALIFY ROW_NUMBER() approach is more flexible for real-world deduplication where rows differ in some columns.

Quick Solution

Code Environment

Track

	Question	Difficulty	Company	Access

Need more practice in this area? Explore more questions →