Clean Transactional Data
Beginner Mode
Scenario
You work at a financial services company that receives transactional data from various clients, but both tables contain invalid and duplicate primary keys that must be cleaned before joining.
Task
Write a Snowflake SQL query that:
- Filters
{{ ref("clients") }}to keep only rows whereclient_id > 0, then deduplicates byclient_id(keep the first occurrence usingROW_NUMBER) - Filters
{{ ref("transactions") }}to keep only rows wheretransaction_id > 0, then removes anytransaction_idthat appears more than once (drop all copies, not just extras) - Joins the cleaned transactions with the cleaned clients on
client_idusingINNER JOIN - Returns all six columns, casting
amounttoINTEGER
Schema
transactions
| Column | Type | Description |
|---|---|---|
| transaction_id | Integer | Unique identifier for each transaction |
| client_id | Integer | Foreign key referencing the clients table |
| date | String | Date of the transaction |
| amount | Float | Transaction amount in dollars |
clients
| Column | Type | Description |
|---|---|---|
| client_id | Integer | Unique identifier for each client |
| client_name | String | Name of the client |
| industry | String | Industry the client operates in |
Example
transactions:
| transaction_id | client_id | date | amount |
|---|---|---|---|
| 101 | 10 | 2024-03-01 | 500.0 |
| 102 | 10 | 2024-03-02 | 750.0 |
| 102 | 20 | 2024-03-03 | 300.0 |
| 103 | 30 | 2024-03-04 | 1200.0 |
| -104 | 40 | 2024-03-05 | 800.0 |
| 105 | 20 | 2024-03-06 | 950.0 |
clients:
| client_id | client_name | industry |
|---|---|---|
| 10 | Nexora | SaaS |
| 20 | Brightline | Fintech |
| 10 | Dawnfield | Healthcare |
| 30 | Crestview | Logistics |
| -50 | Vantage | Retail |
Expected Output:
| transaction_id | client_id | date | amount | client_name | industry |
|---|---|---|---|---|---|
| 101 | 10 | 2024-03-01 | 500 | Nexora | SaaS |
| 103 | 30 | 2024-03-04 | 1200 | Crestview | Logistics |
| 105 | 20 | 2024-03-06 | 950 | Brightline | Fintech |
Note: Transaction 102 is dropped because two rows share that ID (both copies removed). Transaction -104 is dropped for having a negative ID. Client 10 is duplicated, so only the first occurrence (Nexora) is kept. Client -50 is dropped for having a negative ID.
Code Environment
Sign in or try as guest to run your code.
Track
| Question | Difficulty | Company | Access |
|---|
Need more practice in this area? Explore more questions →
Vercel