Anonymize User PII Data for a Social Media Platform
Beginner Mode
Scenario
A social media company stores user information including email addresses and phone numbers, and needs to protect user privacy before sharing the data.
Task
Write a Snowflake SQL query that:
- Reads from
{{ ref("input_df") }} - Extracts the domain name from the
emailcolumn (everything after the@symbol) and aliases it asemail_domain - Anonymizes the
phonecolumn by replacing the first 6 digits with asterisks (******) and keeping the last 4 digits, aliased asanon_phone - Returns
user_id,email_domain, andanon_phone
Schema
input_df
| Column | Type | Description |
|---|---|---|
| user_id | Integer | Unique user identifier |
| String | User email address | |
| phone | Integer | 10-digit phone number |
Example
input_df:
| user_id | phone | |
|---|---|---|
| 1 | [email protected] | 5551234567 |
| 2 | [email protected] | 5559876543 |
| 3 | [email protected] | 5551239876 |
| 4 | [email protected] | 5554567890 |
| 5 | [email protected] | 5559871234 |
Expected Output:
| user_id | email_domain | anon_phone |
|---|---|---|
| 1 | example.com | ******4567 |
| 2 | domain.net | ******6543 |
| 3 | email.org | ******9876 |
| 4 | site.com | ******7890 |
| 5 | platform.io | ******1234 |
Note: The email domain is the portion after the
@symbol. The phone is masked so that only the last 4 digits are visible.
Code Environment
Sign in or try as guest to run your code.
Track
| Question | Difficulty | Company | Access |
|---|
Need more practice in this area? Explore more questions →
Atlassian