39. Anonymize User PII Data for a Social Media Platform
Beginner Mode
Scenario
A social media company stores user information including email addresses and phone numbers, and needs to protect user privacy before sharing the data.
Task
Write a Snowflake SQL query that:
- Reads from
{{ ref("input_df") }} - Extracts the domain name from the
emailcolumn (everything after the@symbol) and aliases it asemail_domain - Anonymizes the
phonecolumn by replacing the first 6 digits with asterisks (******) and keeping the last 4 digits, aliased asanon_phone - Returns
user_id,email_domain, andanon_phone
Schema
input_df
| Column | Type | Description |
|---|---|---|
| user_id | Integer | Unique user identifier |
| String | User email address | |
| phone | Integer | 10-digit phone number |
Example
input_df:
| user_id | phone | |
|---|---|---|
| 1 | [email protected] | 5551234567 |
| 2 | [email protected] | 5559876543 |
| 3 | [email protected] | 5551239876 |
| 4 | [email protected] | 5554567890 |
| 5 | [email protected] | 5559871234 |
Expected Output:
| user_id | email_domain | anon_phone |
|---|---|---|
| 1 | example.com | ******4567 |
| 2 | domain.net | ******6543 |
| 3 | email.org | ******9876 |
| 4 | site.com | ******7890 |
| 5 | platform.io | ******1234 |
Note: The email domain is the portion after the
@symbol. The phone is masked so that only the last 4 digits are visible.
Code Environment
Sign in or try as guest to run your code.
Essential
SQL 0/33
Spark 0/20
Snowflake 0/22
Python 0/24
Need more practice in this area? Explore more questions →
Atlassian
Revolut
Accenture
Adobe
Google
LinkedIn
Samsung
Datadog
Wix
Dropbox
Meta
OpenAI
Hulu
Uber
X
DoorDash
Anthropic
Amazon
ActivisionBlizzard
Vercel
Crypto.Com
Zscaler
DeutscheBank
Apple
GoDaddy
GitLab
BMW
PayPal
Snowflake
AMD
Twilio
JPMorgan
NVIDIA
IBM
Databricks
Coinbase
Cisco
Robinhood
Twitter
Microsoft
Palantir
Netflix
VMware
Cloudflare
Stripe
Lyft
Salesforce
GitHub
Bloomberg
Airbnb
Walmart
SAP
HashiCorp
Instacart
Mastercard
Intel
Visa
Tesla