Objective
You are given a DataFrame containing information about webpages and their SEO (Search Engine Optimization) scores.
Task
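Write a function that returns, for each domain, the page with the highest SEO score, along with the page(s) that hold the highest SEO score across all domains. The result has one row per domain. The overall columns are filled only on rows whose domain maximum equals the global maximum, and are null otherwise. Save your result as result_df.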
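The per-domain half of the task maps naturally onto a group-wise argmax. Below is a minimal sketch. It assumes pandas, because the task does not name the DataFrame library, and `highest_per_domain` is a hypothetical helper name. The task also does not say how to break ties within a domain; `idxmax` keeps the first row that reaches the maximum.

```python
import pandas as pd


def highest_per_domain(pages: pd.DataFrame) -> pd.DataFrame:
    """Return one row per domain with its top-scoring URL and score.

    Hypothetical helper; assumes pandas and the columns domain, url, seo_score.
    Ties within a domain resolve to the first row encountered (idxmax behavior).
    """
    # Index label of the row with the maximum seo_score in each domain
    top_idx = pages.groupby("domain")["seo_score"].idxmax().to_numpy()
    top = pages.loc[top_idx, ["domain", "url", "seo_score"]]
    # Rename to the expected output column names
    return top.rename(
        columns={"url": "highest_seo_page", "seo_score": "highest_seo_score"}
    ).reset_index(drop=True)
```

`groupby` sorts domains by default, so the rows come back in alphabetical domain order.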
File Path
- Dataset: /home/interview/pages.csv
- Starter script: /home/interview/seo_optimization.py
Schema
pages.csv
| Column Name | Data Type | Description |
|---|---|---|
| domain | string | The domain name of the webpage |
| url | string | The URL of the webpage |
| seo_score | integer | The SEO score of the webpage |
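The schema can be enforced when the file is read. This is a minimal sketch. It assumes pandas and that pages.csv has a header row matching the column names, and neither point is stated in the task. `load_pages` and `PAGE_DTYPES` are hypothetical names.

```python
import pandas as pd

# Hypothetical mapping of the documented schema to pandas dtypes
PAGE_DTYPES = {"domain": str, "url": str, "seo_score": "int64"}


def load_pages(path_or_buffer) -> pd.DataFrame:
    """Read pages.csv and enforce the documented column types.

    path_or_buffer is anything pd.read_csv accepts, for example the dataset
    path /home/interview/pages.csv or an in-memory text buffer.
    """
    pages = pd.read_csv(path_or_buffer, dtype=PAGE_DTYPES)
    # Keep only the schema columns, in schema order
    return pages[list(PAGE_DTYPES)]
```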
Expected Output Schema
| Column Name | Data Type | Description |
|---|---|---|
| domain | string | The domain name of the webpage |
| highest_seo_page | string | The URL of the webpage with the highest SEO score within the domain |
| highest_seo_score | integer | The SEO score of the webpage with the highest SEO score within the domain |
| overall_highest_page | string | The URL of the webpage with the highest SEO score among all domains |
| overall_highest_score | double | The SEO score of the webpage with the highest SEO score among all domains |
Constraints
- The input DataFrame will have at least 1 row and at most 1000 rows.
- The seo_score column will have values in the range 0 to 100 (inclusive).
- The domain and url columns will have values of at most 255 characters.
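The task does not ask for input validation. Even so, checking the stated constraints up front makes unexpected input fail loudly instead of producing a quietly wrong result. This is an optional sketch that assumes pandas and string-valued domain and url columns. `check_constraints` is a hypothetical helper name.

```python
import pandas as pd


def check_constraints(pages: pd.DataFrame) -> None:
    """Raise ValueError if pages violates the stated input constraints (hypothetical helper)."""
    # Between 1 and 1000 rows
    if not 1 <= len(pages) <= 1000:
        raise ValueError(f"expected 1..1000 rows, got {len(pages)}")
    # seo_score must lie in [0, 100], inclusive on both ends
    if not pages["seo_score"].between(0, 100).all():
        raise ValueError("seo_score outside 0..100")
    # domain and url are limited to 255 characters
    for col in ("domain", "url"):
        if (pages[col].str.len() > 255).any():
            raise ValueError(f"{col} longer than 255 characters")
```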
Example
Given this sample input:
pages
| domain | url | seo_score |
|---|---|---|
| example.com | https://www.example.com/page1 | 88 |
| example.com | https://www.example.com/page2 | 92 |
| example.com | https://www.example.com/page3 | 80 |
| example.net | https://www.example.net/page1 | 75 |
| example.net | https://www.example.net/page2 | 90 |
| example.org | https://www.example.org/page1 | 82 |
| example.org | https://www.example.org/page2 | 85 |
The output would be:
| domain | highest_seo_page | highest_seo_score | overall_highest_page | overall_highest_score |
|---|---|---|---|---|
| example.com | https://www.example.com/page2 | 92 | https://www.example.com/page2 | 92.0 |
| example.net | https://www.example.net/page2 | 90 | null | null |
| example.org | https://www.example.org/page2 | 85 | null | null |
Example.com's page2 scores highest within its domain (92) and is also the highest across all domains, so it fills the overall columns. Example.net and example.org have their own domain maxima (90 and 85), but neither matches the global maximum, so their overall columns are null.
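The example can be reproduced end to end. This sketch assumes pandas and uses a hypothetical function name, `find_highest_seo_pages`. It represents the example's null cells as NaN. The overall score is cast to float64 before masking, which mirrors the double type in the output schema, because a NumPy-backed integer column cannot hold NaN. If several domains tie at the global maximum, every one of those rows gets its overall columns filled; the task does not say whether that is intended.

```python
import pandas as pd


def find_highest_seo_pages(pages: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical solution sketch; assumes pandas and the pages.csv schema."""
    # One row per domain: the URL and score of its best page (first row wins ties)
    top_idx = pages.groupby("domain")["seo_score"].idxmax().to_numpy()
    top = pages.loc[top_idx, ["domain", "url", "seo_score"]]
    result = top.rename(
        columns={"url": "highest_seo_page", "seo_score": "highest_seo_score"}
    ).reset_index(drop=True)

    # Only rows whose domain maximum equals the global maximum keep overall values
    is_global = result["highest_seo_score"] == pages["seo_score"].max()
    result["overall_highest_page"] = result["highest_seo_page"].where(is_global)
    # Cast to float64 explicitly so unmatched rows can hold NaN (null in the example)
    result["overall_highest_score"] = (
        result["highest_seo_score"].astype("float64").where(is_global)
    )
    return result


# Sample input from the example above
pages = pd.DataFrame(
    {
        "domain": ["example.com"] * 3 + ["example.net"] * 2 + ["example.org"] * 2,
        "url": [
            "https://www.example.com/page1",
            "https://www.example.com/page2",
            "https://www.example.com/page3",
            "https://www.example.net/page1",
            "https://www.example.net/page2",
            "https://www.example.org/page1",
            "https://www.example.org/page2",
        ],
        "seo_score": [88, 92, 80, 75, 90, 82, 85],
    }
)
result_df = find_highest_seo_pages(pages)
```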