Parse HTML and Extract External Domain Links

Scenario

A web page contains numerous hyperlinks to both internal pages and external websites. You need to extract and catalog all external domains for analysis.

Task

Write a Python script at /home/interview/extract_domains.py that fetches the HTML page at http://content.local, extracts every hyperlink, keeps only the external links (those pointing to a domain other than content.local), and saves the unique external domains to /home/interview/external_domains.txt. Write one domain per line and include the protocol (http:// or https://).

Note: BeautifulSoup is already installed for HTML parsing.
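
The link-extraction step could be sketched roughly as follows. This is one possible approach, not a reference solution: the function name extract_links and its base_url parameter are illustrative choices, and the HTML is assumed to have been fetched already (for example with urllib.request.urlopen). Relative links are resolved against the page URL so that later steps only deal with absolute URLs.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url="http://content.local"):
    """Return every <a href> in the page as an absolute URL.

    extract_links and base_url are hypothetical names used for this sketch.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips <a> tags that have no href attribute at all.
    for anchor in soup.find_all("a", href=True):
        # urljoin resolves relative paths such as "/about" against the page
        # URL and leaves already-absolute URLs unchanged.
        links.append(urljoin(base_url, anchor["href"]))
    return links
```

Passing the page's own URL as base_url means an internal link written as "/about" comes back as "http://content.local/about", which makes the internal/external decision a plain hostname comparison.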

Example

Expected output format in /home/interview/external_domains.txt:

http://news.example.com
https://blog.sample.org
http://cdn.resources.net
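
A minimal sketch of how absolute URLs might be reduced to the format above, using only the standard library. The names external_domains, save_domains, and own_host are hypothetical. The sketch assumes that only http and https links count, that a host is internal only when it equals content.local exactly (subdomains are treated as external), and that first-seen order is acceptable, since the task does not specify an ordering.

```python
from urllib.parse import urlsplit


def external_domains(links, own_host="content.local"):
    """Return unique scheme://host strings for links that leave own_host.

    Assumption: order of first appearance is kept; the task does not say
    whether the output must be sorted.
    """
    seen = []
    for link in links:
        parts = urlsplit(link)
        # Skip mailto:, javascript:, and other non-web schemes, as well as
        # anything without a host.
        if parts.scheme not in ("http", "https") or not parts.hostname:
            continue
        # urlsplit lowercases .hostname, so this comparison ignores case.
        if parts.hostname == own_host:
            continue
        # Keep only the protocol and host part, dropping path and query.
        domain = f"{parts.scheme}://{parts.netloc.lower()}"
        if domain not in seen:
            seen.append(domain)
    return seen


def save_domains(domains, path):
    """Write one domain per line to path."""
    with open(path, "w") as handle:
        for domain in domains:
            handle.write(domain + "\n")
```

For the task above, the result of external_domains would be passed to save_domains together with /home/interview/external_domains.txt.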
