Scrape Meta Tags from Multiple Web Pages into JSON Dataset
Scenario
Several web pages need to be analyzed for their meta tag information. You need to scrape these pages and compile the metadata into a structured format.
Task
Write a Python script at /home/interview/scrape_meta.py that reads URLs from /home/interview/urls.txt, scrapes each page to extract all meta tags with a name attribute, and saves the compiled data to /home/interview/meta_data.json as a JSON array.
Note: BeautifulSoup is already installed for HTML parsing. Each page may have different meta tags.
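One possible shape for such a script is sketched below. It is a minimal sketch under stated assumptions, not a reference solution. To stay dependency-free it parses with the standard library's html.parser instead of BeautifulSoup; with BeautifulSoup the same selection is usually written soup.find_all("meta", attrs={"name": True}). The helper names (MetaTagParser, extract_meta, fetch, read_urls, scrape_all, main) are hypothetical. The sketch also assumes that urls.txt holds one URL per line, that a meta tag without a content attribute maps to an empty string, and that a repeated name keeps its last value. None of these details are specified by the task.

```python
import json
import urllib.request
from html.parser import HTMLParser

# Paths taken from the task statement.
URLS_FILE = "/home/interview/urls.txt"
OUTPUT_FILE = "/home/interview/meta_data.json"


class MetaTagParser(HTMLParser):
    """Collects name/content pairs from <meta name="..."> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name:  # skips charset, http-equiv, and property-only tags
            # Assumption: a missing content attribute becomes "".
            self.meta[name] = attr_map.get("content") or ""


def extract_meta(html):
    """Return {name: content} for every meta tag that has a name attribute."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta


def fetch(url, timeout=10):
    """Download a page and decode it using the declared charset if present."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def read_urls(path):
    """Assumption: one URL per line; blank lines are ignored."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def scrape_all(urls, fetch_page=fetch):
    """Build the list of {"url": ..., "meta": {...}} records, in input order."""
    return [{"url": url, "meta": extract_meta(fetch_page(url))} for url in urls]


def main():
    records = scrape_all(read_urls(URLS_FILE))
    with open(OUTPUT_FILE, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)
```

Ending the file with a call to main() under an if __name__ == "__main__": guard turns the sketch into a runnable script. Passing fetch_page as a parameter lets the parsing logic be exercised with canned HTML instead of live requests. Network errors are left unhandled here, so a single unreachable URL stops the run; whether to skip or record such pages is a choice the task leaves open.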
Example
Expected output format in /home/interview/meta_data.json:
[
  {
    "url": "http://pages.local/page1",
    "meta": {
      "description": "A comprehensive guide to...",
      "keywords": "python, web scraping, tutorial",
      "author": "John Doe"
    }
  },
  {
    "url": "http://pages.local/page2",
    "meta": {
      "category": "Technology",
      "rating": "5 stars"
    }
  }
]
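A quick way to sanity-check a generated file against this format is sketched below. The function name is_valid_dataset is hypothetical, and the rules it enforces (each entry has exactly the keys "url" and "meta", and every meta value is a string) are inferred from the example above rather than stated by the task.

```python
import json


def is_valid_dataset(data):
    """Check that data looks like the example: a list of {"url", "meta"} objects."""
    if not isinstance(data, list):
        return False
    for entry in data:
        # Assumption: entries carry exactly the two keys shown in the example.
        if not isinstance(entry, dict) or set(entry) != {"url", "meta"}:
            return False
        if not isinstance(entry["url"], str) or not isinstance(entry["meta"], dict):
            return False
        # Meta names and contents are expected to be plain strings.
        for name, content in entry["meta"].items():
            if not isinstance(name, str) or not isinstance(content, str):
                return False
    return True


sample = json.loads("""
[
  {"url": "http://pages.local/page1",
   "meta": {"keywords": "python, web scraping, tutorial", "author": "John Doe"}},
  {"url": "http://pages.local/page2",
   "meta": {"category": "Technology", "rating": "5 stars"}}
]
""")
print(is_valid_dataset(sample))
```

To check real output, load /home/interview/meta_data.json with json.load and pass the result to the same function.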
Twilio