Twilio: Scrape Meta Tags from Multiple Web Pages into JSON Dataset — Data Engineering Interview Q&A (2026)

Scenario

Several web pages need to be analyzed for their meta tag information. Scrape each page and compile the metadata into a structured format.

Task

Write a Python script at /home/interview/scrape_meta.py that reads URLs from /home/interview/urls.txt, scrapes each page to extract all meta tags with a name attribute, and saves the compiled data to /home/interview/meta_data.json as a JSON array.

Note: BeautifulSoup is already installed for HTML parsing. Each page may have different meta tags.

Example

Expected output format in /home/interview/meta_data.json:

[
  {
    "url": "http://pages.local/page1",
    "meta": {
      "description": "A comprehensive guide to...",
      "keywords": "python, web scraping, tutorial",
      "author": "John Doe"
    }
  },
  {
    "url": "http://pages.local/page2",
    "meta": {
      "category": "Technology",
      "rating": "5 stars"
    }
  }
]

Step 1: Examine the URLs file

cat /home/interview/urls.txt

The file lists the URLs to scrape, one per line.
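
If the file might contain blank or malformed lines, a small filter along these lines could be applied before scraping. This is only a sketch: the `clean_urls` helper and the sample lines are hypothetical, and whether malformed entries should be skipped rather than reported is an assumption the task does not state.

```python
from urllib.parse import urlparse


def clean_urls(lines):
    """Return stripped http(s) URLs, skipping blank or malformed lines (hypothetical helper)."""
    urls = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # ignore blank lines
        parts = urlparse(url)
        # Assumption: only absolute http/https URLs are worth fetching
        if parts.scheme in ("http", "https") and parts.netloc:
            urls.append(url)
    return urls


# Made-up sample input standing in for the lines of urls.txt
sample_lines = [
    "http://pages.local/page1\n",
    "\n",
    "  http://pages.local/page2  \n",
    "not a url\n",
]
print(clean_urls(sample_lines))
```

In the script itself, the same function could be given the open file object, since iterating over a file yields its lines.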

Step 2: Create the Python script

nano /home/interview/scrape_meta.py

Write a script to scrape all meta tags from each URL:

import requests
from bs4 import BeautifulSoup
import json

# Read URLs from file
with open('/home/interview/urls.txt', 'r') as f:
    urls = [line.strip() for line in f if line.strip()]

# Collect meta data from each page
meta_data = []

for url in urls:
    # Fetch the page; the timeout keeps one unresponsive server from hanging the run
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Extract all meta tags with name attribute
    meta_tags = soup.find_all('meta', attrs={'name': True})
    
    # Build meta dictionary
    meta_dict = {}
    for tag in meta_tags:
        name = tag.get('name')
        content = tag.get('content', '')
        if name:
            meta_dict[name] = content
    
    # Build data object
    page_data = {
        'url': url,
        'meta': meta_dict
    }
    
    meta_data.append(page_data)

# Save to JSON file
with open('/home/interview/meta_data.json', 'w') as f:
    json.dump(meta_data, f, indent=2)

print(f"Scraped meta data from {len(meta_data)} pages")
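
To see which tags the `name` filter keeps, it can help to run the same selection rule on a small inline page. The sketch below deliberately uses the standard library's `html.parser` instead of BeautifulSoup so it runs without network access or extra packages; it is not the script's approach, only an illustration of the same rule. The `MetaNameParser` class, the `extract_named_meta` function, and the sample HTML are all hypothetical.

```python
from html.parser import HTMLParser


class MetaNameParser(HTMLParser):
    """Collect name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name:  # charset, property, and http-equiv tags have no name and are skipped
            # A missing content attribute becomes an empty string, as in the script
            self.meta[name] = attr_map.get("content") or ""


def extract_named_meta(html):
    """Return a dict of meta name -> content for one HTML document."""
    parser = MetaNameParser()
    parser.feed(html)
    parser.close()
    return parser.meta


# Made-up page used only for illustration
sample_html = """
<html><head>
  <meta charset="utf-8">
  <meta name="description" content="Sample page">
  <meta property="og:title" content="Ignored">
  <meta name="author" content="Jane Roe" />
  <meta name="robots">
</head><body></body></html>
"""

print(extract_named_meta(sample_html))
```

Only the `description`, `author`, and `robots` entries survive; the `charset` and Open Graph `property` tags are dropped because they carry no `name` attribute.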

Step 3: Run the script

python3 /home/interview/scrape_meta.py

Step 4: Verify the output

cat /home/interview/meta_data.json

The output should be a JSON array with one object per URL, each holding that page's named meta tags. The fields differ from page to page.
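
Beyond reading the file by eye, the shape can be checked programmatically. The following is a minimal sketch, assuming the expected format shown in the example (an array of objects with a string `url` and a `meta` object of string values); the `find_problems` and `check_file` helpers and the sample data are hypothetical.

```python
import json


def find_problems(records):
    """Return a list of readable problems; an empty list means the shape looks right."""
    if not isinstance(records, list):
        return ["top-level value is not a JSON array"]
    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"entry {i} is not an object")
            continue
        if not isinstance(record.get("url"), str):
            problems.append(f"entry {i} has no string 'url'")
        meta = record.get("meta")
        if not isinstance(meta, dict):
            problems.append(f"entry {i} has no 'meta' object")
        elif not all(isinstance(v, str) for v in meta.values()):
            problems.append(f"entry {i} has a non-string meta value")
    return problems


def check_file(path):
    """Load a JSON file and run the shape check on it."""
    with open(path, "r", encoding="utf-8") as f:
        return find_problems(json.load(f))


# Made-up data: the second entry is missing its 'meta' object
sample = json.loads(
    '[{"url": "http://pages.local/page1", "meta": {"author": "John Doe"}},'
    ' {"url": "http://pages.local/page2"}]'
)
print(find_problems(sample))
```

Calling `check_file('/home/interview/meta_data.json')` after running the script would apply the same check to the real output.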
