PayPal: Scrape Multi-Page E-commerce Data with BeautifulSoup — Data Engineering Interview Q&A (2026)

Scrape Multi-Page E-commerce Data with BeautifulSoup

Scenario

A website has a product listing page with links to individual product detail pages. You need to scrape data from both the list page and detail pages to create a complete dataset.

Task

Write a Python script at /home/interview/scrape_products.py that scrapes the product listing at http://shop.local/products/, follows links to individual product pages, extracts information from both pages, and saves the combined data to /home/interview/products.csv.

Note: The beautifulsoup4 and requests packages are already installed.

Example

Expected output in /home/interview/products.csv:

id,name,price,brand,description,stock_status,rating
1,Wireless Mouse,$29.99,TechBrand,High-precision wireless mouse...,In Stock,4.5
2,Mechanical Keyboard,$89.99,KeyMaster,RGB mechanical keyboard...,In Stock,4.8
...

Step 1: Explore the website structure

curl http://shop.local/products/

The listing page shows product cards in a grid. Each detail page contains additional information like brand, description, stock status, and rating.
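
Before writing the full script, it can help to test the selectors on a small HTML snippet. The sketch below assumes listing markup inferred from the class names used later in this walkthrough (product-card, product-name, price, btn). The sample HTML and the parse_listing helper are illustrative, not the site's actual source.

```python
from bs4 import BeautifulSoup

# Hypothetical listing markup, inferred from the selectors used in this walkthrough.
SAMPLE_LISTING = """
<div class="product-card">
  <div class="product-name">Wireless Mouse</div>
  <div class="price">$29.99</div>
  <a class="btn" href="/products/1/">View Details</a>
</div>
<div class="product-card">
  <div class="product-name">Mechanical Keyboard</div>
  <div class="price">$89.99</div>
  <a class="btn" href="/products/2/">View Details</a>
</div>
"""


def parse_listing(html):
    """Return one dict per product card found in the listing HTML."""
    soup = BeautifulSoup(html, "html.parser")
    cards = []
    for card in soup.find_all("div", class_="product-card"):
        cards.append({
            "name": card.find("div", class_="product-name").get_text(strip=True),
            "price": card.find("div", class_="price").get_text(strip=True),
            "href": card.find("a", class_="btn")["href"],
        })
    return cards


cards = parse_listing(SAMPLE_LISTING)
print(cards[0])
```

If the real page uses different tags or class names, adjust the selectors to match what curl returns.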

Step 2: Create the scraping script

nano /home/interview/scrape_products.py

Write a script that scrapes both the list page and detail pages:

import csv
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Scrape the main product listing page
list_url = 'http://shop.local/products/'
response = requests.get(list_url, timeout=10)
response.raise_for_status()  # Stop early on a 4xx/5xx response
soup = BeautifulSoup(response.content, 'html.parser')

products = []

# Find all product cards
for card in soup.find_all('div', class_='product-card'):
    name = card.find('div', class_='product-name').text.strip()
    price = card.find('div', class_='price').text.strip()
    detail_link = card.find('a', class_='btn')['href']

    # Build full URL for detail page (handles relative and absolute hrefs)
    detail_url = urljoin(list_url, detail_link)

    # Scrape the detail page
    detail_response = requests.get(detail_url, timeout=10)
    detail_response.raise_for_status()
    detail_soup = BeautifulSoup(detail_response.content, 'html.parser')
    
    # Extract additional details from product-id div
    product_id_text = detail_soup.find('div', class_='product-id').text
    product_id = product_id_text.split('|')[0].replace('Product ID:', '').strip()
    
    brand = detail_soup.find('div', class_='brand').text.replace('Brand:', '').strip()
    description = detail_soup.find('div', class_='description').find('br').next_sibling.strip()
    
    stock_div = detail_soup.find('div', class_='stock')
    stock_status = stock_div.text.replace('Status:', '').strip()
    
    rating_text = detail_soup.find('div', class_='rating').text
    rating = rating_text.split()[1].split('/')[0]
    
    # Combine data from both pages
    product = {
        'id': product_id,
        'name': name,
        'price': price,
        'brand': brand,
        'description': description,
        'stock_status': stock_status,
        'rating': rating
    }
    products.append(product)

# Save to CSV
with open('/home/interview/products.csv', 'w', newline='', encoding='utf-8') as csvfile:
    fieldnames = ['id', 'name', 'price', 'brand', 'description', 'stock_status', 'rating']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    
    writer.writeheader()
    writer.writerows(products)

print(f"Scraped {len(products)} products")
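
The string handling in the detail-page section is easier to follow on a concrete example. The sketch below applies the same extraction steps to a hypothetical detail page. The sample markup (including the "| SKU: ..." suffix and the "Rating: 4.5/5" format) is an assumption chosen to match what the script's parsing implies, and parse_detail is an illustrative helper.

```python
from bs4 import BeautifulSoup

# Hypothetical detail-page markup; the real page may differ.
SAMPLE_DETAIL = """
<div class="product-id">Product ID: 1 | SKU: WM-001</div>
<div class="brand">Brand: TechBrand</div>
<div class="description"><strong>Description:</strong><br>High-precision wireless mouse</div>
<div class="stock">Status: In Stock</div>
<div class="rating">Rating: 4.5/5</div>
"""


def parse_detail(html):
    """Extract the detail-only fields using the same steps as the script."""
    soup = BeautifulSoup(html, "html.parser")
    id_text = soup.find("div", class_="product-id").get_text()
    rating_text = soup.find("div", class_="rating").get_text()
    return {
        # Keep the part before '|', then drop the label.
        "id": id_text.split("|")[0].replace("Product ID:", "").strip(),
        "brand": soup.find("div", class_="brand").get_text().replace("Brand:", "").strip(),
        # The description text is the node right after the <br> tag.
        "description": soup.find("div", class_="description").find("br").next_sibling.strip(),
        "stock_status": soup.find("div", class_="stock").get_text().replace("Status:", "").strip(),
        # Second whitespace-separated token is '4.5/5'; keep the part before '/'.
        "rating": rating_text.split()[1].split("/")[0],
    }


detail = parse_detail(SAMPLE_DETAIL)
print(detail)
```

Each find call returns None when an element is missing, so on a page with inconsistent markup these lines would raise AttributeError. Checking for None before reading the text is one way to make the script more tolerant.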

Step 3: Run the script

python3 /home/interview/scrape_products.py

Step 4: Verify the output

head /home/interview/products.csv
wc -l /home/interview/products.csv

wc -l should report 51 lines (1 header + 50 products), and head should show every column populated.
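
The same checks can be done programmatically, which also catches empty cells that head would not show beyond the first few rows. The sketch below is one possible approach: check_csv is a hypothetical helper, and the demo writes a throwaway file with one deliberately empty brand so the output is visible without the real dataset.

```python
import csv
import os
import tempfile

EXPECTED_COLUMNS = ["id", "name", "price", "brand", "description", "stock_status", "rating"]


def check_csv(path, expected_rows=None):
    """Return a list of problems found in the CSV; an empty list means none were found."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != EXPECTED_COLUMNS:
            problems.append(f"unexpected header: {reader.fieldnames}")
        rows = list(reader)
    if expected_rows is not None and len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(rows)}")
    # Header is line 1, so data starts at line 2 (assumes no multi-line fields).
    for line_no, row in enumerate(rows, start=2):
        empty = [col for col in EXPECTED_COLUMNS if not (row.get(col) or "").strip()]
        if empty:
            problems.append(f"line {line_no}: empty {empty}")
    return problems


# Demo on a temporary file; the second product has an empty brand on purpose.
with tempfile.NamedTemporaryFile("w", newline="", suffix=".csv",
                                 delete=False, encoding="utf-8") as tmp:
    writer = csv.writer(tmp)
    writer.writerow(EXPECTED_COLUMNS)
    writer.writerow(["1", "Wireless Mouse", "$29.99", "TechBrand",
                     "High-precision wireless mouse", "In Stock", "4.5"])
    writer.writerow(["2", "Mechanical Keyboard", "$89.99", "",
                     "RGB mechanical keyboard", "In Stock", "4.8"])
    demo_path = tmp.name

demo_problems = check_csv(demo_path, expected_rows=2)
os.remove(demo_path)
print(demo_problems)
```

Against the real output, the call would be check_csv('/home/interview/products.csv', expected_rows=50), and an empty list would indicate the header, row count, and cells all look as expected.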
