Amazon: Extract and Normalize Timestamps from Multi-Format Log File — Data Engineering Interview Q&A (2026)



Amazon · Medium · Programming · Python


Scenario

A log file contains entries with timestamps in multiple formats that need to be extracted and normalized for analysis.

Task

Write a Python script at /home/interview/extract_timestamps.py that reads /home/interview/application.log, uses regular expressions to extract all timestamps regardless of format, converts them to ISO 8601 format, and saves them to /home/interview/timestamps.txt (one timestamp per line).

Example

Input (application.log):

2026-02-10T14:30:45Z [ERROR] api.controller - Connection timeout
10/Feb/2026:15:45:30 +0000 [INFO] nginx.access - GET /api/users 200
Mon, 10 Feb 2026 16:20:15 +0000 [WARNING] system.auth - Failed login attempt

Expected output (timestamps.txt):

2026-02-10T14:30:45Z
2026-02-10T15:45:30Z
2026-02-10T16:20:15Z
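To see how each example line maps to its expected output, here is a minimal sketch that hard-codes the three example timestamps alongside the `strptime` format string that parses each one. It assumes Python 3.7 or later, where `%z` accepts a literal `Z` as UTC, and a locale with English day and month names (the default C locale).

```python
from datetime import datetime, timezone

# (raw timestamp, strptime format) pairs taken from the example log above
samples = [
    ("2026-02-10T14:30:45Z", "%Y-%m-%dT%H:%M:%S%z"),                  # ISO 8601
    ("10/Feb/2026:15:45:30 +0000", "%d/%b/%Y:%H:%M:%S %z"),           # Apache/nginx access log
    ("Mon, 10 Feb 2026 16:20:15 +0000", "%a, %d %b %Y %H:%M:%S %z"),  # RFC 2822
]


def to_iso_utc(raw: str, fmt: str) -> str:
    """Parse raw with fmt and render it as ISO 8601 in UTC with a 'Z' suffix."""
    dt = datetime.strptime(raw, fmt)  # aware datetime, because every format uses %z
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


results = [to_iso_utc(raw, fmt) for raw, fmt in samples]
for ts in results:
    print(ts)
```

Running it prints the three lines of the expected `timestamps.txt`.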

Step 1: Examine the log file

head -20 /home/interview/application.log

Note the different timestamp formats at the start of each line (in the example: ISO 8601, Apache/nginx access-log style, and RFC 2822).
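Before writing the full script, it can help to confirm that every line starts with one of the expected formats and to spot any that do not. The sketch below is a hypothetical survey helper; its three regexes mirror the formats in the example and are an assumption about what the real log contains.

```python
import re
from collections import Counter

# Assumed formats, mirroring the example log; extend if the survey finds unmatched lines
PATTERNS = {
    "iso": re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"),
    "apache": re.compile(r"\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}"),
    "rfc2822": re.compile(
        r"[A-Z][a-z]{2}, \d{1,2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2} [+-]\d{4}"
    ),
}


def survey(lines):
    """Count lines per timestamp format and collect lines that match none."""
    counts = Counter()
    unmatched = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.match(line):  # match() anchors at the start of the line
                counts[name] += 1
                break
        else:  # no pattern matched this line
            unmatched.append(line.rstrip("\n"))
    return counts, unmatched


sample = [
    "2026-02-10T14:30:45Z [ERROR] api.controller - Connection timeout\n",
    "10/Feb/2026:15:45:30 +0000 [INFO] nginx.access - GET /api/users 200\n",
    "Mon, 10 Feb 2026 16:20:15 +0000 [WARNING] system.auth - Failed login attempt\n",
    "continuation line without a timestamp\n",  # made-up line to show the unmatched path
]
counts, unmatched = survey(sample)
print(dict(counts))
print(unmatched)
```

To survey the real file, pass the open file object instead: `with open('/home/interview/application.log') as f: print(survey(f))`.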

Step 2: Create the Python script

nano /home/interview/extract_timestamps.py

Write a script that uses regular expressions to find the timestamp on each line, identify its format, and convert it to ISO 8601 in UTC:

import re
from datetime import datetime, timezone

LOG_PATH = '/home/interview/application.log'
OUT_PATH = '/home/interview/timestamps.txt'

# Regex for each timestamp format; UTC offsets may be positive or negative
patterns = {
    'iso': r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z',
    'apache': r'\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}',
    'rfc2822': r'[A-Z][a-z]{2}, \d{1,2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2} [+-]\d{4}',
}

# Matching strptime format for each pattern (%a/%b expect English names, as in the C locale)
formats = {
    'iso': '%Y-%m-%dT%H:%M:%S%z',           # 2026-02-10T14:30:45Z (%z accepts 'Z' on Python 3.7+)
    'apache': '%d/%b/%Y:%H:%M:%S %z',       # 10/Feb/2026:14:30:45 +0000
    'rfc2822': '%a, %d %b %Y %H:%M:%S %z',  # Mon, 10 Feb 2026 14:30:45 +0000
}

# One named group per format, so match.lastgroup reports which format matched
combined_pattern = re.compile('|'.join(f'(?P<{name}>{p})' for name, p in patterns.items()))

normalized_timestamps = []

# Read the log file and extract the first timestamp on each line
with open(LOG_PATH, 'r', encoding='utf-8', errors='replace') as f:
    for line in f:
        match = combined_pattern.search(line)
        if not match:
            continue  # skip lines without a recognizable timestamp

        dt = datetime.strptime(match.group(0), formats[match.lastgroup])
        # Convert to UTC first, so the trailing 'Z' is correct even for non-zero offsets
        normalized = dt.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
        normalized_timestamps.append(normalized)

# Save normalized timestamps to file, one per line
with open(OUT_PATH, 'w', encoding='utf-8') as f:
    for ts in normalized_timestamps:
        f.write(ts + '\n')

print(f"Extracted and normalized {len(normalized_timestamps)} timestamps")
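The `%a` and `%b` directives used for the RFC 2822 format depend on the process locale. One possible alternative for that branch is the standard library's `email.utils.parsedate_to_datetime`, which parses the English day and month names itself. This is a sketch of a swapped-in technique, not part of the original solution, and the `-0500` and `-0000` inputs are made-up examples rather than lines from the log.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def rfc2822_to_iso_utc(raw: str) -> str:
    """Parse an RFC 2822 date string and render it as ISO 8601 in UTC."""
    dt = parsedate_to_datetime(raw)
    if dt.tzinfo is None:
        # parsedate_to_datetime returns a naive datetime for a '-0000' offset,
        # which RFC 2822 defines as UTC with no local-offset information
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(rfc2822_to_iso_utc("Mon, 10 Feb 2026 16:20:15 +0000"))  # from the example log
print(rfc2822_to_iso_utc("Mon, 10 Feb 2026 11:20:15 -0500"))  # hypothetical non-UTC entry
```

Both calls print `2026-02-10T16:20:15Z`. The second shows why the conversion to UTC matters: simply appending `Z` to the local time would produce `11:20:15Z`, which is five hours off.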

Step 3: Run the script

python3 /home/interview/extract_timestamps.py

Step 4: Verify the output

head -20 /home/interview/timestamps.txt
wc -l /home/interview/timestamps.txt

wc -l should report 1000 lines, and every line should be an ISO 8601 timestamp in UTC (YYYY-MM-DDTHH:MM:SSZ).
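Eyeballing `head` output only checks the first lines. A small sketch for checking every line combines a strict regex (`strptime` alone accepts single-digit months and days) with `strptime`, which rejects impossible values such as month 13. The `check_timestamps` name and the sample lines are illustrative assumptions.

```python
import re
from datetime import datetime

ISO_UTC = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def check_timestamps(lines):
    """Return (valid_count, bad_lines) for lines expected to read YYYY-MM-DDTHH:MM:SSZ."""
    valid, bad = 0, []
    for line in lines:
        ts = line.strip()
        try:
            if not ISO_UTC.fullmatch(ts):
                raise ValueError("wrong shape")
            datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")  # validates calendar values
            valid += 1
        except ValueError:
            bad.append(ts)
    return valid, bad


sample = [
    "2026-02-10T14:30:45Z\n",
    "2026-02-10T15:45:30Z\n",
    "10/Feb/2026:15:45:30 +0000\n",  # not normalized
    "2026-13-01T00:00:00Z\n",        # right shape, impossible month
]
valid, bad = check_timestamps(sample)
print(valid, bad)
```

On the real output, `with open('/home/interview/timestamps.txt') as f: print(check_timestamps(f))` should report 1000 valid lines and an empty list.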
