Scenario
A log file contains entries with timestamps in multiple formats that need to be extracted and normalized for analysis.
Task
Write a Python script at /home/interview/extract_timestamps.py that reads /home/interview/application.log, uses regular expressions to extract all timestamps regardless of format, converts them to ISO 8601 format, and saves them to /home/interview/timestamps.txt (one timestamp per line).
Example
Input (application.log):
2026-02-10T14:30:45Z [ERROR] api.controller - Connection timeout
10/Feb/2026:15:45:30 +0000 [INFO] nginx.access - GET /api/users 200
Mon, 10 Feb 2026 16:20:15 +0000 [WARNING] system.auth - Failed login attempt
Expected output (timestamps.txt):
2026-02-10T14:30:45Z
2026-02-10T15:45:30Z
2026-02-10T16:20:15Z
Step 1: Examine the log file
head -20 /home/interview/application.log
Observe the different timestamp formats at the beginning of each line.
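Beyond eyeballing the first 20 lines, it can help to count how often each format occurs and whether any lines match none of them. The following is a minimal sketch, not part of the required script: the helper name survey_formats and the "unmatched" bucket are hypothetical, and the patterns assume the three formats shown in the example are the only ones present.

```python
import re
from collections import Counter

# Assumed patterns for the three formats shown in the example input
FORMAT_PATTERNS = {
    "iso": re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"),
    "apache": re.compile(r"\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}"),
    "rfc2822": re.compile(
        r"[A-Z][a-z]{2}, \d{2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2} [+-]\d{4}"
    ),
}


def survey_formats(lines):
    """Count lines per timestamp format; lines matching nothing go to 'unmatched'."""
    counts = Counter()
    for line in lines:
        for name, pattern in FORMAT_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                break
        else:
            # No pattern matched this line
            counts["unmatched"] += 1
    return counts


# Sample lines taken from the example input, plus one line without a timestamp
sample = [
    "2026-02-10T14:30:45Z [ERROR] api.controller - Connection timeout",
    "10/Feb/2026:15:45:30 +0000 [INFO] nginx.access - GET /api/users 200",
    "Mon, 10 Feb 2026 16:20:15 +0000 [WARNING] system.auth - Failed login attempt",
    "a line with no timestamp",
]
print(survey_formats(sample))
```

To survey the real log, pass the open file object instead of the sample list. A non-zero "unmatched" count suggests a format the patterns do not yet cover.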
Step 2: Create the Python script
nano /home/interview/extract_timestamps.py
Write a script using regex to extract timestamps and convert them to ISO 8601:
import re
from datetime import datetime, timezone

# Define regex patterns for different timestamp formats
patterns = {
    'iso': r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z',
    'apache': r'\d{2}/[A-Z][a-z]{2}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}',
    'rfc2822': r'[A-Z][a-z]{2}, \d{2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2} [+-]\d{4}'
}

# Combine patterns into a single alternation and compile it once
combined_pattern = re.compile('|'.join(f'({p})' for p in patterns.values()))

normalized_timestamps = []

# Read log file and extract the first timestamp on each line
with open('/home/interview/application.log', 'r') as f:
    for line in f:
        match = combined_pattern.search(line)
        if not match:
            continue
        timestamp_str = match.group(0)

        # Parse and convert to ISO 8601 in UTC
        if re.fullmatch(patterns['iso'], timestamp_str):
            # Already in ISO format
            normalized = timestamp_str
        elif re.fullmatch(patterns['apache'], timestamp_str):
            # Parse: 10/Feb/2026:14:30:45 +0000
            dt = datetime.strptime(timestamp_str, '%d/%b/%Y:%H:%M:%S %z')
            # Convert to UTC before appending the literal Z
            normalized = dt.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
        else:
            # Parse: Mon, 10 Feb 2026 14:30:45 +0000
            dt = datetime.strptime(timestamp_str, '%a, %d %b %Y %H:%M:%S %z')
            normalized = dt.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
        normalized_timestamps.append(normalized)

# Save normalized timestamps to file
with open('/home/interview/timestamps.txt', 'w') as f:
    for ts in normalized_timestamps:
        f.write(ts + '\n')

print(f"Extracted and normalized {len(normalized_timestamps)} timestamps")
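One detail worth checking is how timestamps with an offset other than +0000 are handled: a trailing Z means UTC, so the parsed time has to be converted to UTC before it is formatted. The sketch below illustrates this in isolation. The helper name to_iso_utc is hypothetical, and the inputs with +0200 and -0500 offsets are made up for illustration; they are not taken from application.log.

```python
from datetime import datetime, timezone


def to_iso_utc(timestamp_str, fmt):
    """Parse an offset-aware timestamp and render it as ISO 8601 in UTC."""
    dt = datetime.strptime(timestamp_str, fmt)
    # astimezone shifts the wall-clock time so the trailing Z is accurate
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Illustrative inputs with non-zero offsets (assumed, not from the real log)
print(to_iso_utc("10/Feb/2026:15:45:30 +0200", "%d/%b/%Y:%H:%M:%S %z"))
# prints 2026-02-10T13:45:30Z
print(to_iso_utc("Mon, 10 Feb 2026 16:20:15 -0500", "%a, %d %b %Y %H:%M:%S %z"))
# prints 2026-02-10T21:20:15Z
```

If every entry in the log uses +0000, as in the example input, the conversion leaves the time unchanged.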
Step 3: Run the script
python3 /home/interview/extract_timestamps.py
Step 4: Verify the output
head -20 /home/interview/timestamps.txt
wc -l /home/interview/timestamps.txt
The wc -l command should report 1000 lines, and every line should be a timestamp in ISO 8601 format.
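Since head only shows the first 20 lines, a short check over the whole file can confirm that every line has the expected shape. This is a minimal sketch under the assumption that the target format is exactly YYYY-MM-DDTHH:MM:SSZ, as in the expected output; the helper name find_invalid is hypothetical.

```python
import re

# Assumed target shape, matching the expected output in the example
ISO_UTC = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def find_invalid(lines):
    """Return (line_number, text) pairs for lines that are not ISO 8601 UTC timestamps."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if not ISO_UTC.fullmatch(line.rstrip("\n"))
    ]


# Sample data: the second line is deliberately left unnormalized
sample = [
    "2026-02-10T14:30:45Z\n",
    "10/Feb/2026:15:45:30 +0000\n",
    "2026-02-10T16:20:15Z\n",
]
print(find_invalid(sample))
# prints [(2, '10/Feb/2026:15:45:30 +0000')]
```

Passing the open /home/interview/timestamps.txt file object instead of the sample list should yield an empty list when the script has normalized everything.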