Scenario
A data file contains mixed record types with different delimiters. Each row starts with a type indicator that determines its format and delimiter.
Task
Create a Python script at /home/interview/parse_mixed.py that reads /home/interview/mixed_data.txt, parses rows based on their type indicator and delimiter, and saves each type to separate CSV files: /home/interview/customers.csv, /home/interview/products.csv, and /home/interview/orders.csv.
Output Format
Each output CSV should omit the type field and begin with a header row:
| File | Columns |
| customers.csv | customer_id, name, email, country |
| products.csv | product_id, name, category, price |
| orders.csv | order_id, customer_id, product_id, quantity, date |
Example
Input (mixed_data.txt):
CUSTOMER,C001,John Doe,[email protected],USA
PRODUCT|P001|Laptop|Electronics|999.99
ORDER;O001;C001;P001;2;2026-02-15
Output (customers.csv):
customer_id,name,email,country
C001,John Doe,[email protected],USA
Step 1: Examine the input file
head -20 /home/interview/mixed_data.txt
Review the different row types and their delimiters to understand the parsing requirements.
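If the file is too long to review by eye, a short tally of type indicators and the delimiter that follows each one can help. The sketch below is only an illustration: the helper name summarize_types is hypothetical, and it assumes every type indicator is a run of letters immediately followed by its delimiter.

```python
import re
from collections import Counter


def summarize_types(lines):
    """Count (type indicator, delimiter) pairs found at the start of each line."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Assumption: the indicator is a run of letters and the first
        # non-letter character after it is the delimiter.
        match = re.match(r"([A-Za-z_]+)([^A-Za-z_])", line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
        else:
            counts[("<unrecognized>", "")] += 1
    return counts
```

Calling summarize_types on the open file object for /home/interview/mixed_data.txt and printing the result shows whether any unexpected type or delimiter appears before the parser is written.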
Step 2: Create the Python script
nano /home/interview/parse_mixed.py
Write a script that parses each row type with its specific delimiter:
import csv

# Read and separate lines by type
customer_lines = []
product_lines = []
order_lines = []

with open('/home/interview/mixed_data.txt', 'r') as f:
    for line in f:
        line = line.strip()
        if line.startswith('CUSTOMER'):
            customer_lines.append(line.split(','))
        elif line.startswith('PRODUCT'):
            product_lines.append(line.split('|'))
        elif line.startswith('ORDER'):
            order_lines.append(line.split(';'))

# Write CUSTOMER records
with open('/home/interview/customers.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['customer_id', 'name', 'email', 'country'])
    for row in customer_lines:
        writer.writerow(row[1:])  # Skip type field

# Write PRODUCT records
with open('/home/interview/products.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['product_id', 'name', 'category', 'price'])
    for row in product_lines:
        writer.writerow(row[1:])  # Skip type field

# Write ORDER records
with open('/home/interview/orders.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['order_id', 'customer_id', 'product_id', 'quantity', 'date'])
    for row in order_lines:
        writer.writerow(row[1:])  # Skip type field

print(f"Parsed {len(customer_lines)} customers, {len(product_lines)} products, {len(order_lines)} orders")
The script reads each line, identifies its type, splits by the appropriate delimiter, and writes to separate CSV files.
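The same parsing step can also be written in a table-driven style, which keeps each type's delimiter and headers in one place and makes malformed rows visible. This is a sketch of one possible variant, not a required part of the solution: the names FORMATS and parse_lines are hypothetical, and collecting skipped rows is an added assumption, since the task does not say how to handle bad input.

```python
# Layout per type indicator: (delimiter, output headers), as given in the task.
FORMATS = {
    "CUSTOMER": (",", ["customer_id", "name", "email", "country"]),
    "PRODUCT": ("|", ["product_id", "name", "category", "price"]),
    "ORDER": (";", ["order_id", "customer_id", "product_id", "quantity", "date"]),
}


def parse_lines(lines):
    """Group rows by type with the type field removed.

    Returns (records, skipped), where skipped holds (line number, text)
    for rows with an unknown type or the wrong number of fields.
    """
    records = {record_type: [] for record_type in FORMATS}
    skipped = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored silently
        for record_type, (delimiter, headers) in FORMATS.items():
            # Match indicator plus delimiter so "ORDERS;..." is not read as ORDER.
            if line.startswith(record_type + delimiter):
                fields = line.split(delimiter)[1:]  # drop the type field
                if len(fields) == len(headers):
                    records[record_type].append(fields)
                else:
                    skipped.append((number, line))
                break
        else:
            skipped.append((number, line))
    return records, skipped
```

The lists in records can be written out with csv.writer exactly as in the script above, using the headers stored in FORMATS. Like the original script, this splits on the raw delimiter and does not handle quoted fields that contain the delimiter.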
Step 3: Run the script
python3 /home/interview/parse_mixed.py
Step 4: Verify the output files
head /home/interview/customers.csv
head /home/interview/products.csv
head /home/interview/orders.csv
Each file should begin with its header row, followed by one row per input record of that type with the type field removed.
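Beyond inspecting the first few lines with head, the headers and field counts can be checked programmatically. The following is a minimal sketch under the assumption that a valid file has exactly the expected header row and that every data row has the same number of fields; check_csv is a hypothetical helper name.

```python
import csv


def check_csv(path, expected_headers):
    """Return the number of data rows; raise ValueError on a header or width mismatch."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        headers = next(reader, None)  # None if the file is empty
        if headers != expected_headers:
            raise ValueError(f"{path}: unexpected headers {headers!r}")
        count = 0
        for row in reader:
            if len(row) != len(expected_headers):
                raise ValueError(
                    f"{path}: line {reader.line_num} has {len(row)} fields"
                )
            count += 1
    return count
```

For example, check_csv('/home/interview/customers.csv', ['customer_id', 'name', 'email', 'country']) should return the same customer count that the script printed.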