DoorDash: Partition CSV Data into Monthly Parquet Files — Data Engineering Interview Q&A (2026)

DoorDash · Medium · Programming · Python · Pandas · Parquet

Scenario

A large CSV file containing transaction data needs to be partitioned into separate Parquet files for efficient querying and storage.

Task

Write a Python script at /home/interview/partition_data.py using pandas that reads /home/interview/transactions.csv, partitions the data by month based on the transaction_date column, and saves each month as a separate Parquet file in /home/interview/output/ with the naming format YYYY-MM.parquet.

Note: pandas and pyarrow are already installed.
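If you want to rehearse outside the provided terminal, a minimal sketch like the one below creates a small stand-in transactions.csv. Only transaction_date comes from the task; the other columns (transaction_id, amount) and all values are hypothetical, and the real file's layout may differ. The sketch writes to a temporary directory rather than /home/interview.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: transaction_id and amount are invented columns,
# only transaction_date is specified by the task.
sample = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4, 5],
    "transaction_date": ["2024-01-05", "2024-01-20", "2024-02-03",
                         "2024-03-15", "2024-03-31"],
    "amount": [12.50, 8.00, 23.75, 5.25, 40.00],
})

# Write to a throwaway directory instead of /home/interview
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "transactions.csv")
sample.to_csv(csv_path, index=False)

# Read it back to confirm what the partitioning script would see
print(pd.read_csv(csv_path).head())
```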

Example

Directory structure after partitioning:

/home/interview/output/
├── 2024-01.parquet
├── 2024-02.parquet
├── 2024-03.parquet
...
└── 2024-12.parquet

Step 1: Examine the input data

head /home/interview/transactions.csv

Check the column names and how the transaction_date values are written, so you know whether pd.to_datetime can parse them as-is.
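The same check can be done from pandas. A minimal sketch, using a few invented rows in place of the real file (the columns besides transaction_date and the malformed "not-a-date" value are hypothetical): it shows that read_csv loads the dates as strings rather than datetimes, and counts how many values would fail to parse.

```python
import io

import pandas as pd

# Stand-in for the first lines of transactions.csv (hypothetical values)
csv_text = """transaction_id,transaction_date,amount
1,2024-01-05,12.50
2,2024-01-20,8.00
3,2024-02-03,23.75
4,not-a-date,5.25
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)  # transaction_date is read as strings, not datetimes

# errors="coerce" turns unparseable values into NaT instead of raising
dates = pd.to_datetime(df["transaction_date"], errors="coerce")
print("unparseable rows:", int(dates.isna().sum()))
print("date range:", dates.min(), "to", dates.max())  # NaT is ignored here
print("distinct months:", dates.dt.to_period("M").nunique())
```

With errors="coerce", bad values become NaT. The script in Step 2 calls pd.to_datetime without it, so a malformed date there raises an error instead.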

Step 2: Create the Python script

nano /home/interview/partition_data.py

Write a script to partition by month and save as Parquet:

import os

import pandas as pd

INPUT_FILE = '/home/interview/transactions.csv'
OUTPUT_DIR = '/home/interview/output'

# Read the CSV file
df = pd.read_csv(INPUT_FILE)

# Convert transaction_date to datetime (raises if a value cannot be parsed)
df['transaction_date'] = pd.to_datetime(df['transaction_date'])

# Extract year-month (e.g. '2024-01') for partitioning
df['year_month'] = df['transaction_date'].dt.strftime('%Y-%m')

# to_parquet fails if the output directory does not exist, so create it first
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Group by year-month and save each partition
for year_month, group in df.groupby('year_month'):
    # Drop the temporary year_month column so it is not written to the file
    group = group.drop(columns='year_month')

    # Save to a YYYY-MM.parquet file
    output_file = os.path.join(OUTPUT_DIR, f'{year_month}.parquet')
    group.to_parquet(output_file, index=False)
    print(f"Created {year_month}.parquet with {len(group)} records")

print(f"\nPartitioned {len(df)} total records into {df['year_month'].nunique()} files")

Step 3: Run the script

python3 /home/interview/partition_data.py

Step 4: Verify the output

ls -lh /home/interview/output/

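There should be one file per month present in transaction_date. If the data covers every month of 2024, that means 12 Parquet files (2024-01.parquet through 2024-12.parquet).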
