Partition CSV Data into Monthly Parquet Files
Scenario
A large CSV file containing transaction data needs to be partitioned into separate Parquet files for efficient querying and storage.
Task
Write a Python script at /home/interview/partition_data.py that uses pandas to read /home/interview/transactions.csv, group the rows by calendar month based on the transaction_date column, and save each month as a separate Parquet file in /home/interview/output/, named in the format YYYY-MM.parquet (for example, 2024-01.parquet).
Note: pandas and pyarrow are already installed.
Example
Directory structure after partitioning:
/home/interview/output/
├── 2024-01.parquet
├── 2024-02.parquet
├── 2024-03.parquet
...
└── 2024-12.parquet
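One possible approach is sketched below; it is not a reference solution. It assumes that transaction_date parses cleanly with pandas and that the whole CSV fits in memory. The function names split_by_month and partition_by_month are hypothetical, chosen only for this illustration.

```python
from pathlib import Path

import pandas as pd


def split_by_month(df, date_column="transaction_date"):
    """Return a dict mapping 'YYYY-MM' labels to the rows of that month.

    Hypothetical helper for illustration; not part of the task statement.
    """
    dates = pd.to_datetime(df[date_column])
    # strftime yields the label format the task asks for, e.g. '2024-01'.
    labels = dates.dt.strftime("%Y-%m").rename("month")
    # Rows whose date is missing get a NaN label and are dropped by groupby.
    return {label: group for label, group in df.groupby(labels)}


def partition_by_month(csv_path, output_dir, date_column="transaction_date"):
    """Read the CSV and write one Parquet file per month; return the paths."""
    # Assumption: the file is small enough to load in a single read.
    df = pd.read_csv(csv_path, parse_dates=[date_column])
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for label, group in split_by_month(df, date_column).items():
        target = out / f"{label}.parquet"
        # index=False keeps the pandas row index out of the Parquet file.
        group.to_parquet(target, index=False)
        written.append(target)
    return written
```

In partition_data.py, the script would then call partition_by_month("/home/interview/transactions.csv", "/home/interview/output"). If the CSV is too large to load at once, reading it in chunks with the chunksize argument of pd.read_csv and accumulating rows per month is an alternative, at the cost of extra bookkeeping.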
DoorDash