Select Specific Columns from Parquet File

Scenario

A Parquet file contains customer data with many columns, but only a subset of columns is needed for analysis.

Task

Write a Python script at /home/interview/select_columns.py using pandas that reads /home/interview/customers.parquet, selects only the columns id, first_name, last_name, email, and total_purchases, and writes the result to /home/interview/selected_data.parquet.

Note: pandas and pyarrow are already installed.
