Extract Schema Information from Parquet File Using PyArrow

Scenario

A Parquet file contains structured data, and you need to extract and document its schema information for analysis.

Task

Write a Python script at /home/interview/extract_schema.py that uses pyarrow to read /home/interview/data.parquet, extract its schema information (column names, data types, compression codec, row count, and file size), and save the result as JSON to /home/interview/schema_info.json.

Note: The pyarrow module is already installed.

Example

Expected output format in /home/interview/schema_info.json:

{
  "file": "/home/interview/data.parquet",
  "row_count": 390,
  "file_size_bytes": 45632,
  "file_size_kb": 44.56,
  "compression_codec": "SNAPPY",
  "columns": [
    {
      "name": "id",
      "type": "int64"
    },
    {
      "name": "name",
      "type": "string"
    },
    ...
  ]
}
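
To confirm that a generated file follows the format above, a small structural check can help. The sketch below is a hypothetical helper (check_schema_info and REQUIRED_KEYS are names invented here); it only verifies that the expected keys exist with plausible types and does not check the values against the Parquet file.

```python
import json

# Expected top-level keys and their JSON types, taken from the example output.
REQUIRED_KEYS = {
    "file": str,
    "row_count": int,
    "file_size_bytes": int,
    "file_size_kb": (int, float),
    "compression_codec": str,
    "columns": list,
}


def check_schema_info(info):
    """Return a list of problems found in a schema_info dictionary."""
    problems = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in info:
            problems.append(f"missing key: {key}")
        elif not isinstance(info[key], expected_type):
            problems.append(f"unexpected type for key: {key}")

    columns = info.get("columns")
    if isinstance(columns, list):
        for index, column in enumerate(columns):
            # Each column entry should carry a string name and a string type.
            if not isinstance(column, dict):
                problems.append(f"column {index} is not an object")
                continue
            for field in ("name", "type"):
                if not isinstance(column.get(field), str):
                    problems.append(f"column {index} lacks a string '{field}'")
    return problems


def check_schema_file(path):
    """Load a JSON file and run the structural check on its contents."""
    with open(path) as f:
        return check_schema_info(json.load(f))
```

An empty list from check_schema_file("/home/interview/schema_info.json") means the structure matches; any strings returned describe what is missing or mistyped.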
