Stratified Sampling from Dataset by Region

Scenario

A customer dataset contains records from several regions, and the regions differ in size. You need to create a representative sample that preserves each region's share of the data.

Task

Write a Python script at /home/interview/stratified_sample.py using pandas that reads /home/interview/customers.csv, extracts a 10% stratified sample (10% of the rows from each region, so the sample stays proportionally representative), and writes the result to /home/interview/sample_data.csv.

Note: pandas is already installed.
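
One way to approach the task is sketched below. It is a minimal sketch, not a reference solution. It assumes the region column is named `region`, which the task does not state, and it fixes a random seed (`SEED`, a hypothetical constant) only to make the sample reproducible. The helper names `stratified_sample` and `main` are also illustrative.

```python
from pathlib import Path

import pandas as pd

# Paths come from the task statement.
INPUT_PATH = Path("/home/interview/customers.csv")
OUTPUT_PATH = Path("/home/interview/sample_data.csv")

# Assumptions: the column name and the seed are not given in the task.
REGION_COLUMN = "region"
SEED = 42


def stratified_sample(df, by=REGION_COLUMN, frac=0.10, seed=SEED):
    """Draw the same fraction of rows from every group in `by`."""
    # GroupBy.sample draws without replacement inside each group.
    return df.groupby(by, group_keys=False).sample(frac=frac, random_state=seed)


def main(in_path=INPUT_PATH, out_path=OUTPUT_PATH):
    df = pd.read_csv(in_path)
    sample = stratified_sample(df)
    # index=False keeps the original row labels out of the output file.
    sample.to_csv(out_path, index=False)


if __name__ == "__main__":
    if INPUT_PATH.exists():
        main()
    else:
        print(f"Input file not found: {INPUT_PATH}")
```

Sampling within each group, rather than sampling 10% of the whole table, is what keeps every region's share fixed. Note that the per-region row count is rounded, so a very small region can contribute zero rows. If every region must appear in the output, that case needs separate handling.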
