60. Mountain Climber Logs

Objective

You are analyzing expedition logs. You have two DataFrames: mountain_info, containing details about various peaks, and mountain_climbers, which logs individual ascents.

Task

Write a PySpark script to find the most recent climber for each mountain. Your output should only contain mountains that have been climbed by at least one person.

The final DataFrame must be saved as result_df and include the mountain's name, the last climber's name, and the date and time of that most recent climb. Rename the output columns to match the expected schema below.

File Path

  • Mountain Info Dataset: /home/interview/mountain_info.csv
  • Mountain Climbers Dataset: /home/interview/mountain_climbers.csv
  • Starter script: /home/interview/latest_climbers.py

Schema

mountain_info.csv

Column Name    Data Type
name           string
height         integer
country        string
range          string

mountain_climbers.csv

Column Name      Data Type
climber_name     string
mountain_name    string
climb_date       date
climb_time       double

Expected Output Schema

Column Name          Data Type
mountain_name        string
last_climber_name    string
last_climb_date      date
last_climb_time      double

Example

Given this sample input:

mountain_info

name                 height    country     range
Mount Everest        8848      Nepal       Himalayas
Mount Kilimanjaro    5895      Tanzania    Kilimanjaro
Mount Denali         6190      USA         Alaska
Mount Fuji           3776      Japan       Fuji
Mont Blanc           4808      France      Alps

mountain_climbers

climber_name    mountain_name        climb_date    climb_time
John            Mount Everest        2020-01-01    8.5
Jane            Mount Everest        2022-02-02    9.0
Jim             Mount Kilimanjaro    2021-03-03    6.0
Jess            Mount Kilimanjaro    2022-04-04    7.0
Joe             Mount Denali         2022-05-05    10.0
Jill            Mount Denali         2021-06-06    11.0

The output would be:

mountain_name        last_climber_name    last_climb_date    last_climb_time
Mount Everest        Jane                 2022-02-02         9.0
Mount Kilimanjaro    Jess                 2022-04-04         7.0
Mount Denali         Joe                  2022-05-05         10.0

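Mount Fuji and Mont Blanc are excluded because they have no entries in the climbers log. For Mount Everest, Jane's 2022 climb is kept over John's 2020 climb because it is more recent. For Mount Denali, Joe is kept even though Jill's row appears later in the log: the result depends on climb_date, not on row order.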
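
To check the example by hand, the same selection rule can be written in plain Python without Spark. This is only an illustrative cross-check on the sample rows. KNOWN_MOUNTAINS, CLIMBS, and latest_per_mountain are hypothetical names, not part of the starter script.

```python
from datetime import date

# Mountain names from the sample mountain_info table.
KNOWN_MOUNTAINS = {
    "Mount Everest", "Mount Kilimanjaro", "Mount Denali", "Mount Fuji", "Mont Blanc",
}

# Rows from the sample mountain_climbers table.
CLIMBS = [
    ("John", "Mount Everest", date(2020, 1, 1), 8.5),
    ("Jane", "Mount Everest", date(2022, 2, 2), 9.0),
    ("Jim", "Mount Kilimanjaro", date(2021, 3, 3), 6.0),
    ("Jess", "Mount Kilimanjaro", date(2022, 4, 4), 7.0),
    ("Joe", "Mount Denali", date(2022, 5, 5), 10.0),
    ("Jill", "Mount Denali", date(2021, 6, 6), 11.0),
]


def latest_per_mountain(known_mountains, climbs):
    """Map each climbed mountain to (climber, climb_date, climb_time) of its latest climb."""
    latest = {}
    for climber, mountain, climb_date, climb_time in climbs:
        if mountain not in known_mountains:
            continue  # mirrors the inner join against mountain_info
        current = latest.get(mountain)
        # Keep the row with the later date, regardless of its position in the log.
        if current is None or climb_date > current[1]:
            latest[mountain] = (climber, climb_date, climb_time)
    return latest


for mountain, (climber, climb_date, climb_time) in latest_per_mountain(KNOWN_MOUNTAINS, CLIMBS).items():
    print(mountain, climber, climb_date, climb_time)
```

Mountains that never appear in CLIMBS never enter the dictionary, which is why Mount Fuji and Mont Blanc are absent from the result.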
