59. Calculating PE Portfolio Values

Objective

You work for a private equity (PE) firm and are given two DataFrames: portfolio, which lists the companies each PE firm holds and the number of shares held in each, and prices, which contains the daily closing price of each company's equity.

Task

Write a PySpark function to merge these two datasets and compute the total daily portfolio value for each private equity firm.

To find the daily portfolio value, you must multiply the number of shares a firm holds in a company by the closing_price of that company on a specific date, and then sum those values up for the entire firm.

Save your resulting DataFrame as result_df. Ensure the output strictly matches the requested Output Schema, casting the final portfolio_value to an Integer. Order the output alphabetically by PE_firm, and then chronologically by date.

File Paths

  • Portfolio Dataset: /home/interview/portfolio.csv
  • Prices Dataset: /home/interview/prices.csv
  • Starter script: /home/interview/portfolio_values.py

Schema

portfolio.csv

| Column Name | Data Type | Description |
| --- | --- | --- |
| PE_firm | String | The name of the private equity firm |
| company | String | The name of the company |
| shares | Integer | The number of shares the firm holds in the company |

prices.csv

| Column Name | Data Type | Description |
| --- | --- | --- |
| date | Date | The date |
| company | String | The name of the company |
| closing_price | Double | The closing price of the company's equity on the date |

Expected Output Schema

| Column Name | Data Type | Description |
| --- | --- | --- |
| PE_firm | String | The name of the private equity firm |
| date | Date | The date |
| portfolio_value | Integer | The daily portfolio value of the private equity firm |

Example

Given this sample input:

portfolio

| PE_firm | company | shares |
| --- | --- | --- |
| Alpha | A | 1000 |
| Alpha | B | 2000 |
| Beta | A | 1500 |
| Beta | C | 2500 |
| Gamma | B | 1200 |
| Gamma | C | 1300 |

prices

| date | company | closing_price |
| --- | --- | --- |
| 2023-01-01 | A | 50.0 |
| 2023-01-01 | B | 20.0 |
| 2023-01-01 | C | 30.0 |
| 2023-01-02 | A | 52.0 |
| 2023-01-02 | B | 21.0 |
| 2023-01-02 | C | 31.0 |

The expected output would be:

| PE_firm | date | portfolio_value |
| --- | --- | --- |
| Alpha | 2023-01-01 | 90000 |
| Alpha | 2023-01-02 | 94000 |
| Beta | 2023-01-01 | 150000 |
| Beta | 2023-01-02 | 155500 |
| Gamma | 2023-01-01 | 63000 |
| Gamma | 2023-01-02 | 65500 |
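
As a sanity check on the expected output, the arithmetic can be reproduced in plain Python without Spark. For example, Alpha on 2023-01-01 holds 1000 shares of A at 50.0 and 2000 shares of B at 20.0, giving 50000 + 40000 = 90000. The sketch below applies the same calculation to every firm and date in the sample; the variable names are illustrative only.

```python
from collections import defaultdict

# Sample inputs copied from the example tables.
portfolio = [
    ("Alpha", "A", 1000), ("Alpha", "B", 2000),
    ("Beta", "A", 1500), ("Beta", "C", 2500),
    ("Gamma", "B", 1200), ("Gamma", "C", 1300),
]
prices = [
    ("2023-01-01", "A", 50.0), ("2023-01-01", "B", 20.0), ("2023-01-01", "C", 30.0),
    ("2023-01-02", "A", 52.0), ("2023-01-02", "B", 21.0), ("2023-01-02", "C", 31.0),
]

# Accumulate shares * closing_price per (firm, date), matching rows on company.
totals = defaultdict(float)
for firm, company, shares in portfolio:
    for date, priced_company, closing_price in prices:
        if company == priced_company:
            totals[(firm, date)] += shares * closing_price

# Sort by firm, then date, and cast each total to an integer.
rows = sorted((firm, date, int(value)) for (firm, date), value in totals.items())
for row in rows:
    print(*row)
```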
