52. Daily Category Sales Aggregation
Beginner Mode

Start your terminal to use beginner mode.

Objective

You are working for an outdoor supplies company that sells various items such as camping equipment, hiking gear, fishing equipment, etc. You are given two DataFrames: df_sales containing daily transaction logs, and df_products containing the product catalog.

Task

Write a PySpark function that joins the two DataFrames and calculates the total quantity sold for each product_category on a daily basis.

Save your resulting DataFrame as result_df. Order the final output chronologically by date (ascending), and then alphabetically by product_category (ascending).

File Path

  • Sales Dataset: /home/interview/sales.csv
  • Products Dataset: /home/interview/products.csv
  • Starter script: /home/interview/camping_sales.py

Schema

sales.csv

Column Name Type
sales_id String
product_id String
date Date
quantity_sold Integer

products.csv

Column Name Type
product_id String
product_name String
product_category String

Expected Output Schema

Column Name Type
date Date
product_category String
total_quantity Integer

Example

Given this sample input:

df_sales

sales_id product_id date quantity_sold
S1 P1 2023-06-01 10
S2 P2 2023-06-02 15
S3 P3 2023-06-02 20
S4 P4 2023-06-01 12
S5 P5 2023-06-03 25

df_products

product_id product_name product_category
P1 Camping Tent Camping
P2 Hiking Shoes Hiking
P3 Fishing Rod Fishing
P4 Insulated Bottle Hiking
P5 Outdoor Grill Camping

The expected output would be:

date product_category total_quantity
2023-06-01 Camping 10
2023-06-01 Hiking 12
2023-06-02 Fishing 20
2023-06-02 Hiking 15
2023-06-03 Camping 25

Terminal requires a larger screen

Open this page on a desktop or tablet (≥ 768px) to launch the terminal and practice hands-on.

Linux Terminal Environment

Write and execute your solution in the terminal below.

Sign In

Essential

SQL 0/33
Spark 0/20
Snowflake 0/22
Python 0/24
Question Difficulty Company Access
Managing High I/O Processes Easy Revolut Free
Docker Multi-Architecture Image Easy Accenture Free
Average Order Value Easy Accenture Free
Join Employees and Departments Easy Adobe Free
Filter Orders by Date Range Easy Google Free
Find Customers Without Orders Easy LinkedIn Free
Use COALESCE for Null Handling Easy Samsung Free
Merge Multiple Address Fields Easy Datadog Free
String Concatenation in SELECT Easy Wix Free
Find Nth Highest Revenue Easy Dropbox Free
Self-Join to Identify Missing Supervisors Easy Meta Free
Year-over-Year Revenue Growth Easy OpenAI Free
Above Average Price Products Medium Hulu Free
Calculate Cumulative Sales Medium Uber Free
Find Overlapping Date Ranges Medium X Free
Set Operation: INTERSECT Medium DoorDash Free
Subquery for Best Order per Customer Medium Anthropic Free
Ranking with Dense_Rank Medium Amazon Free
Median Salary by Job Title Medium ActivisionBlizzard Free
String Splitting and Aggregation Medium Vercel Free
Salary Comparison with CTE Aggregation Medium Crypto.Com Free
String Pattern Extraction in Descriptions Medium Zscaler Free
Nested Subquery for Latest Record Medium DoorDash Free
Window Function for Moving Average Medium DeutscheBank Free
Re-enrollment Rate Calculator Medium Google Free
String Pattern Matching Using LIKE Medium Apple Free
Merge Employee and Department Records Hard Anthropic Free
Sequence Products by Price Hard GoDaddy Free
Combine Data from Multiple Sources into Unified Report Hard Vercel Free
Export SQLite Database to Parquet Format with Metadata Hard GitLab Free
Top Categories by Average Price Hard Samsung Free
Customer Order Aggregation Medium BMW Free
Filter Popular Videos on a Streaming Platform Easy Apple Free
Replace Keywords in Social Media Post Text Easy PayPal Free
Filter Movies with Missing Box Office Data Easy DoorDash Free
Daily Category Sales Easy Snowflake Free
Filter and Uppercase Artifacts Easy AMD Free
Combine Customer Orders and Products Medium Twilio Free
Anonymize User PII Data for a Social Media Platform Medium Atlassian Free
Product Sales and Inventory Data Medium PayPal Free
Products and Duplicates Medium JPMorgan Free
Mortgage Rate Calculator Medium NVIDIA Free
Weekend Order Detection Medium IBM Free
Flooring Company Data Medium Databricks Free
Rank Top Products by Revenue per Category Hard Coinbase Free
Highest SEO Score Pages per Domain Hard Cisco Free
Math Expressions Hard IBM Free
CSV and Partitions Easy Atlassian Free
Repartition Easy Robinhood Free
Broadcast Join Easy Databricks Free
Correcting Social Media Posts Easy Twitter Free
Daily Category Sales Aggregation Easy Microsoft Free
Cache and Performance Medium Palantir Free
Filter Popular Videos Medium Netflix Free
Anonymize User PII Medium Meta Free
Call Center Daily Stats Medium VMware Free
Venture Capital Sector Analysis Medium Cloudflare Free
Window Functions without Partitions Medium Google Free
Calculating PE Portfolio Values Medium IBM Free
Mountain Climber Logs Hard Stripe Free
Global & Domain SEO Leaders Hard Amazon Free
Tracking Customer Purchase History Hard Coinbase Free
Merge Customer Records from Two Sources Easy Lyft Free
Filter Funded Startups Easy Salesforce Free
Assign Row Numbers to Authors per Paper Medium Cloudflare Free
Amusement Park Rating Anomalies Medium GitHub Free
Usage and Accuracy per Model Type Medium VMware Free
Find the Last Climber per Mountain Medium Bloomberg Free
Track Product Purchases Hard Microsoft Free
Most Common Order Status Easy Airbnb Free
Calculating Overtime Pay Easy Cisco Free
Top Products by Revenue Medium Walmart Free
Product Summary Medium Amazon Free
Parsing Comma-Separated Values Medium Revolut Free
CSV Row Filter and Count Easy DoorDash Free
Analyze Sales Dataset Dimensions and Calculate Total Revenue Easy Databricks Free
Sort Avro Employee Records by Salary Easy GitHub Free
Count User Events from JSON Activity Logs Easy Uber Free
Split Delimited Column into Separate Columns with Pandas Easy Snowflake Free
Compare SQLite Database and CSV File Records Easy Robinhood Free
Analyze DataFrame Memory Usage Easy SAP Free
Time-Series Rolling Window Analysis for Multi-Stock Price Data Medium HashiCorp Free
Flatten Nested JSON to CSV with Dot-Notation Columns Medium Amazon Free
Calculate Descriptive Statistics for Numeric Columns in Pandas Easy Google Free
Decompose Time-Series Data into Trend, Seasonal, and Residual Components Medium Instacart Free
Extract Schema Information from Parquet File Using PyArrow Easy Palantir Free
Select Specific Columns from Parquet File Easy OpenAI Free
Flatten Nested Struct Columns in Parquet and Export to CSV Medium Coinbase Free
Merge Customer and Purchase Data Using Pandas Easy Mastercard Free
SQL JOIN with Pandas Data Processing and CSV Export Medium Intel Free
Insert New Records into SQLite Database from CSV Medium Visa Free
Aggregate SQL Query Results with Pandas and Export to Excel Medium Meta Free
Aggregate Time-Series Data into Fixed Time Windows Hard Tesla Free
Interpolate Missing Values in Irregular Time-Series Sensor Data Hard VMware Free
Remove Seasonal Effects from Time-Series Sales Data Hard Cloudflare Free
Convert Excel Files with Multiple Sheets to Individual CSV Files Easy Airbnb Free
Need more practice in this area? Explore more questions →