Parsing Retail Discounts

Objective

As an analyst working in a Consumer Goods company, you've been provided a DataFrame with sales data for various stores. This DataFrame includes multiple fields tracking the store, product, category, units sold, and a text description.

Task

The Description column contains text information about the product. Some products have a special discount tagged inside square brackets within this column (e.g., [10% off]).

Write a PySpark function that extracts this discount information and creates a new column called Discount. The discount must be expressed as a decimal (e.g., 0.10 for a 10% discount). If no discount is present in the text, the value should be 0.0.

Keep all other columns unchanged and save the resulting DataFrame as result_df. Ensure the columns appear in the exact order given in the Expected Output Schema.
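One possible way to approach the task is sketched below. It is not the contents of the starter script: the names DISCOUNT_PATTERN, OUTPUT_COLUMNS, add_discount_column, and build_result_df are hypothetical, and the sketch assumes that every discount tag has the form [N% off] and that sales.csv has a header row.

```python
# Sketch only: helper names and the tag format are assumptions,
# not taken from the starter script.

# Assumed tag format: "[<number>% off]", e.g. "[10% off]" or "[7.5% off]".
DISCOUNT_PATTERN = r"\[(\d+(?:\.\d+)?)% off\]"

# Column order taken from the Expected Output Schema.
OUTPUT_COLUMNS = [
    "StoreID", "ProductName", "Category",
    "SoldUnits", "Description", "Discount",
]


def add_discount_column(df):
    """Return df with a float Discount column parsed from Description."""
    # Imported here so the module can be loaded where PySpark is not installed.
    from pyspark.sql import functions as F

    # regexp_extract returns an empty string when the pattern does not match.
    percent = F.regexp_extract(F.col("Description"), DISCOUNT_PATTERN, 1)
    discount = (
        F.when(percent != "", percent.cast("double") / 100)
        .otherwise(F.lit(0.0))  # no tag (or null Description) gives 0.0
        .cast("float")
    )
    return df.withColumn("Discount", discount).select(*OUTPUT_COLUMNS)


def build_result_df(spark, path="/home/interview/sales.csv"):
    """Read the dataset and attach the Discount column."""
    # Assumption: the CSV has a header row; inferSchema is expected to
    # read SoldUnits as an integer.
    df = spark.read.csv(path, header=True, inferSchema=True)
    return add_discount_column(df)
```

With an active SparkSession named spark, result_df = build_result_df(spark) would produce the DataFrame the task asks for. The built-in regexp_extract function is used here rather than a Python UDF so that the parsing stays inside Spark's own expression engine.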

File Path

  • Dataset: /home/interview/sales.csv
  • Starter script: /home/interview/discount_parser.py

Schema

sales.csv

Column Name    Data Type
StoreID        String
ProductName    String
Category       String
SoldUnits      Integer
Description    String

Expected Output Schema

Column Name    Data Type
StoreID        String
ProductName    String
Category       String
SoldUnits      Integer
Description    String
Discount       Float

Example

Given this sample input:

df

StoreID  ProductName  Category  SoldUnits  Description
S101     Biscuits     Food      120        Tasty Biscuits [10% off]
S102     Shampoo      Hygiene   85         Smoothens Hair [5% off]
S103     Banana       Food      150        Fresh Bananas
S101     Toothpaste   Hygiene   300        Protects Teeth
S102     Shirt        Clothes   65         Cotton Shirts [20% off]

The expected output would be:

StoreID  ProductName  Category  SoldUnits  Description               Discount
S101     Biscuits     Food      120        Tasty Biscuits [10% off]  0.1
S102     Shampoo      Hygiene   85         Smoothens Hair [5% off]   0.05
S103     Banana       Food      150        Fresh Bananas             0.0
S101     Toothpaste   Hygiene   300        Protects Teeth            0.0
S102     Shirt        Clothes   65         Cotton Shirts [20% off]   0.2
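The parsing rule behind the example can be checked without Spark. The plain-Python sketch below applies an assumed pattern for the [N% off] tag to the five sample descriptions; parse_discount and DISCOUNT_TAG are hypothetical names used only for illustration.

```python
import re

# Assumed tag format: "[<number>% off]", e.g. "[10% off]".
DISCOUNT_TAG = re.compile(r"\[(\d+(?:\.\d+)?)% off\]")


def parse_discount(description):
    """Return the discount as a decimal fraction, or 0.0 if no tag is found."""
    match = DISCOUNT_TAG.search(description or "")
    if match is None:
        return 0.0
    return float(match.group(1)) / 100


# Descriptions from the sample input above.
samples = [
    "Tasty Biscuits [10% off]",
    "Smoothens Hair [5% off]",
    "Fresh Bananas",
    "Protects Teeth",
    "Cotton Shirts [20% off]",
]

for text in samples:
    print(f"{text!r} -> {parse_discount(text)}")
```

The values printed for each description should line up with the Discount column in the expected output.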
