Regex Extract

Objective

A geologist is working with a dataset of rock samples. Each sample has a description field that mixes letters and digits to encode the rock type and its approximate age.

Task

Extract the numeric parts from the description column to create a new column called age.

In the resulting DataFrame, the age column should contain only the numeric part extracted using a regular expression. If there is no numeric part in the description, the age column should contain an empty string ("").
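The per-value rule can be sketched with Python's standard `re` module. This is a minimal illustration, not a reference solution: the helper name `extract_age` is hypothetical, and taking the first run of digits is an assumption, since the task does not say what to do if a description contains more than one.

```python
import re

# \d+ matches one or more consecutive digits.
_DIGITS = re.compile(r"\d+")


def extract_age(description: str) -> str:
    """Return the first run of digits in `description`, or "" if there is none.

    Hypothetical helper: the name and the "first run" choice are assumptions.
    """
    match = _DIGITS.search(description)
    return match.group() if match else ""


print(extract_age("Basalt_450Ma"))      # 450
print(repr(extract_age("Limestone")))   # ''
```

The result stays a string rather than being converted to an integer, which matches the string type given for age in the expected output schema.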

Save your result as result_df, ensuring the final columns are ordered exactly as sample_id, description, and age.

File Paths

  • Dataset: /home/interview/samples.csv
  • Starter script: /home/interview/extract_age.py

Schema

samples.csv

Column Name    Data Type
sample_id      string
description    string

Expected Output Schema

Column Name    Data Type
sample_id      string
description    string
age            string

Constraints

  • The input DataFrame will have at least 1 row and at most 10^4 (10,000) rows.
  • The sample_id column will only contain unique alphanumeric strings with 1 to 50 characters.
  • The description column will contain strings of 1 to 100 characters, made up of letters, digits, and (as in the example) underscores.
  • The numeric part, if present, will be a positive integer.

Example

Given this sample input:

input_df

sample_id    description
S1           Basalt_450Ma
S2           Sandstone_300Ma
S3           Limestone
S4           Granite_200Ma
S5           Marble_1800Ma

The output would be:

sample_id    description        age
S1           Basalt_450Ma       450
S2           Sandstone_300Ma    300
S3           Limestone
S4           Granite_200Ma      200
S5           Marble_1800Ma      1800
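The sample above can be reproduced with a short sketch. It assumes the DataFrame is a pandas DataFrame, which the problem statement does not confirm, and it builds the sample input inline rather than reading the CSV. It uses `Series.str.extract` with a single capture group, which yields a missing value where nothing matches, and then fills those gaps with an empty string.

```python
import pandas as pd

# Sample input from the example, built inline so the sketch is self-contained.
# Against the real file, one option (an assumption, not part of the task text)
# would be: input_df = pd.read_csv("/home/interview/samples.csv", dtype=str)
input_df = pd.DataFrame(
    {
        "sample_id": ["S1", "S2", "S3", "S4", "S5"],
        "description": [
            "Basalt_450Ma",
            "Sandstone_300Ma",
            "Limestone",
            "Granite_200Ma",
            "Marble_1800Ma",
        ],
    }
)

result_df = input_df.copy()

# (\d+) captures the first run of digits; rows without digits come back as
# missing values, which fillna("") turns into empty strings.
result_df["age"] = (
    result_df["description"].str.extract(r"(\d+)", expand=False).fillna("")
)

# Enforce the required column order.
result_df = result_df[["sample_id", "description", "age"]]

print(result_df)
```

Passing `expand=False` makes `str.extract` return a Series rather than a one-column DataFrame, so the result can be assigned directly to the new column.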
