Materials Engineering Outer Join
Beginner Mode

Start your terminal to use beginner mode.

Objective

You are working as a Materials Engineer and you are required to process two sets of data related to various experiments conducted in your lab. You need to write a function that performs a join operation between your experiments log and your materials catalog.

Task

Join df_experiments and df_materials to bring more context to the experiment results. Specifically, you want to join on the material_id field that is common between the two DataFrames.

The join operation should be such that all records from both the df_experiments and df_materials DataFrames are included, regardless if they have a match in the other DataFrame (a full outer join).

Save your resulting DataFrame as result_df. Ensure the output matches the exact schema order requested, and order the final output by material_id in ascending order, then by experiment_id in ascending order.

File Path

  • Experiments Dataset: /home/interview/experiments.csv
  • Materials Dataset: /home/interview/materials.csv
  • Starter script: /home/interview/materials_join.py

Schema

experiments.csv

Column Name Data Type
experiment_id integer
material_id integer
experiment_date string
experiment_results float

materials.csv

Column Name Data Type
material_id integer
material_name string
material_type string

Expected Output Schema

Column Name Data Type
experiment_id integer
material_id integer
material_name string
material_type string
experiment_date string
experiment_results float

Example

Given this sample input:

df_experiments

| experiment_id | material_id | experiment_date | experiment_results |
|---------------+-------------+-----------------+--------------------|
| 1 | 101 | 2023-07-01 | 7.6 |
| 2 | 102 | 2023-07-02 | 8.3 |
| 3 | 103 | 2023-07-03 | 6.9 |
| 4 | 101 | 2023-07-04 | 7.2 |

df_materials

| material_id | material_name | material_type |
|-------------|---------------+---------------|
| 101 | Material A | Type X |
| 102 | Material B | Type Y |
| 104 | Material C | Type Z |

The expected output would be:

| experiment_id | material_id | material_name | material_type | experiment_date | experiment_results |
|---------------+-------------+---------------+---------------+-----------------+--------------------|
| 1 | 101 | Material A | Type X | 2023-07-01 | 7.6 |
| 4 | 101 | Material A | Type X | 2023-07-04 | 7.2 |
| 2 | 102 | Material B | Type Y | 2023-07-02 | 8.3 |
| 3 | 103 | null | null | 2023-07-03 | 6.9 |
| null | 104 | Material C | Type Z | null | null |

(Note: Material 103 has no name metadata, and Material 104 has no experiment results. Both are retained in the final output).

Terminal requires a larger screen

Open this page on a desktop or tablet (≥ 768px) to launch the terminal and practice hands-on.

Linux Terminal Environment

Write and execute your solution in the terminal below.

Sign In

Track

Question Difficulty Company Access
Need more practice in this area? Explore more questions →