Cloudflare: Calculating Ideal Gas Law Parameters — Data Analysis Interview Q&A (2026)

All Questions

Calculating Ideal Gas Law Parameters

Cloudflare 👶 Easy Spark Complex Joins

Beginner Mode

Start your terminal to use beginner mode.

Objective

You are given two DataFrames, df_temperature and df_pressure, related to some thermodynamics experiments.

Task

Write a PySpark function that combines these DataFrames and creates the Result column using the formula for the Ideal Gas Law: Pressure * Temperature = Result. For each ExperimentID, multiply the Pressure and Temperature to calculate Result.

If there's an ExperimentID present in one DataFrame but not in the other, ignore that ExperimentID. Your solution should only contain ExperimentIDs that exist in both DataFrames.

Save your resulting DataFrame as result_df. Order the final output in ascending order by ExperimentID.

File Path

Temperature Dataset: /home/interview/temperature.csv
Pressure Dataset: /home/interview/pressure.csv
Starter script: /home/interview/thermodynamics.py

Schema

temperature.csv

Column Name	Data Type
ExperimentID	integer
Temperature	double

pressure.csv

Column Name	Data Type
ExperimentID	integer
Pressure	double

Expected Output Schema

Column Name	Data Type
ExperimentID	integer
Result	double

Example

Given this sample input:

df_temperature

ExperimentID	Temperature
1	273.15
2	293.15
3	313.15

df_pressure

ExperimentID	Pressure
1	1.0
3	2.0
4	1.5

The expected output would be:

ExperimentID	Result
1	273.15
3	626.3

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("PrepareshSpark").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

df_temperature = spark.read.csv("/home/interview/temperature.csv", header=True, inferSchema=True)
df_pressure = spark.read.csv("/home/interview/pressure.csv", header=True, inferSchema=True)

# Step 1: Perform an inner join to keep only matching ExperimentIDs
joined_df = df_temperature.join(df_pressure, on="ExperimentID", how="inner")

# Step 2: Calculate the Result column by multiplying Pressure and Temperature
result_df = joined_df.withColumn("Result", F.col("Pressure") * F.col("Temperature"))

# Step 3: Select the required columns and order by ExperimentID ascending
result_df = result_df.select("ExperimentID", "Result").orderBy(F.col("ExperimentID").asc())

# --- Do not edit below this line ---
result_df.coalesce(1).write.csv("/home/interview/output", header=True, mode="overwrite")
spark.stop()

Explanation

Step 1: Merging with an Inner Join

joined_df = df_temperature.join(df_pressure, on="ExperimentID", how="inner")

The prompt strictly requires that we only calculate the result for experiments that exist in both datasets, discarding the rest. An inner join on the ExperimentID column perfectly handles this. Any ExperimentID missing from either the temperature or pressure table will naturally be dropped during the join.

Step 2: Column Multiplication

result_df = joined_df.withColumn("Result", F.col("Pressure") * F.col("Temperature"))

To compute the final thermodynamics value, we can perform basic arithmetic directly between PySpark columns. By multiplying F.col("Pressure") by F.col("Temperature"), we generate the required output and save it to a new column named "Result".

Step 3: Selecting and Sorting

result_df = result_df.select("ExperimentID", "Result").orderBy(F.col("ExperimentID").asc())

Finally, we drop the intermediate Pressure and Temperature columns by chaining a .select() statement that strictly pulls the two columns required by the Output Schema. We then append .orderBy() to satisfy the ascending sort requirement.

Terminal requires a larger screen

Open this page on a desktop or tablet (≥ 768px) to launch the terminal and practice hands-on.

Linux Terminal Environment

Write and execute your solution in the terminal below.

Track

	Question	Difficulty	Company	Access

Need more practice in this area? Explore more questions →