Okta: Top Reptile Observations — Data Engineering Interview Q&A (2026)

Top Reptile Observations

Okta 🧠 Hard Snowflake Window Functions Joins Ranking

Beginner Mode

Scenario

You are a herpetologist studying reptiles and amphibians, with separate tables tracking field observation records and species information.

Task

Write a Snowflake SQL query that:

Joins {{ ref("observations") }} with {{ ref("species") }} on species_id using an inner join
Ranks all rows by count in descending order using RANK()
Returns only the rows where the rank is 3 or better (ties at rank 3 are included)
Returns the columns obs_id, species_id, species_name, location_id, and count

Schema

observations

Column	Type	Description
obs_id	Integer	Unique identifier of the observation
species_id	Integer	Identifier of the species observed
location_id	Integer	Identifier of the observation location
count	Integer	Number of individuals observed

species

Column	Type	Description
species_id	Integer	Unique identifier of the species
species_name	String	Common name of the species

Example

observations:

obs_id	species_id	location_id	count
1	200	10	72
2	201	11	48
3	200	10	61
4	202	12	33
5	201	11	55

species:

species_id	species_name
200	Chameleon
201	Tortoise
202	Skink
203	Tuatara

Expected Output:

obs_id	species_id	species_name	location_id	count
1	200	Chameleon	10	72
3	200	Chameleon	10	61
5	201	Tortoise	11	55

Note: Observation 4 (count 33) and observation 2 (count 48) fall below rank 3 and are excluded. Species 203 (Tuatara) has no observations and does not appear in the output.

WITH joined AS (
    SELECT
        o.obs_id,
        o.species_id,
        s.species_name,
        o.location_id,
        o.count,
        RANK() OVER (ORDER BY o.count DESC) AS rnk
    FROM {{ ref("observations") }} AS o
    INNER JOIN {{ ref("species") }} AS s
        ON o.species_id = s.species_id
)
SELECT obs_id, species_id, species_name, location_id, count
FROM joined
WHERE rnk <= 3

Explanation

Step 1: Joining the tables

FROM {{ ref("observations") }} AS o
INNER JOIN {{ ref("species") }} AS s
    ON o.species_id = s.species_id

An INNER JOIN pairs each observation with its matching species record. Species that have no observations (and observations referencing a missing species) are excluded from the result. This is the correct join type here because you only want observations that have valid species data.

Step 2: Ranking with a window function

RANK() OVER (ORDER BY o.count DESC) AS rnk

RANK() assigns a position to each row based on the count column in descending order. When two or more rows share the same count, they receive the same rank, and the next rank value skips ahead accordingly. For example, if two rows tie at rank 1, the next row gets rank 3 (not rank 2). This ensures that all rows with a count equal to the third-highest value are included.

Step 3: Filtering to the top 3

WHERE rnk <= 3

After computing ranks in the CTE, the outer query filters to keep only rows ranked 3 or better. Because RANK() handles ties, the result may contain more than 3 rows when multiple observations share the same count at the boundary.

Step 4: Why RANK() instead of LIMIT

Using LIMIT 3 would arbitrarily cut off tied rows. If two observations both have the third-highest count, LIMIT 3 would include one and exclude the other depending on row order. RANK() treats ties fairly by including all rows that share the qualifying count. DENSE_RANK() is another option, but it would include rows ranked 1, 2, and 3 in a denser numbering scheme, potentially returning more rows than RANK() when there are ties above rank 3.

Quick Solution

Code Environment

Track

	Question	Difficulty	Company	Access

Need more practice in this area? Explore more questions →