Scenario
You are a herpetologist studying reptiles and amphibians, with separate tables tracking field observation records and species information.
Task
Write a Snowflake SQL query that:
- Joins
{{ ref("observations") }} with {{ ref("species") }} on species_id using an inner join
- Ranks all rows by
count in descending order using RANK()
- Returns only the rows where the rank is 3 or better (ties at rank 3 are included)
- Returns the columns
obs_id, species_id, species_name, location_id, and count
Schema
observations
| Column |
Type |
Description |
| obs_id |
Integer |
Unique identifier of the observation |
| species_id |
Integer |
Identifier of the species observed |
| location_id |
Integer |
Identifier of the observation location |
| count |
Integer |
Number of individuals observed |
species
| Column |
Type |
Description |
| species_id |
Integer |
Unique identifier of the species |
| species_name |
String |
Common name of the species |
Example
observations:
| obs_id |
species_id |
location_id |
count |
| 1 |
200 |
10 |
72 |
| 2 |
201 |
11 |
48 |
| 3 |
200 |
10 |
61 |
| 4 |
202 |
12 |
33 |
| 5 |
201 |
11 |
55 |
species:
| species_id |
species_name |
| 200 |
Chameleon |
| 201 |
Tortoise |
| 202 |
Skink |
| 203 |
Tuatara |
Expected Output:
| obs_id |
species_id |
species_name |
location_id |
count |
| 1 |
200 |
Chameleon |
10 |
72 |
| 3 |
200 |
Chameleon |
10 |
61 |
| 5 |
201 |
Tortoise |
11 |
55 |
Note: Observation 4 (count 33) and observation 2 (count 48) fall below rank 3 and are excluded. Species 203 (Tuatara) has no observations and does not appear in the output.
WITH joined AS (
SELECT
o.obs_id,
o.species_id,
s.species_name,
o.location_id,
o.count,
RANK() OVER (ORDER BY o.count DESC) AS rnk
FROM {{ ref("observations") }} AS o
INNER JOIN {{ ref("species") }} AS s
ON o.species_id = s.species_id
)
SELECT obs_id, species_id, species_name, location_id, count
FROM joined
WHERE rnk <= 3
Explanation
Step 1: Joining the tables
FROM {{ ref("observations") }} AS o
INNER JOIN {{ ref("species") }} AS s
ON o.species_id = s.species_id
An INNER JOIN pairs each observation with its matching species record. Species that have no observations (and observations referencing a missing species) are excluded from the result. This is the correct join type here because you only want observations that have valid species data.
Step 2: Ranking with a window function
RANK() OVER (ORDER BY o.count DESC) AS rnk
RANK() assigns a position to each row based on the count column in descending order. When two or more rows share the same count, they receive the same rank, and the next rank value skips ahead accordingly. For example, if two rows tie at rank 1, the next row gets rank 3 (not rank 2). This ensures that all rows with a count equal to the third-highest value are included.
Step 3: Filtering to the top 3
WHERE rnk <= 3
After computing ranks in the CTE, the outer query filters to keep only rows ranked 3 or better. Because RANK() handles ties, the result may contain more than 3 rows when multiple observations share the same count at the boundary.
Step 4: Why RANK() instead of LIMIT
Using LIMIT 3 would arbitrarily cut off tied rows. If two observations both have the third-highest count, LIMIT 3 would include one and exclude the other depending on row order. RANK() treats ties fairly by including all rows that share the qualifying count. DENSE_RANK() is another option, but it would include rows ranked 1, 2, and 3 in a denser numbering scheme, potentially returning more rows than RANK() when there are ties above rank 3.