66. Amusement Park Rating Anomalies

Scenario

You are a Data Analyst at an amusement park and need to identify rides whose average visitor rating deviates significantly from the norm.

Task

Write a Snowflake SQL query that:

  1. Joins {{ ref("rides") }} with {{ ref("visitors") }} on ride_id using an inner join (rides with no visitor ratings should be excluded)
  2. Computes the average rating per ride (rounded to 2 decimal places)
  3. Computes the global mean and sample standard deviation of the per-ride average ratings using window functions
  4. Flags each ride as anomalous when the absolute difference between its average rating and the global mean exceeds one standard deviation
  5. Returns ride_id, ride_name, average_rating, and is_anomalous
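
The steps above can be sketched as the following query. This is one possible approach, not the official solution. The CTE names (ride_avg, with_stats) and the helper columns global_mean and global_stddev are illustrative, and the sketch assumes the global statistics are computed over the rounded per-ride averages, which the task does not state explicitly.

```sql
-- Steps 1-2: inner join drops rides without ratings; one row per ride.
WITH ride_avg AS (
    SELECT
        r.ride_id,
        r.ride_name,
        ROUND(AVG(v.rating), 2) AS average_rating
    FROM {{ ref("rides") }} AS r
    INNER JOIN {{ ref("visitors") }} AS v
        ON r.ride_id = v.ride_id
    GROUP BY r.ride_id, r.ride_name
),

-- Step 3: an empty OVER () makes every row see the statistics of all rides.
-- Assumption: sample standard deviation (STDDEV_SAMP) is intended.
with_stats AS (
    SELECT
        ride_id,
        ride_name,
        average_rating,
        AVG(average_rating) OVER () AS global_mean,
        STDDEV_SAMP(average_rating) OVER () AS global_stddev
    FROM ride_avg
)

-- Steps 4-5: flag rides more than one standard deviation from the mean.
SELECT
    ride_id,
    ride_name,
    average_rating,
    ABS(average_rating - global_mean) > global_stddev AS is_anomalous
FROM with_stats
ORDER BY ride_id
```

If only one ride has ratings, STDDEV_SAMP returns NULL and is_anomalous is NULL as well; wrapping the comparison in COALESCE(..., FALSE) is one way to handle that case, should the grader expect a boolean for every row.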

Schema

rides

| Column | Type | Description |
|---|---|---|
| ride_id | String | Unique identifier for the ride |
| ride_name | String | Name of the ride |
| type | String | Category of the ride |
| capacity | Integer | Maximum number of riders per cycle |

visitors

| Column | Type | Description |
|---|---|---|
| visitor_id | String | Unique identifier for the visitor |
| ride_id | String | Ride the visitor went on |
| visit_date | Date | Date of the visit |
| rating | Integer | Rating given by the visitor (1 to 5) |

Example

rides:

| ride_id | ride_name | type | capacity |
|---|---|---|---|
| R001 | Thunder Canyon | Thrill | 28 |
| R002 | Sky Glider | Observation | 50 |
| R003 | Splash Falls | Water | 18 |
| R004 | Turbo Karts | Family | 22 |
| R005 | Gentle River | Classic | 36 |

visitors:

| visitor_id | ride_id | visit_date | rating |
|---|---|---|---|
| V01 | R001 | 2024-06-10 | 5 |
| V02 | R001 | 2024-06-10 | 4 |
| V01 | R002 | 2024-06-10 | 3 |
| V03 | R003 | 2024-06-11 | 1 |
| V04 | R004 | 2024-06-11 | 5 |
| V02 | R004 | 2024-06-11 | 5 |

Expected Output:

| ride_id | ride_name | average_rating | is_anomalous |
|---|---|---|---|
| R001 | Thunder Canyon | 4.5 | false |
| R002 | Sky Glider | 3.0 | false |
| R003 | Splash Falls | 1.0 | true |
| R004 | Turbo Karts | 5.0 | false |

Note: R005 (Gentle River) has no visitor ratings, so the inner join excludes it. R003 (Splash Falls) has an average rating of 1.0, which is 2.375 below the global mean (3.375). That gap exceeds one sample standard deviation (about 1.797), so R003 is flagged as anomalous.
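
The figures in the note can be checked by hand. The working below assumes the sample standard deviation (divisor n - 1), which is the variant that reproduces the quoted 1.797; the symbols \bar{x} and s are introduced here for illustration only.

```latex
% Per-ride averages: R001 = 4.5, R002 = 3.0, R003 = 1.0, R004 = 5.0
\bar{x} = \frac{4.5 + 3.0 + 1.0 + 5.0}{4} = \frac{13.5}{4} = 3.375

s = \sqrt{\frac{(1.125)^2 + (-0.375)^2 + (-2.375)^2 + (1.625)^2}{4 - 1}}
  = \sqrt{\frac{9.6875}{3}} \approx 1.797

% Absolute deviations from the mean, compared against s
|4.5 - 3.375| = 1.125 < 1.797 \quad \text{(R001: not anomalous)}
|3.0 - 3.375| = 0.375 < 1.797 \quad \text{(R002: not anomalous)}
|1.0 - 3.375| = 2.375 > 1.797 \quad \text{(R003: anomalous)}
|5.0 - 3.375| = 1.625 < 1.797 \quad \text{(R004: not anomalous)}
```

The choice of variant matters for this data: the population standard deviation would be \sqrt{9.6875 / 4} \approx 1.556, which would also flag R004 (1.625 > 1.556) and so would not match the expected output.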
