120 years of Olympic history | SQL Project | MySQL Workbench
In this SQL project, I worked on a 120-year Olympic history dataset, where I tried to find some insight by writing SQL queries. I used MySQL Workbench to analyze the dataset for this project. I have analyzed the dataset in the form of questions and answers by writing SQL queries where i used joins, common table expression, subqueries, window function, etc.
This dataset contains 2 tables, in CSV format:
- The Athlete Events table contains over 270,000 Olympic performances across history
- Each record represents an individual athlete competing in an individual event
- Records contain details about the athlete (ID, sex, name, age, height, weight, country) and the event (games, year, city, sport, event, medal)
- The NOC Region table serves as a lookup table with “NOC” as the primary key
- Each record represents one country according to the National Olympic Committee.
athlete_events table:
noc_regions table:
SQL Queries:
I am writing SQL Queries using this data. For each of these queries, you would find the problem statement, SQL Queries and then the screen shot of the output(some screen shot contains few lines of output).
1. How many olympics games have been held?
Problem Statement: Write a SQL query to find the total no of Olympic Games held as per the dataset.
select count(distinct(Games)) as olympics_games
from athlete_events;
Problem Statement: Write a SQL query to list down all the Olympic Games held so far.
select distinct year, season, city
from athlete_events
order by year;
Problem Statement: SQL query to fetch total no of countries participated in each olympic games.
with all_countries as
(select games, nr.region
from athlete_events oh
join noc_regions nr ON nr.noc = oh.noc
group by games, nr.region)
select games, count(1) as total_countries
from all_countries
group by games
order by games;
Problem Statement: Write a SQL query to return the Olympic Games which had the highest participating countries and the lowest participating countries.
with all_countries as
(select games, nr.region
from athlete_events oh
join noc_regions nr ON nr.noc=oh.noc
group by games, nr.region),
tot_countries as
(select games, count(1) as total_countries
from all_countries
group by games)
select distinct
concat(first_value(games) over(order by total_countries)
, ' - '
, first_value(total_countries) over(order by total_countries)) as Lowest_Countries,
concat(first_value(games) over(order by total_countries desc)
, ' - '
, first_value(total_countries) over(order by total_countries desc)) as Highest_Countries
from tot_countries
order by 1;
Problem Statement: SQL query to return the list of countries who have been part of every Olympics games.
with tot_games as
(select count(distinct games) as total_games
from athlete_events),
countries as
(select games, nr.region as country
from athlete_events oh
join noc_regions nr ON nr.noc=oh.noc
group by games, nr.region),
countries_participated as
(select country, count(1) as total_participated_games
from countries
group by country)
select cp.*
from countries_participated cp
join tot_games tg on tg.total_games = cp.total_participated_games
order by 1;
Problem Statement: SQL query to fetch the list of all sports which have been part of every olympics.
with tot_games as
(select count(distinct(games)) as total_games
from athlete_events where season = 'Summer'),
sports_played as
(select games, sport
from athlete_events
group by games, sport),
game_participated as
(select sport, count(1) as total_participated_games
from sports_played
group by sport)
select cp.*
from game_participated cp
join tot_games tg on tg.total_games = cp.total_participated_games
order by 1;
Problem Statement: Using SQL query, Identify the sport which were just played once in all of olympics.
with t1 as
(select distinct games, sport
from athlete_events),
t2 as
(select sport, count(1) as no_of_games
from t1
group by sport)
select t2.*, t1.games
from t2
join t1 on t1.sport = t2.sport
where t2.no_of_games = 1
order by t1.sport;
Problem Statement: Write SQL query to fetch the total no of sports played in each olympics.
select Games, count(distinct(Sport)) as total_sports
from athlete_events
group by Games
order by total_sports desc, Games;
Problem Statement: SQL Query to fetch the details of the oldest athletes to win a gold medal at the olympics.
select * from athlete_events
where Medal = 'Gold' and
Age = (select max(Age) from athlete_events where Medal = 'Gold');
10. Fetch the top 5 athletes who have won the most gold medals.
Problem Statement: SQL query to fetch the top 5 athletes who have won the most gold medals.
with m1 as(
select Name, Team, count(Medal) as tot
from athlete_events
where Medal ='Gold'
group by Name, Team
order by tot desc),
m2 as
(select Name, Team, tot, dense_rank() over(order by tot desc) as rnk
from m1)
select name, team, tot
from m2
where rnk <= 5;
Problem Statement: SQL Query to fetch the top 5 athletes who have won the most medals (Medals include gold, silver and bronze).
with m1 as(
select Name, Team, count(Medal) as tot
from athlete_events
where Medal <> 'NA'
group by Name, Team
order by tot desc, Team, Name),
m2 as
(select Name, Team, tot, dense_rank() over(order by tot desc) as rnk
from m1)
select name, team, tot
from m2
where rnk <= 5;
Output:
12. Fetch the top 5 most successful countries in olympics. Success is defined by no of medals won.
12. Fetch the top 5 most successful countries in olympics. Success is defined by no of medals won.
Problem Statement: Write a SQL query to fetch the top 5 most successful countries in olympics. (Success is defined by no of medals won).
with cte as(
select r.Region, count(a.Medal) as Total_Medal
from athlete_events a join noc_regions r
on a.noc = r.noc
where a.Medal <> 'NA'
group by r.Region
order by Total_Medal desc)
select Region, Total_Medal, row_number() over(order by Total_Medal desc) as rnk
from cte limit 5;
reference :
https://techtfq.com/blog/practice-writing-sql-queries-using-real-dataset
https://www.mavenanalytics.io/blog/maven-olympics-challenge
No comments: