120 years of Olympic history | SQL Project

120 years of Olympic history | SQL Project | MySQL Workbench
SQL Workbench

In this SQL project, I worked on a 120-year Olympic history dataset, where I tried to find some insight by writing SQL queries. I used MySQL Workbench to analyze the dataset for this project. I have analyzed the dataset in the form of questions and answers by writing SQL queries where i used joins, common table expression, subqueries, window function, etc.

This dataset contains 2 tables, in CSV format:
  • The Athlete Events table contains over 270,000 Olympic performances across history
  • Each record represents an individual athlete competing in an individual event
  • Records contain details about the athlete (ID, sex, name, age, height, weight, country) and the event (games, year, city, sport, event, medal)
  • The NOC Region table serves as a lookup table with “NOC” as the primary key
  • Each record represents one country according to the National Olympic Committee.

Loading dataset to MySQL Workbench Database:

athlete_events table:


noc_regions table:



SQL Queries:
I am writing SQL Queries using this data. For each of these queries, you would find the problem statement, SQL Queries and then the screen shot of the output(some screen shot contains few lines of output).


1. How many olympics games have been held?
Problem Statement: Write a SQL query to find the total no of Olympic Games held as per the dataset.

select count(distinct(Games)) as olympics_games
from athlete_events;

Output:



2. List down all Olympics games held so far.
Problem Statement: Write a SQL query to list down all the Olympic Games held so far.

select distinct year, season, city
from athlete_events
order by year;

Output:



3. Mention the total no of nations who participated in each olympics game?
Problem Statement: SQL query to fetch total no of countries participated in each olympic games.

with all_countries as
    (select games, nr.region
    from athlete_events oh
    join noc_regions nr ON nr.noc = oh.noc
    group by games, nr.region)
select games, count(1) as total_countries
from all_countries
group by games
order by games;

Output:



4. Which year saw the highest and lowest no of countries participating in Olympics
Problem Statement: Write a SQL query to return the Olympic Games which had the highest participating countries and the lowest participating countries.

with all_countries as
    (select games, nr.region
    from athlete_events oh
    join noc_regions nr ON nr.noc=oh.noc
    group by games, nr.region),
tot_countries as
    (select games, count(1) as total_countries
    from all_countries
    group by games)
select distinct
concat(first_value(games) over(order by total_countries)
, ' - '
, first_value(total_countries) over(order by total_countries)) as Lowest_Countries,
concat(first_value(games) over(order by total_countries desc)
, ' - '
, first_value(total_countries) over(order by total_countries desc)) as Highest_Countries
from tot_countries
order by 1;

Output:



5. Which nation has participated in all of the olympic games?
Problem Statement: SQL query to return the list of countries who have been part of every Olympics games.

with tot_games as
    (select count(distinct games) as total_games
    from athlete_events),
countries as
    (select games, nr.region as country
    from athlete_events oh
    join noc_regions nr ON nr.noc=oh.noc
    group by games, nr.region),
countries_participated as
    (select country, count(1) as total_participated_games
    from countries
    group by country)
select cp.*
from countries_participated cp
join tot_games tg on tg.total_games = cp.total_participated_games
order by 1;

Output:



6. Identify the sport which was played in all summer olympics.
Problem Statement: SQL query to fetch the list of all sports which have been part of every olympics.

with tot_games as
    (select count(distinct(games)) as total_games
    from athlete_events where season = 'Summer'),
sports_played as
    (select games, sport
    from athlete_events
    group by games, sport),
game_participated as
    (select sport, count(1) as total_participated_games
    from sports_played
    group by sport)
select cp.*
from game_participated cp
join tot_games tg on tg.total_games = cp.total_participated_games
order by 1;

Output:



7. Which Sports were just played only once in the olympics.
Problem Statement: Using SQL query, Identify the sport which were just played once in all of olympics.

with t1 as
    (select distinct games, sport
    from athlete_events),
t2 as
    (select sport, count(1) as no_of_games
    from t1
    group by sport)
select t2.*, t1.games
from t2
join t1 on t1.sport = t2.sport
where t2.no_of_games = 1
order by t1.sport;

Output:



8. Fetch the total no of sports played in each olympic games.
Problem Statement: Write SQL query to fetch the total no of sports played in each olympics.

select Games, count(distinct(Sport)) as total_sports
from athlete_events
group by Games
order by total_sports desc, Games;

Output:



9. Fetch oldest athletes to win a gold medal
Problem Statement: SQL Query to fetch the details of the oldest athletes to win a gold medal at the olympics.

select * from athlete_events
where Medal = 'Gold' and
Age = (select max(Age) from athlete_events where Medal = 'Gold');

Output:



10. Fetch the top 5 athletes who have won the most gold medals.
Problem Statement: SQL query to fetch the top 5 athletes who have won the most gold medals.

with m1 as(
    select Name, Team, count(Medal) as tot
    from athlete_events
    where Medal ='Gold'
    group by Name, Team
    order by tot desc),
m2 as
    (select Name, Team, tot, dense_rank() over(order by tot desc) as rnk
    from m1)
select name, team, tot
from m2
where rnk <= 5;

Output:



11. Fetch the top 5 athletes who have won the most medals (gold/silver/bronze).
Problem Statement: SQL Query to fetch the top 5 athletes who have won the most medals (Medals include gold, silver and bronze).

with m1 as(
    select Name, Team, count(Medal) as tot
    from athlete_events
    where Medal <> 'NA'
    group by Name, Team
    order by tot desc, Team, Name),
m2 as
    (select Name, Team, tot, dense_rank() over(order by tot desc) as rnk
    from m1)
select name, team, tot
from m2
where rnk <= 5;

Output:



12. Fetch the top 5 most successful countries in olympics. Success is defined by no of medals won.
Problem Statement: Write a SQL query to fetch the top 5 most successful countries in olympics. (Success is defined by no of medals won).

with cte as(
    select r.Region, count(a.Medal) as Total_Medal
    from athlete_events a join noc_regions r
    on a.noc = r.noc
    where a.Medal <> 'NA'
    group by r.Region
    order by Total_Medal desc)
select Region, Total_Medal, row_number() over(order by Total_Medal desc) as rnk
from cte limit 5;

Output:



reference :
https://techtfq.com/blog/practice-writing-sql-queries-using-real-dataset
https://www.mavenanalytics.io/blog/maven-olympics-challenge





No comments: