Web scraping data for English Premier League football matches
Category > Data Collection
Jun 11, 2022

In this project, we're going to work through a machine learning project on English Premier League (EPL) football matches. The final goal is to predict the winner of each match. First, we'll use web scraping to collect the EPL match results from this page. Let's download the HTML for that page and then explore it in the web browser's inspector. We want to extract the first table, the League Table, which lists every team in the league along with its stats. In particular, we need the URL for each team so that we can grab its match log for the season.
import requests

URL = "https://fbref.com/en/comps/9/2021-2022/2021-2022-Premier-League-Stats"
response = requests.get(URL)
if response.status_code == 200:
    content = response.content
else:
    print("Couldn't download the web page")
Exploring the page in the browser's inspector, we identify which HTML tag is associated with the team URLs. After some exploration, we find the id of the League Table element, which lets us select it with BeautifulSoup.
from bs4 import BeautifulSoup
parser = BeautifulSoup(content, 'html.parser')
league_table = parser.select('#results2021-202291_overall')[0]
We notice that the table cells containing the URL for each team's statistics have a special attribute data-stat="team". We use this information to select the desired cells and scrape the URLs from them.
team_data = league_table.find_all("td", attrs={'data-stat': 'team'})
team_stat_URL = {}
for team in team_data:
    # The table only contains partial URLs. We add the domain name to get the full URL
    URL = "https://fbref.com" + team.select('a')[0]['href']
    team_name = team.select('a')[0].text
    team_stat_URL[team_name] = URL
print(team_stat_URL)
Now that we have a list of URLs, one for each team, we can get the stats we want. Let's start with the first team: Manchester City. After exploring the team's web page, we decide to parse the table named "Scores & Fixtures" for our analysis. The parsed table is read into a pandas DataFrame for convenience.
import pandas as pd

link = team_stat_URL['Manchester City']
response_MC = requests.get(link)
if response_MC.status_code == 200:
    content_MC = response_MC.content
    score_tables = pd.read_html(content_MC, match="Scores & Fixtures")
    score_df = score_tables[0]
else:
    print("The page couldn't be downloaded for team {}".format("Manchester City"))
score_df.head()
score_df.tail()
Here is a brief description of the columns in the table whose meaning can't easily be inferred from their names (a quick check of these columns follows the list):
- Comp : Competition
- Round : Phase of the competition
- GF : Goals for (scored by the team)
- GA : Goals against (conceded by the team)
- xG : Expected goals
- xGA : Expected goals allowed
- Poss : Possession, measured as the percentage of passes attempted
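As a quick sanity check, we can look at just these columns and their dtypes. This is an optional sketch that assumes score_df was built as above.

cols = ["Comp", "Round", "GF", "GA", "xG", "xGA", "Poss"]
print(score_df[cols].dtypes)  # GF and GA may be parsed as strings if any value contains extra text
score_df[cols].head()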
One thing the scores and fixtures table lacks is detail about each match, such as the number of shots, shots on target, free kicks, and penalty kicks. Some of these stats can be found in the table under the Shooting tab. Let's find and download the table containing the shooting stats for Manchester City and read it into a pandas DataFrame.
MC_parser = BeautifulSoup(content_MC, 'html.parser')
# After exploring the web page, we find that the desired URL sits inside <a> tags within the <div> tag with class="filter"
parsed_links = MC_parser.select(".filter a")
shooting_tab_link = ["https://fbref.com" + link['href'] for link in parsed_links if link.text == "Shooting"]
print(shooting_tab_link)
response_shooting = requests.get(*shooting_tab_link)
if response_shooting.status_code == 200:
    shooting_html = response_shooting.content
    shooting_tables = pd.read_html(shooting_html, match="Shooting ")
    shooting_df = shooting_tables[0]
else:
    print("Couldn't download the shooting page")
shooting_df.head()
The DataFrame has a multi-level column index, which is not important for our purpose, so we can drop the top level. After that we have two DataFrames: one with the match scores and fixtures and one with the shooting stats. Since both refer to the same matches, we can combine them.
shooting_df.columns = shooting_df.columns.droplevel()
shooting_df.head()
Both the score and shooting DataFrames share several columns. The columns unique to the shooting DataFrame are listed below:
- Sh : Total shots (does not include penalty kicks)
- SoT : Shots on target (does not include penalty kicks)
- Dist : Average distance from goal of the shots taken
- FK : Shots from free kicks
- PK : Penalty kicks made
- PKatt : Penalty kicks attempted
These unique columns are merged into the score DataFrame.
team_data = score_df.merge(shooting_df[["Date", "Sh", "SoT", "Dist", "FK", "PK", "PKatt"]], on="Date")
team_data.head()
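As a quick optional check (assuming the two DataFrames were built as above), we can confirm the merge kept one row per match. If the three lengths below differ, some matches appear in only one of the tables and were dropped by the merge.

print(len(score_df), len(shooting_df), len(team_data))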
Now let's repeat these steps for every team across the last five EPL seasons.
import time
import re
import pandas as pd

years = list(range(2022, 2017, -1))
all_matches = []

for year in years:
    for team in team_stat_URL:
        link = team_stat_URL[team]
        print(link)
        # Keep retrying until the scores page downloads successfully
        response_status_code = 401
        while response_status_code != 200:
            response = requests.get(link)
            if response.status_code == 200:
                content = response.content
                score_tables = pd.read_html(content, match="Scores & Fixtures")
                score_df = score_tables[0]
                response_status_code = 200
            else:
                print("The page couldn't be downloaded for team {}. Trying again".format(team))
                time.sleep(1)
        parser = BeautifulSoup(content, 'html.parser')
        # The shooting URL sits inside <a> tags within the <div> tag with class="filter"
        parsed_links = parser.select(".filter a")
        shooting_tab_link = ["https://fbref.com" + link['href'] for link in parsed_links if link.text == "Shooting"]
        # Keep retrying until the shooting page downloads successfully
        response_shooting_status_code = 401
        while response_shooting_status_code != 200:
            response_shooting = requests.get(*shooting_tab_link)
            if response_shooting.status_code == 200:
                shooting_html = response_shooting.content
                shooting_tables = pd.read_html(shooting_html, match="Shooting ")
                shooting_df = shooting_tables[0]
                response_shooting_status_code = 200
            else:
                print("Couldn't download the shooting page for team {}. Trying again".format(team))
                time.sleep(1)
        shooting_df.columns = shooting_df.columns.droplevel()
        try:
            team_data = score_df.merge(shooting_df[["Date", "Sh", "SoT", "Dist", "FK", "PK", "PKatt"]], on="Date")
            print(f"{team} : {year} : {team_data.shape}")
        except ValueError:
            continue
        except KeyError as e:
            print(f"Column {e.args[0]} missing from the dataframe. So adding an extra column for consistency")
            # The FK column is occasionally missing; add it as an empty column and redo the merge
            shooting_df["FK"] = None
            team_data = score_df.merge(shooting_df[["Date", "Sh", "SoT", "Dist", "FK", "PK", "PKatt"]], on="Date")
        # Our goal is to predict winners of EPL matches, so ignore any matches outside the Premier League
        team_data = team_data[team_data["Comp"] == "Premier League"]
        # Add extra columns to keep track of the team name and season
        team_data["Season"] = year
        team_data["Team"] = team
        all_matches.append(team_data)
        time.sleep(1)
        # Update the link in team_stat_URL so the next pass scrapes the previous season
        season_links = parser.select(".prevnext a")
        prev_season_link = "https://fbref.com" + season_links[0]['href']
        team_stat_URL[team] = prev_season_link
    # Long pause between seasons to avoid being rate limited
    time.sleep(300)
match_df = pd.concat(all_matches)
match_df.columns = [c.lower() for c in match_df.columns]
match_df.to_csv("matches.csv")
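With the data saved, a natural next step toward the prediction goal is to load the CSV back and derive a simple target column from the match result. The snippet below is a minimal sketch: it assumes the scraped table contains the lowercased "date", "result", "opponent", and "team" columns produced above, and the win/not-win target encoding is just one possible choice.

# Reload the scraped data and create a binary target: 1 if the team won, 0 otherwise
matches = pd.read_csv("matches.csv", index_col=0)
matches["date"] = pd.to_datetime(matches["date"])
matches["target"] = (matches["result"] == "W").astype(int)
print(matches.shape)
matches[["date", "team", "opponent", "result", "target"]].head()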