Web scraping data for English Premier League football matches

In this project, we work through a machine learning project on English Premier League (EPL) football matches. The final goal is to predict the winner of each match. First, we use web scraping to get the EPL match results from the FBref page for the 2021-2022 season. Let's download the HTML for that page and explore it in the web browser's inspector. We want to extract the first table, the League Table, which lists every team in the league and its stats. In particular, we need the URL of each team's page so that we can grab that team's match log for the season.

In [ ]:
import requests
In [ ]:
URL = "https://fbref.com/en/comps/9/2021-2022/2021-2022-Premier-League-Stats"
response = requests.get(URL)
if response.status_code == 200:
    content = response.content
else:
    print("Couldn't download the web page")

Let's first explore the page in the web browser's inspector to identify which HTML tag holds the team URLs. After some exploration, we find the id of the 'League Table' element and use it to select the table.

In [ ]:
from bs4 import BeautifulSoup
parser = BeautifulSoup(content, 'html.parser')
league_table = parser.select('#results2021-202291_overall')[0]

We notice that the table cells (<td> tags) that contain the URL for each team's statistics have a special attribute 'data-stat' = 'team'. We use this information to select those cells and then scrape the URL from each of them.

In [ ]:
team_data = league_table.find_all("td", attrs={'data-stat' : 'team'})
team_stat_URL = {}
for team in team_data:
    # The table only contains partial URL. We add the domain name to get the full URL
    URL = "https://fbref.com" + team.select('a')[0]['href']
    team_name = team.select('a')[0].text
    team_stat_URL[team_name] = URL
print(team_stat_URL)
{'Manchester City': 'https://fbref.com/en/squads/b8fd03ef/2021-2022/Manchester-City-Stats', 'Liverpool': 'https://fbref.com/en/squads/822bd0ba/2021-2022/Liverpool-Stats', 'Chelsea': 'https://fbref.com/en/squads/cff3d9bb/2021-2022/Chelsea-Stats', 'Tottenham': 'https://fbref.com/en/squads/361ca564/2021-2022/Tottenham-Hotspur-Stats', 'Arsenal': 'https://fbref.com/en/squads/18bb7c10/2021-2022/Arsenal-Stats', 'Manchester Utd': 'https://fbref.com/en/squads/19538871/2021-2022/Manchester-United-Stats', 'West Ham': 'https://fbref.com/en/squads/7c21e445/2021-2022/West-Ham-United-Stats', 'Leicester City': 'https://fbref.com/en/squads/a2d435b3/2021-2022/Leicester-City-Stats', 'Brighton': 'https://fbref.com/en/squads/d07537b9/2021-2022/Brighton-and-Hove-Albion-Stats', 'Wolves': 'https://fbref.com/en/squads/8cec06e1/2021-2022/Wolverhampton-Wanderers-Stats', 'Newcastle Utd': 'https://fbref.com/en/squads/b2b47a98/2021-2022/Newcastle-United-Stats', 'Crystal Palace': 'https://fbref.com/en/squads/47c64c55/2021-2022/Crystal-Palace-Stats', 'Brentford': 'https://fbref.com/en/squads/cd051869/2021-2022/Brentford-Stats', 'Aston Villa': 'https://fbref.com/en/squads/8602292d/2021-2022/Aston-Villa-Stats', 'Southampton': 'https://fbref.com/en/squads/33c895d4/2021-2022/Southampton-Stats', 'Everton': 'https://fbref.com/en/squads/d3fd31cc/2021-2022/Everton-Stats', 'Leeds United': 'https://fbref.com/en/squads/5bfb9659/2021-2022/Leeds-United-Stats', 'Burnley': 'https://fbref.com/en/squads/943e8050/2021-2022/Burnley-Stats', 'Watford': 'https://fbref.com/en/squads/2abfe087/2021-2022/Watford-Stats', 'Norwich City': 'https://fbref.com/en/squads/1c781004/2021-2022/Norwich-City-Stats'}
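
The string concatenation above works because every href in the table starts with a slash. As a minimal sketch of a slightly more forgiving alternative, the standard library's urljoin also copes with hrefs that are already absolute. The helper name absolute_url is hypothetical and not part of the original notebook.

```python
from urllib.parse import urljoin

BASE_URL = "https://fbref.com"  # domain used throughout this post


def absolute_url(href, base=BASE_URL):
    """Return a full URL for an href that may be relative or already absolute."""
    # absolute_url is a hypothetical helper introduced for this sketch
    return urljoin(base, href)


# Relative href, as it appears in the league table
full = absolute_url("/en/squads/b8fd03ef/2021-2022/Manchester-City-Stats")
print(full)
```

Inside the loop, the line that builds URL could then read absolute_url(team.select('a')[0]['href']).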

Now that we have a dictionary of URLs, one for each team, we can get the stats we want. Let's start with the first team: Manchester City. After exploring the team's web page, we decide to parse the table named "Scores & Fixtures" for our analysis. The parsed table is read into a pandas DataFrame for convenience.

In [ ]:
import pandas as pd
link = team_stat_URL['Manchester City']
response_MC = requests.get(link)
if response_MC.status_code == 200:
    content_MC = response_MC.content
    score_tables = pd.read_html(content_MC, match="Scores & Fixtures")
    score_df = score_tables[0]
else:
    print("The page couldn't be downloaded for team {}".format("Manchester City"))
In [ ]:
score_df.head()
Out[ ]:
Date Time Comp Round Day Venue Result GF GA Opponent xG xGA Poss Attendance Captain Formation Referee Match Report Notes
0 2021-08-07 17:15 Community Shield FA Community Shield Sat Neutral L 0 1 Leicester City NaN NaN 57 NaN Fernandinho 4-3-3 Paul Tierney Match Report NaN
1 2021-08-15 16:30 Premier League Matchweek 1 Sun Away L 0 1 Tottenham 2.0 1.0 65 58262.0 Fernandinho 4-3-3 Anthony Taylor Match Report NaN
2 2021-08-21 15:00 Premier League Matchweek 2 Sat Home W 5 0 Norwich City 2.7 0.1 67 51437.0 İlkay Gündoğan 4-3-3 Graham Scott Match Report NaN
3 2021-08-28 12:30 Premier League Matchweek 3 Sat Home W 5 0 Arsenal 4.0 0.2 80 52276.0 İlkay Gündoğan 4-3-3 Martin Atkinson Match Report NaN
4 2021-09-11 15:00 Premier League Matchweek 4 Sat Away W 1 0 Leicester City 3.3 0.6 61 32087.0 İlkay Gündoğan 4-3-3 Paul Tierney Match Report NaN
In [ ]:
score_df.tail()
Out[ ]:
Date Time Comp Round Day Venue Result GF GA Opponent xG xGA Poss Attendance Captain Formation Referee Match Report Notes
53 2022-05-04 21:00 Champions Lg Semi-finals Wed Away L 1 3 es Real Madrid 1.4 2.3 55 61416.0 Rúben Dias 4-3-3 Daniele Orsato Match Report Leg 2 of 2; Real Madrid won; Required Extra Time
54 2022-05-08 16:30 Premier League Matchweek 36 Sun Home W 5 0 Newcastle Utd 3.3 0.8 71 53336.0 İlkay Gündoğan 4-2-3-1 Stuart Attwell Match Report NaN
55 2022-05-11 20:15 Premier League Matchweek 33 Wed Away W 5 1 Wolves 2.8 0.5 66 32000.0 Fernandinho 4-2-3-1 Martin Atkinson Match Report NaN
56 2022-05-15 14:00 Premier League Matchweek 37 Sun Away D 2 2 West Ham 2.9 1.8 78 59972.0 Fernandinho 4-3-3 Anthony Taylor Match Report NaN
57 2022-05-22 16:00 Premier League Matchweek 38 Sun Home W 3 2 Aston Villa 3.7 0.3 71 53395.0 Fernandinho 4-3-3 Michael Oliver Match Report NaN

Here is a brief description of the columns whose meaning is not obvious from their names:

  • Comp : Competition
  • Round: Phase of competition
  • GF : Goals for the team
  • GA : Goals against the team
  • xG : Expected goals
  • xGA : Expected goals allowed
  • Poss : Possession as a percentage of passes attempted

As we can observe, the table with scores and fixtures lacks match details such as the number of shots, shots on target, free kicks, and penalty kicks. We can find some of these stats in the table under the Shooting tab. Let's find and download the table containing the shooting stats for Manchester City and read it into a pandas DataFrame.

In [ ]:
MC_parser = BeautifulSoup(content_MC, 'html.parser')
# After exploring the web page, we find that the desired URL sits inside the <div> tag with class="filter"
parsed_links = MC_parser.select(".filter a")
shooting_tab_link = ["https://fbref.com" + link['href'] for link in parsed_links if link.text == "Shooting"]
print(shooting_tab_link)
['https://fbref.com/en/squads/b8fd03ef/2021-2022/matchlogs/all_comps/shooting/Manchester-City-Match-Logs-All-Competitions']
In [ ]:
response_shooting = requests.get(*shooting_tab_link)
if response_shooting.status_code == 200:
    shooting_html = response_shooting.content
    shooting_tables = pd.read_html(shooting_html, match="Shooting ")
    shooting_df = shooting_tables[0]
else:
    print("Couldn't download the shooting page")
In [ ]:
shooting_df.head()
Out[ ]:
For Manchester City ... Standard Expected Unnamed: 25_level_0
Date Time Comp Round Day Venue Result GF GA Opponent ... Dist FK PK PKatt xG npxG npxG/Sh G-xG np:G-xG Match Report
0 2021-08-07 17:15 Community Shield FA Community Shield Sat Neutral L 0 1 Leicester City ... NaN NaN 0 0 NaN NaN NaN NaN NaN Match Report
1 2021-08-15 16:30 Premier League Matchweek 1 Sun Away L 0 1 Tottenham ... 17.3 1.0 0 0 2.0 2.0 0.11 -2.0 -2.0 Match Report
2 2021-08-21 15:00 Premier League Matchweek 2 Sat Home W 5 0 Norwich City ... 18.5 1.0 0 0 2.7 2.7 0.17 1.3 1.3 Match Report
3 2021-08-28 12:30 Premier League Matchweek 3 Sat Home W 5 0 Arsenal ... 14.8 0.0 0 0 4.0 4.0 0.16 1.0 1.0 Match Report
4 2021-09-11 15:00 Premier League Matchweek 4 Sat Away W 1 0 Leicester City ... 14.3 0.0 0 0 3.3 3.3 0.14 -2.3 -2.3 Match Report

5 rows × 26 columns

The DataFrame has a multi-level column header, which is not important for our purpose, so we drop the top level. After that we have two DataFrames: the matches and the shooting stats. Since both refer to the same matches, we can combine them.

In [ ]:
shooting_df.columns = shooting_df.columns.droplevel()
shooting_df.head()
Out[ ]:
Date Time Comp Round Day Venue Result GF GA Opponent ... Dist FK PK PKatt xG npxG npxG/Sh G-xG np:G-xG Match Report
0 2021-08-07 17:15 Community Shield FA Community Shield Sat Neutral L 0 1 Leicester City ... NaN NaN 0 0 NaN NaN NaN NaN NaN Match Report
1 2021-08-15 16:30 Premier League Matchweek 1 Sun Away L 0 1 Tottenham ... 17.3 1.0 0 0 2.0 2.0 0.11 -2.0 -2.0 Match Report
2 2021-08-21 15:00 Premier League Matchweek 2 Sat Home W 5 0 Norwich City ... 18.5 1.0 0 0 2.7 2.7 0.17 1.3 1.3 Match Report
3 2021-08-28 12:30 Premier League Matchweek 3 Sat Home W 5 0 Arsenal ... 14.8 0.0 0 0 4.0 4.0 0.16 1.0 1.0 Match Report
4 2021-09-11 15:00 Premier League Matchweek 4 Sat Away W 1 0 Leicester City ... 14.3 0.0 0 0 3.3 3.3 0.14 -2.3 -2.3 Match Report

5 rows × 26 columns
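
To see what droplevel does in isolation, here is a small sketch on a toy DataFrame. The frame is an assumption built for illustration: its two-level header only imitates the shape of the shooting table, and its single row reuses values from the Tottenham match shown above.

```python
import pandas as pd

# Toy frame with a two-level column header, shaped like the shooting table
columns = pd.MultiIndex.from_tuples([
    ("For Manchester City", "Date"),
    ("Standard", "Sh"),
    ("Standard", "SoT"),
    ("Expected", "xG"),
])
toy_df = pd.DataFrame([["2021-08-15", 18, 4, 2.0]], columns=columns)

# droplevel() with no argument removes level 0, i.e. the group labels on top
toy_df.columns = toy_df.columns.droplevel()
print(list(toy_df.columns))
```

Only the inner names remain, so columns such as "Sh" and "SoT" can be selected with plain strings.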

The score and shooting DataFrames share many columns. The columns found only in the shooting DataFrame are listed below:

  • Sh : Shots Total (Does not include penalty kicks)
  • SoT : Shots on Target (Without penalty kicks)
  • Dist : Average distance travelled by a shot
  • FK : Number of free kicks
  • PK : Penalty kicks made
  • PKatt: Penalty kicks attempted

These columns are merged into the score DataFrame, using the match date as the key.

In [ ]:
team_data = score_df.merge(shooting_df[["Date", "Sh", "SoT", "Dist", "FK", "PK", "PKatt"]], on="Date")
In [ ]:
team_data.head()
Out[ ]:
Date Time Comp Round Day Venue Result GF GA Opponent ... Formation Referee Match Report Notes Sh SoT Dist FK PK PKatt
0 2021-08-07 17:15 Community Shield FA Community Shield Sat Neutral L 0 1 Leicester City ... 4-3-3 Paul Tierney Match Report NaN 12 3 NaN NaN 0 0
1 2021-08-15 16:30 Premier League Matchweek 1 Sun Away L 0 1 Tottenham ... 4-3-3 Anthony Taylor Match Report NaN 18 4 17.3 1.0 0 0
2 2021-08-21 15:00 Premier League Matchweek 2 Sat Home W 5 0 Norwich City ... 4-3-3 Graham Scott Match Report NaN 16 4 18.5 1.0 0 0
3 2021-08-28 12:30 Premier League Matchweek 3 Sat Home W 5 0 Arsenal ... 4-3-3 Martin Atkinson Match Report NaN 25 10 14.8 0.0 0 0
4 2021-09-11 15:00 Premier League Matchweek 4 Sat Away W 1 0 Leicester City ... 4-3-3 Paul Tierney Match Report NaN 25 8 14.3 0.0 0 0

5 rows × 25 columns
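
The merge above relies on each date appearing once per table. A hedged sketch of how that assumption could be checked: pandas' merge accepts a validate argument that raises an error when the key is not unique. The two toy frames below are assumptions for illustration and reuse two of the rows shown above.

```python
import pandas as pd

# Toy versions of the two tables, using two of the rows shown above
scores = pd.DataFrame({
    "Date": ["2021-08-15", "2021-08-21"],
    "Opponent": ["Tottenham", "Norwich City"],
    "Result": ["L", "W"],
})
shooting = pd.DataFrame({
    "Date": ["2021-08-15", "2021-08-21"],
    "Sh": [18, 16],
    "SoT": [4, 4],
})

# validate="one_to_one" raises pandas.errors.MergeError if a date repeats in either frame
merged = scores.merge(shooting, on="Date", how="inner", validate="one_to_one")
print(merged)
```

Because the join is an inner join (the default), a match whose date is missing from either table is dropped from the result.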

Now let's repeat these steps for every team over the last five EPL seasons. The loop starts from the 2021-2022 team pages and follows each page's link to the previous season.

In [ ]:
import time
import pandas as pd

years = list(range(2022, 2017, -1))
all_matches = []

for year in years:
    for team in team_stat_URL:
        link = team_stat_URL[team]
        print(link)
        
        response_status_code = 401

        while response_status_code != 200:
            response = requests.get(link)
            if response.status_code == 200:
                content = response.content
                score_tables = pd.read_html(content, match="Scores & Fixtures")
                score_df = score_tables[0]
                response_status_code = 200
            else:
                print("The page couldn't be downloaded for team {}. Trying again".format(team))
                time.sleep(1)
        
        parser = BeautifulSoup(content, 'html.parser')
        # After exploring the web page, we find that the desired URL sits inside the <div> tag with class="filter"
        parsed_links = parser.select(".filter a")
        shooting_tab_link = ["https://fbref.com" + link['href'] for link in parsed_links if link.text == "Shooting"]
        
        response_shooting_status_code = 401
        
        while response_shooting_status_code != 200:
            response_shooting = requests.get(*shooting_tab_link)
            
            if response_shooting.status_code == 200:
                shooting_html = response_shooting.content
                shooting_tables = pd.read_html(shooting_html, match="Shooting ")
                shooting_df = shooting_tables[0]
                response_shooting_status_code = 200
            else:
                print("Couldn't download the shooting page for team {}. Trying again".format(team))
                time.sleep(1)
            
        shooting_df.columns = shooting_df.columns.droplevel()
        
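        stat_columns = ["Date", "Sh", "SoT", "Dist", "FK", "PK", "PKatt"]
        try:
            team_data = score_df.merge(shooting_df[stat_columns], on="Date")
            print(f"{team} : {year} : {team_data.shape}")
        except ValueError:
            continue
        except KeyError as e:
            print(f"Column {e.args[0]} missing from the dataframe. So adding an extra column for consistency")
            # Merge the columns that do exist, then add the missing ones as empty columns.
            # Otherwise team_data would still hold the previous team's rows.
            available = [c for c in stat_columns if c in shooting_df.columns]
            team_data = score_df.merge(shooting_df[available], on="Date")
            for column in stat_columns:
                if column not in team_data.columns:
                    team_data[column] = None
            
        # Our goal is to predict winners of EPL matches, so ignore any data outside that scope
        team_data = team_data[team_data["Comp"] == "Premier League"].copy()
        
        # Adding extra columns to keep track of the team name and season
        team_data["Season"] = year
        team_data["Team"] = team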
        
        all_matches.append(team_data)
        time.sleep(1)
        
        # Update the links in team_stat_URL for scraping data for the previous season
        season_links = parser.select(".prevnext a")
        prev_season_link = "https://fbref.com" + season_links[0]['href']
        team_stat_URL[team] = prev_season_link
        
    time.sleep(300)
https://fbref.com/en/squads/b8fd03ef/2021-2022/Manchester-City-Stats
Manchester City : 2022 : (58, 25)
https://fbref.com/en/squads/822bd0ba/2021-2022/Liverpool-Stats
Liverpool : 2022 : (63, 25)
https://fbref.com/en/squads/cff3d9bb/2021-2022/Chelsea-Stats
Chelsea : 2022 : (61, 25)
https://fbref.com/en/squads/361ca564/2021-2022/Tottenham-Hotspur-Stats
Tottenham : 2022 : (54, 25)
https://fbref.com/en/squads/18bb7c10/2021-2022/Arsenal-Stats
Arsenal : 2022 : (45, 25)
https://fbref.com/en/squads/19538871/2021-2022/Manchester-United-Stats
Manchester Utd : 2022 : (49, 25)
https://fbref.com/en/squads/7c21e445/2021-2022/West-Ham-United-Stats
West Ham : 2022 : (56, 25)
https://fbref.com/en/squads/a2d435b3/2021-2022/Leicester-City-Stats
Leicester City : 2022 : (58, 25)
https://fbref.com/en/squads/d07537b9/2021-2022/Brighton-and-Hove-Albion-Stats
Brighton : 2022 : (43, 25)
https://fbref.com/en/squads/8cec06e1/2021-2022/Wolverhampton-Wanderers-Stats
Wolves : 2022 : (42, 25)
https://fbref.com/en/squads/b2b47a98/2021-2022/Newcastle-United-Stats
Newcastle Utd : 2022 : (40, 25)
https://fbref.com/en/squads/47c64c55/2021-2022/Crystal-Palace-Stats
Crystal Palace : 2022 : (44, 25)
https://fbref.com/en/squads/cd051869/2021-2022/Brentford-Stats
Brentford : 2022 : (44, 25)
https://fbref.com/en/squads/8602292d/2021-2022/Aston-Villa-Stats
Aston Villa : 2022 : (41, 25)
https://fbref.com/en/squads/33c895d4/2021-2022/Southampton-Stats
Southampton : 2022 : (45, 25)
https://fbref.com/en/squads/d3fd31cc/2021-2022/Everton-Stats
Everton : 2022 : (44, 25)
https://fbref.com/en/squads/5bfb9659/2021-2022/Leeds-United-Stats
Leeds United : 2022 : (42, 25)
https://fbref.com/en/squads/943e8050/2021-2022/Burnley-Stats
Burnley : 2022 : (42, 25)
https://fbref.com/en/squads/2abfe087/2021-2022/Watford-Stats
Watford : 2022 : (41, 25)
https://fbref.com/en/squads/1c781004/2021-2022/Norwich-City-Stats
Norwich City : 2022 : (43, 25)
https://fbref.com/en/squads/b8fd03ef/2020-2021/Manchester-City-Stats
Manchester City : 2021 : (61, 25)
https://fbref.com/en/squads/822bd0ba/2020-2021/Liverpool-Stats
Liverpool : 2021 : (53, 25)
https://fbref.com/en/squads/cff3d9bb/2020-2021/Chelsea-Stats
Chelsea : 2021 : (59, 25)
https://fbref.com/en/squads/361ca564/2020-2021/Tottenham-Hotspur-Stats
Tottenham : 2021 : (59, 25)
https://fbref.com/en/squads/18bb7c10/2020-2021/Arsenal-Stats
Arsenal : 2021 : (58, 25)
https://fbref.com/en/squads/19538871/2020-2021/Manchester-United-Stats
Manchester Utd : 2021 : (61, 25)
https://fbref.com/en/squads/7c21e445/2020-2021/West-Ham-United-Stats
West Ham : 2021 : (44, 25)
https://fbref.com/en/squads/a2d435b3/2020-2021/Leicester-City-Stats
Leicester City : 2021 : (53, 25)
https://fbref.com/en/squads/d07537b9/2020-2021/Brighton-and-Hove-Albion-Stats
Brighton : 2021 : (44, 25)
https://fbref.com/en/squads/8cec06e1/2020-2021/Wolverhampton-Wanderers-Stats
Wolves : 2021 : (42, 25)
https://fbref.com/en/squads/b2b47a98/2020-2021/Newcastle-United-Stats
Newcastle Utd : 2021 : (43, 25)
https://fbref.com/en/squads/47c64c55/2020-2021/Crystal-Palace-Stats
Crystal Palace : 2021 : (40, 25)
https://fbref.com/en/squads/cd051869/2020-2021/Brentford-Stats
Brentford : 2021 : (57, 25)
https://fbref.com/en/squads/8602292d/2020-2021/Aston-Villa-Stats
Aston Villa : 2021 : (42, 25)
https://fbref.com/en/squads/33c895d4/2020-2021/Southampton-Stats
Southampton : 2021 : (44, 25)
https://fbref.com/en/squads/d3fd31cc/2020-2021/Everton-Stats
Everton : 2021 : (46, 25)
https://fbref.com/en/squads/5bfb9659/2020-2021/Leeds-United-Stats
Leeds United : 2021 : (40, 25)
https://fbref.com/en/squads/943e8050/2020-2021/Burnley-Stats
Burnley : 2021 : (44, 25)
https://fbref.com/en/squads/2abfe087/2020-2021/Watford-Stats
Watford : 2021 : (49, 25)
https://fbref.com/en/squads/1c781004/2020-2021/Norwich-City-Stats
Norwich City : 2021 : (49, 25)
https://fbref.com/en/squads/b8fd03ef/2019-2020/Manchester-City-Stats
Manchester City : 2020 : (59, 25)
https://fbref.com/en/squads/822bd0ba/2019-2020/Liverpool-Stats
Liverpool : 2020 : (55, 25)
https://fbref.com/en/squads/cff3d9bb/2019-2020/Chelsea-Stats
Chelsea : 2020 : (55, 25)
https://fbref.com/en/squads/361ca564/2019-2020/Tottenham-Hotspur-Stats
Tottenham : 2020 : (52, 25)
https://fbref.com/en/squads/18bb7c10/2019-2020/Arsenal-Stats
Arsenal : 2020 : (54, 25)
https://fbref.com/en/squads/19538871/2019-2020/Manchester-United-Stats
Manchester Utd : 2020 : (61, 25)
https://fbref.com/en/squads/7c21e445/2019-2020/West-Ham-United-Stats
West Ham : 2020 : (42, 25)
https://fbref.com/en/squads/a2d435b3/2019-2020/Leicester-City-Stats
Leicester City : 2020 : (48, 25)
https://fbref.com/en/squads/d07537b9/2019-2020/Brighton-and-Hove-Albion-Stats
Brighton : 2020 : (41, 25)
https://fbref.com/en/squads/8cec06e1/2019-2020/Wolverhampton-Wanderers-Stats
Wolves : 2020 : (59, 25)
https://fbref.com/en/squads/b2b47a98/2019-2020/Newcastle-United-Stats
Newcastle Utd : 2020 : (45, 25)
https://fbref.com/en/squads/47c64c55/2019-2020/Crystal-Palace-Stats
Crystal Palace : 2020 : (40, 25)
https://fbref.com/en/squads/cd051869/2019-2020/Brentford-Stats
Brentford : 2020 : (52, 25)
https://fbref.com/en/squads/8602292d/2019-2020/Aston-Villa-Stats
Aston Villa : 2020 : (46, 25)
https://fbref.com/en/squads/33c895d4/2019-2020/Southampton-Stats
Southampton : 2020 : (44, 25)
https://fbref.com/en/squads/d3fd31cc/2019-2020/Everton-Stats
Everton : 2020 : (43, 25)
https://fbref.com/en/squads/5bfb9659/2019-2020/Leeds-United-Stats
Leeds United : 2020 : (49, 25)
https://fbref.com/en/squads/943e8050/2019-2020/Burnley-Stats
Burnley : 2020 : (41, 25)
https://fbref.com/en/squads/2abfe087/2019-2020/Watford-Stats
Watford : 2020 : (43, 25)
https://fbref.com/en/squads/1c781004/2019-2020/Norwich-City-Stats
Norwich City : 2020 : (43, 25)
https://fbref.com/en/squads/b8fd03ef/2018-2019/Manchester-City-Stats
Manchester City : 2019 : (61, 25)
https://fbref.com/en/squads/822bd0ba/2018-2019/Liverpool-Stats
Liverpool : 2019 : (53, 25)
https://fbref.com/en/squads/cff3d9bb/2018-2019/Chelsea-Stats
Chelsea : 2019 : (63, 25)
https://fbref.com/en/squads/361ca564/2018-2019/Tottenham-Hotspur-Stats
Tottenham : 2019 : (58, 25)
https://fbref.com/en/squads/18bb7c10/2018-2019/Arsenal-Stats
Arsenal : 2019 : (58, 25)
https://fbref.com/en/squads/19538871/2018-2019/Manchester-United-Stats
Manchester Utd : 2019 : (53, 25)
https://fbref.com/en/squads/7c21e445/2018-2019/West-Ham-United-Stats
West Ham : 2019 : (43, 25)
https://fbref.com/en/squads/a2d435b3/2018-2019/Leicester-City-Stats
Leicester City : 2019 : (43, 25)
https://fbref.com/en/squads/d07537b9/2018-2019/Brighton-and-Hove-Albion-Stats
Brighton : 2019 : (45, 25)
https://fbref.com/en/squads/8cec06e1/2018-2019/Wolverhampton-Wanderers-Stats
Wolves : 2019 : (46, 25)
https://fbref.com/en/squads/b2b47a98/2018-2019/Newcastle-United-Stats
Newcastle Utd : 2019 : (42, 25)
https://fbref.com/en/squads/47c64c55/2018-2019/Crystal-Palace-Stats
Crystal Palace : 2019 : (45, 25)
https://fbref.com/en/squads/cd051869/2018-2019/Brentford-Stats
Brentford : 2019 : (53, 25)
https://fbref.com/en/squads/8602292d/2018-2019/Aston-Villa-Stats
Aston Villa : 2019 : (52, 25)
https://fbref.com/en/squads/33c895d4/2018-2019/Southampton-Stats
Southampton : 2019 : (43, 25)
https://fbref.com/en/squads/d3fd31cc/2018-2019/Everton-Stats
Everton : 2019 : (42, 25)
https://fbref.com/en/squads/5bfb9659/2018-2019/Leeds-United-Stats
Leeds United : 2019 : (51, 25)
https://fbref.com/en/squads/943e8050/2018-2019/Burnley-Stats
Burnley : 2019 : (47, 25)
https://fbref.com/en/squads/2abfe087/2018-2019/Watford-Stats
Watford : 2019 : (46, 25)
https://fbref.com/en/squads/1c781004/2018-2019/Norwich-City-Stats
Norwich City : 2019 : (51, 25)
https://fbref.com/en/squads/b8fd03ef/2017-2018/Manchester-City-Stats
Manchester City : 2018 : (57, 25)
https://fbref.com/en/squads/822bd0ba/2017-2018/Liverpool-Stats
Liverpool : 2018 : (56, 25)
https://fbref.com/en/squads/cff3d9bb/2017-2018/Chelsea-Stats
Chelsea : 2018 : (59, 25)
https://fbref.com/en/squads/361ca564/2017-2018/Tottenham-Hotspur-Stats
Tottenham : 2018 : (55, 25)
https://fbref.com/en/squads/18bb7c10/2017-2018/Arsenal-Stats
Arsenal : 2018 : (60, 25)
https://fbref.com/en/squads/19538871/2017-2018/Manchester-United-Stats
Manchester Utd : 2018 : (56, 25)
https://fbref.com/en/squads/7c21e445/2017-2018/West-Ham-United-Stats
West Ham : 2018 : (45, 25)
https://fbref.com/en/squads/a2d435b3/2017-2018/Leicester-City-Stats
Leicester City : 2018 : (47, 25)
https://fbref.com/en/squads/d07537b9/2017-2018/Brighton-and-Hove-Albion-Stats
Brighton : 2018 : (44, 25)
https://fbref.com/en/squads/8cec06e1/2017-2018/Wolverhampton-Wanderers-Stats
Column ['FK'] not in index missing from the dataframe. So adding an extra column for consistency
https://fbref.com/en/squads/b2b47a98/2017-2018/Newcastle-United-Stats
Newcastle Utd : 2018 : (41, 25)
https://fbref.com/en/squads/47c64c55/2017-2018/Crystal-Palace-Stats
Crystal Palace : 2018 : (42, 25)
https://fbref.com/en/squads/cd051869/2017-2018/Brentford-Stats
Column ['FK'] not in index missing from the dataframe. So adding an extra column for consistency
https://fbref.com/en/squads/8602292d/2017-2018/Aston-Villa-Stats
Column ['FK'] not in index missing from the dataframe. So adding an extra column for consistency
https://fbref.com/en/squads/33c895d4/2017-2018/Southampton-Stats
Southampton : 2018 : (44, 25)
https://fbref.com/en/squads/d3fd31cc/2017-2018/Everton-Stats
Everton : 2018 : (51, 25)
https://fbref.com/en/squads/5bfb9659/2017-2018/Leeds-United-Stats
Column ['FK'] not in index missing from the dataframe. So adding an extra column for consistency
https://fbref.com/en/squads/943e8050/2017-2018/Burnley-Stats
Burnley : 2018 : (41, 25)
https://fbref.com/en/squads/2abfe087/2017-2018/Watford-Stats
Watford : 2018 : (41, 25)
https://fbref.com/en/squads/1c781004/2017-2018/Norwich-City-Stats
Column ['FK'] not in index missing from the dataframe. So adding an extra column for consistency
In [ ]:
match_df = pd.concat(all_matches)
match_df.columns = [c.lower() for c in match_df.columns]
match_df.to_csv("matches.csv")
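
Each per-team DataFrame keeps its own row numbers, so the concatenated index repeats and to_csv writes it out as an unnamed first column. The sketch below shows one way to avoid that, assuming the row index carries no information we need. The two toy frames hold placeholder values, not real results.

```python
import io

import pandas as pd

# Toy per-team frames with placeholder values (not real results)
frames = [
    pd.DataFrame({"Team": ["Manchester City"], "Season": [2022], "GF": [5]}),
    pd.DataFrame({"Team": ["Liverpool"], "Season": [2022], "GF": [3]}),
]

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each frame's index
combined = pd.concat(frames, ignore_index=True)
combined.columns = [c.lower() for c in combined.columns]

# Write to an in-memory buffer here; a file path works the same way
buffer = io.StringIO()
combined.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```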

