A Data-Driven Approach to Ranking Teams in Uneven Paired Competition

This article was first published on The Pleasure of Finding Things Out: A blog by James Triveri, and kindly contributed to python-bloggers.
Want to share your content on python-bloggers? click here.

I recently came across The Perron-Frobenius Theorem and the Ranking of Football Teams, an interesting paper in which the author describes four different methods to rank teams in uneven paired competition. He goes on to show how each of these methods depends in some way on the Perron-Frobenius theorem. The Perron-Frobenius theorem provides key insights into the structure of non-negative matrices, especially in terms of their largest eigenvalue and associated eigenvector. For irreducible non-negative matrices, the theorem guarantees the existence of a dominant eigenvalue that is real, positive, and simple, and that is at least as large in magnitude as every other eigenvalue (strictly larger when the matrix is primitive), with a corresponding eigenvector whose entries are all positive.
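To make these guarantees concrete, the sketch below checks them numerically on a small, made-up 4-team win matrix (purely illustrative, not taken from the paper). The matrix is irreducible because every team can reach every other team through a chain of wins, so the eigenvalue with the largest real part should be real, positive, and simple, and its eigenvector should have strictly positive entries once the arbitrary sign is removed.

```python
import numpy as np

# Hypothetical 4-team win matrix: entry [i, j] = 1 if team i beat team j.
# Every team reaches every other through a chain of wins, so A is irreducible.
A = np.array([
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
], dtype=float)

e_vals, e_vecs = np.linalg.eig(A)

# For a non-negative matrix, the Perron root has the largest real part.
k = np.argmax(e_vals.real)
perron_root = e_vals[k].real

# The eigenvector is only defined up to sign and scale; take magnitudes.
perron_vec = np.abs(e_vecs[:, k].real)

print("all eigenvalues:", np.round(e_vals, 4))
print("Perron root:", round(perron_root, 4))
print("Perron vector (sums to 1):", np.round(perron_vec / perron_vec.sum(), 4))
```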

An uneven paired competition is one in which the outcome of competition between pairs of teams is known, but the pairings are not evenly matched, meaning the competition is not a round robin in which each team is paired with every other team an equal number of times. A good example is regular season football in-conference play for any of the major NCAA Division I conferences: For the 2023 season, the Big 12 had 14 teams, but each team had only 9 conference games.
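One way to tell whether a schedule is uneven is to count how often each pair of teams met: in a balanced round robin every pair has the same, non-zero count. The sketch below does this with the standard library; the helper names, team labels, and game lists are hypothetical, and each game is recorded only as the pair of teams involved.

```python
from collections import Counter
from itertools import combinations


def meetings_per_pair(teams, games):
    """Count how often each unordered pair of teams met.

    `games` is a list of (team_a, team_b) tuples; order within a tuple is ignored.
    """
    counts = Counter(frozenset(game) for game in games)
    return {frozenset(pair): counts[frozenset(pair)] for pair in combinations(teams, 2)}


def is_balanced_round_robin(teams, games):
    """True if every pair of teams met the same, non-zero number of times."""
    distinct_counts = set(meetings_per_pair(teams, games).values())
    return len(distinct_counts) == 1 and 0 not in distinct_counts


teams = ["A", "B", "C", "D"]
round_robin = list(combinations(teams, 2))                 # all 6 pairings, once each
uneven = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]  # A-D and B-C never meet

print(is_balanced_round_robin(teams, round_robin))
print(is_balanced_round_robin(teams, uneven))
```

For the 2023 Big 12, 14 teams give 91 possible pairings, while 9 conference games per team amount to only 63 games, so at least 28 pairings never took place.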

Here we focus on the first ranking method, which the author refers to as the “direct method”. The direct method formulates ranking as a linear eigenvalue problem, which makes direct use of the Perron-Frobenius theorem. The goal is to assign each team a score based on its interactions with other teams, such that the score reflects both the outcomes of those interactions and the strength of the opponents. We will then compare the rankings produced by this data-driven approach with the final regular season standings and assess how well they line up. A similar exercise will be performed for the 2021 MLB regular season.

Creating the Adjacency Matrix

It is first necessary to construct the adjacency matrix $A$ in order to encode interactions between teams. Big 12 2023 regular season football results were obtained here. Within the matrix, the value in cell $(i, j)$, denoted $a_{ij}$, is set to 1 if team $i$ defeated team $j$, and 0 otherwise. For games that resulted in a tie, $a_{ij} = a_{ji} = 1/2$, but there were no such cases in 2023 Big 12 regular season conference play.
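A minimal sketch of this encoding, assuming results arrive as (team_i, team_j, score_i, score_j) tuples; the helper name build_adjacency and the three-team sample are hypothetical. A tie adds 0.5 to both $a_{ij}$ and $a_{ji}$, matching the tie branch of the MLB code later in this post.

```python
import numpy as np


def build_adjacency(teams, results):
    """Build a win matrix A where A[i, j] counts wins of teams[i] over teams[j].

    `results` holds (team_i, team_j, score_i, score_j) tuples. A tie adds 0.5
    to both A[i, j] and A[j, i].
    """
    index = {team: k for k, team in enumerate(teams)}
    A = np.zeros((len(teams), len(teams)))
    for team_i, team_j, score_i, score_j in results:
        i, j = index[team_i], index[team_j]
        if score_i > score_j:
            A[i, j] += 1
        elif score_j > score_i:
            A[j, i] += 1
        else:
            A[i, j] += 0.5
            A[j, i] += 0.5
    return A


# Hypothetical three-team example.
teams = ["Alpha", "Bravo", "Charlie"]
results = [
    ("Alpha", "Bravo", 24, 17),    # Alpha wins
    ("Bravo", "Charlie", 10, 10),  # tie
    ("Charlie", "Alpha", 31, 28),  # Charlie wins
]
A = build_adjacency(teams, results)
print(A)
```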

The regular season rankings and adjacency matrix can be downloaded from GitHub (links available in the next cell):

%load_ext watermark

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import networkx as nx
from numpy.linalg import eig

np.set_printoptions(suppress=True, precision=5)
pd.options.mode.chained_assignment = None
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)


big12_matrix_url = "https://gist.githubusercontent.com/jtrive84/b9b4ff8620f90045a0377f27ec4eb50f/raw/e6923583530edadbe9da1a1f4821e415d8a7e6f2/2023-big-12-adjacency.csv"
big12_rankings_url = "https://gist.githubusercontent.com/jtrive84/0207b8fd18a05e096a89498290b08d4a/raw/462d2b1bef52d96ae20e077f55501bfa23951ae4/2023-big-12-rankings.csv"

# ------------------------------------------------------------------------------

%watermark --python --conda --hostname --machine --iversions
Python implementation: CPython
Python version       : 3.11.10
IPython version      : 8.28.0

conda environment: py311

Compiler    : MSC v.1941 64 bit (AMD64)
OS          : Windows
Release     : 10
Machine     : AMD64
Processor   : Intel64 Family 6 Model 170 Stepping 4, GenuineIntel
CPU cores   : 22
Architecture: 64bit

Hostname: JTRIZPC11

numpy     : 2.1.0
matplotlib: 3.9.2
pandas    : 2.2.2
networkx  : 3.3

Big 12 regular season rankings for 2023:

ranks_big12 = pd.read_csv(big12_rankings_url)

ranks_big12.head(15)
              team  conf_wins  conf_losses  overall_wins  overall_losses  conf_win_pct  overall_win_pct
0            Texas          8            1            12               2         0.889            0.857
1   Oklahoma State          7            2            10               4         0.778            0.714
2         Oklahoma          7            2            10               3         0.778            0.769
3       Iowa State          6            3             7               6         0.667            0.538
4     Kansas State          6            3             9               4         0.667            0.692
5    West Virginia          6            3             9               4         0.667            0.692
6       Texas Tech          5            4             7               6         0.556            0.538
7           Kansas          5            4             9               4         0.556            0.692
8              UCF          3            6             6               7         0.333            0.462
9              TCU          3            6             5               7         0.333            0.417
10         Houston          2            7             4               8         0.222            0.333
11             BYU          2            7             5               7         0.222            0.417
12          Baylor          2            7             3               9         0.222            0.250
13      Cincinnati          1            8             3               9         0.111            0.250

The adjacency matrix considers only conference play (non-conference games excluded):

adj_big12 = pd.read_csv(big12_matrix_url)

adj_big12.head(15)
        Unnamed: 0  Baylor  BYU  Cincinnati  Houston  Iowa State  Kansas  Kansas State  Oklahoma  Oklahoma State  TCU  Texas  Texas Tech  UCF  West Virginia
0           Baylor       0    0           1        0           0       0             0         0               0    0      0           0    1              0
1              BYU       0    0           1        0           0       0             0         0               0    0      0           1    0              0
2       Cincinnati       0    0           0        1           0       0             0         0               0    0      0           0    0              0
3          Houston       1    0           0        0           0       0             0         0               0    0      0           0    0              1
4       Iowa State       1    1           1        0           0       0             1         0               1    1      0           0    0              0
5           Kansas       0    1           1        0           1       0             0         1               0    0      0           0    1              0
6     Kansas State       1    0           0        1           0       1             0         0               0    1      0           1    1              0
7         Oklahoma       0    1           1        0           1       0             0         0               0    1      1           0    1              1
8   Oklahoma State       0    1           1        1           0       1             1         1               0    0      0           0    0              1
9              TCU       1    1           0        1           0       0             0         0               0    0      0           0    0              0
10           Texas       1    1           0        1           1       1             1         0               0    1      0           1    0              0
11      Texas Tech       1    0           0        1           0       1             0         0               0    1      0           0    1              0
12             UCF       0    0           1        1           0       0             0         0               1    0      0           0    0              0
13   West Virginia       1    1           1        0           0       0             0         0               0    1      0           1    1              0

In adj_big12, a 1 indicates that the team in row $i$ defeated the team in column $j$. For example, Oklahoma defeated BYU 31-24 in 2023, so the value at the intersection of row Oklahoma and column BYU is 1. The value at the intersection of row BYU and column Oklahoma is 0, since BYU did not defeat Oklahoma in 2023 (the two teams met only once).

The sum of each row in the adjacency matrix represents the number of regular season wins in conference play for a given team. Texas was 8-1 in 2023 regular season conference play, therefore the sum of the Texas row is 8. The columnar sum represents the number of losses for a given team (for Texas, this is 1).
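The row and column sums are easy to check in NumPy: `A.sum(axis=1)` gives wins and `A.sum(axis=0)` gives losses. The sketch below uses a small hypothetical four-team matrix rather than the Big 12 data.

```python
import numpy as np

# Hypothetical 4-team win matrix: A[i, j] = 1 if team i beat team j.
teams = ["W", "X", "Y", "Z"]
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
])

wins = A.sum(axis=1)    # row sums: games each team won
losses = A.sum(axis=0)  # column sums: games each team lost

for team, w, l in zip(teams, wins, losses):
    print(f"{team}: {w}-{l}")
```

Every win for one team is a loss for another, so the total of all row sums always equals the total of all column sums.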

We can use NetworkX to visualize the relationships encoded in the adjacency matrix (each node label corresponds to the alphabetical enumeration of teams: 0=Baylor, 1=BYU, … 13=West Virginia). An edge between nodes $i$ and $j$ indicates that the two teams faced each other in a regular season contest:

import networkx as nx
import matplotlib.pyplot as plt

# Create adjacency matrix as Numpy array (drop the team-name column).
A = adj_big12.drop("Unnamed: 0", axis=1).values
G = nx.from_numpy_array(A)

fig, ax = plt.subplots(1, 1, figsize=(7.5, 5), tight_layout=True)
ax.set_title(
    "2023 Big-12 Regular Season Football Matchups", 
    color="#000000", loc="center", weight="normal", fontsize=9
)
nx.draw_networkx(
    G, node_color="#E02C70", node_size=350, ax=ax, with_labels=True, 
    edge_color="grey", width=.25, pos=nx.spring_layout(G, seed=516)
)

The adjacency matrix, $A$:

A
array([[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
       [1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0],
       [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
       [1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0],
       [0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1],
       [0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1],
       [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0],
       [1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
       [0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
       [1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0]])

If we create an initial ranking vector $r_0$ with all entries set to 1, then the $i$-th component of $A r_0$ is the number of conference wins for team $i$; dividing by the number of games played (9) gives team $i$'s winning percentage:

r0 = np.ones(14)
win_pcts = A @ r0 / 9

pairs = zip(adj_big12.columns[1:], win_pcts.tolist())

for tt in pairs:
    print(tt)
('Baylor', 0.2222222222222222)
('BYU', 0.2222222222222222)
('Cincinnati', 0.1111111111111111)
('Houston', 0.2222222222222222)
('Iowa State', 0.6666666666666666)
('Kansas', 0.5555555555555556)
('Kansas State', 0.6666666666666666)
('Oklahoma', 0.7777777777777778)
('Oklahoma State', 0.7777777777777778)
('TCU', 0.3333333333333333)
('Texas', 0.8888888888888888)
('Texas Tech', 0.5555555555555556)
('UCF', 0.3333333333333333)
('West Virginia', 0.6666666666666666)

This aligns with values in the conf_win_pct column from ranks_big12.

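The $i$-th component of $A^2 r_0$, divided by $9^2$, equals the sum of the winning percentages of the teams that team $i$ defeated, divided by the 9 games it played; in other words, it rewards wins over strong opponents. As the author highlights, $A^2 r_0$ can be considered a proxy for strength of schedule. In the limit as $n$ goes to infinity, the normalized vector $A^n r_0 / \lVert A^n r_0 \rVert$ converges to the unique positive eigenvector of $A$ (provided $A$ is primitive, i.e., some power of $A$ has all positive entries), and the magnitude of the entries of this eigenvector gives a ranking of teams.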
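The limiting argument can be checked with a few lines of power iteration: start from $r_0$ (all ones), repeatedly multiply by $A$, and rescale so the entries sum to 1. The sketch below uses the Big 12 adjacency matrix printed above and compares the result with the eigenvector returned by numpy.linalg.eig. The helper name power_iteration and its tolerances are illustrative choices; the loop assumes $A$ is primitive and raises an error if the iterates fail to settle within max_iter steps.

```python
import numpy as np

# 2023 Big 12 conference adjacency matrix (alphabetical order,
# 0 = Baylor ... 13 = West Virginia), copied from the output above.
A = np.array([
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1],
    [0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1],
    [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0],
], dtype=float)


def power_iteration(A, tol=1e-12, max_iter=10_000):
    """Repeatedly apply A to r0 = (1, ..., 1), rescaling so the entries sum to 1."""
    r = np.ones(A.shape[0]) / A.shape[0]
    for _ in range(max_iter):
        r_next = A @ r
        r_next /= r_next.sum()
        if np.max(np.abs(r_next - r)) < tol:
            return r_next
        r = r_next
    raise RuntimeError("power iteration did not converge")


r_power = power_iteration(A)

# Perron eigenvector from numpy.linalg.eig, rescaled the same way for comparison.
e_vals, e_vecs = np.linalg.eig(A)
v = np.abs(e_vecs[:, np.argmax(e_vals.real)].real)
v /= v.sum()

print("max difference vs. eig:", np.max(np.abs(r_power - v)))
```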

If $A$ has non-negative entries (which will always be the case given our definition of $A$), then it has an eigenvector $r$ with non-negative entries associated with an eigenvalue $\lambda \ge 0$ equal to the spectral radius of $A$. If $A$ is irreducible, then $r$ has strictly positive entries and the corresponding eigenvalue $\lambda > 0$ is the one of largest absolute value. Note, however, that if $r$ is an eigenvector of $A$, so is $-r$. In practice, we simply take the absolute value of the eigenvector associated with the largest eigenvalue. Note also that for $A$ to be irreducible, there can be no winless teams.
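Irreducibility can be tested directly: a non-negative matrix is irreducible exactly when its directed graph (an edge $i \to j$ whenever $a_{ij} > 0$) is strongly connected, i.e., every team reaches every other team through a chain of wins. The sketch below computes reachability by repeatedly squaring a boolean matrix; the helper name and toy matrices are hypothetical. It also shows that an unbeaten team breaks irreducibility just as a winless one does, because no chain of wins leads to it.

```python
import numpy as np


def is_irreducible(A):
    """Return True if the non-negative square matrix A is irreducible.

    Builds the reachability matrix of the graph with an edge i -> j whenever
    A[i, j] > 0 (plus self-loops), squaring it until nothing changes.
    """
    n = A.shape[0]
    reach = ((np.asarray(A) > 0) | np.eye(n, dtype=bool)).astype(int)
    while True:
        nxt = ((reach @ reach) > 0).astype(int)
        if np.array_equal(nxt, reach):
            break
        reach = nxt
    return bool(reach.all())


# Three teams beating each other in a cycle: irreducible.
cycle = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [1, 0, 0]])

# Add a winless fourth team (row of zeros): reducible.
winless = np.array([[0, 1, 0, 1],
                    [0, 0, 1, 0],
                    [1, 0, 0, 0],
                    [0, 0, 0, 0]])

# Add an unbeaten fourth team (column of zeros): also reducible.
unbeaten = np.array([[0, 1, 0, 0],
                     [0, 0, 1, 0],
                     [1, 0, 0, 0],
                     [1, 0, 0, 0]])

for name, M in [("cycle", cycle), ("winless", winless), ("unbeaten", unbeaten)]:
    print(name, is_irreducible(M))
```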

The steps for deriving our rankings are outlined below:

  1. Construct the adjacency matrix $A$, in which entry $a_{ij}$ represents the number of times team $i$ defeated team $j$.

  2. Perform the eigendecomposition of $A$, factoring the matrix into its eigenvalues and eigenvectors.

  3. Identify the index of the largest eigenvalue.

  4. Extract the eigenvector at the index identified in step 3. If using Numpy and the maximum eigenvalue is found at index $k$, the corresponding eigenvector will be located in column $k$ of the eigenvector matrix returned by numpy.linalg.eig.

  5. Take the absolute value of this eigenvector, $r$. The value at index $i$ in $r$ represents the score for the team at the same index (for the Big 12 example, index 0 = Baylor, index 1 = BYU, …).

  6. Sort the eigenvector scores in decreasing order; higher performing teams will have a larger value, poorer performing teams will have a smaller value.

Keep in mind that using a binary encoding scheme in a football setting, where each pair of teams typically meets at most once per season, overlooks information that could enhance the encoding. As it stands, a victory by 80 points for team A over team B is treated the same as a victory in triple overtime. In sports where teams face each other multiple times in a season, $a_{ij}$ (the number of times team $i$ defeated team $j$) serves as a better indicator of the relative strength between the two teams. We’ll explore regular season Major League Baseball results later.
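As a sketch of how margin information could be folded in (a hypothetical alternative, not the encoding used in this post), one option is to replace the 0/1 entry with a smoothed share of the points scored, $a_{ij} = (S_{ij} + 1) / (S_{ij} + S_{ji} + 2)$, where $S_{ij}$ is the total number of points team $i$ scored against team $j$. Pairs that never met keep $a_{ij} = 0$. The helper name and the sample scores are illustrative.

```python
import numpy as np


def score_share_matrix(n_teams, games):
    """Return A with a_ij = (S_ij + 1) / (S_ij + S_ji + 2) for pairs that met.

    `games` holds (i, j, points_i, points_j) tuples with integer team indices.
    S_ij accumulates the points team i scored against team j; pairs that never
    met are left at 0.
    """
    S = np.zeros((n_teams, n_teams))
    met = np.zeros((n_teams, n_teams), dtype=bool)
    for i, j, points_i, points_j in games:
        S[i, j] += points_i
        S[j, i] += points_j
        met[i, j] = met[j, i] = True
    A = np.zeros_like(S)
    A[met] = (S[met] + 1) / (S[met] + S.T[met] + 2)
    return A


# Hypothetical games: an 80-0 blowout and a 31-28 nail-biter.
games = [(0, 1, 80, 0), (2, 3, 31, 28)]
A = score_share_matrix(4, games)
print(np.round(A, 3))
```

Because $a_{ij} + a_{ji} = 1$ for every pair that met, a team that lost all of its games still gets small positive entries, so a winless team no longer produces an all-zero row.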

The next cell demonstrates how to implement the ranking procedure using Numpy.

from numpy.linalg import eig

# Adjacency matrix as Numpy array.
A = adj_big12.drop("Unnamed: 0", axis=1).values.astype(float)

# Perform eigendecomposition of A. 
e_vals, e_vecs = eig(A)

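# Identify index of the largest eigenvalue. The Perron root of a non-negative
# matrix is real and has the largest real part, so compare real parts.
e_val1_indx = np.argmax(e_vals.real)

# Take the magnitude of the corresponding eigenvector; this removes the
# arbitrary sign (and any zero imaginary part) returned by eig.
e_vec1 = np.abs(e_vecs[:, e_val1_indx])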

# Get indices associated with each team.
indices = np.argsort(e_vec1)[::-1]

# Associate ranks with teams. 
teams = adj_big12.columns[1:]
ranked_teams = teams[indices]

for team in ranked_teams:
    print(team)
Texas
Oklahoma State
Oklahoma
Kansas
Iowa State
Kansas State
Texas Tech
West Virginia
UCF
Houston
BYU
TCU
Baylor
Cincinnati

We can compare actual vs. predicted rankings to see how well the direct method performed:

for jj, team in enumerate(ranked_teams):
    actual_rank = ranks_big12[ranks_big12.team==team].index.item()
    print(f"{team}: actual/predicted : {actual_rank}/{jj}")
Texas: actual/predicted : 0/0
Oklahoma State: actual/predicted : 1/1
Oklahoma: actual/predicted : 2/2
Kansas: actual/predicted : 7/3
Iowa State: actual/predicted : 3/4
Kansas State: actual/predicted : 4/5
Texas Tech: actual/predicted : 6/6
West Virginia: actual/predicted : 5/7
UCF: actual/predicted : 8/8
Houston: actual/predicted : 10/9
BYU: actual/predicted : 11/10
TCU: actual/predicted : 9/11
Baylor: actual/predicted : 12/12
Cincinnati: actual/predicted : 13/13

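There are a few discrepancies, but the ranks are largely consistent. An interesting discrepancy is Kansas, with an actual rank of 7 vs. a predicted rank of 3. It’s difficult to say exactly why Kansas is ranked so highly, but it may have to do with the strength of its opponents: according to the adjacency matrix, Kansas's five conference wins include victories over Oklahoma (7-2) and Iowa State (6-3), and wins over strong teams raise a team's eigenvector score.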

For the bottom five teams, the direct method does a good job. Three of the five teams (Houston, BYU and Baylor, all 2-7) have the same in-conference winning percentage, so slight out-of-order placements are not a concern.
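To summarize the agreement in a single number, one could compute Spearman's rank correlation between the actual and predicted ranks. The sketch below copies the pairs printed above. Both columns are permutations of 0 to 13 (the standings index breaks ties in winning percentage), so Spearman's $\rho$ equals the Pearson correlation of the ranks, which in turn equals $1 - 6\sum_i d_i^2 / (n(n^2 - 1))$ with $d_i$ the rank difference for team $i$.

```python
import numpy as np

# (team, actual rank, predicted rank) for the 2023 Big 12, copied from the output above.
big12 = [
    ("Texas", 0, 0), ("Oklahoma State", 1, 1), ("Oklahoma", 2, 2),
    ("Kansas", 7, 3), ("Iowa State", 3, 4), ("Kansas State", 4, 5),
    ("Texas Tech", 6, 6), ("West Virginia", 5, 7), ("UCF", 8, 8),
    ("Houston", 10, 9), ("BYU", 11, 10), ("TCU", 9, 11),
    ("Baylor", 12, 12), ("Cincinnati", 13, 13),
]
actual = np.array([a for _, a, _ in big12], dtype=float)
predicted = np.array([p for _, _, p in big12], dtype=float)

# With no tied ranks, Spearman's rho is the Pearson correlation of the ranks...
rho = np.corrcoef(actual, predicted)[0, 1]

# ...which matches the closed form 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
d = actual - predicted
n = len(d)
rho_closed_form = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))

print(f"Spearman rho: {rho:.4f} (closed form: {rho_closed_form:.4f})")
```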

Next let’s look at a more substantial example: All games for the 2021 MLB regular season.

MLB Example

A Major League Baseball dataset with game results from 2016-2021 is available on Kaggle. The games.csv dataset has information about each contest that can be used to build an adjacency matrix. We load the dataset and inspect the first few records:

import numpy as np
import pandas as pd

np.set_printoptions(suppress=True, precision=5)
pd.options.mode.chained_assignment = None
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

# Local path to games.csv from the Kaggle dataset; adjust to your own download location.
df = pd.read_csv("C:/Users/jtriv/datasets/MLB/games.csv")

print(f"df.shape: {df.shape}")

df.head(3)
df.shape: (13439, 43)
Gameawayaway-recordawayaway-recordhomehome-recordhomehome-recordaway-scorehome-scorepostseason infoWalks Issued – AwayWalks Issued – HomeStolen Bases – AwayStolen Bases – HomeStrikeouts Thrown – AwayStrikeouts Thrown – HomeTotal Bases – AwayTotal Bases – HomeStadiumDateLocationOddsO/UAttendanceCapacityDurationUmpiresWIN – Pitcher – StatsWIN – Pitcher – IdWIN – Pitcher – NameWIN – Pitcher – AbbrNameWIN – Pitcher – RecordLOSS – Pitcher – StatsLOSS – Pitcher – IdLOSS – Pitcher – NameLOSS – Pitcher – AbbrNameLOSS – Pitcher – RecordSAVE – Pitcher – StatsSAVE – Pitcher – IdSAVE – Pitcher – NameSAVE – Pitcher – AbbrNameSAVE – Pitcher – RecordExtra Innings
0360403123STL0-10-1 AwayPITJan-001-0 Home1.04.0NaN5.05.00.00.05.014.05.013.0\n\n\n\n\n\t\t\t\t\t\t\tPNC Park\n\t\t\t\t\t\t…2016-04-03T17:00ZPittsburgh, Pennsylvania\n\t\t\t\t\t\t\t\n\t\t…NaNNaN39,50038,3623:02Home Plate Umpire – Jerry Layne, First Base Um…6.0 IP, 0 ER, 10 K, 5 BB6211.0Francisco LirianoF. Liriano(1-0)6.0 IP, 3 ER, 3 K, 3 BB5403.0Adam WainwrightA. Wainwright(0-1)NaNNaNNaNNaNNaNNaN
1360403130TORJan-001-0 AwayTB0-10-1 Home5.03.0NaN1.03.00.00.07.016.011.011.0\n\n\n\n\n\t\t\t\t\t\t\tTropicana Field\n\t\t\…2016-04-03T20:00ZSt. Petersburg, Florida\n\t\t\t\t\t\t\t\n\t\t\…NaNNaN31,04231,0422:51Home Plate Umpire – Mike Everitt, First Base U…8.0 IP, 3 ER, 5 K, 1 BB32815.0Marcus StromanM. Stroman(1-0)5.0 IP, 2 ER, 12 K, 3 BB31003.0Chris ArcherC. Archer(0-1)1.0 IP, 0 ER, 2 K, 0 BB32693.0Roberto OsunaR. Osuna-1.0NaN
2360403107NYM0-10-1 AwayKCJan-001-0 Home3.04.0NaN2.06.00.01.03.09.08.09.0\n\n\n\n\n\t\t\t\t\t\t\tKauffman Stadium\n\t\t…2016-04-04T00:30ZKansas City, Missouri\n\t\t\t\t\t\t\t\n\t\t\t\…NaNNaN40,03037,9033:13Home Plate Umpire – Gerry Davis, First Base Um…6.0 IP, 0 ER, 5 K, 3 BB6401.0Edinson VolquezE. Volquez(1-0)5.2 IP, 3 ER, 2 K, 2 BB31214.0Matt HarveyM. Harvey(0-1)1.0 IP, 0 ER, 2 K, 1 BB28957.0Wade DavisW. Davis-1.0NaN

It is first necessary to filter down to 2021 regular season games. If the “postseason info” column is null, we assume the game is a regular season matchup. The “Date” column is used to extract the year.

To create the adjacency matrix, only the “away”, “home”, “away-score” and “home-score” columns need to be retained; all other columns are removed:

df["yyyy"] = pd.to_datetime(df["Date"]).dt.year

df21 = (
    df[(pd.isnull(df["postseason info"])) & (df["yyyy"]==2021)]
    .dropna(subset=["away", "home", "away-score", "home-score"])
    .rename({"away-score": "away_score", "home-score": "home_score"}, axis=1)
    [["away", "home", "away_score", "home_score"]]
    .reset_index(drop=True)
)

print(f"df21.shape: {df21.shape}")

df21.head(15)
df21.shape: (2310, 4)
    away  home  away_score  home_score
0    TOR   NYY         3.0         2.0
1    CLE   DET         2.0         3.0
2    MIN   MIL         5.0         6.0
3    PIT   CHC         5.0         3.0
4    ATL   PHI         2.0         3.0
5    ARI    SD         7.0         8.0
6    LAD   COL         5.0         8.0
7    STL   CIN        11.0         6.0
8     TB   MIA         1.0         0.0
9    TEX    KC        10.0        14.0
10   CHW   LAA         3.0         4.0
11   HOU   OAK         8.0         1.0
12    SF   SEA         7.0         8.0
13   BAL   BOS         3.0         0.0
14    TB   MIA         6.0         4.0

All 30 MLB teams are represented in the “home” and “away” columns. We create a zero-filled DataFrame whose rows and columns are indexed by the 30 teams in alphabetical order.

# Create empty DataFrame with rows and columns indexed by the 30 MLB teams.
mlb_teams = sorted(df21["home"].unique().tolist())
dfadj = pd.DataFrame(index=mlb_teams, columns=mlb_teams)
dfadj.loc[:,:] = 0

dfadj
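     ARI  ATL  BAL  BOS  CHC  CHW  CIN  CLE  COL  DET  HOU   KC  LAA  LAD  MIA  MIL  MIN  NYM  NYY  OAK  PHI  PIT   SD  SEA   SF  STL   TB  TEX  TOR  WSH
ARI    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
ATL    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
BAL    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
BOS    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
CHC    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
CHW    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
CIN    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
CLE    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
COL    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
DET    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
HOU    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
KC     0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
LAA    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
LAD    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
MIA    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
MIL    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
MIN    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
NYM    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
NYY    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
OAK    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
PHI    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
PIT    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
SD     0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
SEA    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
SF     0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
STL    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
TB     0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
TEX    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
TOR    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
WSH    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0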

We iterate over df21, updating values in dfadj according to the following rules:

  • If the team in row $i$ defeated the team in column $j$, $a_{ij}$ is incremented by 1.
  • If the team in row $i$ lost to the team in column $j$, $a_{ji}$ is incremented by 1.
  • If the contest resulted in a tie, $a_{ij}$ and $a_{ji}$ are each incremented by 0.5, but there are no ties in df21.

A dictionary dresults tracking wins and losses for each team is also created, in order to use regular season winning percentage as a proxy to compare against our direct method rankings.

dresults = {kk: {"wins": 0, "losses": 0} for kk in dfadj.columns}

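for rr in df21.itertuples(index=False):

    # ii = away team, jj = home team.
    ii, jj, ii_score, jj_score = rr.away, rr.home, rr.away_score, rr.home_score

    if ii_score == jj_score:
        # Tie: split the credit between both teams (no ties occur in df21).
        dfadj.at[ii, jj] += .5
        dfadj.at[jj, ii] += .5

    elif ii_score > jj_score:
        # Away team won: credit row ii / column jj.
        dfadj.at[ii, jj] += 1
        dresults[ii]["wins"] += 1
        dresults[jj]["losses"] += 1

    else:
        # Home team won: credit row jj / column ii.
        dfadj.at[jj, ii] += 1
        dresults[jj]["wins"] += 1
        dresults[ii]["losses"] += 1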

dfadj
ARIATLBALBOSCHCCHWCINCLECOLDETHOUKCLAALADMIAMILMINNYMNYYOAKPHIPITSDSEASFSTLTBTEXTORWSH
ARI010020609010032101004482210103
ATL403150402000021130910104403510012
BAL000600020234202021831002001453
BOS03130030403253020341033004008283
CHC4200018122000214240021450190004
CHW0052503909282001120040303023540
CIN13001110351020359230031210190005
CLE00523720012113500070320103021410
COL10400202000201642010154102330404
DET005317250048100360310105034630
HOU2035050621031320030211002111041440
KC00323913011402004100220304012220
LAA3043050126640300404400281011130
LAD164003030130203034060246123940207
MIA582050202000040309009230301008
MIL630013210351000330230021350480003
MIN00422521001049100400110102013430
NYM573030302000019100306240110028
NYY0311705040344203051042005008672
OAK40330304268515100303000242041020
PHI3923502020000295092004402400112
PIT230051620301005423003030470002
SD1100010607040274203022403830305
SEA401304042183111004021500002061340
SF1730060601402031043050443111020304
STL60008182410503611240031230400002
TB0218110304032460503311340010003111
TEX303401022154810030190006104020
TOR06127030403232040411052002008301
WSH4530303020000011106105430143030
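The row-by-row loop is easy to follow; as an alternative, the same win counts can be built in a vectorized way with pd.crosstab. The sketch below uses a few hypothetical games in the same layout as df21 and, like the loop, assumes there are no ties (a tie would be credited to the home team by this comparison).

```python
import numpy as np
import pandas as pd

# Hypothetical games in the same layout as df21.
games = pd.DataFrame({
    "away": ["TOR", "CLE", "TOR", "NYY"],
    "home": ["NYY", "DET", "NYY", "TOR"],
    "away_score": [3.0, 2.0, 1.0, 5.0],
    "home_score": [2.0, 3.0, 4.0, 4.0],
})
teams = sorted(set(games["away"]) | set(games["home"]))

# Winner and loser of each game (assumes no ties).
away_won = games["away_score"] > games["home_score"]
winner = pd.Series(np.where(away_won, games["away"], games["home"]), name="winner")
loser = pd.Series(np.where(away_won, games["home"], games["away"]), name="loser")

# Count wins of the row team over the column team; pairs that never met get 0.
adj = (
    pd.crosstab(winner, loser)
    .reindex(index=teams, columns=teams, fill_value=0)
)

wins = adj.sum(axis=1)    # row sums
losses = adj.sum(axis=0)  # column sums
print(adj)
```

The row and column sums reproduce the wins and losses that the loop tracks in dresults.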

Convert dresults into a DataFrame and add win_pct column:

dfresults = (
    pd.DataFrame.from_dict(dresults, orient="index")
    .reset_index(drop=False, names="team")
)

# Compute winning percentage. 
dfresults["win_pct"] = dfresults["wins"] / (dfresults["wins"] + dfresults["losses"])

# Sort values by win_pct. 
dfresults = (
    dfresults.sort_values("win_pct", ascending=False)
    .reset_index(drop=True)
)

# Add win_pct rank column.
dfresults["rank0"] = dfresults.index + 1

dfresults.head(30)
    team  wins  losses   win_pct  rank0
0     SF   105      53  0.664557      1
1    LAD   106      54  0.662500      2
2     TB    98      62  0.612500      3
3    HOU    94      66  0.587500      4
4    MIL    89      65  0.577922      5
5    BOS    86      64  0.573333      6
6    NYY    86      67  0.562092      7
7    ATL    84      66  0.560000      8
8    TOR    82      65  0.557823      9
9    CHW    82      66  0.554054     10
10   SEA    88      71  0.553459     11
11   STL    85      69  0.551948     12
12   OAK    84      76  0.525000     13
13   PHI    80      74  0.519481     14
14   CLE    74      73  0.503401     15
15   CIN    80      79  0.503145     16
16    SD    77      82  0.484277     17
17   LAA    75      82  0.477707     18
18   DET    71      79  0.473333     19
19   COL    70      79  0.469799     20
20   MIN    69      79  0.466216     21
21    KC    70      85  0.451613     22
22   NYM    61      76  0.445255     23
23   CHC    68      86  0.441558     24
24   MIA    66      93  0.415094     25
25   WSH    62      89  0.410596     26
26   TEX    60     100  0.375000     27
27   PIT    56      97  0.366013     28
28   BAL    51     103  0.331169     29
29   ARI    51     110  0.316770     30

We will compare our results against rank0. Let’s visualize the regular season matchup network:

import networkx as nx
import matplotlib.pyplot as plt

# Create adjacency matrix as Numpy array. 
A = dfadj.values.astype(float)
G = nx.from_numpy_array(A)

fig, ax = plt.subplots(1, 1, figsize=(8.5, 6), tight_layout=True)

ax.set_title(
    "2021 MLB Regular Season Matchups", 
    color="#000000", loc="center", weight="normal", fontsize=9
)
nx.draw_networkx(
    G, node_color="#32cd32", node_size=200, ax=ax, with_labels=True, 
    edge_color="blue", width=.15, pos=nx.spring_layout(G, seed=516)
)

Next we perform the same steps carried out for the Big 12 analysis:

from numpy.linalg import eig


# Adjacency matrix as Numpy array.
A = dfadj.values.astype(float)

# Perform eigendecomposition of A. 
e_vals, e_vecs = eig(A)

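# Identify index of the largest eigenvalue. The Perron root of a non-negative
# matrix is real and has the largest real part, so compare real parts.
e_val1_indx = np.argmax(e_vals.real)

# Take the magnitude of the corresponding eigenvector; this removes the
# arbitrary sign (and any zero imaginary part) returned by eig.
e_vec1 = np.abs(e_vecs[:, e_val1_indx])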

# Get indices associated with each team.
indices = np.argsort(e_vec1)[::-1]

# Associate ranks with teams. 
teams = dfadj.columns
ranked_teams = teams[indices]

for team in ranked_teams:
    print(team)
HOU
TB
LAD
SF
SEA
NYY
OAK
BOS
TOR
CHW
MIL
LAA
STL
SD
DET
CLE
ATL
KC
CIN
MIN
PHI
COL
TEX
CHC
MIA
WSH
BAL
NYM
PIT
ARI

Again comparing actual vs. predicted ranks, using regular season winning percentage as a proxy for actual rank:

for jj, team in enumerate(ranked_teams, start=1):
    actual_rank = dfresults[dfresults.team==team]["rank0"].item()
    print(f"{team}: actual/predicted : {actual_rank}/{jj}")
HOU: actual/predicted : 4/1
TB: actual/predicted : 3/2
LAD: actual/predicted : 2/3
SF: actual/predicted : 1/4
SEA: actual/predicted : 11/5
NYY: actual/predicted : 7/6
OAK: actual/predicted : 13/7
BOS: actual/predicted : 6/8
TOR: actual/predicted : 9/9
CHW: actual/predicted : 10/10
MIL: actual/predicted : 5/11
LAA: actual/predicted : 18/12
STL: actual/predicted : 12/13
SD: actual/predicted : 17/14
DET: actual/predicted : 19/15
CLE: actual/predicted : 15/16
ATL: actual/predicted : 8/17
KC: actual/predicted : 22/18
CIN: actual/predicted : 16/19
MIN: actual/predicted : 21/20
PHI: actual/predicted : 14/21
COL: actual/predicted : 20/22
TEX: actual/predicted : 27/23
CHC: actual/predicted : 24/24
MIA: actual/predicted : 25/25
WSH: actual/predicted : 26/26
BAL: actual/predicted : 29/27
NYM: actual/predicted : 23/28
PIT: actual/predicted : 28/29
ARI: actual/predicted : 30/30

The Houston Astros are ranked first by the direct method, which is encouraging since they ultimately reached the 2021 World Series. One of the biggest discrepancies is the Atlanta Braves, who were 8th in regular season winning percentage but 17th under the direct method, and who went on to win the 2021 World Series. Nonetheless, the modeled rankings are reasonable, and the direct method may provide further insight into how teams rank beyond winning percentage alone.
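To pick out the largest disagreements systematically rather than by eye, the printed actual/predicted pairs can be sorted by absolute rank shift. The list below is copied from the MLB output above; a positive shift means the direct method ranks a team lower than its winning percentage does. With this data, ATL (+9) and PHI (+7) head the list.

```python
# (team, actual rank by win_pct, direct-method rank), copied from the MLB output above.
mlb = [
    ("HOU", 4, 1), ("TB", 3, 2), ("LAD", 2, 3), ("SF", 1, 4), ("SEA", 11, 5),
    ("NYY", 7, 6), ("OAK", 13, 7), ("BOS", 6, 8), ("TOR", 9, 9), ("CHW", 10, 10),
    ("MIL", 5, 11), ("LAA", 18, 12), ("STL", 12, 13), ("SD", 17, 14), ("DET", 19, 15),
    ("CLE", 15, 16), ("ATL", 8, 17), ("KC", 22, 18), ("CIN", 16, 19), ("MIN", 21, 20),
    ("PHI", 14, 21), ("COL", 20, 22), ("TEX", 27, 23), ("CHC", 24, 24), ("MIA", 25, 25),
    ("WSH", 26, 26), ("BAL", 29, 27), ("NYM", 23, 28), ("PIT", 28, 29), ("ARI", 30, 30),
]

# Positive shift = ranked lower by the direct method than by winning percentage.
shifts = [(team, predicted - actual) for team, actual, predicted in mlb]

# Largest absolute shifts first; sorted() is stable, so ties keep their original order.
largest = sorted(shifts, key=lambda t: abs(t[1]), reverse=True)

for team, shift in largest[:6]:
    print(f"{team}: {shift:+d}")

mean_abs_shift = sum(abs(s) for _, s in shifts) / len(shifts)
print(f"mean absolute shift: {mean_abs_shift:.2f}")
```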

To leave a comment for the author, please follow the link and comment on their blog: The Pleasure of Finding Things Out: A blog by James Triveri .
