I finally read Nate Silver’s The Signal and the Noise. When it was released in 2012, it was a rather unique book: it discussed statistical modeling, Bayes’ theorem, and the art and science of prediction in a way that the general public could follow and understand. It was ahead of its time, and it still held up nicely when I read it in 2019.

One of the things the author talks about in the book are

When evaluating models, we usually run into mentions of accuracy, F1 score, or confusion matrices. Calibration is not something I see too often, and it turns out to be a pretty good view into how your model is performing.

In general terms, calibration is a comparison of your model’s confidence with the actual results. If a model is 90% confident in a prediction, what percentage of the time is it actually correct? Does it have “blind spots” where it is consistently overconfident? A calibration plot can help you spot such trends.
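To make the definition concrete, here is a minimal sketch with made-up numbers. The `outcomes` list is hypothetical and not taken from my model: it stands for ten predictions made at 90% confidence, of which seven turned out to be correct.

```python
# Hypothetical data: ten predictions the model made at 90% confidence.
# True means the prediction turned out to be correct.
stated_confidence = 0.90
outcomes = [True, True, False, True, True, False, True, True, False, True]

# Observed accuracy is the share of correct predictions
observed_accuracy = sum(outcomes) / len(outcomes)

# A positive gap means the model claims more certainty than it delivers
gap = stated_confidence - observed_accuracy

print(f"stated: {stated_confidence:.0%}, observed: {observed_accuracy:.0%}, gap: {gap:+.0%}")
```

In this toy case the model is overconfident: a well-calibrated model would be right about nine times out of ten when it says 90%.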

The method to calculate it is pretty straightforward. Here is a snippet of code that illustrates the approach:

```
import csv
from typing import Dict, Tuple

# model is a fitted classifier; X and y are its features and true labels
predictions = model.predict(X)
probabilities = model.predict_proba(X)

# maps a confidence bucket (50, 55, 60, ...) to (correct, incorrect) counts
calibration_map: Dict[int, Tuple[int, int]] = {}

for idx, predicted_outcome in enumerate(predictions):
    true_outcome = y[idx]
    # confidence is the probability assigned to the most likely class
    confidence = float(max(probabilities[idx]))
    calibration_key = int(confidence * 100)
    # use 5% increments for calibration values (50, 55, 60, etc)
    calibration_key = calibration_key - (calibration_key % 5)
    if calibration_key not in calibration_map:
        calibration_map[calibration_key] = (0, 0)
    wins_losses = calibration_map[calibration_key]
    if predicted_outcome == true_outcome:
        wins_losses = (wins_losses[0] + 1, wins_losses[1])
    else:
        wins_losses = (wins_losses[0], wins_losses[1] + 1)
    calibration_map[calibration_key] = wins_losses

with open("calibration.csv", "w", newline='') as o:
    writer = csv.writer(o)
    # "predicted" is the confidence bucket, "actual" is the observed accuracy
    writer.writerow(["index", "predicted", "actual", "number_of_games"])
    for pct in calibration_map:
        wins_losses = calibration_map[pct]
        number_of_games = wins_losses[0] + wins_losses[1]
        true_pct = wins_losses[0] / number_of_games
        true_pct = int(true_pct * 100)
        # don't bother with small sample size
        if number_of_games > 20:
            writer.writerow([pct, pct, true_pct, number_of_games])
```

What we are doing above is running through the model’s predictions. For each prediction, we round its confidence down to the nearest 5% interval and note whether the prediction was correct. Tally the number of correct vs. incorrect predictions and you have the accuracy for each interval. I output this into a CSV to render later with pandas:
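The same tally can be tried out without a trained model. Below is a minimal sketch, assuming you already have a list of confidences and a parallel list of correct/incorrect flags. The names `bucket_for` and `calibration_table` are my own, made up for illustration, and unlike the snippet above it stores (correct, total) per bucket rather than (correct, incorrect).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def bucket_for(confidence: float, step: int = 5) -> int:
    """Round a confidence in [0, 1] down to the nearest `step` percent."""
    pct = int(confidence * 100)
    return pct - (pct % step)


def calibration_table(
    confidences: List[float], correct: List[bool], step: int = 5
) -> Dict[int, Tuple[int, int]]:
    """Map each bucket to (number correct, number of predictions)."""
    tally = defaultdict(lambda: [0, 0])
    for confidence, was_correct in zip(confidences, correct):
        counts = tally[bucket_for(confidence, step)]
        counts[0] += int(was_correct)
        counts[1] += 1
    return {bucket: (c[0], c[1]) for bucket, c in sorted(tally.items())}


# Hypothetical toy data, only to show the shape of the result
confidences = [0.52, 0.53, 0.61, 0.64, 0.91, 0.93, 0.94]
correct = [True, False, True, True, True, True, False]

for bucket, (wins, total) in calibration_table(confidences, correct).items():
    print(f"{bucket}% bucket: {wins}/{total} correct")
```

With real data you would want far more predictions per bucket before reading anything into the numbers, which is why the snippet above skips small buckets.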

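```
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()  # apply seaborn's default plot styling

# each row is one confidence bucket, indexed by the bucket percentage
df = pd.read_csv("calibration.csv", index_col="index")
df.sort_index(inplace=True)

# plot the model's confidence against the observed accuracy
df.loc[:, ["predicted", "actual"]].plot(figsize=(15, 10))
plt.show()
```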

Once you run this, you should see something like this:

This is what I see when I do a calibration plot for my NBA model for 2018-2019 games. You can see how the actual values are pretty close to what it thought it should be with