March Madness
Probability Weighted Scoring
There are three metrics provided for measuring an entry's performance throughout the tournament.
Let us assume for now that we can easily calculate PA(B), the probability that team A beats team B, for all combinations of A and B. PAi, the probability that team A wins its game in round i, is then calculated as
PAi = PA(i-1) × Σ (PA(B) × PB(i-1))
summed for all possible opponents B that team A can face in round i. The following bracket segment illustrates the application of the calculations.
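As a rough sketch of the recursion above, the following Python computes PAi round by round for a small bracket segment. The function names, the slot layout (teams listed in bracket order, so neighbours meet first) and the example seeds 1, 16, 8 and 9 are assumptions made for illustration. PA(B) is filled in with the seed-ratio formula introduced further on in the text.

```python
def win_probability(seed_a, seed_b):
    # Seed-ratio assumption from the text: PA(B) = SB / (SA + SB).
    return seed_b / (seed_a + seed_b)


def advancement_probabilities(seeds):
    """Return one list per round; entry k is the probability that the team
    in bracket slot k wins its game in that round.

    `seeds` lists the seeds in bracket order and its length must be a
    power of two (hypothetical layout chosen for this sketch).
    """
    n = len(seeds)
    prev = [1.0] * n          # before round 1 every team is still alive
    rounds = []
    block = 1                 # number of teams in each half of a matchup
    while block < n:
        cur = [0.0] * n
        for a in range(n):
            # Possible opponents of A this round: the neighbouring block.
            start = ((a // block) ^ 1) * block
            total = sum(
                win_probability(seeds[a], seeds[b]) * prev[b]
                for b in range(start, start + block)
            )
            cur[a] = prev[a] * total   # PAi = PA(i-1) x sum(PA(B) x PB(i-1))
        rounds.append(cur)
        prev = cur
        block *= 2
    return rounds


# Hypothetical four-team segment: 1 plays 16, 8 plays 9, winners meet.
segment = advancement_probabilities([1, 16, 8, 9])
for round_number, probs in enumerate(segment, start=1):
    print(round_number, [round(p, 4) for p in probs])
```

Within each round the probabilities sum to the number of games played in that round, which is a handy sanity check on any implementation.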

This procedure is continued iteratively over all unplayed games in the tournament. As results come in, PWS will assign probabilities of one to correct picks and zero to incorrect picks and adjust the calculations accordingly. For instance, if Team A beats Team B in the above example, the new calculations would look as follows.
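The adjustment for known results can be sketched as follows. This is only an illustration under stated assumptions: the team names A to D, their seeds and the helper functions are hypothetical, and a known result is modelled simply by giving the winner a first-round probability of one and the loser zero before the second round is recomputed.

```python
SEEDS = {"A": 1, "B": 16, "C": 8, "D": 9}   # hypothetical seeds
GAMES = [("A", "B"), ("C", "D")]            # first-round games; winners meet


def win_probability(seed_a, seed_b):
    # Seed-ratio assumption from the text: PA(B) = SB / (SA + SB).
    return seed_b / (seed_a + seed_b)


def first_round(results=None):
    """First-round win probabilities; `results` maps a game to its winner."""
    results = results or {}
    probs = {}
    for a, b in GAMES:
        winner = results.get((a, b))
        if winner is None:                  # game not played yet
            probs[a] = win_probability(SEEDS[a], SEEDS[b])
            probs[b] = win_probability(SEEDS[b], SEEDS[a])
        else:                               # known result: one or zero
            probs[a] = 1.0 if winner == a else 0.0
            probs[b] = 1.0 if winner == b else 0.0
    return probs


def second_round(p1):
    """Apply PAi = PA(i-1) x sum(PA(B) x PB(i-1)) for the second round."""
    (a, b), (c, d) = GAMES
    opponents = {a: (c, d), b: (c, d), c: (a, b), d: (a, b)}
    return {
        team: p1[team] * sum(
            win_probability(SEEDS[team], SEEDS[opp]) * p1[opp] for opp in opps
        )
        for team, opps in opponents.items()
    }


before = second_round(first_round())
after = second_round(first_round({("A", "B"): "A"}))   # Team A beat Team B
print(before)
print(after)
```

Once Team A's win is recorded, Team B drops to zero in every later round, and the probabilities for teams C and D shift because their only possible second-round opponent is now Team A.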

The methodology above assumed that we could easily calculate the probability of team A winning a game against opponent team B for all combinations of A and B. To do this, I make the assumption that this probability is a function of the ratio of the two teams' seeds. Specifically, if a 1 seed is playing a 4 seed, I assume the 1 seed is four times more likely to win than the 4 seed. The following equation shows the mathematics used to calculate the probabilities.
PA(B) = SB / (SA + SB)
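In code, the formula is a one-liner. The sketch below (the function name is an assumption made for illustration) reproduces the 1 seed versus 4 seed example: the 1 seed wins with probability 4 / (1 + 4) = 0.8 and the 4 seed with probability 1 / (4 + 1) = 0.2, a ratio of four to one.

```python
def win_probability(seed_a: int, seed_b: int) -> float:
    """PA(B) = SB / (SA + SB): chance the team seeded seed_a beats seed_b."""
    return seed_b / (seed_a + seed_b)


p_1_over_4 = win_probability(1, 4)   # 1 seed beating the 4 seed
p_4_over_1 = win_probability(4, 1)   # 4 seed beating the 1 seed
print(p_1_over_4, p_4_over_1)
```

Note that the two probabilities for any matchup always sum to one, and that equal seeds (which can only meet in the late rounds) give an even 0.5.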
Are there issues with this? Yes. Are there other alternatives? Yes. Have I done the research to investigate these alternatives? No. This one seems to work fairly well (as shown below) and gives us a reasonable approximation, which is all we ask for. PWS is not meant to be a predictive measure. It is meant as a comparative tool of relative strength. For that, this works fine.
Due to the lack of empirical data, backtesting the model is difficult. There are 17 years of tournament history available since the format expanded to the 64 team field. The best cases (first round match-ups) provide 68 data points ... scarcely enough. Later round matchups have fewer and fewer data points since they occur less frequently. Nonetheless, we proceed.
The model is tested by plotting, for each seed pairing, the model prediction against the empirical results. We then perform a linear regression, forcing the intercept to zero, and examine two key figures: the slope b and the R².
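A regression through the origin can be sketched in a few lines of plain Python. The slope is b = Σxy / Σx², where x is the model prediction and y the empirical win rate for a seed pairing. Two points are assumptions here: the function name is invented, and R² is computed as one minus the residual sum of squares over the total sum of squares about the mean. Other conventions exist for zero-intercept fits, and the text does not say which one was used. The numbers in the example are placeholders, not tournament data.

```python
def fit_through_origin(predicted, observed):
    """Least-squares fit of observed = b * predicted (intercept forced to 0).

    Returns (b, r_squared). r_squared uses the total sum of squares about
    the mean of `observed`, which is one of several conventions.
    """
    sxy = sum(x * y for x, y in zip(predicted, observed))
    sxx = sum(x * x for x in predicted)
    b = sxy / sxx

    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - b * x) ** 2 for x, y in zip(predicted, observed))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return b, 1.0 - ss_res / ss_tot


# Placeholder values for illustration only (not actual tournament results).
model_prediction = [0.9, 0.8, 0.6, 0.5]
empirical_rate = [0.95, 0.75, 0.65, 0.5]
b, r_squared = fit_through_origin(model_prediction, empirical_rate)
print(b, r_squared)
```

A slope near one means the model is neither systematically over- nor under-confident, while R² indicates how tightly the individual seed pairings follow that line.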
As mentioned, many possible seed pairings occur infrequently. A restriction must be placed on how many samples we require to provide a valid data point. This "cutoff" is initially set at 10 samples and then progressively increased. The trade-off is a reduction in the total number of available points for the sake of increasingly meaningful data. The following table summarizes the results.
| Cutoff | Number of data points | b | R² |
| 10 samples | 24 | 1.0322 | 0.0745 |
| 20 samples | 19 | 1.0244 | 0.4351 |
| 30 samples | 15 | 1.0177 | 0.5916 |
| 35 samples | 12 | 0.9869 | 0.7148 |
The values for b are excellent and move progressively closer to one. For the 10 sample cutoff case, the R² is horrible. It then dramatically improves as the cutoff is increased without an alarming loss in the number of data points. This would seem to indicate that the 10 sample case either included unusually extreme outliers or is simply not a stringent enough cutoff. We must also weigh the fact that as we increase the cutoff, we lose more of the closer seed pairings, which are likely more difficult to model and more significant in the ultimate PWS calculation. However, even in the 30 sample cutoff case, which boasts reasonable results, the 1-2, 1-4, 3-6, 4-5, 8-9 and 7-10 pairings are still included.