Probability (ish) of Placement by Seed for the NCAA Tournament

Wrestleknownothing · February 28

I put together a table of the probability of placement by seed using placement data from 2010 - 2023.

After getting the raw numbers, I fitted in two directions to come up with these approximations. It could probably benefit from a couple more iterations of fits, but I grew tired, and this is good enough for government work.

How to read:

The left column is the seed
The second through twelfth columns are the exact placement. 9 represents the blood round losers, 9-12. 13 represents the prior round losers, 13-16, etc.
The percentages represent the probability of the exact placement. For example, a #2 seed has a 23% chance of winning and a 32.3% chance of finishing second.
The probabilities are additive left to right, but not top to bottom. For example, if you want to know the probability that a #1 seed makes the final, simply sum the second and third columns (52.3% + 22.5% = 74.8%).
None of this is all that precise even though I fake precision by giving you tenths.
But, again, good enough for government work.
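The left-to-right addition described above can be sketched in a few lines of Python. This is only an illustration: the dictionary holds just the four values quoted in this post (the rest of the table is not reproduced), and the names `seed_probs` and `prob_top_n` are made up for the example.

```python
# Only the values quoted in the post; the full table is not reproduced here.
# Outer key = seed, inner key = exact placement, value = probability.
seed_probs = {
    1: {1: 0.523, 2: 0.225},
    2: {1: 0.230, 2: 0.323},
}


def prob_top_n(seed, n):
    """Probability that `seed` finishes in place n or better (sum across the row)."""
    return sum(p for place, p in seed_probs[seed].items() if place <= n)


if __name__ == "__main__":
    for seed in (1, 2):
        print(f"#{seed} seed makes the final: {prob_top_n(seed, 2):.1%}")
```

The same row sum extended through eighth place would give the probability of making AA for a seed, provided the full row is filled in.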

Wrestleknownothing · February 28

@jajensen09 on the right

lu_alum · February 28

Just because I'm too lazy to type all of the data... can you build on your work and show probability of making AA down the left? Example using your #1 seed data:

Edited February 28 by lu_alum

Wrestleknownothing · February 28

5 minutes ago, lu_alum said:

Just because I'm too lazy to type all of data... can you build on your work and show likelihood of making AA across the top? Example using your #1 seed data:

Dark Energy · February 28

Very very cool. Thanks for posting. Can you describe more what you did to ‘fit’ the data? Not sure what that means. Did it somehow modify the data to give smoother probability distributions?

Edited February 28 by Dark Energy

Wrestleknownothing · February 28

3 hours ago, Dark Energy said:

Very very cool. Thanks for posting. Can you describe more what you did to ‘fit’ the data? Not sure what that means. Did it somehow modify the data to give smoother probability distributions?

I created the raw data matrix first. By definition every row/seed sums to 100%. And there are patterns for each seed, but they can be a bit noisy. We are also not dealing with a huge amount of data here. Seeding to 33 is only 4 years old, for example, so seeds below 16 are sparse.
So I used a fifth-degree polynomial regression to fit a curve through the data. I chose polynomial because it is a classification problem and there are essentially five classes to fit wrestlers into (AA, 9-12, 13-16, 17-24, 25-33).
Using the resulting equation leads to a horizontal sum that is never 100%, so I had to rescale each row to force a 100% sum constraint.
Then I did the polynomial fit vertically as there are distinct patterns there too, with some noise. There is no sum constraint vertically, but it messes with the horizontal sums.
So, then I re-ran the horizontal fit with the 100% sum constraint.

You can rinse and repeat the process as many times as you like to make small tweaks and smooth out the resulting lines, but there is not a lot of incremental improvement. At the end of the day, you are taking a data set that is limited (but not too limited) and using it to estimate what will happen as more years pass and more data is collected to fill in the gaps.
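For anyone who wants to try something similar, here is a rough Python sketch of the fit, rescale, fit, rescale loop described above. It is not the actual spreadsheet or code behind the table. The toy matrix is randomly generated (not real placement data), every function name is made up, and clipping negative fitted values to zero before rescaling is an extra assumption that the description above does not mention.

```python
import random


def polyfit_values(ys, degree=5):
    """Least-squares polynomial fit of ys against evenly spaced x in [-1, 1].

    Returns the fitted values at the same x positions. Needs len(ys) > degree.
    """
    n = len(ys)
    xs = [-1 + 2 * i / (n - 1) for i in range(n)]
    m = degree + 1
    # Normal equations A c = b for the polynomial coefficients c.
    A = [[sum(x ** (j + k) for x in xs) for k in range(m)] for j in range(m)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for k in range(col, m):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * m
    for j in reversed(range(m)):
        tail = sum(A[j][k] * coef[k] for k in range(j + 1, m))
        coef[j] = (b[j] - tail) / A[j][j]
    return [sum(c * x ** j for j, c in enumerate(coef)) for x in xs]


def normalize_rows(matrix):
    """Clip negatives to zero (an assumption), then rescale each row to sum to 1."""
    out = []
    for row in matrix:
        clipped = [max(v, 0.0) for v in row]
        total = sum(clipped)
        out.append([v / total for v in clipped])
    return out


def fit_rows(matrix):
    """Horizontal fit: smooth each seed's row across placements."""
    return [polyfit_values(row) for row in matrix]


def fit_columns(matrix):
    """Vertical fit: smooth each placement column across seeds."""
    cols = [polyfit_values(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*cols)]


def smooth(matrix, passes=1):
    """Horizontal fit with the 100% constraint, then repeat vertical + horizontal."""
    fitted = normalize_rows(fit_rows(matrix))
    for _ in range(passes):
        fitted = fit_columns(fitted)                  # no sum constraint here
        fitted = normalize_rows(fit_rows(fitted))     # restore 100% row sums
    return fitted


def make_toy_matrix(n_seeds=10, n_places=8, rng_seed=0):
    """Hypothetical noisy seed-by-placement matrix; NOT real tournament data."""
    rng = random.Random(rng_seed)
    rows = []
    for s in range(n_seeds):
        w = [1.0 / (1 + abs(p - 0.7 * s)) + 0.2 * rng.random() for p in range(n_places)]
        total = sum(w)
        rows.append([v / total for v in w])
    return rows


if __name__ == "__main__":
    for row in smooth(make_toy_matrix(), passes=2):
        print(" ".join(f"{v:6.1%}" for v in row))
```

Raising `passes` corresponds to the rinse and repeat step. Every row still sums to 100% after each pass because the last operation in a pass is always the row rescale.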

That is all built on an assumption that the seeds are generally accurate (i.e. have a stable average result) with a mathematical distribution around the average. I think this is true, but homers tend to think their team is special and different and this is surely the year.

lightweight · February 28

Nicely done.

So, if the one seed is the eventual champion 52.3% of the time (close to a coin flip), getting any kind of favorable odds to choose the field would seem to be a good bet.

Wrestleknownothing · February 28

12 minutes ago, lightweight said:

Nicely done.

So, if the one seed is the eventual champion 52.3% of the time (close to a coin flip), getting any kind of favorable odds to choose the field would seem to be a good bet.

Kinda depends on if the field contains a PSU wrestler.

As a #1 seed, PSU wins 82% of the time. As a #2 seed it is 47%. And as a #3 seed it is 38%.

So don't take the field if PSU is #1.

Definitely take the field if they are #2 or #3.

Everything else is close to a push, so it depends on the odds you can get.
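To put "depends on the odds you can get" into numbers, here is a small Python sketch of the break-even price for taking the field against the #1 seed. It uses only the 52.3% and 82% figures from this thread and ignores any bookmaker margin. The function name `field_break_even_odds` is made up for the example.

```python
def field_break_even_odds(p_top_seed_wins):
    """Decimal odds at which betting the field against the #1 seed breaks even."""
    p_field = 1.0 - p_top_seed_wins
    return 1.0 / p_field


if __name__ == "__main__":
    cases = [("average #1 seed", 0.523), ("PSU as #1 seed", 0.82)]
    for label, p in cases:
        odds = field_break_even_odds(p)
        print(f"{label}: field wins {1 - p:.1%}, break-even decimal odds {odds:.2f}")
```

Any price on the field above the break-even number would be a positive expected value bet if the probabilities hold.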

lightweight · February 28

12 minutes ago, Wrestleknownothing said:

Kinda depends on if the field contains a PSU wrestler.

As a #1 seed, PSU wins 82% of the time. As a #2 seed it is 47%. And as a #3 seed it is 38%.

So don't take the field if PSU is #1.

Definitely take the field if they are #2 or #3.

Everything else is close to a push, so it depends on the odds you can get.

Oh, now you're going all multi-variate on me...

Thanks for the additional insight.

Nittany · February 28

It is interesting that the probability distributions for the #1 and #8 seeds have the least uncertainty (most divergent from a uniform distribution) while the probability distributions for the #3 and #4 seeds have the most uncertainty (least divergent from a uniform distribution). So, on average, I'd say the seeding committee gets things pretty much right over the long haul: #1 seeds are most likely to place #1 while #8 seeds are most likely to not AA, and it's more or less anything goes with the other seeds, especially #3 and #4.

Pin Head · February 29

In 14 years with 10 #1 seeds each, which 2 or 3 #1s out of 140 did not AA?

Wrestleknownothing · February 29

30 minutes ago, Pin Head said:

In 14 years with 10 #1 seeds each, which 2 or 3 #1s out of 140 did not AA?

I will give you some hints. Both guys sustained injuries and medical forfeited out after losing on the top side. One was recent, one was not.

ionel · February 29

16 minutes ago, Wrestleknownothing said:

I will give you some hints. Both guys sustained injuries and medical forfeited out after losing on the top side. One was recent, one was not.

Now you are trying to confuse me. Alex Tirapelli was one but no injury, no mf.

Edited February 29 by ionel

Wrestleknownothing · February 29

4 minutes ago, ionel said:

Now you are trying to confuse me. Alex Tirapelli was one but no injury, no mf.

Tirapelle was 2005. Remember, this data only goes back to 2010.

So now who are your guesses?

ionel · February 29

Just now, Wrestleknownothing said:

Tirapelle was 2005. Remember, this data only goes back to 2010.

So now who are your guesses?

What data? We can't use long term memory on these contests?

Gantry · February 29

Alex Marinelli was one

Wrestleknownothing · February 29

5 minutes ago, Gantry said:

Alex Marinelli was one

Very good

flyingcement · February 29

2011 Darrion Caldwell

Wrestleknownothing · February 29

8 minutes ago, flyingcement said:

2011 Darrion Caldwell

And that is number two. Well done.

Interviewed_at_Weehawken · February 29

As always, thanks for this work.

What year did they start seeding every wrestler in the bracket? Was it 2010?

If not, it would be super interesting to see those numbers.

Wrestleknownothing · February 29

10 minutes ago, Interviewed_at_Weehawken said:

As always, thanks for this work.

What year did they start seeding every wrestler in the bracket? Was it 2010?

If not, it would be super interesting to see those numbers.

2019. It has only been four tournaments.

Interviewed_at_Weehawken · February 29

5 minutes ago, Wrestleknownothing said:

2019. It has only been four tournaments.

OK, smaller sample size, but I would imagine the data might be quite a bit different.

Wrestleknownothing · February 29

1 hour ago, Interviewed_at_Weehawken said:

OK, smaller sample size, but I would imagine the data might be quite a bit different.

This is what before and after looks like:

Take the lower left green box in the left picture (raw data). It represents 1 wrestler at #25 who AA'd in 1 of the 4 years that seeds went to 33 (2.5%). Does that mean that every four years we should expect to see that? Or is it more likely that we will see something similar every ten years or so (1%), or more (<1%)? If the latter, then the expectation would be that as time passes, we are more likely to see one #24 or one #26 achieve AA than another #25. The fitting process assumes the latter and accounts for this by effectively treating AA as a category rather than eight separate categories.
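The arithmetic behind that 2.5%, and the effect of leaning on neighboring seeds, can be sketched in Python. The raw rate uses the counts from this post (1 All-American from 4 tournaments times 10 weights). The pooled example is purely hypothetical: it assumes zero AAs at #24 and #26, which is not taken from the data, and simple pooling is only a stand-in for what the polynomial fit does. The function names are made up.

```python
def raw_rate(all_americans, years=4, weights_per_year=10):
    """Raw AA frequency for a single seed line."""
    return all_americans / (years * weights_per_year)


def pooled_rate(aa_counts, years=4, weights_per_year=10):
    """AA frequency when several neighboring seed lines are pooled together."""
    entries = len(aa_counts) * years * weights_per_year
    return sum(aa_counts) / entries


if __name__ == "__main__":
    print(f"#25 alone: {raw_rate(1):.1%}")
    # HYPOTHETICAL counts for seeds 24, 25, 26 (only the #25 count is from the post).
    hypothetical_counts = [0, 1, 0]
    print(f"#24-#26 pooled: {pooled_rate(hypothetical_counts):.2%}")
```

Under those assumed counts the pooled estimate drops below 1%, which is the direction the smoothing pushes an isolated raw cell.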

The other thing it achieves is to fill in more obvious discontinuities. For example, no #7 seed won between 2010 and 2023. Does that mean a #7 seed has no chance to win? Not likely given that winners have come from #8, #9, #11, and #13 seeds in that time.

And no #1 seed busted out in the blood round in my sample, but two busted out in the round prior to the blood round. So a 13-16 exit for a #1 seed is certainly possible, and on the right it is filled in.

Edited February 29 by Wrestleknownothing

ionel · February 29

25 minutes ago, Wrestleknownothing said:

This is what before and after looks like:
