
Probability (ish) of Placement by Seed for the NCAA Tournament


I put together a table of the probability of placement by seed using placement data from 2010 - 2023.

After getting the raw numbers, I fitted in two directions to come up with these approximations. It could probably benefit from a couple more iterations of fits, but I grew tired, and this is good enough for government work.

[image: table of placement probabilities by seed]

How to read:

  • The left column is the seed
  • The second through twelfth columns are the exact placement. 9 represents the blood round losers, 9-12. 13 represents the prior round losers, 13-16, etc.
  • The percentages represent the probability of the exact placement. For example, a #2 seed has a 23% chance of winning and a 32.3% chance of finishing second.
  • The probabilities are additive left to right, but not top to bottom. For example, if you want to know the probability that a #1 seed makes the final, simply sum the second and third columns (52.3% + 22.5% = 74.8%).
  • None of this is all that precise even though I fake precision by giving you tenths.
  • But, again, good enough for government work.
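
The left-to-right addition can be sketched in a few lines of Python. This is only an illustration: it uses the two #1-seed figures quoted above, and the names `seed1` and `prob_at_or_better` are made up for the example rather than taken from the table.

```python
# Only the two #1-seed values quoted in the post; the rest of the row is not reproduced.
seed1 = {1: 52.3, 2: 22.5}  # exact placement -> probability in percent


def prob_at_or_better(row, worst_place):
    """Sum exact-placement percentages for places 1..worst_place."""
    return sum(pct for place, pct in row.items() if place <= worst_place)


# Making the final means finishing first or second.
final_pct = prob_at_or_better(seed1, 2)
print(f"#1 seed makes the final: {final_pct:.1f}%")
```

The same helper would work for "top 4" or "AA" if the full row were filled in, since each row is a set of mutually exclusive outcomes.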

Very very cool.  Thanks for posting.  Can you describe more what you did to ‘fit’ the data?  Not sure what that means.  Did it somehow modify the data to give smoother probability distributions?  


3 hours ago, Dark Energy said:

Very very cool.  Thanks for posting.  Can you describe more what you did to ‘fit’ the data?  Not sure what that means.  Did it somehow modify the data to give smoother probability distributions?  

  1. I created the raw data matrix first. By definition every row/seed sums to 100%. And there are patterns for each seed, but they can be a bit noisy. We are also not dealing with a huge amount of data here. Seeding to 33 is only 4 years old, for example, so seeds below 16 are sparse.
  2. So I used a polynomial regression of degree five to fit a curve through the data. I chose polynomial because it is a classification problem and there are essentially five classes to fit wrestlers into (AA, 9-12, 13-16, 17-24, 25-33). 
  3. Using the resulting equation leads to a horizontal sum that is never 100%, so I had to refactor to force a 100% sum constraint.
  4. Then I did the polynomial fit vertically as there are distinct patterns there too, with some noise. There is no sum constraint vertically, but it messes with the horizontal sums.
  5. So, then I re-ran the horizontal fit with the 100% sum constraint.
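
For anyone who wants to try something similar, here is a rough sketch of the steps above in Python with NumPy. It is not the actual spreadsheet or code behind the table. The matrix is random stand-in data, the function names are made up, and clipping negatives to zero and rescaling rows are assumptions about how the 100% constraint might be enforced.

```python
import numpy as np


def poly_smooth_rows(m, degree=5):
    """Fit one polynomial per row (across columns) and return the fitted values."""
    x = np.linspace(0.0, 1.0, m.shape[1])       # rescaled column index
    coeffs = np.polyfit(x, m.T, degree)         # one coefficient column per row of m
    fitted = np.vander(x, degree + 1) @ coeffs  # evaluate every polynomial at every x
    return np.clip(fitted.T, 0.0, None)         # assumption: negatives are set to zero


def normalize_rows(m):
    """Rescale each row so it sums to 1 (the 100% constraint)."""
    return m / m.sum(axis=1, keepdims=True)


def fit_table(raw, degree=5, rounds=1):
    """Horizontal fit, vertical fit, then horizontal again, as in steps 2-5."""
    fitted = raw
    for _ in range(rounds):
        fitted = normalize_rows(poly_smooth_rows(fitted, degree))  # steps 2-3
        fitted = poly_smooth_rows(fitted.T, degree).T              # step 4
        fitted = normalize_rows(poly_smooth_rows(fitted, degree))  # step 5
    return fitted


# Made-up stand-in for the raw matrix: 33 seeds x 12 placement buckets, rows sum to 1.
rng = np.random.default_rng(0)
raw = rng.dirichlet(np.ones(12), size=33)

fitted = fit_table(raw)
print(fitted.sum(axis=1).round(6))  # every row should be back at 1.0
```

Raising `rounds` is the rinse-and-repeat part: each extra pass smooths a little more, but the row sums stay pinned at 100% because the last step of every pass is the horizontal normalization.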

You can rinse and repeat the process as many times as you like to make small tweaks and smooth out the resulting lines, but there is not a lot of incremental improvement. At the end of the day, you are taking a data set that is limited (but not too limited) and using it to estimate what will happen as more years pass and more data is collected to fill in the gaps.

That is all built on an assumption that the seeds are generally accurate (i.e. have a stable average result) with a mathematical distribution around the average. I think this is true, but homers tend to think their team is special and different and this is surely the year.


12 minutes ago, lightweight said:

Nicely done.

So, if the one seed is the eventual champion 52.3% of the time (close to a coin flip), getting any kind of favorable odds to choose the field would seem to be a good bet.  

Kinda depends on if the field contains a PSU wrestler.

As a #1 seed, PSU wins 82% of the time. As a #2 seed it is 47%. And as a #3 seed it is 38%. 

So don't take the field if PSU is #1.

Definitely take the field if they are #2 or #3.

Everything else is close to a push, so it depends on the odds you can get.
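
To put rough numbers on "depends on the odds", here is a small sketch of the break-even math. It assumes "the field" means everyone except the #1 seed and uses only the 52.3% and 82% figures quoted in this thread. The function names are made up for the example.

```python
def field_prob(top_seed_win_prob):
    """Chance that someone other than the #1 seed wins."""
    return 1.0 - top_seed_win_prob


def break_even_decimal_odds(win_prob):
    """Decimal odds at which a bet with this win probability has zero expected value."""
    return 1.0 / win_prob


for label, p_top in [("typical #1 seed", 0.523), ("PSU as #1 seed", 0.82)]:
    p_field = field_prob(p_top)
    odds = break_even_decimal_odds(p_field)
    print(f"{label}: field wins {p_field:.1%}, break-even decimal odds {odds:.2f}")
```

Anything better than the break-even number is a favorable bet on the field under these probabilities, and anything worse is not.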


12 minutes ago, Wrestleknownothing said:

Kinda depends on if the field contains a PSU wrestler.

As a #1 seed, PSU wins 82% of the time. As a #2 seed it is 47%. And as a #3 seed it is 38%. 

So don't take the field if PSU is #1.

Definitely take the field if they are #2 or #3.

Everything else is close to a push, so it depends on the odds you can get.

Oh, now you're going all multi-variate on me...   🙂

Thanks for the additional insight.  


It is interesting that the probability distributions for the #1 and #8 seeds have the least uncertainty (most divergent from a uniform distribution) while those for the #3 and #4 seeds have the most uncertainty (least divergent from a uniform distribution).  So, on average, I'd say the seeding committee gets things pretty much right over the long haul: #1 seeds are most likely to place #1 while #8 seeds are most likely to not AA, and it's more or less anything goes with the other seeds, especially #3 and #4.
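
One way to put a number on "divergence from a uniform distribution" is the KL divergence from uniform, which is zero for a perfectly flat row and grows as the row gets more peaked. The sketch below is just an illustration of that measure. The two rows are made-up examples, not values from the table.

```python
import math


def kl_from_uniform(probs):
    """KL divergence (in nats) of a distribution from the uniform one over the same outcomes."""
    n = len(probs)
    return sum(p * math.log(p * n) for p in probs if p > 0)


# Hypothetical rows for illustration only.
peaked_row = [0.70, 0.10, 0.10, 0.10]  # one outcome dominates: low uncertainty
flat_row = [0.30, 0.30, 0.20, 0.20]    # outcomes are close: high uncertainty

print(kl_from_uniform(peaked_row) > kl_from_uniform(flat_row))  # True
```

Running the same function over each seed's row of the fitted table would give a simple ranking of which seeds are the most and least predictable.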


16 minutes ago, Wrestleknownothing said:

I will give you some hints. Both guys sustained injuries and medical forfeited out after losing on the top side. One was recent, one was not.

Now you are trying to confuse me.  Alex Tirapelle was a one seed, but no injury, no MF.


Just now, Wrestleknownothing said:

Tirapelle was 2005. Remember, this data only goes back to 2010.

So now who are your guesses?

What data?  We can't use long term memory on these contests?  😉


1 hour ago, Interviewed_at_Weehawken said:

OK, smaller sample size, but I would imagine the data might be quite a bit different.

This is what before and after looks like:

[image: raw data (left) and fitted data (right)]

Take the lower left green box in the left picture (raw data). It represents 1 wrestler at #25 who AA'd in 1 of the 4 years that ranks went to 33 (2.5%). Does that mean that every four years we should expect to see that? Or is it more likely that we will see something similar every ten years or so (1%), or even less often (<1%)? If the latter, then the expectation would be that as time passes, we are more likely to see one #24 or one #26 achieve AA than another #25. The fitting process assumes the latter and accounts for this by effectively treating AA as a single category rather than eight separate categories.
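
The arithmetic behind those percentages can be sketched as below. It assumes 10 weight classes per tournament, which is what makes 1 wrestler in 4 years come out to 2.5%. The function name is made up for the example.

```python
WEIGHTS_PER_YEAR = 10  # assumption: ten weight classes, so ten #25 seeds per year


def observed_rate(count, years, weights=WEIGHTS_PER_YEAR):
    """Share of seed slots that produced the outcome over the sample."""
    return count / (years * weights)


print(f"{observed_rate(1, 4):.1%}")   # one #25 AA in 4 years of 33-deep seeding
print(f"{observed_rate(1, 10):.1%}")  # the same single AA if it only shows up once a decade
```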

The other thing it achieves is to fill in more obvious discontinuities. For example, no #7 seed won between 2010 and 2023. Does that mean a #7 seed has no chance to win? Not likely given that winners have come from #8, #9, #11, and #13 seeds in that time.

And no #1 seed busted out in the blood round in my sample, but two busted out in the round prior to the blood round. So a 13-16 exit for a #1 seed is certainly possible, and on the right it is filled in.

