Posted

I put together a table of the probability of placement by seed using placement data from 2010 - 2023.

After getting the raw numbers, I fitted in two directions to come up with these approximations. It could probably benefit from a couple more iterations of fits, but I grew tired, and this is good enough for government work.

image.png.25ed3de753e54fb964124a70c96ce160.png

How to read:

  • The left column is the seed
  • The second through twelfth columns are the exact placement. 9 represents the blood round losers, 9-12. 13 represents the prior round losers, 13-16, etc.
  • The percentages represent the probability of the exact placement. For example, a #2 seed has a 23% chance of winning and a 32.3% chance of finishing second.
  • The probabilities are additive left to right, but not top to bottom. For example, if you want to know the probability that a #1 seed makes the final, simply sum the second and third columns (52.3% + 22.5% = 74.8%).
  • None of this is all that precise even though I fake precision by giving you tenths.
  • But, again, good enough for government work.
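A small sketch of the "additive left to right" rule, using only the four percentages quoted above. The dictionary layout and the function name `prob_at_or_better` are made up for illustration; the full table lives in the image.

```python
# Exact-placement probabilities (percent) quoted in the post.
# Only the 1st- and 2nd-place columns are reproduced here.
placement_pct = {
    1: {1: 52.3, 2: 22.5},  # #1 seed
    2: {1: 23.0, 2: 32.3},  # #2 seed
}


def prob_at_or_better(seed, worst_place):
    """Sum a seed's row left to right, up to and including worst_place."""
    row = placement_pct[seed]
    return sum(pct for place, pct in row.items() if place <= worst_place)


final_1 = prob_at_or_better(1, 2)  # #1 seed reaches the final
final_2 = prob_at_or_better(2, 2)  # #2 seed reaches the final
print(f"#1 seed makes the final: {final_1:.1f}%")
print(f"#2 seed makes the final: {final_2:.1f}%")
```

Summing down a column does not work the same way, because each row is its own distribution that sums to 100%.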

Drowning in data, but thirsting for knowledge

Posted (edited)

Just because I'm too lazy to type all of the data... can you build on your work and show probability of making AA down the left?  Example using your #1 seed data:

image.png.bb32c9b5acbd83e0da560b78ef1edbe7.png

Edited by lu_alum
Posted
5 minutes ago, lu_alum said:

Just because I'm too lazy to type all of data... can you build on your work and show likelihood of making AA across the top?  Example using your #1 seed data:

image.png.e08c493bf0da08b09d2314c556affa45.png

image.png.26024bdfe3760a8dfd4b6449e9be0f92.png


Drowning in data, but thirsting for knowledge

Posted (edited)

Very very cool.  Thanks for posting.  Can you describe more what you did to ‘fit’ the data?  Not sure what that means.  Did it somehow modify the data to give smoother probability distributions?  

Edited by Dark Energy
Posted
3 hours ago, Dark Energy said:

Very very cool.  Thanks for posting.  Can you describe more what you did to ‘fit’ the data?  Not sure what that means.  Did it somehow modify the data to give smoother probability distributions?  

  1. I created the raw data matrix first. By definition every row/seed sums to 100%. And there are patterns for each seed, but they can be a bit noisy. We are also not dealing with a huge amount of data here. Seeding to 33 is only 4 years old, for example, so the data for seeds beyond #16 are sparse.
  2. So I used a fifth-degree polynomial regression to fit a curve through the data. I chose it because this is essentially a classification problem, with five classes to fit wrestlers into (AA, 9-12, 13-16, 17-24, 25-33).
  3. Using the resulting equation leads to a horizontal sum that is never exactly 100%, so I had to rescale each row to force a 100% sum constraint.
  4. Then I did the polynomial fit vertically as there are distinct patterns there too, with some noise. There is no sum constraint vertically, but it messes with the horizontal sums.
  5. So, then I re-ran the horizontal fit with the 100% sum constraint.

You can rinse and repeat the process as many times as you like to make small tweaks and smooth out the resulting lines, but there is not a lot of incremental improvement. At the end of the day, you are taking a data set that is limited (but not too limited) and using it to estimate what will happen as more years pass and more data is collected to fill in the gaps.
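For anyone who wants to try this at home, here is a rough sketch of the alternating fit described in the steps above, assuming NumPy. The raw matrix is synthetic (it is not the actual placement data), the function names are made up for illustration, and clipping negative fitted values before rescaling is an assumption, since the post does not say how those were handled.

```python
import numpy as np


def smooth_along_rows(p, degree=5):
    """Fit one polynomial per row (across placements) and return the fitted values."""
    x = np.arange(p.shape[1])
    coeffs = np.polyfit(x, p.T, degree)  # shape (degree + 1, n_rows)
    return (np.vander(x, degree + 1) @ coeffs).T


def smooth_along_cols(p, degree=5):
    """Same fit, but down each column (across seeds)."""
    return smooth_along_rows(p.T, degree).T


def renormalize_rows(p, floor=1e-9):
    """Clip to a small positive floor, then rescale every row to sum to 1."""
    p = np.clip(p, floor, None)
    return p / p.sum(axis=1, keepdims=True)


def two_way_fit(raw, degree=5, passes=2):
    """Horizontal fit with sum constraint, then alternate vertical and horizontal fits."""
    p = renormalize_rows(smooth_along_rows(raw, degree))
    for _ in range(passes):
        p = smooth_along_cols(p, degree)                    # breaks the row sums
        p = renormalize_rows(smooth_along_rows(p, degree))  # restores them
    return p


# Synthetic, noisy stand-in for the raw matrix: 16 seeds by 11 placement buckets.
rng = np.random.default_rng(0)
n_seeds, n_places = 16, 11
centers = np.linspace(0, n_places - 1, n_seeds)
raw = np.exp(-np.abs(np.arange(n_places)[None, :] - centers[:, None]) / 2.0)
raw += rng.uniform(0.0, 0.05, size=raw.shape)
raw /= raw.sum(axis=1, keepdims=True)

fitted = two_way_fit(raw)
print(fitted.sum(axis=1))  # every row should sum to 1 after the last pass
```

Raising `passes` corresponds to the rinse and repeat step. The last operation is always the horizontal rescale, so the rows sum to 100% no matter what the vertical fit did.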

That is all built on an assumption that the seeds are generally accurate (i.e. have a stable average result) with a mathematical distribution around the average. I think this is true, but homers tend to think their team is special and different and this is surely the year.


Drowning in data, but thirsting for knowledge

Posted

Nicely done.

So, if the one seed is the eventual champion 52.3% of the time (close to a coin flip), getting any kind of favorable odds to choose the field would seem to be a good bet.  

Posted
12 minutes ago, lightweight said:

Nicely done.

So, if the one seed is the eventual champion 52.3% of the time (close to a coin flip), getting any kind of favorable odds to choose the field would seem to be a good bet.  

Kinda depends on if the field contains a PSU wrestler.

As a #1 seed, PSU wins 82% of the time. As a #2 seed it is 47%. And as a #3 seed it is 38%. 

So don't take the field if PSU is #1.

Definitely take the field if they are #2 or #3.

Everything else is close to a push, so it depends on the odds you can get.
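To put numbers on "depends on the odds you can get", here is a back-of-the-envelope sketch. It assumes decimal odds with no vig, and the helper names are made up. The only inputs are the 52.3% and 82% figures quoted in this thread.

```python
def field_probability(p_top_seed):
    """Chance that anyone other than the #1 seed wins."""
    return 1.0 - p_top_seed


def breakeven_decimal_odds(p_win):
    """Decimal odds at which a bet with win probability p_win has zero expected profit."""
    return 1.0 / p_win


def expected_profit(p_win, decimal_odds, stake=1.0):
    """Expected profit on a stake: win (odds - 1) with p_win, lose the stake otherwise."""
    return stake * (p_win * decimal_odds - 1.0)


typical_field = field_probability(0.523)  # any #1 seed
psu_field = field_probability(0.82)       # PSU wrestler as the #1 seed

print(f"Typical field: p = {typical_field:.3f}, "
      f"break-even odds = {breakeven_decimal_odds(typical_field):.2f}")
print(f"Field vs PSU #1: p = {psu_field:.3f}, "
      f"break-even odds = {breakeven_decimal_odds(psu_field):.2f}")
```

Anything longer than the break-even price has positive expected value under these probabilities, and anything shorter does not.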

Drowning in data, but thirsting for knowledge

Posted
12 minutes ago, Wrestleknownothing said:

Kinda depends on if the field contains a PSU wrestler.

As a #1 seed, PSU wins 82% of the time. As a #2 seed it is 47%. And as a #3 seed it is 38%. 

So don't take the field if PSU is #1.

Definitely take the field if they are #2 or #3.

Everything else is close to a push, so it depends on the odds you can get.

Oh, now you're going all multi-variate on me...   🙂

Thanks for the additional insight.  

Posted

It is interesting that the probability distributions for the #1 and #8 seeds have the least uncertainty (most divergent from a uniform distribution), while the probability distributions for the #3 and #4 seeds have the most uncertainty (least divergent from a uniform distribution). So, on average, I'd say the seeding committee gets things pretty much right over the long haul: #1 seeds are most likely to place #1 while #8 seeds are most likely to not AA, and it's more or less anything goes with the other seeds, especially #3 and #4.
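One way to make "divergent from a uniform distribution" concrete is Shannon entropy. The sketch below is only an illustration: the two distributions are hypothetical shapes, not rows from the table, and the function names are made up.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; higher means more uncertainty."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def divergence_from_uniform_bits(probs):
    """KL divergence from the uniform distribution over the same outcomes."""
    return math.log2(len(probs)) - entropy_bits(probs)


# Hypothetical shapes only: one peaked row and one flat row.
peaked = [0.70, 0.15, 0.10, 0.05]
flat = [0.30, 0.25, 0.25, 0.20]

for name, dist in (("peaked", peaked), ("flat", flat)):
    print(f"{name}: entropy = {entropy_bits(dist):.3f} bits, "
          f"divergence from uniform = {divergence_from_uniform_bits(dist):.3f} bits")
```

A peaked row has low entropy and a large divergence from uniform. A flat row is the reverse.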

Posted
30 minutes ago, Pin Head said:

In 14 years with 10 #1 seeds each, which 2 or 3 #1s out of 140 did not AA?

I will give you some hints. Both guys sustained injuries and medical forfeited out after losing on the top side. One was recent, one was not.

Drowning in data, but thirsting for knowledge

Posted (edited)
16 minutes ago, Wrestleknownothing said:

I will give you some hints. Both guys sustained injuries and medical forfeited out after losing on the top side. One was recent, one was not.

Now you are trying to confuse me.  Alex Tirapelle was one but no injury, no mf.

Edited by ionel

.

Posted
4 minutes ago, ionel said:

Now you are trying to confuse me.  Alex Tirapelle was one but no injury, no mf.

Tirapelle was 2005. Remember, this data only goes back to 2010.

So now who are your guesses?

Drowning in data, but thirsting for knowledge

Posted
Just now, Wrestleknownothing said:

Tirapelle was 2005. Remember, this data only goes back to 2010.

So now who are your guesses?

What data?  We can't use long term memory on these contests?  😉

.

Posted
10 minutes ago, Interviewed_at_Weehawken said:

As always, thanks for this work.

What year did they start seeding every wrestler in the bracket? Was it 2010?

If not, it would be super interesting to see those numbers.

2019. It has only been four tournaments.

Drowning in data, but thirsting for knowledge

Posted (edited)
1 hour ago, Interviewed_at_Weehawken said:

OK, smaller sample size, but I would imagine the data might be quite a bit different.

This is what before and after looks like:

image.png.68692a09803a5a3a866c07a0117e9e28.png

Take the lower left green box in the left picture (raw data). It represents 1 wrestler at #25 who AA'd in 1 of the 4 years that seeds went to 33 (1 of 40 #25 seeds, or 2.5%). Does that mean that every four years we should expect to see that? Or is it more likely that we will see something similar every ten years or so (1%), or more (<1%)? If the latter, then the expectation would be that as time passes, we are more likely to see one #24 or one #26 achieve AA than another #25. The fitting process assumes the latter and accounts for this by effectively treating AA as a single category rather than eight separate categories.
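The percentages in that paragraph are simple counts over seed slots. A quick sketch, assuming 10 weight classes per tournament (the helper name is made up):

```python
WEIGHTS_PER_TOURNAMENT = 10  # one #25 seed per weight class


def observed_rate(count, tournaments):
    """Share of seed slots (one per weight per tournament) that produced the result."""
    return count / (tournaments * WEIGHTS_PER_TOURNAMENT)


raw_rate = observed_rate(1, 4)        # one #25 seed AA'd in four tournaments
ten_year_rate = observed_rate(1, 10)  # the same single AA spread over ten tournaments

print(f"Raw rate: {raw_rate:.1%}")
print(f"Once-in-ten-years rate: {ten_year_rate:.1%}")
```

With only 40 slots in the sample, a single result moves the raw rate by 2.5 points, which is why the raw cell looks so lumpy next to its neighbors.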

The other thing it achieves is to fill in more obvious discontinuities. For example, no #7 seed won between 2010 and 2023. Does that mean a #7 seed has no chance to win? Not likely given that winners have come from #8, #9, #11, and #13 seeds in that time.

And no #1 seed busted out in the blood round in my sample, but two busted out in the round prior to the blood round. So a 13-16 exit for a #1 seed is certainly possible, and on the right it is filled in.

Edited by Wrestleknownothing

Drowning in data, but thirsting for knowledge
