Posted

Sorry, I've been out of town, but I've now taken a quick look at how Pablo did for the NCAA tournament. Looking at all the wrestlers in all the matches, excluding the medical forfeits and the DQ at 285 (culling the DQ isn't necessarily justified, but it doesn't affect the comparison), the wrestler ranked higher by Pablo won 74.4% of the time. No one who was rated more than 2000 higher than their opponent lost.

For context, let's compare it to seeds. By my math, the higher seed won 468 and lost 168, for an overall success rate of 73.6%. A little behind, but not by much. It's a difference of 5 matches.

I won't claim that it is statistically significant or anything, but for sure, there is nothing here to say that seeding is any better.  I'm calling that a success for Pablo.
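For anyone who wants to check the arithmetic, here is a quick sketch. The 468 and 168 come from the post. The code assumes Pablo's 74.4% applies to the same 636 matches, which the post implies but does not state, and the standard error treats matches as independent, which is a simplification.

```python
import math

# Figures from the post: the higher seed won 468 and lost 168.
seed_wins, seed_losses = 468, 168
n_matches = seed_wins + seed_losses

seed_rate = seed_wins / n_matches

# Assumption: Pablo's 74.4% is measured over the same set of matches.
pablo_rate = 0.744
pablo_wins = round(pablo_rate * n_matches)
extra_matches = pablo_wins - seed_wins

# Rough binomial standard error of one success rate at this sample size.
std_err = math.sqrt(seed_rate * (1 - seed_rate) / n_matches)

print(f"matches: {n_matches}, seed rate: {seed_rate:.1%}")
print(f"Pablo wins (approx): {pablo_wins}, extra matches: {extra_matches}")
print(f"standard error of one rate: {std_err:.1%}")
```

The gap between the two rates is smaller than one standard error, which fits the "not statistically significant" caveat. Because both systems are predicting the same matches, a proper test would look only at the matches where they disagree, and those counts are not given in the thread.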

One thing I do want to comment on:  Steveson vs Hendrickson.

I haven't gone back to the thread where we discussed probabilities, but recall my post where I talked about my interpretation of them. In particular, I talked about how I use 15% as the line for "it wouldn't be a surprise." Now, I don't remember for sure, but I think I actually talked about 285 in that context, and said something like "Certainly, we would expect Steveson to win, but it wouldn't be a surprise if Hendrickson pulled it off." (Recall Pablo gave Hendrickson a 15.8% chance of winning the title and, if you look it up, a 28% chance of beating Steveson.)

So for all the people insisting that this was the "biggest upset" in championship history and that this was a total shocker, the answer is no.  Pablo told you all that this was possible, and wouldn't actually be a surprise.

I know it's always a surprise to see a #1 seed lose, and yeah, Steveson is an Olympian, but when you look at what they've done this season, Hendrickson has been legit. Pablo was telling us ahead of time that they were closer than people were giving them credit for.

It was an exciting match, for sure, but the outcome didn't surprise me.

Posted
  On 3/31/2025 at 8:52 PM, Wrestleknownothing said:

What was the biggest upset of this tournament based on Pablo? Or maybe top 5?


The biggest upsets by Pablo rating difference (for reference, Hendrickson over Steveson was a difference of 880):

5 149:North Dakota St_Gavin Drexler over 149:Illinois_Kannon Webster, difference = 1526 (15.4% chance; 22 seed over 7)

4 165:Central Michigan_Chandler Amaker over 165:Bucknell_Noah Mulvaney, difference = 1595 (14.4% chance; 33 seed over 17)

3 174:Army_Dalton Harkins over 174:Nebraska_Lenny Pinto, difference = 1774 (11.8% chance; 25 seed over 8)

2 165:Hofstra_Kyle Mosher pinned 165:Arizona St_Nicco Ruiz, difference = 1825 (11.2% chance; 16 seed over 15; Pablo did NOT like Mosher)

1 197:Rider_Brock Zurawski over 197:Northern Iowa_Wyatt Voelker, difference = 1957 (9.6% chance; 26 seed over 7)
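
As a side check on how rating differences map to chances: the percentages quoted in this thread line up closely with a normal (probit) curve, P(upset) = Φ(-difference / σ), with σ around 1500. That σ is inferred here from the quoted numbers only. It is an assumption, not something confirmed in the thread, and the function name is made up for illustration.

```python
from statistics import NormalDist

SIGMA = 1500.0  # assumption: inferred from the quoted numbers, not stated in the thread


def upset_probability(rating_diff, sigma=SIGMA):
    """Chance that the lower-rated wrestler wins under a normal (probit) model."""
    return NormalDist().cdf(-rating_diff / sigma)


# (rating difference, probability quoted in the thread)
quoted = [
    (880, 0.28),    # Hendrickson over Steveson
    (1526, 0.154),  # Drexler over Webster
    (1595, 0.144),  # Amaker over Mulvaney
    (1774, 0.118),  # Harkins over Pinto
    (1825, 0.112),  # Mosher over Ruiz
    (1957, 0.096),  # Zurawski over Voelker
]

for diff, stated in quoted:
    print(f"{diff:5d}  model {upset_probability(diff):.3f}  quoted {stated:.3f}")
```

If the real model differs, the fit may just be a coincidence of these six data points, so treat the curve as a reading aid rather than a description of Pablo's internals.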
 

Posted

Assuming Pablo goes on to predict future seasons, will this year's data be included for larger sample sizes of individual wrestlers? Or is it purely a system based on same-season results?

Posted
  On 3/31/2025 at 9:23 PM, BruceyB said:

Assuming Pablo goes on to predict future seasons, will this year's data be included for larger sample sizes of individual wrestlers? Or is it purely a system based on same-season results?

You ask a good question.

What I've done in the past (in volleyball) is to use previous-year data at the beginning of the season until there is enough data in the current season to make reliable rankings. That usually takes about 4-6 weeks of competition in volleyball, although that's a more compact season (so it goes through about 12 matches before I move on).

With wrestling, however, I don't know how it is going to work. In order to do the weighting correctly in volleyball, I needed several seasons' worth of results to see how much last season informs this season. I've got one season's worth of data now. Even aside from that, the challenges in wrestling are multifold:

1) Since this is all individual, the turnover of individuals makes it harder. Right now I've got about 4000 wrestlers in the database. How many are graduating? But I'd have to keep them around in order for this to be useful. So next season comes and we are talking... 6000 maybe? That's huge. And then 24000 matches? Already on my office computer (the fast one) this takes more than 12 hours. On my laptop, it's more than 24 hours. I need to figure out how to port this onto the supercomputer (I don't know if it has a Microsoft license).

2) Even if they don't graduate, how many of them change weight classes? Another challenge.

But this question is separate from the bigger issue: can there be an easier way to get data? I just don't think importing from track like I'm doing now will be sustainable.

We'll see. Anyone here from InterMat or Flo or WrestleStat who wants to collaborate? I'll give you rights to publish if you can provide more convenient access to scores.

 

Posted

Do we know how seeds are formed for NCAA brackets, such that the top-ranked wrestlers outperform Pablo rankings? Is there a schedule-strength factor in addition to win/loss and season ranking inputs? What would it take for Pablo-like systems to outperform the current seeding process?

Posted
  On 4/3/2025 at 7:50 PM, jross said:

Do we know how seeds are formed for NCAA brackets, such that the top-ranked wrestlers outperform Pablo rankings? Is there a schedule-strength factor in addition to win/loss and season ranking inputs? What would it take for Pablo-like systems to outperform the current seeding process?

Um, Pablo did just outperform the current seeding process.  Not by a lot (5 matches), but it did.

 

Posted (edited)
  On 4/3/2025 at 8:58 PM, Pablo said:

Um, Pablo did just outperform the current seeding process.  Not by a lot (5 matches), but it did.

 


My reading comprehension was opposite/wrong.  Nice job.

What would it take to outperform by a larger margin?

Edited by jross
Posted
  On 4/4/2025 at 12:00 PM, jross said:

My reading comprehension was opposite/wrong.  Nice job.

What would it take to outperform by a larger margin?


I don't know how much farther it can go, but in order to improve upon what I've done, I could go in and do a lot more testing of actual predictive parameters. The challenge, however, is that every time I tweak something and rerun the calculation, it takes about 24 hours to try it again.

I've used match-pair data (when two wrestlers face each other twice) to get a pretty good idea of the model, but there are other modeling aspects that can be tweaked.

But I'm happy that it performs comparably to seeding.
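
For readers wondering what match-pair data looks like in practice, here is a minimal sketch. The tuple layout, the function name, and the toy results are all made up for illustration. It only finds wrestlers who met more than once and counts how often the second meeting had the same winner as the first, which is one simple thing such pairs can tell you. How Pablo actually uses the pairs is not described in the thread.

```python
from collections import defaultdict


def rematch_repeat_rate(matches):
    """matches: list of (winner, loser) tuples in chronological order.

    Returns (number of pairs that met at least twice,
             fraction of those where the second meeting had the same winner
             as the first), or (0, None) if no pair met twice.
    """
    winners_by_pair = defaultdict(list)
    for winner, loser in matches:
        # frozenset makes the key independent of who won.
        winners_by_pair[frozenset((winner, loser))].append(winner)

    repeats = [w[0] == w[1] for w in winners_by_pair.values() if len(w) >= 2]
    if not repeats:
        return 0, None
    return len(repeats), sum(repeats) / len(repeats)


# Hypothetical toy results, oldest first.
toy = [("A", "B"), ("C", "D"), ("A", "B"), ("D", "C"), ("A", "C")]
pairs, rate = rematch_repeat_rate(toy)
print(pairs, rate)
```

In a real data set you would probably bucket the pairs by rating difference, so the observed repeat rate can be compared with what the model predicts for that difference.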

Posted
  On 3/31/2025 at 9:46 PM, Pablo said:

You ask a good question.

What I've done in the past (in volleyball) is to use previous-year data at the beginning of the season until there is enough data in the current season to make reliable rankings. That usually takes about 4-6 weeks of competition in volleyball, although that's a more compact season (so it goes through about 12 matches before I move on).

With wrestling, however, I don't know how it is going to work. In order to do the weighting correctly in volleyball, I needed several seasons' worth of results to see how much last season informs this season. I've got one season's worth of data now. Even aside from that, the challenges in wrestling are multifold:

1) Since this is all individual, the turnover of individuals makes it harder. Right now I've got about 4000 wrestlers in the database. How many are graduating? But I'd have to keep them around in order for this to be useful. So next season comes and we are talking... 6000 maybe? That's huge. And then 24000 matches? Already on my office computer (the fast one) this takes more than 12 hours. On my laptop, it's more than 24 hours. I need to figure out how to port this onto the supercomputer (I don't know if it has a Microsoft license).

2) Even if they don't graduate, how many of them change weight classes? Another challenge.

But this question is separate from the bigger issue: can there be an easier way to get data? I just don't think importing from track like I'm doing now will be sustainable.

We'll see. Anyone here from InterMat or Flo or WrestleStat who wants to collaborate? I'll give you rights to publish if you can provide more convenient access to scores.

Hmmm. What’s the back end? Traditional third normal form? Structured? What’s your predictive model written in? Python? SQL? ML models? Can you use cloud? I’m defo not from any of those fwiw.

Posted
  On 4/6/2025 at 7:54 PM, Caveira said:

Hmmm. What’s the back end? Traditional third normal form? Structured? What’s your predictive model written in? Python? SQL? ML models? Can you use cloud? I’m defo not from any of those fwiw.

I have been using Excel with the built-in Solver. Structurally, it works well and is straightforward with the built-in functions. Yeah, I'm sure I could program the arrays in whatever app, but the multi-variable, non-linear regression is well beyond my skills.

Posted
  On 4/6/2025 at 9:59 PM, Pablo said:

I have been using Excel with the built-in Solver. Structurally, it works well and is straightforward with the built-in functions. Yeah, I'm sure I could program the arrays in whatever app, but the multi-variable, non-linear regression is well beyond my skills.

How much data are you analyzing? Know Python at all?

Posted
  On 4/6/2025 at 10:07 PM, Caveira said:

How much data are you analyzing? Know Python at all?

4000 variables (wrestlers) with 17000 data points (although they could probably be separated into groups of about 400 wrestlers and maybe 1700 outcomes each and run in serial).

I don't know Python.
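
Since Python came up: a problem of this shape (a few thousand ratings, tens of thousands of outcomes, separable into groups) can be fitted with a short script. Below is a minimal sketch, not Pablo's actual model. It assumes a probit win probability Φ((r_winner - r_loser) / σ), an assumed σ of 1500, a weak prior that keeps undefeated wrestlers' ratings finite, and that each weight class can be fitted on its own. All names and the toy results are hypothetical.

```python
from collections import defaultdict
from statistics import NormalDist

SIGMA = 1500.0     # assumed rating scale (not confirmed in the thread)
PRIOR_SD = 3000.0  # assumed weak prior; keeps undefeated wrestlers finite
_STD = NormalDist()


def _winner_gradient(z):
    """Derivative of log Phi(z); z is clamped to avoid underflow in the far tail."""
    z = max(z, -6.0)
    return _STD.pdf(z) / _STD.cdf(z)


def fit_group(matches, steps=2000):
    """Fit ratings for one group from (winner, loser) pairs by gradient ascent
    on the penalized probit log-likelihood."""
    counts = defaultdict(int)
    for winner, loser in matches:
        counts[winner] += 1
        counts[loser] += 1
    # Step size kept small enough for stable ascent given the busiest wrestler.
    lr = SIGMA ** 2 / (2 * max(counts.values()) + 1)
    rating = dict.fromkeys(counts, 0.0)
    for _ in range(steps):
        # Prior pulls every rating gently toward zero.
        grad = {name: -r / PRIOR_SD ** 2 for name, r in rating.items()}
        for winner, loser in matches:
            z = (rating[winner] - rating[loser]) / SIGMA
            g = _winner_gradient(z) / SIGMA
            grad[winner] += g
            grad[loser] -= g
        for name in rating:
            rating[name] += lr * grad[name]
    return rating


def fit_by_weight(results):
    """results: iterable of (weight_class, winner, loser).
    Each weight class is fitted independently, one after another."""
    groups = defaultdict(list)
    for weight, winner, loser in results:
        groups[weight].append((winner, loser))
    return {weight: fit_group(ms) for weight, ms in groups.items()}


def win_probability(r_a, r_b):
    """Model probability that the wrestler rated r_a beats the one rated r_b."""
    return _STD.cdf((r_a - r_b) / SIGMA)


# Hypothetical toy results.
toy = [
    ("125", "A", "B"), ("125", "A", "B"), ("125", "B", "C"),
    ("125", "A", "C"), ("125", "C", "B"),
    ("133", "X", "Y"),
]
ratings = fit_by_weight(toy)
for weight, table in ratings.items():
    for name, r in sorted(table.items(), key=lambda kv: -kv[1]):
        print(weight, name, round(r))
```

Plain gradient ascent is used here only because it needs nothing beyond the standard library. A numerical optimizer from a package such as SciPy would likely converge in far fewer iterations, and because the groups are independent they could also be fitted in separate processes.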

Posted
  On 4/6/2025 at 10:22 PM, Pablo said:

4000 variables (wrestlers) with 17000 data points (although they could probably be separated into groups of about 400 wrestlers and maybe 1700 outcomes each and run in serial).

I don't know Python.

That’s all gonna be single-threaded. Hmmm. I would try Python if I could, boss. It’s free. There are other things out there, but a lot of them have fees.

Posted

Yeah, it would be interesting to plug that into AI to have it made as performant as possible in Python or C code. Potentially 10-100x faster than VBA.
