Pablo - NCAA Championships

Pablo · March 31

Sorry, I've been out of town, but I've now taken a quick look at how Pablo did for the NCAA tournament. Looking at every match, excluding medical forfeits (I also culled the DQ at 285; not necessarily justified, but it doesn't affect the comparison), the wrestler ranked higher by Pablo won 74.4% of the time. No one who was rated more than 2000 higher than their opponent lost.

For context, let's compare it to seeds. By my math, the higher seed won 468 and lost 168, for an overall success rate of 73.6%. A little behind, but not by much. It's a difference of 5 matches.

I won't claim that it is statistically significant or anything, but for sure, there is nothing here to say that seeding is any better. I'm calling that a success for Pablo.
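As a quick check of the arithmetic above, here is a minimal Python sketch. It assumes both comparisons cover the same 636 matches (468 + 168), which is what "a difference of 5 matches" implies; the Pablo win count of 473 is inferred from that, not quoted directly. The standard-error line is a rough binomial approximation, only meant to show why a 5-match gap sits well within the noise.

```python
import math

# Seed record quoted in the post.
seed_wins, seed_losses = 468, 168
matches = seed_wins + seed_losses          # 636 matches

# Assumption: Pablo is scored on the same matches, with 5 more correct picks.
pablo_wins = seed_wins + 5

seed_rate = seed_wins / matches
pablo_rate = pablo_wins / matches

# Rough binomial standard error of one success rate at this sample size.
std_err = math.sqrt(seed_rate * (1 - seed_rate) / matches)

print(f"matches:    {matches}")
print(f"seed rate:  {seed_rate:.1%}")
print(f"Pablo rate: {pablo_rate:.1%}")
print(f"gap:        {pablo_rate - seed_rate:.1%} (std. error ~{std_err:.1%})")
```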

One thing I do want to comment on: Steveson vs Hendrickson.

I haven't gone back to the thread where we discussed probabilities, but recall my post about how I interpret them, in particular where I said I use 15% as the line for "it wouldn't be a surprise." I don't remember for sure, but I think I actually talked about 285 in that context and said something like "Certainly, we would expect Steveson to win, but it wouldn't be a surprise if Hendrickson pulled it off" (recall Pablo gave Hendrickson a 15.8% chance of winning the title and, if you look it up, a 28% chance of beating Steveson).

So for all the people insisting that this was the "biggest upset" in championship history and that this was a total shocker, the answer is no. Pablo told you all that this was possible, and wouldn't actually be a surprise.

I know it's always a surprise to see a #1 seed lose, and yeah, Steveson is an Olympian, but when you look at what they've done this season, Hendrickson has been legit. Pablo was telling us ahead of time that they were closer than people were crediting.

It was an exciting match, for sure, but the outcome didn't surprise me.

Wrestleknownothing · March 31

What was the biggest upset of this tournament based on Pablo? Or maybe top 5?

Pablo · March 31

The biggest upsets by rating difference (for reference, Hendrickson over Steveson was a difference of 880):

5 149:North Dakota St_Gavin Drexler over 149:Illinois_Kannon Webster difference = 1526 (15.4% chance; 22 seed over 7)

4 165:Central Michigan_Chandler Amaker over 165:Bucknell_Noah Mulvaney difference = 1595 (14.4% chance; 33 seed over 17)

3 174:Army_Dalton Harkins over 174:Nebraska_Lenny Pinto difference = 1774 (11.8% chance; 25 seed over

2 165:Hofstra_Kyle Mosher pinned 165:Arizona St_Nicco Ruiz difference = 1825 (11.2% chance; 16 seed over 15; Pablo did NOT like Mosher)

1 197:Rider_Brock Zurawski over 197:Northern Iowa_Wyatt Voelker difference = 1957 (9.6% chance; 26 seed over 7)
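
The chances quoted in this list, along with the 28% for Hendrickson at a difference of 880, all line up with a normal curve where the underdog's win probability is Φ(−difference / 1500). That scale of 1500 is inferred from the numbers in this thread, not something Pablo has stated, so treat the sketch below as a back-of-the-envelope check rather than the actual model. `SCALE` and `upset_probability` are made-up names.

```python
import math

SCALE = 1500.0  # assumed: inferred by matching the percentages quoted above

def upset_probability(difference: float, scale: float = SCALE) -> float:
    """Chance the lower-rated wrestler wins, under a normal-CDF assumption."""
    # Phi(-d / scale), written with the complementary error function.
    return 0.5 * math.erfc(difference / (scale * math.sqrt(2)))

# Rating differences and the chances quoted in the thread.
quoted = {880: 0.28, 1526: 0.154, 1595: 0.144, 1774: 0.118, 1825: 0.112, 1957: 0.096}

for diff, stated in quoted.items():
    print(f"difference {diff:>4}: model {upset_probability(diff):.1%}, quoted {stated:.1%}")
```

Under the same assumption, the 15% "wouldn't be a surprise" line falls at a difference of roughly 1550, and a difference of 2000 works out to about a 9% chance for the underdog.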

BruceyB · March 31

Assuming Pablo goes on to predict future seasons, will this year's data be included for larger sample sizes of individual wrestlers? Or is it purely a system based on same-season results?

Pablo · March 31

You ask a good question.

What I've done in the past (in volleyball) is to use previous year data at the beginning of the season until there is enough data in the current season to make reliable rankings. That usually takes about 4 - 6 weeks of competition in volleyball, although that's a more compact season (so it goes through about 12 matches before I move on).

With wrestling, however, I don't know how it is going to work. In order to do the weighting correctly in volleyball, I needed several seasons' worth of results in order to see how much last season informs this season. I've got one season's worth of data now. Even aside from that, the challenges in wrestling are multifold:

1) Since this is all individual, the turnover of individuals makes it harder. Right now I've got about 4000 wrestlers in the database. How many are graduating? But I'd have to keep them around in order for this to be useful. So next season comes and we are talking... 6000 maybe? That's huge. And then 24000 matches? Already on my office computer (the fast one) this takes more than 12 hours. On my laptop, it's more than 24 hours. I need to figure out how to port this onto the supercomputer (I don't know if it has a Microsoft license).

2) Even if they don't graduate, how many of them change weight classes? Another challenge.

But this question is outside of the bigger issue - can there be an easier way to get data? I just don't think importing from track like I'm doing now will be sustainable.

We'll see. Anyone here from InterMat or Flo or WrestleStat who wants to collaborate? I'll give you rights to publish if you can provide more convenient access to scores.

jross · April 3

Do we know how seeds are formed for NCAA brackets, such that the top-ranked wrestlers outperform Pablo rankings? Is there a schedule strength factor in addition to win/loss and season ranking inputs? What would it take for Pablo-like systems to outperform the current seeding process?

Pablo · April 3

Um, Pablo did just outperform the current seeding process. Not by a lot (5 matches), but it did.

jross · April 4

My reading comprehension was opposite/wrong. Nice job.

What would it take to outperform by a larger margin?

Pablo · April 6

I don't know how much farther it can go, but to improve on what I've done, I could do a lot more testing of the actual predictive parameters. The challenge is that every time I tweak something and rerun the calculation, it takes about 24 hours.

I've used match-pair data (when two wrestlers face each other twice) to get a pretty good idea of the model, but there are other modeling aspects that can be tweaked.

But I'm happy that it's able to perform comparably to seeding.

Caveira · April 6

Hmmm. What’s the back end? Traditional third normal form? Structured? What’s your predictive model written in? Python? SQL? ML models? Can you use cloud? I’m defo not from any of those sites (InterMat, Flo, WrestleStat) fwiw.

Pablo · April 6

I have been using Excel with the built-in Solver. Structurally, it works well and is straightforward with the built-in functions. Yeah, I'm sure I could program the arrays in whatever app, but the multi-variable, non-linear regression is well beyond my skills.
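
For a sense of what the Solver step could look like outside Excel, here is a toy sketch in plain Python. It is not Pablo's model: it fits a simple Bradley-Terry (logistic) rating from win/loss results by gradient ascent, with a small shrinkage term so ratings stay finite. The wrestler labels, the results, and every parameter below are invented for illustration.

```python
import math

def fit_ratings(matches, steps=2000, lr=0.1, shrink=0.01):
    """Fit one rating per competitor from (winner, loser) pairs.

    Model (a stand-in, not Pablo's): P(a beats b) = 1 / (1 + exp(-(r_a - r_b))).
    Maximizes the log-likelihood minus a small L2 penalty by gradient ascent.
    """
    ratings = {name: 0.0 for pair in matches for name in pair}
    for _ in range(steps):
        # Start each gradient with the shrinkage pull toward zero.
        grad = {name: -shrink * r for name, r in ratings.items()}
        for winner, loser in matches:
            p_win = 1.0 / (1.0 + math.exp(-(ratings[winner] - ratings[loser])))
            grad[winner] += 1.0 - p_win   # winner beat the model's expectation
            grad[loser] -= 1.0 - p_win
        for name in ratings:
            ratings[name] += lr * grad[name]
    return ratings

# Hypothetical results: (winner, loser)
results = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B"), ("B", "A")]

for name, r in sorted(fit_ratings(results).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {r:+.2f}")
```

A real version would need to use match scores rather than bare wins and losses if that is what the spreadsheet does, and a library optimizer would likely converge faster than this hand-rolled loop.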

Caveira · April 6

How much data are you analyzing? Know Python at all?

Pablo · April 6

4000 variables (wrestlers) with 17000 data points (although they could probably be separated into groups of about 400 wrestlers and maybe 1700 outcomes each and run in serial).

I don't know Python.
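
The split described above can be sketched in a few lines of Python. This assumes each match record carries a weight class and that each group can be solved on its own; the identifiers in the upset list (for example `149:North Dakota St_Gavin Drexler`) suggest the weight is part of each wrestler's key, though that is a guess. The record layout and names are hypothetical.

```python
from collections import defaultdict

def split_by_weight(matches):
    """Group (weight, winner, loser) records so each weight can be solved alone."""
    groups = defaultdict(list)
    for weight, winner, loser in matches:
        groups[weight].append((winner, loser))
    return dict(groups)

# Hypothetical records: (weight class, winner, loser)
matches = [
    (149, "Wrestler A", "Wrestler B"),
    (165, "Wrestler C", "Wrestler D"),
    (149, "Wrestler B", "Wrestler E"),
]

for weight, group in sorted(split_by_weight(matches).items()):
    wrestlers = {name for pair in group for name in pair}
    print(f"{weight}: {len(wrestlers)} wrestlers, {len(group)} matches")
```

Ten groups of about 400 variables are each a far smaller problem than one of 4000, and since the groups are independent under this assumption they could also run in parallel rather than in serial.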

Caveira · April 6

That’s all gonna be single-threaded. Hmmm. I would try Python if I could, boss. It’s free. There are other things out there but a lot of them have fees.

jross · April 6

Yeah, it would be interesting to plug that into AI to have it made as performant as possible in Python or C. That could be 10-100x faster than VBA.

GreatDane67 · April 6

Can we buy Vote for Pablo t-shirts?

bnwtwg · April 7

No one has used BASIC since 2012. I call shenanigans.

Pablo · April 7

My original VBA macros were written back in about 2002, so they get grandfathered in.
