Could you beat 9,999 random people at chess?

Using the Monte Carlo approach to tackle a fun little hypothetical challenge.

Jul 23, 2025

Last year, one of my roommates asked me: “If you were forced to beat 99 random strangers in a game of skill, where even one loss means instant death, what game would you choose to give you the best chance of survival?”

My answer was chess. I’m highly rated in Rapid (2087) and Classical (1827 USCF), and that’s amongst chess players. Against random strangers, I’d likely be in the 99.9th percentile. Additionally, I believe that the skill-based nature of chess makes it unlikely that a worse opponent will just ‘get lucky’ and beat me.

The other day I came across a similar discussion in r/theydidthemath. OP claimed they could beat not just 99, but 9,999 random people at chess. His friend thought he was full of shit. And the commenters were mixed:

I was torn between these two lines of reasoning. Were his odds of success reasonable, or were they astronomical? And I then realized…

I could simulate this whole scenario using the Elo system’s “Expected Score” metric to find out the answer.

You see, chess’s Elo rating system doubles as a predictor of game outcomes. Given two players’ respective ratings, you can predict the chance that player A will score a point against player B via the following formula:

\(E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}\)

The formula for the expected score of player A (E_A) given his rating (R_A) and his opponent’s rating (R_B)

This is known as a logistic function. The higher A is rated relative to B, the greater his chance of scoring a point. However, his chances are never 100%, and a small chance of losing each match against an inferior opponent can compound over many matches.
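As a minimal sketch, the expected score formula fits in a few lines of Python. The name expected_score is just an illustrative placeholder, and the constants 10 and 400 are the standard Elo convention.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Equal ratings give an even 0.5; a 400-point edge gives roughly 0.909.
print(expected_score(1500, 1500))
print(round(expected_score(1900, 1500), 3))
```

Note that the result approaches 1 as the rating gap grows but never quite reaches it, which is the whole crux of this challenge.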

How much would his chance of a loss compound?

I found a cute little Elo win probability calculator which pegged the win % of a 1650-rated player vs a 400-rated player (chess.com’s recommended initial chess rating for complete beginners is 400) to be roughly 99.9%. Pretty good, right?

It’s important that we use a logistic rather than a Gaussian distribution, as the two give very different win percentages as the players’ ratings diverge. chess.com uses a logistic distribution for its own rating system (technically it’s a Glicko system, but I don’t have room for more footnotes of footnotes; if you’re interested, Google “Glicko 2”).

But what are his chances of beating 9,999 such “noobs” in a row?

\(0.999250668^{9999}=0.0005556\)

Roughly 1/1800.
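That arithmetic is easy to reproduce in Python. This quick check assumes every opponent is rated exactly 400 and that the games are independent; expected_score is an illustrative helper name, not part of the calculator mentioned above.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


p_single = expected_score(1650, 400)  # about 0.99925 per game
p_streak = p_single ** 9999           # chance of 9,999 wins in a row

print(f"{p_single:.9f}")
print(f"{p_streak:.7f}")
print(f"about 1 in {1 / p_streak:.0f}")
```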

Already OP’s chances aren’t looking good. But I wasn’t satisfied. I wanted to do more than just a simple calculation: I wanted to use the Monte Carlo approach to find out exactly what his chances (and mine) of beating this gauntlet really are.

But what is the Monte Carlo Approach?

Sometimes the easiest way to estimate complex probabilities is to just simulate the event many, many times and see how it pans out. Computer scientists adopted the term Monte Carlo to describe this tactic, after the famous casino in Monaco. So long as you can define a problem sufficiently, and you have a powerful enough computer to simulate it millions of times, you can gain real insights into it with this approach.
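As a toy illustration of the idea (not the gauntlet itself), the sketch below estimates the chance of winning 100 games in a row at a fixed 99% win rate by brute-force simulation, then compares it with the exact answer, 0.99^100. The numbers, the seed, and the function name are arbitrary choices made for illustration.

```python
import random


def estimate_streak_probability(p_win, n_games, n_trials, seed=0):
    """Estimate P(win n_games in a row) by simulating n_trials independent attempts."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_trials):
        # An attempt succeeds only if every single game is won.
        if all(rng.random() < p_win for _ in range(n_games)):
            successes += 1
    return successes / n_trials


estimate = estimate_streak_probability(p_win=0.99, n_games=100, n_trials=20_000)
exact = 0.99 ** 100
print(f"simulated: {estimate:.3f}, exact: {exact:.3f}")
```

Here the exact answer is known, so the simulation is only a sanity check. The approach earns its keep when the opponents differ from game to game and no tidy closed form exists.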

Here was the plan:

1. Create a “population” of Elo ratings which corresponded to the real-world Elo distribution.
2. Sample 9,999 opponents from that population at random.
3. Use the expected score formula to determine the % chance of the player winning a match against each opponent. By multiplying all of these probabilities together, I’d get our intrepid hero’s expected chance of winning against all 9,999 opponents in a row.

Part 1: Simulating the Global Population

I decided to use this chart detailing the chess.com Rapid ratings distribution in 2023 to create my simulated chess population.

r/chess - Chess.com Rapid Elo Distribution (September 2023)

The plan was simple. Generate one hypothetical “player” for every thousand players present in each ratings bracket, with an Elo chosen randomly from within said ratings bracket. These players would collectively represent our global chess-playing population, which according to this PDF is around 605 million people.

Notably, there are fewer than 1,000 players with Elo ratings in the 2500+ ratings brackets. To represent these players, I created a single “super-GM” with an Elo between 2600 and 2700, as creating less than one player was impossible.
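A rough sketch of this step in Python is shown below. The bracket counts are made-up placeholders, not the real chess.com figures from the chart, and build_chess_population is a hypothetical name. Drawing ratings uniformly within a bracket is also an assumption. Only the one-player-per-thousand rule and the single super-GM rated between 2600 and 2700 come from the description above.

```python
import random


def build_chess_population(brackets, players_per_sim=1000, seed=0):
    """Create one simulated rating per `players_per_sim` real players in each bracket."""
    rng = random.Random(seed)
    population = []
    for low, high, real_count in brackets:
        # Brackets with fewer than 1,000 real players yield zero simulated players.
        n_simulated = real_count // players_per_sim
        # ASSUMPTION: ratings are spread uniformly within each bracket.
        population.extend(rng.uniform(low, high) for _ in range(n_simulated))
    # Stand-in for the sparse 2500+ brackets: a single "super-GM".
    population.append(rng.uniform(2600, 2700))
    return population


# PLACEHOLDER counts for illustration only: (low, high, number of real players).
example_brackets = [(400, 500, 2_500_000), (500, 600, 3_000_000), (2400, 2500, 4_000)]
chess_pop = build_chess_population(example_brackets)
print(len(chess_pop))
```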

Next, we needed to include the rest of humanity.

I used this fancy website called Worldometer to estimate Earth’s current population:

Our total simulated population should be of size P, where the ratio of our simulated chess_pop (31,481) to the real global chess-playing population (605,000,000) should be the same as P/Earth’s Population. So…

So we’ll create 397,019 noobs to add to our chess-playing population:

I chose to vary the ratings of the noobs just because I felt that the real noob population would have some variance in skill around a mean of about 350.
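Here is one way this step might look in Python. The 31,481 simulated chess players, the 605,000,000 real players, the 397,019 noobs, and the mean of about 350 come from the text above. The normal distribution, the standard deviation of 75, and the rating floor of 100 are illustrative assumptions, as are the function and variable names.

```python
import random

SIM_CHESS_PLAYERS = 31_481        # simulated chess players (from the text)
REAL_CHESS_PLAYERS = 605_000_000  # estimated real chess-playing population (from the text)
N_NOOBS = 397_019                 # simulated non-players (from the text)

# The proportion SIM_CHESS_PLAYERS / REAL_CHESS_PLAYERS = P / Earth's population
# lets us back out the world population implied by these numbers.
total_simulated = SIM_CHESS_PLAYERS + N_NOOBS
implied_world_population = total_simulated * REAL_CHESS_PLAYERS / SIM_CHESS_PLAYERS
print(total_simulated, round(implied_world_population / 1e9, 2))


def make_noobs(n, mean=350.0, sd=75.0, floor=100.0, seed=0):
    """Draw n beginner ratings around `mean`; sd and floor are ASSUMED values."""
    rng = random.Random(seed)
    return [max(floor, rng.gauss(mean, sd)) for _ in range(n)]


noobs = make_noobs(N_NOOBS)
print(round(sum(noobs) / len(noobs)))
```

The back-calculation doubles as a sanity check: the simulated total of 428,500 only makes sense if the world population estimate was a little over 8 billion.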

Now we just have to create the function for determining win % from relative Elo rating:

and we have all of the pieces in place to run this simulation.

Simulating the Gauntlet

I first created a few functions to simulate playing a series of randomly generated opponents:

howManyPeopleCanIBeat(myElo, opponentPop) runs the gauntlet for player A of rating myElo against opponents sampled randomly from population opponentPop. Each game, we calculate A’s win probability p, then sample a random number n between 0 and 1. If n > p, A lost, and the gauntlet ends. Otherwise, A racks up 1 win and plays again. We keep going until A loses, and return A’s win count.
runVsNPeople(myElo,opponentPop,n) calculates the expected chance of player A with rating myElo to win against n players sampled from opponentPop in a row. We do this by multiplying the win probability of each match together to create a cumulative win probability.
plotScoreDistribution and winPercVsNPeople are aggregator functions for howManyPeopleCanIBeat and runVsNPeople respectively. plotScoreDistribution additionally returns a histogram of runs by win-streak size.
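The first two functions might be sketched in Python roughly as follows. The snake_case names mirror howManyPeopleCanIBeat and runVsNPeople, but the bodies are reconstructed from the descriptions above rather than copied from the original code. Sampling opponents with replacement is an assumption, and the tiny example population is invented for illustration.

```python
import random


def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def how_many_people_can_i_beat(my_elo, opponent_pop, rng):
    """Play randomly drawn opponents until the first loss; return the win count."""
    wins = 0
    while True:
        p = expected_score(my_elo, rng.choice(opponent_pop))
        if rng.random() > p:  # the random draw exceeds p, so this game is a loss
            return wins
        wins += 1


def run_vs_n_people(my_elo, opponent_pop, n, rng):
    """Probability of beating n randomly drawn opponents in a row."""
    cumulative = 1.0
    for _ in range(n):
        cumulative *= expected_score(my_elo, rng.choice(opponent_pop))
    return cumulative


rng = random.Random(0)
toy_pop = [350.0] * 90 + [1200.0] * 9 + [2000.0]  # INVENTED population, illustration only
print(how_many_people_can_i_beat(1650, toy_pop, rng))
print(run_vs_n_people(1650, toy_pop, 100, rng))
```

Both functions treat the expected score as a pure win probability, so draws are ignored, just as in the description above.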

Can OP Beat 9,999 Random People?

Without further ado, we can use winPercVsNPeople to calculate OP’s probability of beating 9,999 randomly selected people to be…

Oof. For those of you who haven’t used a TI-85, that’s a probability of about 0.00000000000436%. So it looks like the posts indicating OP’s chances were “near-zero” had the right of it.

Here’s a histogram of 10,000 simulated runs for OP using plotScoreDistribution.

OP performed admirably, making it through 100 opponents in over half of his runs. But he only made it past 1,000 in a few. In his best run out of 10,000, he made it past slightly more than 2,500 opponents.

Ok, I have to know. How would I do?

I’m rated 2087 in Rapid, which should give me better odds of beating this gauntlet. But just how much better?

~0.66%! I’m in the percentage ball-park, wild!

Here’s the breakdown of 10,000 simulated runs:

The variance in the length of these runs is pretty wild to see.

But we’re not done yet. I want to see a player beat this challenge. Magnus Carlsen is currently ranked #1 on chess.com’s Rapid Chess leaderboard, at a rating of 2819. What if he faced off against 9,999 random people?

Alright, we finally have a player who’s favored against 9,999 random opponents. However, Magnus’s score still isn’t perfect. According to the Elo system, he has a ~11.5% chance of losing to at least one player before beating the gauntlet.

When the model conflicts with intuition, do we trust the model?

This is a really interesting result for me, because it fails my personal sanity check.

According to the Elo system, the minuscule yet ever-present chance of an upset compounds match after match, so that after 9,999 matches, even the world #1 stands a real chance of losing to at least one opponent. But intuitively, I believe that Magnus would have no trouble crushing 9,999 random people in a row in chess. My personal experience, both playing and watching high-level play, insists that the Elo model must be over-representing the chance of an upset. Which viewpoint is correct, the model or my intuition? I really don’t know.
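To put a number on how small that per-game upset chance has to be, here is a back-of-the-envelope Python calculation. It takes the ~11.5% figure above and treats all 9,999 games as independent with one identical win probability, which is a simplification since the real opponents vary in rating. The variable names are illustrative.

```python
import math

p_clear_gauntlet = 1 - 0.115  # Magnus's chance of winning all 9,999 games (from the text)
n_games = 9999

# If every game had the same win probability p, then p ** n_games = p_clear_gauntlet.
p_per_game = p_clear_gauntlet ** (1 / n_games)
upset_chance = 1 - p_per_game
print(f"per-game upset chance: {upset_chance:.2e} (about 1 in {1 / upset_chance:,.0f})")

# Elo gap that would produce this expected score, from inverting the Elo formula.
elo_gap = 400 * math.log10(p_per_game / upset_chance)
print(f"equivalent rating gap: {elo_gap:.0f} points")
```

In other words, an average upset chance on the order of one in tens of thousands per game is enough for the cumulative risk to reach double digits. Whether the Elo curve can be trusted that far out in its tail is exactly the open question.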

A histogram of 1,000 Magnus runs. When I tried to plot the 10,000-run histogram, my PC crashed. Is this distribution accurate? Do you think this is what it would look like if Magnus Carlsen ran our gauntlet 1,000 times? Can we even claim to know what that would look like?

That’s the trouble with statistics. It’s an attempt to make sense of an imperfect world, with imperfect information. We can never truly say that a model is correct or accurate to reality, only that it is useful, or accurate to an extent. But it’s precisely this embrace of imperfection that draws me to statistics. Much like the Monte Carlo approach in computer science, statistics can be used to attack problems which defy analytical solution, or logical explanation. That makes it both incredibly practical and incredibly beautiful. And invoking it to solve a problem can be incredibly fun.

If you have thoughts about how Magnus would fare against 9,999 random opponents (or how you would do), or if you have suggestions about things I should change in my methodology, or other simulations you’d like to see, let me know in the comments!

Well, that’s all for now, folks. Stay curious!

Evan’s Substack
