### Intro

So, some day in April 2023 Prof. James Grime asked for help on Twitter with some problem he was tinkering with. This is his tweet:

**"OK, I have a problem someone could probably simulate faster than I can work it out. I have a set of dice with values: 0, 0, 0, 0, 2, 3. You roll the dice, add up the total, then roll that number of dice. And repeat. If I start with five dice, how long will the game last?"**

I quickly wrote a little program to perform the simulation, and let it run overnight not only with 5 dice as a starting point, but also 1, 2, 3 and 5. These are the results:

Initial Rolls | Average Length | Simulated games | Longest game |

1 | 2.22307... | 2.7 x 10^{12} | 150 |

2 | 3.23895... | 1.5 x 10^{12} | 151 |

3 | 4.09949... | 1.1 x 10^{12} | 154 |

4 | 4.84160... | 0.8 x 10^{12} | 151 |

5 | 5.49181... | 1.2·10^{12} | 143 |

Here the decimals are all correct, in that I truncated the results a couple of digits sooner than the most stable digits of the simulation. So, the answer he was looking for is

**5.491816...**

This is the histogram of the lengths of the 1.2 x 10

^{12}games, for initial number of dice # = 5:

Which in logarithmic Y scale looks like this:

Which show a nice linear decay. The histograms for #=1,2,3 and 4 follow linear decays as well.

Regarding the code, I run the simulation with a direct and straightforward encoding of the problem statement. It's written in C, and runs at about 8 million games per second as is:

#include <stdint.h>
#include <stdio.h>
#include <random>
int main( void )
{
int dice[6] = { 0, 0, 0, 0, 2, 3 };
std::mt19937 gen;
uint32_t longestGame = 0;
uint64_t runningTotal = 0;
for( uint64_t numGames=1; ; numGames++ )
{
uint32_t numThrows = 0;
uint32_t numDiceToRoll = 5;
for( numThrows=1; ; numThrows++ )
{
uint32_t total = 0;
for( uint32_t i=0; i<numDiceToRoll; i++ )
total += dice[ gen() % 6 ];
numDiceToRoll = total;
if( numDiceToRoll==0 ) break;
}
runningTotal += uint64_t(numThrows);
longestGame = std::max( longestGame, numThrows );
if( (numGames & 0x00000000001fffff )==0 )
{
printf( "%I64u %d %.16g\n", numGames, longestGame,
double(runningTotal)/double(numGames) );
}
}
return 0;
}

This is rather unoptimized in terms of low level programing like SIMD instructions or multithreading, not to speak of a GPU implementation. But a simple optimization can be done by creating a look up table (LUT) that encodes the results of 1, 2, 3, 4 or 5 rolls of the dice, so that we can have a lot less random number generation invocations, memory lookups and additions the innest of the loops. Something like this:

#include <stdint.h>
#include <stdio.h>
#include <random>
int main( void )
{
uint8_t dice1[6] = { 0, 0, 0, 0, 2, 3 };
uint8_t dice5[6*6*6*6*6]; // 7776 bytes
uint32_t len[] = { 1, 6, 6*6, 6*6*6, 6*6*6*6, 6*6*6*6*6 };
// precompute results for 5 rolls
for( int m=0; m<6; m++ )
for( int l=0; l<6; l++ )
for( int k=0; k<6; k++ )
for( int j=0; j<6; j++ )
for( int i=0; i<6; i++ )
dice5[6*6*6*6*m+6*6*6*l+6*6*k+6*j+i] = dice1[i] + dice1[j] + dice1[k] + dice1[l] + dice1[m];
std::mt19937 gen;
uint32_t longestGame = 0;
uint64_t runningTotal = 0;
for( uint64_t numGames=1; ; numGames++ )
{
uint32_t numThrows = 0;
uint32_t numDiceToRoll = 5;
for( numThrows=1; ; numThrows++ )
{
uint32_t total = 0;
while( numDiceToRoll>0 )
{
uint32_t num = std::min(numDiceToRoll,5u);
total += dice5[ gen() % len[ num ] ];
numDiceToRoll -= num;
}
numDiceToRoll = total;
if( numDiceToRoll==0 ) break;
}
runningTotal += uint64_t(numThrows);
longestGame = std::max( longestGame, numThrows );
if( (numGames & 0x00000000001fffff )==0 )
{
printf( "%I64u %d %.16g\n", numGames, longestGame,
double(runningTotal)/double(numGames) );
}
}
return 0;
}