# trouble generating random numbers with a certain distribution

**807589**Aug 23, 2008 11:43 AM

I'd like to generate a random integer (int or long) from 1 to N where, over time, I get the number 'n' 1/n times as often as the number 1. In other words, I get a 2 half as many times as a 1, a 3 one-third as many times as a 1, etc...does anyone know of a fast and accurate way to do this?

So far, I have two ways to do this, one that's efficient and accurate and one that's fast but too inaccurate for my application. They are as follows:

1. ...the inefficient but accurate way that uses [rejection sampling|http://en.wikipedia.org/wiki/Rejection_sampling] with the probability distribution function N/x:

2. ...the fast but inaccurate way that I made after recognizing that the inverse of the cumulative probability distribution (forgive me if my math terms are wrong, which is likely) looks like the [harmonic series|http://en.wikipedia.org/wiki/Harmonic_series_(mathematics)], or 1/1 + 1/2 + 1/3 + ... for which there is an estimation using the [Euler-Mascheroni constant|http://en.wikipedia.org/wiki/Euler-Mascheroni_constant]:

The best thing I can come up with for now is to amend the second version with a lookup table for smaller values of N to improve the accuracy there. Fortunately, the estimation function

A lookup table with interpolation is possible, but I'd like to avoid it due to accuracy concerns.

Thanks for any advice...

So far, I have two ways to do this, one that's efficient and accurate and one that's fast but too inaccurate for my application. They are as follows:

1. ...the inefficient but accurate way that uses [rejection sampling|http://en.wikipedia.org/wiki/Rejection_sampling] with the probability distribution function N/x:

```
java.util.Random rand = new java.util.Random();
int N = 100; // max int to generate
boolean bFound = false;
double dRand;
while (!bFound)
{
int iRandX = (int) ((rand.nextDouble() * (double) N) + 1.0); // random X
double dRandY = (rand.nextDouble() * (double) N); // random Y
// only take it (the x value) if y falls under the curve N / x
if (dRandY <= ((double) N / (double) iRandX))
{
dRand = iRandX;
bFound = true;
}
}
```

However, it's dog slow as many samples get rejected.2. ...the fast but inaccurate way that I made after recognizing that the inverse of the cumulative probability distribution (forgive me if my math terms are wrong, which is likely) looks like the [harmonic series|http://en.wikipedia.org/wiki/Harmonic_series_(mathematics)], or 1/1 + 1/2 + 1/3 + ... for which there is an estimation using the [Euler-Mascheroni constant|http://en.wikipedia.org/wiki/Euler-Mascheroni_constant]:

```
java.util.Random rand = new java.util.Random();
double dEulerMascheroni = 0.57721566490153286060651209008240243104215933593992;
int N = 100;
double dMax = Math.log(N) + dEulerMascheroni;
double dNormalRand = rand.nextDouble() * (dMax - 1.0) + 1.0; // 1 to N
double dRand = Math.exp(dNormalRand - dEulerMascheroni); // integral
```

However, this is only an estimation, so after viewing a histogram, the frequencies of the smaller values in particular (1-10 or so) are wildly inaccurate, and for my application, accuracy in the lower values is critical. The best thing I can come up with for now is to amend the second version with a lookup table for smaller values of N to improve the accuracy there. Fortunately, the estimation function

`Math.log(N) + dEulerMascheroni`

approaches an error value of 0 as N gets higher, so a lookup table only for lower values should suffice (haven't tried this yet).*Am I missing a really simple and fast way to generate this easy distribution accurately?*A lookup table with interpolation is possible, but I'd like to avoid it due to accuracy concerns.

Thanks for any advice...

- 96 Views
- Tags: none (add)