Basara
Pavel, we have tested with larger sample sizes. It was years ago, and the results were the same.
The issue isn't balance over time - it's that there is "result clumping" greater than what should be expected for a near-50% chance, let alone the clumping one sees when the success or failure chances are much higher.
One has to look first at the long-term balance. In those older tests, that confirmed the expected mean for the smelt numbers was accurate.
However, looking at the same data in smaller samples of consecutive attempts, rather than totals, shows abnormally long stretches of both SUCCESSES and FAILURES - stretches that, if the RNG were working as (or approximating) a fair die, should not appear SO MANY TIMES in a population of that size. If X successes or X failures in a row are so unlikely that the probability of such a stretch is 1 in 10,000, then seeing 20 of them in a 1,000- or even 10,000-attempt population should throw up warning flags about the RNG and its seed. Twenty in 100k attempts would be acceptable, almost expected (well within the expected range at that size); but for test populations that are still large, just powers of 10 smaller, those kinds of results are extreme outliers.
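To put rough numbers on that, here is a quick Python sketch - not taken from the game or from the old tests; the 50% chance, the streak length, and the sample sizes are illustrative assumptions - that estimates how many long streaks a genuinely fair roll should produce per test population, for comparison against what the live data shows:

```python
# Rough sketch (not the actual UO code): estimate how many success/failure
# streaks of a given length a *fair* 50% roll should produce in N attempts,
# so observed streak counts can be compared against it.
import random

def count_streaks(results, min_len):
    """Count maximal runs of identical outcomes with length >= min_len."""
    count = 0
    run_len = 1
    for prev, cur in zip(results, results[1:]):
        if cur == prev:
            run_len += 1
        else:
            if run_len >= min_len:
                count += 1
            run_len = 1
    if run_len >= min_len:
        count += 1
    return count

N = 10_000     # attempts per test population (illustrative)
STREAK = 14    # (1/2)^14 is about 1 in 16,000 per 14-attempt window,
               # the same ballpark as the 1-in-10,000 figure above
TRIALS = 200   # number of simulated populations

totals = [count_streaks([random.random() < 0.5 for _ in range(N)], STREAK)
          for _ in range(TRIALS)]
print("average streaks of length >=", STREAK, "per", N, "attempts:",
      sum(totals) / TRIALS)
# If the live data shows 20 such streaks in 10k attempts while this prints
# a number near 1 or below, that is the kind of outlier being described.
```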
That's the point we have been repeatedly trying to make, and it keeps getting ignored by those misrepresenting the issue.
The RNG is fair over long periods of time, because, given enough time, its deficiencies end up spread evenly over the entire range of results - but it is somehow getting its seed from a value that is not conducive to random number generation over short sampling windows, since it does not produce significant variation short-term.
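To show what that failure mode could look like - and this is purely a hypothetical toy, not the actual UO RNG or its real seed source - here is a generator reseeded from a coarse "clock" value. Its long-run mean still lands near 50%, but every short window is a solid clump of identical results:

```python
# Hypothetical illustration of the suspected failure mode: a generator that is
# reseeded from a low-resolution "clock" hands out identical rolls inside a
# short window, even though the long-run mean still lands near 50%.
import random

def roll_with_coarse_reseed(attempt_number, ticks_per_reseed=25):
    """Reseed from a coarse counter, then draw one 50% roll."""
    coarse_tick = attempt_number // ticks_per_reseed   # stand-in for a slow clock
    rng = random.Random(coarse_tick)                   # same seed for 25 attempts in a row
    return rng.random() < 0.5

results = [roll_with_coarse_reseed(i) for i in range(100_000)]
print("overall success rate:", sum(results) / len(results))   # still close to 0.5
# But every 25-attempt window is all successes or all failures, because the
# same seed produces the same first draw each time - fair long-term, clumped
# short-term.
```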
It would be like trying to figure out the average level of ambient light for an area. If you take readings all day long, and do it under different weather conditions every day, you'll get a good result.
But if all your samples are taken an hour before sunset, or all at 1 PM, they'll show a lot less variation between samples than if you combined the two sets of data and compared all the results together.
And, for the person wondering about the Hubble comment? I was illustrating a point that went right over your head. Testing methods are only as good as the tool using them. If you assume the testing method is relevant (or correct) when it isn't, the data that results from the test is meaningless, and will probably cost you (like getting caught speeding in a non-US country by doing 75 MPH in a 75 km/h zone, because your kid played with the dash buttons and switched the digital speedometer display to English units - something similar happened to an internet friend of mine once).
When looking for problems with short-term streaks, you don't look at the totals for 100k tests and ignore the raw data. You look at the totals, see that the mean is coming out correctly, and THEN start investigating what could be causing abnormally common data clumps in the test data (since the clumping occurs with both positive and negative results, it balances out the mean). It's like looking at the mean number of Christmas cards sent out over the year, then expecting that number to be spread evenly across the months instead of clumped in November and December. If something is causing large clumps in the results, there is probably something behind it (in the card case, the date of Christmas).
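That two-step check - confirm the mean, then look at how the raw sequence clumps - can be done with a standard runs test. The sketch below uses the Wald-Wolfowitz runs test, which is my choice of tool rather than anything from the original tests; feed it the raw success/failure log in the order the attempts happened:

```python
# Two-step check described above: (1) report the overall mean, (2) run a
# Wald-Wolfowitz runs test to see whether the sequence has far fewer runs
# than a fair process would give - fewer runs means longer clumps of
# consecutive successes or failures.
import math

def runs_test(results):
    """results: list of booleans (success/failure) in the order they happened."""
    n = len(results)
    successes = sum(results)
    failures = n - successes
    print(f"mean success rate: {successes / n:.4f}")   # step 1: the totals

    # step 2: count runs (maximal stretches of identical outcomes)
    runs = 1 + sum(1 for a, b in zip(results, results[1:]) if a != b)

    expected = 2 * successes * failures / n + 1
    variance = (2 * successes * failures * (2 * successes * failures - n)
                / (n * n * (n - 1)))
    z = (runs - expected) / math.sqrt(variance)
    print(f"runs observed: {runs}, expected for a fair die: {expected:.1f}, z = {z:.2f}")
    # A strongly negative z (say, below -3) means far fewer, longer runs than
    # chance allows: exactly the "correct mean, abnormal clumping" pattern.

# Usage with whatever raw smelt log you have, e.g.:
# runs_test([True, True, False, True, ...])
```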