A combined z-score of 2.8 standard deviations from chance expectation! You guys are veritable time-traveling rock stars!
[Plot: "Accumulative z-score 2776 trials". Intended z-score = 1.5, displacement z-score = 1.36, combined z-score = 2.8]
[Plot: "Accumulative z-score trials intended scores > .1, displacement scores > 1.5". Intended z-score = 1.7, displacement z-score = 2.4, combined z-score = 2.7]
The first plot, "Accumulative z-score 2776 trials", shows the accumulative z-scores for the intended, displacement and combined predictions. It includes all 2776 trials by every single time-machine user since the beginning of October 2023, when we started this experiment (no users' trials have been removed).
A z-score is the number of standard deviations by which a result differs from chance expectation. A z-score above about 1.7 is considered statistically significant: under chance alone, a result that large would be expected less than about 5% of the time (the conventional one-tailed threshold is 1.645), so chance becomes an unlikely explanation. A z-score below 1.7 could plausibly have happened by chance. Since one of our goals with the Time-machine project is to demonstrate non-chance transfer of information from the future to the past, we need to measure statistically significant performance.
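To make the 1.7 cutoff concrete, here is a small sketch that converts a z-score into a one-tailed p-value, the probability of a result at least that large under chance alone. It assumes the z-scores are approximately normally distributed under chance, which is reasonable for large numbers of trials; the function name `one_tailed_p` is illustrative only.

```python
import math

def one_tailed_p(z: float) -> float:
    """Probability that a standard normal variable is >= z (upper tail)."""
    # For the standard normal distribution, P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2))

# z-scores reported above for all 2776 trials
for label, z in [("intended", 1.5), ("displacement", 1.36), ("combined", 2.8)]:
    print(f"{label:>12}: z = {z:.2f}, one-tailed p = {one_tailed_p(z):.4f}")
```

A z-score of 1.645 maps to p of about 0.05, which is why thresholds near 1.7 are used for one-tailed significance.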
Intended z-score is the binary z-score for all trials, calculated as follows: a correct trial scores 1 and an incorrect trial scores -1, and the z-score is the sum of these binary scores divided by the square root of the number of trials. Example: sum(1,1,1,-1,-1,-1,1,1,-1,1,1)/sqrt(11) = 3/sqrt(11) ≈ 0.90. (This scoring assumes a 50% chance of a correct YES/NO prediction in each trial, so that under chance each score has a mean of 0 and a standard deviation of 1.) The total intended z-score to date is z=1.5 for all trials, which is not statistically significant (z > 1.7 is significant). However, as the second plot shows, when we filter out confidence scores < .2, the intended z-score reaches z=1.7, the threshold for statistical significance.
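A minimal sketch of the binary z-score described above. Each trial's result is encoded as +1 (correct) or -1 (incorrect); the names `binary_z` and `binary_z_from_hits` are illustrative. The second function writes the same statistic in terms of the hit count, which makes the assumed 50% chance hit rate explicit.

```python
import math

def binary_z(scores):
    """Binary z-score: sum of +1/-1 trial scores divided by sqrt(number of trials)."""
    if not scores:
        raise ValueError("need at least one trial")
    if any(s not in (1, -1) for s in scores):
        raise ValueError("each trial score must be +1 (correct) or -1 (incorrect)")
    return sum(scores) / math.sqrt(len(scores))

def binary_z_from_hits(hits, n):
    """Same statistic as (hits - n*p) / sqrt(n*p*(1-p)) with p = 0.5."""
    p = 0.5  # assumed chance probability of a correct YES/NO prediction
    return (hits - n * p) / math.sqrt(n * p * (1 - p))

example = [1, 1, 1, -1, -1, -1, 1, 1, -1, 1, 1]  # 7 hits, 4 misses
print(round(binary_z(example), 3))                    # prints 0.905 (3/sqrt(11))
print(round(binary_z_from_hits(7, len(example)), 3))  # prints 0.905
```

The two forms agree because (hits - n/2) / sqrt(n/4) = (hits - misses) / sqrt(n).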
Displacement z-score is the binary z-score computed from the other 9 trials in a 10-trial prediction, that is, every trial except the intended one. For example, the displacement score for trial 1 combines the scores from comparing the RV transcript for trial 1 with the photos for trials 2, 3, 4, 5, 6, 7, 8, 9 and 10 (all photos except those for trial #1). Sometimes when you remote view the photo for "trial #1", you unintentionally perceive elements of the photos in the other 9 trials. One benefit of using the A.I. to analyze scores is that we can compare the RV transcript to ALL of the trial photos, not just the intended photo. The total displacement z-score is a non-significant z=1.36, but if we filter out all confidence scores < 1.5, the displacement z-score increases to a highly significant z=2.4. Choosing filter values after seeing the data is arbitrary, however, and tends to produce z-scores that are over-fitted and possibly invalid. If instead we defined the score filters before the analysis, WITHOUT optimizing their values, the resulting z-score could be considered believable.
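The over-fitting concern can be illustrated with a small simulation on pure-chance data (not the project's data). Each simulated trial is a coin-flip hit or miss paired with an unrelated random confidence score; the 0 to 4 score range, the threshold grid and the trial counts are arbitrary assumptions. Picking the best of many filter thresholds after the fact should cross z > 1.645 more often than a single threshold fixed in advance (1.5 here, matching the displacement filter above), which should stay close to the nominal 5%.

```python
import math
import random

def binary_z(scores):
    """Sum of +1/-1 scores divided by sqrt(n); 0.0 when no trials survive the filter."""
    return sum(scores) / math.sqrt(len(scores)) if scores else 0.0

def filtered_z(trials, threshold):
    """z-score over trials whose confidence score is >= threshold."""
    return binary_z([s for conf, s in trials if conf >= threshold])

def simulate(n_sims=500, n_trials=400, prereg_threshold=1.5, seed=1):
    rng = random.Random(seed)
    thresholds = [i * 0.25 for i in range(13)]  # 0.0, 0.25, ..., 3.0 (includes 1.5)
    crit = 1.645  # one-tailed 5% critical value
    prereg_hits = posthoc_hits = 0
    for _ in range(n_sims):
        # Chance data: the confidence score carries no information about correctness.
        trials = [(rng.uniform(0, 4), rng.choice((1, -1))) for _ in range(n_trials)]
        # Pre-registered: one threshold chosen before looking at the data.
        if filtered_z(trials, prereg_threshold) > crit:
            prereg_hits += 1
        # Post hoc: keep whichever threshold gives the largest z-score.
        if max(filtered_z(trials, t) for t in thresholds) > crit:
            posthoc_hits += 1
    return prereg_hits / n_sims, posthoc_hits / n_sims

prereg_rate, posthoc_rate = simulate()
print(f"pre-registered threshold: {prereg_rate:.1%} of runs significant")
print(f"best of 13 thresholds:    {posthoc_rate:.1%} of runs significant")
```

Because the pre-registered threshold is one of the 13 candidates, the post hoc rate can never be lower; the gap between the two rates is the inflation that comes from optimizing the filter.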
Combined z-score uses BOTH intended and displacement scores to determine the prediction for a trial. Here is an example of how that works:
intended confidence score for "YES" for trial 1 = 2.3
intended confidence score for "NO" for trial 1 = -.5
intended prediction = "YES", total confidence score = 2.8 (2.3 - (-.5))
displacement confidence score for "YES" for trial 1 = 1.3
displacement confidence score for "NO" for trial 1 = 3.4
displacement prediction = "NO", total confidence score = 2.1 (3.4 - 1.3)
combined prediction = whichever of the intended and displacement predictions has the greater total confidence score. In this example that is the intended prediction (2.8 > 2.1), so the combined prediction is "YES" with a total confidence score of 2.8
The combined z-score is then calculated like the others: a correct trial scores 1, an incorrect trial scores -1, and we divide the sum of these scores by the square root of the number of trials.
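A sketch of the combination rule and scoring described above. It assumes, as the example's numbers suggest (2.3 - (-.5) = 2.8 and 3.4 - 1.3 = 2.1), that each total confidence score is the margin between the YES and NO scores. The tie rule (ties go to the intended prediction) and the names `Scores`, `combined_prediction` and `combined_z` are hypothetical, not the project's code.

```python
import math
from dataclasses import dataclass

@dataclass
class Scores:
    yes: float  # confidence score for "YES"
    no: float   # confidence score for "NO"

    def prediction(self):
        """Predict the higher-scoring outcome; the margin is the total confidence score."""
        if self.yes >= self.no:
            return "YES", self.yes - self.no
        return "NO", self.no - self.yes

def combined_prediction(intended, displacement):
    """Keep whichever prediction has the larger total confidence (assumed: ties go to intended)."""
    i_pred, i_conf = intended.prediction()
    d_pred, d_conf = displacement.prediction()
    return (i_pred, i_conf) if i_conf >= d_conf else (d_pred, d_conf)

def combined_z(trials):
    """trials: list of (intended Scores, displacement Scores, actual outcome)."""
    scores = [1 if combined_prediction(i, d)[0] == actual else -1
              for i, d, actual in trials]
    return sum(scores) / math.sqrt(len(scores))

# Trial 1 from the example above
pred, conf = combined_prediction(Scores(2.3, -0.5), Scores(1.3, 3.4))
print(pred, round(conf, 2))  # prints: YES 2.8
```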
The combined z-score for all 2776 trials is a highly statistically significant 2.8 standard deviations from chance expectation (a one-tailed p-value of roughly 0.003)!
I would not consider combining intended and displacement scores to be an optimized, over-fitted choice, because I had already been measuring displacement that was obviously not at chance level earlier in the experiment. Other methods of combining intended and displacement scores could be highly optimized and possibly over-fitted. For example, we are now looking at optimizing unique user "profiles", each consisting of a custom mix of displacement versus intended scores for a specific user. These profiles would likely be over-fitted at this point, but the approach could have some value once we collect enough data to avoid over-fitting.