 This is for:

Developer

Increasing the number of goals gives you more information about an experience and might help you build the numbers into a narrative about the success of a campaign or a hypothesis around visitor behavior. However, the increased complexity raises the probability that at least one part of this narrative is incorrect.

There are never any certainties in a statistical analysis. When the attribution model indicates that the probability of an uplift is larger than 95%, a goal is declared a winner. Even then, there's still up to a 5% probability that it isn't actually a winner.

This holds for each individual goal, and the risk compounds across the goals of an experience. If you have, say, four goals in an experience and two of them are winning, then the probability that at least one of those two is actually losing is roughly doubled. More precisely, instead of 5% it is:

``p = 1 - (1 - 5%)^2 = 9.75%``

With five winning goals, the probability that at least one of them isn’t a winner is:

``p = 1 - (1 - 5%)^5 = 22.6%``

Increasing the number of goals in an experience therefore increases the odds of committing what is known in statistics as a Type I error (a false positive: incorrectly identifying a goal as a winner) when interpreting an experience's results.
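The formula above can be wrapped in a small helper to check the numbers for any count of winning goals. This is a minimal sketch, not part of Coveo Experimentation Hub: the function name `prob_at_least_one_error` is hypothetical, and, like the formula, it assumes the goals are independent and share the same per-goal error rate.

```python
def prob_at_least_one_error(per_goal_rate: float, n_goals: int) -> float:
    """Chance that at least one of n_goals goals is misclassified.

    Assumes every goal has the same per-goal error rate and that goals are
    independent of each other (the same assumption behind the formula).
    """
    if not 0.0 <= per_goal_rate <= 1.0:
        raise ValueError("per_goal_rate must be between 0 and 1")
    if n_goals < 0:
        raise ValueError("n_goals must be non-negative")
    # Complement of "every goal is classified correctly"
    return 1.0 - (1.0 - per_goal_rate) ** n_goals


if __name__ == "__main__":
    # Up to 5% chance per winning goal of being a false positive (Type I error)
    for n in (1, 2, 5):
        print(f"{n} winning goal(s): {prob_at_least_one_error(0.05, n):.2%}")
```

The same helper covers the Type II case by passing the per-goal non-discovery probability instead of 5%.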

Similarly, once a goal has reached a sample size large enough that the model indicates the probability of success is too low to keep the test running, there's still a non-zero probability that an uplift exists but was not detected. Coveo Experimentation Hub forecasts this probability of success, sometimes called statistical power. For more information, see Using power forecasts to gauge experiment success.

For example, if you ran a test to a large sample size and then decided to end the experiment when it measures a 1% uplift with a 10% probability of success (a statistically insignificant, non-winning result), you'd miss out on a true 1% uplift 10% of the time, and you'd miss smaller uplifts even more often!

Again, if you have four goals in an experience, all of which have a large sample size, but two of which aren't winning due to statistical fluctuations (each with a 10% non-discovery probability), then the probability that you're committing a Type II error (a false negative: incorrectly failing to identify a winner) when interpreting the experience's results is roughly doubled:

``p = 1 - (1 - 10%)^2 = 19%``

With five such non-winning goals, the probability that at least one of them is actually a winning goal that needs more time to be discovered is:

``p = 1 - (1 - 10%)^5 = 41%``

Increasing the number of goals in an experience therefore also increases the odds of committing a Type II error (a false negative: incorrectly failing to identify a winner) when interpreting an experience's results.
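To see where the 19% and 41% figures come from without relying on the closed form, you can simulate many experiences in which each non-winning goal independently hides a true uplift with a 10% probability. This is an illustrative Monte Carlo sketch under that independence assumption; `simulate_missed_winner_rate` is a hypothetical name and does not reflect how Coveo Experimentation Hub computes its forecasts.

```python
import random


def simulate_missed_winner_rate(miss_rate: float, n_goals: int,
                                n_trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the probability that at least one of n_goals non-winning goals
    is actually an undetected winner.

    Assumes each goal independently hides a true uplift with probability
    miss_rate (its non-discovery probability).
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = 0
    for _ in range(n_trials):
        # One simulated experience: an independent draw for every goal
        if any(rng.random() < miss_rate for _ in range(n_goals)):
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    for n in (2, 5):
        estimate = simulate_missed_winner_rate(0.10, n)
        exact = 1 - (1 - 0.10) ** n
        print(f"{n} goals: simulated {estimate:.3f}, closed form {exact:.3f}")
```

With 100,000 simulated experiences, the estimates should land within a fraction of a percentage point of the closed-form values.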

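The probability of committing either a Type I or a Type II error in this situation (two goals winning, and two goals not winning but each with a 10% probability of success) is then roughly, though slightly less than, the sum of both error probabilities (9.75% + 19% = 28.75%). More precisely, it is:

``p = 1 - (1 - 5%)^2 × (1 - 10%)^2 ≈ 27%``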
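The gap between the simple sum and the exact value is the chance that both kinds of error happen in the same experience, which the sum counts twice (inclusion-exclusion). A minimal sketch, assuming independent goals; both function names are hypothetical:

```python
def prob_any_error(alpha: float, n_winning: int,
                   beta: float, n_non_winning: int) -> float:
    """Exact chance of at least one Type I or Type II error, assuming independent goals.

    alpha: per-goal false positive probability for each winning goal.
    beta: per-goal non-discovery probability for each non-winning goal.
    """
    # Probability that every goal is classified correctly
    all_correct = (1 - alpha) ** n_winning * (1 - beta) ** n_non_winning
    return 1 - all_correct


def sum_of_error_probs(alpha: float, n_winning: int,
                       beta: float, n_non_winning: int) -> float:
    """Additive approximation: adds the two group-level error probabilities."""
    type_1 = 1 - (1 - alpha) ** n_winning
    type_2 = 1 - (1 - beta) ** n_non_winning
    return type_1 + type_2


if __name__ == "__main__":
    exact = prob_any_error(0.05, 2, 0.10, 2)
    approx = sum_of_error_probs(0.05, 2, 0.10, 2)
    print(f"exact: {exact:.2%}, sum approximation: {approx:.2%}")
```

For two winning and two non-winning goals, the exact value is 1 - 0.9025 × 0.81 = 26.9%, while the sum gives 28.75%; the difference, 9.75% × 19% ≈ 1.85%, is the probability of both errors occurring together.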

Although the above is true for groups of goals, the per-goal rates of false discovery and false non-discovery don't increase with the number of goals or experiences. They stay fixed at 5% and at the value indicated by the power meter attached to the goal, respectively.
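The distinction between per-goal and per-experience rates can be checked with a short simulation: the fraction of individual goals that are false positives stays near 5% whether an experience has one goal or ten, while the fraction of experiences with at least one false positive grows. This sketch assumes independent goals, each with exactly a 5% false positive probability; `simulate_rates` is a hypothetical name.

```python
import random


def simulate_rates(alpha: float, n_goals: int,
                   n_trials: int = 50_000, seed: int = 1) -> tuple[float, float]:
    """Return (per-goal false positive rate, per-experience rate of at least
    one false positive).

    Assumes each goal is independently declared a winner by mistake with
    probability alpha.
    """
    rng = random.Random(seed)  # seeded so the results are reproducible
    false_positive_goals = 0
    experiences_with_error = 0
    for _ in range(n_trials):
        # Count how many goals in this simulated experience are false positives
        errors = sum(rng.random() < alpha for _ in range(n_goals))
        false_positive_goals += errors
        if errors > 0:
            experiences_with_error += 1
    per_goal = false_positive_goals / (n_trials * n_goals)
    per_experience = experiences_with_error / n_trials
    return per_goal, per_experience


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        per_goal, per_experience = simulate_rates(0.05, n)
        print(f"{n:>2} goals: per goal {per_goal:.3f}, per experience {per_experience:.3f}")
```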