Estimated uplift

This is for:

Developer

In this article

Bayesian prior
Qubit’s approach
Changes in traffic allocation
Summary

Qubit’s data model is designed to handle a variety of complications that may occur in A/B testing, for example, changes in Conversion Rate, changes in variation, or changes in audience split.

Qubit uses statistical methods, such as a Bayesian prior, to make use of all the information from each iteration of the test. Below is an explanation of how complications can occur and how the Qubit model deals with them.

Bayesian prior

Situation

A/B testing depends on accruing a required sample size of visitors to establish ‘Statistical Power’, which determines whether enough data has been collected to draw conclusions. Until that point is reached, we do not show results against an experience’s goals. However, we recognize that clients often prefer to review results before significance has been obtained, just to see if the test is having an effect.

Complication

Early in the testing process, there is insufficient data to draw conclusions. The low volume of data at this stage tends to produce volatile results, and a test can go through large fluctuations.

In A/B testing, it is important to draw conclusions only once sufficient data has been collected, because a larger sample is more indicative of the variation’s likely long-term performance.

If conclusions are drawn too early, then false results can be obtained. At best, this can result in a waste of development time, and at worst, have a negative impact on conversions. In other words, we should hold the assumption that the effect of an A/B test will be minimal until we have data to prove otherwise.

Example:

If, at the start of your A/B test (50/50 split), you observe 20 conversions in your control and 40 in your variant, that amounts to a 100% Conversion Rate uplift. Because it is so early in the test and insufficient data has been collected, this is probably a statistical fluctuation, and even if there is a real uplift, it is probably much less than 100%.

Experiment arm	Iteration 1	Uplift
Control	20 / 100 = 20%
Variation	40 / 100 = 40%	100%

Qubit’s approach

Qubit’s testing platform applies something called a Bayesian prior to your test results to calculate a final uplift figure. A Bayesian prior sounds complicated, but in this case it is basically equivalent to saying, “It looks like this happened, but I know that’s not very likely, so I’d like to check a bit more.”

Qubit’s testing platform handles these mental gymnastics for you in a rigorous mathematical fashion. We take our extensive experience of conducting thousands of A/B tests and apply it as a Bayesian prior to tell you how likely any given uplift is. The uplift displayed to you is therefore a combination of the measured conversions and this prior belief.

The main effect of this is seen early in your tests, when there isn’t much data to go on and we tend to be skeptical of any large uplift. You may therefore see situations where there are many more conversions in your variant, but we report only a small effect. This prevents a statistical fluctuation from being misinterpreted, which could lead you to implement something on your website that has no effect or, even worse, a negative effect.

Of course, as your test accumulates more data, you can be more certain of the results, and the impact of these prior beliefs gradually disappears. Our model handles this for you too. If a test keeps reporting a 100% uplift, our model will tell you when you should start believing it.
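To make the idea of combining measured conversions with a prior belief concrete, here is a minimal sketch using a Beta-Binomial model. It is an illustration only and is not Qubit’s actual model: the prior conversion rate (`PRIOR_RATE`), the prior weight (`PRIOR_STRENGTH`), and all function names are assumptions chosen for this example, and the prior is placed on each arm’s conversion rate rather than on the uplift itself.

```python
# Illustrative sketch only: a simple Beta-Binomial shrinkage model.
# PRIOR_RATE and PRIOR_STRENGTH are assumed values, not Qubit's real prior.

PRIOR_RATE = 0.20      # assumed baseline conversion rate shared by both arms
PRIOR_STRENGTH = 1000  # assumed weight of the prior, in "pseudo-visitors"


def posterior_mean_rate(conversions, visitors,
                        prior_rate=PRIOR_RATE, prior_strength=PRIOR_STRENGTH):
    """Posterior mean conversion rate under a Beta prior."""
    alpha = prior_rate * prior_strength + conversions
    beta = (1 - prior_rate) * prior_strength + (visitors - conversions)
    return alpha / (alpha + beta)


def raw_uplift(control, variation):
    """Relative uplift computed directly from observed rates."""
    c_conv, c_vis = control
    v_conv, v_vis = variation
    return (v_conv / v_vis) / (c_conv / c_vis) - 1


def shrunk_uplift(control, variation):
    """Relative uplift computed from prior-adjusted (posterior mean) rates."""
    return posterior_mean_rate(*variation) / posterior_mean_rate(*control) - 1


# Early in the test: 20/100 vs 40/100, as in the example above.
early_raw = raw_uplift((20, 100), (40, 100))
early_shrunk = shrunk_uplift((20, 100), (40, 100))

# Much later, with the same observed rates but 1,000x the traffic.
late_shrunk = shrunk_uplift((20_000, 100_000), (40_000, 100_000))

print(f"early raw:    {early_raw:.1%}")
print(f"early shrunk: {early_shrunk:.1%}")
print(f"late shrunk:  {late_shrunk:.1%}")
```

With these assumed prior settings, the early estimate is pulled from 100% down to roughly 9%, while the late estimate is close to 99%: the prior dominates when data is scarce and fades as data accumulates.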

Changes in traffic allocation

Situation

One common use case, when running a personalization as an A/B test, is to prove that the personalization is working prior to making the content available for all visitors. Tests are therefore run as a 50/50 split test to determine success. When a positive result is received, we can increase the traffic allocated to the successful variation to 95%, keeping a 5% control group in case the results later deteriorate.

In our reporting interface, we have markers to reflect the changes made to the experiment. Changing the traffic allocation of an experiment is one example of what might result in us showing a marker. At Qubit, we call these different phases of an experiment iterations.

Complication

Iterations are very important and enable us to see differences over the course of the experiment. However, when traffic allocation changes between iterations, they can have a dramatic and unexpected effect on the result of a basic A/B test.

To take a simple example:

Experiment arm	Iteration 1	Iteration 2
Control	10 / 1,000 = 1%	10 / 100 = 10%
Variation	10 / 1,000 = 1%	190 / 1,900 = 10%

Both arms perform identically within each iteration, but performance changes from Iteration 1 to Iteration 2, perhaps as a result of seasonality. The only other difference is that in Iteration 2 the traffic allocation has changed from 50/50 to 95/5 in favor of the variation.

If we were to simply report this data in the reporting interface the results would be as follows:

Experiment arm	Results	Estimated uplift
Control	20 / 1,100 = 1.8%
Variation	200 / 2,900 = 6.9%	279%

That is, we would be reporting a 279% conversion uplift as a result of nothing more than a change in traffic allocation. This would be a gross error.
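The arithmetic behind this misleading figure can be reproduced with a few lines of Python. The sketch below simply pools the conversions and visitors from both iterations of the example; the variable and function names are illustrative.

```python
# Per-iteration (conversions, visitors) taken from the example table.
control = [(10, 1000), (10, 100)]
variation = [(10, 1000), (190, 1900)]


def pooled_rate(iterations):
    """Conversion rate when all iterations are lumped together."""
    conversions = sum(c for c, _ in iterations)
    visitors = sum(v for _, v in iterations)
    return conversions / visitors


control_rate = pooled_rate(control)      # 20 / 1,100
variation_rate = pooled_rate(variation)  # 200 / 2,900
naive_uplift = variation_rate / control_rate - 1

print(f"{control_rate:.1%} {variation_rate:.1%} {naive_uplift:.0%}")
# prints: 1.8% 6.9% 279%
```

The variation looks far better only because most of its visitors arrived in Iteration 2, when both arms converted at 10%.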

Qubit’s approach

At Qubit, we treat each iteration independently. Our stats engine looks at the performance of the test in each iteration to generate an output that fits the overall experiment, whilst avoiding bias caused by changes in the performance of the test over time.

We calculate an estimated uplift using the information for each iteration. The confidence score that our stats engine reports is the level of confidence that the test is having an impact on the success metric, with this complication already factored in.

We look to assess the uplift in each iteration, weighting its impact on the overall uplift by the volume of traffic, and always in the wider context of the overall experiment.

Experiment arm	Iteration 1 Conversion Rate	Iteration 2 Conversion Rate	Estimated Uplift
Control	1%	10%
Variation	1%	10%	0%
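One simple way to arrive at the 0% figure is to compute the uplift within each iteration and then average those uplifts, weighted by traffic. The sketch below does exactly that. It is an assumption about how such a calculation could look, not a description of Qubit’s stats engine, and it omits the Bayesian prior; the function name and the choice of weight (total visitors in the iteration) are illustrative.

```python
# Per-iteration (conversions, visitors) taken from the example table.
control = [(10, 1000), (10, 100)]
variation = [(10, 1000), (190, 1900)]


def stratified_uplift(control, variation):
    """Traffic-weighted average of the relative uplift in each iteration.

    Assumes every iteration has at least one control conversion,
    otherwise the per-iteration uplift is undefined.
    """
    weighted_sum = 0.0
    total_weight = 0
    for (c_conv, c_vis), (v_conv, v_vis) in zip(control, variation):
        # Compare the arms only within the same iteration.
        uplift = (v_conv / v_vis) / (c_conv / c_vis) - 1
        weight = c_vis + v_vis  # assumed weight: visitors in the iteration
        weighted_sum += weight * uplift
        total_weight += weight
    return weighted_sum / total_weight


estimated_uplift = stratified_uplift(control, variation)
print(f"{estimated_uplift:.0%}")
```

Because the arms are only ever compared within the same iteration, the shift in overall conversion rate between iterations and the change in traffic split no longer distort the result.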

Summary

The Qubit estimated uplift handles a variety of complications that can occur in A/B testing. It uses the information from all iterations of a test effectively to avoid bias and combines them with a Bayesian prior to report the uplift that you see in the Qubit platform.