Coveo Experimentation Hub glossary
This is for: Developer
In this article, we’ll introduce some terms and concepts that you will encounter when working with experiences and experience test results.
A

A/B testing  On a simple level, A/B testing is a method of experimenting with two versions of a website, a control and a variation. By observing and analyzing the behavior of visitors that are randomly bucketed into either the control or a variation, we avoid most effects that might bias the data. We can then answer questions about which version performed better against defined goals such as Conversion Rate or clickthrough rate on a banner

Average Order Value (AOV)  The average revenue per order. Not the same as RPC, since RPC is cumulative for a visitor, whereas AOV is measured across single orders. AOV is important because it provides valuable insight into how much your customers are spending on your products
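
The difference between AOV and RPC is easiest to see with numbers. The sketch below assumes AOV is total revenue divided by the number of orders; the `orders` data and both function names are made up for illustration and are not part of any Qubit API.

```python
from collections import defaultdict

# Hypothetical orders as (visitor_id, order_value) pairs; the data is made up.
orders = [("v1", 40.0), ("v1", 60.0), ("v2", 50.0)]


def average_order_value(orders):
    """Revenue divided by the number of orders."""
    return sum(value for _, value in orders) / len(orders)


def revenue_per_converter(orders):
    """Revenue divided by the number of distinct visitors who ordered."""
    revenue_by_visitor = defaultdict(float)
    for visitor_id, value in orders:
        revenue_by_visitor[visitor_id] += value
    return sum(revenue_by_visitor.values()) / len(revenue_by_visitor)


aov = average_order_value(orders)    # 150 revenue over 3 orders
rpc = revenue_per_converter(orders)  # 150 revenue over 2 converters
```

Visitor `v1` ordered twice, so the two orders count separately for AOV but are summed for RPC.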
B

Bad allocation / traffic split  Our stats model continuously monitors how many visitors are going into each variation. If the split goes outside a statistically realistic range, the test shows an error message. This almost always signifies an issue with the data
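
As a rough illustration of what "a statistically realistic range" can mean, the sketch below flags a two-way split that sits too many standard deviations away from the expected allocation. It uses a normal approximation to the binomial distribution; the function name and the `z_threshold` of 4.0 are assumptions for this example, and the check Qubit actually runs is not described here.

```python
import math


def split_looks_bad(control_visitors, variation_visitors,
                    expected_control_share=0.5, z_threshold=4.0):
    """Flag a split that is implausibly far from the expected allocation.

    Normal approximation to the binomial; z_threshold is an assumed value.
    """
    total = control_visitors + variation_visitors
    if total == 0:
        return False  # nothing to judge yet
    expected = total * expected_control_share
    std_dev = math.sqrt(
        total * expected_control_share * (1 - expected_control_share)
    )
    z = abs(control_visitors - expected) / std_dev
    return z > z_threshold
```

With a 50/50 allocation, 5,000 versus 5,050 visitors is within normal random variation, whereas 5,000 versus 6,000 is flagged.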
C

Chance of an uplift  The probability that a variation has a positive uplift over the control for a given metric

Confidence  A measure of the certainty associated with an uplift estimate. It is the chance that the confidence interval (margin of error around the estimate) will contain the true value that you are trying to estimate. A higher confidence level requires a larger sample size

Control  One of an experiment’s variations, where no treatment is applied, e.g. we don’t show the banner we are testing. By exposing visitors to a control, we have an effective means of testing your experiment. Visitors bucketed into your experience’s control will see your website, mobile platform, or mobile app without any changes. The control is used as a basis of comparison

Conversions  For transaction-based properties, the number of purchases made in an iteration

Conversion Rate (CR)  The number of conversions divided by the total number of visitors. When referring specifically to the metric reported for an experience, CR refers to conversions amongst visitors from the moment they enter the experience until the moment they leave or the experience ends. When referring to segment metrics, CR refers to conversions amongst members of a segment that visited a site on a given day. CR is important because it tells you about how customers are engaging with your brand and interacting with your website or mobile app
For a more in-depth discussion of how Qubit calculates Conversion Rate, see What is Conversion Rate?

Converters  Visitors who went on to convert. At Qubit, when we talk about Conversion Rate, we are talking about converter rate, i.e. the rate of visitors that convert, rather than the rate of sessions that end in a conversion

Customer Lifetime Value (CLV)  Calculated by multiplying a customer’s Average Order Value (AOV) by the average purchase frequency rate, CLV predicts the value that can be attributed to the entire future relationship with a customer. CLV is important not only because it helps identify and segment the most loyal customers, but also because it tells you how well you’re resonating with your customer base, how much your customers like your products or services, what you’re doing right, and how you can improve. It can also help you decide how much to invest in your customers
E

Experience  One of Qubit’s custom or programmatic experiences that are used to deliver changes to a website, mobile platform, or mobile app

Experiment  A change delivered to a website, mobile platform, app, etc., to make a discovery or test a hypothesis

Experiment completion  An experiment is considered complete when the primary goal has reached statistical significance
F

False positive  If the winning threshold is 95%, we accept that up to 5% of the time we will be wrong when declaring a winner. Such a result is known as a false positive: an experiment that we thought was beneficial but wasn’t, and that may even be harmful
G

Goal  A means of determining how an experiment will be evaluated as a success. Each experiment will consist of a primary goal and secondary goals
I

Iteration  A period of time in an A/B test during which its configuration stays the same. A new iteration starts when the test is stopped, changed, and restarted. If an experiment is stopped and restarted without change, the iteration stays the same.
Note
Changes that will cause a new iteration to be started include: changing the experience triggers, changing the targeted segments, changing traffic allocation, adding/deleting a variant 
L

Live visitors  A live count of the number of visitors that are seeing the experience in real time on your site and the number of visitors that have been served the experience in the last hour
O

Outliers  In revenue testing, there are occasional customers who spend far more than the average. This is a problem for the revenue model because it infers the distribution of revenue from previous data, and can lead to skewed results
Note
For each of your experiences you can mitigate the potential for outliers to skew results by ignoring outlier data. This will remove the top 0.1% of spenders from the sample to prevent outlier data from interfering with the statistical analysis of an experience
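
To show what removing the top 0.1% of spenders amounts to, here is a minimal sketch. The function name, the dictionary input, and the choice to round the number of dropped visitors down are all assumptions for illustration; the source does not describe how the platform implements this.

```python
def drop_top_spenders(revenue_by_visitor, top_fraction=0.001):
    """Return a copy of the data without the highest-spending visitors.

    revenue_by_visitor maps a visitor ID to that visitor's total revenue.
    top_fraction=0.001 corresponds to the top 0.1% of spenders.
    """
    # Assumption: round down, so small samples lose no visitors at all.
    n_to_drop = int(len(revenue_by_visitor) * top_fraction)
    if n_to_drop == 0:
        return dict(revenue_by_visitor)
    ranked = sorted(revenue_by_visitor, key=revenue_by_visitor.get, reverse=True)
    dropped = set(ranked[:n_to_drop])
    return {
        visitor: revenue
        for visitor, revenue in revenue_by_visitor.items()
        if visitor not in dropped
    }
```

For a sample of 1,000 spenders this drops the single highest spender, which is often enough to stop one very large order from dominating the average.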
P

Pilot test  A pilot test is a trial run of an A/B test, where the power is set to 20% rather than 80%. It runs a lot faster than a normal test, and is generally used to check that a change does not have a massive negative effect

Power  The power of a test is essentially how good it is at detecting a true uplift: given that a variation provides a real uplift, what is the chance that it will win the test?
Note
We would like this to be as high as possible, but the tradeoff is that high-powered tests require more data. At Qubit, for a standard test, we have set our parameters so that a variant that has an (actual) 5% uplift has an 80% chance of winning the test.

Primary goal  All experiences will have a single primary goal. The default primary goal is Conversions, which you can change, if necessary. The primary goal is important because it determines when an experiment is complete. An experiment is complete when the primary goal has reached statistical significance

Prior  Our prior belief about how experiment results are distributed across all tests. We use it in the stats model to temper the effect of random fluctuations, especially early on in tests. Without it, the first few thousand visitors would have wildly varying confidence intervals
Note
Our prior belief also brings a reality check to extremely large uplifts—you may notice that the expected uplift is not the same as the raw uplift. We are essentially saying that while we believe there is an uplift, we think it likely that some of the uplift came from random fluctuation 

Probability  Probability is a measure of credibility and confidence. In data-driven decision making, probability reflects the decision-making strength in the data. When talking about experiences, and specifically when comparing results between a control and variation in an A/B test, we use probability to indicate how much evidence there is that an observed change is due to the experience itself rather than something else
R

Revenue Per Converter (RPC)  Revenue divided by converters. When referring specifically to the metric reported for an experience, RPC refers to revenue from the moment the visitor enters the experience until the moment the visitor leaves or the experience ends. When referring to segment metrics, RPC refers to revenue from members of a segment that visited the site on a given day

Revenue Per Visitor (RPV)  Revenue Per Converter multiplied by Conversion Rate. When referring specifically to the metric reported for an experience, RPV refers to revenue from the moment the visitor enters the experience until the moment the visitor leaves or the experience ends. When referring to segment metrics, RPV refers to revenue from members of a segment that visited the site on a given day
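
The relationship between RPC, Conversion Rate, and RPV can be checked with a few lines of arithmetic. The function names and figures below are illustrative only; Conversion Rate is taken as converters divided by visitors, as described under Converters.

```python
def conversion_rate(converters, visitors):
    """Converters divided by total visitors."""
    return converters / visitors


def revenue_per_converter(revenue, converters):
    """Revenue divided by converters."""
    return revenue / converters


def revenue_per_visitor(revenue, converters, visitors):
    """RPC multiplied by CR, which is the same as revenue / visitors."""
    return (revenue_per_converter(revenue, converters)
            * conversion_rate(converters, visitors))


# Made-up figures: 2,000 visitors, 100 of whom convert, spending 5,000 in total.
rpv = revenue_per_visitor(revenue=5000.0, converters=100, visitors=2000)
```

Because the converters cancel out, RPV is simply revenue divided by visitors; splitting it into RPC and CR shows whether a change in RPV comes from more people buying or from buyers spending more.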
S

Sample size  The amount of data that we require to get a statistically significant result. Can be reduced by lowering the winning threshold
Other things being equal, reducing the sample size decreases the amount of time it takes to get a result in an experience. The tradeoff is that it also reduces the confidence we have in the result. As our sample size decreases, the confidence in our estimates of uplift also decreases 

Secondary goal  In addition to a primary goal, each experience can have up to 4 additional goals. These ancillary or secondary goals are used in A/B testing to compare experiment variations, but are not used to define whether the experiment is complete or not

Significance  A test result is statistically significant if it is deemed unlikely to have occurred by statistical error alone
Note
Because we use a Bayesian model, the uplift probability is not a significance in the Wikipedia sense, but we call it that anyway. Uplift probability is strictly the probability that the impact of a test is positive under the prior.
At Qubit we take the significance to be 95% (the industry standard). This means that for a test to be a winner, we have to determine that the uplift is more than 0 with probability at least 95%

Statistical significance  The point in the lifetime of an experiment when we have collected enough data to be confident that the observed change in uplift is due to the experience being shown to visitors and not some unknown factor
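
To give a feel for what "uplift is more than 0 with probability at least 95%" means in a Bayesian setting, the sketch below estimates that probability for conversion rates by sampling. It assumes independent uniform Beta(1, 1) priors and a simple Beta-Binomial model; this is a textbook stand-in chosen for illustration, not Qubit's actual prior or stats model, and the function name and figures are hypothetical.

```python
import random


def uplift_probability(control_converters, control_visitors,
                       variation_converters, variation_visitors,
                       draws=20000, seed=0):
    """Estimate P(variation conversion rate > control conversion rate).

    Assumes independent Beta(1, 1) priors on each conversion rate
    (an illustrative choice, not Qubit's prior).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Draw one plausible conversion rate per arm from its posterior.
        control_rate = rng.betavariate(
            1 + control_converters,
            1 + control_visitors - control_converters,
        )
        variation_rate = rng.betavariate(
            1 + variation_converters,
            1 + variation_visitors - variation_converters,
        )
        if variation_rate > control_rate:
            wins += 1
    return wins / draws


# Made-up data: 4% vs 5% conversion rate on 10,000 visitors per arm.
probability = uplift_probability(400, 10000, 500, 10000)
is_winner = probability >= 0.95  # 95% winning threshold
```

With identical data in both arms the estimate sits near 50%, which is why a threshold well above that is needed before calling a winner.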
T

Total converters  Total number of visitors that saw the experience, either the control or one of the other variations, and converted
Note
The metric is derived from the Qubit statistical model and is updated periodically throughout the day. The live count is updated in real time. 

Total visitors  Total number of visitors to your site that saw one of the experience variations, either the control or one of the other variations
Note
The metric is derived from Qubit’s statistical model and is updated periodically throughout the day. The live count is updated in real time 

Traffic allocation  The proportion of visitors that are put into each variation
Note
Typically, we run 50/50 experiments, meaning that half of visitors should see the control, and half of visitors should see the treatment. Other common splits are 80/20 and 95/5 but you can also define a custom allocation 
Note
When testing multiple variants, we give each variant an even split, so for 3 variants the split is 33/33/33
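
One common way to apply a traffic allocation is to hash a visitor ID into the range [0, 1) and walk through the cumulative shares. The sketch below takes that approach as an assumption; the source only says that visitors are randomly bucketed, and the function name, ID format, and use of SHA-256 are illustrative choices rather than the platform's mechanism.

```python
import hashlib


def assign_variation(visitor_id, experiment_id, allocation):
    """Deterministically bucket a visitor according to `allocation`.

    `allocation` maps variation names to shares that sum to 1,
    for example {"control": 0.5, "variation": 0.5}.
    """
    key = f"{experiment_id}:{visitor_id}".encode("utf-8")
    # Map the 256-bit hash to a number in [0, 1).
    point = int(hashlib.sha256(key).hexdigest(), 16) / 16**64
    cumulative = 0.0
    for name, share in allocation.items():
        cumulative += share
        if point < cumulative:
            return name
    return name  # guard against floating-point rounding in the shares


split = {"control": 0.5, "variation": 0.5}
first = assign_variation("visitor-123", "exp-1", split)
second = assign_variation("visitor-123", "exp-1", split)  # same as `first`
```

Hashing keeps the assignment stable, so a returning visitor sees the same variation, while the split across many visitors still approximates the configured shares.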

Treatment/treatment variant  The treatment (terminology stolen from medicine) is simply the thing that we change on a website, mobile platform, app, etc. So if we want to test adding a welcome message to a website, the treatment is simply the act of displaying a welcome message to a visitor. The treatment variant is the variation of our experiment in which we apply the treatment
U

Uplift  An observed change for a given metric between a variation and the control. A 0% uplift means there is no difference between the variation and the control, and a negative uplift (or downlift) means that the control actually did better than a variation
Note
At Qubit we express this change as a percentage. So if the control has a Conversion Rate of 4%, and the variation has a Conversion Rate of 5%, we would say this is a 25% uplift, since 5 is 125% of 4
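
The percentage in the note above is a relative change, not a difference in percentage points. A minimal sketch of the calculation, with a hypothetical function name:

```python
def uplift_percent(control_value, variation_value):
    """Relative change of the variation over the control, as a percentage."""
    return (variation_value - control_value) / control_value * 100


# Conversion Rates of 4% (control) and 5% (variation), as in the note.
uplift = uplift_percent(4, 5)
```

Swapping the arguments gives a negative value, which is the downlift case described in the definition.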
V

Variation  One of an experiment’s variations, with an applied treatment. Visitors bucketed into your experience’s variation will see your website, mobile platform, or mobile app, with the changes delivered in your experience
W

Winning threshold  The default winning threshold for all Qubit Experiences is 95%. This is the standard in web analytics and denotes our confidence that the observed change in uplift for a given metric is not due to some unknown or random factor
Leading practice
By lowering the threshold, you will reduce the required sample size and therefore the time it takes for the experience to complete and get a result. Doing so is therefore often seen as an acceptable method of getting results quicker 