Coveo Experimentation Hub glossary
This is for:
Developer

In this article, we’ll introduce some terms and concepts that you’ll encounter when working with experiences and experience test results.
A

A/B testing: Essentially, A/B testing is a method of experimenting with two versions of a website: a control and a variation. By randomly assigning visitors to either the control or a variation and then observing and analyzing their behavior, we can minimize most effects that might bias the data. We can then answer questions about which version performed better against defined goals such as Conversion Rate or clickthrough rate on a banner.
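One common way to make a random assignment that stays stable across visits is to hash the visitor identifier. The sketch below is illustrative only and does not describe how Qubit buckets visitors; the function name `assign_variation`, the identifiers, and the choice of SHA-256 are all assumptions.

```python
import hashlib


def assign_variation(visitor_id: str, experiment_id: str, control_share: float = 0.5) -> str:
    """Deterministically bucket a visitor into 'control' or 'variation'."""
    # Hash the experiment and visitor together so that the same visitor can
    # land in different buckets for different experiments.
    digest = hashlib.sha256(f"{experiment_id}:{visitor_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits (32 bits) to a number in [0, 1).
    position = int(digest[:8], 16) / 0x100000000
    return "control" if position < control_share else "variation"


# Hypothetical identifiers: the same visitor always gets the same bucket.
bucket = assign_variation("visitor-123", "welcome-banner-test")
```

Because the bucket depends only on the two identifiers, a returning visitor keeps seeing the same version without any stored state.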

Average Order Value (AOV): Total revenue divided by the number of orders. AOV isn’t the same as RPC: RPC is cumulative for a visitor, whereas AOV is measured across single orders. AOV is important because it provides valuable insight into how much your customers are spending on your products.
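As a rough illustration of the difference between AOV and RPC, the following sketch uses made-up order data in which one visitor places two orders. The visitor names and amounts are hypothetical.

```python
# Hypothetical orders as (visitor_id, order_value) pairs.
orders = [("alice", 40.0), ("alice", 60.0), ("bob", 50.0)]

revenue = sum(value for _, value in orders)
converters = {visitor for visitor, _ in orders}

# AOV is measured per order: 150 / 3 orders.
aov = revenue / len(orders)
# RPC is cumulative per converting visitor: 150 / 2 converters.
rpc = revenue / len(converters)
```

Here AOV is 50 while RPC is 75, because the two orders from the same visitor count once in the RPC denominator.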
B

Bad allocation / traffic split: Our statistical model continuously monitors the distribution of visitors across each variation. If the split goes outside a statistically realistic range, the test displays an error message. This almost always signifies an issue with the data.
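To give a feel for what "statistically realistic" can mean, here is a minimal sketch that checks a two-way split with a chi-square goodness-of-fit test. This is a generic technique and not necessarily the check Qubit's model performs; the function name `split_p_value` and the visitor counts are assumptions.

```python
import math


def split_p_value(control_visitors: int, variation_visitors: int, control_share: float = 0.5) -> float:
    """P-value for the observed split, given the intended control share."""
    total = control_visitors + variation_visitors
    expected_control = total * control_share
    expected_variation = total * (1 - control_share)
    chi2 = (
        (control_visitors - expected_control) ** 2 / expected_control
        + (variation_visitors - expected_variation) ** 2 / expected_variation
    )
    # With one degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


plausible = split_p_value(5020, 4980)   # small imbalance on a 50/50 test
suspicious = split_p_value(5300, 4700)  # large imbalance on a 50/50 test
```

A very small p-value means the observed split would be extremely unlikely if allocation were working as intended, which usually points to a data or tracking problem rather than chance.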
C

Chance of an uplift: The probability that a variation delivers an uplift over the control for a given metric.

Confidence: The amount of uncertainty associated with an uplift estimate. It’s the chance that the confidence interval (margin of error around the estimate) will contain the true value that you’re trying to estimate. A higher confidence level requires a larger sample size.

Control: This is one of an experiment’s variations where no treatment is applied. In other words, we don’t display the banner we’re testing. By exposing visitors to a control, we have an effective means of testing your experiment. Visitors bucketed into your experience’s control will see your website, mobile platform, or mobile app without any changes. The control is used as a basis of comparison.

Conversions: For transaction-based properties, the number of purchases made in an iteration.

Conversion Rate (CR): The number of conversions divided by the total number of visitors. When referring specifically to the metric reported for an experience, CR refers to conversions amongst visitors from the moment they enter the experience until the moment they leave or the experience ends. When referring to segment metrics, CR refers to conversions amongst members of a segment that visited a site on a given day. CR is important because it tells you about how customers are engaging with your brand and interacting with your website or mobile app.
For a more in-depth discussion of how Qubit calculates Conversion Rate, see What is Conversion Rate?.
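This glossary's Converters entry notes that Conversion Rate here means the rate of visitors that convert, not the rate of sessions that end in a conversion. The sketch below contrasts the two on made-up session data; the visitor names and the data layout are hypothetical.

```python
# Hypothetical sessions as (visitor_id, converted_in_this_session) pairs.
sessions = [
    ("alice", False),
    ("alice", False),
    ("alice", True),
    ("bob", False),
    ("carol", True),
    ("dave", False),
]

visitors = {visitor for visitor, _ in sessions}
converters = {visitor for visitor, converted in sessions if converted}

# Visitor-based rate: 2 converters out of 4 visitors.
converter_rate = len(converters) / len(visitors)
# Session-based rate: 2 converting sessions out of 6 sessions.
session_rate = sum(1 for _, converted in sessions if converted) / len(sessions)
```

The visitor-based rate is 0.5, while the session-based rate is about 0.33, so the choice of denominator matters when comparing figures across tools.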

Converters: Visitors who went on to convert. At Qubit, when we talk about Conversion Rate, we’re talking about converter rate, that is, the rate of visitors that convert, rather than the rate of sessions that end in a conversion.

Customer Lifetime Value (CLV): Calculated by multiplying a customer’s Average Order Value (AOV) by the average purchase frequency rate, CLV predicts the value that can be attributed to the entire future relationship with a customer. CLV is important not only because it helps identify and segment the most loyal customers. It also tells you how well you’re resonating with your customer base, how much your customers like your products or services, what you’re doing right, and how you can improve. It can also help you decide how much to invest in your customers.
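The calculation described above can be sketched as a single multiplication. The function name `customer_lifetime_value` and the figures are made up for illustration, and purchase frequency is assumed to mean the expected number of orders over the customer relationship.

```python
def customer_lifetime_value(average_order_value: float, purchase_frequency: float) -> float:
    """CLV as AOV multiplied by the average purchase frequency."""
    return average_order_value * purchase_frequency


# Hypothetical customer: orders worth 50 on average, expected to order 4 times.
clv = customer_lifetime_value(50.0, 4)
```

With these numbers the estimated CLV is 200.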
E

Experience: One of Qubit’s custom or programmatic experiences that are used to deliver changes to a site, mobile platform, or mobile app.

Experiment: A change delivered to a website, mobile platform, app, etc., to make a discovery or test a hypothesis.

Experiment completion: An experiment is considered complete when the primary goal has reached statistical significance.
F

False positive: If the default winning threshold is 95%, we accept that we will be wrong 5% of the time. This is known as a false positive: an experiment that we thought was beneficial but wasn’t, and that may even be harmful.
G

Goal: A means of determining how an experiment will be evaluated as a success. Each experiment will consist of a primary goal and secondary goals.
I

Iteration: A period of time during an A/B test in which the test configuration stays the same. A new iteration starts when the test is stopped, changed, and restarted. If an experiment is stopped and restarted without change, the iteration stays the same.
Note
Changes that will cause a new iteration to be started include: changing the experience triggers, changing the targeted segments, changing traffic allocation, adding/deleting a variant. 
L

Live visitors: A live count of the number of visitors that are seeing the experience in real time on your site and the number of visitors that have been served the experience in the last hour.
O

Outliers: In revenue testing, there are occasional customers who spend far more than the average. This is a problem for the revenue model because it infers the distribution of revenue from previous data, and can lead to skewed results.
Note
For each of your experiences, you can mitigate the potential for outliers to skew results by ignoring outlier data. This removes the top 0.1% of spenders from the sample so that outlier data doesn’t interfere with the statistical analysis of an experience.
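A minimal sketch of this kind of trimming is shown below, assuming revenue is held per visitor in a dictionary. The function name `drop_top_spenders` and the data are hypothetical, and the exact rule Qubit applies (for example, how ties or small samples are handled) isn't specified here.

```python
import math


def drop_top_spenders(revenue_by_visitor: dict, fraction: float = 0.001) -> dict:
    """Return a copy of the data without the top `fraction` of spenders."""
    n_drop = math.floor(len(revenue_by_visitor) * fraction)
    if n_drop == 0:
        # Too few visitors for the fraction to remove anyone.
        return dict(revenue_by_visitor)
    # Rank visitors from highest to lowest spend and skip the first n_drop.
    ranked = sorted(revenue_by_visitor.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[n_drop:])


# Hypothetical sample: 2,000 visitors, so the top 0.1% is 2 visitors.
sample = {f"visitor-{i}": float(i) for i in range(2000)}
trimmed = drop_top_spenders(sample)
```

In this example the two highest spenders are removed and the other 1,998 visitors are kept.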
P

Pilot test: A pilot test is a trial run of an A/B test, where the power is set to 20% rather than 80%. It runs a lot faster than a normal test, and is generally used to check that a change doesn’t have a massive negative effect.

Power: The power of a test is essentially how good it is at detecting true uplift: given that a variation provides a real uplift, what’s the chance that it will win the test?
Note
We would like this to be as high as possible, but the tradeoff is that high-powered tests require more data. At Qubit, for a standard test, we’ve set our parameters so that a variant that has an (actual) 5% uplift has an 80% chance of winning the test.
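To illustrate the tradeoff between power and data, the sketch below uses the classical normal-approximation formula for comparing two proportions with a one-sided test. This is a frequentist approximation and not Qubit's Bayesian model, so the numbers are indicative only; the function name `visitors_per_variation` and the 4% baseline Conversion Rate are assumptions.

```python
import math
from statistics import NormalDist


def visitors_per_variation(baseline_cr: float, relative_uplift: float,
                           power: float = 0.8, threshold: float = 0.95) -> int:
    """Approximate visitors needed in each variation to detect the uplift."""
    if not 1 - threshold < power < 1:
        raise ValueError("power must be between 1 - threshold and 1")
    p1 = baseline_cr
    p2 = baseline_cr * (1 + relative_uplift)
    z_threshold = NormalDist().inv_cdf(threshold)  # one-sided winning threshold
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_threshold + z_power) ** 2 * variance / (p2 - p1) ** 2)


standard = visitors_per_variation(0.04, 0.05, power=0.8)  # standard test
pilot = visitors_per_variation(0.04, 0.05, power=0.2)     # pilot test
```

Under these assumptions the pilot needs only a fraction of the visitors that the standard test needs, which is why a pilot test completes much sooner. Lowering `threshold` has a similar effect on the required sample size.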

Primary goal: All experiences have a single primary goal. The default primary goal is Conversions, which you can change if necessary. The primary goal is important because it determines when an experiment is complete: an experiment is complete when the primary goal has reached statistical significance.

Prior: Our prior belief about how experiment results are distributed across all tests. We use it in the stats model to temper the effect of random fluctuations, especially early on in tests. Without it, the first few thousand visitors would have wildly varying confidence intervals.
Note
Our prior belief also brings a reality check to extremely large uplifts: you may notice that the expected uplift isn’t the same as the raw uplift. We’re essentially saying that while we believe there’s an uplift, we think it likely that some of the uplift came from random fluctuation.

Probability: Probability is a measure of credibility and confidence. In data-driven decision making, probability reflects the decision-making strength in the data. When talking about experiences, and specifically when comparing results between a control and variation in an A/B test, we use probability to indicate how much evidence there is that an observed change is due to the experience itself rather than something else.
R

Revenue Per Converter (RPC): Revenue divided by converters. When referring specifically to the metric reported for an experience, RPC refers to revenue from the moment the visitor enters the experience until the moment the visitor leaves or the experience ends. When referring to segment metrics, RPC refers to revenue from members of that segment who visited the site on a given day.

Revenue Per Visitor (RPV): Revenue Per Converter multiplied by Conversion Rate. When referring specifically to the metric reported for an experience, RPV refers to revenue from the moment the visitor enters the experience until the moment the visitor leaves or the experience ends. When referring to segment metrics, RPV refers to revenue from members of that segment who visited the site on a given day.
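Since RPC is revenue divided by converters and Conversion Rate is converters divided by visitors, their product is simply revenue divided by visitors. The short check below uses made-up totals to show this.

```python
import math

# Hypothetical totals for one variation.
revenue = 1200.0
visitors = 1000
converters = 40

conversion_rate = converters / visitors  # 0.04
rpc = revenue / converters               # 30.0
rpv = rpc * conversion_rate

# The converters cancel out, leaving revenue per visitor.
same_as_revenue_per_visitor = math.isclose(rpv, revenue / visitors)
```

With these figures RPV comes to roughly 1.20 per visitor, whichever way it is calculated.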
S

Sample size: An amount of data that we require to get a statistically significant result. Can be reduced by changing the winning threshold.
Other things being equal, reducing the sample size decreases the amount of time it takes to get a result in an experience. The tradeoff is that it also reduces the confidence we have in the result. As our sample size decreases, the confidence in our estimates of uplift also decreases. 

Secondary goal: In addition to a primary goal, each experience can have up to 4 additional goals. These ancillary or secondary goals are used in A/B testing to compare experiment variations, but aren’t used to define whether the experiment is complete or not.

Significance: A test result is statistically significant if it’s deemed unlikely to have occurred by statistical error alone.
Note
Because we use a Bayesian model, the uplift probability isn’t a significance in the Wikipedia sense, but we call it that anyway. Uplift probability is strictly the probability that the impact of a test is positive under the prior. 
At Qubit we take the significance to be 95% (the industry standard). This means that for a test to be a winner, we have to determine that the uplift is more than 0 with a probability of at least 95%.

Statistical significance: The point in the lifetime of an experiment when we have collected enough data to be certain that the observed change in uplift is due to the experience being shown to visitors and not some unknown factor.
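To make "the probability that the uplift is more than 0" concrete, here is a minimal Bayesian sketch for Conversion Rate. It assumes a uniform Beta(1, 1) prior on each variation's rate and estimates the probability by sampling, which is a textbook approach and not Qubit's actual model or prior. The function name `probability_of_uplift` and the counts are hypothetical.

```python
import random


def probability_of_uplift(control_converters: int, control_visitors: int,
                          variation_converters: int, variation_visitors: int,
                          draws: int = 20000, seed: int = 0) -> float:
    """Estimate P(variation rate > control rate) from Beta posteriors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior under a Beta(1, 1) prior: Beta(1 + converters, 1 + non-converters).
        control_rate = rng.betavariate(
            1 + control_converters, 1 + control_visitors - control_converters)
        variation_rate = rng.betavariate(
            1 + variation_converters, 1 + variation_visitors - variation_converters)
        if variation_rate > control_rate:
            wins += 1
    return wins / draws


# Hypothetical results: 4% versus 5% Conversion Rate on 10,000 visitors each.
chance = probability_of_uplift(400, 10000, 500, 10000)
is_winner = chance >= 0.95
```

With counts this far apart the estimated probability clears a 95% threshold, whereas identical counts in both variations give a probability close to 50%.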
T

Total converters: This is the cumulative number of visitors who viewed the experience and subsequently converted, across all iterations, either in the last 24 hours or since the experience was launched.
Note
The metric is derived from the Qubit statistical model and is updated periodically throughout the day. The live count is updated in real time. 

Total visitors: This represents the cumulative number of visitors who have viewed the experience, across all iterations, either in the last 24 hours or since the experience was launched.
Note
The metric is derived from Qubit’s statistical model and is updated periodically throughout the day. The live count is updated in real time. 

Traffic allocation: The proportion of visitors that are put into each variation.
Note
Typically, we run 50/50 experiments, meaning that half of visitors should see the control, and half of visitors should see the treatment. Other common splits are 80/20 and 95/5 but you can also define a custom allocation. 
Note
When testing multiple variants, we give each variant an even split, so for 3 variants the split is 33/33/33.

Treatment/treatment variant: The treatment (terminology stolen from medicine) is simply the thing that we change on a site, mobile platform, app, etc. So if we want to test adding a welcome message to a website, the treatment is simply the act of displaying a welcome message to a visitor. The variation is the variant of our experiment in which we apply the treatment.
U

Uplift: An observed change for a given metric between a variation and the control. A 0% uplift means there’s no difference between the variation and the control, and a negative uplift (or downlift) means that the control actually did better than a variation.
Note
At Qubit we express this change as a percentage. So if the control has a Conversion Rate of 4%, and the variation has a Conversion Rate of 5%, we would say this is a 25% uplift, since 5 is 125% of 4.
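The worked example in this note can be written as a small helper. The function name `relative_uplift` is made up for illustration.

```python
def relative_uplift(control_value: float, variation_value: float) -> float:
    """Change of the variation relative to the control, as a fraction."""
    if control_value == 0:
        raise ValueError("uplift is undefined when the control value is 0")
    return (variation_value - control_value) / control_value


# Control Conversion Rate of 4%, variation Conversion Rate of 5%.
uplift = relative_uplift(0.04, 0.05)
uplift_percent = round(uplift * 100, 1)
```

Here `uplift_percent` is 25.0. Swapping the arguments gives a negative value, which corresponds to a downlift.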
V

Variation: One of an experiment’s variations, with an applied treatment. Visitors bucketed into your experience’s variation will see your website, mobile platform, or mobile app, with the changes delivered in your experience.
W

Winning threshold: The default winning threshold for all Qubit Experiences is 95%. This is the standard in web analytics and denotes our confidence that the observed change in uplift for a given metric isn’t due to some unknown or random factor.
Leading practice
By lowering the threshold, you’ll reduce the required sample size and therefore the time it takes for the experience to complete and get a result. For this reason, lowering the threshold is often seen as an acceptable method of getting results more quickly.