Add or edit a Web source
Members with the required privileges can use a Web source to make the content of a website searchable.
The Web source crawler behaves similarly to the bots of web search engines such as Google. The source only needs a starting URL; it then automatically discovers all the pages of the site by following the site navigation and hyperlinks appearing on the pages. Consequently, only pages that are discovered are indexed, and in the order they’re discovered.
Source key characteristics
Feature | Supported | Additional information
---|---|---
Indexable content | Web pages (complete) |
Basic authentication | Yes |
Form authentication | Yes |
Crawling rules | Yes | A variety of basic and advanced rules may be used to ignore the web pages you don’t want to index.
Default metadata | Yes | The Web source automatically collects some metadata from your content.
Web scraping | Yes | Exclude irrelevant sections in pages, extract custom metadata, and generate sub-items.
Optical character recognition | Yes | Available at an extra charge. Contact Coveo Sales to add this feature to your Coveo organization license.
JavaScript execution | Yes | The crawler can run JavaScript on a web page to dynamically render content before indexing the page.
Default metadata
The Web source automatically extracts metadata from your content, both at the crawling stage and at the processing stage (also known as the converter stage).
Some default metadata is automatically mapped to a default field. In this case, the metadata values are actually indexed in the items and you can immediately use the metadata in a search interface by referencing the corresponding field.
Some default metadata is unmapped (i.e., not indexed). In this case, you need to create a field and mapping before the metadata can be used in a search interface.
The table below lists metadata collected by the Web source at the crawling stage.
Metadata | Description
---|---
HTTP call request headers | All request headers are extracted as separate metadata entries, each with a dedicated prefix. All request headers are also extracted together in a single JSON metadata.
HTTP call response headers | All response headers are extracted as separate metadata entries, each with a dedicated prefix. All response headers are also extracted together in a single JSON metadata.
To view all mapped (i.e., indexed) and unmapped (i.e., not indexed) metadata collected automatically by the Web source, use the View metadata feature after a rebuild. You can inspect field values of your indexed items in the Content Browser.
Limitations
-
Only pages reachable through website page hyperlinks are indexed. For example, the Web source crawler doesn’t follow options in a `<select>` tag.
-
Refresh isn’t available. A daily rescan is defined, but not enabled by default. You can enable this daily rescan on a per-source basis.
-
Pausing and resuming source updates isn’t supported. Therefore, Web source operations can’t be paused on error.
-
Multi-factor authentication (MFA) and CAPTCHA aren’t supported.
-
Indexing page permissions isn’t supported.
-
Content in pop-up windows and in page elements requiring interaction isn’t indexed.
-
Although, in the source JSON configuration, the `MaxPageSizeInBytes` parameter is set to `0` (unlimited size) by default, the Coveo indexing pipeline can handle web pages up to 512 MB only. Larger pages are indexed by reference (i.e., their content is ignored by the Coveo crawler, and only their metadata and path are searchable). Therefore, no search result Quick View is available for these larger items.
-
When the Execute JavaScript on pages option is enabled:
-
The Web source doesn’t support sending `AdditionalHeaders`.
-
The Web source doesn’t support the `UseProxy` parameter.
-
Basic authentication isn’t supported.
-
-
The Coveo crawler doesn’t find links in the shadow DOM.
-
When indexing content with the Crawling Module, ensure not to change space character encoding in your items' URIs, as Coveo uses these URIs to distinguish items. For example, an item whose URI changes from `example.com/my first item` to `example.com/my%20first%20item` wouldn’t be recognized as the same item by Coveo. As a result, it would be indexed twice, and the older version wouldn’t be deleted.
Item URIs are displayed in the Content Browser (platform-ca | platform-eu | platform-au). We recommend you check where these URIs come from before making changes that affect space character encoding. Depending on your source type, the URI may be an item’s URL, or it may be built out of pieces of metadata by your source mapping rules. For example, your item URIs may consist of the main site URL plus the item filename, due to a mapping rule such as `example.com/%[filename]`. In such a case, changing space encoding in the item filename could impact the URI.
Leading practices
-
Favor using a Sitemap source when the website features a sitemap file.
-
When a connector exists for the technology powering the website, create a source based on that connector instead, as it will typically index content, metadata, and permissions more effectively.
Example
You want to make the content of an Atlassian Confluence-powered site searchable. Create a Confluence source, not a Web source.
-
It’s best to create or edit your source in your sandbox organization first. Once you have confirmed that it indexes the desired content, you can copy your source configuration to your production organization, either with a snapshot or manually. See About non-production organizations for more information and best practices regarding sandbox organizations.
-
Always try authenticating without a custom login sequence first. You should only start working on a custom login sequence when you’re sure your form authentication details (i.e., login address, user credentials, validation method) are accurate and that the standard form authentication process doesn’t work.
-
Always review the Activity Browser (platform-ca | platform-eu | platform-au) page for the full context around an abnormal indexing activity. See the Troubleshooting article for help resolving indexing issues.
-
Ensure that you have the right to crawl the public content if you aren’t the owner of the website. Crawling websites that you don’t own or don’t have the right to crawl could create reachability issues.
Furthermore, certain websites may use security mechanisms that can impact Coveo’s ability to retrieve the content. If you’re unfamiliar with these mechanisms, we recommend investigating and learning about them beforehand. For example, this type of software (e.g., Akamai, Cloudflare) can detect the Coveo crawler as an attack and block it from any further crawling.
-
The number of items that a source processes per hour (crawling speed) depends on various factors, such as network bandwidth and source configuration. See About crawling speed for information on what can impact crawling speed, as well as possible solutions.
-
Lower the Time the crawler waits between requests to your server parameter value to increase the crawling speed for the sites you own. Contact the Coveo Support team for help if needed.
-
Schedule rescan operations following the rate at which your source content changes.
-
If you want to index only one or a few specific pages of a site, such as for a test, enter the pages to index as Starting URLs. Then, set the Number of page levels to crawl from a starting URL parameter value to `0`, instructing the crawler to only index the specified pages, and none of their linked pages.
-
Though it’s possible to add external URLs to a Web source outside the main Add source / Edit source user interfaces (such as through the source API or the JSON configuration), doing so is a bad practice. Always create one source per website. This helps:
-
Reduce the number and complexity of crawling and scraping rules.
-
Optimize source configurations for each website.
-
Avoid having a rebuild/rescan issue on one website cause the deletion of indexed items associated with the other websites.
-
-
You can index pages that are only referenced in excluded pages by setting the `ExpandBeforeFiltering` parameter to `true` in the `parameters` section of the source JSON configuration (see the sketch after this list). This way, even if your Starting URLs are excluded by your filters, pages referenced in the Starting URLs pages are retrieved before the filtering is applied.
Note
Setting the `ExpandBeforeFiltering` parameter to `true` can significantly reduce the crawling speed since the crawler retrieves many pages that can be rejected in the end.
-
Group your source and the other implementation resources together in a project. See Manage projects.
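Regarding the `ExpandBeforeFiltering` practice above, here’s a minimal sketch of what the setting can look like in the source JSON configuration. It assumes the `parameters` section follows the same `sensitive`/`value` pattern as the `FormAuthenticationConfiguration` object shown later in this article; verify the exact structure against your own source JSON before applying:

```json
{
  "parameters": {
    "ExpandBeforeFiltering": {
      "sensitive": false,
      "value": "true"
    }
  }
}
```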
Add or edit a Web source
To add a Web source
-
On the Sources (platform-ca | platform-eu | platform-au) page, click Add source.
-
In the Add a source of content panel, click the Cloud or Crawling Module tile, depending on your content retrieval context.
Crawling Module Web source creation requirements
-
Make sure you meet all Crawling Module requirements.
-
Make sure you deploy the Crawling Module on your server before creating the Web source.
-
-
In the Add a new Web source / Add a new Crawling Module Web source panel, enter the source Name and Starting URL.
Name: Use a short and descriptive name, using only letters, numbers, hyphens (-), and underscores (_). The source name can’t be modified once it’s saved.
Starting URL: The URL of a website page from which the crawler starts discovering and following links found in pages, including:
-
The protocol (e.g., `http`, `https`)
-
The subdomain, if applicable (e.g., the `www` subdomain)
Examples of valid starting URLs
-
`https://www.coveo.com`
-
`https://docs.coveo.com/en`
With the cloud Web source, as soon as you have typed the website domain, the source looks for sitemap files in standard website locations. If sitemap files are found, they’re displayed and you’re prompted to switch to the Coveo Sitemap source. Switching to the Sitemap source is recommended.
-
-
If you’re creating a Crawling Module Web source, in the Crawling Module dropdown menu, select the installed Crawling Module instance.
-
Continue the configuration as a Sitemap source or a Web source.
-
If available, click Switch to a Sitemap source and continue configuring your Sitemap source with the autodetected sitemap URLs.
OR
-
To continue configuring your Web source
-
Click Next.
-
Select who has permission to access the content through the search interface and click Add source.
Note
This information is editable later in the Content security tab.
-
-
To edit a Web source
-
On the Sources (platform-ca | platform-eu | platform-au) page, click the desired source.
-
Click Edit in the Action bar.
The "Crawling rules" tab
The Crawling rules tab lets you define the pages that the crawler should consider when indexing.
Starting URLs
The Starting URL you entered when creating the Web source is automatically added to the Starting URLs list. Add other starting URLs in the same domain to ensure that orphan pages and isolated sections of your website are crawled and indexed.
Exclusions and inclusions
Add exclusion and inclusion rules to crawl and index only specific pages.

The following diagram illustrates how the exclusion and inclusion rules are applied.
|
|

|
About the Include all pages that were not excluded option
The Include all pages that were not excluded option automatically adds an all-inclusive inclusion rule in the background.
This ensures that all starting URLs meet the inclusion condition.
You can use any of the six types of rules:
-
is and a URL that includes the protocol (for example, `https://myfood.com/`).
-
contains and a string found in the URL (for example, `recipes`).
-
begins with and a string found at the beginning of the URL and which includes the protocol (for example, `https://myfood`).
-
ends with and a string found at the end of the URL (for example, `.pdf`).
-
matches wildcard rule and a wildcard expression that matches the whole URL (for example, `https://myfood.com/recipes*`).
-
matches regex rule and a regex rule that matches the whole URL (for example, `^.*(company-(dev|staging)).*html.?$`).
When using regex rules, make sure they match the desired URLs with a testing tool such as Regex101.
The "Web scraping" tab
The Web scraping tab lists and lets you manage web scraping configurations for your source.
When the crawler is about to index a page, it checks whether it must apply a web scraping configuration that you have defined. The crawler considers the Pages to target rules of each of your web scraping configurations, starting with the configuration at the top of your list. Only the first matching web scraping configuration is applied to the page.
Note
When no web scraping configuration is defined, the crawler indexes pages as retrieved, without excluding page sections, extracting custom metadata, or generating sub-items.
The Web source features two modes to manage web scraping configurations: UI assisted mode and Edit with JSON mode.
UI assisted mode

The Web source lets you add (1), edit (2), and delete (3) one web scraping configuration at a time through a user interface that makes many technical aspects transparent. UI assisted mode is easier to use and more mistake-proof than Edit with JSON mode.
Use this mode except for sub-item related configurations (which are only supported in Edit with JSON mode).
When you add or edit a web scraping configuration using UI assisted mode, the Add/Edit a web scraping configuration panel is displayed. See Configurations in UI assisted mode for more details.
Edit with JSON mode
The Edit with JSON button gives access to the aggregated web scraping JSON configuration of the source. Adding, editing, and deleting configurations directly in the JSON requires more technical skills than using UI assisted mode.


Use this mode to perform sub-item related configurations and when you want to test your aggregated web scraping configuration with the Coveo Labs Web Scraper Helper.
Note
The Web scraping tab displays a message when the aggregated web scraping configuration contains a sub-item related configuration.
When you add or edit a web scraping configuration in Edit with JSON mode, the Edit a web scraping JSON configuration panel is displayed. See Configurations in Edit with JSON mode for more details.
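To give a sense of the format, here’s a minimal sketch of an aggregated web scraping JSON configuration. The structure shown (`for`, `exclude`, and `metadata` rules with CSS selectors) is based on Coveo’s web scraping documentation, but treat it as an assumption and validate it in the Edit a web scraping JSON configuration panel or with the Coveo Labs Web Scraper Helper:

```json
[
  {
    "name": "All pages",
    "for": { "urls": [".*"] },
    "exclude": [
      { "type": "CSS", "path": "header, footer, nav" }
    ],
    "metadata": {
      "author": { "type": "CSS", "path": ".post-author::text" }
    }
  }
]
```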
The "Advanced settings" tab
The Advanced settings tab lets you customize the Coveo crawler behavior. All advanced settings have default values which are adequate in most use cases.
Content and images
If you want Coveo to extract text from image files or PDF files containing images, enable the appropriate option.
The extracted text is processed as item data, meaning that it’s fully searchable and will appear in the item Quick View. See Enable optical character recognition for details on this feature.
Execute JavaScript on pages
Only enable this option when some website content you want to consider for indexing is dynamically rendered by JavaScript. Enabling this option may significantly increase the time needed to crawl pages.
When Execute JavaScript on pages is enabled, specify the Time the crawler waits before considering a page as rendered.
When you set this value to `0` (default), the crawler doesn’t wait after the page is loaded.
If the JavaScript takes longer to execute than normal or makes asynchronous calls, consider increasing this value to ensure that the pages with the longest rendering time are indexed with all the dynamically rendered content.
Query parameters to ignore
Add query string parameters that the source should ignore when determining whether a URL corresponds to a distinct item.
By default, the source considers the whole URL to determine whether the page is a distinct item. The URLs of the website you index can contain one or more query string parameters after the host name and the path. Some query string parameters may change the content of the page significantly, and therefore legitimately contribute to a distinct page. Other query string parameters may not affect the content of the page, or very little. In the latter case, you want to ignore the query string parameter to avoid creating search result duplicates.
The URL of a website page for which you get search result duplicates looks as follows:
`http://www.mysite.com/v1/getitdone.html?lang=en&param1=abc&param2=123`
The values of `param1` and `param2` can change without affecting the page content, while the `lang` value changes the language in which the page appears. You want to ignore the `param1` and `param2` query string parameters to eliminate search result duplicates, but not `lang`. In this example, you would therefore add the `param1` and `param2` parameters.
Note
Wildcards or regex aren’t supported in query string parameter names. For instance, in the example above, you can’t cover both the `param1` and `param2` parameters with a single wildcard expression such as `param*`; you must add each parameter individually.
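The following snippet is purely illustrative (it isn’t an actual configuration format) and summarizes how ignoring `param1` and `param2`, but not `lang`, collapses URL variants into a single item:

```json
{
  "ignoredQueryParameters": ["param1", "param2"],
  "treatedAsTheSameItem": [
    "http://www.mysite.com/v1/getitdone.html?lang=en&param1=abc&param2=123",
    "http://www.mysite.com/v1/getitdone.html?lang=en&param1=xyz&param2=456"
  ],
  "treatedAsADistinctItem": "http://www.mysite.com/v1/getitdone.html?lang=fr&param1=abc&param2=123"
}
```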
Directives overrides
robots.txt
Check this box if you want the Coveo crawler to ignore directives specified in the website’s `robots.txt` file.
noindex
Check this box if you want the Coveo crawler to index pages that have a `noindex` directive in their `meta` tag or in their `X-Robots-Tag` HTTP response header.
nofollow links
Check this box if you want the Coveo crawler to follow links in pages that have a `nofollow` directive in their `meta` tag or in their `X-Robots-Tag` HTTP response header.
nofollow anchors
Check this box if you want the Coveo crawler to follow links that have a `rel="nofollow"` attribute.
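For reference, these directives appear on websites as standard robots markup: a `<meta name="robots" content="noindex, nofollow">` tag in a page’s `<head>` section, an `X-Robots-Tag: noindex` HTTP response header, or rules in the site’s `robots.txt` file. Each override above tells the Coveo crawler to disregard the corresponding directive.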
Crawl limits
Number of page levels to crawl from a starting URL
Indicate the number of page link levels (or clicks) the crawler can travel from any starting URL. A starting URL is level 0. All pages accessible from a starting URL are considered level 1.
Time the crawler waits between requests to your server
Indicate the number of milliseconds between consecutive HTTP requests to the website server. The default value is 1000 milliseconds, which represents a crawling rate of one page per second.
One page per second is the highest rate at which Coveo can crawl a public website for a cloud Web source without proof of ownership of the website. You can enter a number below 1000. However, the Coveo crawler will only apply a crawling delay below 1000 milliseconds if it can verify that you’re the owner of the site.
If you’re retrieving content of an internal website using the Crawling Module Web source, the crawling delay you specify applies automatically. You don’t need to prove site ownership as Coveo detects that the crawled site has a private IP address.
The "Authentication" tab
The Authentication tab contains settings used by the source crawler to emulate the behavior of a user authenticating to access restricted website content. If authentication is required, select the authentication type your website uses, whether Basic authentication or Form authentication. Then, provide the corresponding login details.
Basic authentication
When selecting Basic authentication, enter the credentials of an account on the website you’re making searchable. See Source credentials leading practices.
When the Coveo crawler follows links requiring basic authentication while indexing your website, it only uses the basic authentication credentials you entered if the link URL matches the scheme and domain of the Starting URLs. If this condition isn’t met, the Coveo crawler doesn’t try to authenticate. For example, if your starting URL is `https://www.example.com`, the crawler doesn’t authenticate on a link to `http://www.example.com` (different scheme) or to `https://www.anothersite.com` (different domain).
Form authentication
The Web source lets you choose between two form authentication workflows:
Without Force authentication enabled (recommended):
This workflow typically goes as follows:
-
The crawler requests a protected page.
-
The web server redirects the crawler to the Login address.
-
Using the configured Validation method, the crawler determines it’s not authenticated. This automatically triggers the next step.
-
The crawler performs a standard login sequence using the provided Login details, or the Custom login sequence if one is configured.
-
After successful authentication, the web server responds by redirecting back to the requested protected page and returning cookies.
-
The crawler follows the server redirect to get the protected page and indexes that page.
-
The crawler requests other pages it discovers using the cookies.
This is the default and recommended workflow, as it best emulates human behavior and ensures the crawler re-authenticates when needed.
With Force authentication enabled:
This workflow typically goes as follows:
-
The crawler performs a standard login sequence using the provided Login details, or the Custom login sequence if one is configured.
-
After successful authentication, the web server responds with cookies that the crawler will use to request other pages.
-
The crawler requests the first Starting URL from the web server using the cookies and indexes that page.
-
The crawler requests other pages it discovers using the cookies.
If the crawler loses authentication at some point (e.g., if a cookie expires), it has no way of knowing it must re-authenticate unless you have a proper authentication status validation method. As a result, you may notice at some point that your source has indexed some, but not all, protected pages.
Only use Force authentication when no reliable authentication status validation method can be configured.
Username and password
Enter the credentials required to access the secured content. See Source credentials leading practices.
Login address
Enter the URL of the website login page where the username and password are to be used.
Loading delay
Enter the maximum time the crawler should allow for JavaScript to execute and go through the login sequence before timing out.
Validation method
The crawler uses the validation method after requesting a page from the web server to know if it’s authenticated or not. When the validation method reveals that the crawler isn’t authenticated, the crawler immediately tries to re-authenticate.
To configure the validation method
-
In the dropdown menu, select your preferred authentication status validation method.
-
In the Value(s) field, specify the corresponding URL, regex or text.
-
For Redirection to URL:
Enter the URL where users trying to access protected content on the website are redirected to when they’re not authenticated. If the crawler is redirected to this URL, it will immediately authenticate (or re-authenticate).
Example
`https://mycompany.com/login/failed.html`
-
For Text not found in page:
Enter the text that appears on the page after successful authentication. If this text isn’t found on the page, the crawler will immediately authenticate (or re-authenticate).
Example
When a user successfully logs in, the page shows a "Hello, <USERNAME>!" greeting text. If the login username you specified was `jsmith@mycompany.com`, the text to enter would be: `Hello, jsmith@mycompany.com!`
Example
Log out
-
For Text found in page:
Enter the text that appears on the page when a user isn’t authenticated. If this text is found on the page, the crawler will immediately authenticate (or re-authenticate).
Examples
-
An error has occurred.
-
Your username or password is invalid.
-
-
For Cookie not found:
Enter the name of the cookie returned by the server after successful authentication. If this cookie isn’t found, the crawler will immediately authenticate (or re-authenticate).
Example
`ASP.NET_SessionId`
-
For URL matches regex:
Enter a regex rule that matches the URL where users trying to access protected content are redirected to when they’re not authenticated. If the crawler is redirected to a URL that matches this regex, it will immediately authenticate (or re-authenticate).
Example
`.+Account\/Login.*`
-
For URL doesn’t match regex:
Enter a regex rule that matches the URL where users trying to access protected content are redirected to after successful authentication. If the crawler isn’t redirected to a URL that matches this regex, it will immediately authenticate (or re-authenticate).
-
Force authentication
Select this option if you want Coveo’s first request to be for authentication, regardless of whether it is actually required.
You should only force authentication if you have no reliable authentication status validation method.
Custom login sequence
If the web page requires specific actions during the login process, you might have to configure a custom login sequence.
The standard Web source login sequence can handle various login pages (e.g., OneLogin, Google, Salesforce). Make sure the standard Web source login sequence doesn’t work before configuring a custom login sequence.
Custom login sequences have the following constraints:
-
They can have no more than five steps.
-
Each step can contain no more than ten actions.
-
There can only be one step to enter the password.
Contact the Coveo Support team if you need help.
The "Content security" tab
Select who will be able to access the source items through a Coveo-powered search interface. For details on this parameter, see Content security.
The "Access" tab
In the Access tab, set whether each group (and API key, if applicable) in your Coveo organization can view or edit the current source.
For example, when creating a new source, you could decide that members of Group A can edit its configuration while Group B can only view it.
See Custom access level for more information.
Completion
-
Finish adding or editing your source:
-
When you want to save your source configuration changes without starting a build/rebuild, such as when you know you want to make other changes soon, click Add source/Save.
-
When you’re done editing the source and want to make changes effective, click Add and build source/Save and rebuild source.
NoteOn the Sources (platform-ca | platform-eu | platform-au) page, you must click Launch build or Start required rebuild in the source Status column to add the source content or to make your changes effective, respectively.
Back on the Sources (platform-ca | platform-eu | platform-au) page, you can follow the progress of your source addition or modification.
Once the source is built or rebuilt, you can review its content in the Content Browser.
-
-
Once your source is done building or rebuilding, review the metadata Coveo is retrieving from your content.
-
On the Sources (platform-ca | platform-eu | platform-au) page, click your source, and then click More > View metadata in the Action bar.
-
If you want to use metadata that’s currently not indexed in a facet or result template, map it to a field.
-
Click the metadata and then, at the top right, click Add to Index.
-
In the Apply a mapping on all item types of a source panel, select the field you want to map the metadata to, or add a new field if none of the existing fields are appropriate.
Notes-
For details on configuring a new field, see Add or edit a field.
-
For advanced mapping configurations, like applying a mapping to a specific item type, see Manage mappings.
-
-
Click Apply mapping.
-
-
Depending on the connector you use, you may be able to extract additional metadata from your content. You can then map that metadata to a field, just like you did for the default metadata.
Click for more information about custom metadata extraction and indexing
NoteSome Coveo connectors let you define rules to extract metadata beyond the default metadata Coveo discovers during the initial source build.
For example:
-
The Push API connector lets you define metadata key-value pairs in the `addOrUpdate` section of the `PUT` request payload used to upload push operations to an Amazon S3 file container.
-
The REST API connector lets you build a JSON configuration that Coveo uses to retrieve content through the REST API of your remote content repository. That JSON configuration also allows you to define metadata names and specify where to locate the metadata values in the JSON API response Coveo receives.
-
The Database connector lets you add `<CustomField>` elements in the XML configuration. Each element defines a metadata name and the database field used to populate the metadata.
-
The Web connector lets you create web scraping configurations that contain metadata extraction rules using CSS or XPath selectors. You can also extract metadata from JSON-LD `<script>` tags (a sample JSON-LD object is shown after this procedure).
-
The Sitemap connector shares the same metadata extraction capabilities as the Web connector. You can use web scraping configurations and extract metadata from JSON-LD `<script>` tags. The connector also supports extracting metadata included in the XML sitemap file.
Some connectors automatically map metadata to default or user-created fields, making the mapping process unnecessary. Others automatically create mappings and fields for you when you configure metadata extraction.
See your connector documentation for more details.
-
-
When you’re done reviewing and mapping metadata, return to the Sources (platform-ca | platform-eu | platform-au) page.
-
To reindex your source with your new mappings, click Launch rebuild in the source Status column.
-
Once the source is rebuilt, you can review its content in the Content Browser.
-
-
Add your source to a project to group all your implementation resources together. If you’re using the Crawling Module, you can also add it to your project.
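As a reference for the JSON-LD extraction mentioned in the procedure above, here’s a minimal example of the kind of object a page can embed in a `<script type="application/ld+json">` tag. The property names follow the public schema.org vocabulary; which properties your pages actually expose is site-specific:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to configure a Web source",
  "author": { "@type": "Person", "name": "Jane Smith" },
  "datePublished": "2024-01-15"
}
```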
Required privileges
You can assign privileges to allow access to specific tools in the Coveo Administration Console. The following table indicates the privileges required to view or edit elements of the Sources (platform-ca | platform-eu | platform-au) page and associated panels. See Manage privileges and Privilege reference for more information.
Note
The Edit all privilege isn’t required to create sources. When granting privileges for the Sources domain, you can grant a group or API key the View all or Custom access level, instead of Edit all, and then select the Can Create checkbox to allow users to create sources. See Can Create ability dependence for more information.
Actions | Service | Domain | Required access level
---|---|---|---
View sources, view source update schedules, and subscribe to source notifications | Content | Fields, Sources | View
| Organization | Organization | View
Edit sources, edit source update schedules, and view the View Metadata page | Content | Fields, Sources | Edit
| Content | Source metadata | View
| Organization | Organization | View
Proof of website ownership
Coveo applies a Time the crawler waits between requests to your server value below 1000 milliseconds only when you prove ownership of the website you want to index.
To prove ownership of the website you want to index
-
Create an empty text file named `coveo-ownership-orgid.txt`, replacing `orgid` with your Coveo organization ID.
-
Upload this file at the root of the website you want to index.
Note
If your site has `robots.txt` directives that include a `crawl-delay` parameter with a different value, the slowest crawling speed applies. See also the robots.txt option.
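For example, assuming a hypothetical organization ID of `mycompanyproduction1a2b3c` and a website at `https://www.example.com`, you would upload an empty `coveo-ownership-mycompanyproduction1a2b3c.txt` file so that it’s reachable at `https://www.example.com/coveo-ownership-mycompanyproduction1a2b3c.txt`.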
Migrate from manual form authentication
If you’re using manual form authentication, you’ll see a "Manual form authentication deprecation" warning when viewing the Authentication tab. You’ll want to migrate to form authentication. To do so, we recommend you create a duplicate of your source and configure form authentication on the duplicate. When the duplicate is configured and fully tested, you can copy its configuration to the original source.
If you’re using a sandbox organization and a snapshot-based phased rollout, the alternative is to copy your original source and related resource configurations to your sandbox using the resource snapshots feature. Once your sandbox source authentication configurations are updated and fully tested, you can use a snapshot to apply your changes to your production organization source.
Though the following procedure uses the source duplicate method, steps 3 to 8 inclusively are common to both methods.
To migrate from manual form authentication to form authentication
-
On the Sources (platform-ca | platform-eu | platform-au) page, click your source, and then click More > Duplicate in the Action bar.
-
Name your duplicate.
-
Click your duplicate source, and then click Edit in the Action bar.
-
Select the Authentication tab.
-
Select the Form authentication radio button.
The following fields will be populated automatically using your existing manual form authentication settings: Username, Password, Login address, Validation method and Value(s), and Force authentication.
-
Rebuild your duplicate source.
-
Make sure that your duplicate source contains properly indexed content. Things you should check for:
-
Your duplicate source contains the same number of items as the original source.
-
For pages that are authentication protected in your website, make sure the quick view of the corresponding items in your duplicate source shows the content of the actual website page. If form authentication fails, the item quick view may display the content of your form authentication login page instead of the actual website page.
-
-
If form authentication is failing, consider making the following adjustments to your duplicate source form authentication configuration:
-
Changing the Validation method and associated Value(s) to a more reliable combination.
-
Increasing the Loading delay.
-
Setting up a custom login sequence.
Contact Coveo Support if you need help.
-
-
When you’re sure the authentication configuration on your duplicate source works, apply the changes to the original source.
-
On the Sources (platform-ca | platform-eu | platform-au) page, click your duplicate source, and then click More > Edit JSON in the Action bar.
-
Copy the `FormAuthenticationConfiguration` JSON object (a decoded version of its escaped `value` string is shown after this procedure). The object looks like the following:
"FormAuthenticationConfiguration": { "sensitive": false, "value": "{\"authenticationFailed\":{\"method\":\"RedirectedToUrl\",\"values\":[\"https://something.com/Account/Login\"]},\"inputs\":[], \"formUrl\":\"https://something.com/Account/Login\",\"enableJavaScript\":true,\"forceLogin\":false,\"javaScriptLoadingDelayInMilliseconds\":2000,\"customLoginSequence\":{}}" }
-
On the Sources (platform-ca | platform-eu | platform-au) page, click your original source, and then click More > Edit JSON in the Action bar.
-
Replace the `FormAuthenticationConfiguration` object with the one from your duplicate source.
-
Click Save.
-
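For readability, the escaped `value` string in the `FormAuthenticationConfiguration` example above decodes to the following JSON (same content, unescaped):

```json
{
  "authenticationFailed": {
    "method": "RedirectedToUrl",
    "values": ["https://something.com/Account/Login"]
  },
  "inputs": [],
  "formUrl": "https://something.com/Account/Login",
  "enableJavaScript": true,
  "forceLogin": false,
  "javaScriptLoadingDelayInMilliseconds": 2000,
  "customLoginSequence": {}
}
```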
What’s next?
If you’re using the Crawling Module to retrieve your content, consider subscribing to deactivation notifications to receive an alert when a Crawling Module component becomes obsolete and stops the content crawling process.