Index with a generic connector
This is for: Developer
The Coveo Platform provides two generic connectors that you can use to index website content: the Sitemap and Web connectors.
The goals of this article are:
- To present the Sitemap and Web connector characteristics and features. This information will help you determine the optimal indexing strategy for your use case.
- To guide you in creating your Coveo organization source(s), once your indexing strategy has been determined.
Note
Though they represent different concepts, the terms source and connector are often used interchangeably in Coveo terminology.
Sitemap and Web connector comparison table
Consider the characteristics of the Sitemap and Web connectors in the table below when deciding how to index your Adobe Experience Manager content. Green check marks in the table below indicate advantages of the given connector over the other connector.
Criteria | Sitemap connector | Web connector |
---|---|---|
Prerequisites |
An existing sitemap or a new, Coveo-specific sitemap. Learn about AEM out-of-the-box sitemaps starting with AEM 6.5.9. |
None. |
Content coverage
(Ability to index all content from AEM. Ease of scoping the AEM content to be indexed.) |
Covers all content.
DAM is supported if the sitemap includes links to assets.
Content can be filtered using inclusion/exclusion rules. |
Covers all content.
DAM is supported if assets are linked to parent web pages.
Content can be filtered using inclusion/exclusion rules. |
Indexing speed |
Faster than with the Web connector since the Sitemap connector simply fetches the web pages listed in the sitemap file. |
Slower than with the Sitemap connector because the Web connector has to discover content, reading a web page to find links to other pages. |
Metadata
(What kind of metadata can you index?) |
Using a Web scraping configuration
Enabling the indexing of the web page <meta> tag
Enabling the indexing of JSON-LD <script> tags in your web pages
Indexing standard sitemap metadata (for example,
|
Using a Web scraping configuration
Indexing of the web page <meta> tag
Enabling the indexing of JSON-LD <script> tags in your web pages |
Partial update support
(Can the connector index only what has changed in the source since the last indexing operation? Is this indexing triggered manually or can it be scheduled?) |
Manual and scheduled refresh is supported, provided the target sitemap file defines the optional
The maximum refresh schedule frequency is every 5 minutes.
|
Refresh isn’t supported. Only rescans (either manual or scheduled) and rebuilds are available. |
The Coveo Sitemap connector is the ideal choice for Adobe Experience Manager content, not only from an indexing performance perspective but also because of the many metadata indexing options it provides. The Coveo Web connector should only be considered as a fallback solution.
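To illustrate why listing pages in a sitemap is cheaper than discovering them, here is a minimal Python sketch of the two approaches. It is not Coveo's implementation; the function names `urls_from_sitemap` and `links_from_page` are hypothetical, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml: str) -> list[str]:
    """Sitemap approach: every page URL is listed up front in <loc> elements."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


class _LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_from_page(page_url: str, html: str) -> list[str]:
    """Crawl approach: each fetched page must be parsed to discover more URLs."""
    collector = _LinkCollector()
    collector.feed(html)
    # Resolve relative links against the URL of the page they were found on.
    return [urljoin(page_url, href) for href in collector.hrefs]
```

With a sitemap, one document yields the full URL list. With link discovery, the list only grows as pages are downloaded and parsed, which is consistent with the slower indexing speed noted in the table.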
Adobe Experience Manager metadata indexing options
Depending on the way your Adobe Experience Manager website metadata is organized, one of the options below (or a combination thereof) will fulfill your needs. The options are presented in order of performance.
Adding Coveo metadata tags directly in your sitemap file
By default, when using the Sitemap connector, Coveo doesn’t index the content of the <meta> tags in the <head> of the web pages. This operation is costly resource-wise and may therefore impact the indexing performance.
Instead, by default, the Coveo Sitemap connector looks for item metadata added directly inside the <url> elements of the website sitemap file. The connector expects this metadata to be included in a <coveo:metadata> tag. A developer therefore needs to extend the Sitemap protocol and modify or generate the sitemap file with the necessary Coveo metadata tag structure and content (see Coveo-Specific Custom Metadata).
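As a rough illustration of generating such a sitemap, the following Python sketch adds a <coveo:metadata> element to each <url> entry. Several things here are assumptions: the `COVEO_NS` namespace URI is a placeholder (use the one given in Coveo-Specific Custom Metadata), the `build_sitemap` function is hypothetical, and the metadata names (`author`, `pagetype`) are invented for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Placeholder only: replace with the namespace URI from the Coveo documentation.
COVEO_NS = "https://example.com/coveo-metadata-placeholder"


def build_sitemap(pages):
    """Build a sitemap string where each <url> carries a <coveo:metadata> element.

    `pages` is a list of dicts such as:
    {"loc": "https://example.com/a", "metadata": {"author": "Jane"}}
    """
    # Serialize the sitemap namespace as the default one and Coveo's as "coveo:".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("coveo", COVEO_NS)

    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page["loc"]
        metadata = ET.SubElement(url, f"{{{COVEO_NS}}}metadata")
        for name, value in page.get("metadata", {}).items():
            # Metadata names are hypothetical; they are written as unprefixed children.
            ET.SubElement(metadata, name).text = value
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        {"loc": "https://example.com/en/home",
         "metadata": {"author": "Jane Doe", "pagetype": "landing"}},
    ]))
```

In an AEM project this logic would more likely live in the sitemap generator itself; the sketch only shows the shape of the output the connector would read.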
Using JSON-LD script tags in your web pages
If you’re already using JSON-LD <script> tags in your web pages as your metadata implementation format, you can enable the IndexJsonLdMetadata Sitemap connector parameter to extract that metadata.
Note
Coveo also provides the |
For general information on how to enable or configure connector parameters, see Edit a source JSON configuration.
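To preview which JSON-LD metadata a page exposes before enabling the parameter, a small script can help. The following is a minimal sketch using only the Python standard library. It doesn't reproduce how the connector extracts or names the metadata; `JsonLdExtractor` and `extract_json_ld` are hypothetical names.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed content of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed JSON-LD blocks instead of failing the page.


def extract_json_ld(html: str) -> list:
    """Return every JSON-LD document found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.documents
```

Running `extract_json_ld` on a saved copy of a page shows which properties are available to map to fields.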
Indexing web page head section metadata in a Sitemap source
As mentioned in Adding Coveo metadata tags directly in your sitemap file, the Sitemap connector doesn’t fetch the content of the <meta> tags in the <head> of web pages by default. However, you can enable the Sitemap connector IndexHtmlMetadata parameter to do just that.
You can then create a field and mapping to store that metadata.
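To decide which fields and mappings to create, it can help to list the <meta> tags a page actually exposes. The sketch below does this with the Python standard library, assuming metadata is keyed by the tag's `name` or `property` attribute. The names `HeadMetaExtractor` and `extract_head_meta` are hypothetical, and the connector's own metadata naming may differ.

```python
from html.parser import HTMLParser


class HeadMetaExtractor(HTMLParser):
    """Collects <meta> name/property and content pairs found inside <head>."""

    def __init__(self):
        super().__init__()
        self._in_head = False
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self._in_head = True
        elif tag == "meta" and self._in_head:
            attributes = dict(attrs)
            # Standard tags use "name"; Open Graph style tags use "property".
            key = attributes.get("name") or attributes.get("property")
            content = attributes.get("content")
            if key and content is not None:
                self.metadata[key] = content

    def handle_endtag(self, tag):
        if tag == "head":
            self._in_head = False


def extract_head_meta(html: str) -> dict:
    """Return a dict of metadata name to content for the page <head>."""
    parser = HeadMetaExtractor()
    parser.feed(html)
    return parser.metadata
```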
Using a web scraping configuration
Unlike the previous options that automatically capture a site’s metadata because it’s presented in a standard format, setting up a web scraping configuration requires more work on your part. Moreover, the web scraping configuration may vary from one page to another. However, a web scraping configuration is more flexible than the previous metadata extraction options.
The Sitemap and Web connectors both support web scraping configurations. Once again, you should favor the Sitemap connector for performance reasons.
To more easily create and test web scraping configurations, consider using the Coveo Labs Web Scraper Helper Chrome extension.
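As a starting point, the following Python sketch assembles a web scraping configuration as JSON. The key names (`for`, `urls`, `exclude`, `metadata`, `type`, `path`) and the `::text` selector suffix reflect our understanding of the Coveo web scraping configuration format and should be verified against the Coveo documentation. The `build_scraping_config` helper, the selectors, and the `pagetitle` metadata name are hypothetical.

```python
import json


def build_scraping_config(url_pattern, excluded_selectors, metadata_selectors):
    """Assemble a one-rule scraping configuration as a Python structure.

    Assumed format (verify against the Coveo documentation):
    - "for.urls": regex patterns selecting the pages the rule applies to
    - "exclude": page sections to drop before indexing
    - "metadata": metadata name mapped to the selector that extracts its value
    """
    return [
        {
            "for": {"urls": [url_pattern]},
            "exclude": [
                {"type": "CSS", "path": selector} for selector in excluded_selectors
            ],
            "metadata": {
                name: {"type": "CSS", "path": selector}
                for name, selector in metadata_selectors.items()
            },
        }
    ]


if __name__ == "__main__":
    config = build_scraping_config(
        url_pattern=".*",
        excluded_selectors=["header", "footer"],  # hypothetical selectors
        metadata_selectors={"pagetitle": "h1::text"},  # hypothetical metadata
    )
    print(json.dumps(config, indent=2))
```

Generating the configuration from code is optional. Its main benefit is that pages sharing a template can reuse the same selector lists.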
Indexing sitemap alternate URLs
The Sitemap source supports alternate language links but, by default, these links aren’t parsed.
You need to set the ParseSitemapAlternateLinks parameter to true to enable this feature (see ParseSitemapAlternateLinks).
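To check whether your sitemap declares alternate language links at all, you can inspect it with a short script. This sketch assumes the links follow the common `xhtml:link rel="alternate" hreflang="..."` sitemap convention. The `alternate_links` function is a hypothetical helper and isn't part of any Coveo tooling.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}


def alternate_links(sitemap_xml: str) -> dict:
    """Map each <loc> URL to its {hreflang: href} alternate language links."""
    root = ET.fromstring(sitemap_xml)
    result = {}
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        result[loc] = {
            link.get("hreflang"): link.get("href")
            for link in url.findall("xhtml:link", NS)
            if link.get("rel") == "alternate"
        }
    return result
```

An empty dictionary for every URL suggests that the sitemap has no alternate links to parse, in which case enabling the parameter would have no visible effect.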
Create your source
As a prerequisite, you need a Coveo organization. If you don’t have one, you can start a free trial.
Note
You can also create a test organization afterward.
To create a source
- Access the Sources (platform-ca | platform-eu | platform-au) page in your Coveo organization.
- Click Add source in the upper-right corner of the screen.
- In the Add a source of content dialog, select the source type you’ve chosen to use.
- Name and configure your source.
- Click Add and build source.
You can now browse the content you’ve indexed in the Content Browser (platform-ca | platform-eu | platform-au).
Web and Sitemap connector courses
Should you prefer a more guided approach to creating Web and Sitemap connectors, Level Up courses are ideally suited for your needs.
If you’re considering using the Sitemap connector to index your Adobe Experience Manager content, some Web connector course material provides valuable background. We therefore recommend the following learning path:
- Read Using the Web Connector.
- Read and do activities in Using the Sitemap Connector: Creating a Sitemap source.
- Read the Refresh, rescan, and rebuild page.
- Do Using the Sitemap Connector: Managing source schedules and content security.