Index with a generic connector

This is for:

Developer

The Coveo Platform provides two generic connectors that you can use to index website content: the Sitemap connector and the Web connector.

The goals of this article are:

  • To present the Sitemap and Web connector characteristics and features. This information will help you determine the optimal indexing strategy for your use case.

  • To guide you in creating your Coveo organization source(s), once your indexing strategy has been determined.

Note

Though they represent different concepts, the terms source and connector are often used interchangeably in Coveo terminology.

Sitemap and Web connector comparison table

Consider the characteristics of the Sitemap and Web connectors in the table below when deciding how to index your Adobe Experience Manager content. Check marks indicate where one connector has an advantage over the other.

Prerequisites

  • Sitemap connector: An existing sitemap or a new, Coveo-specific sitemap. Learn about AEM out-of-the-box sitemaps starting with AEM 6.5.9.

  • Web connector: ✓ None.

Content coverage (ability to index all content from AEM, and ease of scoping the AEM content to be indexed)

  • Sitemap connector: Covers all content. DAM assets are supported if the sitemap includes links to them. Content can be filtered using inclusion/exclusion rules.

  • Web connector: Covers all content. DAM assets are supported if parent web pages link to them. Content can be filtered using inclusion/exclusion rules.

Indexing speed

  • Sitemap connector: ✓ Faster than the Web connector, since the Sitemap connector simply fetches the web pages listed in the sitemap file.

  • Web connector: Slower than the Sitemap connector, because the Web connector has to discover content by reading each web page to find links to other pages.

Metadata (what kind of metadata can you index?)

Partial update support (can the connector index only what has changed in the source since the last indexing operation, and is this indexing triggered manually or can it be scheduled?)

  • Sitemap connector: ✓ Manual and scheduled refreshes are supported, provided the target sitemap file defines the optional last modification date (for example, lastmod for XML sitemaps). Refreshes can be scheduled as often as every 5 minutes. A rescan (either manual or scheduled) or a rebuild is required to take deleted and new sitemap entries into account.

  • Web connector: Refresh isn't supported. Only rescans (either manual or scheduled) and rebuilds are available.
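The Content coverage and Indexing speed rows both follow from how each connector finds pages. The toy model below illustrates this over a hypothetical in-memory site; it's a simplified sketch, not how either Coveo connector is actually implemented. The sitemap-style function fetches exactly the listed URLs, while the crawler-style function performs a breadth-first crawl that must read every page for links before it can discover the next ones. As a result, a DAM asset that no page links to is reachable through the sitemap but not through the crawl.

```python
from collections import deque

# Toy in-memory site: each URL maps to the links found on that page.
# All URLs are hypothetical.
SITE = {
    "https://example.com/": ["https://example.com/products", "https://example.com/about"],
    "https://example.com/products": ["https://example.com/products/a", "https://example.com/dam/spec.pdf"],
    "https://example.com/products/a": ["https://example.com/"],
    "https://example.com/about": [],
    "https://example.com/dam/spec.pdf": [],
    # A DAM asset that no page links to.
    "https://example.com/dam/orphan.pdf": [],
}


def sitemap_index(sitemap_urls):
    """Sitemap-style indexing: fetch exactly the URLs the sitemap lists.

    No page has to be read for links, so every URL can be fetched directly.
    """
    return [url for url in sitemap_urls if url in SITE]


def crawl_index(start_url):
    """Crawler-style indexing (breadth-first): every fetched page must be
    read for links before the pages it points to can be discovered."""
    seen = {start_url}
    queue = deque([start_url])
    fetched = []
    while queue:
        url = queue.popleft()
        fetched.append(url)
        for link in SITE.get(url, []):  # "reading the page" to find links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


if __name__ == "__main__":
    print(len(sitemap_index(list(SITE))))  # 6
    crawled = crawl_index("https://example.com/")
    print(len(crawled))  # 5
    print("https://example.com/dam/orphan.pdf" in crawled)  # False
```

In a real crawl, each discovery step is a network fetch plus an HTML parse, which is why link discovery costs time that a sitemap avoids.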

The Coveo Sitemap connector is the ideal choice for Adobe Experience Manager content, not only from an indexing performance perspective but also because of the many metadata indexing options it provides. The Coveo Web connector should only be considered as a fallback solution.

Adobe Experience Manager metadata indexing options

Depending on how your Adobe Experience Manager website metadata is organized, one of the options below (or a combination of them) should meet your needs. The options are presented in decreasing order of performance.

Adding Coveo metadata tags directly in your sitemap file

By default, the Sitemap connector doesn't index the content of the <meta> tags in the <head> of web pages, because this operation is resource-intensive and can slow down indexing.

Instead, by default, the Sitemap connector looks for item metadata added directly inside the <url> elements of the sitemap file. The connector expects this metadata to be wrapped in a <coveo:metadata> element. A developer therefore needs to extend the Sitemap protocol and modify or generate the sitemap file with the required Coveo metadata structure and content (see Coveo-Specific Custom Metadata).
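As an illustration, the sketch below generates a sitemap in which each <url> element carries a <coveo:metadata> block, plus the optional lastmod date that Sitemap source refreshes rely on. Several details are assumptions to check against Coveo-Specific Custom Metadata: the namespace URI bound to the coveo prefix is a hypothetical placeholder, and the layout of one child element per metadata name is our reading of the format, not a confirmed specification. The page URL and metadata names in the usage example are also hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# ASSUMPTION: hypothetical placeholder URI for the "coveo" prefix. Replace it
# with the namespace URI given in Coveo-Specific Custom Metadata.
COVEO_NS = "urn:example:coveo-metadata"

ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("coveo", COVEO_NS)


def build_sitemap(pages):
    """Return sitemap XML with a <coveo:metadata> block in each <url>.

    `pages` is a list of dicts with "loc", an optional "lastmod"
    (W3C date), and an optional "metadata" dict (name -> value).
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page["loc"]
        # lastmod is what allows refreshes to pick up only changed items.
        if "lastmod" in page:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = page["lastmod"]
        metadata = page.get("metadata", {})
        if metadata:
            block = ET.SubElement(url, f"{{{COVEO_NS}}}metadata")
            for name, value in metadata.items():
                # ASSUMPTION: one child element per metadata name, in the
                # sitemap's default namespace.
                ET.SubElement(block, f"{{{SITEMAP_NS}}}{name}").text = value
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([{
        "loc": "https://www.example.com/products/a",
        "lastmod": "2024-01-15",
        "metadata": {"pagetype": "product", "brand": "Acme"},
    }]))
```

Generating the file with an XML library rather than string concatenation keeps special characters in metadata values correctly escaped.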

Using JSON-LD script tags in your web pages

If you're already using JSON-LD <script> tags in your web pages as your metadata implementation format, you can enable the Sitemap connector IndexJsonLdMetadata parameter to extract that metadata.
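Before enabling this parameter, it can help to check what JSON-LD a page actually exposes. The sketch below uses only the Python standard library to collect and parse the content of each <script type="application/ld+json"> tag in an HTML string. It previews the page's JSON-LD only; it doesn't reproduce how the connector maps JSON-LD properties to metadata names. The sample markup in the usage example is hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the parsed content of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append(json.loads("".join(self._buffer)))


def extract_json_ld(html):
    """Return a list with one parsed object per JSON-LD script tag."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    return collector.blocks


if __name__ == "__main__":
    page = ('<html><head><script type="application/ld+json">'
            '{"@context": "https://schema.org", "@type": "Article", "author": "Jane Doe"}'
            '</script></head><body></body></html>')
    print(extract_json_ld(page))
```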

Note

Coveo also provides the IndexJsonLdMetadata parameter in its Web connector. However, the Web connector also automatically parses the entire document, which is a drawback from a performance standpoint.

For general information on how to enable or configure connector parameters, see Edit a source JSON configuration.
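The parameters discussed in this article (IndexJsonLdMetadata, IndexHtmlMetadata, and ParseSitemapAlternateLinks) are set in the parameters object of a source's JSON configuration. The sketch below adds or updates entries in that object, assuming each parameter is stored as an object with a "sensitive" flag and a string "value". That storage shape is an assumption; compare it with the JSON of your own source before relying on it. The helper name set_source_parameter is hypothetical.

```python
import json


def set_source_parameter(parameters, name, value, sensitive=False):
    """Add or update one connector parameter in a source's "parameters" object.

    ASSUMPTION: each parameter is stored as {"sensitive": <bool>, "value": <string>};
    confirm this against your own source's JSON configuration.
    """
    if isinstance(value, bool):
        # ASSUMPTION: boolean parameters are stored as the strings "true"/"false".
        value = "true" if value else "false"
    parameters[name] = {"sensitive": sensitive, "value": str(value)}
    return parameters


if __name__ == "__main__":
    # Hypothetical starting point: the "parameters" object copied from a Sitemap source's JSON.
    parameters = {}
    set_source_parameter(parameters, "IndexJsonLdMetadata", True)
    set_source_parameter(parameters, "IndexHtmlMetadata", True)
    set_source_parameter(parameters, "ParseSitemapAlternateLinks", True)
    print(json.dumps(parameters, indent=2))
```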

Indexing web page head section metadata in a Sitemap source

As mentioned in Adding Coveo metadata tags directly in your sitemap file, the Sitemap connector doesn't fetch the content of the <meta> tags in the <head> of web pages by default. However, you can enable the Sitemap connector IndexHtmlMetadata parameter to index them. You can then create fields and mappings to store that metadata.
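To decide which fields and mappings to create, you can first list the <meta> tags your pages expose in their <head>. The sketch below collects name (or property) and content pairs from <meta> tags inside <head> and ignores any outside it. It's a local preview only, under the assumption that name and property attributes are the keys you care about; it doesn't show the metadata names the connector itself produces from these tags. The sample markup is hypothetical.

```python
from html.parser import HTMLParser


class HeadMetaCollector(HTMLParser):
    """Collect name/property -> content pairs from <meta> tags inside <head>."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_head = False

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self._in_head = True
        elif tag == "meta" and self._in_head:
            attributes = dict(attrs)
            key = attributes.get("name") or attributes.get("property")
            if key and attributes.get("content") is not None:
                self.meta[key] = attributes["content"]

    def handle_endtag(self, tag):
        if tag == "head":
            self._in_head = False


def head_meta(html):
    """Return the <meta> name/content pairs found in the page's <head>."""
    parser = HeadMetaCollector()
    parser.feed(html)
    parser.close()
    return parser.meta


if __name__ == "__main__":
    print(head_meta('<html><head><meta name="description" content="Widget specs">'
                    '<meta property="og:type" content="article"/></head><body></body></html>'))
```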

Using a web scraping configuration

Unlike the previous options, which automatically capture a site's metadata because it's presented in a standard format, a web scraping configuration requires more work on your part, and it may need to vary from one page to another. However, a web scraping configuration is more flexible than the previous metadata extraction options.

The Sitemap and Web connectors both support web scraping configurations. Once again, you should favor the Sitemap connector for performance reasons.

To more easily create and test web scraping configurations, consider using the Coveo Labs Web Scraper Helper Chrome extension.
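A web scraping configuration is JSON, so it can be generated and kept under version control alongside your site code. The sketch below builds a one-entry configuration that targets URLs matching a regular expression, excludes page regions by CSS selector, and extracts metadata by CSS selector. The key names ("for", "urls", "exclude", "metadata", "type", "path") reflect our reading of Coveo's web scraping format and are assumptions to verify against Coveo's web scraping reference. The URL pattern, field name, and selectors in the usage example are hypothetical.

```python
import json


def scraping_config(url_pattern, exclude_css, metadata_css):
    """Build a one-entry web scraping configuration.

    ASSUMPTION: key names follow our reading of Coveo's web scraping JSON
    format; verify them against the Coveo web scraping reference.
    """
    return [
        {
            # Regular expression matched against page URLs.
            "for": {"urls": [url_pattern]},
            # Page regions to leave out of the indexed body.
            "exclude": [{"type": "CSS", "path": selector} for selector in exclude_css],
            # Metadata name -> CSS selector locating its value.
            "metadata": {
                name: {"type": "CSS", "path": selector}
                for name, selector in metadata_css.items()
            },
        }
    ]


if __name__ == "__main__":
    config = scraping_config(
        r".*/products/.*",             # hypothetical URL pattern
        ["header", "footer", "nav"],   # hypothetical regions to exclude
        {"productsku": "span.sku"},    # hypothetical metadata name and selector
    )
    print(json.dumps(config, indent=2))
```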

Indexing sitemap alternate URLs

The Sitemap source supports alternate language links but, by default, these links aren’t parsed. You need to set the ParseSitemapAlternateLinks parameter to true to enable this feature (see ParseSitemapAlternateLinks).
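For reference, the sketch below generates sitemap entries with alternate language links using the widely used xhtml:link convention (rel="alternate" with an hreflang attribute), where each language version lists every version of the page, including itself. It assumes that this is the alternate-link format ParseSitemapAlternateLinks reads; check the ParseSitemapAlternateLinks documentation to confirm. The URLs in the usage example are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_localized_sitemap(versions):
    """Return sitemap XML in which each language version lists all alternates.

    `versions` maps a language code (the hreflang value) to that version's URL.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in versions.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # Each <url> lists every language version, including itself.
        for lang, href in versions.items():
            ET.SubElement(url, f"{{{XHTML_NS}}}link",
                          {"rel": "alternate", "hreflang": lang, "href": href})
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_localized_sitemap({
        "en": "https://www.example.com/en/page",
        "fr": "https://www.example.com/fr/page",
    }))
```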

Create your source

As a prerequisite, you need a Coveo organization. If you don’t have one, you can start a free trial.

Note

You can also create a test organization afterward.

To create a source

  1. Access the Sources (platform-ca | platform-eu | platform-au) page in your Coveo organization.

  2. Click Add source in the upper-right corner of the screen.

  3. In the Add a source of content dialog, select the source type you’ve chosen to use.

  4. Name and configure your source.

  5. Click Add and build source.

You can now browse the content you’ve indexed in the Content Browser (platform-ca | platform-eu | platform-au).

Notes

Web and Sitemap connector courses

If you prefer a more guided approach to creating Web and Sitemap sources, the Level Up courses are well suited to your needs.

If you’re considering using the Sitemap connector to index your Adobe Experience Manager content, some Web connector course material provides valuable background. We therefore recommend the following learning path: