Index XML Sitemap Metadata

The Sitemap source supports indexing additional metadata included in an XML sitemap file. This metadata can come from third-party extensions or from Coveo-specific custom metadata.

The Sitemap source can also index metadata retrieved from the meta tags in the head of the web pages listed in your sitemap.

In any case, the steps to configure how Coveo Cloud stores this information are the same.

Third-Party Extensions

Some providers, such as Google, offer extensions that add extra metadata to your sitemap (see Image Sitemap). Alternatively, you can build your own extension (see Extending the Sitemaps Protocol). Either way, Coveo Cloud can retrieve the data added to your sitemap and make it searchable. To configure Coveo Cloud adequately, see Configuring Fields and Mappings.

Retrieving metadata originating from a third-party extension is only possible with Coveo Cloud V2 (see Coveo Cloud V1 and V2 Compared).

See also Video sitemaps and video sitemap alternatives for another example.
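For illustration, the following minimal sketch shows a sitemap that uses Google's image sitemap extension. The page and image URLs are placeholders; the image namespace URI and the image:image and image:loc elements follow Google's published image sitemap format.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <!-- Standard Sitemap protocol element -->
    <loc>https://example.com/sample-page.html</loc>
    <!-- Third-party extension element (Google image sitemap) -->
    <image:image>
      <image:loc>https://example.com/images/sample.jpg</image:loc>
    </image:image>
  </url>
</urlset>
```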

Coveo-Specific Custom Metadata

A developer can include custom metadata in an XML sitemap file specifically for Coveo Cloud indexing purposes. If they can generate or modify the sitemap XML file of the repository to index, they can also declare a Coveo Cloud namespace and add coveo:metadata elements to provide information on items that isn’t found in default fields (i.e., standard Sitemap source fields and Coveo Cloud default fields).

Since you have control over the sitemap file (it isn’t generated by a third party), you decide to generate your XML sitemap file dynamically and add all the custom metadata you need.

Only the Coveo Cloud crawler and connector read the added Coveo Cloud metadata; all other processes ignore it. The sitemap nonetheless still complies with the Sitemap protocol (see Sitemaps XML format).

The following procedure requires a user who has the permissions and skills to create or modify an XML sitemap file, as well as the required privileges in the Coveo Administration Console.

To index custom metadata in an XML sitemap

You or a developer must code a third-party process to modify or create an XML sitemap file as follows:

  1. In the urlset XML element start tag (<urlset>), extend the Sitemap protocol with the Coveo Cloud namespace by adding the following attribute:

    xmlns:coveo="https://www.coveo.com/en/company/about-us"

    From a Coveo perspective, the value of the xmlns:coveo attribute (i.e., the URI) is irrelevant: the Coveo sitemap crawler ignores it. However, other search engines’ indexing services may need to validate this URI.

    The attribute name (i.e., xmlns:coveo), on the other hand, matters, because the sitemap XML file will contain elements in the coveo namespace.

     <?xml version="1.0" encoding="UTF-8"?>
     <urlset
     xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
     xmlns:coveo="https://www.coveo.com/en/company/about-us">
    
  2. For each url element (<url></url>) in the sitemap, create a new XML element named coveo:metadata (<coveo:metadata></coveo:metadata>).
     <url>
     <loc>http://example.com/about/</loc>
     <lastmod>2015-02-10T13:47:23+00:00</lastmod>
     <changefreq>weekly</changefreq>
     <priority>1.00</priority>
     <coveo:metadata>
     </coveo:metadata>
     </url>
    
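  3. Within each coveo:metadata element, add your custom metadata as child elements, where the element name is the metadata name and the element content is its value.
    • Character Data (CDATA) is supported when you place the CDATA section opening (<![CDATA[) at the beginning of the node content and close it with ]]> (see Character Data and Markup).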

      For example:

        <coveo:metadata>
        <casenumber>18467</casenumber>
        <companyname>
            <![CDATA[
            Company XYZ Inc. <USA>
            ]]>
        </companyname>
        </coveo:metadata>
      
    • The source ignores the CDATA delimiters and indexes the rest of the node content as text, including special characters (e.g., &, %, $, and ~) and XML tags (e.g., <USA>).

    You want to add the author name, the last modification date, and the document tags (if any), so you add the following XML elements:

     <coveo:metadata>
     <modificationdate>2015-02-10T13:47:23+00:00</modificationdate>
     <authorname>John Smith</authorname>
     <tags />
     </coveo:metadata>
    

    Once done, the sitemap could look like the following:

     <?xml version="1.0" encoding="UTF-8"?>
     <urlset
         xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
         xmlns:coveo="https://www.example.com/schemas">
     <url>
     <loc>http://example.com/about/</loc>
     <lastmod>2015-02-10T13:47:23+00:00</lastmod>
     <changefreq>weekly</changefreq>
     <priority>1.00</priority>
     <coveo:metadata>
         <modificationdate>2015-02-10T13:47:23+00:00</modificationdate>
         <authorname>John Smith</authorname>
         <tags />
     </coveo:metadata>
     </url>
     </urlset>
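Because step 2 applies to every url element, a sitemap that lists several pages repeats the coveo:metadata element once per page. The following sketch uses hypothetical pages and values (the contact page and Jane Doe are placeholders). It also shows an alternative to CDATA: escaping reserved characters with the predefined XML entities (&lt; and &gt; here), which an XML parser decodes to the same text as the CDATA version. This assumes the Coveo crawler uses a conforming XML parser; if in doubt, check the indexed values on the Content Browser page.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset
    xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:coveo="https://www.example.com/schemas">
  <url>
    <loc>http://example.com/about/</loc>
    <coveo:metadata>
      <authorname>John Smith</authorname>
      <!-- CDATA section: the content is taken literally -->
      <companyname><![CDATA[Company XYZ Inc. <USA>]]></companyname>
    </coveo:metadata>
  </url>
  <url>
    <!-- Hypothetical second page with its own coveo:metadata element -->
    <loc>http://example.com/contact/</loc>
    <coveo:metadata>
      <authorname>Jane Doe</authorname>
      <!-- Entity escaping: decodes to the same text as the CDATA version above -->
      <companyname>Company XYZ Inc. &lt;USA&gt;</companyname>
    </coveo:metadata>
  </url>
</urlset>
```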
    

Indexing a Sitemap Source by Reference

If your Coveo-specific custom metadata contains all the information you want to index, you may want to avoid retrieving the content of the pages referenced in the sitemap, which improves indexing performance (see Indexing by Reference). To do so:

  1. On the Sources page of the Coveo Administration Console, add a Sitemap source.

  2. Access the Edit a Source JSON Configuration panel of the source you just created.

  3. In the documentConfig section of the JSON source configuration, find the extensionSettings section.

  4. In the extensionSettings section, delete the ByExtensions and ByContentTypes sections.

  5. Find the noExtension and other sections, and then make the following changes:

    • In the noExtension section, change the action value from Retrieve to Reference.

    • In the other section, change the action value from Retrieve to Reference.

  6. Click Save and Rebuild Source.

Meta Tags of Listed Web Pages

By default, Coveo doesn’t index the content of the meta tags in the head of the web pages listed in your sitemap, because this operation is resource-intensive and can degrade indexing performance.

If you want to index the content of the meta tags as source item metadata, add the following to the source JSON configuration:

"IndexHtmlMetadata": {
  "sensitive": false,
  "value": "true"
}

As a result, Coveo indexes the value of the content attribute of each meta tag that is keyed with one of the following attributes: name, property, itemprop, or http-equiv.

For example, in the tag <meta property="og:title" content="The Article Title"/>, the value The Article Title is indexed.
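The following XHTML head is a hypothetical sketch that applies the rule above to each of the four supported attributes; the comments note which content value would be indexed. The meta names and values are placeholders, and this sketch doesn’t show the metadata names under which Coveo stores these values.

```xml
<head>
  <!-- Keyed with name: "Jane Doe" is indexed -->
  <meta name="author" content="Jane Doe"/>
  <!-- Keyed with property (Open Graph): "The Article Title" is indexed -->
  <meta property="og:title" content="The Article Title"/>
  <!-- Keyed with itemprop (microdata): "2015-02-10" is indexed -->
  <meta itemprop="datePublished" content="2015-02-10"/>
  <!-- Keyed with http-equiv: "text/html; charset=utf-8" is indexed -->
  <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
</head>
```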

Configuring Fields and Mappings

Regardless of how the additional metadata was added to your sitemap, you must configure Coveo Cloud so that it indexes this information adequately.

  1. In the Coveo Administration Console, ensure that you have the required privileges.

  2. For the metadata you want to see in your item details, add the corresponding custom fields in mapping rules (see Add or Edit a Mapping Rule).

    • Ensure that each Field name starts with the sitemap prefix.

    • You don’t have to map all custom metadata, but for each piece of metadata you map, the metadataName in the mapping rule must match the XML element name.

      XML element names are case-sensitive.

    • Nested metadata isn’t supported; Coveo supports only a single level of metadata, for example:
     <coveo:metadata>
       <OSCodes>WW1</OSCodes>
       <product>Inspiron XPS;Dimension XPS</product>
     </coveo:metadata>
    
    • The metadata is flattened, i.e., the key of each value is built from the path to that value. For example, the sitemap excerpt below results in the following flattened metadata: "video.thumbnail_loc": "http://img.youtube.com/vi/wejYF7l0kKQ/2.jpg".
     <video:video>
     <video:thumbnail_loc>http://img.youtube.com/vi/wejYF7l0kKQ/2.jpg</video:thumbnail_loc>
     </video:video>
    
    • You want to have the author name in the results metadata, so you add the sitemapauthorname field and use the following mapping rule: %[authorname].

    • You want to have the video thumbnail in the results metadata, so you add the sitemapvideothumbnail field and use the following mapping rule: %[video.thumbnail_loc].
  3. Save and rebuild your Sitemap source.

  4. On the Content Browser page, in the Fields tab located in the Properties panel of your Sitemap source items, ensure that the new metadata is available (see Access the “Fields” Tab).
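To tie the mapping examples together, the following trimmed url element places Coveo-specific metadata and third-party extension metadata side by side, with comments noting the metadata key, mapping rule, and field used in the examples above. It assumes the enclosing urlset declares the coveo namespace and Google’s video namespace (xmlns:video="http://www.google.com/schemas/sitemap-video/1.1"); a real video entry requires more elements than the thumbnail shown here. The commented-out author element illustrates the nested structure that isn’t supported.

```xml
<url>
  <loc>http://example.com/about/</loc>
  <coveo:metadata>
    <!-- Single level: key "authorname", mapping rule %[authorname], field sitemapauthorname -->
    <authorname>John Smith</authorname>
    <!-- Not supported (nested): <author><name>John Smith</name></author> -->
  </coveo:metadata>
  <!-- Flattened key "video.thumbnail_loc", mapping rule %[video.thumbnail_loc], field sitemapvideothumbnail -->
  <video:video>
    <video:thumbnail_loc>http://img.youtube.com/vi/wejYF7l0kKQ/2.jpg</video:thumbnail_loc>
  </video:video>
</url>
```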
