Add a Confluence Server source

Confluence is a knowledge sharing tool that enables users to create and share content. Members with the required privileges can add the content of a Confluence instance to a Coveo organization.

Leading practice

The number of items that a source processes per hour (crawling speed) depends on various factors, such as network bandwidth and source configuration. See About crawling speed for information on what can impact crawling speed, as well as possible solutions.

Source key characteristics

Features	Supported	Additional information
Confluence version	7 and 8	Only the minor versions currently maintained by Atlassian are supported.
Indexable content	Spaces, pages (such as Wiki pages), blog posts, page and blog post comments (indexed as metadata), and attachments (in pages, blog posts, and comments).
Content update operations	refresh		Requires Coveo’s plugin to be fully functional.
rescan		Takes place every day by default. If you change the name of a Confluence space, the rescan operation detects the change only for pages created or modified after the change. You must therefore rebuild the source to get the space name changed on all space pages.
rebuild
Content security options	Same users and groups as in your content system		Requires Coveo’s plugin.
Specific users and groups
Everyone

Requirements

Supported Confluence versions

The source supports on-premises installations of Confluence 7 and 8 through the Confluence REST API and Search REST API. Only the minor versions currently maintained by Atlassian are supported. Coveo doesn’t test all minor versions since they aren’t expected to have breaking changes for the source.

Note

Confluence Data Center is supported.

Atlassian Confluence server accessible to Coveo

If access to the communication ports between Coveo and the Confluence server is restricted, open the appropriate ports in your network infrastructure (for example, in firewalls) so that Coveo can access the content.
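As a rough way to check reachability, the following Python sketch attempts a TCP connection to a host and port. It only tests connectivity from the machine that runs it (for example, a Crawling Module host), not from Coveo's cloud infrastructure. The function name `can_connect` and the host and port shown in the usage comment are illustrative assumptions, not part of any Coveo tool.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration only. A successful connection
    shows that the port is reachable from this machine, nothing more.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


# Example usage (host and port are placeholders for your own server):
# can_connect("MyConfluenceServer", 8090)
```

A `False` result suggests that a firewall rule, a closed port, or a DNS issue should be investigated before the source is built.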

Crawling account

To index Confluence content, you must provide the credentials of a Confluence account. Coveo will use this crawling account to retrieve your content.

Your crawling account must:

Be a native Confluence account (not managed by an identity provider such as Google).
Be a member of a Confluence group named coveo-connector-plugin-users or be a Confluence administrator. This allows the account to use the plugin.
Have read access to all the content you want to index, through the following permissions:
- View on every space to index.
- Can view on every restricted page to index.
Have login CAPTCHAs disabled.
Be used by Coveo only.

Coveo plugin for Confluence

To complete the capabilities of its Confluence Server source, Coveo offers a plugin on the Atlassian Marketplace.

You must install it to benefit from the following features:

Refreshes: the plugin allows the source to run refresh operations that index deleted, restored, and moved items. Without the plugin, you’d need to run a rescan or rebuild operation instead, which would take more time and resources to complete. Therefore, the plugin allows your search interface to reflect content changes faster, with minimal impact on your Confluence server.
Permission indexing: the plugin allows the source to index item permissions, which will then be replicated in your Coveo search interface. As a result, users of your search interface only see the content they’re allowed to see in Confluence. If they’ve been forbidden to access an article in Confluence, they won’t see it in their search results. To read more on how Coveo manages permissions, see Management of security identities and item permissions.

About the new plugin version

In July 2024, Coveo released version 2 of the plugin. Starting with this new version, the plugin doesn’t require your crawling account to be a Confluence administrator anymore. Instead, you can create a Confluence group named coveo-connector-plugin-users, and then add your crawling account to this group. This will allow your crawling account to use the plugin.

If you’ve installed the Coveo plugin before July 2024, you’re using version 1 and you can continue doing so. However, if you want to avoid the Confluence administrator permission requirement, upgrade to version 2.

Enabling the Confluence SOAP remote API (Web Service)

Due to a Confluence REST API limitation, the connector must use the SOAP Remote API to retrieve content permissions. For these permissions to be replicated in a Coveo-powered search interface, a Confluence system administrator must enable the remote API on your Confluence instance.

Add a Confluence Server source

A Confluence Server source indexes on-premises (server) content. To retrieve cloud content instead, see Add a Confluence Cloud source.

Follow the instructions below to add a Confluence Server source that uses the desired content retrieval method.

On the Sources (platform-ca | platform-eu | platform-au) page, click Add source.
In the Add a source of content panel, click the On-premises or the Crawling Module tab, depending on your content retrieval context. With the latter, you must install the Crawling Module to make your source operational.
Click the Confluence Server tile.
Configure your source.

Leading practice

It’s best to create or edit your source in your sandbox organization first. Once you’ve confirmed that it indexes the desired content, you can copy your source configuration to your production organization, either with a snapshot or manually.

See About non-production organizations for more information and best practices regarding sandbox organizations.

"Configuration" tab

In the Add a Confluence Server source panel, the Configuration tab is selected by default. It contains your source’s general and authentication information, as well as other parameters.

General information

Source name

Enter a name for your source.

Leading practice

A source name can’t be modified once it’s saved, so use a short and descriptive name made of letters, numbers, hyphens (-), and underscores (_). Avoid spaces and other special characters.

Instance URL

Enter the address of the Confluence Wiki site or space that you want to make searchable. Make sure to include the protocol (http:// or https://).

Depending on your use case, use one of the following URL formats:

To index a complete Confluence site, add the Confluence server root URL:

https://MyConfluenceServer:8090/

To index a specific space, add its URL:

https://MyConfluenceServer:8090/display/space1

To index a specific space when Confluence isn’t installed at the server root:

http://server/MyConfluence/display/spacename
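To double-check an Instance URL before you enter it, a small script can confirm that the protocol is present and tell a site-level URL from a space-level one. The sketch below is only an illustration of the three formats above. It is not how Coveo parses the value, and the function name `describe_instance_url` and the returned keys are assumptions made for this example.

```python
from urllib.parse import urlparse


def describe_instance_url(url: str) -> dict:
    """Classify an Instance URL as targeting a whole site or one space.

    Hypothetical helper: it only recognizes the /display/<space> pattern
    shown in the formats above.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("Instance URL must start with http:// or https://")

    segments = [s for s in parsed.path.split("/") if s]
    if "display" in segments:
        i = segments.index("display")
        if i + 1 >= len(segments):
            raise ValueError("Expected a space after /display/")
        # Anything before /display/ is where Confluence is installed.
        return {
            "scope": "space",
            "space": segments[i + 1],
            "install_path": "/" + "/".join(segments[:i]),
        }
    return {
        "scope": "site",
        "space": None,
        "install_path": "/" + "/".join(segments),
    }


print(describe_instance_url("https://MyConfluenceServer:8090/"))
print(describe_instance_url("http://server/MyConfluence/display/spacename"))
```

A URL without a protocol, such as `MyConfluenceServer:8090/`, raises a `ValueError` in this sketch, which mirrors the requirement to include http:// or https://.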

Paired Crawling Module

If your source is a Crawling Module source, and if you have more than one Crawling Module linked to this organization, select the one with which you want to pair your source. If you change the Crawling Module instance paired with your source, a successful rebuild is required for your change to apply.

Optical character recognition (OCR)

If you want Coveo to extract text from image files or PDF files containing images, enable the appropriate option.

The extracted text is processed as item data, meaning that it’s fully searchable and will appear in the item Quick view. See Enable optical character recognition for details on this feature.

Project

If you have the Enterprise edition, use the Project selector to associate your source with one or multiple Coveo projects.

"Authentication" section

If you want to index secured content or index Confluence permissions to replicate them in your search interface, you must provide one of the following:

The credentials of a Confluence crawling account.
A personal access token corresponding to this account. To use this option, the secure administrator session feature must be deactivated.

See Source Credentials Leading Practices for details.

"Content to include" section

Consider changing the default value of the parameters in this section to fine-tune how your Confluence site is crawled.

Space type

Select which spaces you want to index. By default, global space content is indexed and personal space content isn’t.

Space status

Select which spaces should be indexed, depending on their status. Options are:

Current (non-archived spaces)
Archived

Space filter

If you want to index only a subset of a Confluence site, enter a regex that the desired spaces match. This parameter is especially useful when you want to index spaces that have an element in common in their space keys.

Example

You want to index all spaces with keys starting with an uppercase letter followed by a number, so you enter the following regex:

^[A-Z][0-9].*$
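Before saving the filter, you may want to try the expression against a few space keys. The Python sketch below does this with the `re` module. The sample keys are made up for illustration, and Python's regex engine may not behave identically to the one Coveo uses, so treat the result as a sanity check rather than a guarantee.

```python
import re

# The regex from the example above.
SPACE_FILTER = re.compile(r"^[A-Z][0-9].*$")

# Hypothetical space keys, for illustration only.
sample_keys = ["A1DOCS", "B2", "HR", "a1docs", "1A", "Z9-archive"]

matching = [key for key in sample_keys if SPACE_FILTER.match(key)]
print(matching)  # ['A1DOCS', 'B2', 'Z9-archive']
```

Keys such as `HR` (no digit in second position) and `a1docs` (lowercase first letter) don't match, so the corresponding spaces would be left out under this filter.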

Options

Select the items to index:

Attachments (binary files attached to a page, blog post, or comment)
Comments (on blog posts and pages)

Note

Comments are indexed as page metadata rather than as items.

"Content security" tab

Select who will be able to access the source items through a Coveo-powered search interface. For details on this parameter, see Content security.

"Access" tab

In the Access tab, set whether each group (and API key, if applicable) in your Coveo organization can view or edit the current source.

For example, when creating a new source, you could decide that members of Group A can edit its configuration while Group B can only view it.

See Custom access level for more information.

Completion

Finish adding or editing your source:

When you want to save your source configuration changes without starting a build/rebuild, such as when you know you want to make other changes soon, click Add source/Save.

When you’re done editing the source and want to make changes effective, click Add and build source/Save and rebuild source.

Note

On the Sources (platform-ca | platform-eu | platform-au) page, you must click Launch build or Start required rebuild in the source Status column to add the source content or to make your changes effective, respectively.

Back on the Sources (platform-ca | platform-eu | platform-au) page, you can follow the progress of your source addition or modification.

Once the source is built or rebuilt, you can review its content in the Content Browser.

Once your source is done building or rebuilding, review the metadata Coveo is retrieving from your content.

On the Sources (platform-ca | platform-eu | platform-au) page, click your source, and then click More > View and map metadata in the Action bar.
If you want to use metadata that isn’t currently indexed in a facet or result template, map it to a field.
1. Click the metadata and then, at the top right, click Add to Index.
2. In the Apply a mapping on all item types of a source panel, select the field you want to map the metadata to, or add a new field if none of the existing fields are appropriate.
  Notes
  
  For details on configuring a new field, see Add or edit a field.
  
  For advanced mapping configurations, like applying a mapping to a specific item type, see Manage mappings.
3. Click Apply mapping.

Depending on the source type you use, you may be able to extract additional metadata from your content. You can then map that metadata to a field, just like you did for the default metadata.

More on custom metadata extraction and indexing

Some source types let you define rules to extract metadata beyond the default metadata Coveo discovers during the initial source build.

For example:

Source type	Custom metadata extraction methods
Push API	Define metadata key-value pairs in the `addOrUpdate` section of the `PUT` request payload used to upload push operations to an Amazon S3 file container.
REST API and GraphQL API	In the JSON configuration (REST API | GraphQL API) of the source, define metadata names (REST API | GraphQL API) and specify where to locate the metadata values in the JSON API response Coveo receives.
Database	Add `<CustomField>` elements in the XML configuration. Each element defines a metadata name and the database field used to populate it.
Web	Configure web scraping configurations that contain metadata extraction rules using CSS or XPath selectors. Extract metadata from JSON-LD `<script>` tags.
Sitemap	Extract metadata included in the XML sitemap file. Configure web scraping configurations that contain metadata extraction rules using CSS or XPath selectors. Extract JSON-LD `<script>` tag metadata. Extract `<meta>` tag content using the `IndexHtmlMetadata` JSON parameter.

Some source types automatically map metadata to default or user-created fields, making the mapping process unnecessary. Some source types automatically create mappings and fields for you when you configure metadata extraction.

See your source type documentation for more details.

When you’re done reviewing and mapping metadata, return to the Sources (platform-ca | platform-eu | platform-au) page.
To reindex your source with your new mappings, click Launch rebuild in the source Status column.
Once the source is rebuilt, you can review its content in the Content Browser.

Confluence server authentication error

The connector may produce an authentication error if a CAPTCHA is required to log in to your Confluence administrator crawling account. This CAPTCHA appears if you fail to log in to your account three or more times.

To avoid this error, you can disable login CAPTCHAs entirely on the Security Configuration page. See Configuring Captcha for Failed Logins for detailed instructions.

Indexing page properties

By default, Coveo doesn’t index page or blog post properties (metadata.properties). To index them, edit your source’s JSON configuration to specify the desired properties.

In the Configuration tab of the Edit configuration with JSON panel, add "MetadataPropertiesToExpand": "<VALUES>", where <VALUES> are the properties you want to index, separated by commas.

Example: "MetadataPropertiesToExpand": "owner,status"

To refer to a property nested within another, concatenate their names with a dot (.) separator.

Example: "MetadataPropertiesToExpand": "owner.lastname,status"
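If you maintain the list of properties elsewhere, you can generate the comma-separated value instead of typing it by hand. The following Python sketch builds the entry from a list of property names. The helper name `metadata_properties_entry` is an assumption for this example, and the sketch doesn't show where the entry sits within the full JSON configuration, since that isn't detailed here.

```python
import json


def metadata_properties_entry(properties):
    """Build the MetadataPropertiesToExpand entry from property names.

    Hypothetical helper: nested properties are written with a dot
    separator, for example "owner.lastname".
    """
    cleaned = [p.strip() for p in properties if p.strip()]
    if not cleaned:
        raise ValueError("Provide at least one property name")
    # The parameter expects a single comma-separated string.
    return {"MetadataPropertiesToExpand": ",".join(cleaned)}


entry = metadata_properties_entry(["owner.lastname", "status"])
print(json.dumps(entry))
# {"MetadataPropertiesToExpand": "owner.lastname,status"}
```

The printed key-value pair is what you would then add in the Edit configuration with JSON panel.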

Limitation

When indexing content with the Crawling Module, make sure not to change how space characters are encoded in an item’s URI, as Coveo uses URIs to distinguish items.

For example, an item whose URI would change from example.com/my first item to example.com/my%20first%20item wouldn’t be recognized as the same by Coveo. As a result, it would be indexed twice, and the older version wouldn’t be deleted.

Item URIs are displayed in the Content Browser (platform-ca | platform-eu | platform-au). We recommend you check where these URIs come from before making changes that affect space character encoding. Depending on your source type, the URI may be an item’s URL, or it may be built out of pieces of metadata by your source mapping rules. For example, your item URIs may consist of the main site URL plus the item filename, due to a mapping rule such as example.com/%[filename]. In such a case, changing space encoding in the item filename could impact the URI.
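The effect of an encoding change can be illustrated with a plain string comparison. The Python sketch below uses `urllib.parse` to show that the two forms of the example URI are different strings even though they decode to the same value. It is an illustration only and makes no claim about how Coveo compares URIs internally. The helper name `differs_only_by_encoding` is an assumption for this example.

```python
from urllib.parse import quote, unquote

raw_uri = "example.com/my first item"
# Percent-encode the spaces while leaving "/" and ":" untouched.
encoded_uri = quote(raw_uri, safe="/:")

print(encoded_uri)             # example.com/my%20first%20item
print(raw_uri == encoded_uri)  # False


def differs_only_by_encoding(uri_a: str, uri_b: str) -> bool:
    """Hypothetical check: True when two distinct URI strings decode
    to the same value, which hints at a duplicate caused by encoding."""
    return uri_a != uri_b and unquote(uri_a) == unquote(uri_b)


print(differs_only_by_encoding(raw_uri, encoded_uri))  # True
```

Running a check like this on URIs exported from the Content Browser could help spot items that were indexed twice after an encoding change.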

Required privileges

You can assign privileges to allow access to specific tools in the Coveo Administration Console. The following table indicates the privileges required to view or edit elements of the Sources (platform-ca | platform-eu | platform-au) page and associated panels. See Manage privileges and Privilege reference for more information.

Note

The Edit all privilege isn’t required to create sources. When granting privileges for the Sources domain, you can grant a group or API key the View all or Custom access level, instead of Edit all, and then select the Can Create checkbox to allow users to create sources. See Can Create ability dependence for more information.

Actions	Service	Domain	Required access level
View sources, view source update schedules, and subscribe to source notifications	Content	Fields	View
	Content	Sources	View
	Organization	Organization	View
Edit sources, edit source update schedules, and view the View and map metadata subpage	Content	Fields	Edit
	Content	Sources	Edit
	Content	Source metadata	View
	Organization	Organization	View

What’s next?

Schedule source updates.
If you’re using the Crawling Module to retrieve your content, consider subscribing to deactivation notifications to receive an alert when a Crawling Module component becomes obsolete and stops the content crawling process.