Add a Confluence Data Center source
Confluence Data Center is a knowledge sharing tool that enables users to create and share content. Members with the required privileges can add the content of a Confluence Data Center instance to a Coveo organization.
Leading practice
The number of items that a source processes per hour (crawling speed) depends on various factors, such as network bandwidth and source configuration. See About crawling speed for information on what can impact crawling speed, as well as possible solutions.
Source key characteristics
The following table presents the main characteristics of a Confluence Data Center source.
| Features | Supported | Additional information | |
|---|---|---|---|
Indexable content |
Spaces, pages (such as Wiki pages), blog posts, pages and blog posts comments (indexed as metadata), and attachments (in pages, blog posts, and comments). |
||
Requires Coveo’s plugin to be fully functional. |
|||
Takes place every day by default. If you change the name of a Confluence space, the rescan operation detects the change only for pages created or modified after the change. You must therefore rebuild the source to get the space name changed on all space pages. |
|||
Content security options |
Requires Coveo’s plugin. |
||
Automatic mapping of metadata to fields that have the same name |
This setting is disabled by default and not recommended for this source type. |
||
Automatically indexed metadata |
Examples of auto-populated default fields (no user-defined metadata required):
After a content update, inspect your item field values in the Content Browser. |
||
Extracted but not indexed metadata |
The Confluence Data Center source extracts some of the metadata that the Confluence API makes available. After a rebuild, review the View and map metadata subpage for the list of indexed metadata, and index additional metadata. |
||
Custom metadata extraction |
Confluence lets you add custom properties to pages, blog posts, attachments, and spaces using the Confluence API.
Then, you can use the |
||
Requirements
Atlassian Confluence Data Center accessible to Coveo
When access to the communication ports between Coveo and your Confluence Data Center instance is restricted, you must open the appropriate ports in your network infrastructure (for example, in firewalls) so that Coveo can access the content.
Crawling account
To index Confluence Data Center content, you must provide the credentials of a Confluence account. Coveo will use this crawling account to retrieve your content.
Your crawling account must:
-
Be a native Confluence account (not managed by an identity provider such as Google).
-
Be a member of a Confluence group named coveo-connector-plugin-users or be a Confluence administrator. This allows the account to use the plugin.
-
Have read access to all the content you want to index, thanks to the following permissions:
-
View on every space to index.
-
Can view on every restricted page to index.
-
Have login CAPTCHAs disabled.
-
Be used by Coveo only.
Coveo plugin for Confluence
To complete the capabilities of its Confluence Data Center source, Coveo offers a plugin on the Atlassian Marketplace.
You must install it to benefit from the following features:
-
Refreshes: the plugin allows the source to run refresh operations that index deleted, restored, and moved items. Without the plugin, you’d need to run a rescan or rebuild operation instead, which would take more time and resources to complete. Therefore, the plugin allows your search interface to reflect content changes faster, with minimal impact on your Confluence Data Center instance.
-
Permission indexing: the plugin allows the source to index item permissions, which will then be replicated in your Coveo search interface. As a result, users of your search interface only see the content they’re allowed to see in Confluence. If they’ve been forbidden to access an article in Confluence, they won’t see it in their search results. To read more on how Coveo manages permissions, see Management of security identities and item permissions.
About the new plugin version
In July 2024, Coveo released version 2 of the plugin.
Starting with this new version, the plugin doesn’t require your crawling account to be a Confluence administrator anymore.
Instead, you can create a Confluence group named coveo-connector-plugin-users, and then add your crawling account to this group.
This will allow your crawling account to use the plugin.
If you installed the Coveo plugin before July 2024, you’re using version 1, and you can continue to do so. However, to remove the Confluence administrator permission requirement, upgrade to version 2 of the plugin.
Enabling the Confluence SOAP remote API (Web Service)
Due to a Confluence REST API limitation, the connector must use the SOAP Remote API to retrieve content permissions. For these permissions to be replicated in a Coveo-powered search interface, a Confluence system administrator must enable the remote API on your Confluence instance.
Add a Confluence Data Center source
A Confluence Data Center source indexes on-premises (server) content. To retrieve cloud content instead, see Add a Confluence Cloud source.
Leading practice
It’s best to create or edit your source in your sandbox organization first. Once you’ve confirmed that it indexes the desired content, you can copy your source configuration to your production organization, either with a snapshot or manually. See About non-production organizations for more information and best practices regarding sandbox organizations.
Follow the instructions below to add a Confluence Data Center source that uses the desired content retrieval method.
-
On the Sources (platform-ca | platform-eu | platform-au) page, click Add source.
-
In the Add a source of content panel, click the On-premises or the Crawling Module tab, depending on your content retrieval context. With the latter, you must install the Crawling Module to make your source operational.
-
Click the Confluence Data Center tile.
-
In the Add a new Confluence Data Center source panel, provide the following information:
-
Name: The source name can’t be modified once it’s saved. Therefore, make sure to use a short and descriptive name, using letters, numbers, hyphens, and underscores. Avoid spaces and other special characters.
-
Address: The base URL of your Confluence Data Center instance. Make sure to include the protocol (https:// or http://). Alternatively, you can enter the URL of a specific Confluence Data Center space, provided it matches one of the following patterns. If it doesn’t, see Supporting other space URLs.
-
[URL]/spaces/viewspaces.actions?key=[KEY]
-
[URL]/display/[KEY]
-
[URL]/spaces/[KEY]
-
Authentication: How Coveo should log in to your Confluence Data Center instance to index your content.
If you select Personal access token
-
Create a crawling account dedicated to your source. This account must have access to all the content that you want to index. See Source credentials leading practices for other leading practices to follow.
-
Create a personal access token for your crawling account.
-
Back in the source configuration panel, enter this access token.
If you select Basic authentication
-
Create a crawling account dedicated to your source. This account must have access to all the content that you want to index. See Source credentials leading practices for other leading practices to follow.
-
Make sure the secure administrator session feature is deactivated.
-
Back in the source configuration panel, enter the username and password of this account.
If you select No login - Content to index is public
-
Click Next.
-
Select who will be able to access the source items through a Coveo-powered search interface. For details on this parameter, see Content security.
-
Click Add source.
-
Specify your source settings. Refer to the following sections for detailed information on the source settings:
"Configuration" tab
When configuring or editing your Confluence Data Center source, the Configuration tab is selected by default. It contains your source’s general and authentication information, as well as other parameters that let you specify the content to index.
"Content to index" subtab
The Content to index subtab lets you define the content that you want to make available as search results.
Spaces
In the Spaces card, specify whether you want to index communal and/or personal spaces. Then, specify whether you want to index active and/or archived spaces.
If you want to index specific spaces only, enter a regex matching the desired space keys under Space key. Alternatively, you can enter a regex to exclude specific spaces from the indexing process. When crawling your Confluence instance, Coveo then targets or ignores the spaces whose space key matches your regex.
For instance, to index all spaces whose key starts with an uppercase letter followed by a digit, enter the following regex: ^[A-Z][0-9].*$.
Conversely, enter ^(?!Spacekey$).*$ to exclude the space with the key Spacekey.
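Before saving a space key regex, it can help to try it against a few keys locally. The following is a minimal sketch in Python that applies the two patterns above to a list of hypothetical space keys. It assumes that the regex engine used by the source treats these patterns the same way as Python's `re` module, which the documentation above doesn't state.

```python
import re

# Patterns taken from the two examples above.
INCLUDE_PATTERN = re.compile(r"^[A-Z][0-9].*$")
EXCLUDE_PATTERN = re.compile(r"^(?!Spacekey$).*$")

# Hypothetical space keys, for illustration only.
space_keys = ["A1DOCS", "B2", "Spacekey", "Spacekey2", "docs", "AB1"]


def matching_keys(pattern, keys):
    """Return the keys that the pattern matches from their first character."""
    return [key for key in keys if pattern.match(key)]


# Keys starting with an uppercase letter followed by a digit.
included = matching_keys(INCLUDE_PATTERN, space_keys)

# Every key except the exact key "Spacekey".
kept_after_exclusion = matching_keys(EXCLUDE_PATTERN, space_keys)

print(included)
print(kept_after_exclusion)
```

Note that the exclusion pattern only rejects the exact key Spacekey. A key such as Spacekey2 still matches, because the negative lookahead is anchored with `$`.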
Additional content
Optionally, you can index the files attached to the indexed pages, blog posts, and comments.
You can also index comments posted on pages and blog posts. These comments will be indexed as metadata of this content.
"Authentication" subtab
The Authentication subtab contains settings used by the source crawler to emulate the behavior of a user authenticating to access restricted Confluence Data Center content.
Confluence address: The base URL of your Confluence Data Center instance.
Make sure to include the protocol (https:// or http://).
Alternatively, you can enter the URL of a specific Confluence Data Center space, provided it matches one of the following patterns.
If it doesn’t, see Supporting other space URLs.
-
[URL]/spaces/viewspaces.actions?key=[KEY]
-
[URL]/display/[KEY]
-
[URL]/spaces/[KEY]
Authentication: How Coveo should log in to your Confluence Data Center instance to index your content.
If you select Personal access token
-
Create a crawling account dedicated to your source. This account must have access to all the content that you want to index. See Source credentials leading practices for other leading practices to follow.
-
Create a personal access token for your crawling account.
-
Back in the source configuration panel, enter this access token.
If you select Basic authentication
-
Create a crawling account dedicated to your source. This account must have access to all the content that you want to index. See Source credentials leading practices for other leading practices to follow.
-
Make sure the secure administrator session feature is deactivated.
-
Back in the source configuration panel, enter the username and password of this account.
If you select No login
Go to the Content security tab, and then select who will be able to access the source items through a Coveo-powered search interface. For details on this parameter, see Content security.
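To check crawling account credentials from your own machine before entering them in the source configuration, you could build a test request against the Confluence REST API. The sketch below is not part of the Coveo configuration and makes several assumptions: the instance URL, token, username, and password are placeholders; the `/rest/api/space` endpoint is reachable on your instance; and personal access tokens are sent as a Bearer token, as Atlassian documents for Data Center. The request is only built here, not sent.

```python
import base64
import urllib.request


def build_request(base_url, token=None, username=None, password=None):
    """Build (but don't send) a GET request that lists one Confluence space."""
    url = base_url.rstrip("/") + "/rest/api/space?limit=1"
    request = urllib.request.Request(url, method="GET")
    if token is not None:
        # Personal access token: sent as a Bearer token (assumption, see lead-in).
        request.add_header("Authorization", f"Bearer {token}")
    elif username is not None and password is not None:
        # Basic authentication: base64-encoded "username:password".
        raw = f"{username}:{password}".encode("utf-8")
        encoded = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", f"Basic {encoded}")
    # With neither a token nor a username/password pair, the request is anonymous.
    request.add_header("Accept", "application/json")
    return request


# Placeholder values, for illustration only.
token_request = build_request("https://confluence.example.com/", token="PLACEHOLDER_TOKEN")
basic_request = build_request("https://confluence.example.com", username="svc", password="pw")

# To actually send a request: urllib.request.urlopen(token_request)
print(token_request.full_url)
```

A successful response to such a request suggests that the account can authenticate, but it doesn't confirm that the account has read access to every space and restricted page you want to index.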
"Identification" subtab
The Identification subtab contains general information about the source.
"Items" tab
On the Items tab, you can specify how the source handles items based on their file type or content type.
File types
File types let you define how the source handles items based on their file extension or content type. For each file type, you can specify whether to index the item content and metadata, only the item metadata, or neither.
You should fine-tune the file type configurations with the objective of indexing only the content that’s relevant to your users.
For example, suppose your repository contains .pdf files, but you don’t want them to appear in search results.
In that case, click Extensions and then, for the .pdf extension, change the Default action and Action on error values to Ignore item.
For more details about this feature, see File type handling.
Content and images
If you want Coveo to extract text from image files or PDF files containing images, enable the appropriate option. The extracted text is processed as item data, meaning that it’s fully searchable and will appear in the item Quick view.
Note
When OCR is enabled, ensure the source’s relevant file type configurations index the item content. Indexing the item’s metadata only or ignoring the item will prevent OCR from being applied.
See Enable optical character recognition for details on this feature.
"Content security" tab
Select who will be able to access the source items through a Coveo-powered search interface. For details on the content security options, see Content security.
"Access" tab
On the Access tab, specify whether each group (and API key, if applicable) in your Coveo organization can view or edit the current source.
For example, when creating a new source, you could decide that members of Group A can edit its configuration, while Group B can only view it.
For more information, see Custom access level.
Build the source
-
Finish adding or editing your source:
-
When you’re done editing the source and want to make your changes effective, click Add and build source/Save and rebuild source.
-
When you want to save your source configuration changes without starting a build/rebuild, such as when you know you want to make other changes soon, click Add source/Save. On the Sources (platform-ca | platform-eu | platform-au) page, click Launch build or Start required rebuild when you’re ready to make your changes effective and index your content.
-
On the Sources (platform-ca | platform-eu | platform-au) page, follow the progress of your source addition or modification.
-
Once the source is built or rebuilt, review its content in the Content Browser.
Index metadata
To use metadata values in search interface facets or result templates, the metadata must be mapped to fields. Coveo automatically maps only a subset of the metadata it extracts. You must map any additional metadata to fields manually.
Note
Not clear on the purpose of indexing metadata? Watch this video.
-
On the Sources (platform-ca | platform-eu | platform-au) page, click your source, and then click More > View and map metadata in the Action bar.
-
Review the default metadata that your source is extracting from your content.
-
Map any currently not indexed metadata that you want to use in facets or result templates to fields.
-
Click the metadata and then, at the top right, click Add to Index.
-
In the Apply a mapping on all item types of a source panel, select the field you want to map the metadata to, or add a new field if none of the existing fields are appropriate.
Note: For advanced mapping configurations, like applying a mapping to a specific item type, see Manage mappings.
-
Click Apply mapping.
-
Return to the Sources (platform-ca | platform-eu | platform-au) page.
-
To reindex your source with your new mappings, click your source, and then click More > Rebuild in the Action bar.
-
Once the source is rebuilt, review your item field values. They should now include the values of the metadata you selected to index.
-
On the Sources (platform-ca | platform-eu | platform-au) page, click your source, and then click More > Open in Content Browser in the Action bar.
-
Select the card of the item for which you want to inspect properties, and then click Properties in the Action bar.
-
In the panel that appears, select the Fields tab.
-
If needed, extract and map additional metadata.
More on custom metadata extraction
Confluence lets you add custom properties to pages, blog posts, attachments, and spaces using Content Property and Space Property POST requests. You can then extract these custom properties.
-
On the Sources (platform-ca | platform-eu | platform-au) page, click your source, and then click More > Edit configuration with JSON in the Action bar.
-
On the Parameters tab, locate the MetadataPropertiesToIndex parameter in the JSON configuration box.
-
Add the desired custom property names in the value of the MetadataPropertiesToIndex parameter as a comma-separated list. To refer to a property nested within another, concatenate their names with a dot (.) separator.
For example:
"MetadataPropertiesToIndex": {
  "sensitive": false,
  "value": "categorization.department,categorization.team"
}
-
Rebuild and map each extracted custom metadata to a field, as you did for the default metadata. In the View and map metadata page, the metadata names will have the metadata.properties prefix. For example: metadata.properties.categorization.department.
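The naming rules in the steps above can be sketched in Python as follows. The example derives the dot-separated names, the MetadataPropertiesToIndex value, and the metadata names expected on the View and map metadata page. The custom_properties dictionary and the dotted_names helper are hypothetical; the sketch assumes the custom property is a JSON object named categorization containing department and team, which matches the example above but may differ from how your properties are structured.

```python
import json


def dotted_names(properties, prefix=""):
    """Flatten nested property names into dot-separated paths."""
    names = []
    for key, value in properties.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # A nested property: keep descending and extend the path.
            names.extend(dotted_names(value, path))
        else:
            names.append(path)
    return names


# Hypothetical custom property set on a Confluence page.
custom_properties = {"categorization": {"department": "Finance", "team": "Payroll"}}

names = dotted_names(custom_properties)

# Parameter fragment to place in the source JSON configuration.
parameter = {
    "MetadataPropertiesToIndex": {"sensitive": False, "value": ",".join(names)}
}

# Names expected on the View and map metadata page after a rebuild.
metadata_names = [f"metadata.properties.{name}" for name in names]

print(json.dumps(parameter, indent=2))
print(metadata_names)
```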
Confluence Data Center authentication error
The connector may produce an authentication error if Confluence requires a CAPTCHA to log in with your crawling account. Confluence displays this CAPTCHA after three or more failed login attempts on the account.
To avoid this error, you can disable login CAPTCHAs entirely on the Security Configuration page. See Configuring Captcha for Failed Logins for detailed instructions.
Supporting other space URLs
In the Confluence address box, you can enter the URL of a specific Confluence space, provided it matches one of the following patterns:
-
[URL]/spaces/viewspaces.actions?key=[KEY]
-
[URL]/display/[KEY]
-
[URL]/spaces/[KEY]
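To check quickly whether a space URL fits one of these patterns, you could translate them into regular expressions. The following Python sketch is one possible reading of the three patterns, not the connector's actual matching logic: the regexes, the find_space_key helper, and the example URLs are all assumptions made for illustration.

```python
import re

# One possible translation of the three default patterns (assumption).
DEFAULT_SPACE_PATTERNS = [
    re.compile(r"^(?P<url>https?://.+)/spaces/viewspaces\.actions\?key=(?P<key>[^/&#]+)$"),
    re.compile(r"^(?P<url>https?://.+)/display/(?P<key>[^/?#]+)$"),
    re.compile(r"^(?P<url>https?://.+)/spaces/(?P<key>[^/?#]+)$"),
]


def find_space_key(address):
    """Return the space key if the address fits a default pattern, else None."""
    for pattern in DEFAULT_SPACE_PATTERNS:
        match = pattern.match(address)
        if match:
            return match.group("key")
    return None


# Hypothetical addresses, for illustration only.
print(find_space_key("https://confluence.example.com/display/DOCS"))
print(find_space_key("https://confluence.example.com/wiki/teams/DOCS"))
```

In this sketch, an address for which find_space_key returns None is a candidate for the custom configuration procedure.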
If your space address doesn’t match any of these default patterns, you can still index it by following these steps:
-
In the Confluence address box, enter the root URL of your Confluence instance. This address often ends with /wiki/.
-
Configure the source as you would to index your space, and then click Add source.
-
In the Coveo Administration Console, click the source you just created, and then click Edit configuration with JSON in the More menu.
-
In the Edit configuration with JSON panel that opens, locate the RootSpaceRegex object in the parameters object.
-
Change the regex pattern to match your space URL.
-
Scroll down to locate the startingAddresses array, and then, in place of your Confluence instance root URL, enter the URL of your space.
-
Click Save and rebuild source.
Required privileges
You can assign privileges to allow access to specific tools in the Coveo Administration Console. The following table indicates the privileges required to view or edit elements of the Sources (platform-ca | platform-eu | platform-au) page and associated panels. See Manage privileges and Privilege reference for more information.
Note
The Edit all privilege isn’t required to create sources. When granting privileges for the Sources domain, you can grant a group or API key the View all or Custom access level, instead of Edit all, and then select the Can Create checkbox to allow users to create sources. See Can Create ability dependence for more information.
| Actions | Service | Domain | Required access level |
|---|---|---|---|
| View sources, view source update schedules, and subscribe to source notifications | Content | Fields | View |
| | Content | Sources | View |
| | Organization | Organization | View |
| Edit sources, edit source update schedules, and edit source mappings | Organization | Organization | View |
| | Content | Fields | Edit |
| | Content | Sources | Edit |
| View and map metadata | Content | Source metadata | View |
| | Content | Fields | View |
| | Organization | Organization | View |
| | Content | Sources | Edit |
What’s next?
-
If you’re using the Crawling Module to retrieve your content, consider subscribing to deactivation notifications to receive an alert when a Crawling Module component becomes obsolete and stops the content crawling process.