Add a Database source

A Database source indexes the content of a database. You can then make this content searchable through a Coveo-powered search interface.

For example, your company developer may have created a custom database to manage the parts used in your facilities, their location in your warehouse, and purchase orders. You could decide to index data regarding purchase orders only so that your buyers can find this content via your Coveo-powered employee portal.

Source key characteristics

The following table presents the main characteristics of a Database source.

Features Supported Additional information

Content update operations

refresh

check

See Enabling the Refresh Capability.

rescan

check

Takes place every day by default.

rebuild

check

Content security options

Same users and groups as in your content system

check

Specific users and groups

check

Everyone

check

metadata indexing for search

Automapping of metadata to a field with a matching name

Automatically indexed metadata

Sample of auto-populated fields (when the source configuration XML includes the corresponding element):

After a content update operation, inspect your item field values in the Content Browser.

Collected indexable metadata

Source configuration <CustomField> elements specified in the source whose name attribute values don’t match a Coveo default field name are collected, but not indexed by default.

After a rebuild, review the View and map metadata subpage for the list of indexed metadata and to index additional metadata from those available.

Tip
Leading practice

The number of items that a source processes per hour (crawling speed) depends on various factors, such as network bandwidth and source configuration. See About crawling speed for information on what can impact crawling speed, as well as possible solutions.

Supported database types

The cloud version (cloud-icon) of the source supports the following database types:

  • Databricks

  • Microsoft SQL Server

  • MySQL

  • Postgres

  • Oracle

  • Redshift

The Crawling Module version (crawlingmodule) supports the following database types:

  • Microsoft SQL Server

  • MySQL

  • Postgres

  • Oracle

  • Any database using an ODBC driver. This driver must be installed on your Crawling Module server.

Commerce requirements

When using a Database source to index commerce-specific content, such as products, variants, and availabilities, you have to undergo a catalog configuration process to benefit from all commerce-related capabilities.

More specifically, you must:

Enable Coveo Personalization-as-you-go

Coveo Machine Learning tools include Coveo Personalization-as-you-go (PAYG) capabilities for commerce use cases. This suite of advanced features learns from a user’s intent and reacts within a few clicks. PAYG models require the building of a product vector space to represent the products contained in your source. For Database sources, Coveo PAYG needs to be enabled in order to produce the product vector space. Contact your Coveo representative to discuss your options.

Catalog configuration

Behind the scenes, the Database source uses the Stream API to push content to the Coveo index. Therefore, Database sources must be associated with a catalog entity to ensure a complete configuration. This allows the source to accurately build a product vector space.

For instructions on how to create a catalog entity, see Commerce catalog entity.

Add a Database source

Follow the instructions below to add a Database source using the desired content retrieval method.

  1. On the Sources (platform-ca | platform-eu | platform-au) page, click Add source.

  2. In the Add a source of content panel, click the Cloud (cloud-icon) or Crawling Module (crawlingmodule) tab, depending on your content retrieval context. With the latter, you must install the Crawling Module to make your source operational.

  3. Click the Database tile.

  4. Configure your source.

Tip
Leading practice

It’s best to create or edit your source in your sandbox organization first. Once you’ve confirmed that it indexes the desired content, you can copy your source configuration to your production organization, either with a snapshot or manually.

See About non-production organizations for more information and best practices regarding sandbox organizations.

"Configuration" tab

On the Add a Database Source page, the Configuration tab is selected by default. It contains your source’s general and authentication information, as well as other parameters.

"Identification" subtab

The Identification subtab contains general information about the source.

Name

Enter a name for your source.

Tip
Leading practice

A source name can’t be modified once it’s saved, therefore be sure to use a short and descriptive name, using letters, numbers, hyphens (-), and underscores (_). Avoid spaces and other special characters.

Project

Use the Project selector to associate your source with one or more Coveo projects.

"Connection parameters" subtab

Enter your database connection parameters or a connection string. Coveo will use this information to access your database.

Since the connection string syntax differs from one database type to another, you might want to see The connection strings reference for details.

When providing credentials to access your database, enter the credentials of a dedicated user account that has access to the content you want to index. See Source credentials leading practices.

Notes

When you paste a connection string into the Connection string box:

  • You may notice that Coveo slightly reformats your connection string or reorders parameters.

  • Coveo automatically parses your connection string to fill connection parameter boxes. Depending on the Data provider type, only the Username and Password boxes may be displayed and filled.

  • Coveo replaces the credentials in the connection string with placeholders @uid and @pwd. Since connection strings aren’t encrypted, they should never contain credentials in plain text.

    Coveo Database Connection String Credential Security

These behaviors are normal and don’t indicate that your connection string is invalid.

About ODBC drivers

If you’re using a Crawling Module source to index content from a database using an ODBC driver, ensure that you have the appropriate ODBC driver installed on your Crawling Module server.

A driver acts as a intermediary between Coveo’s connector and your database server. For example, to connect Coveo to an Oracle source, you must install an Oracle driver. ODBC drivers are typically provided by the database vendor. You can also use ODBC Data Source Administrator to review the drivers installed on your server.

"Content to index" subtab

Item types

Enter the name of the tables or view objects you want to index.

XML configuration

For each item you entered above, add the corresponding XML read query to define what to index and how. Specify the data to copy from database fields to Coveo fields. See the reference page to help you in the process.

Important

Your XML queries should be read queries only, or else you could make unwanted changes to your database.

If needed, you can run subqueries on every item to complete the information with more complex queries. See also Further configuration for other options you can implement.

"Advanced settings" subtab

The advanced settings are optional. They let you fine-tune the behavior of your source in some specific situations.

Command timeout

Try increasing the timeout value when:

  • Your database is slow to respond to queries.

  • Your queries take long to execute.

  • You encounter database timeout errors.

We recommend first increasing the value to 120 seconds. If the issue persists, you can either increase the timeout value again or use paged SQL queries instead.

Maximum response page size

This is the maximum number of items a database can return at once. If you know your query will return a large number of items and this may affect your database’s stability, try decreasing this value.

Content and images

If you want Coveo to extract text from image files or PDF files containing images, enable the appropriate option.

The extracted text is processed as item data, meaning that it’s fully searchable and will appear in the item Quick view. See Enable optical character recognition for details on this feature.

"Crawling Module" subtab

If your source is a Crawling Module source, and if you haven’t already installed the Coveo Crawling Module on a server that has access to the database of which you want to retrieve the content, click Download in the information box to do so.

If you have more than one Crawling Module linked to this organization, select the one with which you want to pair your source. If you change the Crawling Module instance paired with your source, a successful rebuild is required for your change to apply.

"Content security" tab

Select who will be able to access the source items through a Coveo-powered search interface. For details on this parameter, see Content security.

If you select Everyone or Specific users or groups, skip to "Access" tab.

If you select the Same users and groups as in your current permission system option to index a permission system, ensure that you’ve defined AllowedUsers nodes in your XML queries.

Moreover, check the Permission system is Active Directory box if applicable. If so, you’ll need to enter your Active Directory details.

Active Directory details

Username and password

Enter credentials to grant Coveo access to your Active Directory. See Source credentials leading practices.

Email attributes

By default, Coveo retrieves the email address associated to each security identity from the mail attribute. Optionally, you can specify additional or different attributes to check. Should an attribute contain more than one value, Coveo uses the first one.

Use TLS protocol to connect to Active Directory

Select this option to use a TLS protocol to retrieve your security identities. If you do, we strongly recommend selecting StartTLS if you can. Since LDAPS is a much older protocol, you should only select this value if StartTLS is incompatible with your environment.

The database content is secured with well-known SIDs

Select this option if you want the users that are included in your Active Directory well-known security identifiers to be granted access to the indexed content. Expect an increase in the duration of the security identity provider refresh operation. Supported well-known SIDs are: Everyone, Authenticated Users, Domain Admins, Domain Users, and Anonymous Users.

Tip
Leading practice

If your entire content is secured with Everyone or Authenticated users, we recommend selecting the Everyone content security option instead. The result will be the same, that is, all users will be able to access the database content through your search interface, and Coveo’s update operations will be more efficient.

Bidirectional relationship between the root domain of your Crawling Module server and your other domains

Select this option to have Coveo connect to your root domain to get the security identities of your other domains through the root domain.

If your environment contains more than one domain, you can establish a bidirectional or outbound cross-link relationship between the root domain of your Crawling Module server and your additional domains. When you do so, these domains trust your root domain, and Coveo can get their security identities through this root domain.

However, when enabling this option, you should expect an increase in the duration of the security identity provider refresh operation. Moreover, if a linked domain is unreachable, Coveo stops the security identity provider refresh operation.

"Access" tab

In the Access tab, specify whether each group (and API key, if applicable) in your Coveo organization can view or edit the current source.

For example, when creating a new source, you could decide that members of Group A can edit its configuration, while Group B can only view it.

For more information, see Custom access level.

Completion

  1. Click Add source. This will save your source without launching a build to index your content. This is useful if you plan on making changes shortly or if, while writing your XML queries, you decided to populate fields that don’t already appear on the Fields (platform-ca | platform-eu | platform-au) page. In such case, you must add these new fields before building your source.

  2. Ensure that your source correctly maps all the fields to populate. In sources created before January 10, 2024, if a field doesn’t have a mapping, you must create one. In sources created after that date, Coveo will automatically populate fields whose name exactly matches a metadata key.

    Example

    In your content, profile picture URIs are stored under profilepic. If a pictureuri fields exists in your organization, you’ll create the following mapping to populate the pictureuri field with picture URIs: %[profilepic].

    However, if a profilepic field already exists in your organization, Coveo will automatically populate it with the profilepic value extracted from your content.

  3. On the Sources (platform-ca | platform-eu | platform-au) page, you must click Launch build or Launch rebuild to make your changes effective.

Crawling Module time zone adjustment

If you’re using the Crawling Module to index your content and if this Crawling Module is running in a different time zone than your database, you must edit the source JSON configuration and add the TzdbTimeZoneId hidden parameter as follows. The expected time zone format is IANA TZDB.

Example

The time zone of your database is America/New_York, therefore you add the following to the source JSON configuration:

"TzdbTimeZoneId": {
"sensitive": false,
"value": "America/New_York"
}

Limitation

When indexing content with the Crawling Module, ensure not to change space character encoding in an item’s URI, as Coveo uses URIs to distinguish items.

For example, an item whose URI would change from example.com/my first item to example.com/my%20first%20item wouldn’t be recognized as the same by Coveo. As a result, it would be indexed twice, and the older version wouldn’t be deleted.

Item URIs are displayed in the Content Browser (platform-ca | platform-eu | platform-au). We recommend you check where these URIs come from before making changes that affect space character encoding. Depending on your source type, the URI may be an item’s URL, or it may be built out of pieces of metadata by your source mapping rules. For example, your item URIs may consist of the main site URL plus the item filename, due to a mapping rule such as example.com/%[filename]. In such a case, changing space encoding in the item filename could impact the URI.

Required privileges

You can assign privileges to allow access to specific tools in the Coveo Administration Console. The following table indicates the privileges required to view or edit elements of the Sources (platform-ca | platform-eu | platform-au) page and associated panels. See Manage privileges and Privilege reference for more information.

Note

The Edit all privilege isn’t required to create sources. When granting privileges for the Sources domain, you can grant a group or API key the View all or Custom access level, instead of Edit all, and then select the Can Create checkbox to allow users to create sources. See Can Create ability dependence for more information.

Actions Service Domain Required access level

View sources, view source update schedules, and subscribe to source notifications

Content

Fields

View

Sources

Organization

Organization

Edit sources, edit source update schedules, and edit source mappings

Organization

Organization

View

Content

Fields

Edit

Sources

View and map metadata

Content

Source metadata

View

Fields

Organization

Organization

Content

Sources

Edit

What’s next?