Add or Edit a Database Source

A Database source allows members of the Administrators and Content Managers built-in groups to retrieve and make searchable the content of a local database via the Coveo On-Premises Crawling Module (see Coveo On-Premises Crawling Module).

Your company developer created a custom database to manage the parts used in your facilities, their location in your warehouse, and purchase orders. You decide to index data regarding purchase orders only so that your buyers can find this content via your Coveo-powered search page.

As an administrator or a content manager, you can add the content of a local database to a Coveo organization. In a Coveo-powered search interface, the source content is accessible to either everyone, the source creator only, or specific users as determined by source permissions (see Content Security).

Source Key Characteristics

Add or Edit a Database Source

Before you start, ensure that the Coveo On-Premises Crawling Module is installed on a server that has access to the database of which you want to retrieve the content. Both servers must use the same time zone to prevent indexing timing issues.

If the crawling module is running in a different time zone than the database, you must add a parameter by modifying the source JSON configuration from the Coveo Administration Console during the completion process.

Then, follow the instructions below when adding or editing your Database source.

“Configuration” Tab

In the Add/Edit a Database Source subpage, the Configuration tab is selected by default. It contains your source’s general and authentication information, as well as other parameters.

If you have not already installed the Coveo On-Premises Crawling Module on a server that has access to the database of which you want to retrieve the content, click Download Crawling Module to do so.

General Information

Source Name

Enter a name for your source.

Use a short and descriptive name, using letters, numbers, hyphens (-), and underscores (_). Avoid spaces and other special characters.

Connection String

Enter your database connection parameters. Since the connection string syntax differs from one database type to another, you might want to see The Connection Strings Reference for details.

Since connection strings aren’t encrypted, they should never contain credentials in plain text. You can hide the password and user ID in the connection string by respectively introducing the @uid and @pwd tokens. The Database source internally replaces the tokens with the information provided in the Authentication section.

You must provide either both tokens or none. If you don’t provide tokens, but add source credentials, the behavior will be the same as before, meaning that the credentials will be used to impersonate the process running the queries.

  • Basic connection string: Data Source=mydatabase.mycompany.com;Initial Catalog=MyDatabase;User Id=companyUser;Password=MyPassword

  • Hiding password and user ID using tokens: Data Source=mydatabase.mycompany.com;Initial Catalog=MyDatabase;User Id=@uid;Password=@pwd

Item Type

Enter the table or view object names (<Mapping type="name">) that you intend to index. You will later define them in the database configuration.

Driver Type

Select the software driver that provides access to your database.

To get a complete list of available drivers and therefore of supported database types, use the Crawling Module REST API GET call on /api/odbc/drivers (see Crawling Module REST API Reference and Getting the Available Drivers for an ODBC Source)

Paired Crawling Module

If your source is a Crawling Module source and if you have more than one Crawling Module linked to this organization, select the one with which you want to pair your source. If you change the Crawling Module instance paired with your source, a successful rebuild is required for your change to apply.

Character Optical Recognition (OCR)

Check this box if you want Coveo Cloud to extract text from image files or PDF files containing images. OCR-extracted text is processed as item data, meaning that it’s fully searchable and will appear in the item Quick View. See Enable Optical Character Recognition for details on this feature.

Index

When adding a source, if you have more than one logical (non-Elasticsearch) index in your organization, select the index in which the retrieved content will be stored (see Leverage Many Coveo Indexes). If your organization only has one index, this drop-down menu isn’t visible and you have no decision to make.

  • To add a source storing content in an index different than default, you need the View access level on the Logical Index domain (see Manage Privileges and Logical Indexes Domain).

  • Once the source is added, you can’t switch to a different index.

“Authentication” Section

Enter the Username and Password of a dedicated user account that has access to the content you want to index. See Source Credentials Leading Practices. Then, select one of the following options from the drop-down list:

If you selected Active Directory on-premises, you must also fill the Active Directory username and Active Directory password fields that appear since this option won’t work with only the dedicated user credentials. If you selected Native, skip to “Content Security” Tab.

  • If you leave the Active Directory username and Active Directory password fields blank, your current (dockerless) crawling module credentials will be automatically used.

  • Active Directory on-premises doesn’t affect the authentication of the Database source.

Active Directory Username and Active Directory Password

Enter credentials to grant Coveo Cloud access to your Active Directory.

Expand Well-Known SIDs

Check this box if you want the users included in your Active Directory well-known security identifiers to be granted access to the indexed content. Expect an increase in the duration of the security identity provider refresh operation. Supported well-known SIDs are: Everyone, Authenticated Users, Domain Admins, Domain Users, and Anonymous Users.

If your entire content is secured with the Everyone or Authenticated users well-known, it’s more cost-effective resource-wise to index it with a source whose content is accessible to everyone than to expand the well-known with a source that indexes permissions.

Enable TLS

Check this box to use a TLS protocol to retrieve your security identities. If you do, we strongly recommend selecting StartTLS if you can. Since LDAPS is a much older protocol, you should only select this value if StartTLS is incompatible with your environment.

Email Attributes

By default, Coveo Cloud retrieves the email address associated to each security identity from the mail attribute. Optionally, you can specify additional or different attributes to check. Should an attribute contain more than one value, Coveo Cloud uses the first one.

“Database Configuration” Section

In the XML configuration box, enter a XML-formatted configuration consisting of queries against your database to retrieve and copy the data from record fields to Coveo Cloud fields. This configuration should contain only read queries, or else you could make unwanted changes to your database.

You can also run subqueries on every item to complete the information with more complex queries. Moreover, if your query is complex and already configured as a stored procedure, you can leverage this procedure in your XML configuration.

If you’ve selected Active Directory on-premises in the Authentication section, you must further the allowedusers. For example:

  <?xml version="1.0" encoding="utf-8" ?>
  <ODBC>
  <CommonMapping>
  <AllowedUsers>
      <AllowedUser type="Windows" allowed="true">
      <Name>coveo\agroup</Name>
      <Server></Server>
      </AllowedUser>
  </AllowedUsers>
Defining Items to Delete

If you want to index certain database items depending on a characteristic that may vary, your configuration must include instructions as to which items are no longer relevant and can be deleted from the index.

You index MESSAGE items with a message_id that’s less than 100. The configuration is the following for your MS-SQL database:

  <?xml version="1.0" encoding="utf-8" ?>
  <ODBC>
  <CommonMapping>
  <AllowedUsers>
      <AllowedUser type="Windows" allowed="true">
      <Name>everyone</Name>
      <Server></Server>
      </AllowedUser>
  </AllowedUsers>
  </CommonMapping>
  <Mapping type="MESSAGE">
  <Accessor type="query"
              IncrementalRefreshFieldName="date">
  SELECT * FROM (
      SELECT ROW_NUMBER() OVER (ORDER BY mid) AS mid, -- message.mid,
      message.sender,
      message.date,
      message.message_id,
      message.subject,
      Cast(@LastRefresh as nvarchar(4000)) as FunkyDate,
      message.body,
      message.folder
      FROM message
  )  AS T
  WHERE mid &lt;= 100
  </Accessor>
  <Fields>
      <Uri>https://www.coveo.com/Emails/details.aspx?Id=%[mid]</Uri>
      <ClickableUri>https://www.coveo.com</ClickableUri>
      <FileName>Message_%[mid].txt</FileName>
      <Title>Message_%[mid] - %[FunkyDate]</Title>
      <ModifiedDate>%[date]</ModifiedDate>
      <Body>%[body]</Body>
      <CustomFields>
      <CustomField name="sysAuthor">%[sender]</CustomField>
      </CustomFields>
  </Fields>
  <AllowedUsers>
      <AllowedUser type="CustomGroup" allowed="true">
      <Name>everyone</Name>
      <Server></Server>
      </AllowedUser>
  </AllowedUsers>
  </Mapping>
  </ODBC>

However, if, among the items with a message_id inferior to 100, you want to index items of a certain age only, you need to add an instruction for Coveo Cloud to delete the items that don’t satisfy your age condition. Since the age of an item changes constantly, each of the indexed items will eventually become irrelevant and will need to be removed from your index.

You want items that are less than a month old only to be searchable. With the following instruction inserted before <Fields> in your XML configuration, every time your Database source is refreshed, items that no longer meet your age criterion are deleted from the index:

<AccessorForItemsToDelete type="query">
  SELECT message.mid
  FROM message
  WHERE message.date &lt; DATEADD(month, -1, GETDATE())
    AND message.mid &lt; 100
</AccessorForItemsToDelete>

“Content Security” Tab

Select who will be able to access the source items through a Coveo-powered search interface. For details on this parameter, see Content Security.

“Access” Tab

In the Access tab, determine whether each group and API key can view or edit the source configuration (see Resource Access):

  1. In the Access Level column, select View or Edit for each available group.

  2. On the left-hand side of the tab, if available, click Groups or API Keys to switch lists.

Completion

  1. Finish adding or editing your source:

    • When you want to save your source configuration changes without starting a build/rebuild, such as when you know you want to do other changes soon, click Add Source/Save.

      To add the source content or to make your changes effective, on the Sources page, you must click Start initial build or Start required rebuild in the source Status column.

      OR

    • When you’re done editing the source and want to make changes effective, click Add and Build Source/Save and Rebuild Source.

      Back on the Sources page, you can review the progress of your source addition or modification.

    Once the source is built or rebuilt, you can review its content in the Content Browser.

  2. Optionally, consider editing or adding mappings.

    You can only manage mapping rules once you build the source (see Refresh, Rescan, or Rebuild Sources).

  3. If the crawling module is running in a different time zone than the database, you must modify the source JSON configuration from the Coveo Administration Console by adding the TzdbTimeZoneId parameter.

    The time zone of the crawled system is America/New_York; therefore you add the following to the source JSON configuration:

     "TzdbTimeZoneId": {
     "sensitive": false,
     "value": "America/New_York"
     }
    

For more information on the source JSON configuration, see Edit a Source JSON Configuration.

Enabling Refresh on a Database Source

A refresh operation keeps items up to date by scanning repositories and re-indexing modified items at short intervals. For this to be possible with a Database source, each of the database items to retrieve, such as tables or views, must have a Date type field indicating their latest modification date. This date must also be updated whenever the record is modified. The name of this field is irrelevant to Coveo, but it’s crucial that’s contains a date, as Coveo uses this information to determine whether the content of a database item has changed since the last update operation. If so, the item must be re-indexed, so that the content searchable in your Coveo-powered search page reflects your actual database content.

So that we state in the code below you must ensure that those fields are the same, but can be any name that you would like.

The refresh takes into account deleted items when the AccessorForItemsToDelete accessor is configured in the XML configuration (see XML Configuration). Otherwise, a source rescan or rebuild is required.

In the SQL query, the SELECT statement must have a WHERE clause with a criterion on the last modification date field.

The following example should work with common database engines such as Microsoft SQL Server 2012, PostgreSQL, and MySQL. The [PARAMETER] field is sent by the crawler to the query to indicate when the last refresh was performed.

  • With a MSSQL or SQLServer database, one must select the SQL Client driver and, in the XML, replace [PARAMETER] with @LastRefresh.

  • With a database of a different type (e.g., a NorthWind database), one must select the ODBC driver and, in the XML, replace [PARAMETER] with ?.

However, regardless of your database type, the value of the OrderByFieldName and IncrementalRefreshFieldName parameters must be the name of the latest modification date field in your database.

<Accessortype="query"
  OrderByFieldName="dateModifiedField"
  OrderByFieldType="DateTime"
  IncrementalRefreshFieldName="dateModifiedField">
  <![CDATA[
    Select
    id,
    title,
    dateModified,
    content,
    author
    FROM blog
    WHERE dateModified>=[PARAMETER]
    order by dateModified
    OFFSET @startRow ROWS FETCH NEXT (@endRow-@startRow) ROWS ONLY;
  ]]>
</Accessor>

The example also includes support for pagination (see OFFSET FETCH Clause (SQL Server Compact)).

Enable Pausing/Resuming Update Operations

By default, Database sources don’t offer the option to pause/resume their update operations. To enable these actions, you must alter your source configuration.

  1. In the SQL query, add an ORDER BY with the same chronological fields as in the SELECT statement.

  2. In your source XML configuration, add XML attributes on the Accessor element:

  • OrderByFieldName

    Specifies the name of the column on which the ORDER BY is applied. This attribute must be present to enable the Pause/Resume options. It can be, for example, the name of the latest modification date field in your database.

  • OrderByFieldType

    Specifies the .NET data type of that column. This attribute isn’t normally required. The source automatically tries to determine the data type by preparing the SQL query - without however executing it - and looking at the schema of the results. However, if a specific DBMS doesn’t handle that process correctly, you can manually specify the data type with this attribute. The allowed types are the following:

    • short (16-bit signed integer)

    • ushort (16-bit unsigned integer)

    • int (32-bit signed integer)

    • uint (32-bit unsigned integer)

    • long (64-bit signed integer)

    • ulong (64-bit unsigned integer)

    • float (single-precision floating point number)

    • double (double-precision floating point number)

    • string (String)

  • IncrementalRefreshFieldName

    To support both Refresh and Pause/Resume options, the value of the IncrementalRefreshFieldName parameter must be the same as that of the OrderByFieldName parameter.

    See this example for an excerpt of a configuration file for a source with Pause/Resume and Refresh options enabled:

    You must replace <PARAMETER> by @LastRefresh in an SqlClient scenario or by ? otherwise.

    <Mapping type="Orders">
      <Accessor type="query"
                OrderByFieldName="OrderDate"
                OrderByFieldType="DateTime"
                IncrementalRefreshFieldName="OrderDate">
       SELECT Shippers.CompanyName AS ShipperName,
              Orders.OrderID AS ID,
              Orders.CustomerID,
              Orders.OrderDate,
              Orders.RequiredDate,
              Orders.ShippedDate,
              Customers.CompanyName,
              Employees.LastName,
              Employees.FirstName
       FROM   Orders, Shippers, Customers, Employees
       WHERE  Orders.ShipVia = Shippers.ShipperID AND
              Orders.CustomerID = Customers.CustomerID AND
              Orders.EmployeeID = Employees.EmployeeID AND
              Orders.OrderDate >= <PARAMETER>
        ORDER BY Orders.OrderDate
      </Accessor>
    </Mapping>
    

Complement Information Retrieval Using Subqueries

The Database source acquires information about each indexed item through a query performed against a database. For each query, it’s possible to associate one or more subqueries to be executed and used to complement information.

You can run a main query, and for each row, run a subquery that crawls more/different information. All the results of a single row from the main query, along with everything from the subquery, are merged into a single item.

The source requires a mapping configuration to execute properly. For each mapping type, it’s necessary to specify an Accessor representing the SQL query to execute.

Specifying Subqueries

To specify subqueries, you must set the type of the Accessor to query.

<Accessor type="query">

Following the Accessor definition, add an AccessorSubQueries node with all subqueries:

The master key (value following SELECT) in the AccessorSubQuery node must match exactly the one returned by the server. The key you include must also have the same casing. The following error is thrown when the key couldn’t be found:

Unable to index document : There's a formatting error in a sub query. Cannot find master key %[key].

<AccessorSubQueries>
  <AccessorSubQuery name="FirstNameLastName" separator=";" behaviorOnMultiRows="join" allowDuplicates = "false">
    SELECT firstName,
    lastName
    FROM employeelist
    WHERE Email_id = %[sender]
  </AccessorSubQuery>
</AccessorSubQueries>

Subquery Attributes

  • name

    Subquery name referred to in the Fields section of the mapping.

  • separator

    Separator used when concatenating many rows.

  • behaviorOnMultiRows

    Action to take when a subquery returns more than one row. The only supported behavior is join, which concatenates values with the provided separator.

  • allowDuplicates (optional)

    This attribute is mainly used when your subquery returns many rows. If set to false, duplicates in the results are ignored in the concatenation of the results. If set to true, duplicates are present.

  • singleQuoteEscapeSequence (optional)

    When the value of the returned field contains single quotes, these single quotes must be escaped. By default when you omit this attribute, the source escapes the single quotes by doubling them (e.g., ''). Usually, this escaping mechanism should work. However, some database types require a different escaping sequence for single quotes. In such cases, use this attribute to specify the single quote escape sequence.

    For the MySQL database, the single quote escaping sequence is \'. In this case, in the AccessorSubQuery tag, include the singleQuoteEscapeSequence attribute as follows:

      <AccessorSubQuery name="FirstNameLastName" separator=";" behaviorOnMultiRows="join" allowDuplicates = "false" singleQuoteEscapeSequence="\'">
    

Subquery Master Key

In a subquery, a master key used in the WHERE clause must respect the format %[fieldName], which corresponds to metadata acquired from the main accessor. The master key is used to make the join between the main query and subqueries.

Specifying Subquery Metadata for Fields

The <Fields> section of the mapping configuration is used to specify the metadata to use for indexing.

See the following example for a typical <Fields> section of a mapping configuration:

<Fields>
  <Uri>https://www.coveo.com/Emails/details.aspx?Id=%[mid]</Uri>
  <ClickableUri>https://www.coveo.com</ClickableUri>
  <FileName>Message_%[mid].txt</FileName>
  <Title>Message_%[mid]</Title>
  <ModifiedDate>%[date]</ModifiedDate>
  <Body>%[body]</Body>
  <CustomFields>
    <CustomField name="sysAuthor">%[sender]</CustomField>
    <CustomField name="firstName">%[FirstNameLastName.firstName]</CustomField>
    <CustomField name="lastName">%[FirstNameLastName.lastName]</CustomField>
  </CustomFields>
</Fields>

The metadata of a subquery can be specified for a field or a custom field. The way to specify is similar to the way it’s done when referring a field coming from the main accessor: %[subQueryName.fieldName]. In the above example, custom field firstName is referring subquery named FirstNameLastName and uses the firstName metadata.

See the following for a complete mapping configuration, used in Coveo unit tests:

<?xml version="1.0" encoding="utf-8" ?>
<ODBC>
  <CommonMapping excludedItems="employeelist">
    <AllowedUsers>
      <AllowedUser type="Windows" allowed="true">
        <Name>everyone</Name>
        <Server></Server>
      </AllowedUser>
    </AllowedUsers>
  </CommonMapping>
  <Mapping type="message">
    <Accessor type="query">
      SELECT message.mid,
      message.sender,
      message.date,
      message.message_id,
      message.subject,
      message.body,
      message.folder
      FROM message
      WHERE DATE like '2001-04-07%'
    </Accessor>
      <AccessorSubQueries>
        <AccessorSubQuery name="FirstNameLastName" separator=";" behaviorOnMultiRows="join">
          SELECT firstName, lastName
          FROM employeelist
          WHERE Email_id = %[sender]
        </AccessorSubQuery>
      </AccessorSubQueries>
    <Fields>
      <Uri>https://www.coveo.com/Emails/details.aspx?Id=%[mid]</Uri>
      <ClickableUri>https://www.coveo.com</ClickableUri>
      <FileName>Message_%[mid].txt</FileName>
      <Title>Message_%[mid]</Title>
      <ModifiedDate>%[date]</ModifiedDate>
      <Body>%[body]</Body>
      <CustomFields>
        <CustomField name="sysAuthor">%[sender]</CustomField>
        <CustomField name="firstName">%[FirstNameLastName.firstName]</CustomField>
        <CustomField name="lastName">%[FirstNameLastName.lastName]</CustomField>
      </CustomFields>
    </Fields>
    <AllowedUsers>
      <AllowedUser type="CustomGroup" allowed="true">
        <Name>everyone</Name>
        <Server></Server>
      </AllowedUser>
    </AllowedUsers>
  </Mapping>
</ODBC>

What’s Next?

Recommended Articles