Add or Edit a Database Source

A Database source allows administrators and content managers to retrieve and make searchable the content of a local database via the Coveo On-Premises Crawling Module (see Coveo On-Premises Crawling Module).

Your company developer created a custom database to manage the parts used in your facilities, their location in your warehouse, and purchase orders. You decide to index data regarding purchase orders only so that your buyers can find this content via your Coveo-powered search page.

As an administrator or a content manager, you can add the content of a local database to a Coveo Cloud organization. Your Database source can be shared, private, or secured (see Content Security).

By default, a Database source starts a rescan every day to retrieve content changes (addition, modification, or deletion).

Some permission systems are not officially supported by Coveo Cloud. A custom project by Coveo experts may allow you to index your Database source permissions. So, if you want to create a secured source, contact the Coveo Support team.

Source Features Summary

Features Supported Additional information
Content update Refresh

See Enabling Refresh on a Database Source

Rescan
Rebuild
Content security options Secured Some permission systems are not officially supported by Coveo Cloud. Contact the Coveo Support team.
Private
Shared

Add or Edit a Database Source

You can configure a Database source though the Coveo Cloud administration console if you use a 64-bit driver to connect to your database. If you use a 32-bit driver, you must use the Coveo Cloud Platform API and a JSON source configuration (see Creating a Crawling Module Source Using the Source API).

  1. Ensure the Coveo On-Premises Crawling Module is installed on a server that has access to the database of which you want to retrieve the content (see Crawling Module Deployment Overview).

  2. Log in to the Coveo Cloud administration console with an administrator account (see Logging in to Coveo Cloud V2).

  3. Go to the Add/Edit a Database Source panel:

    • To add a source, in the main menu, under Content, select Sources > Add Source button > Database.

      OR

    • To edit a source, in the main menu, under Content, select Sources > source row > Edit in the Action bar.

  4. In the Add/Edit a Database Source panel, in the Configuration tab, if not already done, click Download Crawling Module to install the Coveo On-Premises Crawling Module on a server that has access to the database of which you want to retrieve the content.

  5. Enter appropriate values for the available parameters:

    • Source name

      A descriptive name for your source under 255 characters (not already in use for another source in this organization).

      You cannot change the source name once it is created.

    • Connection string

      The parameters to use to connect to your database.

      Since connection strings are not encrypted, they should never contain credentials in plain-text. You can hide the password and user ID in the connection string by respectively introducing the @uid and @pwd tokens. The Database source internally replaces the tokens with the information provided in the Authentication section (see Authentication).

      You must provide either both tokens or none at all. If you do not provide tokens, but add source credentials, the behavior will be the same as before, meaning that the credentials will be used to impersonate the process running the queries.

      For a basic connection string: Data Source=mydatabase.mycompany.com;Initial Catalog=MyDatabase;User Id=companyUser;Password=MyPassword

      Hiding password and user ID using tokens: Data Source=mydatabase.mycompany.com;Initial Catalog=MyDatabase;User Id=@uid;Password=@pwd

      The connection string syntax differs from one database type to another (see Connection Strings).

    • Item type

      The table or view object names (<Mapping type="name">) that are defined in the database configuration and that you want to index (see XML Configuration).

    • Driver type

      Select the software driver that provides access to your database.

      To obtain a complete list of available drivers and therefore of supported database types, use the Crawling Module REST API GET call on /api/odbc/drivers (see Crawling Module REST API Reference and Viewing the Available Drivers for an ODBC Source)

    • Character optical recognition (OCR)

      Check this box if you want Coveo Cloud to extract text from image files and/or PDF files containing images (see Enable Optical Character Recognition). OCR-extracted text is processed as item data, meaning that it is fully searchable and will appear in the item Quick View (see Search Result Quick View).

      Since the OCR feature is available at an extra charge, you must first contact Coveo Sales to add this feature to your organization license. You can then enable it for your source.

    • Index

      When adding a source, if you have more than one logical (non-Elasticsearch) index in your organization, select the index in which the retrieved content will be stored (see Leverage Multiple Coveo Indexes). If your organization only has one index, this drop-down menu is not visible and you have no decision to make.

      • To add a source storing content in an index different than default, you need the View access level on the Logical Index domain (see Privilege Management and Logical Indexes Domain).

      • Once the source is added, you cannot switch to a different index.

    • Content security

      Select who should be able to access the database content through a Coveo search (see Content Security). Your options are:

      • Private: Only the source creator will be able to find the source content in the search interface. They must however authenticate to the search interface with the identity with which they created the source.

      • Shared: Everyone will be able to find the source content in the search interface.

      • Secured: Only users authenticated in the search interface will see the source items for which they have read permission.

        If you want to create a secured source to use with the Coveo On-Premises Crawling Module, contact the Coveo Support team (see Coveo On-Premises Crawling Module).

  6. In the Authentication section, in the Username box, enter the username of a dedicated administrator account that has access to all the content you want to include. In the Password box, enter the corresponding password.

  7. In the Database Configuration section, in the XML configuration box, enter a XML-formatted configuration instructing Coveo Cloud to retrieve and copy the data from record fields to Coveo default and standard source fields.

    • Ensure that your configuration only contain read-only queries to not make any changes to your database.

    • The source indexes items through a query against a database. Subqueries can run on every item to complete the information with more complex queries (see Complement Information Retrieval Using Subqueries). Moreover, if your query is complex and already configured as a stored procedure, you can leverage it in your XML configuration (see Use a Stored Procedure).

    • If you want to index certain database items depending on a characteristic that may vary, your configuration must include instructions as to which items are no longer relevant and can be deleted from the index.

      You index MESSAGE items with a message_id that is less than 100. The configuration is the following for your MS-SQL database:

        <?xml version="1.0" encoding="utf-8" ?>
        <ODBC>
        <CommonMapping>
        <AllowedUsers>
            <AllowedUser type="Windows" allowed="true">
            <Name>everyone</Name>
            <Server></Server>
            </AllowedUser>
        </AllowedUsers>
        </CommonMapping>
        <Mapping type="MESSAGE">
        <Accessor type="query"
                    IncrementalRefreshFieldName="date">
        SELECT * FROM (
            SELECT ROW_NUMBER() OVER (ORDER BY mid) AS mid, -- message.mid,
            message.sender,
            message.date,
            message.message_id,
            message.subject,
            Cast(@LastRefresh as nvarchar(4000)) as FunkyDate,
            message.body,
            message.folder
            FROM message
        )  AS T
        WHERE mid &lt;= 100
        </Accessor>
        <Fields>
            <Uri>https://www.coveo.com/Emails/details.aspx?Id=%[mid]</Uri>
            <ClickableUri>https://www.coveo.com</ClickableUri>
            <FileName>Message_%[mid].txt</FileName>
            <Title>Message_%[mid] - %[FunkyDate]</Title>
            <ModifiedDate>%[date]</ModifiedDate>
            <Body>%[body]</Body>
            <CustomFields>
            <CustomField name="sysAuthor">%[sender]</CustomField>
            </CustomFields>
        </Fields>
        <AllowedUsers>
            <AllowedUser type="CustomGroup" allowed="true">
            <Name>everyone</Name>
            <Server></Server>
            </AllowedUser>
        </AllowedUsers>
        </Mapping>
        </ODBC>
      

    However, if, among the items with a message_id inferior to 100, you want to index items of a certain age only, you need to add an instruction for Coveo Cloud to delete the items that do not satisfy your age condition. Since the age of an item changes constantly, each of the indexed items will eventually become irrelevant and will need to be removed from your index.

    You want items that are less than a month old only to be searchable. With the following instruction inserted before <Fields> in your XML configuration, every time your Database source is refreshed, items that no longer meet your age criterion are deleted from the index (see Refresh, Rescan, and Rebuild):

     <AccessorForItemsToDelete type="query">
      SELECT message.mid
      FROM message
      WHERE message.date &lt; DATEADD(month, -1, GETDATE())
         AND message.mid &lt; 100
     </AccessorForItemsToDelete>
    
  8. In the Access tab, determine whether each group and API key can view or edit the source configuration (see Understanding Resource Access):
    1. In the Access Level column, select View or Edit for each available group.
    2. On the left-hand side of the tab, if available, click Groups or API Keys to switch lists.

    If you remove the Edit access level from all the groups of which you are a member, you will not be able to edit the source again after saving. Only administrators and members of other groups that have Edit access on this resource will be able to do so. To keep your ability to edit this resource, you must grant the Edit access level to at least one of your groups.

  9. Optionally, consider editing or adding mappings between Database item metadata and fields in your Coveo Cloud organization (see Adding and Managing Source Mappings).

    You can only manage mapping rules once you build the source (see Refresh, Rescan, or Rebuild Sources).

  10. Complete your source addition or edition:

    • Click Add Source/Save when you want to save your source configuration changes without starting a build/rebuild, such as when you know you want to do other changes soon.

      On the Sources page, you must click Start initial build or Start required rebuild in the source Status column to add the source content or make your changes effective, respectively.

      OR

    • Click Add and Build Source/Save and Rebuild Source when you are done editing the source and want to make changes effective.

      Back on the Sources page, you can review the progress of your Database source addition or modification (see Adding and Managing Sources).

    Once the source is built or rebuilt, you can review its content in the Content Browser (see Content Browser - Page).

Enabling Refresh on a Database Source

A refresh operation keeps items up to date by scanning repositories and re-indexing modified items at short intervals. For this to be possible with a Database source, each of the database items to retrieve, such as tables or views, must have a Date type field indicating their latest modification date. This date must also be updated whenever the record is modified. The name of this field is irrelevant to Coveo, but it is crucial that is contains a date, as Coveo uses this information to determine whether the content of a database item has changed since the last update operation. If so, the item must be re-indexed, so that the content searchable in your Coveo-powered search page reflects your actual database content.

So that we state in the code below you must ensure that those fields are the same, but can be any name that you would like.

The refresh takes into account deleted items when the AccessorForItemsToDelete accessor is configured in the XML configuration (see XML Configuration). Otherwise, a source rescan or rebuild is required.

In the SQL query, the SELECT statement must have a WHERE clause with a criterion on the last modification date field.

The following example should work with common database engines such as Microsoft SQL Server 2012, PostgreSQL, and MySQL. The [PARAMETER] field is sent by the crawler to the query to indicate when the last refresh was performed.

  • With a MSSQL or SQLServer database, one must select the SQL Client driver and, in the XML, replace [PARAMETER] with @LastRefresh.

  • With a database of a different type (e.g., a NorthWind database), one must select the ODBC driver and, in the XML, replace [PARAMETER] with ?.

However, regardless of your database type, the value of the OrderByFieldName and IncrementalRefreshFieldName parameters must be the name of the latest modification date field in your database.

<Accessortype="query"
  OrderByFieldName="dateModifiedField"
  OrderByFieldType="DateTime"
  IncrementalRefreshFieldName="dateModifiedField">
  <![CDATA[
    Select
    id,
    title,
    dateModified,
    content,
    author
    FROM blog
    WHERE dateModified>=[PARAMETER]
    order by dateModified
    OFFSET @startRow ROWS FETCH NEXT (@endRow-@startRow) ROWS ONLY;
  ]]>
</Accessor>

The example also includes support for pagination (see OFFSET FETCH Clause (SQL Server Compact)).

Complement Information Retrieval Using Subqueries

The Database source acquires information about each indexed item through a query performed against a database. For each query, it is possible to associate one or more subqueries to be executed and used to complement information.

You can run a main query, and for each row, run a subquery that crawls more/different information. All the results of a single row from the main query, along with everything from the subquery, are merged into a single item.

The source requires a mapping configuration to execute properly. For each mapping type, it is necessary to specify an Accessor representing the SQL query to execute.

Specifying Subqueries

To specify subqueries, you must set the type of the Accessor to query.

<Accessor type="query">

Following the Accessor definition, add an AccessorSubQueries node with all subqueries:

The master key (value following SELECT) in the AccessorSubQuery node must match exactly the one returned by the server. The key you include must also have the same casing. The following error is thrown when the key could not be found:

Unable to index document : There is a formatting error in a sub query. Cannot find master key %[key].

<AccessorSubQueries>
  <AccessorSubQuery name="FirstNameLastName" separator=";" behaviorOnMultiRows="join" allowDuplicates = "false">
    SELECT firstName,
    lastName
    FROM employeelist
    WHERE Email_id = %[sender]
  </AccessorSubQuery>
</AccessorSubQueries>

Subquery Attributes

  • name

    Subquery name referred to in the Fields section of the mapping.

  • separator

    Separator used when concatenating multiple rows.

  • behaviorOnMultiRows

    Action to take when a subquery returns more than one row. The only supported behavior is join, which concatenates values with the provided separator.

  • allowDuplicates (optional)

    This attribute is mainly used when your subquery returns multiple rows. If set to false, duplicates in the results are ignored in the concatenation of the results. If set to true, duplicates are present.

  • singleQuoteEscapeSequence (optional)

    When the value of the returned field contains single quotes, these single quotes must be escaped. By default when you omit this attribute, the source escapes the single quotes by doubling them (e.g., ''). This escaping mechanism should work in most cases. However, some database types require a different escaping sequence for single quotes. In such cases, use this attribute to specify the single quote escape sequence.

    For the MySQL database, the single quote escaping sequence is \'. In this case, in the AccessorSubQuery tag, include the singleQuoteEscapeSequence attribute as follows:

      <AccessorSubQuery name="FirstNameLastName" separator=";" behaviorOnMultiRows="join" allowDuplicates = "false" singleQuoteEscapeSequence="\'">
    

Subquery Master Key

In a subquery, a master key used in the WHERE clause must respect the format %[fieldName], which corresponds to metadata acquired from the main accessor. The master key is used to make the join between the main query and subqueries.

Specifying Subquery Metadata for Fields

The <Fields> section of the mapping configuration is used to specify the metadata to use for indexing.

Refer to the following example for a typical <Fields> section of a mapping configuration:

<Fields>
  <Uri>https://www.coveo.com/Emails/details.aspx?Id=%[mid]</Uri>
  <ClickableUri>https://www.coveo.com</ClickableUri>
  <FileName>Message_%[mid].txt</FileName>
  <Title>Message_%[mid]</Title>
  <ModifiedDate>%[date]</ModifiedDate>
  <Body>%[body]</Body>
  <CustomFields>
    <CustomField name="sysAuthor">%[sender]</CustomField>
    <CustomField name="firstName">%[FirstNameLastName.firstName]</CustomField>
    <CustomField name="lastName">%[FirstNameLastName.lastName]</CustomField>
  </CustomFields>
</Fields>

The metadata of a subquery can be specified for a field or a custom field. The way to specify is similar to the way it is done when referring a field coming from the main accessor: %[subQueryName.fieldName]. In the above example, custom field firstName is referring subquery named FirstNameLastName and uses the firstName metadata.

Refer to the following for a complete mapping configuration, used in Coveo unit tests:

<?xml version="1.0" encoding="utf-8" ?>
<ODBC>
  <CommonMapping excludedItems="employeelist">
    <AllowedUsers>
      <AllowedUser type="Windows" allowed="true">
        <Name>everyone</Name>
        <Server></Server>
      </AllowedUser>
    </AllowedUsers>
  </CommonMapping>
  <Mapping type="message">
    <Accessor type="query">
      SELECT message.mid,
      message.sender,
      message.date,
      message.message_id,
      message.subject,
      message.body,
      message.folder
      FROM message
      WHERE DATE like '2001-04-07%'
    </Accessor>
      <AccessorSubQueries>
        <AccessorSubQuery name="FirstNameLastName" separator=";" behaviorOnMultiRows="join">
          SELECT firstName, lastName
          FROM employeelist
          WHERE Email_id = %[sender]
        </AccessorSubQuery>
      </AccessorSubQueries>
    <Fields>
      <Uri>https://www.coveo.com/Emails/details.aspx?Id=%[mid]</Uri>
      <ClickableUri>https://www.coveo.com</ClickableUri>
      <FileName>Message_%[mid].txt</FileName>
      <Title>Message_%[mid]</Title>
      <ModifiedDate>%[date]</ModifiedDate>
      <Body>%[body]</Body>
      <CustomFields>
        <CustomField name="sysAuthor">%[sender]</CustomField>
        <CustomField name="firstName">%[FirstNameLastName.firstName]</CustomField>
        <CustomField name="lastName">%[FirstNameLastName.lastName]</CustomField>
      </CustomFields>
    </Fields>
    <AllowedUsers>
      <AllowedUser type="CustomGroup" allowed="true">
        <Name>everyone</Name>
        <Server></Server>
      </AllowedUser>
    </AllowedUsers>
  </Mapping>
</ODBC>

Use a Stored Procedure

If your database content retrieval query is complex and already configured as a stored procedure, you may want to leverage it in your XML configuration.

While writing your XML configuration, under Accessor type="query" enter exec storedProcedureName; instead of a SELECT statement with a FROM clause, and then replace storedProcedureName with the name of the desired procedure. If parameters are required to call the stored procedure, add them as follows:

exec storedProcedureName @param1 = 'param1Value',
@param2= 'param2Value',
@param3= 'param3Value';

What’s Next?

Review your source update schedule and optionally change it so that it better fits your needs (see Edit a Source Schedule). By default, your content is rescanned every day.