XML Configuration Further Customization

To index database content, you must provide a list of XML queries defining the content that Coveo should retrieve and how it should index it. Coveo will run these queries against your database and index the returned information in the fields you specified.

In addition, you can optionally customize your XML configuration to enable additional functionalities that will make your source and the indexing process more efficient.

Defining Items to Delete

If you want to index certain database items based on a characteristic that may vary, your XML configuration must include instructions as to which items are no longer relevant and can be deleted from the index.

If using the AccessorForItemsToDelete accessor, ensure to enable the refresh capability, as item deletion takes place during refresh operations only.

Example

You index MESSAGE items with a message_id that’s less than 100. The configuration is the following for your MS-SQL database:

  <?xml version="1.0" encoding="utf-8" ?>
  <ODBC>
  <CommonMapping>
  <AllowedUsers>
      <AllowedUser type="Windows" allowed="true">
      <Name>everyone</Name>
      <Server></Server>
      </AllowedUser>
  </AllowedUsers>
  </CommonMapping>
  <Mapping type="MESSAGE">
  <Accessor type="query"
              IncrementalRefreshFieldName="date">
  SELECT * FROM (
      SELECT ROW_NUMBER() OVER (ORDER BY mid) AS mid, -- message.mid,
      message.sender,
      message.date,
      message.message_id,
      message.subject,
      Cast(@LastRefresh as nvarchar(4000)) as FunkyDate,
      message.body,
      message.folder
      FROM message
  )  AS T
  WHERE mid &lt;= 100
  </Accessor>
  <Fields>
      <Uri>link:https://www.coveo.com/Emails/details.aspx?Id=%[mid]</Uri>
      <ClickableUri>https://www.coveo.com</ClickableUri>
      <FileName>Message_%[mid].txt</FileName>
      <Title>Message_%[mid] - %[FunkyDate]</Title>
      <ModifiedDate>%[date]</ModifiedDate>
      <Body>%[body]</Body>
      <CustomFields>
      <CustomField name="sysAuthor">%[sender]</CustomField>
      </CustomFields>
  </Fields>
  <AllowedUsers>
      <AllowedUser type="CustomGroup" allowed="true">
      <Name>everyone</Name>
      <Server></Server>
      </AllowedUser>
  </AllowedUsers>
  </Mapping>
  </ODBC>

However, if, among the items with a message_id inferior to 100, you want to index items of a certain age only, you need to add an instruction for Coveo to delete the items that don’t satisfy your age condition. Since the age of an item changes constantly, each of the indexed items will eventually become irrelevant and will need to be removed from your index.

You want items that are less than a month old only to be searchable. With the following instruction inserted before <Fields> in your XML configuration, every time your Database source is refreshed, items that no longer meet your age criterion are deleted from the index:

<AccessorForItemsToDelete type="query">
  SELECT message.mid
  FROM message
  WHERE message.date &lt; DATEADD(month, -1, GETDATE())
    AND message.mid &lt; 100
</AccessorForItemsToDelete>

Enabling the Refresh Capability

A refresh operation keeps your source up to date by crawling your database and re-indexing modified items.

To enable the refresh capability of your Database source, ensure that the database items have a Date type field indicating their latest modification date. This date must be updated whenever a record is modified. Moreover, this database field must be dedicated for use by Coveo and shouldn’t be used for anything other than its original purpose.

Then, in your Database source XML configuration, provide the name of this database field as a dynamic value under ModifiedDate.

Thanks to this dynamic value, Coveo has the date on which each item has last been modified. Coveo then uses this date to determine whether a database item has changed since the last source update operation. If so, Coveo re-indexes the item so that the content searchable in your Coveo-powered search page matches your actual database content.

To enable the refresh capability

In the SQL query, the SELECT statement must have a WHERE clause with a criterion on the last modification date field.

Example

The following example should work with common database engines such as Microsoft SQL Server 2012, PostgreSQL, and MySQL. The [PARAMETER] field is sent by the crawler to the query to indicate when the last refresh was performed.

  • With a MSSQL or SQLServer database, one must select the SQL Client driver and, in the XML, replace [PARAMETER] with @LastRefresh.

  • With a database of a different type, one must select the ODBC driver and, in the XML, replace [PARAMETER] with ?.

However, regardless of your database type, the value of the OrderByFieldName and IncrementalRefreshFieldName parameters must be the name of the latest modification date field in your database.

<Accessor type="query"
  OrderByFieldName="dateModified"
  OrderByFieldType="DateTime"
  IncrementalRefreshFieldName="dateModified">
    Select
    id,
    title,
    dateModified,
    content,
    author
    FROM blog
    WHERE dateModified>=[PARAMETER]
    order by dateModified
    OFFSET @startRow ROWS FETCH NEXT (@endRow-@startRow) ROWS ONLY;
</Accessor>

Enable Pausing/Resuming Update Operations

By default, Database sources don’t offer the option to pause and resume their update operations. To enable these actions, you must alter your source configuration.

  1. In the SQL query, add an ORDER BY with the same chronological fields as in the SELECT statement.

  2. In your source’s XML queries, add XML attributes on the Accessor element:

    • OrderByFieldName

      Specifies the name of the column on which the ORDER BY is applied. This attribute must be present to enable the Pause/Resume options. It can be, for example, the name of the latest modification date field in your database.

    • OrderByFieldType

      Specifies the .NET data type of that column. This attribute isn’t normally required. The source automatically tries to determine the data type by preparing the SQL query - without however executing it - and looking at the schema of the results. However, if a specific DBMS doesn’t handle that process correctly, you can manually specify the data type with this attribute. The allowed types are the following:

      • short (16-bit signed integer)

      • ushort (16-bit unsigned integer)

      • int (32-bit signed integer)

      • uint (32-bit unsigned integer)

      • long (64-bit signed integer)

      • ulong (64-bit unsigned integer)

      • float (single-precision floating point number)

      • double (double-precision floating point number)

      • string (String)

    • IncrementalRefreshFieldName

      To support both Refresh and Pause/Resume options, the value of the IncrementalRefreshFieldName parameter must be the same as that of the OrderByFieldName parameter.

      See this example for an excerpt of a query for a source with Pause/Resume and Refresh options enabled:

      Note

      You must replace <PARAMETER> by @LastRefresh in an SqlClient scenario or by ? otherwise.

      <Mapping type="Orders">
        <Accessor type="query"
                  OrderByFieldName="OrderDate"
                  OrderByFieldType="DateTime"
                  IncrementalRefreshFieldName="OrderDate">
         SELECT Shippers.CompanyName AS ShipperName,
                Orders.OrderID AS ID,
                Orders.CustomerID,
                Orders.OrderDate,
                Orders.RequiredDate,
                Orders.ShippedDate,
                Customers.CompanyName,
                Employees.LastName,
                Employees.FirstName
         FROM   Orders, Shippers, Customers, Employees
         WHERE  Orders.ShipVia = Shippers.ShipperID AND
                Orders.CustomerID = Customers.CustomerID AND
                Orders.EmployeeID = Employees.EmployeeID AND
                Orders.OrderDate >= <PARAMETER>
          ORDER BY Orders.OrderDate
        </Accessor>
      </Mapping>

Complement Information Retrieval Using Subqueries

The Database source acquires information about each indexed item through a query performed against a database. For each query, it’s possible to associate one or more subqueries to be executed and used to complement information.

Example

You can run a main query, and for each row, run a subquery that crawls more/different information. All the results of a single row from the main query, along with everything from the subquery, are merged into a single item.

The source requires a mapping configuration to execute properly. For each mapping type, it’s necessary to specify an Accessor representing the SQL query to execute.

Note

Coveo forbids using dynamic expressions that populate the Uri field with metadata extracted through a subquery. This is because Coveo wouldn’t be able to resolve the Uri value should the query ever fail. As a result, the impacted content items wouldn’t be indexed or might be deleted from your index.

Specifying Subqueries

To specify subqueries, you must set the type of the Accessor to query.

<Accessor type="query">

Following the Accessor definition, add an AccessorSubQueries node with all subqueries:

Note

The master key (value following SELECT) in the AccessorSubQuery node must match exactly the one returned by the server. The key you include must also have the same casing. The following error is thrown when the key couldn’t be found:

Unable to index document : There’s a formatting error in a sub query. Cannot find master key %[key].

<AccessorSubQueries>
  <AccessorSubQuery name="FirstNameLastName" separator=";" allowDuplicates = "false">
    SELECT firstName,
    lastName
    FROM employeelist
    WHERE Email_id = %[sender]
  </AccessorSubQuery>
</AccessorSubQueries>

Subquery Attributes

  • name

    Subquery name referred to in the Fields section of the mapping.

  • separator

    Separator used when concatenating many rows.

  • allowDuplicates (optional)

    This attribute is mainly used when your subquery returns many rows. If set to false, duplicates in the results are ignored in the concatenation of the results. If set to true, duplicates are present.

  • singleQuoteEscapeSequence (optional)

    When the value of the returned field contains single quotes, these single quotes must be escaped. By default when you omit this attribute, the source escapes the single quotes by doubling them (for example, ''). Usually, this escaping mechanism should work. However, some database types require a different escaping sequence for single quotes. In such cases, use this attribute to specify the single quote escape sequence.

    Example

    For the MySQL database, the single quote escaping sequence is \'. In this case, in the AccessorSubQuery tag, include the singleQuoteEscapeSequence attribute as follows:

      <AccessorSubQuery name="FirstNameLastName" separator=";" allowDuplicates = "false" singleQuoteEscapeSequence="\'">

Subquery Master Key

In a subquery, a master key used in the WHERE clause must respect the format %[fieldName], which corresponds to metadata acquired from the main accessor. The master key is used to make the join between the main query and subqueries.

Specifying Subquery Metadata for Fields

The <Fields> section of the mapping configuration is used to specify the metadata to use for indexing.

See the following example for a typical <Fields> section of a mapping configuration:

<Fields>
  <Uri>link:https://www.coveo.com/Emails/details.aspx?Id=%[mid]</Uri>
  <ClickableUri>https://www.coveo.com</ClickableUri>
  <FileName>Message_%[mid].txt</FileName>
  <Title>Message_%[mid]</Title>
  <ModifiedDate>%[date]</ModifiedDate>
  <Body>%[body]</Body>
  <CustomFields>
    <CustomField name="sysAuthor">%[sender]</CustomField>
    <CustomField name="firstName">%[FirstNameLastName.firstName]</CustomField>
    <CustomField name="lastName">%[FirstNameLastName.lastName]</CustomField>
  </CustomFields>
</Fields>

The metadata of a subquery can be specified for a field or a custom field. The way to specify is similar to the way it’s done when referring a field coming from the main accessor: %[subQueryName.fieldName]. In the above example, custom field firstName is referring subquery named FirstNameLastName and uses the firstName metadata.

See the following for a complete mapping configuration, used in Coveo unit tests:

<?xml version="1.0" encoding="utf-8" ?>
<ODBC>
  <CommonMapping excludedItems="employeelist">
    <AllowedUsers>
      <AllowedUser type="Windows" allowed="true">
        <Name>everyone</Name>
        <Server></Server>
      </AllowedUser>
    </AllowedUsers>
  </CommonMapping>
  <Mapping type="message">
    <Accessor type="query">
      SELECT message.mid,
      message.sender,
      message.date,
      message.message_id,
      message.subject,
      message.body,
      message.folder
      FROM message
      WHERE DATE like '2001-04-07%'
    </Accessor>
      <AccessorSubQueries>
        <AccessorSubQuery name="FirstNameLastName" separator=";">
          SELECT firstName, lastName
          FROM employeelist
          WHERE Email_id = %[sender]
        </AccessorSubQuery>
      </AccessorSubQueries>
    <Fields>
      <Uri>link:https://www.coveo.com/Emails/details.aspx?Id=%[mid]</Uri>
      <ClickableUri>https://www.coveo.com</ClickableUri>
      <FileName>Message_%[mid].txt</FileName>
      <Title>Message_%[mid]</Title>
      <ModifiedDate>%[date]</ModifiedDate>
      <Body>%[body]</Body>
      <CustomFields>
        <CustomField name="sysAuthor">%[sender]</CustomField>
        <CustomField name="firstName">%[FirstNameLastName.firstName]</CustomField>
        <CustomField name="lastName">%[FirstNameLastName.lastName]</CustomField>
      </CustomFields>
    </Fields>
    <AllowedUsers>
      <AllowedUser type="CustomGroup" allowed="true">
        <Name>everyone</Name>
        <Server></Server>
      </AllowedUser>
    </AllowedUsers>
  </Mapping>
</ODBC>