XML Configuration Further Customization
XML Configuration Further Customization
To index database content, you must provide a list of XML queries defining the content that Coveo should retrieve and how it should index it. Coveo will run these queries against your database and index the returned information in the fields you specified.
In addition, you can optionally customize your XML configuration to enable additional functionalities that will make your source and the indexing process more efficient.
Defining Items to Delete
If you want to index certain database items based on a characteristic that may vary, your XML configuration must include instructions as to which items are no longer relevant and can be deleted from the index.
If using the AccessorForItemsToDelete
accessor, ensure to enable the refresh capability, as item deletion takes place during refresh operations only.
You index MESSAGE
items with a message_id
that’s less than 100
.
The configuration is the following for your MS-SQL database:
<?xml version="1.0" encoding="utf-8" ?>
<ODBC>
<CommonMapping>
<AllowedUsers>
<AllowedUser type="Windows" allowed="true">
<Name>everyone</Name>
<Server></Server>
</AllowedUser>
</AllowedUsers>
</CommonMapping>
<Mapping type="MESSAGE">
<Accessor type="query"
IncrementalRefreshFieldName="date">
SELECT * FROM (
SELECT ROW_NUMBER() OVER (ORDER BY mid) AS mid, -- message.mid,
message.sender,
message.date,
message.message_id,
message.subject,
Cast(@LastRefresh as nvarchar(4000)) as FunkyDate,
message.body,
message.folder
FROM message
) AS T
WHERE mid <= 100
</Accessor>
<Fields>
<Uri>link:https://www.coveo.com/Emails/details.aspx?Id=%[mid]</Uri>
<ClickableUri>https://www.coveo.com</ClickableUri>
<FileName>Message_%[mid].txt</FileName>
<Title>Message_%[mid] - %[FunkyDate]</Title>
<ModifiedDate>%[date]</ModifiedDate>
<Body>%[body]</Body>
<CustomFields>
<CustomField name="sysAuthor">%[sender]</CustomField>
</CustomFields>
</Fields>
<AllowedUsers>
<AllowedUser type="CustomGroup" allowed="true">
<Name>everyone</Name>
<Server></Server>
</AllowedUser>
</AllowedUsers>
</Mapping>
</ODBC>
However, if, among the items with a message_id
inferior to 100
, you want to index items of a certain age only, you need to add an instruction for Coveo to delete the items that don’t satisfy your age condition.
Since the age of an item changes constantly, each of the indexed items will eventually become irrelevant and will need to be removed from your index.
You want items that are less than a month old only to be searchable.
With the following instruction inserted before <Fields>
in your XML configuration, every time your Database source is refreshed, items that no longer meet your age criterion are deleted from the index:
<AccessorForItemsToDelete type="query">
SELECT message.mid
FROM message
WHERE message.date < DATEADD(month, -1, GETDATE())
AND message.mid < 100
</AccessorForItemsToDelete>
Enabling the Refresh Capability
A refresh operation keeps your source up to date by crawling your database and re-indexing modified items.
To enable the refresh capability of your Database source, ensure that the database items have a Date
type field indicating their latest modification date.
This date must be updated whenever a record is modified.
Moreover, this database field must be dedicated for use by Coveo and shouldn’t be used for anything other than its original purpose.
Then, in your Database source XML configuration, provide the name of this database field as a dynamic value under ModifiedDate
.
Thanks to this dynamic value, Coveo has the date on which each item has last been modified. Coveo then uses this date to determine whether a database item has changed since the last source update operation. If so, Coveo re-indexes the item so that the content searchable in your Coveo-powered search page matches your actual database content.
To enable the refresh capability
In the SQL query, the SELECT
statement must have a WHERE
clause with a criterion on the last modification date field.
The following example should work with common database engines such as Microsoft SQL Server 2012, PostgreSQL, and MySQL.
The [PARAMETER]
field is sent by the crawler to the query to indicate when the last refresh was performed.
-
With a MSSQL or SQLServer database, one must select the SQL Client driver and, in the XML, replace
[PARAMETER]
with@LastRefresh
. -
With a database of a different type, one must select the ODBC driver and, in the XML, replace
[PARAMETER]
with?
.
However, regardless of your database type, the value of the OrderByFieldName
and IncrementalRefreshFieldName
parameters must be the name of the latest modification date field in your database.
<Accessor type="query"
OrderByFieldName="dateModified"
OrderByFieldType="DateTime"
IncrementalRefreshFieldName="dateModified">
Select
id,
title,
dateModified,
content,
author
FROM blog
WHERE dateModified>=[PARAMETER]
order by dateModified
OFFSET @startRow ROWS FETCH NEXT (@endRow-@startRow) ROWS ONLY;
</Accessor>
Enable Pausing/Resuming Update Operations
By default, Database sources don’t offer the option to pause and resume their update operations. To enable these actions, you must alter your source configuration.
-
In the SQL query, add an
ORDER BY
with the same chronological fields as in theSELECT
statement. -
In your source’s XML queries, add XML attributes on the
Accessor
element:-
OrderByFieldName
Specifies the name of the column on which the
ORDER BY
is applied. This attribute must be present to enable the Pause/Resume options. It can be, for example, the name of the latest modification date field in your database. -
OrderByFieldType
Specifies the .NET data type of that column. This attribute isn’t normally required. The source automatically tries to determine the data type by preparing the SQL query - without however executing it - and looking at the schema of the results. However, if a specific DBMS doesn’t handle that process correctly, you can manually specify the data type with this attribute. The allowed types are the following:
-
short (16-bit signed integer)
-
ushort (16-bit unsigned integer)
-
int (32-bit signed integer)
-
uint (32-bit unsigned integer)
-
long (64-bit signed integer)
-
ulong (64-bit unsigned integer)
-
float (single-precision floating point number)
-
double (double-precision floating point number)
-
string (String)
-
-
IncrementalRefreshFieldName
To support both Refresh and Pause/Resume options, the value of the
IncrementalRefreshFieldName
parameter must be the same as that of theOrderByFieldName
parameter.See this example for an excerpt of a query for a source with Pause/Resume and Refresh options enabled:
NoteYou must replace
<PARAMETER>
by@LastRefresh
in an SqlClient scenario or by?
otherwise.<Mapping type="Orders"> <Accessor type="query" OrderByFieldName="OrderDate" OrderByFieldType="DateTime" IncrementalRefreshFieldName="OrderDate"> SELECT Shippers.CompanyName AS ShipperName, Orders.OrderID AS ID, Orders.CustomerID, Orders.OrderDate, Orders.RequiredDate, Orders.ShippedDate, Customers.CompanyName, Employees.LastName, Employees.FirstName FROM Orders, Shippers, Customers, Employees WHERE Orders.ShipVia = Shippers.ShipperID AND Orders.CustomerID = Customers.CustomerID AND Orders.EmployeeID = Employees.EmployeeID AND Orders.OrderDate >= <PARAMETER> ORDER BY Orders.OrderDate </Accessor> </Mapping>
-
Complement Information Retrieval Using Subqueries
The Database source acquires information about each indexed item through a query performed against a database. For each query, it’s possible to associate one or more subqueries to be executed and used to complement information.
You can run a main query, and for each row, run a subquery that crawls more/different information. All the results of a single row from the main query, along with everything from the subquery, are merged into a single item.
The source requires a mapping configuration to execute properly.
For each mapping type, it’s necessary to specify an Accessor
representing the SQL query to execute.
Note
Coveo forbids using dynamic expressions that populate the |
Specifying Subqueries
To specify subqueries, you must set the type of the Accessor
to query.
<Accessor type="query">
Following the Accessor
definition, add an AccessorSubQueries
node with all subqueries:
Note
The master key (value following
|
<AccessorSubQueries>
<AccessorSubQuery name="FirstNameLastName" separator=";" allowDuplicates = "false">
SELECT firstName,
lastName
FROM employeelist
WHERE Email_id = %[sender]
</AccessorSubQuery>
</AccessorSubQueries>
Subquery Attributes
-
name
Subquery name referred to in the
Fields
section of the mapping. -
separator
Separator used when concatenating many rows.
-
allowDuplicates (optional)
This attribute is mainly used when your subquery returns many rows. If set to
false
, duplicates in the results are ignored in the concatenation of the results. If set totrue
, duplicates are present. -
singleQuoteEscapeSequence (optional)
When the value of the returned field contains single quotes, these single quotes must be escaped. By default when you omit this attribute, the source escapes the single quotes by doubling them (for example,
''
). Usually, this escaping mechanism should work. However, some database types require a different escaping sequence for single quotes. In such cases, use this attribute to specify the single quote escape sequence.ExampleFor the MySQL database, the single quote escaping sequence is
\'
. In this case, in theAccessorSubQuery
tag, include thesingleQuoteEscapeSequence
attribute as follows:<AccessorSubQuery name="FirstNameLastName" separator=";" allowDuplicates = "false" singleQuoteEscapeSequence="\'">
Subquery Master Key
In a subquery, a master key used in the WHERE
clause must respect the format %[fieldName]
, which corresponds to metadata acquired from the main accessor.
The master key is used to make the join between the main query and subqueries.
Specifying Subquery Metadata for Fields
The <Fields>
section of the mapping configuration is used to specify the metadata to use for indexing.
See the following example for a typical <Fields>
section of a mapping configuration:
<Fields>
<Uri>link:https://www.coveo.com/Emails/details.aspx?Id=%[mid]</Uri>
<ClickableUri>https://www.coveo.com</ClickableUri>
<FileName>Message_%[mid].txt</FileName>
<Title>Message_%[mid]</Title>
<ModifiedDate>%[date]</ModifiedDate>
<Body>%[body]</Body>
<CustomFields>
<CustomField name="sysAuthor">%[sender]</CustomField>
<CustomField name="firstName">%[FirstNameLastName.firstName]</CustomField>
<CustomField name="lastName">%[FirstNameLastName.lastName]</CustomField>
</CustomFields>
</Fields>
The metadata of a subquery can be specified for a field or a custom field.
The way to specify is similar to the way it’s done when referring a field coming from the main accessor: %[subQueryName.fieldName]
.
In the above example, custom field firstName
is referring subquery named FirstNameLastName
and uses the firstName
metadata.
See the following for a complete mapping configuration, used in Coveo unit tests:
<?xml version="1.0" encoding="utf-8" ?>
<ODBC>
<CommonMapping excludedItems="employeelist">
<AllowedUsers>
<AllowedUser type="Windows" allowed="true">
<Name>everyone</Name>
<Server></Server>
</AllowedUser>
</AllowedUsers>
</CommonMapping>
<Mapping type="message">
<Accessor type="query">
SELECT message.mid,
message.sender,
message.date,
message.message_id,
message.subject,
message.body,
message.folder
FROM message
WHERE DATE like '2001-04-07%'
</Accessor>
<AccessorSubQueries>
<AccessorSubQuery name="FirstNameLastName" separator=";">
SELECT firstName, lastName
FROM employeelist
WHERE Email_id = %[sender]
</AccessorSubQuery>
</AccessorSubQueries>
<Fields>
<Uri>link:https://www.coveo.com/Emails/details.aspx?Id=%[mid]</Uri>
<ClickableUri>https://www.coveo.com</ClickableUri>
<FileName>Message_%[mid].txt</FileName>
<Title>Message_%[mid]</Title>
<ModifiedDate>%[date]</ModifiedDate>
<Body>%[body]</Body>
<CustomFields>
<CustomField name="sysAuthor">%[sender]</CustomField>
<CustomField name="firstName">%[FirstNameLastName.firstName]</CustomField>
<CustomField name="lastName">%[FirstNameLastName.lastName]</CustomField>
</CustomFields>
</Fields>
<AllowedUsers>
<AllowedUser type="CustomGroup" allowed="true">
<Name>everyone</Name>
<Server></Server>
</AllowedUser>
</AllowedUsers>
</Mapping>
</ODBC>