Add or Edit a Database Source
A Database source allows members of the Administrators and Content Managers built-in groups to retrieve and make searchable the content of a local database via the Coveo On-Premises Crawling Module (see Coveo On-Premises Crawling Module).
Your company developer created a custom database to manage the parts used in your facilities, their location in your warehouse, and purchase orders. You decide to index data regarding purchase orders only so that your buyers can find this content via your Coveo-powered search page.
As an administrator or a content manager, you can add the content of a local database to a Coveo Cloud organization. In a Coveo-powered search interface, the source content is accessible to either everyone, the source creator only, or specific users as determined by source permissions (see Content Security).
Some permission systems aren’t officially supported by Coveo Cloud. A custom project by Coveo experts may allow you to index your Database source permissions. So, if you want your Coveo-powered search interface to replicate these permissions, contact the Coveo Support team.
By default, a Database source starts a rescan every day to retrieve content changes (addition, modification, or deletion).
Source Features Summary
|Content security options||Determined by source permissions||Some permission systems aren't officially supported by Coveo Cloud. Contact the Coveo Support team.|
Add or Edit a Database Source
You can configure a Database source through the Coveo Cloud Administration Console if you use a 64-bit driver to connect to your database. If you use a 32-bit driver, you must use the Coveo Cloud Platform API and a JSON source configuration (see Creating a Crawling Module Source Using the Source API).
Ensure that the Coveo On-Premises Crawling Module is installed on a server that has access to the database whose content you want to retrieve (see Crawling Module Deployment Overview).
Access the Add/Edit a Database Source panel:
To add a source, in the main menu, under Content, select Sources > Add Source button > Database.
To edit a source, in the main menu, under Content, select Sources > source row > Edit in the Action bar.
In the Add/Edit a Database Source panel, in the Configuration tab, if not already done, click Download Crawling Module to install the Coveo On-Premises Crawling Module on a server that has access to the database whose content you want to retrieve.
Enter appropriate values for the available parameters:
A descriptive name for your source under 255 characters (not already in use for another source in this organization).
You can’t change the source name once it’s created.
The parameters to use to connect to your database.
Since connection strings aren’t encrypted, they should never contain credentials in plain text. You can hide the user ID and password in the connection string by introducing the @uid and @pwd tokens, respectively. The Database source internally replaces the tokens with the information provided in the Authentication section (see Authentication).
You must provide either both tokens or none. If you don’t provide tokens, but add source credentials, the behavior will be the same as before, meaning that the credentials will be used to impersonate the process running the queries.
For a basic connection string:
Data Source=mydatabase.mycompany.com;Initial Catalog=MyDatabase;User Id=companyUser;Password=MyPassword
Hiding password and user ID using tokens:
Data Source=mydatabase.mycompany.com;Initial Catalog=MyDatabase;User Id=@uid;Password=@pwd
The connection string syntax differs from one database type to another (see Connection Strings).
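The token mechanism described above can be pictured with a minimal Python sketch. It is not the connector's actual code: the function name resolve_connection_string is hypothetical, and the sketch only assumes that the tokens are replaced textually and that supplying just one of them is treated as an error.

```python
def resolve_connection_string(template: str, username: str, password: str) -> str:
    """Replace the @uid and @pwd tokens with credentials kept outside the string.

    Hypothetical helper for illustration; the real substitution is internal
    to the Database source.
    """
    has_uid = "@uid" in template
    has_pwd = "@pwd" in template
    # The documentation requires either both tokens or none.
    if has_uid != has_pwd:
        raise ValueError("Provide both @uid and @pwd tokens, or neither.")
    return template.replace("@uid", username).replace("@pwd", password)


template = (
    "Data Source=mydatabase.mycompany.com;Initial Catalog=MyDatabase;"
    "User Id=@uid;Password=@pwd"
)
resolved = resolve_connection_string(template, "companyUser", "MyPassword")
print(resolved)
```

A template without tokens passes through unchanged, which mirrors the case where the credentials are used only to impersonate the process running the queries.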
The table or view object names (<Mapping type="name">) that are defined in the database configuration and that you want to index (see XML Configuration).
Select the software driver that provides access to your database.
Paired Crawling Module
If your source is a Crawling Module source and if you have more than one Crawling Module linked to this organization, select the one with which you want to pair your source. If you change the Crawling Module instance with which your source is paired, a successful rebuild is required for your change to apply.
Character optical recognition (OCR)
Check this box if you want Coveo Cloud to extract text from image files or PDF files containing images (see Enable Optical Character Recognition). OCR-extracted text is processed as item data, meaning that it’s fully searchable and will appear in the item Quick View (see Search Result Quick View).
When adding a source, if you have more than one logical (non-Elasticsearch) index in your organization, select the index in which the retrieved content will be stored (see Leverage Many Coveo Indexes). If your organization only has one index, this drop-down menu isn’t visible and you have no decision to make.
In the Authentication section, in the Username box, enter the username of a dedicated administrator account that has access to all the content you want to include. In the Password box, enter the corresponding password.
In the Database Configuration section, in the XML configuration box, enter an XML-formatted configuration instructing Coveo Cloud to retrieve and copy the data from record fields to Coveo default and standard source fields.
Ensure that your configuration contains only read-only queries so that no changes are made to your database.
The source indexes items through a query against a database. Subqueries can run on every item to complete the information with more complex queries (see Complement Information Retrieval Using Subqueries). Moreover, if your query is complex and already configured as a stored procedure, you can leverage it in your XML configuration (see Use a Stored Procedure).
If you want to index certain database items depending on a characteristic that may vary, your configuration must include instructions as to which items are no longer relevant and can be deleted from the index.
You want to index MESSAGE items with a message_id that’s less than 100. The configuration is the following for your MS-SQL database:
<?xml version="1.0" encoding="utf-8" ?>
<ODBC>
  <CommonMapping>
    <AllowedUsers>
      <AllowedUser type="Windows" allowed="true">
        <Name>everyone</Name>
        <Server></Server>
      </AllowedUser>
    </AllowedUsers>
  </CommonMapping>
  <Mapping type="MESSAGE">
    <Accessor type="query" IncrementalRefreshFieldName="date">
      <![CDATA[
      SELECT * FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY mid) AS mid, -- message.mid,
          message.sender,
          message.date,
          message.message_id,
          message.subject,
          Cast(@LastRefresh as nvarchar(4000)) as FunkyDate,
          message.body,
          message.folder
        FROM message
      ) AS T
      WHERE mid <= 100
      ]]>
    </Accessor>
    <Fields>
      <Uri>https://www.coveo.com/Emails/details.aspx?Id=%[mid]</Uri>
      <ClickableUri>https://www.coveo.com</ClickableUri>
      <FileName>Message_%[mid].txt</FileName>
      <Title>Message_%[mid] - %[FunkyDate]</Title>
      <ModifiedDate>%[date]</ModifiedDate>
      <Body>%[body]</Body>
      <CustomFields>
        <CustomField name="sysAuthor">%[sender]</CustomField>
      </CustomFields>
    </Fields>
    <AllowedUsers>
      <AllowedUser type="CustomGroup" allowed="true">
        <Name>everyone</Name>
        <Server></Server>
      </AllowedUser>
    </AllowedUsers>
  </Mapping>
</ODBC>
However, if, among the items with a message_id that’s less than 100, you want to index items of a certain age only, you need to add an instruction for Coveo Cloud to delete the items that don’t satisfy your age condition. Since the age of an item changes constantly, each of the indexed items will eventually become irrelevant and will need to be removed from your index.
You want only items that are less than a month old to be searchable. With the following instruction inserted before <Fields> in your XML configuration, every time your Database source is refreshed, items that no longer meet your age criterion are deleted from the index (see Refresh, Rescan, and Rebuild):
<AccessorForItemsToDelete type="query">
  <![CDATA[
  SELECT message.mid
  FROM message
  WHERE message.date < DATEADD(month, -1, GETDATE())
    AND message.mid < 100
  ]]>
</AccessorForItemsToDelete>
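To see which rows such a deletion query would return, here is a small Python sketch that runs an equivalent query against an in-memory SQLite database. It is only an illustration under stated assumptions: SQLite stands in for MS-SQL, "a month" is approximated as 30 days, the current date is fixed so that the result is reproducible, and the sample rows are invented.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (mid INTEGER PRIMARY KEY, date TEXT)")

now = datetime(2024, 6, 15)  # fixed "current" date for a reproducible example
rows = [
    (1, (now - timedelta(days=5)).isoformat(sep=" ")),    # recent: stays indexed
    (2, (now - timedelta(days=45)).isoformat(sep=" ")),   # too old: to delete
    (3, (now - timedelta(days=90)).isoformat(sep=" ")),   # too old: to delete
    (150, (now - timedelta(days=90)).isoformat(sep=" ")), # outside the mid < 100 scope
]
conn.executemany("INSERT INTO message VALUES (?, ?)", rows)

# Same conditions as the AccessorForItemsToDelete query, with the cutoff
# computed in Python instead of DATEADD/GETDATE.
cutoff = (now - timedelta(days=30)).isoformat(sep=" ")
to_delete = [
    mid
    for (mid,) in conn.execute(
        "SELECT mid FROM message WHERE date < ? AND mid < 100 ORDER BY mid",
        (cutoff,),
    )
]
print(to_delete)  # [2, 3]
```

The query is read-only: it only identifies the items to remove from the index and never deletes anything from the database itself.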
In the Content Security tab, select who will be able to access the source items through a Coveo-powered search interface. For details on this parameter, see Content Security.
In the Access tab, determine whether each group and API key can view or edit the source configuration (see Understanding Resource Access):
In the Access Level column, select View or Edit for each available group.
On the left-hand side of the tab, if available, click Groups or API Keys to switch lists.
If you remove the Edit access level from all the groups of which you’re a member, you won’t be able to edit the source again after saving. Only administrators and members of other groups that have Edit access on this resource will be able to do so. To keep your ability to edit this resource, you must grant the Edit access level to at least one of your groups.
Optionally, consider editing or adding mappings between Database item metadata and fields in your Coveo Cloud organization (see Adding and Managing Source Mappings).
You can only manage mapping rules once you build the source (see Refresh, Rescan, or Rebuild Sources).
Complete the addition or editing of your source:
Click Add Source/Save when you want to save your source configuration changes without starting a build/rebuild, such as when you know you want to do other changes soon.
On the Sources page, you must click Start initial build or Start required rebuild in the source Status column to add the source content or make your changes effective, respectively.
Click Add and Build Source/Save and Rebuild Source when you’re done editing the source and want to make changes effective.
Once the source is built or rebuilt, you can review its content in the Content Browser (see Inspect Items With the Content Browser).
Enabling Refresh on a Database Source
A refresh operation keeps items up to date by scanning repositories and re-indexing modified items at short intervals. For this to be possible with a Database source, each of the database items to retrieve, such as tables or views, must have a Date type field indicating their latest modification date. This date must also be updated whenever the record is modified. The name of this field is irrelevant to Coveo, but it’s crucial that it contains a date, as Coveo uses this information to determine whether the content of a database item has changed since the last update operation. If so, the item must be re-indexed, so that the content searchable in your Coveo-powered search page reflects your actual database content.
In the code below, the attributes that reference this field must all use the same field name, but that name can be anything you like.
The refresh takes into account deleted items when the AccessorForItemsToDelete accessor is configured in the XML configuration (see XML Configuration). Otherwise, a source rescan or rebuild is required.
In the SQL query, the SELECT statement must have a WHERE clause with a criterion on the last modification date field.
The following example should work with common database engines such as Microsoft SQL Server 2012, PostgreSQL, and MySQL. The [PARAMETER] field is sent by the crawler to the query to indicate when the last refresh was performed.
With an MSSQL or SQLServer database, you must select the SQL Client driver and, in the XML, replace [PARAMETER] with @LastRefresh.
With a database of a different type (e.g., a NorthWind database), you must select the ODBC driver and, in the XML, replace [PARAMETER] with the ODBC parameter marker (?).
However, regardless of your database type, the value of the IncrementalRefreshFieldName parameter must be the name of the latest modification date field in your database.
<Accessor type="query" OrderByFieldName="dateModifiedField" OrderByFieldType="DateTime" IncrementalRefreshFieldName="dateModifiedField">
  <![CDATA[
  Select id, title, dateModified, content, author
  FROM blog
  WHERE dateModified>=[PARAMETER]
  order by dateModified
  OFFSET @startRow ROWS
  FETCH NEXT (@endRow-@startRow) ROWS ONLY;
  ]]>
</Accessor>
The example also includes support for pagination (see OFFSET FETCH Clause (SQL Server Compact)).
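The combination of a last-modification filter and paging can be tried out with the following Python sketch against an in-memory SQLite database. It is a sketch under assumptions, not the crawler's implementation: SQLite's LIMIT/OFFSET replaces the OFFSET FETCH clause, the fetch_modified_since helper and the sample rows are invented, and the last refresh date is passed as an ordinary query parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT, dateModified TEXT)"
)
conn.executemany(
    "INSERT INTO blog VALUES (?, ?, ?)",
    [
        (1, "Old post", "2024-01-01 08:00:00"),
        (2, "Edited post", "2024-03-02 09:30:00"),
        (3, "New post", "2024-03-05 14:00:00"),
        (4, "Newest post", "2024-03-06 10:15:00"),
    ],
)


def fetch_modified_since(last_refresh: str, page_size: int = 2):
    """Yield rows modified on or after last_refresh, one page at a time."""
    start_row = 0
    while True:
        page = conn.execute(
            "SELECT id, title, dateModified FROM blog "
            "WHERE dateModified >= ? "   # criterion on the modification date
            "ORDER BY dateModified "     # stable order so pages don't overlap
            "LIMIT ? OFFSET ?",
            (last_refresh, page_size, start_row),
        ).fetchall()
        if not page:
            break
        yield from page
        start_row += page_size


changed_ids = [row[0] for row in fetch_modified_since("2024-03-01 00:00:00")]
print(changed_ids)  # [2, 3, 4]
```

Only the three rows modified since the last refresh are returned, in two pages, while the untouched row is skipped.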
Enable Pausing/Resuming Update Operations
In the SQL query, add an ORDER BY with the same chronological fields as in the WHERE clause.
In your source XML configuration, add XML attributes on the Accessor node:
OrderByFieldName: Specifies the name of the column on which the ORDER BY is applied. This attribute must be present to enable the Pause/Resume options. It can be, for example, the name of the latest modification date field in your database.
OrderByFieldType: Specifies the .NET data type of that column. This attribute isn’t normally required. The source automatically tries to determine the data type by preparing the SQL query, without executing it, and looking at the schema of the results. However, if a specific DBMS doesn’t handle that process correctly, you can manually specify the data type with this attribute. The allowed types are the following:
short (16-bit signed integer)
ushort (16-bit unsigned integer)
int (32-bit signed integer)
uint (32-bit unsigned integer)
long (64-bit signed integer)
ulong (64-bit unsigned integer)
float (single-precision floating point number)
double (double-precision floating point number)
To support both Refresh and Pause/Resume options, the value of the IncrementalRefreshFieldName parameter must be the same as that of the OrderByFieldName parameter.
See this example for an excerpt of a configuration file for a source with Pause/Resume and Refresh options enabled:
You must replace <PARAMETER> by @LastRefresh in an SqlClient scenario or by the ODBC parameter marker (?) in an ODBC scenario.
<Mapping type="Orders">
  <Accessor type="query" OrderByFieldName="OrderDate" OrderByFieldType="DateTime" IncrementalRefreshFieldName="OrderDate">
    SELECT Shippers.CompanyName AS ShipperName,
      Orders.OrderID AS ID,
      Orders.CustomerID,
      Orders.OrderDate,
      Orders.RequiredDate,
      Orders.ShippedDate,
      Customers.CompanyName,
      Employees.LastName,
      Employees.FirstName
    FROM Orders, Shippers, Customers, Employees
    WHERE Orders.ShipVia = Shippers.ShipperID
      AND Orders.CustomerID = Customers.CustomerID
      AND Orders.EmployeeID = Employees.EmployeeID
      AND Orders.OrderDate >= <PARAMETER>
    ORDER BY Orders.OrderDate
  </Accessor>
</Mapping>
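This page doesn't describe how the connector implements pausing, so the following Python sketch only illustrates why a stable ORDER BY column makes resuming possible: once rows arrive in a known order, remembering the last processed value is enough to pick up where the operation stopped. The crawl function, the checkpoint logic, and the sample orders are all hypothetical.

```python
# Rows as an "ORDER BY Orders.OrderDate" clause would return them (invented data).
orders = [
    {"ID": 10248, "OrderDate": "1996-07-04"},
    {"ID": 10249, "OrderDate": "1996-07-05"},
    {"ID": 10250, "OrderDate": "1996-07-08"},
    {"ID": 10251, "OrderDate": "1996-07-09"},
]


def crawl(rows, resume_from=None, max_items=None):
    """Process ordered rows; return the processed IDs and a checkpoint value."""
    processed = []
    checkpoint = resume_from
    for row in rows:
        # Skip everything at or before the checkpoint of a previous run.
        if resume_from is not None and row["OrderDate"] <= resume_from:
            continue
        # Simulate a pause after max_items rows.
        if max_items is not None and len(processed) >= max_items:
            break
        processed.append(row["ID"])
        checkpoint = row["OrderDate"]
    return processed, checkpoint


first_run, checkpoint = crawl(orders, max_items=2)       # paused after two rows
second_run, _ = crawl(orders, resume_from=checkpoint)    # resumed afterwards
print(first_run, second_run)  # [10248, 10249] [10250, 10251]
```

This simplified checkpoint assumes distinct values in the ordering column; rows sharing the same date as the checkpoint would be skipped on resume, which is one reason a real implementation may track more state than this.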
Complement Information Retrieval Using Subqueries
The Database source acquires information about each indexed item through a query performed against a database. For each query, it’s possible to associate one or more subqueries to be executed and used to complement information.
You can run a main query, and for each row, run a subquery that crawls more/different information. All the results of a single row from the main query, along with everything from the subquery, are merged into a single item.
The source requires a mapping configuration to execute properly. For each mapping type, it’s necessary to specify an Accessor representing the SQL query to execute.
To specify subqueries, you must set the type of the Accessor to query.
After the Accessor definition, add an AccessorSubQueries node with all subqueries:
The master key (value following SELECT) in the AccessorSubQuery node must match exactly the one returned by the server. The key you include must also have the same casing. The following error is thrown when the key can’t be found:
Unable to index document : There's a formatting error in a sub query. Cannot find master key %[key].
<AccessorSubQueries>
  <AccessorSubQuery name="FirstNameLastName" separator=";" behaviorOnMultiRows="join" allowDuplicates = "false">
    SELECT firstName, lastName
    FROM employeelist
    WHERE Email_id = %[sender]
  </AccessorSubQuery>
</AccessorSubQueries>
name: Subquery name referred to in the Fields section of the mapping.
separator: Separator used when concatenating many rows.
behaviorOnMultiRows: Action to take when a subquery returns more than one row. The only supported behavior is join, which concatenates values with the provided separator.
allowDuplicates: This attribute is mainly used when your subquery returns many rows. If set to false, duplicates in the results are ignored in the concatenation of the results. If set to true, duplicates are present.
singleQuoteEscapeSequence: When the value of the returned field contains single quotes, these single quotes must be escaped. By default, when you omit this attribute, the source escapes the single quotes by doubling them (e.g., ''). Usually, this escaping mechanism should work. However, some database types require a different escaping sequence for single quotes. In such cases, use this attribute to specify the single quote escape sequence.
For the MySQL database, the single quote escaping sequence is \'. In this case, in the AccessorSubQuery tag, include the singleQuoteEscapeSequence attribute as follows:
<AccessorSubQuery name="FirstNameLastName" separator=";" behaviorOnMultiRows="join" allowDuplicates = "false" singleQuoteEscapeSequence="\'">
Subquery Master Key
In a subquery, a master key used in the WHERE clause must respect the format %[fieldName], which corresponds to metadata acquired from the main accessor. The master key is used to make the join between the main query and subqueries.
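As a rough mental model of master keys and single quote escaping, the following Python sketch expands %[fieldName] placeholders with values from a main-query row. It rests on assumptions that this page doesn't confirm: the expand_master_keys helper is hypothetical, the value is assumed to be inserted as a quoted SQL literal, and the error text only imitates the message quoted earlier.

```python
import re


def expand_master_keys(subquery: str, row: dict, quote_escape: str = "''") -> str:
    """Replace each %[fieldName] with the quoted value from the main-query row."""

    def substitute(match):
        key = match.group(1)
        if key not in row:  # dictionary lookup is case-sensitive, like the master key
            raise KeyError(f"Cannot find master key %[{key}].")
        # Escape single quotes with the configured sequence before quoting.
        value = str(row[key]).replace("'", quote_escape)
        return f"'{value}'"

    return re.sub(r"%\[([^\]]+)\]", substitute, subquery)


subquery = "SELECT firstName, lastName FROM employeelist WHERE Email_id = %[sender]"
row = {"sender": "pat.o'neil@example.com"}

default_sql = expand_master_keys(subquery, row)                    # doubled quote
mysql_sql = expand_master_keys(subquery, row, quote_escape="\\'")  # MySQL-style escape
print(default_sql)
print(mysql_sql)
```

With the default sequence, the quote in the address is doubled; with the MySQL sequence, it is preceded by a backslash. A row whose key is spelled Sender instead of sender fails the lookup, which mirrors the casing requirement.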
Specifying Subquery Metadata for Fields
The <Fields> section of the mapping configuration is used to specify the metadata to use for indexing.
See the following example for a typical <Fields> section of a mapping configuration:
<Fields>
  <Uri>https://www.coveo.com/Emails/details.aspx?Id=%[mid]</Uri>
  <ClickableUri>https://www.coveo.com</ClickableUri>
  <FileName>Message_%[mid].txt</FileName>
  <Title>Message_%[mid]</Title>
  <ModifiedDate>%[date]</ModifiedDate>
  <Body>%[body]</Body>
  <CustomFields>
    <CustomField name="sysAuthor">%[sender]</CustomField>
    <CustomField name="firstName">%[FirstNameLastName.firstName]</CustomField>
    <CustomField name="lastName">%[FirstNameLastName.lastName]</CustomField>
  </CustomFields>
</Fields>
The metadata of a subquery can be specified for a field or a custom field. You specify it much as you would refer to a field coming from the main accessor, using the format %[subQueryName.fieldName]. In the above example, the custom field firstName refers to the subquery named FirstNameLastName and uses the firstName field.
See the following for a complete mapping configuration, used in Coveo unit tests:
<?xml version="1.0" encoding="utf-8" ?>
<ODBC>
  <CommonMapping excludedItems="employeelist">
    <AllowedUsers>
      <AllowedUser type="Windows" allowed="true">
        <Name>everyone</Name>
        <Server></Server>
      </AllowedUser>
    </AllowedUsers>
  </CommonMapping>
  <Mapping type="message">
    <Accessor type="query">
      SELECT message.mid,
        message.sender,
        message.date,
        message.message_id,
        message.subject,
        message.body,
        message.folder
      FROM message
      WHERE DATE like '2001-04-07%'
    </Accessor>
    <AccessorSubQueries>
      <AccessorSubQuery name="FirstNameLastName" separator=";" behaviorOnMultiRows="join">
        SELECT firstName, lastName
        FROM employeelist
        WHERE Email_id = %[sender]
      </AccessorSubQuery>
    </AccessorSubQueries>
    <Fields>
      <Uri>https://www.coveo.com/Emails/details.aspx?Id=%[mid]</Uri>
      <ClickableUri>https://www.coveo.com</ClickableUri>
      <FileName>Message_%[mid].txt</FileName>
      <Title>Message_%[mid]</Title>
      <ModifiedDate>%[date]</ModifiedDate>
      <Body>%[body]</Body>
      <CustomFields>
        <CustomField name="sysAuthor">%[sender]</CustomField>
        <CustomField name="firstName">%[FirstNameLastName.firstName]</CustomField>
        <CustomField name="lastName">%[FirstNameLastName.lastName]</CustomField>
      </CustomFields>
    </Fields>
    <AllowedUsers>
      <AllowedUser type="CustomGroup" allowed="true">
        <Name>everyone</Name>
        <Server></Server>
      </AllowedUser>
    </AllowedUsers>
  </Mapping>
</ODBC>
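To illustrate how a main-query row and its subquery rows might end up in a single item, here is a minimal Python sketch of the join behavior with a separator and duplicate handling. The join_subquery_rows helper and the sample data are hypothetical, and the sketch assumes that duplicates are removed per column, which this page doesn't specify.

```python
def join_subquery_rows(rows, separator=";", allow_duplicates=False):
    """Concatenate each column across subquery rows (the "join" behavior)."""
    merged = {}
    for row in rows:
        for field, value in row.items():
            values = merged.setdefault(field, [])
            # Drop repeated values unless duplicates are allowed.
            if allow_duplicates or value not in values:
                values.append(value)
    return {field: separator.join(values) for field, values in merged.items()}


# One row from the main accessor and two matching rows from the subquery.
main_row = {"mid": 42, "sender": "pat@example.com"}
subquery_rows = [
    {"firstName": "Pat", "lastName": "Smith"},
    {"firstName": "Pat", "lastName": "Jones"},
]

# Merge everything into one item, keyed the way %[subQueryName.fieldName] refers to it.
item = dict(main_row)
for field, value in join_subquery_rows(subquery_rows).items():
    item[f"FirstNameLastName.{field}"] = value

print(item["FirstNameLastName.firstName"])  # Pat
print(item["FirstNameLastName.lastName"])   # Smith;Jones
```

Calling join_subquery_rows(subquery_rows, allow_duplicates=True) would instead keep both first names, giving Pat;Pat.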
Use a Stored Procedure
If your database content retrieval query is complex and already configured as a stored procedure, you may want to leverage it in your XML configuration.
While writing your XML configuration, under Accessor type="query", enter exec storedProcedureName; instead of a SELECT statement with a FROM clause, and then replace storedProcedureName with the name of the desired procedure. If parameters are required to call the stored procedure, add them as follows:
exec storedProcedureName @param1 = 'param1Value', @param2= 'param2Value', @param3= 'param3Value';
Review your source update schedule and optionally change it so that it better fits your needs (see Edit a Source Schedule). By default, your content is rescanned every day.
If you encounter database timeout errors, you may want to edit the source JSON configuration and set the CommandTimeout hidden parameter value to 600 seconds in the parameters section (see Add a Hidden Source Parameter).
If the issue persists, you can either increase the parameter value or use paged SQL queries instead.
Once the source is built or rebuilt, you can review its content in the content browser (see Inspect Items With the Content Browser).
Consider subscribing to deactivation notifications to receive an alert when a Crawling Module component becomes obsolete and stops the content crawling process.