Ingest data into a data collection
This article explains how to ingest data into a data collection. You can use update operations to send incremental changes or rebuild operations to replace the entire contents of the data collection.
Prerequisites
Make sure you have:
- A data collection with the appropriate dataCollectionType value. Create the data collection if you haven’t created it yet.
- An API key with the privileges listed in the following table. See Manage privileges and Privilege reference for details.
| Actions | Service | Domain | Required access level |
|---|---|---|---|
| View data collections and their items | Organization | Data collection | View |
| | Organization | Organization | View |
| Create, edit, and delete data collection configurations | Organization | Data collection | Edit |
| Add, update, and delete items in a data collection | Organization | Organization | Edit |
| | Content | Push items to sources | Allow for all sources |
Note: The Edit privilege on the Data collection domain automatically grants the ability to create data collections. See Can Create ability dependence for more information.
Update operations
Use the update operation sequence when you want to specify changes to the data collection content. You can add, partially update, or delete items.
Step 1: Create a file container
Create a temporary, private, and encrypted Amazon S3 file container using the following request. Save the response body because it contains file container information that you’ll use in the next steps of the update operation.
Request template
POST https://api.cloud.coveo.com/push/v1/organizations/<MyOrganizationId>/files?useVirtualHostedStyleUrl=<true|false> HTTP/1.1
Accept: application/json
Content-Type: application/json
Authorization: Bearer <MyAccessToken>
Request parameters
In the request path:
- Replace <MyOrganizationId> with your organization ID.
In the query string:
- Optionally, set useVirtualHostedStyleUrl to true if you want the service to return a virtual-hosted-style URL, such as coveo-nprod-customerdata.s3.amazonaws.com/.... The default value is currently false, which means that the service returns path-style URLs, such as s3.amazonaws.com/coveo-nprod-customerdata/....

  The useVirtualHostedStyleUrl query string parameter will soon be deprecated as part of the path-style URL deprecation. After that, the service will only return virtual-hosted-style URLs.
In the Authorization HTTP header:
- Replace <MyAccessToken> with an API key that has the required privileges to push content to the data collection.
Payload: None
Successful response: 201 Created
The body of a successful response contains important information about the created file container:
{
"uploadUri": "<UPLOAD_URI>",
"fileId": "<FILE_ID>",
"requiredHeaders": {
"x-amz-server-side-encryption": "AES256",
"Content-Type": "application/octet-stream"
}
}
- The uploadUri property contains a pre-signed URI that you use to make a PUT request when pushing a batch of data collection items in step 2.
- The fileId property contains the unique identifier of your file container that you’ll need in step 3.
- The requiredHeaders property contains the required HTTP headers for sending a PUT request to the uploadUri.
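As an illustration of step 1, the following Python sketch builds the request from the template above and extracts the three response properties. It only constructs the request object and doesn't send anything. The function names are hypothetical, and the sketch leaves out the optional useVirtualHostedStyleUrl parameter.

```python
import json
import urllib.request

# Base URL taken from the request template in this article.
PUSH_API_BASE = "https://api.cloud.coveo.com/push/v1"


def build_create_container_request(org_id: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a file container."""
    return urllib.request.Request(
        f"{PUSH_API_BASE}/organizations/{org_id}/files",
        data=b"",  # the request takes no payload
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Bearer {access_token}",
        },
    )


def parse_container_response(body: str) -> tuple[str, str, dict]:
    """Return (uploadUri, fileId, requiredHeaders) from the 201 response body."""
    container = json.loads(body)
    return container["uploadUri"], container["fileId"], container["requiredHeaders"]
```

Passing the built request to urllib.request.urlopen would send it. Keep the parsed values available for steps 2 and 3.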
Step 2: Upload the update into the file container
To upload the data collection content update into the Amazon S3 file container you created in step 1, perform the following PUT request:
Request template:
PUT <MyUploadURI> HTTP/1.1
<HTTPHeaders>
Request parameters
In the request path:
- Replace <MyUploadURI> with the value of the uploadUri property you received in the response when you created your file container in step 1.
For the <HTTPHeaders>:
- Enter the key-value pairs of the requiredHeaders object property you received in the response when you created your file container in step 1.
Payload:
The payload must be a JSON document of no more than 256 MB. It can contain any combination of addOrUpdate, partialUpdate, and delete operations.
{
"addOrUpdate": [
{
"itemId": "transaction-002",
"timestamp": "2025-01-15T10:30:00.000Z",
"currency": "USD",
"transaction": {
"revenue": 49.99
},
"products": [
{
"product": {
"productId": "SKU-1001",
"price": 24.99
},
"quantity": 2
}
]
},
{
"itemId": "transaction-003",
"timestamp": "2025-01-16T14:22:00.000Z",
"currency": "USD",
"transaction": {
"revenue": 129.97
},
"products": [
{
"product": {
"productId": "SKU-2045",
"price": 129.97
},
"quantity": 1
}
]
}
],
"partialUpdate": [
{
"itemId": "transaction-004",
"operator": "fieldValueReplace",
"field": "transaction",
"value": {
"revenue": 65.99
}
}
],
"delete": [
{
"itemId": "transaction-001"
}
]
}
- Each item in the addOrUpdate array must adhere to the relevant schema.
- Each item in the partialUpdate array must include the itemId, operator, field, and value properties:
  - itemId: The unique identifier of the transaction to update. This must match an existing item in the data collection.
  - operator: The partial update operator to apply (for example, fieldValueReplace, as in the payload above).
  - field: The name of the property to update.
  - value: The value to set, add, or remove, depending on the chosen operator.
- Each item in the delete array must specify the itemId property.
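The payload rules above can be checked before uploading. The following Python sketch assembles an update payload, verifies that each partialUpdate entry carries the four required properties, and enforces the size limit. The function name is hypothetical, and reading "256 MB" as 256 × 1024 × 1024 bytes is an assumption.

```python
import json

# Assumption: the 256 MB limit is interpreted as 256 MiB.
MAX_PAYLOAD_BYTES = 256 * 1024 * 1024
PARTIAL_UPDATE_KEYS = {"itemId", "operator", "field", "value"}


def build_update_payload(add_or_update=(), partial_update=(), delete_ids=()) -> bytes:
    """Serialize an update payload, raising ValueError if it breaks a documented rule."""
    payload = {}
    items = list(add_or_update)
    operations = list(partial_update)
    ids = list(delete_ids)
    if items:
        payload["addOrUpdate"] = items
    for operation in operations:
        missing = PARTIAL_UPDATE_KEYS - operation.keys()
        if missing:
            raise ValueError(f"partialUpdate entry is missing: {sorted(missing)}")
    if operations:
        payload["partialUpdate"] = operations
    if ids:
        payload["delete"] = [{"itemId": item_id} for item_id in ids]
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_PAYLOAD_BYTES:
        raise ValueError(f"payload is {len(body)} bytes, above the 256 MB limit")
    return body
```

The sketch doesn't validate addOrUpdate items against the schema. That check is left to the service.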
Successful response: 200 OK
A successful response has no content, but indicates that the content update was successfully uploaded to the Amazon S3 file container.
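A minimal sketch of the step 2 upload, assuming the uploadUri and requiredHeaders values were saved from step 1. It builds the PUT request without sending it, and the function name is hypothetical.

```python
import urllib.request


def build_upload_request(upload_uri: str, required_headers: dict, body: bytes) -> urllib.request.Request:
    """Build (but do not send) the PUT request that uploads a payload to the file container."""
    return urllib.request.Request(
        upload_uri,
        data=body,
        method="PUT",
        # Send exactly the headers returned in requiredHeaders.
        headers=dict(required_headers),
    )
```

Copying the headers from the step 1 response, rather than hard-coding them, keeps the request aligned with whatever the service returned.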
Step 3: Send the file container to update your data collection
To send the file container to the data collection and trigger the update processing, perform the following POST request:
Request template:
POST https://api.cloud.coveo.com/push/v1/organizations/<MyOrganizationId>/data-collections/<MyDataCollectionId>/stream/update?fileId=<MyFileId> HTTP/1.1
Content-Type: application/json
Authorization: Bearer <MyAccessToken>
Request parameters
In the request path:
- Replace <MyOrganizationId> with your organization ID.
- Replace <MyDataCollectionId> with the unique identifier of your data collection that you received when you created the data collection.
In the query string:
- Replace <MyFileId> with the value of the fileId property you received in the response when you created your file container in step 1. The fileId may be used to locate log entries related to this update in the Log Browser (platform-ca | platform-eu | platform-au).
In the Authorization HTTP header:
- Replace <MyAccessToken> with an access token, such as an API key that has the required privileges to push content to the data collection.
Payload: None
Successful response: 202 Accepted
A successful response indicates that the update operation was successfully queued for processing.
The response body contains an orderingId, which represents the timestamp (in epoch milliseconds) when the operation was accepted, and a requestId.
{
"orderingId": "1781207560610",
"requestId": "28e37f9c-c65b-4ed3-9c09-7bf4135a5235"
}
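The following Python sketch shows how the step 3 URL could be assembled and how the orderingId can be read as a UTC timestamp, since it is expressed in epoch milliseconds. Both function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote, urlencode

# Base URL taken from the request template in this article.
PUSH_API_BASE = "https://api.cloud.coveo.com/push/v1"


def build_stream_update_url(org_id: str, data_collection_id: str, file_id: str) -> str:
    """Assemble the stream/update URL, escaping each path segment and the query value."""
    path = (
        f"{PUSH_API_BASE}/organizations/{quote(org_id, safe='')}"
        f"/data-collections/{quote(data_collection_id, safe='')}/stream/update"
    )
    return f"{path}?{urlencode({'fileId': file_id})}"


def ordering_id_to_datetime(ordering_id: str) -> datetime:
    """Convert an orderingId (epoch milliseconds, as a string) to a UTC datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Integer arithmetic avoids floating-point rounding of the milliseconds.
    return epoch + timedelta(milliseconds=int(ordering_id))
```

For the sample response above, ordering_id_to_datetime("1781207560610") corresponds to 2026-06-11 19:52:40.610 UTC.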
Rebuild operations
Use the rebuild operation sequence when you want to replace the entire contents of a data collection. Any item in the data collection not received during the rebuild will be deleted when the rebuild stream is closed. This is useful when you need to guarantee full data integrity.
This involves a three-step process:
Step 1: Open a stream
To open a stream targeting the data collection, perform the following POST request:
Request template:
POST https://api.cloud.coveo.com/push/v1/organizations/<MyOrganizationId>/data-collections/<MyDataCollectionId>/stream/open HTTP/1.1
Content-Type: application/json
Authorization: Bearer <MyAccessToken>
Request parameters
In the request path:
- Replace <MyOrganizationId> with your organization ID.
- Replace <MyDataCollectionId> with the unique identifier of your data collection that you received when you created the data collection.
In the Authorization HTTP header:
- Replace <MyAccessToken> with an access token, such as an API key that has the required privileges to push content to the data collection.
Payload: None
Successful response: 201 Created
A successful response indicates that the stream was successfully opened.
The response body contains a streamId to identify the stream and an uploadUri for uploading items in step 2.
{
"uploadUri": "https://coveo-nprod-customerdata.s3.amazonaws.com/stream/barcateamjqcy9k1a/284da6bc-6150-44fe-8821-c4ad0172a49e/07739df9-dfcf-46ee-80d0-b075d9a49c9f?X-Amz-Security-Token=IQoJb3JpZ2luX2VjED0aCXVzLWVhc3QtMSJGMEQCIDydGUFoV2choYYac%2FivQ5nP4czb5gjH8ysv1V9JX5s5AiBw9pkqMdDIE1P91liS9FuoOm0bF62yyZjTvYb0yK5oMirsAwgGEAAaDDA2NDc5MDE1NzE1NCIMBF4xEzSFuDPRVaudKskDGnY0VhCWNMY0%2FdYJ9yQSz6vvUwad7U0Z9Stbq41Cm3%2Fb0w8z%2FnxbvNF9Yr%2BzE3GlM7kM8i9riICfd5a7CUFE%2FgQ1ruVm3nokj2RZ%2ByaiJwTWij2tb2d%2FmA7gZFOz4jE%2FJBvBoe31hUbpf0iHbC2tGij3w%2B1%2BLasEpfFwjdQ2TA%2BCjlWQqVk1UJugcGUBFQulAPI61tiy28ekb2Lc6g1oItUjB9Tb2P26bMTgfhvkf0uBDx8J4SsyrnT%2BGUG1FU8I%2F5LG6CR2sAmNM3N06k2TSinHaISAB08rwIZF%2Fczh1XFKSUCntoDSQE4A7fa2oVRPqNhmI1a9RpuocK%2FUJ5eYnTvXWAZt71ZC0QSp%2FxSGJoIcaeL3zEu3l3yvNbp49qn7nxsWRXraVTbHzYORo7l3Z%2FSDfZeTZPNhfwBphqM3WuBbAHQ44rRokgfeUBLOAOniMienXzQjZ7IBqMSPSuzk5claaqEl7hRhqZQeip16PXY8Rv49OErThcJIfDiLQzrqTbbWbMeioWEJKQzH6r%2F8FHszeP9%2Fkj3P7aH49%2FyZ1%2BgRjVyV66AfwLgi443CMTqmXggirXEO8xYtvR%2B05uhP4%2FFvyDGUvyvPNTCBtazRBjqiAV33d26Owcls5Vv8LBVlRGgXodMmWr0RrvsXEhrEhur2u8%2Fc4i7Cub0vL9ZWkuIWy%2FG2ASCIuW6p%2FSJPrueOhuWKJ%2FgDPwT%2B8sCHEafh86DULrRfN5XWioopcwFifaMrXQonG56mC%2FzdOpFagiSaq9K4kFFfMkm90pqJUBgtw%2Bsf2GpX8yvLziOg%2FmbUDH%2Bos%2FUPMHLizV2zZc3mHrXy3bfkaA%3D%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20260611T204759Z&X-Amz-SignedHeaders=content-type%3Bhost%3Bx-amz-server-side-encryption&X-Amz-Credential=ASIAQ6FOLK5RLGF4NLAF%2F20260611%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Expires=3600&X-Amz-Signature=6f23e14dad833e220e2fcbb8c8d89a97886c111286a8d35c71a1e74037299ce3",
"fileId": "07739df9-dfcf-46ee-80d0-b075d9a49c9f",
"requiredHeaders": {
"x-amz-server-side-encryption": "AES256",
"Content-Type": "application/octet-stream"
},
"streamId": "284da6bc-6150-44fe-8821-c4ad0172a49e"
}
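Because the open-stream response carries values needed by both later steps, it can help to parse it into a small structure right away. This is a sketch only: the OpenStream class and the function name are hypothetical, and the property names come from the response shown above.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenStream:
    """Values from the open-stream response that steps 2 and 3 rely on."""
    stream_id: str          # used in the step 3 close request path
    upload_uri: str         # pre-signed URI for the step 2 PUT request
    required_headers: dict  # headers to send with the step 2 PUT request
    file_id: str


def parse_open_stream_response(body: str) -> OpenStream:
    """Parse the 201 response body returned when a stream is opened."""
    data = json.loads(body)
    return OpenStream(
        stream_id=data["streamId"],
        upload_uri=data["uploadUri"],
        required_headers=data["requiredHeaders"],
        file_id=data["fileId"],
    )
```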
Step 2: Upload items into the stream
To upload the items into the stream you opened in step 1, perform the following PUT request:
Request template:
PUT <MyUploadURI> HTTP/1.1
Content-Type: application/octet-stream
x-amz-server-side-encryption: AES256
Request parameters
In the request path:
- Replace <MyUploadURI> with the value of the uploadUri property you received in the response when you opened your stream in step 1.
Payload:
The payload must be a JSON document of no more than 256 MB and can only contain an addOrUpdate array.
{
"addOrUpdate": [
{
"itemId": "transaction-001",
"timestamp": "2025-01-15T10:30:00.000Z",
"currency": "USD",
"transaction": {
"revenue": 49.99
},
"products": [
{
"product": {
"productId": "SKU-1001",
"price": 24.99
},
"quantity": 2
}
]
},
{
"itemId": "transaction-002",
"timestamp": "2025-01-16T14:22:00.000Z",
"currency": "USD",
"transaction": {
"revenue": 129.97
},
"products": [
{
"product": {
"productId": "SKU-2045",
"price": 129.97
},
"quantity": 1
}
]
}
]
}
Each item in the addOrUpdate array must adhere to the relevant schema.
Successful response: 200 OK
A successful response has no content, but indicates that the items were successfully uploaded to the stream.
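Unlike an update payload, a rebuild payload can only contain an addOrUpdate array. The following Python sketch is one way to catch a payload that breaks that rule before uploading it. The function name is hypothetical, and reading "256 MB" as 256 × 1024 × 1024 bytes is an assumption.

```python
import json

# Assumption: the 256 MB limit is interpreted as 256 MiB.
MAX_PAYLOAD_BYTES = 256 * 1024 * 1024


def rebuild_payload_problems(payload: dict) -> list[str]:
    """Return the documented rebuild-payload rules that the given payload breaks."""
    problems = []
    extra = sorted(set(payload) - {"addOrUpdate"})
    if extra:
        problems.append(f"keys not allowed in a rebuild payload: {', '.join(extra)}")
    if not isinstance(payload.get("addOrUpdate"), list):
        problems.append("addOrUpdate must be an array")
    size = len(json.dumps(payload).encode("utf-8"))
    if size > MAX_PAYLOAD_BYTES:
        problems.append(f"payload is {size} bytes, above the 256 MB limit")
    return problems
```

An empty list means none of these three checks failed. It doesn't mean the items satisfy the schema.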
Step 3: Close the stream
To close the stream and synchronize the data collection, perform the following POST request:
Request template:
POST https://api.cloud.coveo.com/push/v1/organizations/<MyOrganizationId>/data-collections/<MyDataCollectionId>/stream/<MyStreamId>/close HTTP/1.1
Content-Type: application/json
Authorization: Bearer <MyAccessToken>
Request parameters
In the request path:
- Replace <MyOrganizationId> with your organization ID.
- Replace <MyDataCollectionId> with the unique identifier of your data collection that you received when you created the data collection.
- Replace <MyStreamId> with the value of the streamId property you received in the response when you opened your stream in step 1. The streamId may be used to locate log entries related to this rebuild in the Log Browser (platform-ca | platform-eu | platform-au).
In the Authorization HTTP header:
- Replace <MyAccessToken> with an access token, such as an API key that has the required privileges to push content to the data collection.
Payload: None
Successful response: 202 Accepted
A successful response indicates that the stream was successfully closed and the rebuild operation was queued for processing. Any item not received during the rebuild will be removed from the data collection.
{
"orderingId": "1781207560610",
"requestId": "f8a5c2d1-3b4e-4c5f-8d9e-1a2b3c4d5e6f"
}
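To show how the three rebuild steps fit together, the following Python sketch chains them through an injected send callable. The send callable is hypothetical: it stands for any HTTP transport that takes (method, url, headers, body) and returns the response body as text, which keeps the sequence testable without network access. Error handling is omitted.

```python
import json
from typing import Callable

# Base URL taken from the request templates in this article.
PUSH_API_BASE = "https://api.cloud.coveo.com/push/v1"

# Hypothetical transport signature: (method, url, headers, body) -> response text.
Send = Callable[[str, str, dict, bytes], str]


def run_rebuild(send: Send, org_id: str, data_collection_id: str,
                access_token: str, items: list) -> dict:
    """Open a stream, upload the items, close the stream, and return the close response."""
    base = f"{PUSH_API_BASE}/organizations/{org_id}/data-collections/{data_collection_id}/stream"
    api_headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {access_token}",
    }

    # Step 1: open the stream and keep the values needed later.
    opened = json.loads(send("POST", f"{base}/open", api_headers, b""))

    # Step 2: upload the full set of items with the headers the service asked for.
    body = json.dumps({"addOrUpdate": items}).encode("utf-8")
    send("PUT", opened["uploadUri"], opened["requiredHeaders"], body)

    # Step 3: close the stream, which queues the rebuild.
    closed = send("POST", f"{base}/{opened['streamId']}/close", api_headers, b"")
    return json.loads(closed)
```

In practice, send could wrap urllib.request.urlopen or another HTTP client. Since closing the stream removes every item that wasn't uploaded, a real implementation should close the stream only after the upload has succeeded.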
Validation schemas
To be ingested, items in a data collection must adhere to the predefined schema for the collection’s dataCollectionType.
For each property, the schema indicates whether the property is required, its type, and any property value constraints.
The additionalProperties property indicates whether properties not explicitly defined in the schema are allowed in the current object.
The default value is true.
Offline purchases v1 schema
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://api.cloud.coveo.com/v1/data-collections/schemas/offline-purchases-schema-v1.json",
"title": "Offline Purchases Schema",
"description": "This schema is used to validate the format of Offline Purchases sent to Coveo through ingestion APIs. For more information, refer to the Data Collection documentation.",
"type": "object",
"required": [
"itemId",
"timestamp",
"transaction",
"products"
],
"additionalProperties": false,
"properties": {
"itemId": {
"type": "string",
"description": "Unique identifier for the transaction. When a transaction with an existing id is received, the previous record is overwritten.",
"minLength": 1,
"maxLength": 255
},
"currency": {
"type": "string",
"pattern": "^[A-Z]{3}$",
"default": "USD",
"description": "Currency code in upper-case ISO 4217 format."
},
"timestamp": {
"type": "string",
"format": "date-time",
"description": "Timestamp of when the transaction occurred, in ISO 8601 format."
},
"transaction": {
"type": "object",
"required": ["revenue"],
"additionalProperties": false,
"properties": {
"revenue": {
"type": "number",
"minimum": 0,
"description": "Total revenue contained in the transaction. Currency must remain consistent within a transaction."
}
}
},
"products": {
"type": "array",
"minItems": 1,
"items": {
"type": "object",
"required": ["product", "quantity"],
"additionalProperties": false,
"properties": {
"product": {
"type": "object",
"required": ["productId", "price"],
"additionalProperties": false,
"properties": {
"productId": {
"type": "string",
"description": "Unique identifier of the product purchased."
},
"price": {
"type": "number",
"minimum": 0,
"description": "Price paid by the customer per unit of quantity."
}
}
},
"quantity": {
"type": "number",
"minimum": 0,
"description": "Quantity of the item purchased. The total revenue for the product in a transaction should be obtained by multiplying price and quantity."
}
}
}
}
}
}
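A local pre-check can surface obvious schema violations before an upload. The following Python sketch hand-codes a subset of the Offline purchases v1 constraints using only the standard library. It is not a full JSON Schema validator: for instance, it doesn't reject extra properties in nested objects, and its timestamp check only approximates the date-time format. The function names are hypothetical.

```python
import re
from datetime import datetime


def _is_number(value) -> bool:
    # JSON Schema "number" excludes booleans, which Python treats as integers.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def precheck_offline_purchase(item: dict) -> list[str]:
    """Return the problems found; an empty list means this subset of checks passed."""
    errors = []
    allowed = {"itemId", "timestamp", "currency", "transaction", "products"}
    for key in ("itemId", "timestamp", "transaction", "products"):
        if key not in item:
            errors.append(f"missing required property: {key}")
    for key in sorted(set(item) - allowed):
        errors.append(f"unexpected property: {key}")

    item_id = item.get("itemId")
    if "itemId" in item and not (isinstance(item_id, str) and 1 <= len(item_id) <= 255):
        errors.append("itemId must be a string of 1 to 255 characters")

    currency = item.get("currency")
    if "currency" in item and not (isinstance(currency, str) and re.fullmatch(r"[A-Z]{3}", currency)):
        errors.append("currency must be three upper-case letters")

    if "timestamp" in item:
        timestamp = item["timestamp"]
        try:
            if not isinstance(timestamp, str):
                raise ValueError("not a string")
            # Older Python versions don't accept a trailing "Z" in fromisoformat.
            datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        except ValueError:
            errors.append("timestamp is not an ISO 8601 date-time")

    if "transaction" in item:
        transaction = item["transaction"]
        revenue = transaction.get("revenue") if isinstance(transaction, dict) else None
        if not (_is_number(revenue) and revenue >= 0):
            errors.append("transaction.revenue must be a number >= 0")

    if "products" in item:
        products = item["products"]
        if not (isinstance(products, list) and products):
            errors.append("products must be a non-empty array")
        else:
            for index, entry in enumerate(products):
                product = entry.get("product") if isinstance(entry, dict) else None
                quantity = entry.get("quantity") if isinstance(entry, dict) else None
                if not (isinstance(product, dict)
                        and isinstance(product.get("productId"), str)
                        and _is_number(product.get("price"))
                        and product["price"] >= 0):
                    errors.append(f"products[{index}].product needs a string productId and a price >= 0")
                if not (_is_number(quantity) and quantity >= 0):
                    errors.append(f"products[{index}].quantity must be a number >= 0")
    return errors
```

The service remains the authority on validation. A check like this only shortens the feedback loop for common mistakes such as a lower-case currency code or an empty products array.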
Review ingestion logs
The Log Browser (platform-ca | platform-eu | platform-au) provides a unified interface to review ingestion logs for all your data collections. Use it to monitor the ingestion process and review any validation errors or other issues with your ingested data.
| Callout | Description |
|---|---|
| 1 | Operation timestamp |
| 2 | Operation type |
| 3 | |
| 4 | |
| 5 | Use the item URI filter (8) to search for specific items or batches. |
| 6 | |
| 7 | Log entry expansion button |
| 8 | Item URI filter to search for specific items or batches. |
| 9 | Date filter |
Logs are grouped into three stages:
| Stage | Operation target | Description |
|---|---|---|
| | Batch | This is the initial log entry recorded when a batch of items is received through the Stream API. |
| | Batch \| item | This is the second processing stage during which the batch payload is chunked into |
| | Item | This is the final processing stage during which an added or updated item is validated against the |
The operation result indicates the outcome of the ingestion operation:
- Completed: The operation was successful.
- Warning: The operation was unsuccessful or only partly successful, but the ingestion process was able to continue. Click the expansion button near the right edge of the operation log entry for an error description and details about the information that wasn’t ingested.
- Error: The operation was unsuccessful and the ingestion process was stopped. Click the expansion button near the right edge of the operation log entry for more details.