Ingest data into a data collection

This article explains how to ingest data into a data collection. You can use update operations to send incremental changes or rebuild operations to replace the entire contents of the data collection.

Prerequisites

Make sure you have:

  • An API key with the privileges listed in the following table. See Manage privileges and Privilege reference for details.

    Actions | Service | Domain | Required access level
    View data collections and their items | Organization | Data collection | View
    View data collections and their items | Organization | Organization | View
    Create, edit, and delete data collection configurations | Organization | Data collection | Edit
    Add, update, and delete items in a data collection | Organization | Organization | Edit
    Add, update, and delete items in a data collection | Content | Push items to sources | Allow for all sources

    Note

    The Edit privilege on the Data collection domain automatically grants the ability to create data collections. See Can Create ability dependence for more information.

Update operations

Use the update operation sequence when you want to specify changes to the data collection content. You can add, partially update, or delete items.

Step 1: Create a file container

Create a temporary, private, and encrypted Amazon S3 file container using the following request. Save the response body because it contains file container information that you’ll use in the next steps of the update operation.

Request template:

POST https://api.cloud.coveo.com/push/v1/organizations/<MyOrganizationId>/files?useVirtualHostedStyleUrl=<true|false> HTTP/1.1

Accept: application/json
Content-Type: application/json
Authorization: Bearer <MyAccessToken>

Request parameters

In the request path:

  • Replace <MyOrganizationId> with the ID of the target Coveo organization.

In the query string:

  • Optionally, set useVirtualHostedStyleUrl to true if you want the service to return a virtual-hosted-style URL, such as coveo-nprod-customerdata.s3.amazonaws.com/.... The default value is currently false, which means that the service returns path-style URLs, such as s3.amazonaws.com/coveo-nprod-customerdata/....

    Important

    The useVirtualHostedStyleUrl query string parameter will soon be deprecated as part of the path-style URL deprecation. Once it is deprecated, the service will only return virtual-hosted-style URLs.

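In the Authorization HTTP header:

  • Replace <MyAccessToken> with an access token, such as an API key that has the required privileges.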

Payload: None

Successful response: 201 Created

The body of a successful response contains important information about the created file container:

{
    "uploadUri": "<UPLOAD_URI>", 1
    "fileId": "<FILE_ID>", 2
    "requiredHeaders": { 3
        "x-amz-server-side-encryption": "AES256",
        "Content-Type": "application/octet-stream"
    }
}
1 The uploadUri property contains a pre-signed URI that you use to make a PUT request when pushing a batch of data collection items in step 2.

Notes

  • The Amazon S3 file container applies AES-256 server-side encryption to your data.

  • The file container is automatically deleted as soon as its content has been successfully forwarded to the service.

  • The uploadUri automatically expires after 60 minutes.

Therefore, it’s safe to upload sensitive information to the Amazon S3 file container.

2 The fileId property contains the unique identifier of your file container that you’ll need in step 3.
3 The requiredHeaders property contains the required HTTP headers for sending a PUT request to the uploadUri.
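
As an illustration only, step 1 could be sketched in Python with the standard library. The function names build_create_container_request and parse_container are hypothetical helpers, not part of any Coveo SDK. The sketch builds the request without sending it and extracts the three properties described above from a response body.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.cloud.coveo.com/push/v1"


def build_create_container_request(org_id: str, token: str,
                                   virtual_hosted: bool = True) -> urllib.request.Request:
    """Build (but do not send) the step 1 request that creates a file container."""
    query = urlencode({"useVirtualHostedStyleUrl": str(virtual_hosted).lower()})
    url = f"{API_BASE}/organizations/{org_id}/files?{query}"
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    # The request has no payload, so only the method, URL, and headers matter.
    return urllib.request.Request(url, method="POST", headers=headers)


def parse_container(body: str) -> tuple:
    """Return (uploadUri, fileId, requiredHeaders) from the response body."""
    data = json.loads(body)
    return data["uploadUri"], data["fileId"], data["requiredHeaders"]
```

Sending is left to the caller, for example with urllib.request.urlopen(request). Keep the three returned values, because steps 2 and 3 need them.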

Step 2: Upload the update into the file container

To upload the data collection content update into the Amazon S3 file container you created in step 1, perform the following PUT request:

Request template:

PUT <MyUploadURI> HTTP/1.1

<HTTPHeaders>

Request parameters

In the request path:

  • Replace <MyUploadURI> with the value of the uploadUri property you received in the response when you created your file container in step 1.

For the <HTTPHeaders>:

  • Enter the key-value pairs of the requiredHeaders object property you received in the response when you created your file container in step 1.

Payload:

The payload must be a JSON document of no more than 256 MB and can contain any combination of addOrUpdate, partialUpdate, and delete operations.

Sample offline purchases payload
{
  "addOrUpdate": [ 1
    {
      "itemId": "transaction-002",
      "timestamp": "2025-01-15T10:30:00.000Z",
      "currency": "USD",
      "transaction": {
        "revenue": 49.99
      },
      "products": [
        {
          "product": {
            "productId": "SKU-1001",
            "price": 24.99
          },
          "quantity": 2
        }
      ]
    },
    {
      "itemId": "transaction-003",
      "timestamp": "2025-01-16T14:22:00.000Z",
      "currency": "USD",
      "transaction": {
        "revenue": 129.97
      },
      "products": [
        {
          "product": {
            "productId": "SKU-2045",
            "price": 129.97
          },
          "quantity": 1
        }
      ]
    }
  ],
  "partialUpdate": [ 2
    {
      "itemId": "transaction-004",
      "operator": "fieldValueReplace",
      "field": "transaction",
      "value": {
        "revenue": 65.99
      }
    }
  ],
  "delete": [ 3
    {
      "itemId": "transaction-001"
    }
  ]
}
1 Each item in the addOrUpdate array must adhere to the relevant schema.
2 Each item in the partialUpdate array must include the itemId, operator, field, and value properties.

itemId

The unique identifier of the transaction to update. This must match an existing itemId in the data collection.

operator

The partial update operator to apply. The following operators are supported:

Operator | Description
fieldValueReplace | Replaces a property value regardless of the original type. For example, use it to update the transaction revenue. To update two properties of the same item, include two separate entries in the partialUpdate array with the same itemId and different field values.
arrayAppend | Adds elements to an array field. Can only be used on arrays of primitive values (for example, strings, numbers, Boolean values).
arrayRemove | Removes elements from an array field. Can only be used on arrays of primitive values (for example, strings, numbers, Boolean values).

field

The name of the property to update.

value

The value to set, add, or remove, depending on the chosen operator.

For fieldValueReplace: provide the new value to set. For example, to update the transaction revenue to 65.99, set field to transaction and value to { "revenue": 65.99 }.

For arrayAppend or arrayRemove: provide an array of values to add or remove.

3 Each item in the delete array must specify the itemId property.
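
The partialUpdate rules above can be checked locally before uploading. The following is a minimal sketch, assuming a hypothetical helper named partial_update. It only mirrors the constraints stated in this article (supported operators, arrays of primitive values) and does not replace the validation the service performs. The currency field in the second entry is used purely as an example of updating a second property.

```python
ALLOWED_OPERATORS = {"fieldValueReplace", "arrayAppend", "arrayRemove"}


def partial_update(item_id: str, operator: str, field: str, value) -> dict:
    """Return one partialUpdate entry after basic local checks."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    if operator in ("arrayAppend", "arrayRemove"):
        # These operators take an array of primitive values.
        if not isinstance(value, list) or any(isinstance(v, (dict, list)) for v in value):
            raise ValueError(f"{operator} expects a list of primitive values")
    return {"itemId": item_id, "operator": operator, "field": field, "value": value}


# Two properties of the same item: one entry per field, same itemId.
entries = [
    partial_update("transaction-004", "fieldValueReplace", "transaction", {"revenue": 65.99}),
    partial_update("transaction-004", "fieldValueReplace", "currency", "CAD"),
]
payload = {"partialUpdate": entries}
```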

Successful response: 200 OK

A successful response has no content, but indicates that the content update was successfully uploaded to the Amazon S3 file container.
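
A possible way to prepare the step 2 request in Python is sketched below. The helper name build_upload_request is hypothetical, and the size check assumes that "256 MB" means 256,000,000 bytes, which is the more conservative reading. The headers are taken as-is from the requiredHeaders object returned in step 1.

```python
import json
import urllib.request

# Assumption: "256 MB" is read as decimal megabytes (the stricter interpretation).
MAX_PAYLOAD_BYTES = 256 * 1000 * 1000
UPDATE_KEYS = {"addOrUpdate", "partialUpdate", "delete"}


def build_upload_request(upload_uri: str, required_headers: dict,
                         payload: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT request that uploads an update payload."""
    unknown = set(payload) - UPDATE_KEYS
    if unknown:
        raise ValueError(f"unexpected top-level keys: {sorted(unknown)}")
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload exceeds the 256 MB limit")
    # Send exactly the headers listed in requiredHeaders from step 1.
    return urllib.request.Request(upload_uri, data=body, method="PUT",
                                  headers=dict(required_headers))
```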

Step 3: Send the file container to update your data collection

To send the file container to the data collection and trigger the update processing, perform the following POST request:

Request template:

POST https://api.cloud.coveo.com/push/v1/organizations/<MyOrganizationId>/data-collections/<MyDataCollectionId>/stream/update?fileId=<MyFileId> HTTP/1.1

Content-Type: application/json
Authorization: Bearer <MyAccessToken>
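
Request parameters

In the request path:

  • Replace <MyOrganizationId> with the ID of the target Coveo organization.

  • Replace <MyDataCollectionId> with the ID of the target data collection.

In the query string:

  • Replace <MyFileId> with the value of the fileId property you received in the response when you created your file container in step 1.

In the Authorization HTTP header: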

  • Replace <MyAccessToken> with an access token, such as an API key that has the required privileges to push content to the data collection.

Payload: None

Successful response: 202 Accepted

A successful response indicates that the update operation was successfully queued for processing. The response body contains an orderingId, which represents the timestamp (in epoch milliseconds) when the operation was accepted, and a requestId.

Sample response body
{
  "orderingId": "1781207560610",
  "requestId": "28e37f9c-c65b-4ed3-9c09-7bf4135a5235"
}
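
As a sketch, the step 3 URL and the orderingId interpretation described above could look like this in Python. The helper names update_url and ordering_time are hypothetical. The conversion relies only on the statement that orderingId is a timestamp in epoch milliseconds.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import quote, urlencode

API_BASE = "https://api.cloud.coveo.com/push/v1"


def update_url(org_id: str, collection_id: str, file_id: str) -> str:
    """Build the URL that sends a file container to a data collection."""
    path = (f"/organizations/{quote(org_id, safe='')}"
            f"/data-collections/{quote(collection_id, safe='')}/stream/update")
    return f"{API_BASE}{path}?{urlencode({'fileId': file_id})}"


def ordering_time(response_body: dict) -> datetime:
    """Convert the orderingId (epoch milliseconds, sent as a string) to a UTC datetime."""
    millis = int(response_body["orderingId"])
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=millis)
```

Applied to the sample response body above, ordering_time returns a UTC time on 11 June 2026.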

Rebuild operations

Use the rebuild operation sequence when you want to replace the entire contents of a data collection. Any item in the data collection not received during the rebuild will be deleted when the rebuild stream is closed. This is useful when you need to guarantee full data integrity.

This involves a three-step process:

Step 1: Open a stream

To open a stream targeting the data collection, perform the following POST request:

Request template:

POST https://api.cloud.coveo.com/push/v1/organizations/<MyOrganizationId>/data-collections/<MyDataCollectionId>/stream/open HTTP/1.1

Content-Type: application/json
Authorization: Bearer <MyAccessToken>
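
Request parameters

In the request path:

  • Replace <MyOrganizationId> with the ID of the target Coveo organization.

  • Replace <MyDataCollectionId> with the ID of the target data collection.

In the Authorization HTTP header: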

  • Replace <MyAccessToken> with an access token, such as an API key that has the required privileges to push content to the data collection.

Payload: None

Successful response: 201 Created

A successful response indicates that the stream was successfully opened. The response body contains a streamId to identify the stream and an uploadUri for uploading items in step 2.

Sample response body
{
  "uploadUri": "https://coveo-nprod-customerdata.s3.amazonaws.com/stream/barcateamjqcy9k1a/284da6bc-6150-44fe-8821-c4ad0172a49e/07739df9-dfcf-46ee-80d0-b075d9a49c9f?X-Amz-Security-Token=IQoJb3JpZ2luX2VjED0aCXVzLWVhc3QtMSJGMEQCIDydGUFoV2choYYac%2FivQ5nP4czb5gjH8ysv1V9JX5s5AiBw9pkqMdDIE1P91liS9FuoOm0bF62yyZjTvYb0yK5oMirsAwgGEAAaDDA2NDc5MDE1NzE1NCIMBF4xEzSFuDPRVaudKskDGnY0VhCWNMY0%2FdYJ9yQSz6vvUwad7U0Z9Stbq41Cm3%2Fb0w8z%2FnxbvNF9Yr%2BzE3GlM7kM8i9riICfd5a7CUFE%2FgQ1ruVm3nokj2RZ%2ByaiJwTWij2tb2d%2FmA7gZFOz4jE%2FJBvBoe31hUbpf0iHbC2tGij3w%2B1%2BLasEpfFwjdQ2TA%2BCjlWQqVk1UJugcGUBFQulAPI61tiy28ekb2Lc6g1oItUjB9Tb2P26bMTgfhvkf0uBDx8J4SsyrnT%2BGUG1FU8I%2F5LG6CR2sAmNM3N06k2TSinHaISAB08rwIZF%2Fczh1XFKSUCntoDSQE4A7fa2oVRPqNhmI1a9RpuocK%2FUJ5eYnTvXWAZt71ZC0QSp%2FxSGJoIcaeL3zEu3l3yvNbp49qn7nxsWRXraVTbHzYORo7l3Z%2FSDfZeTZPNhfwBphqM3WuBbAHQ44rRokgfeUBLOAOniMienXzQjZ7IBqMSPSuzk5claaqEl7hRhqZQeip16PXY8Rv49OErThcJIfDiLQzrqTbbWbMeioWEJKQzH6r%2F8FHszeP9%2Fkj3P7aH49%2FyZ1%2BgRjVyV66AfwLgi443CMTqmXggirXEO8xYtvR%2B05uhP4%2FFvyDGUvyvPNTCBtazRBjqiAV33d26Owcls5Vv8LBVlRGgXodMmWr0RrvsXEhrEhur2u8%2Fc4i7Cub0vL9ZWkuIWy%2FG2ASCIuW6p%2FSJPrueOhuWKJ%2FgDPwT%2B8sCHEafh86DULrRfN5XWioopcwFifaMrXQonG56mC%2FzdOpFagiSaq9K4kFFfMkm90pqJUBgtw%2Bsf2GpX8yvLziOg%2FmbUDH%2Bos%2FUPMHLizV2zZc3mHrXy3bfkaA%3D%3D&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20260611T204759Z&X-Amz-SignedHeaders=content-type%3Bhost%3Bx-amz-server-side-encryption&X-Amz-Credential=ASIAQ6FOLK5RLGF4NLAF%2F20260611%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Expires=3600&X-Amz-Signature=6f23e14dad833e220e2fcbb8c8d89a97886c111286a8d35c71a1e74037299ce3",
  "fileId": "07739df9-dfcf-46ee-80d0-b075d9a49c9f",
  "requiredHeaders": {
    "x-amz-server-side-encryption": "AES256",
    "Content-Type": "application/octet-stream"
  },
  "streamId": "284da6bc-6150-44fe-8821-c4ad0172a49e"
}
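
The uploadUri in the sample response is a pre-signed Amazon S3 URL, so its query string carries the signing time (X-Amz-Date) and the validity period in seconds (X-Amz-Expires). As a minimal sketch, a hypothetical helper named upload_uri_expiry can read these two parameters to estimate when the URI stops being usable. It assumes both parameters are present, as they are in the sample.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def upload_uri_expiry(upload_uri: str) -> datetime:
    """Return the UTC time at which a pre-signed upload URI expires."""
    params = parse_qs(urlsplit(upload_uri).query)
    # X-Amz-Date uses the compact ISO 8601 form, for example 20260611T204759Z.
    signed_at = datetime.strptime(params["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    return signed_at + timedelta(seconds=int(params["X-Amz-Expires"][0]))
```

For the sample above (X-Amz-Date=20260611T204759Z, X-Amz-Expires=3600), the result is one hour after the signing time.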

Step 2: Upload items into the stream

To upload the items into the stream you opened in step 1, perform the following PUT request:

Request template:

PUT <MyUploadURI> HTTP/1.1

Content-Type: application/octet-stream
x-amz-server-side-encryption: AES256

Request parameters

In the request path:

  • Replace <MyUploadURI> with the value of the uploadUri property you received in the response when you opened your stream in step 1.

Payload:

The payload must be a JSON document of no more than 256 MB and can only contain an addOrUpdate array.

Sample offline purchases payload
{
  "addOrUpdate": [ 1
    {
      "itemId": "transaction-001",
      "timestamp": "2025-01-15T10:30:00.000Z",
      "currency": "USD",
      "transaction": {
        "revenue": 49.99
      },
      "products": [
        {
          "product": {
            "productId": "SKU-1001",
            "price": 24.99
          },
          "quantity": 2
        }
      ]
    },
    {
      "itemId": "transaction-002",
      "timestamp": "2025-01-16T14:22:00.000Z",
      "currency": "USD",
      "transaction": {
        "revenue": 129.97
      },
      "products": [
        {
          "product": {
            "productId": "SKU-2045",
            "price": 129.97
          },
          "quantity": 1
        }
      ]
    }
  ]
}
1 Each item in the addOrUpdate array must adhere to the relevant schema.

Successful response: 200 OK

A successful response has no content, but indicates that the items were successfully uploaded to the stream.
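
Because a rebuild deletes every item that is not uploaded before the stream is closed, it can help to compare the payload against the item IDs you believe are currently in the collection. The sketch below uses two hypothetical helpers, build_rebuild_payload and removed_by_rebuild. The set of existing IDs is something you would have to track yourself, since this article does not describe a way to list them.

```python
def build_rebuild_payload(items: list) -> dict:
    """Wrap items in addOrUpdate, the only top-level key a rebuild upload accepts."""
    return {"addOrUpdate": list(items)}


def removed_by_rebuild(existing_ids: set, payload: dict) -> set:
    """Return the IDs that would be deleted when the stream is closed."""
    uploaded = {item["itemId"] for item in payload["addOrUpdate"]}
    return set(existing_ids) - uploaded


payload = build_rebuild_payload([
    {"itemId": "transaction-001"},  # other properties omitted for brevity
    {"itemId": "transaction-002"},
])
to_be_removed = removed_by_rebuild(
    {"transaction-001", "transaction-002", "transaction-003"}, payload
)
```

In this example, to_be_removed contains only transaction-003, because it is the one known item missing from the upload.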

Step 3: Close the stream

To close the stream and synchronize the data collection, perform the following POST request:

Request template:

POST https://api.cloud.coveo.com/push/v1/organizations/<MyOrganizationId>/data-collections/<MyDataCollectionId>/stream/<MyStreamId>/close HTTP/1.1

Content-Type: application/json
Authorization: Bearer <MyAccessToken>
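
Request parameters

In the request path:

  • Replace <MyOrganizationId> with the ID of the target Coveo organization.

  • Replace <MyDataCollectionId> with the ID of the target data collection.

  • Replace <MyStreamId> with the value of the streamId property you received in the response when you opened your stream in step 1.

In the Authorization HTTP header: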

  • Replace <MyAccessToken> with an access token, such as an API key that has the required privileges to push content to the data collection.

Payload: None

Successful response: 202 Accepted

A successful response indicates that the stream was successfully closed and the rebuild operation was queued for processing. Any item not received during the rebuild will be removed from the data collection.

Sample response body
{
  "orderingId": "1781207560610",
  "requestId": "f8a5c2d1-3b4e-4c5f-8d9e-1a2b3c4d5e6f"
}
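
Putting the three rebuild steps together, a sketch of the whole sequence might look like the following. The rebuild function and the transport callable are hypothetical. Passing the transport as a parameter is a design choice that lets the sequence be exercised without network access. The urllib_send function shows one possible transport based on the standard library, with no retry or error handling.

```python
import json
import urllib.request
from typing import Callable, Optional

API_BASE = "https://api.cloud.coveo.com/push/v1"

# A transport takes (method, url, headers, body) and returns the decoded
# JSON response body, or None when the response has no content.
Transport = Callable[[str, str, dict, Optional[bytes]], Optional[dict]]


def urllib_send(method: str, url: str, headers: dict, body: Optional[bytes]) -> Optional[dict]:
    """One possible transport, using only the standard library."""
    request = urllib.request.Request(url, data=body, method=method, headers=headers)
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    return json.loads(raw) if raw else None


def rebuild(send: Transport, org_id: str, collection_id: str,
            token: str, items: list) -> dict:
    """Open a stream, upload all items, then close the stream."""
    auth = {"Content-Type": "application/json", "Authorization": f"Bearer {token}"}
    base = f"{API_BASE}/organizations/{org_id}/data-collections/{collection_id}/stream"

    # Step 1: open the stream and keep the returned upload details.
    opened = send("POST", f"{base}/open", auth, None)

    # Step 2: upload the items with the headers the service asked for.
    body = json.dumps({"addOrUpdate": items}).encode("utf-8")
    send("PUT", opened["uploadUri"], opened["requiredHeaders"], body)

    # Step 3: close the stream, which queues the rebuild for processing.
    return send("POST", f"{base}/{opened['streamId']}/close", auth, None)
```

A call such as rebuild(urllib_send, org_id, collection_id, token, items) would run the sequence against the live API and return the body of the close response.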

Validation schemas

To be ingested, items in a data collection must adhere to the predefined schema for the collection’s dataCollectionType. For each property, the schema indicates whether the property is required, its type, and any property value constraints.

The additionalProperties property indicates whether properties not explicitly defined in the schema are allowed in the current object. The default value is true.

Offline purchases v1 schema
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://api.cloud.coveo.com/v1/data-collections/schemas/offline-purchases-schema-v1.json",
  "title": "Offline Purchases Schema",
  "description": "This schema is used to validate the format of Offline Purchases sent to Coveo through ingestion APIs. For more information, refer to the Data Collection documentation.",
  "type": "object",
  "required": [
    "itemId",
    "timestamp",
    "transaction",
    "products"
  ],
  "additionalProperties": false,
  "properties": {
    "itemId": {
      "type": "string",
      "description": "Unique identifier for the transaction. When a transaction with an existing id is received, the previous record is overwritten.",
      "minLength": 1,
      "maxLength": 255
    },
    "currency": {
      "type": "string",
      "pattern": "^[A-Z]{3}$",
      "default": "USD",
      "description": "Currency code in upper-case ISO 4217 format."
    },
    "timestamp": {
      "type": "string",
      "format": "date-time",
      "description": "Timestamp of when the transaction occurred, in ISO 8601 format."
    },
    "transaction": {
      "type": "object",
      "required": ["revenue"],
      "additionalProperties": false,
      "properties": {
        "revenue": {
          "type": "number",
          "minimum": 0,
          "description": "Total revenue contained in the transaction. Currency must remain consistent within a transaction."
        }
      }
    },
    "products": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": ["product", "quantity"],
        "additionalProperties": false,
        "properties": {
          "product": {
            "type": "object",
            "required": ["productId", "price"],
            "additionalProperties": false,
            "properties": {
              "productId": {
                "type": "string",
                "description": "Unique identifier of the product purchased."
              },
              "price": {
                "type": "number",
                "minimum": 0,
                "description": "Price paid by the customer per unit of quantity."
              }
            }
          },
          "quantity": {
            "type": "number",
            "minimum": 0,
            "description": "Quantity of the item purchased. The total revenue for the product in a transaction should be obtained by multiplying price and quantity."
          }
        }
      }
    }
  }
}
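
To catch obvious problems before uploading, the main constraints of the schema above can be approximated locally. The following hand-written check is a sketch only: validate_offline_purchase is a hypothetical helper, it is more lenient than a full JSON Schema validator (for example, in how it parses timestamps), and the service remains the authority on what is accepted.

```python
import re
from datetime import datetime


def _is_number(value) -> bool:
    # JSON Schema "number" excludes booleans, which Python treats as ints.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _check_keys(obj: dict, required: set, allowed: set, where: str, errors: list) -> None:
    for key in sorted(required - set(obj)):
        errors.append(f"{where}: missing required property '{key}'")
    for key in sorted(set(obj) - allowed):  # additionalProperties is false
        errors.append(f"{where}: unexpected property '{key}'")


def validate_offline_purchase(item: dict) -> list:
    """Return a list of error messages; an empty list means no problem was found."""
    errors = []
    _check_keys(item, {"itemId", "timestamp", "transaction", "products"},
                {"itemId", "timestamp", "currency", "transaction", "products"},
                "item", errors)
    item_id = item.get("itemId")
    if "itemId" in item and not (isinstance(item_id, str) and 1 <= len(item_id) <= 255):
        errors.append("itemId: expected a string of 1 to 255 characters")
    currency = item.get("currency", "USD")
    if not (isinstance(currency, str) and re.fullmatch(r"[A-Z]{3}", currency)):
        errors.append("currency: expected three upper-case letters")
    if "timestamp" in item:
        try:
            datetime.fromisoformat(item["timestamp"].replace("Z", "+00:00"))
        except (AttributeError, ValueError):
            errors.append("timestamp: expected an ISO 8601 date-time string")
    transaction = item.get("transaction")
    if isinstance(transaction, dict):
        _check_keys(transaction, {"revenue"}, {"revenue"}, "transaction", errors)
        revenue = transaction.get("revenue")
        if "revenue" in transaction and not (_is_number(revenue) and revenue >= 0):
            errors.append("transaction.revenue: expected a number >= 0")
    elif "transaction" in item:
        errors.append("transaction: expected an object")
    products = item.get("products")
    if isinstance(products, list) and products:
        for index, entry in enumerate(products):
            where = f"products[{index}]"
            if not isinstance(entry, dict):
                errors.append(f"{where}: expected an object")
                continue
            _check_keys(entry, {"product", "quantity"}, {"product", "quantity"}, where, errors)
            quantity = entry.get("quantity")
            if "quantity" in entry and not (_is_number(quantity) and quantity >= 0):
                errors.append(f"{where}.quantity: expected a number >= 0")
            product = entry.get("product")
            if isinstance(product, dict):
                _check_keys(product, {"productId", "price"}, {"productId", "price"},
                            f"{where}.product", errors)
                if "productId" in product and not isinstance(product["productId"], str):
                    errors.append(f"{where}.product.productId: expected a string")
                price = product.get("price")
                if "price" in product and not (_is_number(price) and price >= 0):
                    errors.append(f"{where}.product.price: expected a number >= 0")
            elif "product" in entry:
                errors.append(f"{where}.product: expected an object")
    elif "products" in item:
        errors.append("products: expected a non-empty array")
    return errors
```

Running this over each entry of an addOrUpdate array before uploading can surface missing or misspelled properties early; any remaining issues appear in the ingestion logs.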

Review ingestion logs

The Log Browser (platform-ca | platform-eu | platform-au) provides a unified interface to review ingestion logs for all your data collections. Use it to monitor the ingestion process and review any validation errors or other issues with your ingested data.

[Figure: Log Browser page showing data collection ingestion logs | Coveo Administration Console]

1 Operation timestamp

2 Operation type

3 Operation result

4 Collection ID

5 Item URI, which identifies specific items or batches:

  • For item-level logs, the Item URI represents the itemId.

  • For batch-level logs, the Item URI represents the streamId for rebuild operations or the fileId for update operations.

  • For the Delete older than operation that Coveo automatically performs when closing a stream, the Item URI represents the orderingId property in the response.

  Use the Item URI filter (8) to search for specific items or batches.

6 Log stage

7 Log entry expansion button

8 Item URI filter to search for specific items or batches

9 Date filter

Logs are grouped into three stages:

Stage | Operation target | Description
Streaming | Batch | The initial log entry, recorded when a batch of items is received through the Stream API.
Applying streaming extension | Batch or item | The second processing stage, during which the batch payload is chunked into addOrUpdate, partialUpdate, and delete objects, the contents of which are analyzed for conformance.
Validation | Item | The final processing stage, during which an added or updated item is validated against the dataCollectionType schema.

The operation result indicates the outcome of the ingestion operation:

Completed (green icon)

The operation was successful.

Warning (orange triangle icon)

The operation was unsuccessful or only partly successful, but the ingestion process was able to continue. Click the log entry expansion button (7) near the right edge of the operation log entry for an error description and details about the information that wasn’t ingested.

Error (red square icon)

The operation was unsuccessful and the ingestion process was stopped. Click the log entry expansion button (7) near the right edge of the operation log entry for more details.