Add or Edit an Amazon S3 Source

Amazon Simple Storage Service (S3) is a cloud-based storage service designed to store, manage, and distribute large quantities of data worldwide. Members of the Administrators and Content Managers built-in groups can add the content of Amazon S3 buckets to a Coveo organization. Coveo indexes Amazon S3 files to make them searchable.

Source Key Characteristics

Features | Supported | Additional information
Amazon S3 version | Latest cloud version | Following available Amazon S3 releases
Searchable content types (see Note 1) | Buckets (see Note 2) and objects (folders and files) |
Content update operations: Refresh | |
Content update operations: Rescan | | Takes place every day by default.
Content update operations: Rebuild | |
Content security options: Determined by source permissions | |
Content security options: Source creator | |
Content security options: Everyone | |

Note 1: An access key is needed to connect to the Amazon Web Services (AWS) service through the software development kit (SDK). The access key authenticates SDK requests as an Identity and Access Management (IAM) account. The number of requests is unlimited, but every request to your Amazon S3 buckets incurs a charge (see Request Pricing).

Note 2: Amazon S3 Requester Pays buckets aren’t supported.

Add or Edit an Amazon S3 Source

When adding or editing an Amazon S3 Source, follow the instructions below.

“Configuration” Tab

In the Add/Edit an Amazon S3 Source panel, the Configuration tab is selected by default. It contains your source’s general and authentication information, as well as other parameters.

General Information

Source Name

Enter a name for your source.

Use a short and descriptive name, using letters, numbers, hyphens (-), and underscores (_). Avoid spaces and other special characters.
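
The naming advice above can be checked before you type a name into the panel. The following is a minimal sketch, not a rule enforced by Coveo: it assumes "letters" means unaccented ASCII letters, and the helper name `is_recommended_source_name` is hypothetical.

```python
import re

# Assumption: the recommendation maps to one or more ASCII letters, digits,
# hyphens, or underscores, and nothing else (no spaces, no other symbols).
_SOURCE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_recommended_source_name(name: str) -> bool:
    """Return True when the name uses only the recommended characters."""
    return _SOURCE_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    # Print each candidate name next to the result of the check.
    for candidate in ("s3-product-docs", "S3_Archive_2", "my bucket!", ""):
        print(repr(candidate), is_recommended_source_name(candidate))
```

The check says nothing about whether a name is short or descriptive; that part of the advice remains a judgment call.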

Amazon S3 bucket URL

Enter the address of one or more Amazon S3 buckets using one of the following formats:

  • Virtual-host style

    • http://<BUCKET>.s3.amazonaws.com/

    • http://<BUCKET>.s3.<AWS_REGION>.amazonaws.com/

    where you replace <BUCKET> with the name of your actual bucket, and <AWS_REGION> with your region-specific endpoint.

  • Path style

    • http://s3.amazonaws.com/<BUCKET>

    • http://s3.<AWS_REGION>.amazonaws.com/<BUCKET>

    where you replace <BUCKET> with the name of your actual bucket, and <AWS_REGION> with your region-specific endpoint.

  • You can enter more than one bucket address, but you must ensure that all source parameters apply to all Amazon S3 buckets. Otherwise, a good practice is to create separate sources for other buckets.

  • If a region isn’t specified in the URL, the source uses the US Standard (us-east-1) region endpoint by default.

  • When the URL points to a folder inside a bucket, only keys starting with that prefix will be crawled.
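
To make the URL formats and notes above concrete, here is a minimal sketch of how a single bucket address could be split into a bucket name, a region, and a key prefix. It illustrates the documented formats only and is not how the Coveo crawler is implemented; `BucketLocation`, `parse_bucket_url`, and `keys_in_scope` are hypothetical names, and the sketch handles one URL at a time.

```python
from typing import Iterable, List, NamedTuple
from urllib.parse import urlparse

DEFAULT_REGION = "us-east-1"  # per the note above: used when the URL names no region
_HOST_SUFFIX = ".amazonaws.com"


class BucketLocation(NamedTuple):
    bucket: str
    region: str
    prefix: str  # empty string means the whole bucket


def parse_bucket_url(url: str) -> BucketLocation:
    """Split a virtual-host style or path style bucket URL into its parts."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if not host.endswith(_HOST_SUFFIX):
        raise ValueError(f"not an Amazon S3 address: {url!r}")
    labels = host[: -len(_HOST_SUFFIX)].split(".")

    # The host ends with "s3" (no region) or "s3.<AWS_REGION>".
    if labels[-1] == "s3":
        region, bucket_labels = DEFAULT_REGION, labels[:-1]
    elif len(labels) >= 2 and labels[-2] == "s3":
        region, bucket_labels = labels[-1], labels[:-2]
    else:
        raise ValueError(f"unrecognized Amazon S3 host: {host!r}")

    path = parsed.path.lstrip("/")
    if bucket_labels:
        # Virtual-host style: the bucket is in the host, the path is the prefix.
        bucket, prefix = ".".join(bucket_labels), path
    else:
        # Path style: the first path segment is the bucket, the rest is the prefix.
        bucket, _, prefix = path.partition("/")
    if not bucket:
        raise ValueError(f"no bucket name found in {url!r}")
    return BucketLocation(bucket, region, prefix)


def keys_in_scope(keys: Iterable[str], location: BucketLocation) -> List[str]:
    """Keep only the keys that start with the folder prefix from the URL."""
    return [key for key in keys if key.startswith(location.prefix)]


if __name__ == "__main__":
    location = parse_bucket_url("http://s3.eu-west-1.amazonaws.com/my-bucket/docs/")
    print(location)
    print(keys_in_scope(["docs/a.pdf", "images/b.png"], location))
```

An empty prefix keeps every key, which matches a URL that points at the bucket itself rather than at a folder.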

Optical Character Recognition (OCR)

Check this box if you want Coveo Cloud to extract text from image files or PDF files containing images. OCR-extracted text is processed as item data, meaning that it’s fully searchable and will appear in the item Quick View. See Enable Optical Character Recognition for details on this feature.

Index

When adding a source, if you have more than one logical (non-Elasticsearch) index in your organization, select the index in which the retrieved content will be stored (see Leverage Many Coveo Indexes). If your organization only has one index, this drop-down menu isn’t visible and you have no decision to make.

  • To add a source that stores content in an index other than the default one, you need the View access level on the Logical Index domain (see Manage Privileges and Logical Indexes Domain).

  • Once the source is added, you can’t switch to a different index.

“Authentication” Section

Fill in the appropriate boxes depending on whether your S3 bucket content is secured or public.

  • If your S3 bucket content is secured, meaning not accessible by anonymous users, enter the AWS Access Key ID and AWS Secret Access Key values linked to an AWS Identity and Access Management (IAM) account. The IAM account must have at least the read permission on the bucket content to index. See the Console Access section in the Understanding and Getting Your Security Credentials article for more details.

  • If your S3 bucket content is public, meaning anonymous users can access the content, you may leave the AWS Access Key ID and AWS Secret Access Key boxes empty.

    • You must, however, ensure that bucket permissions include List for the Everyone grantee, to prevent getting an authentication error such as: Coveo Cloud isn't able to authenticate to your Amazon S3 bucket and consequently can't perform any action regarding your source. Edit the source configuration to review the provided AWS Access Key ID and AWS Secret Access Key ID.

    • Before building the source, in a browser, test your bucket URL (without a path), and validate that it returns an XML file listing the bucket content (keys). If you get a short Access denied XML error, the source will give an authentication error.
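
The browser test described above for a public bucket can also be scripted. The sketch below is one possible approach, not part of Coveo: it assumes that a successful anonymous listing returns an XML document whose root element is `ListBucketResult`, and that a refused request returns an `Error` document with the code `AccessDenied`. The function names are hypothetical.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def _local_name(tag: str) -> str:
    """Drop the '{namespace}' part that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def classify_listing_response(xml_body: bytes) -> str:
    """Return 'listing', 'access-denied', or 'unexpected' for a response body."""
    try:
        root = ET.fromstring(xml_body)
    except ET.ParseError:
        return "unexpected"
    name = _local_name(root.tag)
    if name == "ListBucketResult":
        return "listing"
    if name == "Error":
        # Look for <Code>AccessDenied</Code> among the direct children.
        code = next((c.text for c in root if _local_name(c.tag) == "Code"), None)
        if code == "AccessDenied":
            return "access-denied"
    return "unexpected"


def check_public_bucket(bucket_url: str, timeout: float = 10.0) -> str:
    """Fetch the bucket URL anonymously and classify what comes back."""
    try:
        with urllib.request.urlopen(bucket_url, timeout=timeout) as response:
            body = response.read()
    except urllib.error.HTTPError as error:
        # An HTTP error status still carries a body; read it to see the XML error.
        body = error.read()
    return classify_listing_response(body)
```

Calling `check_public_bucket` with your bucket URL (without a path) and getting `'listing'` corresponds to the XML key listing described above, while `'access-denied'` corresponds to the short error that predicts an authentication error on the source. Network failures are left to raise so that they aren't mistaken for a permission problem.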

“Content Security” Tab

Select who will be able to access the source items through a Coveo-powered search interface. For details on this parameter, see Content Security.

“Access” Tab

In the Access tab, determine whether each group and API key can view or edit the source configuration (see Resource Access):

  1. In the Access Level column, select View or Edit for each available group.

  2. On the left-hand side of the tab, if available, click Groups or API Keys to switch lists.

Completion

  1. Finish adding or editing your source:

    • When you want to save your source configuration changes without starting a build/rebuild, such as when you know you want to make other changes soon, click Add Source/Save.

      To add the source content or to make your changes effective, on the Sources page, you must click Start initial build or Start required rebuild in the source Status column.

      OR

    • When you’re done editing the source and want to make changes effective, click Add and Build Source/Save and Rebuild Source.

      Back on the Sources page, you can review the progress of your source addition or modification.

    Once the source is built or rebuilt, you can review its content in the Content Browser.

  2. Optionally, consider editing or adding mappings.

    You can only manage mapping rules once you build the source (see Refresh, Rescan, or Rebuild Sources).

What’s Next?

Adapt the source update schedule to your needs.
