Switch to the New Crawling Module

In July 2020, Coveo released a new Crawling Module. This entirely refactored version, which doesn’t require Docker, is meant to replace the Docker-based Crawling Module that was released in 2017.

Coveo recommends that you switch to the new version of the Crawling Module as soon as possible, as Coveo will no longer provide updates or support for the old Crawling Module after December 31, 2020. However, if you decide not to switch, your current instance will keep working past this date.

To identify the Crawling Module you’re currently using, on the Crawling Modules page of the Coveo Administration Console, look at the Maestro reported version:

  • Versions > 1: new Crawling Module

  • Versions < 1: Crawling Module with Docker
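
The version rule above can be expressed as a short check. This is only a sketch: the function names are hypothetical, the version is assumed to be a dotted numeric string such as 0.9.3, and since the rule doesn't say how to treat a version equal to 1, that case is reported as unspecified rather than guessed.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "0.9.3" into a tuple of ints."""
    parts = [int(part) for part in version.strip().split(".")]
    # Drop trailing zeros so that "1", "1.0", and "1.0.0" compare as equal.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def classify_crawling_module(maestro_version: str) -> str:
    """Apply the rule: above 1 is the new module, below 1 is the Docker one."""
    version = parse_version(maestro_version)
    if version > (1,):
        return "new Crawling Module"
    if version < (1,):
        return "Crawling Module with Docker"
    # Exactly 1 is not covered by the rule, so it is not classified here.
    return "unspecified"


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    for reported in ("0.9.3", "2.0.1"):
        print(reported, "->", classify_crawling_module(reported))
```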

Advantages of the New Crawling Module

Since the new Crawling Module doesn’t require Docker on your server, installing it is much more convenient. While the old installation process required running commands in a terminal window and changing environment variables, the 2020 version is installed through a wizard, like many applications.

Maestro installation wizard

As you go through the installation steps, you also configure the Crawling Module so that it’s ready to start indexing your content when you complete the wizard. Since no Docker expertise is required, almost anyone can install the Crawling Module.

The absence of Docker also makes the Crawling Module significantly lighter and easier to troubleshoot should any issue arise.

Moreover, some antivirus software products were known to interfere with Docker. The new Crawling Module should no longer be impacted by the antivirus software running on your host server.

Switch to the New Crawling Module

  1. If you originally installed the Crawling Module on a virtual machine (VM), we recommend skipping to step 2 to install the new Crawling Module on a new VM, and then uninstalling your old deployment at step 4. However, if you instead installed the Crawling Module on a physical server, or if you want to install the new Crawling Module on the same VM, follow these steps to uninstall the old Crawling Module before you install the new one:

    1. On the Crawling Modules page of the Coveo Administration Console, ensure that your Crawling Module instance is up to date.

    2. Uninstall your current Crawling Module instance:

      1. In your Crawling Module folder, open the scripts folder.

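      2. In File Explorer, select the File tab, and then, in the menu, select Open Windows PowerShell as an administrator.

      3. In the Administrator: Windows PowerShell window, type .\UninstallCrawlingModule.ps1, and then press Enter. The leading .\ is needed because PowerShell doesn’t run scripts from the current folder by name alone.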

  2. Deploy the new Crawling Module.

  3. Migrate your Crawling Module sources:

    1. On the Sources page of the Coveo Administration Console, duplicate one of your Crawling Module sources.

    2. Edit the duplicate to pair it with your new Crawling Module instance. You must rebuild the source for this change to take effect.

    3. In the Content Browser, ensure that your new source indexes the same content as the original one and that this content is accessible to the same search interface end users.

    4. Use an A/B test to test your duplicate source and compare its effect on your search interface with that of your original source.

    5. Once you are certain that the source change is successful, delete the original source.

    6. Repeat the migration steps for each Crawling Module source.

  4. If you chose to install the Crawling Module on a new VM at step 1, uninstall the old Crawling Module from your old VM:

    1. On the Crawling Modules page of the Coveo Administration Console, ensure that your Crawling Module instance is up to date.

    2. Uninstall your current Crawling Module instance:

      1. In your Crawling Module folder, open the scripts folder.

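      2. In File Explorer, select the File tab, and then, in the menu, select Open Windows PowerShell as an administrator.

      3. In the Administrator: Windows PowerShell window, type .\UninstallCrawlingModule.ps1, and then press Enter. The leading .\ is needed because PowerShell doesn’t run scripts from the current folder by name alone.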

Leading Practices

We recommend following these leading practices when switching to the new Crawling Module.

Use a New Virtual Machine

If you originally installed the Crawling Module on a virtual machine (VM), you can follow the procedure above and install the new Crawling Module on the same machine. However, while you proceed, your source content won’t be updated, which may result in discrepancies between the content available through your search interface and your actual data. We therefore recommend installing the new Crawling Module on a different VM to avoid these discrepancies.

Using a different VM also lets you revert to your old Crawling Module deployment should you encounter issues.

Duplicate Your Sources

Duplicating your Crawling Module sources rather than creating new ones from scratch allows you to compare content indexed by the old and the new source and avoid introducing differences in the configuration of the new source. Your original source remains available while you test your changes in case you need to revert.
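
As an illustration of that comparison, the following sketch assumes you have already obtained the list of item URIs indexed by each source. How you obtain them is outside the scope of this example, and the function name and sample URIs are hypothetical.

```python
from typing import Dict, Iterable, Set


def compare_sources(
    original_uris: Iterable[str], duplicate_uris: Iterable[str]
) -> Dict[str, Set[str]]:
    """Report the items that only one of the two sources has indexed."""
    original = set(original_uris)
    duplicate = set(duplicate_uris)
    return {
        # Indexed by the original source but absent from the duplicate.
        "missing_from_duplicate": original - duplicate,
        # Indexed by the duplicate but absent from the original.
        "only_in_duplicate": duplicate - original,
    }


if __name__ == "__main__":
    # Made-up URIs, for illustration only.
    old_source = ["file://share/a.docx", "file://share/b.pdf", "file://share/c.txt"]
    new_source = ["file://share/a.docx", "file://share/c.txt"]
    print(compare_sources(old_source, new_source))
```

Two empty sets suggest that both sources index the same items. This doesn’t check permissions, so you still need to verify that the content is accessible to the same end users.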

Work Iteratively

When duplicating your sources, start with your smallest Crawling Module source. Once the duplicate is successfully built and tested, you can delete its original version, and then duplicate a larger source. This iterative migration reduces the number of changes you have to validate at once.

Delete your original source once you’re done testing its duplicate to avoid consuming unnecessary resources and slowing down the indexing process.
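
If you have many sources, you may want to plan the order ahead of time. The sketch below assumes that "smallest" means the fewest indexed items, which is only one possible measure. The source names and counts are made up.

```python
from typing import Dict, List


def migration_order(item_counts: Dict[str, int]) -> List[str]:
    """Return the source names sorted from fewest to most indexed items."""
    return sorted(item_counts, key=lambda name: item_counts[name])


if __name__ == "__main__":
    # Hypothetical sources and item counts, for illustration only.
    sources = {"Intranet": 120000, "Wiki": 3500, "File share": 48000}
    for position, name in enumerate(migration_order(sources), start=1):
        print(position, name)
```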

Use A/B Tests

Once you have duplicated a source and paired it with the new Crawling Module deployment, use A/B tests to compare a pipeline using your old source to an identical pipeline using your new source.

Assuming that:

  • Source A is your original source (associated with the old Crawling Module deployment), and source B is your duplicate source (associated with the new Crawling Module deployment).

  • Your search interface uses pipeline X, which has a filter listing the sources it should use. Source A is among these sources (e.g., @source==("Source A", "Source 1", "Source 2")).
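
To make the filter in the second assumption concrete, the sketch below builds that expression from a list of source names, along with the equivalent expression that lists source B in place of source A. The helper names are hypothetical, and the code doesn't escape quotation marks inside source names.

```python
from typing import List


def source_filter(source_names: List[str]) -> str:
    """Build a filter expression listing the given source names."""
    quoted = ", ".join(f'"{name}"' for name in source_names)
    return f"@source==({quoted})"


def swap_source(source_names: List[str], old: str, new: str) -> List[str]:
    """Return a copy of the list with one source name replaced by another."""
    return [new if name == old else name for name in source_names]


if __name__ == "__main__":
    current_sources = ["Source A", "Source 1", "Source 2"]
    print(source_filter(current_sources))
    print(source_filter(swap_source(current_sources, "Source A", "Source B")))
```

Generating both expressions from the same list helps ensure that the source name is the only difference between them.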

Your workflow should be the following:

  1. Duplicate pipeline X. The resulting pipeline is pipeline Y.

  2. Edit pipeline Y’s filter so that it uses source B rather than source A (e.g., @source==("Source B", "Source 1", "Source 2")).

  3. Create an A/B test comparing pipeline X and pipeline Y. Split the traffic 50/50, unless you suspect issues with source B. In that case, reduce the traffic allocated to pipeline Y.

  4. Launch the test, and then create a Usage Analytics A/B report to ensure that key UA metrics (i.e., query click-through, visit click-through, and average click rank) remain the same. See Analyze the Performance of Pipeline A Vs Pipeline B to evaluate the results.

  5. If your new source seems to work just like the original:

    1. Stop the A/B test.

    2. Delete pipeline Y.

    3. Update pipeline X so that it uses source B rather than source A.

    4. Delete source A.
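
When reading the A/B report, it can help to decide in advance how much difference between the two pipelines you're willing to accept. The sketch below flags the metrics whose relative difference exceeds a tolerance. The 5% default, the metric keys, and the sample values are all assumptions made for illustration, not values from an actual report.

```python
from typing import Dict, List


def metrics_outside_tolerance(
    pipeline_x: Dict[str, float],
    pipeline_y: Dict[str, float],
    tolerance: float = 0.05,
) -> List[str]:
    """List the metrics whose relative change from X to Y exceeds the tolerance."""
    flagged = []
    for name, baseline in pipeline_x.items():
        candidate = pipeline_y[name]
        if baseline == 0:
            # A relative change is undefined here, so flag any nonzero value.
            differs = candidate != 0
        else:
            differs = abs(candidate - baseline) / abs(baseline) > tolerance
        if differs:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Made-up metric values, for illustration only.
    x = {"query_click_through": 0.42, "visit_click_through": 0.61, "average_click_rank": 2.8}
    y = {"query_click_through": 0.41, "visit_click_through": 0.60, "average_click_rank": 3.4}
    print(metrics_outside_tolerance(x, y))
```

An empty list means that every metric stayed within the chosen tolerance. A check like this doesn’t replace the evaluation of the A/B test results themselves.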
