THIS IS ARCHIVED DOCUMENTATION

The GroupJoin Extension

Coveo for Sitecore 4.1 (November 2018)

GroupJoin is one of the LINQ extension methods. It joins the results of two queries: each result of the first (outer) query is paired with the group of matching results from the second (inner) query. You can also apply an aggregate function, such as Count, to each group. Like the Join extension, GroupJoin takes a result selector that defines how results are returned. The result type can differ from the types used in the queries.

Here is the signature of the extension method.

public static IEnumerable<TResult> GroupJoin<TOuter, TInner, TKey, TResult>(this IEnumerable<TOuter> outer,
                                                                            IEnumerable<TInner> inner,
                                                                            Func<TOuter, TKey> outerKeySelector,
                                                                            Func<TInner, TKey> innerKeySelector,
                                                                            Func<TOuter, IEnumerable<TInner>, TResult> resultSelector)

The method takes five arguments. The first one is implicit: it's the outer queryable instance on which the extension method is called. The remaining four arguments are:

  1. The inner enumerable instance. In practice, this is a queryable instance (an IQueryable<T> is also an IEnumerable<T>), and the extension uses that underlying queryable.
  2. The outer key selector. It’s a lambda expression that indicates which object property should be used as the outer key.
  3. The inner key selector. It’s a lambda expression that indicates which object property should be used as the inner key.
  4. The result selector. It’s the lambda expression that defines which result object is returned by the extension. The expression can return an anonymous type.
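As an illustration of which argument does what, here is a minimal sketch that calls the standard LINQ-to-Objects Enumerable.GroupJoin on in-memory lists. It doesn't query Coveo or Sitecore: SimpleManager, SimpleEmployee, and GroupJoinArgumentsSketch are hypothetical names, and the IDs are plain strings rather than Sitecore.Data.ID values. It only shows how the inner sequence, the two key selectors, and the result selector map to the list above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins for business objects (string IDs, no Sitecore types).
public class SimpleManager
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class SimpleEmployee
{
    public string Name { get; set; }
    public string ManagerId { get; set; }
}

public static class GroupJoinArgumentsSketch
{
    // Returns one line per manager: "<manager>: <count> (<employee names>)".
    public static List<string> Run()
    {
        var managers = new List<SimpleManager>
        {
            new SimpleManager { Id = "m1", Name = "Alice" },
            new SimpleManager { Id = "m2", Name = "Bob" }
        };
        var employees = new List<SimpleEmployee>
        {
            new SimpleEmployee { Name = "Carol", ManagerId = "m1" },
            new SimpleEmployee { Name = "Dave", ManagerId = "m1" }
        };

        return managers                                    // outer: the implicit "this" argument
            .GroupJoin(employees,                          // 1. inner sequence
                       manager => manager.Id,              // 2. outer key selector
                       employee => employee.ManagerId,     // 3. inner key selector
                       (manager, matches) =>               // 4. result selector
                           manager.Name + ": " + matches.Count() +
                           " (" + string.Join(", ", matches.Select(e => e.Name)) + ")")
            .ToList();
    }
}
```

Run() returns "Alice: 2 (Carol, Dave)" and "Bob: 0 ()". In LINQ to Objects, an outer item without matches still appears with an empty group; this sketch doesn't show whether the Coveo implementation behaves the same way.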

The index fields that correspond to the outer and inner keys must be marked as facets in the Coveo search index. If they aren't facet fields, the extension returns no results.

Calling GroupJoin on a Coveo search index triggers several search queries. See the Additional Information section below for the technical details. When performance is a concern, it may be more efficient to use computed fields, which are calculated at indexing time, to store the aggregate values you need. This usually avoids the additional search queries.

Example

Suppose you want a list of managers with their respective employees, where each employee is linked to their manager by the manager's item ID. Your business objects would probably look like this:

public abstract class Person
{
    public Sitecore.Data.ID Id { get; set; }
    public string Name { get; set; }
}
 
public class Manager : Person
{
}
 
public class Employee : Person
{
    public Sitecore.Data.ID ManagerId { get; set; }
}

The GroupJoin query would look like this:

// Requires: using System; using System.Linq; using Sitecore.ContentSearch;
using (var context = ContentSearchManager.GetSearchIndex("sitecore_master_index").CreateSearchContext())
{
    // The queryables for the managers and the employees.
    // You can add specific filters to those queryables.
    var managersQueryable = context.GetQueryable<Manager>();
    // Notice the "Take(100)" on the employees queryable.
    // See the "Additional Information" section for details.
    var employeesQueryable = context.GetQueryable<Employee>().Take(100);

    // This call joins the managers and the employees. Then it returns the number of employees
    // per manager and the employee names.
    var results = managersQueryable.GroupJoin(employeesQueryable,
                                              manager => manager.Id,
                                              employee => employee.ManagerId,
                                              (manager, employees) => new {
                                                  ManagerName = manager.Name,
                                                  NumberOfEmployees = employees.Count(),
                                                  EmployeesNames = String.Join(", ", employees.Select(employee => employee.Name))
                                              })
                                   // Enumerate the results before the search context is disposed.
                                   .ToList();
}

Each element of results is an instance of an anonymous type that contains the manager's name, the number of employees, and the employees' names.

Additional Information

The Coveo implementation of the GroupJoin method uses Nested Queries behind the scenes. When the extension is called, a first query is sent to retrieve the groups. Then, as these groups are enumerated, additional search queries are performed to compute the aggregate functions.
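The following sketch models that query pattern with a hypothetical FakeIndex class that only counts the queries it receives. It is not the Coveo API: QueryGroups stands in for the first query, and QueryCount stands in for one per-group aggregate query. Because CountPerGroup is a C# iterator, the per-group queries run only when the caller enumerates the results.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a search index (NOT the Coveo API).
// It only counts how many "search queries" it receives.
public class FakeIndex
{
    private readonly Dictionary<string, List<string>> membersByGroup;

    public FakeIndex(Dictionary<string, List<string>> membersByGroup)
    {
        this.membersByGroup = membersByGroup;
    }

    public int QueriesSent { get; private set; }

    // Stands in for the first query, which retrieves the groups.
    public List<string> QueryGroups()
    {
        QueriesSent++;
        return membersByGroup.Keys.ToList();
    }

    // Stands in for one additional query that computes an aggregate for a group.
    public int QueryCount(string groupName)
    {
        QueriesSent++;
        return membersByGroup[groupName].Count;
    }
}

public static class NestedQuerySketch
{
    // Iterator method: nothing runs until the caller starts enumerating,
    // and each per-group query runs only when that group is reached.
    public static IEnumerable<KeyValuePair<string, int>> CountPerGroup(FakeIndex index)
    {
        foreach (var groupName in index.QueryGroups())
        {
            yield return new KeyValuePair<string, int>(groupName, index.QueryCount(groupName));
        }
    }
}
```

Calling CountPerGroup sends no query yet. Enumerating three groups with ToList() sends four: one for the groups, then one per group.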

Explaining the GroupJoin Result Selector

Result selectors look intuitive and easy to use. However, they can behave unexpectedly with this extension, so use them carefully. A common GroupJoin result selector looks like this:

(group, results) => new {
    Group = group.Name,
    NumberOfResults = results.Count()
}

The group parameter represents the group that's being enumerated. Most of the time, it's one of your business objects. The results parameter is an IEnumerable<T>, where T is the type of the items returned by the inner query. Again, it's usually one of your business objects.

The catch is that Sitecore and Coveo handle IQueryable<T> and IEnumerable<T> instances differently. Calling the Count method on an IQueryable<T> instance creates a new search context and sends a query that asks the search index for the total number of results. Calling Count on an IEnumerable<T> instance also creates a search context internally and sends a query, but that query returns a page of results (10 by default), and Count is then executed on that page. Since the internal list contains only 10 results, Count returns 10, which is wrong whenever the group contains more than 10 results.
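A hypothetical model of that difference, assuming the default page of 10 results mentioned above: FetchPage plays the role of enumerating an IEnumerable<T>, and QueryTotalCount plays the role of calling Count on an IQueryable<T>. FakeSearchResults and its members are illustrative names only, not part of Sitecore or Coveo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of the Count pitfall (NOT the Sitecore or Coveo API).
public class FakeSearchResults
{
    // Default number of results returned per query, as described above.
    public const int DefaultPageSize = 10;

    private readonly int totalMatches;

    public FakeSearchResults(int totalMatches)
    {
        this.totalMatches = totalMatches;
    }

    // Like enumerating an IEnumerable<T>: only the first page is returned,
    // so any Count() done afterwards runs on this page in memory.
    public IEnumerable<int> FetchPage()
    {
        return Enumerable.Range(0, Math.Min(totalMatches, DefaultPageSize));
    }

    // Like calling Count on an IQueryable<T>: the index reports the total.
    public int QueryTotalCount()
    {
        return totalMatches;
    }
}
```

With 25 matching items, FetchPage().Count() returns 10 while QueryTotalCount() returns 25. With 4 matching items, both return 4, which is why the error only shows up in groups larger than one page.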

To retrieve the total number of results, call the Count method on an IQueryable<T> instance instead:

(group, results) => new {
    Group = group.Name,
    NumberOfResults = results.AsQueryable().Count()
}

If you need more than the number of results from the result selector, there are two solutions. The first one is to call the AsQueryable method every time you need an aggregate function:

(group, results) => new {
    Group = group.Name,
    NumberOfResults = results.AsQueryable().Count(),
    FirstResult = results.AsQueryable().First(),
    LastResult = results.AsQueryable().Last()
}

This ensures that the aggregate functions consider the whole set of results in the search index. However, each of these calls sends its own search query, so the number of queries performed on the server grows quickly. Depending on your needs, the second approach may be better.
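To get a feel for how fast the query count grows, here is a rough cost model. It rests on an assumption, not on measured Coveo behavior: one query retrieves the groups, each AsQueryable() aggregate then sends one query per group, and the alternative (fetching each group's results once and aggregating in memory) sends one query per group. QueryCostEstimate is a hypothetical name.

```csharp
// Rough, hypothetical cost model based on the behavior described on this page.
// Assumption: one query retrieves the groups; after that, every AsQueryable()
// aggregate sends one query per group, whereas fetching each group once and
// aggregating in memory sends a single query per group.
public static class QueryCostEstimate
{
    public static int WithAsQueryable(int groups, int aggregatesPerGroup)
    {
        return 1 + groups * aggregatesPerGroup;
    }

    public static int WithInMemoryAggregates(int groups)
    {
        return 1 + groups;
    }
}
```

Under this model, 50 groups with the three aggregates above (Count, First, Last) cost 1 + 50 × 3 = 151 queries with AsQueryable(), against 51 when each group is fetched once.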

If you need to enumerate the group results, or you know that each group contains only a few results, you can set the number of results to retrieve directly on the inner query. You can then call the aggregate functions directly on the IEnumerable<T> instance, and the search context uses the inner IQueryable<T> instance that you specified, including its limit. For example:

var groupedResults = outerQueryable.GroupJoin(innerQueryable.Take(100),
                                              outerItem => outerItem.Key,
                                              innerItem => innerItem.OuterKey,
                                              (group, results) => new {
                                                  Group = group.Name,
                                                  NumberOfResults = results.Count(),
                                                  FirstResult = results.First(),
                                                  LastResult = results.Last()
                                              });

This example works as long as each group contains no more than 100 results. When a group is enumerated, a single search query is sent to retrieve its first 100 results. The Count, First, and Last aggregate functions are then executed in memory.
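The sketch below mimics this second approach in memory, assuming, as described above, that each group is fetched once with a cap of 100 results. CappedGroupSketch is a hypothetical name, and the integer sequence stands in for a group's search results. It also shows what happens when a group exceeds the cap.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model of the Take(100) approach (NOT the Coveo API).
public static class CappedGroupSketch
{
    // Same cap as innerQueryable.Take(100) in the example above.
    public const int Cap = 100;

    public static (int Count, int First, int Last) Aggregates(IEnumerable<int> groupResults)
    {
        // One fetch per group, limited to Cap results.
        List<int> fetched = groupResults.Take(Cap).ToList();

        // Count, First and Last then run in memory without further queries.
        // Like their LINQ counterparts, First() and Last() throw on an empty group.
        return (fetched.Count, fetched.First(), fetched.Last());
    }
}
```

For a group of 40 results, Aggregates returns (40, 0, 39). For a group of 150 results, it returns a count of 100 and a last item of 99 instead of 149: the aggregates are silently wrong, so the cap must be at least as large as your biggest group.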

The approach you choose mostly depends on the amount of data you have and the aggregate functions you need to call. Keep in mind, however, that it's more efficient to use computed fields to compute aggregate values than to compute them on the fly.