As explained in Ten times faster, distributed faceting in standard Solr is two-phase:
- Each shard performs standard faceting and returns the top
limit*1.5+10terms. The merger calculates the top
limitterms. Standard faceting is a two-step process:
- For each term in each hit, update the counter for that term.
- Extract the top
limit*1.5+10terms by running through all the counters with a priority queue.
- Each shard returns the number of occurrences of each term in the top
limitterms, calculated by the merger from phase 1. This is done by performing a mini-search for each term, which takes quite a long time. See Even sparse faceting is limited for details.
- Addendum: If the number for a term was returned by a given shard in phase 1, that shard is not asked for that term again.
- Addendum: If the shard returned a count of 0 for any term as part of phase 1, that means is has delivered all possible counts to the merger. That shard will not be asked again.
Sparse faceting speeds up phase 1 step 2 by only visiting the updated counters. It also speeds up phase 2 by repeating phase 1 step 1, then extracting the counts directly for the wanted terms. Although it sounds heavy to repeat phase 1 step 1, the total time for phase 2 for sparse faceting is a lot lower than standard Solr. But why repeat phase 1 step 1 at all?
Today, caching of the counters from phase 1 step 1 was added to Solr sparse faceting. Caching is tricky business to get just right, especially since the sparse cache must contain a mix of empty counters (to avoid re-allocation of large structures on the Java heap) as well as filled structures (from phase 1, intended for phase 2). But theoretically, it is simple: When phase 1 step 1 is finished, the counter structure is kept and re-used in phase 2. So time for testing:
Note that there are no measurements of standard Solr faceting in the graph. See the Ten times faster article for that. What we have here are 4 different types of search:
- no_facet: Plain searches without faceting, just to establish the baseline.
- skip: Only phase 1 sparse faceting. This means inaccurate counts for the returned terms, but as can be seen, the overhead is very low for most searches.
- cache: Sparse faceting with caching, as described above.
- nocache: Sparse faceting without caching.
For 1-1000 hits, nocache is actually a bit faster than cache. The peculiar thing about this hit-range is that chances are high that all shards returns all possible counts (phase 2 addendum 2), so phase 2 is skipped for a lot of searches. When phase 2 is skipped, this means wasted caching of a filled counter structure, that needs to be either cleaned for re-use or discarded if the cache is getting too big. This means a bit of overhead.
For more than 1000 hits, cache wins over nocache. Filter through the graph noise by focusing on the medians. As the difference between cache and nocache is that the base faceting time is skipped with cache, the difference of their medians should be the about the same as the difference of the medians from no_facet and skip. Are they? Sorta-kinda. This should be repeated with a larger sample.
Caching with distributed faceting means a small performance hit in some cases and a larger performance gain in other. Nothing Earth-shattering and as it works best when there is more memory allocated for caching, it is not clear in general whether it is best to use it or not. Download a Solr sparse WAR from GitHub and try for yourself.