Faster grouping, take 1

A failed attempt of speeding up grouping in Solr, with an idea for next attempt.

Grouping at a Statsbiblioteket project

We have 100M+ articles from 10M+ pages belonging to 700K editions of 170 newspapers in a single Solr shard. It can be accessed at Mediestream. If you speak Danish, try searching for “strudsehest”. Searches are at the article level, with the results sorted by score and grouped by edition, with a maximum of 10 articles / edition. Something like this:

q=strudsehest&group=true&group.field=edition&group.limit=10

This works well for most searches. But for the heavier ones, response times creeps into seconds, sometimes exceeding the 10 second timeout we use. Not good. So what happens in a grouped search that is sorted by document score?

  1. The hits are calculated
  2. A priority queue is used to find the top-X groups with the highest scores
    1. For each hit, calculate its score
    2. If the score is > the lowest score in the queue, resolve the group value and update the priority queue
  3. For each of the top-X groups, a priority queue is created and filled with document IDs
    1. For each hit, calculate is score and resolve its group value (a BytesRef)
    2. If the group value matched one of the top-X groups, update that group’s queue
      1. Updating the queue might involve resolving multiple field values for the document, depending on in-group sorting
  4. Iterate the top-X groups and resolve the full documents

Observation 1: Hits are iterated twice. This is hard to avoid if we need more than 1 entry in each group. An alternative would be to keep track of all groups until all the hits has been iterated, but this would be extremely memory costly with high cardinality fields.

Observation 2: In step 3.1, score and group resolving is performed for all hits. It is possible to use the same logic as step 2.1, where the group is only resolved if the score is competitive.

Attempt 1: Delayed group resolving

The idea in observation 2 has been implemented as a kludge-hacky-proof-of-concept. Code is available at the group_4_10 branch at GitHub for those who like hurt.

When the hits are iterated the second time, all scores are resolved but only the group values for the documents with competitive scores are resolved. So how well does it work?

run=1_collated

Lazy group value resolving for Solr

Observation: Optimized (aka lazy group value resolving) grouping is a bit slower than vanilla Solr grouping for some result sets, probably the ones where most of the group values has to be resolved. For other result sets there is a clear win.

It should be possible to optimize a bit more and bring the overhead of the worst-case optimized groupings down to near-zero. However, since there are so few best-case result sets and since the win is just about a third of the response time, I do not see this optimization attempt as being worth the effort.

Idea: A new level of lazy

IMPORTANT: I am a klutz and did not read the Solr code properly. Please skip ahead to the next section.

Going back to the algorithm for grouping we can see that “resolving the value” occurs multiple times. But what does it mean?

With DocValued terms, this is really a two-step process: The DocValue ordinal is requested for a given docID (blazingly fast) and the ordinal is used to retrieve the term (fast) in the form of a BytesRef. You already know where this is going, don’t you?

Millions of “fast” lookups accumulates to slow and we don’t really need the terms as such. At least not before we have to deliver the final result to the user. What we need is a unique identifier for each group value and the ordinal is exactly that.

But wait. Ordinals are not comparable across segments! We need to map the segment ordinals to a global structure. Luckily this is exactly what happens when doing faceting with facet.method=fc, so we can just scrounge the code from there.

With this in mind, the algorithm becomes

  1. The hits are calculated
  2. A priority queue is used to find the top-X groups with the highest scores
    1. For each hit, calculate its score
    2. If the score is > the lowest score in the queue, resolve the group value ordinal and update the priority queue
  3. For each of the top-X groups, a priority queue is created and filled with document IDs
    1. For each hit, resolve its group value segment-ordinal and convert that to global ordinal
    2. If the group value ordinal matches one of the top-X groups, update that group’s queue
      1. Updating the queue might involve resolving the document score or resolving multiple field value ordinals for the document, depending on in-group sorting
  4. Iterate the top-X groups and resolve the Terms from the group value ordinals as well as the full documents

Note how the logic is reversed for step 3.1, prioritizing value ordinal resolving over score calculation. Experience from the facet code suggests that ordinal lookup is faster than score calculation.

This idea has not been implemented yet. Hopefully it will be done Real Soon Now, but no promises.

Update 2016-01-28: Read the code properly

The Solr code does not resolve BytesRefs before it is necessary in step 3. On the contrary, it resolved the segment-specific group ordinals from the BytesRefs delivered by step 2 and uses fast lookup of those ordinals before calculating the score.

So while the first pass (determining the groups) seems optimizable by postponing BytesRef resolving, the second pass does not have any apparent performance failings.

About Toke Eskildsen

IT-Developer at statsbiblioteket.dk with a penchant for hacking Lucene/Solr.
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s