While sparse faceting has a profound effect on response time in our web archive, we are a bit doubtful about the number of multi-billion document Solr indexes out there. Luckily we also have our core index at Statsbiblioteket, which should be a bit more representative of your everyday Solr installation: Single-shard, 50GB, 14M documents. The bulk of the traffic is user-issued queries, which involve spellcheck, edismax qf & pf on 30+ fields and faceting on 8 fields. In this context, the faceting is of course the focus.
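For readers less familiar with that kind of setup, here is a rough sketch of the request parameters such a user-issued query might carry. It only uses standard Solr parameters (defType, qf, pf, spellcheck, facet, facet.field). Apart from subject and author, every field name and boost below is a made-up placeholder, and the lists are much shorter than the 30+ query fields and 8 facet fields in the real setup.

```python
from urllib.parse import urlencode


def build_query_params(user_query):
    """Return Solr request parameters for one user-issued search."""
    # Hypothetical fields and boosts: only 'subject' and 'author' are real names.
    query_fields = "title^5 author^3 subject^2 description"
    phrase_fields = "title^10 description^2"
    facet_fields = ["subject", "author", "language", "material_type"]

    params = [
        ("q", user_query),
        ("defType", "edismax"),   # extended dismax query parser
        ("qf", query_fields),     # fields searched for the individual terms
        ("pf", phrase_fields),    # fields boosted when terms match as a phrase
        ("spellcheck", "true"),
        ("facet", "true"),
    ]
    # facet.field is repeated once per facet field.
    params += [("facet.field", field) for field in facet_fields]
    return params


print(urlencode(build_query_params("hamlet")))
```

The parameters are kept as a list of pairs rather than a dict because facet.field has to appear once per field.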
Of the 8 facet fields, 6 are low-cardinality and 2 are high-cardinality. Sparse was very recently enabled for the 2 high-cardinality ones, namely subject (4M unique values, 51M instances (note to self: 51M!? How did it get so high?)) and author (9M unique values, 40M instances).
To get representative measurements, the logged response times were extracted for the hours 07-22; there’s maintenance going on at night and it skews the numbers. Only user-entered searches with faceting were considered. To compare response times before and after enabling sparse faceting, the data for this Tuesday and last Tuesday were used.
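The extraction itself is simple enough to sketch. The snippet below is not the script we used, just a minimal illustration under a few assumptions: each log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp and ends with the usual Solr request fields (params={...} hits=N status=N QTime=N), and "07-22" is read as 07:00 up to, but not including, 22:00. It filters on hour and facet=true only; telling user-entered searches apart from other traffic is left out. The sample lines are invented for the example.

```python
import re
from statistics import median

# Assumed line shape (timestamp format and prefix are assumptions):
# "2000-01-04 09:15:02 ... params={q=b&facet=true} hits=20 status=0 QTime=40"
LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} (?P<hour>\d{2}):\d{2}:\d{2}.*"
    r"params=\{(?P<params>.*)\} hits=(?P<hits>\d+) status=\d+ QTime=(?P<qtime>\d+)"
)


def faceted_daytime_qtimes(lines, first_hour=7, last_hour=22):
    """Collect QTime (ms) for faceted requests logged in [first_hour, last_hour)."""
    qtimes = []
    for line in lines:
        match = LINE.match(line)
        if match is None:
            continue  # not a request log line
        if not first_hour <= int(match.group("hour")) < last_hour:
            continue  # night-time maintenance window
        if "facet=true" not in match.group("params").split("&"):
            continue  # request without faceting
        qtimes.append(int(match.group("qtime")))
    return qtimes


# Invented sample lines, only to show the filtering.
sample = [
    "2000-01-04 06:59:59 INFO path=/select params={q=a&facet=true} hits=10 status=0 QTime=900",
    "2000-01-04 09:15:02 INFO path=/select params={q=b&facet=true} hits=20 status=0 QTime=40",
    "2000-01-04 13:30:45 INFO path=/select params={q=c} hits=5 status=0 QTime=3",
    "2000-01-04 21:10:00 INFO path=/select params={q=d&facet=true} hits=7 status=0 QTime=60",
]
qtimes = faceted_daytime_qtimes(sample)
print(qtimes, median(qtimes))
```

Running the same function over the logs for two different days gives two lists of response times that can be compared directly, for example by median or by percentiles.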
The performance improvement is palpable, with response time being halved compared to non-sparse faceting. Reading the logs closely, the time spent on faceting the high-cardinality fields is now in the single-digit milliseconds for nearly all queries. We’ll have to do some tests to see what stops the total response time from getting down to that level. I am guessing spellcheck.
As always, sparse faceting is readily available for the adventurous at SOLR-5894.
To verify that last Tuesday was not a lucky shot, here are the numbers for the last 4 Wednesdays. Note that the number of queries per day is fairly low for the first two weeks. This is due to semester start. Also note that the queries with 10^8 hits (basically the full document set) were removed, as those were all due to the same query being repeated by a dashboard tool.