Category Archives: Low-level

Faster DocValues in Lucene/Solr 7+

This is a fairly technical post explaining LUCENE-8374 and its implications for search and retrieval speed in Lucene, Solr and (qualified guess) Elasticsearch. It is primarily relevant for people with indexes of 100M+ documents. Teaser: We have a Solr setup for …

Posted in eskildsen, Hacking, Low-level, Lucene, Performance, Solr

70TB, 16b docs, 4 machines, 1 SolrCloud

At Statsbiblioteket we maintain a historical net archive for the Danish parts of the Internet. We index it all in Solr and we recently caught up with the present. Time for a status update. The focus is performance and logistics, …

Posted in Hacking, Low-level, Performance, Solr, Statsbiblioteket, Uncategorized
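To give a feel for the scale in the title "70TB, 16b docs, 4 machines, 1 SolrCloud", the numbers can be divided out. This is a back-of-the-envelope sketch under assumptions the excerpt does not state: that 70TB is the total index size in decimal terabytes, that "16b" means 16 billion documents, and that data is spread evenly over the 4 machines.

```python
# Back-of-the-envelope figures derived from the post title.
# ASSUMPTIONS (not stated in the excerpt): 70TB is the total index size,
# 1 TB = 10**12 bytes, "16b" = 16 billion documents, even spread over machines.
TOTAL_BYTES = 70 * 10**12
TOTAL_DOCS = 16 * 10**9
MACHINES = 4

bytes_per_doc = TOTAL_BYTES / TOTAL_DOCS           # average index bytes per document
tb_per_machine = TOTAL_BYTES / MACHINES / 10**12   # index terabytes per machine
docs_per_machine = TOTAL_DOCS // MACHINES          # documents per machine

print(f"{bytes_per_doc:.0f} bytes/doc, {tb_per_machine} TB/machine, "
      f"{docs_per_machine:,} docs/machine")
```

Under these assumptions it prints `4375 bytes/doc, 17.5 TB/machine, 4,000,000,000 docs/machine`, i.e. roughly 4.4 KB of index per document and 4 billion documents per machine.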

The ones that got away

Two and a half ideas for improving Lucene/Solr performance that did not work out. Track the result set bits: At the heart of Lucene (and consequently also Solr and Elasticsearch), there is a great deal of doc ID set handling. …

Posted in eskildsen, Hacking, Low-level, Lucene, open source, Performance, Solr
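The "doc ID set handling" mentioned in "The ones that got away" covers sets of document IDs such as query results, which Lucene can represent as bitsets (e.g. its `FixedBitSet`, backed by 64-bit words). As a generic illustration only, not Lucene's code and not the idea from the post, here is a minimal Python bitset with intersection and iteration over set bits; the class name `DocIdBitSet` is hypothetical.

```python
from typing import Iterator


class DocIdBitSet:
    """Hypothetical fixed-size bitset of document IDs stored as 64-bit words
    (similar in spirit to Lucene's FixedBitSet, but not its API)."""

    def __init__(self, max_doc: int):
        self.max_doc = max_doc
        self.words = [0] * ((max_doc + 63) // 64)  # one int per 64 doc IDs

    def set(self, doc: int) -> None:
        self.words[doc >> 6] |= 1 << (doc & 63)    # word index = doc / 64, bit = doc % 64

    def get(self, doc: int) -> bool:
        return ((self.words[doc >> 6] >> (doc & 63)) & 1) == 1

    def and_with(self, other: "DocIdBitSet") -> "DocIdBitSet":
        # Intersection is a word-by-word AND: 64 documents per operation.
        result = DocIdBitSet(self.max_doc)
        result.words = [a & b for a, b in zip(self.words, other.words)]
        return result

    def cardinality(self) -> int:
        return sum(bin(w).count("1") for w in self.words)

    def __iter__(self) -> Iterator[int]:
        # Yield set doc IDs in increasing order, skipping empty words quickly.
        for i, word in enumerate(self.words):
            while word:
                low = word & -word                       # isolate lowest set bit
                yield (i << 6) + low.bit_length() - 1    # its doc ID
                word ^= low                              # clear it
```

The word-level AND is what makes bitsets attractive for dense result sets; for sparse sets, iterating mostly-empty words is wasted work, which is why other representations also exist.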

Speeding up core search

Issue a query, get back the top-X results. It does not get more basic than that with Solr. So it would be a great win if we could improve on that, right? Truth be told, the answer is still “maybe”, but read on for some thoughts, …

Posted in eskildsen, Hacking, Low-level, Lucene, open source, Performance, Solr, Uncategorized
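Getting back the top-X results, as described in "Speeding up core search", is commonly done with a bounded priority queue: keep the X best hits seen so far and evict the worst whenever a better one arrives. The Python sketch below uses `heapq` to show the technique; it is a generic illustration, not Solr's or Lucene's collector code, and the function name `top_x` is hypothetical.

```python
import heapq
from typing import Iterable, List, Tuple


def top_x(scored_docs: Iterable[Tuple[int, float]], x: int) -> List[Tuple[int, float]]:
    """Return the x highest-scoring (doc_id, score) pairs, best first.
    Equal scores are ordered by lower doc_id first."""
    if x <= 0:
        return []
    # Min-heap of (score, -doc_id): heap[0] is always the worst hit kept so far.
    heap: List[Tuple[float, int]] = []
    for doc_id, score in scored_docs:
        entry = (score, -doc_id)
        if len(heap) < x:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            heapq.heapreplace(heap, entry)  # pop the worst, push the new hit
    return [(-neg_doc, score) for score, neg_doc in sorted(heap, reverse=True)]
```

For example, `top_x([(0, 1.0), (1, 3.0), (2, 2.0), (3, 3.0)], 2)` returns `[(1, 3.0), (3, 3.0)]`. Each hit costs at most O(log X), so the cost is dominated by the number of matching documents rather than by X.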

Sampling methods for heuristic faceting

Initial experiments with heuristic faceting in Solr were encouraging: Using just a sample of the result set, it was possible to get correct facet results for large result sets, reducing processing time by an order of magnitude. Alas, further experimentation …

Posted in eskildsen, Faceting, Low-level, open source, Performance, Solr
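The basic idea behind using "just a sample of the result set" for faceting can be sketched as: count facet terms for a random subset of the matching documents, then scale the counts by the ratio of result set size to sample size. This sketch uses plain uniform random sampling, which is only one possible method and not necessarily one of those examined in the post; the function name, `doc_terms` (doc ID → facet terms) and `sample_fraction` are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def sampled_facet_estimate(result_docs: Sequence[int],
                           doc_terms: Dict[int, List[str]],
                           sample_fraction: float,
                           seed: int = 0) -> Counter:
    """Hypothetical sketch: estimate facet term counts from a uniform random
    sample of the result set, scaled up to the full result set size."""
    if not result_docs:
        return Counter()
    rng = random.Random(seed)  # seeded for reproducible samples
    k = min(len(result_docs), max(1, round(len(result_docs) * sample_fraction)))
    sample = rng.sample(list(result_docs), k)

    counts: Counter = Counter()
    for doc in sample:
        counts.update(doc_terms.get(doc, []))

    scale = len(result_docs) / len(sample)  # e.g. 2.0 for a 50% sample
    return Counter({term: c * scale for term, c in counts.items()})
```

The estimated counts are only approximations; terms that are rare in the result set may be missing from the sample entirely, which is the weak spot of the heuristic.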

Dubious guesses, counted correctly

We do have a bit of a performance challenge with heavy faceting on large result sets in our Solr-based Net Archive Search. The usual response time is under 2 seconds, but if the user requests aggregations based on large …

Posted in eskildsen, Faceting, Low-level, open source, Performance, Solr
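The title "Dubious guesses, counted correctly" suggests a two-phase approach: guess the top-X facet terms heuristically (for example from a sample), then count only those candidates exactly against the full result set. The Python sketch below is a minimal illustration under that assumption, not the author's implementation; the function name, `postings` (term → set of doc IDs) and `overprovision` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def guess_then_count(result_docs: Set[int],
                     sample_docs: Iterable[int],
                     doc_terms: Dict[int, List[str]],
                     postings: Dict[str, Set[int]],
                     x: int,
                     overprovision: int = 2) -> List[Tuple[str, int]]:
    """Hypothetical sketch: guess top-X candidates from a sample, count them exactly."""
    # Phase 1 (the guess): count terms in the sample only and keep the
    # x * overprovision most frequent as candidates. A true top-X term can be
    # missed here if the sample under-represents it.
    sample_counts: Counter = Counter()
    for doc in sample_docs:
        sample_counts.update(doc_terms.get(doc, []))
    candidates = [term for term, _ in sample_counts.most_common(x * overprovision)]

    # Phase 2 (the exact count): intersect each candidate's postings with the
    # result set, so every reported count is correct for that term.
    exact = Counter({term: len(postings.get(term, set()) & result_docs)
                     for term in candidates})
    return exact.most_common(x)
```

Over-provisioning the candidate list (here 2× X) is a simple way to reduce the risk of a bad guess at the cost of more exact counting; the counts returned are always correct, only the selection of terms is heuristic.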

Heuristically correct top-X facets

For most searches in our Net Archive, we have acceptable response times, thanks to the use of sparse faceting with Solr. Unfortunately, but as expected, some of the searches are slow. Slow as in response times measured in minutes, if we’re talking …

Posted in eskildsen, Faceting, Hacking, Low-level, Performance, Solr