Category Archives: open source

The ones that got away

Two and a half ideas for improving Lucene/Solr performance that did not work out. Track the result set bits: At the heart of Lucene (and consequently also Solr and ElasticSearch), there is a great amount of doc ID set handling. … Continue reading

Posted in eskildsen, Hacking, Low-level, Lucene, open source, Performance, Solr | Leave a comment

Speeding up core search

Issue a query, get back the top-X results. It does not get more basic with Solr. So great win if we can improve on that, right? Truth be told, the answer is still “maybe”, but read on for some thoughts, … Continue reading

Posted in eskildsen, Hacking, Low-level, Lucene, open source, Performance, Solr, Uncategorized | 1 Comment

Sampling methods for heuristic faceting

Initial experiments with heuristic faceting in Solr were encouraging: Using just a sample of the result set, it was possible to get correct facet results for large result sets, reducing processing time by an order of magnitude. Alas, further experimentation … Continue reading

Posted in eskildsen, Faceting, Low-level, open source, Performance, Solr | Leave a comment

Dubious guesses, counted correctly

We do have a bit of a performance challenge with heavy faceting on large result sets in our Solr-based Net Archive Search. The usual query speed is < 2 seconds, but if the user requests aggregations based on large … Continue reading

Posted in eskildsen, Faceting, Low-level, open source, Performance, Solr | 1 Comment

Net Archive Search building blocks

An extremely webarchive-discovery- and Statsbiblioteket-centric description of some of the technical possibilities with Net Archive Search. This could be considered internal documentation, but we like to share. There are currently 2 generations of indexes at Statsbiblioteket: v1 (22TB) & … Continue reading

Posted in eskildsen, open source, Solr | 2 Comments

Sparse facet caching

As explained in Ten times faster, distributed faceting in standard Solr is two-phase: Each shard performs standard faceting and returns the top limit*1.5+10 terms. The merger calculates the top limit terms. Standard faceting is a two-step process: For each term … Continue reading

Posted in eskildsen, Faceting, Hacking, Low-level, open source, Performance, Solr | 3 Comments

Ten times faster

One week ago I complained about Solr’s two-phase distributed faceting being slow in the second phase – ten times slower than the first phase. The culprit was the fine-counting of top-X terms, with each term-count being done as an intersection … Continue reading

Posted in eskildsen, Faceting, Hacking, Low-level, open source, Performance, Solr, Uncategorized | 5 Comments