Category Archives: Lucene

The ones that got away

Two and a half ideas for improving Lucene/Solr performance that did not work out. Track the result set bits: At the heart of Lucene (and consequently also Solr and ElasticSearch), there is a great amount of doc ID set handling. … Continue reading

Posted in eskildsen, Hacking, Low-level, Lucene, open source, Performance, Solr | Leave a comment

Speeding up core search

Issue a query, get back the top-X results. It does not get more basic with Solr. So it is a great win if we can improve on that, right? Truth be told, the answer is still “maybe”, but read on for some thoughts, … Continue reading

Posted in eskildsen, Hacking, Low-level, Lucene, open source, Performance, Solr, Uncategorized | 1 Comment

Changing field type in Lucene/Solr

The problem: We have 25 shards of 900GB / 250M documents. It took us 25 * 8 days = half a year to build them. Three fields did not have DocValues enabled when we built the shards: crawl_date (TrieDateField): Unknown … Continue reading

Posted in eskildsen, Hacking, Low-level, Lucene, Solr | Leave a comment

Sparse facet counting on a web archive

This post is a follow-up to Sparse facet counting on a real index. Where the last post explored using a sparse counter for faceting on author on Statsbiblioteket's index of library material, this post will focus on faceting on url … Continue reading

Posted in eskildsen, Faceting, Hacking, Low-level, Lucene, Performance, Solr, Uncategorized | Leave a comment

Sparse facet counting on a real index

It was time for a little (nearly) real-world testing of a sparse facet counter for Lucene/Solr (see Fast faceting with high cardinality and small result set for details). The first results are both very promising and somewhat puzzling. The corpus … Continue reading

Posted in eskildsen, Faceting, Hacking, Low-level, Lucene, Performance, Solr | 2 Comments

Fast faceting with high cardinality and small result set

This is a follow-up to the idea presented more than a year ago at https://sbdevel.wordpress.com/2013/01/23/large-facet-corpus-small-result-set/. It can be read independently of the old post. The premise is simple: We have a Lucene/Solr index and we want to do some faceting. … Continue reading

Posted in eskildsen, Faceting, Hacking, Low-level, Lucene, Solr | 4 Comments

Over 9000 facet fields

In some ways, Lucene’s faceting, Solr’s faceting and our own home-brewed solution SOLR-2412 work the same: Keep a permanent list from document IDs to term IDs. When searching, create a list of counters for each term and count the … Continue reading

Posted in Faceting, Hacking, Low-level, Lucene, Solr, Uncategorized | 3 Comments