Monthly Archives: November 2016

70TB, 16b docs, 4 machines, 1 SolrCloud

At Statsbiblioteket we maintain a historical net archive for the Danish parts of the Internet. We index it all in Solr and we recently caught up with the present. Time for a status update. The focus is performance and logistics, … Continue reading

Posted in Hacking, Low-level, Performance, Solr, Statsbiblioteket, Uncategorized | 6 Comments

Prototype demo for OCR postfix in Danish Newspapers

In The Danish Newspaper Archive you can search in 25million newspaper pages and view the pages. The search engine uses OCR (optical character recognition) from scanned pages but often the software reading the text from the scanned images makes reading … Continue reading

Posted in Uncategorized | Leave a comment