Author Archives: thomasegense

About thomasegense

Thomas Egense, mathematician. Works at The State and University Library, Denmark.

SolrWayback Machine

Another ‘google innovation week’ at work has produced the SolrWayback Machine. It works similarly to the Internet Archive's Wayback Machine (https://archive.org/web/) and can be used to show harvested web content (WARC files). The Danish Internet Archive has over 20 billion harvested … Continue reading


Automated improvement of search in low quality OCR using Word2Vec

This abstract has been accepted for the Digital Humanities in the Nordic Countries 2nd Conference (http://dhn2017.eu/). In the Danish Newspaper Archive[1] you can search and view 26 million newspaper pages. The search engine[2] uses OCR (optical character recognition) from scanned pages … Continue reading


Prototype demo for OCR postfix in Danish Newspapers

In the Danish Newspaper Archive you can search and view 25 million newspaper pages. The search engine uses OCR (optical character recognition) from scanned pages, but often the software reading the text from the scanned images makes reading … Continue reading


2D visualization of high dimensional word embeddings

In this blog post I try to make a method for a computer to read a text, analyse the characters, and then make a 2D visualization of the similarity of the characters. To achieve this I am using the … Continue reading
