SolrWayback software bundle has been released

The SolrWayback software bundle can be used to search and playback archived webpages in Warc format. It is an out of the box solution with index workflow, Solr and Tomcat webserver and a free text search interface with playback functionality. Just add your Warc to a folder and start the index job.

The search interface has additional features besides freetext search. This includes:

  • Image search similar to google images
  • Search by uploading a file. (image/pdf etc.) See if the resource has been harvested and from where.
  • Link graph showing links (ingoing/outgoing) for domains using the D3 javascript framework.
  • Raw download of any harvested resource from the binary Arc/Warc file.
  • Export a search resultset to a Warc-file. Streaming download, no limit of size of resultset.
  • An optional built in SOCKS proxy can be used to view historical webpages without browser leaking resources from the live web.

See the GitHub page for screenshots of SolrWayback and scroll down to the install guide try it out.

Link: SolrWayback

 

solrwayback_search.png

solrwayback_linkgraph.pnggps_exif_search.png

 

Advertisements

About thomasegense

Thomas Egense Mathematician Works at The State and University Library, Denmark
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s