A productive day today!
Henrik and I sat down and pushed a preliminary did-you-mean service to Summa trunk. Consider it pre-alpha quality at this stage, but it will be mature for Summa 1.5.2. It’s based on Karl Wettin’s old lucene-didyoumean contrib that lingered in the Apache bugtracker for years (yes, and I mean years literally). You can find the updated code on github.com/mkamstrup/lucene-didyoumean. There are branches in the Git repo for Lucene 3.0 (master), Lucene 2.9.* and Lucene 2.4.*, but be aware that this code has not been production tested yet.
The large scale testing of Juglr I just referred to is actually a new project of mine… Man, I tend to spew out a few too many projects these days :-). The new project on the stack is Higgla, with the tag line: “a lightweight, scalable, indexed, JSON document storage”. If you are wondering where the name came from, I can inform you that Higgla is Jamaican for “higgler”, which a quick Googling defines as “A person who trades in dairy, poultry, and small game animals; A person who haggles or negotiates for lower prices”. The point is that Higgla is about dealing with any old sort of data and doing it in a very informal way.
Higgla is very much in the spirit of CouchDB and Elastic Search: a schema-free database, and not just schema free, but completely schema free. No structure is implied, not even the kind that CouchDB’s views imply. Indexing is done at the document level, and documents need not share the same searchable fields. Heck, a document revision does not even need to be indexed the same way as the previous revision!
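To make the per-document indexing idea concrete, here is a toy in-memory sketch. It is not Higgla code and it does not use Lucene; the `ToyStore` class, its methods and the revision numbering are all made up for illustration. It only shows the principle that each stored revision names its own searchable fields.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for a schema-free, per-document index."""

    def __init__(self):
        self.docs = {}    # doc id -> (revision, data)
        self.index = {}   # (field, token) -> set of doc ids

    def store(self, doc_id, data, index_fields=()):
        # Drop index entries left over from the previous revision, if any.
        for ids in self.index.values():
            ids.discard(doc_id)
        revision = self.docs.get(doc_id, (0, None))[0] + 1
        self.docs[doc_id] = (revision, data)
        # Index only the fields this particular revision asked for.
        for field in index_fields:
            for token in str(data[field]).lower().split():
                self.index.setdefault((field, token), set()).add(doc_id)
        return revision

    def query(self, **terms):
        # AND together all field=token terms.
        hits = None
        for field, token in terms.items():
            ids = self.index.get((field, token.lower()), set())
            hits = set(ids) if hits is None else hits & ids
        return sorted(hits or [])


store = ToyStore()
book = {"title": "Dive Into Python", "author": "Mark Pilgrim"}

# Two documents with completely unrelated fields and index settings.
store.store("book_1", book, index_fields=("title", "author"))
store.store("note_7", {"text": "buy milk", "tags": "shopping"}, index_fields=("tags",))
print(store.query(author="mark"))

# A new revision of the same document may be indexed differently.
store.store("book_1", book, index_fields=("title",))
print(store.query(author="mark"))
print(store.query(title="python"))
```

The first query finds `book_1` by author; after the second revision drops `author` from its indexed fields, the same query comes back empty while the title query still matches.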
As I hinted, Higgla is based on Juglr. Higgla illustrates pretty well the power of a combined actor+http messaging stack like Juglr – if you browse the source code you will see that there really is not a lot of it!
At its core Higgla leverages the always awesome Lucene. I had to think quite hard to make the storage engine transaction safe in a massively parallel setup, because Lucene doesn’t as such support parallel transactions (but it does support sequential transactions quite well). I figured it out eventually though.
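This is not Higgla’s actual storage code, just a toy sketch of one common way to get parallel clients on top of a strictly sequential commit: funnel every write through a single writer and reject writes that are based on a stale revision. The `SequentialCommitter` class and its `submit` method are hypothetical names, and a plain dict stands in for the Lucene index.

```python
import queue
import threading


class SequentialCommitter:
    """Many threads may submit writes; exactly one thread applies them."""

    def __init__(self):
        self.revisions = {}   # doc id -> latest committed revision
        self.data = {}        # doc id -> latest committed data
        self.inbox = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def submit(self, doc_id, base_revision, data):
        # Blocks until the writer thread has accepted or rejected the write.
        reply = queue.Queue(maxsize=1)
        self.inbox.put((doc_id, base_revision, data, reply))
        return reply.get()

    def _run(self):
        while True:
            doc_id, base_revision, data, reply = self.inbox.get()
            current = self.revisions.get(doc_id, 0)
            if base_revision != current:
                # Stale revision: nothing is written.
                reply.put(("conflict", current))
                continue
            # A real engine would update and commit its index here.
            self.revisions[doc_id] = current + 1
            self.data[doc_id] = data
            reply.put(("ok", current + 1))


committer = SequentialCommitter()
results = []


def writer(n):
    # Every writer claims to be creating the box at revision 0.
    results.append(committer.submit("book_1", 0, {"n": n}))


threads = [threading.Thread(target=writer, args=(n,)) for n in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All eight writers raced on revision 0, so exactly one of them can win.
print(sorted(status for status, _ in results))
```

Because only the single writer thread ever touches the stored state, the transactions themselves stay sequential no matter how many clients submit at once; the losers get a conflict and can re-read and retry.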
Even though this is just a 0.0.1, Higgla already ships with Python and Java client libraries (even though talking straight HTTP+JSON shouldn’t be that hard in most frameworks, it’s still nice to have a simple convenience lib). An example with the Python client looks like:
    import higgla
    import json

    # Connect to the server
    session = higgla.Session("localhost", 4567, "my_books")

    # Prepare a box for storage, with id 'book_1',
    # revision 0 (since this is a new box), and indexing
    # the fields 'title' and 'author'
    box = session.prepare_box("book_1", 0, "title", "author")

    # Add some data to the box
    box["title"] = "Dive Into Python"
    box["author"] = "Mark Pilgrim"
    box["stuff"] = [27, 68, 2, 3, 4]

    # Store it on the server
    session.store([box])

    # Now find the box again
    query = session.prepare_query(author="mark")
    results = session.send_query(query)
    print(json.dumps(results, indent=2))
    print("TADAAA!")
That completes the Python example. The Java API is almost identical so I won’t cover it, although I can’t do the same fancy varargs stuff that Python provides 🙂