Juglr, Higgla, and DidYouMean

A productive day today!

Summa DidYouMean

Henrik and I sat down and pushed a preliminary did-you-mean service to Summa trunk. It’s to be considered pre-alpha quality at this stage, but it will be mature for Summa 1.5.2. It’s based on Karl Wettin’s old lucene-didyoumean contrib that lingered in the Apache bugtracker for years (and yes, I mean years literally). You can find the updated code at github.com/mkamstrup/lucene-didyoumean. There are branches in the Git repo for Lucene 3.0 (master), Lucene 2.9.* and Lucene 2.4.*, but be aware that this code has not been production tested yet.

Juglr 0.2.1

My pet project, Juglr, the actor model and messaging library for Java 6+, has hit 0.2.1. I’ve now done some more large-scale testing with it and it seems to work pretty well.

Introducing Higgla

The large-scale testing of Juglr I just referred to is actually a new project of mine… Man, I tend to spew out a few too many projects these days 🙂. The new project on the stack is Higgla, with the tag line: “a lightweight, scalable, indexed, JSON document storage”. If you are wondering where the name came from, I can inform you that Higgla is Jamaican for “higgler”, which a quick Googling defines as “A person who trades in dairy, poultry, and small game animals; A person who haggles or negotiates for lower prices”. The point is that Higgla is about dealing with any old sort of data and doing it in a very informal way.

Higgla is very much in the spirit of CouchDB and Elastic Search: a schema-free database, and not just schema free, but completely schema free. No structure is implied, the way CouchDB’s Views imply one. Indexing is done at the document level, and each document need not have the same searchable fields as the others. Heck, each document revision does not even need to be indexed the same way as the previous revision!
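To make the per-document, per-revision indexing idea concrete, here is a toy in-memory sketch. It is not Higgla's code and it doesn't touch Lucene; `ToyStore`, `store()` and `query()` are names made up for this illustration, and the whitespace tokenizer is a deliberately naive assumption.

```python
from collections import defaultdict


class ToyStore:
    """Toy model: every stored revision names its own indexed fields."""

    def __init__(self):
        self.docs = {}                 # doc id -> data of the latest revision
        self.indexed = {}              # doc id -> fields indexed for that revision
        self.index = defaultdict(set)  # (field, token) -> ids of matching docs

    @staticmethod
    def _tokens(value):
        # Naive tokenizer (assumption): lowercase and split on whitespace
        return str(value).lower().split()

    def _unindex(self, doc_id):
        # Remove the index entries left behind by the previous revision
        old = self.docs.get(doc_id)
        if old is None:
            return
        for field in self.indexed[doc_id]:
            for token in self._tokens(old.get(field, "")):
                self.index[(field, token)].discard(doc_id)

    def store(self, doc_id, data, index_fields):
        # A new revision replaces the old one, including how it is indexed
        self._unindex(doc_id)
        self.docs[doc_id] = data
        self.indexed[doc_id] = tuple(index_fields)
        for field in index_fields:
            for token in self._tokens(data.get(field, "")):
                self.index[(field, token)].add(doc_id)

    def query(self, **terms):
        # Return the ids of documents matching every field=token pair
        hits = None
        for field, token in terms.items():
            ids = self.index.get((field, token.lower()), set())
            hits = set(ids) if hits is None else hits & ids
        return sorted(hits or [])


store = ToyStore()
# Two documents with completely different searchable fields
store.store("book_1", {"title": "Dive Into Python", "author": "Mark Pilgrim"},
            ["title", "author"])
store.store("note_1", {"text": "buy milk", "owner": "mark"}, ["owner"])
by_author = store.query(author="mark")        # ['book_1']
by_owner = store.query(owner="mark")          # ['note_1']

# A new revision of book_1 that is indexed on 'title' only
store.store("book_1", {"title": "Dive Into Python 3", "author": "Mark Pilgrim"},
            ["title"])
by_author_after = store.query(author="mark")  # []
by_title_after = store.query(title="python")  # ['book_1']
```

The thing to notice is that the list of indexed fields travels with each `store()` call rather than living in a schema, so the second revision of `book_1` simply stops being findable by author.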

As I hinted, Higgla is based on Juglr. Higgla illustrates pretty well the power of a combined actor+HTTP messaging stack like Juglr. If you browse the source code you will see that there really is not a lot of it!

At its core Higgla leverages the always awesome Lucene. I had to think quite hard to make the storage engine transaction safe in a massively parallel setup, because Lucene doesn’t as such support parallel transactions (though it does support sequential transactions quite well). I figured it out eventually though.
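I won't go through how Higgla actually does it here, but one generic way to get sequential transactions out of many parallel clients is to funnel every commit through a single writer with a mailbox, which is very much an actor-style pattern. The sketch below is only an illustration of that general technique under that assumption; `SingleWriter` and its methods are hypothetical names, and a plain list stands in for the index.

```python
import queue
import threading


class SingleWriter:
    """Serializes commits from many threads through one worker thread."""

    def __init__(self):
        self._inbox = queue.Queue()
        self.log = []  # stand-in for the index; only the worker mutates it
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            item = self._inbox.get()
            if item is None:  # shutdown signal
                break
            batch, done = item
            # The whole batch is applied before the next one is looked at,
            # so batches from different callers never interleave
            self.log.extend(batch)
            done.set()

    def commit(self, batch):
        # Hand the batch to the worker and block until it has been applied
        done = threading.Event()
        self._inbox.put((list(batch), done))
        done.wait()

    def close(self):
        self._inbox.put(None)
        self._worker.join()


writer = SingleWriter()
# Eight parallel "clients", each committing a batch of three entries
threads = [
    threading.Thread(target=writer.commit,
                     args=([(i, n) for n in range(3)],))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
writer.close()
```

Callers still run in parallel and only wait on their own commit, while the order of batches in `writer.log` is whatever order they arrived in the mailbox.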

Even though this is just a 0.0.1, Higgla already ships with Python and Java client libraries (talking straight HTTP+JSON shouldn’t be that hard in most frameworks, but it’s still nice to have a simple convenience lib). An example with the Python client looks like this:

import higgla
import json

# Connect to the server
session = higgla.Session("localhost", 4567, "my_books")

# Prepare a box for storage, with id 'book_1',
# revision 0 (since this is a new box), and indexing
# the fields 'title' and 'author'
box = session.prepare_box("book_1", 0, "title", "author")

# Add some data to the box
box["title"] = "Dive Into Python"
box["author"] = "Mark Pilgrim"
box["stuff"] = [27, 68, 2, 3, 4]

# Store it on the server
session.store([box])

# Now find the box again
query = session.prepare_query(author="mark")
results = session.send_query(query)
print(json.dumps(results, indent=2))
print("TADAAA!")

That completes the Python example. The Java API is almost identical so I won’t cover it, although I can’t do the same fancy varargs stuff that Python provides 🙂

