Archive for April, 2009

Rising Summa

April 21, 2009

The people higher up in the food chain have decided to provide a Summa-based search backend to public libraries in Denmark. For an annual fee, Statsbiblioteket handles the flow of data from raw dumps to web services and keeps the servers running. Maintenance, money and sales are fairly boring, seen from our developer perspective, but tweaking Summa to allow for easy experimentation and setup has been very rewarding.

As usual, Mikkel wove his magic and created a package (read: a collection of scripts and all the JARs from the Summa project) that makes it very simple to set up a local Summa for experimentation. The working title was Summix, but we all know how it goes with working titles.

With some late night fiddling, the complexity was reduced to “Unzip and run a script”, which gets a Summa demo running with a skeleton web front end. Added bonus? It runs under Windows as well as Linux (and probably OS X too, but we haven’t checked). We will write a tutorial on the wiki Real Soon Now.

Getting there...


Asking for Trouble? You’ve Come to the Right Place!

April 2, 2009

Toke was asking for trouble yesterday. I would assume that he knew me better by now… With my last commit it is now actually possible to inline Javascript inside your configuration when using a ScriptFilter.

The following now actually works:

<xproperties>
  <entry>
    <key>filter.name</key>
    <value class="string">InlineJavascriptTest</value>
  </entry>
  <entry>
    <key>summa.filter.sequence.filterclass</key>
    <value class="string">dk.statsbiblioteket.summa.common.filter.object.ScriptFilter</value>
  </entry>
  <entry>
    <key>filter.script.inline</key>
    <value class="string"><![CDATA[
                    payload.getRecord().setId('inlineJavascript');
    ]]></value>
  </entry>
</xproperties>

I cannot begin to enumerate all the dangers in doing this, but somehow the thrill of the possibilities got the better of me. If Javascript isn’t your thing, you can set the script language to anything supported by your Java runtime by defining the property filter.script.lang.

So – use this at your own peril!

UPDATE: You can find a list of available ScriptEngines for Java at scripting.dev.java.net.

Javascript Filters in Summa

April 1, 2009

I just completed the draft implementation of Javascript filters for Summa and I am posting here to hear some comments. If nobody complains, the existing implementation will likely stay unchanged. Strictly speaking, the implementation supports any scripting language supported by the ScriptEngineManager of the JVM, but in practice Javascript will probably be the most important one.

The scripting environment will include two “magic” variables: payload and allowPayload. Unsurprisingly the payload variable contains a reference to the Payload object being processed. The allowPayload variable is a boolean value that defaults to true. If allowPayload is set to false the payload will be dropped from the processing pipeline.

Update: The script filters now have a third magic variable called log sporting the methods log.trace|debug|info|warn|error|fatal(string).
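
As a rough picture of how the three magic variables look from inside a script, here is a minimal plain-Javascript stand-in for the host side. It is only a sketch: Summa binds the variables through Java's ScriptEngine, not like this, and runScript, the message-collecting log object and the stub payload are all hypothetical names made up for the illustration.

```javascript
// Plain-Javascript stand-in for the host side of a script filter.
// Assumption: this only mimics the behaviour described in this post;
// it is not how Summa's ScriptFilter is implemented.
function runScript(source, payload) {
    var messages = [];
    var log = {};
    // Provide the six log methods listed above; each one records its message.
    ['trace', 'debug', 'info', 'warn', 'error', 'fatal'].forEach(function (level) {
        log[level] = function (text) { messages.push(level + ': ' + text); };
    });
    // Bind the three magic variables by name and report the final allowPayload.
    var body = new Function('payload', 'allowPayload', 'log',
                            source + '\nreturn allowPayload;');
    return { allowed: body(payload, true, log), messages: messages };
}

// Hypothetical stub payload exposing only getRecord().getId().
var payload = {
    getRecord: function () {
        return { getId: function () { return 'draft_42'; } };
    }
};

// A filter script that logs and drops records whose id starts with "draft_".
var script =
    "var id = payload.getRecord().getId();\n" +
    "if (id.indexOf('draft_') === 0) {\n" +
    "    log.info('Dropping ' + id);\n" +
    "    allowPayload = false;\n" +
    "}";

var result = runScript(script, payload);
console.log(result.allowed, result.messages);
```

The script itself only ever touches payload, allowPayload and log, which is the whole contract between a filter script and its host.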

The best way to explain this is probably with an example. To write a Javascript filter for Summa create a file called myFilter.js with the following content:

var record = payload.getRecord();

if (!record.getId().startsWith(record.getBase())) {
    record.setId(record.getBase() + "_" + record.getId());
}

if (record.getId().endsWith('taboo')) {
    allowPayload = false;
}

This script makes sure that all records have their ids prefixed with their base name, and filters out any record whose id ends with “taboo”.
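
To see the two rules in action without a running Summa, the script body can be tried against a few stub records. This is a sketch under assumptions: makeRecord is a hypothetical stand-in that offers only the three Record methods the script calls, and wrapping the script in a myFilter function that returns allowPayload is just a convenience for running it outside the filter chain.

```javascript
// Hypothetical stand-in for a Summa Record, with only the methods the script uses.
function makeRecord(id, base) {
    return {
        getId: function () { return id; },
        setId: function (newId) { id = newId; },
        getBase: function () { return base; }
    };
}

// The body of myFilter.js wrapped in a function so it can be run per record.
// Returns the final value of allowPayload.
function myFilter(payload) {
    var allowPayload = true;
    var record = payload.getRecord();

    // Rule 1: prefix the id with the base name unless it already starts with it.
    if (!record.getId().startsWith(record.getBase())) {
        record.setId(record.getBase() + "_" + record.getId());
    }

    // Rule 2: drop records whose id ends with 'taboo'.
    if (record.getId().endsWith('taboo')) {
        allowPayload = false;
    }
    return allowPayload;
}

// Made-up sample records, all belonging to a base called "sb".
var samples = [
    makeRecord('doc1', 'sb'),
    makeRecord('sb_doc2', 'sb'),
    makeRecord('old_taboo', 'sb')
];

var outcome = samples.map(function (record) {
    var allowed = myFilter({ getRecord: function () { return record; } });
    return record.getId() + (allowed ? ' kept' : ' dropped');
});
console.log(outcome.join('\n'));
```

Note that the prefix test only checks that the id starts with the base name itself, so an id such as “sbx1” in base “sb” would be left as it is.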

To plug the script into your filter pipeline you need to stick something like the following in your filter chain configuration:

<xproperties>
    <entry>
        <key>filter.name</key>
        <value class="string">FixRecordIdsandDropTaboos</value>
    </entry>
    <entry>
        <key>summa.filter.sequence.filterclass</key>
        <value class="string">dk.statsbiblioteket.summa.common.filter.object.ScriptFilter</value>
    </entry>
    <entry>
        <key>filter.script.url</key>
        <value class="string">http://example.com/filters/myFilter.js</value>
    </entry>
</xproperties>

I am using the ScriptEngine framework which appeared in Java 6 for all of this, and all in all the development experience has been quite nice. Writing this blog post took almost as long as it took me to write that filter :-)

UPDATE: You can find a list of all available scripting engines at scripting.dev.java.net.