Author Archive

Juglr, Higgla, and DidYouMean

February 10, 2010

A productive day today!

Summa DidYouMean

Henrik and I sat down and pushed a preliminary did-you-mean service to Summa trunk. It’s to be considered pre-alpha quality at this stage, but it will be mature for Summa 1.5.2. It’s based on Karl Wettin’s old lucene-didyoumean contrib that lingered in the Apache bug tracker for years (yes, and I mean years literally). You can find the updated code at github.com/mkamstrup/lucene-didyoumean. There are branches in the Git repo for Lucene 3.0 (master), Lucene 2.9.*, and Lucene 2.4.* – but be aware that this code has not been production tested yet.

Juglr 0.2.1

My pet project, the actor model and messaging library for Java 6+, Juglr, has hit 0.2.1. I’ve now done some more large scale testing with it, and it seems to work pretty well.

Introducing Higgla

The large scale testing of Juglr I just referred to is actually a new project of mine… Man – I tend to spew out a few too many projects these days :-). The new project on the stack is Higgla – with the tag line: “a lightweight, scalable, indexed, JSON document storage”. If you are wondering where the name came from, I can inform you that Higgla is Jamaican for “higgler”, which a quick Googling defines as “a person who trades in dairy, poultry, and small game animals; a person who haggles or negotiates for lower prices”. The point is that Higgla is about dealing with any old sort of data, and doing it in a very informal way.

Higgla is very much in the spirit of CouchDB and Elastic Search – a schema free database, and not just schema free, but completely schema free. There is no implied structure like CouchDB’s views impose. Indexing is done on a document level, and each document need not have the same searchable fields as the others. Heck, each document revision does not even need to be indexed the same way as the previous revision!

As I hinted, Higgla is based on Juglr. Higgla illustrates pretty well the power of a combined actor+http messaging stack like Juglr – if you browse the source code you will see that there really is not a lot of it!

At its core Higgla leverages the always awesome Lucene. I had to think quite hard to make the storage engine transaction safe in a massively parallel setup, because Lucene doesn’t as such support parallel transactions (though it supports sequential transactions quite well). I figured it out eventually though.

Even though this is just a 0.0.1, Higgla already ships with Python and Java client libraries (talking straight HTTP+JSON shouldn’t be that hard in most frameworks, but it’s still nice to have a simple convenience lib). An example with the Python client looks like:

import higgla
import json

# Connect to the server
session = higgla.Session("localhost", 4567, "my_books")

# Prepare a box for storage, with id 'book_1',
# revision 0 (since this is a new box), and indexing
# the fields 'title' and 'author'
box = session.prepare_box("book_1", 0, "title", "author")

# Add some data to the box
box["title"] = "Dive Into Python"
box["author"] = "Mark Pilgrim"
box["stuff"] = [27, 68, 2, 3, 4]

# Store it on the server
session.store([box])

# Now find the box again
query = session.prepare_query(author="mark")
results = session.send_query(query)
print json.dumps(results, indent=2)
print "TADAAA!"

That completes the Python example. The Java API is almost identical so I won't cover it, although the Java version can't do the same fancy varargs stuff that Python provides :-)

What did I mean?

January 28, 2010

It has been a long standing wish to get a good did-you-mean service shipped with Summa. And by “did-you-mean service” I mean the little helpful tip that shows up underneath the text entry when you mistype something when doing a search. Note that I say “mistype” and not “misspell”, because a good did-you-mean service is a lot more complex than a spell checker.

Consider when I read aloud my wish list to my mom over the phone and I try to explain to her that I badly want “Heroes of Might and Magic”  for Christmas. This phrase being completely meaningless to her she types in the search field:

heroes of light and magic

Notice that this is indeed a correctly spelled phrase, but nonetheless not what she/I wanted. A good search engine would ask Did you mean: “heroes of might and magic”?.

On the other hand, if a search engine runs on a database of bad monochrome underworld games and “Heroes of Might and Magic” wasn’t there, but instead the index contained a game called “Heroes of Fight and Magic”, the search engine should of course suggest Did you mean: “heroes of fight and magic”? instead.

So we’ve identified two things we want that a normal spellchecker doesn’t provide:

  • Consider each word in a query in the context of the whole phrase it appears in
  • Only suggest stuff that is actually in the index
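
To make the two requirements concrete, here is a minimal sketch (with entirely hypothetical names and scoring – this is not the lucene-didyoumean algorithm) of a suggester that compares whole phrases word by word and only ever proposes phrases that actually exist in the index:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def did_you_mean(query, indexed_phrases, max_dist_per_word=2):
    """Suggest the indexed phrase closest to the whole query phrase.
    Because candidates come straight from the index, we never suggest
    something a follow-up search can't find."""
    words = query.split()
    best = None
    for phrase in indexed_phrases:
        cand = phrase.split()
        if len(cand) != len(words):
            continue
        dist = sum(edit_distance(w, c) for w, c in zip(words, cand))
        if dist <= max_dist_per_word * len(words) and (best is None or dist < best[0]):
            best = (dist, phrase)
    return best[1] if best else None

# With "might" in the index the phrase context wins over spelling...
print(did_you_mean("heroes of light and magic", ["heroes of might and magic"]))
# ...and with only "fight" indexed, we suggest what is actually there.
print(did_you_mean("heroes of light and magic", ["heroes of fight and magic"]))
```

Brute-forcing every indexed phrase obviously doesn't scale; a real implementation prunes candidates with n-gram or trie lookups first, but the two properties above are the point.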

The Code

After surveying what was available on the open source market we realized that none of the solutions out there did what we wanted. I was pointed at Karl Wettin’s work on LUCENE-626. Although Karl’s work is great, it’s not compatible with the new API in Lucene >= 3.0 and it has a hardwired dependency on Berkeley DB that we could not accept. So I branched his work in order to bring it into 2010, and I am proud to say that I’ve now reached an almost-works state. You can find the code on GitHub: github.com/mkamstrup/lucene-didyoumean

The new thing about this is also that we are now engaging in upstream Lucene work, rather than staying in our own Summa backyard. Quite exciting, and a very rewarding experience for a software developer. Toke has some more news in this regard as well – he’s doing some upstream stuff that has far bigger implications than my odd-job did-you-mean-hacking. But I’ll leave you hanging there and let Toke talk about this himself.

Juglr 0.0.1

January 15, 2010

Yesterday I did two things (well many more things, especially some boring things, but I am not gonna lament those for now); I released Juglr 0.0.1 and moved the Juglr Git repository to GitHub. First about the release…

Juglr 0.0.1

What is Juglr: It’s a small asynchronous messaging and actor model library for Java 6 and later. It utilizes a no-fuzz approach, meaning that there is no domain specific XML, autogenerated code, or pre- or post processing steps. Just plain old Java as you know it.

I promised 11 days ago that the first Juglr release would be out “Real Soon Now”(TM). And if you bear with me and accept that 11 days is “soon”, then here you are. As the version number indicates this is not something that I would deploy in a production environment just yet, but it’s still a functioning release.

UPDATE 2010-01-15: Released Juglr 0.0.2 with some fixes in HTTPServerExample

Moved Juglr Code Repository

As you may be aware, the Juglr project started out on Gitorious, chosen because it is open source, supports Git, and is quite fast. However I was in bad need of a file release system and a bug tracker, and Gitorious doesn't provide those. SourceForge is slower than the human mind can comprehend, and Launchpad – otherwise my preferred project hosting – only supports Bazaar, while Git seems to be the DVCS with the most traction here at the State and University Library of Denmark. So no Launchpad for Juglr.

After evaluating a few options GitHub provided all I needed and then some. It's very fast and I like it already.

The case: RPC vs. Messaging

December 17, 2009

There’s a classical flamefest discussion about Remote Procedure Calls (RPC) vs. Messaging. People much brighter than me have discussed it elsewhere, but that doesn’t stop me from throwing in my 2 cents. It appears to me that there is a whole crowd of people still refusing to realize why RPC is so bad.

Before I get too deep in this let’s get the terms RPC and Messaging more well defined. I won't claim that I have the “correct” definitions, but here’s what I mean when I use those terms: RPC is a mechanism that allows you to call methods on remote services as though they were methods on a local object. In pseudo code:

calc = lookup_calculator_service("127.0.0.1", 8080)
four = calc.add(2, 2)
eight = calc.multiply(2, 4)
print ("Result of (2+2)+(2x4) = " + calc.add(four, eight))

For Messaging, consider it like email, not between people but between different apps on different machines. A message is typically some container-like format with some extra metadata naming the sender and the recipient(s), maybe timestamps and serial numbers. All you can do in a messaging system is basically send a message to a particular address. Whether or when the resident at that address responds is not possible to determine – just like email in that sense. For a large scale example of a messaging system we have the internet itself. The much hyped REST interactions of online services are also an example where messaging is starting to show success.
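
As a sketch of what such an envelope might look like (the field names here are illustrative, not any standard wire format):

```python
import json
import time

def make_message(sender, recipients, body, serial=0):
    """Build an email-like message envelope: routing metadata plus an
    opaque body. A messaging system only promises delivery attempts on
    this envelope - it says nothing about when, or if, a reply comes."""
    return {
        "from": sender,
        "to": list(recipients),
        "timestamp": time.time(),
        "serial": serial,
        "body": body,
    }

msg = make_message("calc-client", ["calc-service"],
                   {"op": "add", "args": [2, 2]})
wire = json.dumps(msg)       # all you can really do is send it...
received = json.loads(wire)  # ...and maybe, someday, something arrives
print(received["body"]["op"])
```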

Back to the RPC example above - it's very convenient and easy to work with, right? If this example is really all you need to do, then I tend to agree that this kind of RPC is fine. But what happens if you are writing a mission critical system where data integrity is paramount, you have lots of interconnected services, and need low latency and high throughput? Let's examine the situation a bit...

The server might be implemented as:

function add (num1, num2) {
    return num1 + num2
}

The RPC system would then wrap the server object and expose some predefined methods as remote methods. It magically parses incoming calls and delegates control to my server's add() function, giving it the right arguments.

Problems of RPC

What happens in line 2 in the client code above if calc.add(2,2) causes the calculator service to go out of memory? Some RPC systems, like Java RMI, have the "feature" of sending you the raw exceptions as they happen on the server, directly. In case of an OutOfMemoryError (OOM) the exception would completely escape the server's logging and critical error handling and be sent to the caller. Our calculator client then gets an OOM without the slightest chance of figuring out whether it is itself OOM or the server is OOM. And all the while the client thinks it is OOM, and might crash, the server, which is really OOM, happily chugs along down whatever path of complete failure lies ahead of it.

This can be solved partially if the client wraps all remote calls in try/catch clauses catching the most general type of error the runtime has. In Java this would be Throwable. Also the server needs to wrap all of its remotely available methods in try/catch in order to shut down nicely (or protect itself in some way) in case of OOM or other critical errors. So our previous example now becomes:

calc = lookup_calculator_service("127.0.0.1", 8080)
try {
    four = calc.add(2, 2)
} catch (Throwable t) {
   log.warn("Error adding numbers!")
   return
}
... Nah... I am pretty sure you don't want to read the rest of the try/catch hell

As you of course realize, this can all be solved by thorough exception handling in both clients and servers. It won't be fun, but it can be done. Let's call this problem the Non-Local Exceptions Problem.

The next problem inherent in RPC could be called the Indirect Memory Allocation Problem. This problem arises anywhere you accept a data structure of an arbitrary size in your methods' arguments, e.g. an array. Suppose I change my calculator server's API to be more flexible, so that the add() method takes an array of numbers to add, like calc.add([1,2,3,4,5,6]) = 21. Now what happens if a client sends me an array with 10^9 numbers to add? If we assume that a number is 4 bytes, then the RPC system on the server will try to allocate 4*10^9 bytes = 4GB for the array before passing control into add(). This will likely cause the server to OOM before the call even reaches my method.

To handle the indirect memory allocation problem I must either be able to ensure that my RPC system will not allow clients to send such huge arguments, or be able to parse the arguments in some streaming manner on the server side - but the latter does not sound a lot like RPC, does it?

- and note that the indirect memory allocation problem is not only on the server side. The server may also return a huge data structure as a method response, so the client needs to guard against this too.
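
One defensive pattern – sketched here with hypothetical names and an assumed 1 MiB policy, not any particular framework's API – is to bound the payload size before allocating for it:

```python
import io

MAX_BODY_BYTES = 1 << 20  # 1 MiB cap; an assumed policy, tune per service

def read_body(stream, declared_length=None):
    """Reject oversized payloads before allocating for them.
    `stream` is any file-like object. The cap is enforced even if the
    declared length lies, by reading at most MAX_BODY_BYTES + 1 bytes."""
    if declared_length is not None and declared_length > MAX_BODY_BYTES:
        raise ValueError("payload too large: %d bytes declared" % declared_length)
    data = stream.read(MAX_BODY_BYTES + 1)
    if len(data) > MAX_BODY_BYTES:
        raise ValueError("payload exceeded %d bytes" % MAX_BODY_BYTES)
    return data

print(len(read_body(io.BytesIO(b"x" * 100), 100)))  # → 100
```

The same guard applies symmetrically on the client when reading a response. An RPC stack that deserializes arguments for you before your code runs gives you no place to put this check – which is exactly the problem.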

Next up on the list of problems is the Blocking Calls Problem. When the client calls the server it issues a request over the network and really has no way to anticipate when that call returns. While it waits it blocks the thread from which it is calling (or at least all RPC systems I know of do this). So if you want to do concurrent calls you'd need one thread per call in progress. If you've never seen an app go belly up because of thread starvation, I bet you've never programmed multi-threaded production systems. Blocking calls make your system more fragile and also much more affected by network latency.

Skipping on to the next problem, this one particularly strikes strongly typed programming languages (like Java, which we use a lot here at the State and University Library of Denmark). Let's call it the Static Interface Problem. In a strongly typed language you need to be able to resolve the method signatures at compile time (that, or use varargs signatures everywhere - eeeks!). In order to do this one frequently hand writes or autogenerates some interface- or stub classes. If the remote API changes your app is likely to crash or simply not run at all - the interface classes need to be regenerated and your code recompiled against these new interfaces. If you are a purist you might say that such public interfaces should never change and that I must surely be a slacker since I even bring this up, but the sad fact of the matter is that in real life you can not control the entire world, and interfaces do change.

Looking back, RPC has:

  • Non-Local Exceptions
  • Indirect Memory Allocation
  • Blocking Calls
  • Static Interfaces (in strongly typed languages)

The way these problems are solved in an RPC context is typically to write a CalcClient class which does the needed client side magic (catching exceptions, delegating work to an async thread, hiding the remote interface declaration, etc.) and then pass a bunch of HashMaps or parameterized value types around with each method, where you can stuff any arguments you need to add to the interface in a backwards compatible way. The only thing that is nearly impossible to tackle is the indirect memory allocation problem.

Enter Messaging. Messaging solves all of the above problems in one fell swoop, and if you decide to use a standard, like HTTP, for the connections then you can even talk to your messaging services via your browser or standard Unix command line tools like wget or curl.

Tooting my Own Horn

The above list of problems is not just pulled out of my hat. We have seen, and fought, them all in Summa. To start moving down the messaging road I started the no-nonsense Juglr project on Gitorious. It's still far from ready but it's coming along nicely. In a nutshell it is an Actor model implementation coupled with a JSON/HTTP high performance messaging system. In order not to reinvent the wheel too much I am basing the actor model implementation on Doug Lea's Fork/Join framework, which is also scheduled for inclusion in Java 7.

Real life examples

A non-complete list of the RPC systems I've crossed paths with:

  • Java RMI
  • SOAP
  • CORBA

A ditto list of Messaging systems:

  • HTTP and Email
  • REST(ful) web services
  • DBus
  • Protocol Buffers

The last two, DBus and Protobufs, deserve an extra note. When you get down to the protocol level these two systems are indeed both messaging systems, but they are most often used as RPC systems! I am honestly not sure why that is, but it's probably because it is (deceptively) easier to get started with an RPC based approach.

Solid Toys for the Boys

December 8, 2009

As some may know, we have experimented quite a bit with Lucene indexes on Solid State Drives, and we’ve had very good experiences with it, seeing huge performance gains. Since we are also routinely running big applications and other heavy duty tasks on our desktop machines, our dear Toke had the idea that we should all have SSDs in our desktops. After a good deal of shopping about he settled on the Kingston V 40GB drive, as research revealed that this exact model had the good Intel metal inside (this is e.g. not the case for the 64GB model).

Yesterday we got the delivery and immediately started unpacking and upgrading our machines. And boy were these babies worth every penny! :-)

(sorry for the ugly scaling of the following images – WordPress is killing me)

Toke was the Super delivery boy

Quick - get them before they are gone!

Yours truly is a Super happy camper

Super tag team getting their hands dirty

Firstly we did clean installations of Ubuntu, with a 10GB root partition, a ~26GB /home partition, and ~4GB swap. Root and /home were formatted with Ext4, all on the SSD. The time?

  • Installing Ubuntu Karmic 64 bit from USB stick: 4 minutes (with ~1 minute waiting for network on a slow repository)

The next thing was the boot… While we were rebooting from the install session we talked about how fast the boot was going to be, but amid the talking we almost didn’t react before the reboot was back up at the login screen. Wow. As we didn’t have a timer with sub-minute resolution at hand we can only give you subjective numbers. Among the spectators the opinions ranged from “negative time” to “5s” to “10s”. My personal estimates are:

  • Boot from GRUB to GDM login screen: 5s
  • From login screen to working GNOME desktop: 4s

This is pretty darn fast I tell you :-)

In general application launching is also noticeably faster, especially so for applications with lots of IO, like the Evolution mail reader or our development environment IntelliJ IDEA. Compiling the Summa project is also a heavily IO bound process. The result:

  • Compiling Summa from scratch with cold disk caches: with conventional drives ~6 minutes, with our new SSDs ~2.5 minutes. That’s a speedup of a factor ~2.4.

As you might have guessed by now – we like SSDs – a lot!

IntelliJ Idea Open Sourced

October 16, 2009

Wow, I must admit that the latest news from JetBrains takes me quite by surprise! But what a sweet surprise it is!

Scala and Git support out of the box you say? This is more than welcome – now the next generation development experience is enabled out of the box.

I can’t help but wonder why they did it though? Growing pressure from NetBeans and Eclipse? I’ve always thought that IDEA was the best of the three – and thus expected it to generate a fine revenue. Perhaps not – or perhaps JetBrains had a sudden fit of philanthropy? Or perhaps open source is just a superior development model. No matter the true motivation, I am pretty hyped about this :-)

An Excursion in Java Recursion

September 4, 2009

A quick Googling defines Excursion as: “a journey taken for pleasure”. Considering what I am about to blog about the title of this blog post might be a bit misleading, but you gotta give me one for the rhyme ;-)

As you might or might not know, doing recursion in Java is simply a bad thing. This is mainly because Java can’t do tail recursion. You can use recursion in Java if you are absolutely positive that you are only going to do a very limited number of recursive calls. If you could possibly go over 100 calls you should consider making it a for or while loop instead, if the Java runtime performs somewhere around 1000 recursive calls you will get a StackOverflowError. This is really bad – you see if you read the StackOverflowError docs you will see that it is a subclass of VirtualMachineError. The docs for VirtualMachineError says:

Thrown to indicate that the Java Virtual Machine is broken or has run out of resources necessary for it to continue operating

This means that you have pretty much no choice but to log a fatal error and abort the JVM.

There are ways for making the recursion limit of the JVM bigger by setting some system properties, but that is really just a band aid and I would advise against using them.

The Real Life Case: XML Parsing

Java 6 ships with a new XML parsing library, the core class of which is XMLStreamReader (also known as the “pull parser”). I must say that it is quite a nice library and a huge improvement over SAX parsing, while still keeping blazing performance. We use it in Summa and have been very happy with it.

The problem came when we started indexing documents like this one: java-recursion-lection-1.xml. You can definitely expect to find similar structures out in the wild (as we have seen here at work). The basic document structure is as follows:

<mydocument>
  <mytag>
     SOME TEXT BLOCK
  </mytag>
</mydocument>

If we just want to extract the text block it would be annoying with a standard SAX parser, because a SAX parser splits character segments into arbitrary chunks and you have to collect them into one string yourself. The pull parser API makes this a lot easier because it defines the property XMLInputFactory.IS_COALESCING which, when set, requires the parser to collect all the text chunks into one string. So extracting the raw text contents is easy peasy lemon squeezy:

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;
import java.io.FileReader;

/**
 * A small excursion in Java recursion.
 */
public class JavaRecursionLecture1 {

  public static void main(String[] args) throws Exception {
    XMLInputFactory inputFactory = XMLInputFactory.newInstance();
    inputFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);

    XMLStreamReader reader = inputFactory.createXMLStreamReader(
               new FileReader("/home/mke/Documents/java-recursion-lection-1.xml"));
    parse(reader);
  }

  public static void parse(XMLStreamReader reader) throws Exception {
    while (reader.hasNext()) {
      int event = reader.getEventType();
      switch (event) {
        case XMLEvent.START_DOCUMENT :
          System.out.println("Document start");
          break;
        case XMLEvent.START_ELEMENT :
          System.out.println("Element: " + reader.getLocalName() );
          break;
        case XMLStreamReader.CHARACTERS :
          // Warning: Here be StackOverflowErrors
          System.out.println("Char data:\n" + reader.getText());
          break;
      }
    reader.next();
    }
  }
}

Except that this will throw a StackOverflowError if you run it on the file I linked you to. "What is up with that, there is no recursion here!" - you ask?

The problem here is that XMLStreamReader is highly recursive under the hood. My file contains lots of XML entities, and the parser will make a recursive call each time a new entity is found in the stream. Looking at the heart of the implementation you will see that the author(s) actually were very meticulous about making sure that all recursive calls were tail calls. This would have been very robust had the Java runtime supported tail recursion - alas.

There are two ways to work around this misfeature. The first one is to not set the IS_COALESCING property, and then change the switch statement to something like this, using reader.getElementText() instead:

switch (event) {
  case XMLEvent.START_DOCUMENT :
    System.out.println("Document start");
    break;
  case XMLEvent.START_ELEMENT :
    System.out.println("Element: " + reader.getLocalName() );

    if ("mytag".equals(reader.getLocalName())) {
      System.out.println(reader.getElementText());
    }
    break;
  case XMLStreamReader.CHARACTERS :
    // Without IS_COALESCING, getText() returns one chunk at a time -
    // no recursion, no StackOverflowError
    System.out.println("Char data:\n" + reader.getText());
    break;
 }

This is not particularly elegant since it hard codes our <mytag> element. A more generic way is to provide your own coalescing implementation of getText():

/**
 * Use this method in response to an XMLEvent.CHARACTERS event instead of
 * calling XMLStreamReader.getElementText() on an XMLEvent.START_ELEMENT.
 * The latter approach will trigger the deeply recursive coalescing code.
 * @param reader the XMLStreamReader to pull character data out of,
 *               the reader is expected to be in a XMLEvent.CHARACTERS state
 * @return a string containing the full character data as one string
 * @throws XMLStreamException if Jupiter aligns with Mars
 */
 public static String getCoalescedText(XMLStreamReader reader)
 throws XMLStreamException {
   StringBuilder builder = new StringBuilder();
   char[] buf = new char[1024];

   while (reader.getEventType() == XMLEvent.CHARACTERS) {
     int offset = 0;
     int len;
     while (true) {
       // Read the current text event in fixed-size chunks, advancing
       // the source offset by one buffer length per iteration
       len = reader.getTextCharacters(offset, buf, 0, buf.length);
       if (len != 0) builder.append(buf, 0, len);
       if (len < buf.length) break;
       offset += len;
     }
     reader.next();
   }
   return builder.toString();
 }
And then in the switch branch checking on character events do:

     case XMLStreamReader.CHARACTERS :
       // Warning: If you expect a StackOverflowError here, you are
       //          going to wait a long while!
       System.out.println("Character data:\n"
                          + getCoalescedText(reader));
       break;

Anyway - this became a long and code-full post. All I really wanted to say was: Avoid recursion in Java unless you know exactly what you are doing.

Summa Moving to SourceForge

August 4, 2009

Yesterday I had the pleasure to announce on the mailing lists that Summa has reached the first milestone in migrating to SourceForge, and here follows the blog post :-)

From now on all Summa code is hosted and developed in the “summa” project on SourceForge; in addition, all bugs have been migrated from our old GForge solution to a Trac instance hosted via the cool new “hosted apps” functionality on SourceForge.

We will also move the mailing lists over in the near future. The fate of the Summa wiki is still left unclear.

I must be frank and admit that I have long felt that SourceForge was in a bit of a standstill, applying only visual refreshes every now and then, and never fixing the real issues with the site. However the new Hosted Apps approach is simply sweet! There is a huge list of popular open source products you can choose to run on your site as hosted apps (see an incomplete list here). For instance, some may be surprised to know that popular version control systems such as Git, Mercurial, and Bazaar are supported, as well as Subversion. Right now we run only a Trac issue tracker and a Subversion repository.

On a personal note I must still admit that my heart lies with the recently open sourced Launchpad, despite the recent kick-assiness from the SF team.

Efficient sorting and iteration on large databases

June 15, 2009

Before you read on, heed my words that this post might be a wee bit technical… If not extremely technical – caveat emptor

The Problem

In our continuous quest for a blazingly fast Summa, we ran into a performance problem extracting and sorting huge result sets from our caching database. Concretely, we store ~9M rows in an H2 database; all records are annotated with a modification time (henceforth mtime), and we use this timestamp to determine whether we need to update the index. When updating the index we read records from the database, sorted by this mtime column.

This means that for the initial indexing we create a sorted result set of 9M records. The first observation is that we should definitely have an index on the mtime column. Even with that, many databases will take some time for such queries and it might lead to big memory allocations or temporary tables being set up. We don’t want any of that. We want lightweight transactions and speed!

Take One, LIMIT and OFFSET

The naive approach (at least, the first thing that I tried!) is to use the LIMIT and OFFSET SQL clauses to create small result sets, of size 1000, and then do client side paging, something à la:

  SELECT * FROM records ORDER BY mtime LIMIT 1000 OFFSET $last_offset

Here we increment last_offset by 1000 each time we request a new page. However this solution performs extremely badly. The database server needs to discard the first last_offset records before it can return the next 1000 records to you; when we are talking millions of records this is quite an overhead. The database cannot apply any smart tricks to make this fast, because it has no a priori way to find out where the record at offset last_offset into the result set begins.

Take Two, Salted Timestamps

So what can we do? The thing databases are fast at is looking stuff up in indexes. We need to make the database use an index to calculate the pages...

The idea is to use the index on the mtime column to calculate the offset: when we request a new page, we use the mtime of the last record in the previous result set. This may just work out, because we sort everything by mtime. Maybe like:

  SELECT * FROM records WHERE mtime>$last_mtime ORDER BY mtime LIMIT 1000

Alas, this contains a subtle bug. Since we might insert more than one record per millisecond, the mtime of a record might not be unique. This means that we might skip some records between pages, or include some records in multiple pages.

If we somehow force the mtimes to be unique, the above query would actually work. One solution is to always ensure that there is a minimum of 1ms between insertions - this is way too slow for us, so we devised what we have dubbed Salted Timestamps.

Instead of using 32 bit INTEGERs for mtime we use 64 bit integers (a BIGINT on most SQL servers). We move the actual timestamp to the most significant 44 bits and store a salt in the least significant 20 bits. The salt is basically just a counter that is reset each millisecond, meaning that we can add 1048576 records per millisecond before we run out of salts. With this construct we get a "timestamp" that still sorts correctly, and we can even create a UNIQUE index on the mtime column.
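
A minimal sketch of the scheme (the real UniqueTimestampGenerator in Summa differs in detail; the clock is passed in here to keep the example deterministic):

```python
SALT_BITS = 20
MAX_SALT = (1 << SALT_BITS) - 1  # 2**20 - 1, so 1048576 stamps per ms

class UniqueTimestampGenerator:
    """Sketch of the salted-timestamp idea described above: 44 bits of
    millisecond timestamp in the high end of a 64 bit integer, and a
    per-millisecond counter in the low 20 bits. Values sort by time,
    yet are unique, so the WHERE mtime > $last_mtime paging query is safe."""

    def __init__(self):
        self.last_ms = -1
        self.salt = 0

    def next(self, now_ms):
        if now_ms == self.last_ms:
            self.salt += 1
            if self.salt > MAX_SALT:
                raise OverflowError("more than 2**20 inserts in one ms")
        else:
            self.last_ms = now_ms
            self.salt = 0
        return (now_ms << SALT_BITS) | self.salt

gen = UniqueTimestampGenerator()
stamps = [gen.next(1000), gen.next(1000), gen.next(1001)]
print(stamps)                  # strictly increasing, hence unique
print(stamps[0] >> SALT_BITS)  # → 1000, the real timestamp is recoverable
```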

Conclusion

We have adopted the approach with salted timestamps as described above for Summa and so far it has proven to perform quite well (avg. ~2000 records/s). An added bonus is that we only put very light load on the db, because the transactions are small and fast. You can find an implementation of this scheme in the DatabaseStorage* class and the timestamp handling in the UniqueTimestampGenerator* class in the Storage module in the Summa sources.

*) Most sorry that these links require a login (which is freely available, but anyway) - we are working on a solution with anonymous access. More on that later.

Asking for Trouble? You’ve Come to the Right Place!

April 2, 2009

Toke was asking for trouble yesterday. I would assume that he knew me better by now… With my last commit it is now actually possible to inline Javascript inside your configuration when using a ScriptFilter.

The following now actually works:

<xproperties>
  <entry>
    <key>filter.name</key>
    <value class="string">InlineJavascriptTest</value>
  </entry>
  <entry>
    <key>summa.filter.sequence.filterclass</key>
    <value class="string">dk.statsbiblioteket.summa.common.filter.object.ScriptFilter</value>
  </entry>
  <entry>
    <key>filter.script.inline</key>
    <value class="string"><![CDATA[
                    payload.getRecord().setId('inlineJavascript');
    ]]></value>
  </entry>
</xproperties>

I cannot begin to enumerate all the dangers in doing this, but somehow the thrill of the possibilities got the better of me. If Javascript isn't your thing, you can set the script language to anything supported by your Java runtime by defining the property filter.script.lang.

So - use this at your own peril!

UPDATE: You can find a list of available ScriptEngines for Java at scripting.dev.java.net.

