ThreadLocal StringBuilders for Fast Text Processing

It is a common task for many server-like applications to process a lot of text-like objects in some streaming manner or other. Many Java programmers tend to think that it is a very good idea to do it like this:

public String processRecord(Record rec) {
    StringBuilder builder = new StringBuilder(1024);
    // Build a string from rec
    return builder.toString();
}

If this pattern looks completely sensible to you then please read this blog post carefully :-)

Debunking A Myth: Object Allocations Are Not Free

Allocating lots of Java objects is a bad idea; simple as that. I recently optimized our object allocation in the Summa indexer using the technique I am about to describe, and it gave a two- to threefold increase in our overall throughput! Java, or more precisely the JVM, is not a magic beast that can allocate and dispose of memory for free. Modern garbage collectors are very cool and advanced, but they can only do so much. Even though Java is a garbage-collected language, you still have to think about your memory allocations!

In the example above the StringBuilder is especially bad because every call allocates a fresh backing array sized for 1024 chars and throws it away when the method returns. So we should try to avoid that.

Resetting a StringBuilder

Contrary to the impression that the Javadocs for StringBuilder will give most people (in the classical overgeneralization manner, “most people” will mean “me”), you can reset a StringBuilder by doing

builder.setLength(0);

This will not allocate a new char array underneath; I know because I checked the Java 6 source code.
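If you would rather not read the JDK source, a rough sanity check is to compare capacity() before and after the reset. The sketch below does just that; the class and method names are made up for illustration. An unchanged capacity does not strictly prove that no array was allocated, but it does show that the buffer size survives the reset.

```java
final class ResetDemo {
    /** Returns {capacity before reset, capacity after reset, length after reset}. */
    static int[] capacityAroundReset() {
        StringBuilder builder = new StringBuilder(1024);
        builder.append("some record text");
        int before = builder.capacity();
        builder.setLength(0); // drop the content, keep the buffer
        return new int[] { before, builder.capacity(), builder.length() };
    }

    public static void main(String[] args) {
        int[] r = capacityAroundReset();
        System.out.println("capacity before=" + r[0]
                + ", capacity after=" + r[1]
                + ", length after=" + r[2]);
    }
}
```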

Keeping Only One StringBuilder Around: Thread Locals

Thread locals are an underused feature, both in Java and in many other languages. With Java generics they are actually a breeze to use. First I should clarify what “thread local” means. A thread-local variable holds a separate value for each thread that accesses it, and a thread can only ever see its own value. This makes it easier to handle concurrency because, well, you don’t have to :-)
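To make the “own copy per thread” idea concrete, here is a small sketch with a per-thread counter. Everything in it (ThreadLocalCounterDemo, COUNTER, increment(), incrementOnNewThread()) is a hypothetical name invented for this illustration; only ThreadLocal itself comes from the JDK.

```java
final class ThreadLocalCounterDemo {
    // Hypothetical per-thread counter: each thread lazily gets its own int[1].
    static final ThreadLocal<int[]> COUNTER = new ThreadLocal<int[]>() {
        @Override
        protected int[] initialValue() {
            return new int[1];
        }
    };

    /** Increments the calling thread's counter and returns the new value. */
    static int increment() {
        return ++COUNTER.get()[0];
    }

    /** Calls increment() once on a fresh thread and returns what that thread saw. */
    static int incrementOnNewThread() {
        final int[] seen = new int[1];
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                seen[0] = increment();
            }
        });
        worker.start();
        try {
            worker.join(); // join() makes the worker's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        increment();
        increment();
        System.out.println("worker thread saw: " + incrementOnNewThread());
        System.out.println("main thread now sees: " + increment());
    }
}
```

The worker starts counting from scratch no matter how often the main thread has already incremented, and the main thread's count is untouched by the worker. No locks are involved because the two threads never share a value.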

Now might be a good time to check out the Javadocs for the ThreadLocal class. To create a thread local string builder declare a variable like this:

private ThreadLocal<StringBuilder> threadLocalBuilder =
        new ThreadLocal<StringBuilder>() {
    @Override
    protected StringBuilder initialValue() {
        return new StringBuilder();
    }

    @Override
    public StringBuilder get() {
        StringBuilder b = super.get();
        b.setLength(0); // clear/reset the buffer
        return b;
    }
};

Beware: The above thread local string builder will reset its character buffer each time you grab a reference to it. I usually find myself wanting this behavior, but if you don’t want this you should comment out the setLength(0) line.

To use the thread local builder in a method simply do:

public String processRecord(Record rec) {
    StringBuilder builder = threadLocalBuilder.get();
    // Build a string from rec
    return builder.toString();
}

Caveat Emptor

So what’s the catch? You will be keeping one string builder around for every thread that ever enters processRecord(). If your application spawns many threads, that can add up to a lot of string builders. Also, if you ever build a very large string, the string builder will keep its internal character buffer at that size even after you reset it. It is up to you, dear reader, to determine whether that is a problem for you. Note, however, that thread local variables become eligible for garbage collection when the thread owning them dies.

Of course one could also add some more intelligent resetting logic to the ThreadLocal.get() method above, such as allocating a new string builder if b.capacity() becomes too big.
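One way that smarter resetting could look is sketched below. The class name and the two thresholds are assumptions picked for illustration, not values from the Summa code; tune them to your own record sizes.

```java
final class BoundedBuilderHolder {
    // Hypothetical limits: start small, and discard builders that grew past 64K chars.
    static final int INITIAL_CAPACITY = 1024;
    static final int MAX_RETAINED_CAPACITY = 64 * 1024;

    static final ThreadLocal<StringBuilder> BUILDER = new ThreadLocal<StringBuilder>() {
        @Override
        protected StringBuilder initialValue() {
            return new StringBuilder(INITIAL_CAPACITY);
        }

        @Override
        public StringBuilder get() {
            StringBuilder b = super.get();
            if (b.capacity() > MAX_RETAINED_CAPACITY) {
                // The buffer ballooned on an earlier record: replace it with a small
                // one so the oversized array can be garbage collected.
                b = new StringBuilder(INITIAL_CAPACITY);
                set(b);
            } else {
                b.setLength(0); // normal case: reuse the existing buffer
            }
            return b;
        }
    };

    public static void main(String[] args) {
        StringBuilder b = BUILDER.get();
        b.ensureCapacity(MAX_RETAINED_CAPACITY + 1); // simulate one huge record
        System.out.println("capacity after huge record: " + b.capacity());
        System.out.println("capacity on next get():     " + BUILDER.get().capacity());
    }
}
```

The trade-off is one extra allocation after each unusually large record, in exchange for not pinning a huge buffer to every thread for the rest of its life.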

More Optimization: Don’t Allocate the Final String

The observant reader will notice that I also allocate a new String when I call builder.toString() at the end of processRecord(). This can also be avoided, but I have to change the method signature to return a Reader instead of a String:

public Reader processRecord(Record rec);

Unfortunately Java does not come with a Reader implementation that wraps a StringBuilder (or more generally a CharSequence). You can find such a CharSequenceReader in the Summa source code under the LGPL. I would have linked directly to the code in our SVN repo, but it requires a login; you can create one yourself, but it is a pain.

So the exercise for the eager reader is to wrap that CharSequenceReader in a thread local to also avoid allocating that one again. Note that you also need to reset the CharSequenceReader by calling reset() on it.
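If you cannot get at the Summa code, the sketch below shows roughly what such a reader and its thread local wrapper could look like. It is my own illustrative stand-in, not the Summa class: SimpleCharSequenceReader, setSource(), ReaderHolder and readerFor() are all hypothetical names.

```java
import java.io.Reader;

/** Illustrative stand-in (not the Summa class): a Reader over any CharSequence. */
final class SimpleCharSequenceReader extends Reader {
    private CharSequence source;
    private int position;

    SimpleCharSequenceReader(CharSequence source) {
        this.source = source;
    }

    /** Points the reader at new content and rewinds it, so one instance can be reused. */
    SimpleCharSequenceReader setSource(CharSequence newSource) {
        this.source = newSource;
        this.position = 0;
        return this;
    }

    @Override
    public int read(char[] buffer, int offset, int length) {
        if (offset < 0 || length < 0 || length > buffer.length - offset) {
            throw new IndexOutOfBoundsException();
        }
        if (length == 0) {
            return 0;
        }
        int remaining = source.length() - position;
        if (remaining <= 0) {
            return -1; // end of the wrapped sequence
        }
        int count = Math.min(length, remaining);
        for (int i = 0; i < count; i++) {
            buffer[offset + i] = source.charAt(position++);
        }
        return count;
    }

    @Override
    public void reset() {
        position = 0; // rewind to the start of the current source
    }

    @Override
    public void close() {
        // Nothing to release; the wrapped CharSequence belongs to the caller.
    }
}

final class ReaderHolder {
    // One reusable reader per thread, created on first use.
    static final ThreadLocal<SimpleCharSequenceReader> READER =
            new ThreadLocal<SimpleCharSequenceReader>() {
        @Override
        protected SimpleCharSequenceReader initialValue() {
            return new SimpleCharSequenceReader("");
        }
    };

    /** Returns this thread's reader, rewound and pointed at the given content. */
    static Reader readerFor(CharSequence content) {
        return READER.get().setSource(content);
    }
}
```

With something like this in place, processRecord() could end with return ReaderHolder.readerFor(builder). One thing to keep in mind: the reader reads straight out of the thread's shared builder, so the caller has to finish reading before the same thread calls processRecord() again, or the content will change underneath it.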

Here’s to a faster future!

3 Responses to “ThreadLocal StringBuilders for Fast Text Processing”

  1. Glen Newton Says:

    Depending on the nature of your application and how it uses threads and objects, using object pooling (like http://commons.apache.org/pool/) for StringBuilder might be a better solution. Of course, testing will verify which of these strategies is better for your particular application.

  2. Ernest Says:

    Thanks for the idea. I am trying this to lower the number of StringBuilder creations when producing the attributes for each HTML element in renderSnake.

  3. Teettemoons Says:

    Man .. Excellent .. Wonderful .. I will bookmark your website and take the feeds additionally. I am satisfied to find so many helpful information right here within the post, we need develop more techniques in this regard, thanks for sharing.
