Archive for December, 2008

Toke’s Christmas Present for Y’all

December 19, 2008

Just one last status update before I rush off to buy the last Christmas presents. Toke just converted Summa’s internal Lucene document builder to use the new XMLStreamReader found in Java 6, instead of using some XPath/DOM magic we had been using hitherto when indexing in Summa (yeah, we know DOM parsing is silly here, but it had proven stable and “OK” for a very long time).

This provided an overall indexing speedup by a factor of 8. I think that is Toke’s way of saying “merry Christmas everybody” :-)
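For readers who have not met XMLStreamReader yet, here is a minimal sketch of the streaming style of parsing. It is not Summa's actual document builder: the `<field name="...">` layout, the class name StreamingFieldReader and the method readFields are all made up for illustration, and the collected values go into a plain map rather than a Lucene document. The point is only that the XML is read in a single forward pass, with no DOM tree built in memory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Summa: the element and attribute names are assumptions.
class StreamingFieldReader {

    /** Collects the text of every <field name="..."> element in one forward pass. */
    static Map<String, List<String>> readFields(String xml) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // Only start tags named "field" are of interest; everything else is skipped.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "field".equals(reader.getLocalName())) {
                    String name = reader.getAttributeValue(null, "name");
                    if (name != null) {
                        // getElementText() reads up to the matching end tag (text-only content).
                        String text = reader.getElementText();
                        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(text);
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return fields;
    }

    public static void main(String[] args) {
        String xml = "<doc><field name=\"title\">Summa</field>"
                + "<field name=\"author\">Toke</field></doc>";
        System.out.println(readFields(xml));
    }
}
```

In a real indexer each name/value pair would presumably be handed straight to the document being built instead of being collected in a map first.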

Scp magic

December 19, 2008

Ever needed to copy a file from one host to another, but been forced to go over an intermediate host because you don’t have the rights to access the machine with the file directly?

Let me put an end to your despair then! Here’s a small script called scp-via that can do this for you! Just run

scp-via $VIA_HOST $FROM_TARGET $TO_TARGET

- with the same syntax for FROM_TARGET and TO_TARGET as you use for regular scp.

Basically it performs a little scp, netcat, ssh magic under the hood, but hopefully you need not worry about that.

I believe I need not say that scp-via comes with absolutely no warranty or guarantees about not destroying your hard drives. Run it at your own risk.

Observing the observers

December 16, 2008

Today and tomorrow two guys from Information and Media Science will be observing how we go about our work – and how we don’t, I suppose – as part of their obligatory curricular activities. They’ll be doing seven rounds of observing and some interviews with key persons in the department.

Being a proponent of ethnographic methods and observation in general, I think this is a very good idea and I really hope they get some good data.

However, at times observation can be hard work. Here’s what it looked like when I observed them observe us:
observing_the_observers

Hurry! The train is coming!

December 15, 2008

Ye Olde Chu Chu Train aka Stable Summa proved to be not so stable after all. Or did it? Well, that depends on the eyes behind the big finger of blame…

A high-profile source for the Search system at Statsbiblioteket was only partially indexed some days ago. Okay, funny stuff happens at times. Maybe it was faulty source data, maybe the bad moon, maybe network problems? Let’s give it another go, the crafty developers agreed. Lo and behold, after the second go… the system still tanked for the same source (and for another source, but nobody noticed, because that wasn’t as high-profile and could comfortably hide in a corner).

Twice in a row is bad. A rollback to a working index was performed. Again. The accusing finger was pointed at some suspicious-looking machines in the distributed indexing network; they were excluded and the third round was started. In reality the machines were just scapegoats, and, fearing waterboarding by the users, the not-so-cocky developers began to tear index segments and log files apart in search of The Real Explanation.

It turns out that Lucene is known for corrupting indexes under certain circumstances. Aha! So Lucene was responsible! …Err, no. The error was with Java HotSpot version 1.6.0_04 – 1.6.0_10-b25 and was triggered by Lucene when working with large (~20GB) indexes. While the single machines in the distributed network never reached that size, they surely generated more than that in total. If the HotSpot-error is triggered mainly by chance and not by index-size, which seems very likely, that would explain the problems.

The batch generation of the new index was underway. It was still in the ingest phase, but quickly approaching the distributed index phase and thus another crash. The frantic developers checked Java versions, excluded some machines and up- or downgraded other machines to ensure that the network was clean. Happy fun time, but it was done in time and the tracks are now in order. The train will enter the destination tomorrow evening, so here’s to a safe journey.
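Checking Java versions by hand on every machine gets old fast, so here is a rough sketch of how such a check could be automated. It is not what we actually ran: the class HotSpotVersionCheck and the method isInAffectedRange are hypothetical, and it simplifies things by treating every build of update 10 as affected, since the java.version property normally carries no build number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags version strings in the range 1.6.0_04 to 1.6.0_10.
class HotSpotVersionCheck {

    // Matches legacy version strings such as "1.6.0_07" or "1.6.0_10-b25".
    private static final Pattern JAVA6_UPDATE =
            Pattern.compile("^1\\.6\\.0_(\\d{1,3})(?:-.*)?$");

    /** Returns true if the version is a Java 6 update between 4 and 10, inclusive. */
    static boolean isInAffectedRange(String version) {
        Matcher m = JAVA6_UPDATE.matcher(version);
        if (!m.matches()) {
            return false; // not a Java 6 update release at all
        }
        int update = Integer.parseInt(m.group(1));
        // Assumption: any build of update 10 counts as affected.
        return update >= 4 && update <= 10;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        String verdict = isInAffectedRange(version) ? "inside" : "outside";
        System.out.println(version + " is " + verdict + " the affected range");
    }
}
```

Run on each indexing machine, something like this would tell at a glance which ones need an upgrade or downgrade before they are allowed back into the network.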

train_bridge

All men on deck! Fix the tracks! (picture by Thomas Milne)

-

All this would of course not have happened if the production system had been running the new Summa instead of the old one (yes, we’re working on the switch, but we keep getting sidetracked by firefights like this one). You see, the new Summa train is capable of flying and hitting 88 mph!

*ahem*

Time to go home.

Christmas decorations a.k.a. Stein Bagger tribute

December 11, 2008

bagger_500

Following tradition, we have a fair amount of Christmas decorations here at the office. The new pride of our collection, however, must be the Stein Bagger tribute.

Wroum Wroum!

December 2, 2008

I just committed a pretty hefty optimization of our generic database backend. The optimization is in the area where we put records into the system – in Summa land known as “Ingest”.

1 hour of profiling, 200 lines of test code, and re-shuffling 10 lines of code in our database layer improved our ingest rate by a factor of 20. Admittedly the ingest rate was a bit on the slow side before I started, but it feels good nonetheless :-)

The devilish details

December 1, 2008

We’re working hard to make Search from Summa.

Yes, a bit confusing, isn’t it? But as the enlightened reader knows, Summa is the general back end and Search is the specific installation at Statsbiblioteket – including all the little tweaks that are necessary to handle nearly-but-not-quite-there conforming to standard data. Nothing major, but it does drag all the little devils, pixies and gremlins into the light.

Speaking of nasty critters, fall has caught up with us and bacteria and viruses are taking turns on us and our kids. Please, please give us a couple of days without sickness, meetings, interruptions, workshops and firefights, so that we can sit down together and get this show on the road!

Some of us break down
