Ye Olde Chu Chu Train aka Stable Summa proved to be not so stable after all. Or did it? Well, that depends on the eyes behind the big finger of blame…
A high-profile source for the Search system at Statsbiblioteket was only partially indexed a few days ago. Okay, funny stuff happens at times. Maybe it was faulty source data, maybe the bad moon, maybe network problems? Let’s give it another go, the crafty developers agreed. Lo and behold, after the second go… the system still tanked for the same source (and for another source too, but nobody noticed, because that one was less high-profile and could comfortably hide in a corner).
Twice in a row is bad. A rollback to a working index was performed. Again. The accusing finger was pointed at some suspicious-looking machines in the distributed indexing network; they were excluded and a third round was started. In reality the machines were just scapegoats, and, fearing waterboarding by the users, the not-so-cocky developers began tearing index segments and log files apart in search of The Real Explanation.
It turns out that Lucene is known to corrupt indexes under certain circumstances. Aha! So Lucene was responsible! …Err, no. The error was in Java HotSpot versions 1.6.0_04 through 1.6.0_10-b25 and was triggered by Lucene when working with large (~20 GB) indexes. While no single machine in the distributed network ever reached that size, together they surely processed more than that. If the HotSpot error is triggered mainly by chance rather than by index size, which seems very likely, that would explain the problems: the more data pushed through affected JVMs, the better the odds of hitting it somewhere.
The batch generation of the new index was underway. It was still in the ingest phase, but quickly approaching the distributed indexing phase and thus another crash. The frantic developers checked Java versions, excluded some machines and upgraded or downgraded others to make sure that no machine in the network was running an affected HotSpot version. Happy fun time, but it was done in time and the tracks are now in order. The train should reach its destination tomorrow evening, so here’s to a safe journey.
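For the curious: checking a machine boils down to reading its runtime version and comparing it against the suspect range. Below is a minimal Java sketch, assuming the range quoted above (1.6.0_04 up to and including 1.6.0_10-b25) and Java 6 version strings of the form `1.6.0_UU[-qualifier]-bNN`. The class `HotSpotCheck` and its method `isAffected` are hypothetical names made up for illustration; this is neither the procedure used on our machines nor an official Sun or Lucene check.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: flags Java 6 runtimes inside the HotSpot range named in the post
 * (1.6.0_04 up to and including 1.6.0_10-b25). Illustration only, not an official check.
 */
final class HotSpotCheck {

    // Assumed format: "1.6.0_" + update number, optionally followed by qualifiers
    // such as "-beta" or "-rc2" and a build number like "-b25".
    private static final Pattern JAVA6 =
            Pattern.compile("^1\\.6\\.0_(\\d+)(?:-(?:[A-Za-z0-9]+-)*b(\\d+))?$");

    static boolean isAffected(String runtimeVersion) {
        if (runtimeVersion == null) {
            return false;
        }
        Matcher m = JAVA6.matcher(runtimeVersion.trim());
        if (!m.matches()) {
            return false; // not a Java 6 version string in the assumed format
        }
        int update = Integer.parseInt(m.group(1));
        if (update < 4 || update > 10) {
            return false; // outside 1.6.0_04 .. 1.6.0_10
        }
        if (update < 10) {
            return true; // updates 04 through 09 are inside the range
        }
        // Update 10: only builds up to b25 fall inside the stated range.
        if (m.group(2) == null) {
            return true; // build unknown: flag it conservatively (assumption)
        }
        return Integer.parseInt(m.group(2)) <= 25;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.runtime.version");
        System.out.println(version + (isAffected(version)
                ? " is in the suspect HotSpot range"
                : " is outside the suspect HotSpot range"));
    }
}
```

Running it on each indexing machine prints one line per JVM; anything reported as suspect would be a candidate for exclusion or an up- or downgrade. Treating an update-10 string without a build number as suspect is a deliberately cautious choice, since such a string cannot be placed inside or outside the range.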
All men on deck! Fix the tracks! (picture by Thomas Milne)
All this would of course not have happened if the production system had been running the new Summa instead of the old one (yes, we’re working on the switch, but we keep getting sidetracked by firefights like this one). You see, the new Summa train is capable of flying and hitting 88 mph!
Time to go home.