Wednesday, April 25, 2007

More Hoogle progress

I haven't done much on Hoogle 4 in quite a while, and since I keep telling people that various bugs will be "fixed in Hoogle 4", it seems only polite to fix those bugs in the near future. I decided that this evening I would focus on Hoogle 4, since I was slowly going insane from writing academic papers. I'm now just about to go to bed, so thought I would summarise what I did.

One of the things that has been slowing me down on Hoogle is the fact that I rarely get a chance to work on it, and when I do that means I have to refamiliarise myself with the code base. Each time I do this, I forget something, forget what I was working for, and generally end up slowing down development. It also means that working on Hoogle for "an odd half hour" is not practical, it takes at least a couple of hours to get up to speed. In order to solve this, I've done some diagrams of the module structure, and the main Hoogle database. They may not mean much to anyone else, but I hope they are enough to keep my mind fresh with where things go.

As part of documenting the module structure, it also gives a chance to reflect on where the module structure is a bit unclear, and modify as necessary. Previously the module structure had DataBase and Item each being dependent on each other - with tricks like having an extra field in Item which was of a parametrised type, i.e:

data Item a = Item {name :: String, typeSig :: TypeSig, other :: a}

Fortunately now all that is gone, and the structure is a lot more logical.

The other thing I've been doing is moving to the Deferred Binary library. Part of the reason for Hoogle 4 becoming bogged down is the creation of a central database that is loaded on demand, and contains tries etc to perform efficient searches. Unfortunately, this loading on demand aspect had infected the entire code, resulting in a big IO Monad mess. It was slightly more pleasant than coding in C, but all the low-level thoughts had seeped through, requiring careful though, slow progress and frequent mistakes.

To take one example, the Hoogle.DataBase.Texts module, which is responsible for performing all textual searches and storing the lazily-loaded trie. Compare the old version and the current version. The code has shrunk from 186 lines to 65 - of which only about 30 are real code. The IO Monad is nowhere to be seen - everything is pure. The really low-level functions such as sizeTree and layoutTree have gone entirely.

I'm not sure when I will next be able to get to Hoogle 4, but next time I hope to start working on it faster, and that the cleaner code makes for happier coding!

Note: I have started displaying adverts on this blog because it was only a button click away (very impressive integration by Google), it seemed like a fun thing to try and that the adverts in Gmail are sometimes quite helpful. I have deliberately picked to not show too many adverts, placed them off to the side where they won't interfere with reading and tried to keep it minimal - but if they are annoying please let me know. I am going to donate whatever money (if any) from this to charity - but please don't take that as an incentive to artificially click on adverts you aren't interested in.

No comments: