Wednesday, 19 March 2008

Apache Jackrabbit as an [Institutional|Cultural|Learning Object] Repository

Over the past months I've looked at DSpace and Fedora and played with both in a pretty serious way. The goal was to determine if we could *really* use standard IR (Institutional Repo, as opposed to Information Retrieval) software to hold collections of IMS and IEEE-LOM (Learning Object Metadata) records, and People's Network Cultural records (PNDS-Dublin Core Application Profile) as well as the E20CL (Exploring 20th Century London) records. The real driver here was that it might be possible to dump our existing OAI code and just use existing solutions. The brick wall in all cases came for me when I tried to integrate the repository "Blob" with our rich domain models for each schema type. Ideally, using repository workflow, I could pass these blobs on to domain-specific subsystems that can do real application work with the items. I gave up on integrating with DSpace and Fedora for LOM and cultural items (and now bibliographic resources too). In the end, we created our own repository, which made it much easier to integrate with backend domain models.

I first looked at Apache Jackrabbit and the JSR dealing with content repositories a year or so ago, and decided it wasn't mature enough. David Flanders' observation that Content Management companies were at the JISC OSS-Watch event and that IRs should "Watch Out" got me thinking about Jackrabbit again. Thing is, IMNSHO, current content management systems are as much vertical applications as IRs are. The trouble I had *usefully* getting non-IR resources into DSpace and Fedora (i.e., it's very doable at the proof-of-concept phase, but after that the 80/20 law quickly takes over) is going to be exactly the same issue content management providers have forcing the square peg of article prints into their web-site round hole. Of course everyone claims to have a "Generic Model", but it seldom is one. In these days of rapid development, keeping a pure abstract model intact is difficult indeed.

Apache Jackrabbit turns this on its head a bit for me. Instead of being a vertical application trying to spread out horizontally into new domains, it's nothing but a horizontal service that's entirely domain neutral. There's no danger of domain specifics creeping into the model, as there's no application to support directly, only repository services.

So, my new lunchtime project is to re-visit Apache Jackrabbit. It looks a whole lot more usable than it did a year ago, and I think the question I want to answer is: can a horizontal tool like Jackrabbit have vertical OAI-PMH and SRW/SRU services added to make it behave functionally well in the vertical sectors of Institutional, Cultural and Learning Object repositories? (Superficially, to me, Jackrabbit looks like it will fit the OAI-ORE model very tightly.) If so, Jackrabbit already has many of the features JISC CRIG is talking about IRs really needing (Events, Security, etc.) and I suspect it could be a really worthwhile approach. Although the startup time won't be as fast as with domain-specific tools, the developer resource and wealth of existing mature software give longer-term benefits.
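To make the horizontal/vertical split a bit more concrete, here is a minimal plain-Java sketch of the shape I have in mind. It deliberately does not use the Jackrabbit or JCR API: `ContentNode` is a hypothetical stand-in for a domain-neutral repository node, and the `datestamp` property name, the `repo.example.org` host and the `listIdentifiers` helper are all assumptions made up for illustration. The only vertical (OAI-PMH) knowledge lives in the function that walks the tree and turns nodes into `<header>` elements.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OaiOverContentTree {

    // Hypothetical stand-in for a domain-neutral repository node.
    // This is NOT the JCR Node interface, just enough structure for the sketch.
    static final class ContentNode {
        final String name;
        final ContentNode parent;
        final Map<String, String> properties = new LinkedHashMap<>();
        final List<ContentNode> children = new ArrayList<>();

        ContentNode(String name, ContentNode parent) {
            this.name = name;
            this.parent = parent;
        }

        ContentNode addNode(String childName) {
            ContentNode child = new ContentNode(childName, this);
            children.add(child);
            return child;
        }

        // The root has an empty path; everything else is /parent/child.
        String path() {
            return parent == null ? "" : parent.path() + "/" + name;
        }
    }

    // The vertical layer: a depth-first walk that emits one OAI-PMH <header>
    // for every node carrying a "datestamp" property (an assumed convention).
    static List<String> listIdentifiers(ContentNode root, String host) {
        List<String> headers = new ArrayList<>();
        collect(root, host, headers);
        return headers;
    }

    private static void collect(ContentNode node, String host, List<String> out) {
        String datestamp = node.properties.get("datestamp");
        if (datestamp != null) {
            // oai-identifier style: oai:<host>:<local id>, with the node path as local id.
            String path = node.path();
            String localId = path.startsWith("/") ? path.substring(1) : path;
            out.add("<header><identifier>" + escape("oai:" + host + ":" + localId)
                    + "</identifier><datestamp>" + escape(datestamp)
                    + "</datestamp></header>");
        }
        for (ContentNode child : node.children) {
            collect(child, host, out);
        }
    }

    // Minimal XML text escaping; '&' must be replaced first.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        ContentNode root = new ContentNode("", null);
        root.addNode("lom").addNode("record1").properties.put("datestamp", "2008-03-19");
        root.addNode("cultural").addNode("item7").properties.put("datestamp", "2008-03-18");
        for (String header : listIdentifiers(root, "repo.example.org")) {
            System.out.println(header);
        }
    }
}
```

The container nodes (`lom`, `cultural`) carry no datestamp, so they are skipped and only the leaf records show up as OAI items. Whether a real Jackrabbit workspace maps onto OAI-PMH this neatly is exactly the thing I want to find out.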

Having said all that, getting started with Jackrabbit is a bit of a curve. The docs and samples seem to be geared to those wishing to improve the horizontal framework. What I needed was a vertical application developer guide for Jackrabbit. Over the next few days I'm going to try and invest my lunchtime play hour in documenting the application of Jackrabbit to a vertical domain, with specific emphasis on support for OAI-PMH and OAI-ORE. If you're interested, a maven2 pom file that has all the needed dependencies for my vertical test is here: http://developer.k-int.com/svn/default/sandbox/repo/jackrabbit/pom.xml and a sample unit test that creates a standalone repository is here: http://developer.k-int.com/svn/default/sandbox/repo/jackrabbit/src/test/java/com/k_int/repository/test/RepoTest.java

Tomorrow's job is unpicking the core authentication mechanism and trying to get some objects in (I've got some LOM records, MARC records, Dublin Core, a PDF and some GIFs, so that's a good starter set I reckon).

Watch this space :)

Knowledge Integration Ltd