Wednesday, 12 December 2007

JISC CRIG #2 - An Undiscovered Scenario?

I'm pretty sure one of the agendas it was hoped I would push at the CRIG Unconference was the libraries / search one. More specifically, the scenario "I'm a librarian, and I want to see results from the institutional repository in my OPAC". There are tons of variations on this one, but it boils down to exposing repository items in a way that is compatible with existing search services. I never really made it as far as putting that on a sheet of paper, mostly because I was trying to engage in other discussions and arrive at a common point where we could discuss this. Here are some of the barriers to discussing this use case:

1) The word repository... it has at least two different meanings just in terms of being "A container you can put stuff in". The stuff can be metadata, digital items, content packages, etc. We started from the useful perspective of "It doesn't matter, it's still just a repository" but from the perspective of an OPAC, it certainly does matter to me. It matters even more if, in every scenario, you want to be able to provide 1-click access to the actual resource. Perhaps if it doesn't matter what kind of repository it is, we need to be more specific about the classes of item we can put in a repository.

2) Metadata, packages, disaggregation... The "What sort of stuff is in a repository" issue starts to raise fundamental questions about content disaggregation. If we think of the base item, for example a PDF of a paper, as being the actual item we want to give people access to, then we need to ask how a specific metadata record becomes attached to that item. Is it via a content package, or via loosely coupled URI references? I'm a programmer and I like loose coupling. The upshot of loose coupling, however, is that our OPAC really needs to access metadata repositories that can point at content repositories. Repositories of content packages can submit their metadata components for indexing, but that metadata is typically poor.
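To make the loose coupling idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `MetadataRecord` shape, the `search` helper and the example.org URI are hypothetical, and the title match is a stand-in for a real index. The point is only that the record carries a URI reference to the item rather than the item itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """A descriptive record held in a metadata repository (hypothetical shape)."""
    identifier: str
    title: str
    content_uri: str  # loose coupling: a URI reference, not an embedded package


def search(records, term):
    """Naive case-insensitive title match, standing in for a real index."""
    needle = term.lower()
    return [r for r in records if needle in r.title.lower()]


# One record in the metadata repository, pointing at a content repository.
records = [
    MetadataRecord(
        identifier="meta:1",
        title="The effects of IR spectrum light on bacterial growth",
        content_uri="https://content.example.org/items/1.pdf",
    ),
]

hits = search(records, "bacterial growth")
# The OPAC only needs the URI to offer 1-click access to the item.
links = [hit.content_uri for hit in hits]
```

The metadata repository and the content repository never need to know about each other's internals here; the URI is the whole contract between them.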

3) What word do we use for the thing that takes the metadata records and builds and maintains a searchable index? At times the word repository has been used by different communities, but that's right out now I reckon. Search index has a specific meaning in the IR community, so that's out. I've heard "Searchable Repository" but, given that search is a repository service, I would seriously have to suggest that muddies the already murky waters. There is certainly a need for this indexing component that takes content-metadata records (whatever the source) and points at content repositories containing the actual item.

4) SOLR is great but... when SOLR people talk about federated search they mean federated amongst SOLR instances. There are already well established protocols for remote search. I really wish the SOLR people would quit trying to create a new de facto standard "SOLR search via URI and XML presentation" and adopt one of the more standard ones. I'd personally love it to be SRW but OpenSearch would be good too. I tried to engage the SOLR community and offered to work on the SRW adapter but there was absolutely no interest. Whilst I appreciate the "Just do it" nature of open source, it was incredibly hard to gain traction on this. One of the reasons for this is that SOLR has lots of really nice extensions for hit-highlighting and results categorisation that aren't present in other protocols. But many search services don't support those features. That's one of the harder things about federated search, but just using a proprietary protocol is putting off the problem until, as we say, your librarian simply wants to Z39.50 or SRW cross search the institutional repository content alongside the catalog holdings.
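For a sense of what one of the standard protocols looks like from the client side, here is a minimal sketch of building a searchRetrieve request for SRU, the URL-based sibling of SRW. The parameter names (operation, version, query, maximumRecords, recordSchema) come from SRU 1.1. The base URL is made up, and whether a given server supports the dc.title index or the "dc" record schema is an assumption you would normally check against its Explain response.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, max_records=10, record_schema="dc"):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query.

    base_url is a hypothetical endpoint; record_schema assumes the
    server advertises a Dublin Core schema under the short name "dc".
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
        "recordSchema": record_schema,
    }
    # urlencode takes care of escaping spaces, quotes and "=" in the CQL.
    return base_url + "?" + urlencode(params)


url = sru_search_url(
    "https://repository.example.org/sru",  # hypothetical endpoint
    'dc.title = "bacterial growth"',
)
```

Any client that speaks SRU could issue the same request against the catalogue and against the repository, which is the whole attraction compared with a product-specific URL syntax.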

Anyway, all that aside, I just want there to be a way to see an SRW or Z39.50 service which will supply me appropriately profiled metadata records pointing me at the actual resource (physical or electronic). I guess that's a challenge. The SOLR SRW project seems like an important one to me in this context; maybe the time has come to try and revive it. A good first step would certainly be the "Explain function" for repositories, which would give us a way to profile standard metadata schemes and access points against the ad-hoc indexes and metadata schemes we find in most SOLR indexes.
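As a rough idea of what such a profile buys you, the sketch below maps a few standard access points onto ad-hoc index field names and uses the map to turn a single-term search into a SOLR-style `field:"phrase"` query string. The field names (`title_t`, `author_s`, `keywords_t`) and the `to_solr_query` helper are invented for illustration; a real index would have its own names, and a real adapter would need a proper CQL parser rather than one access point and one term.

```python
# Hypothetical profile: standard access point -> local index field name.
ACCESS_POINTS = {
    "dc.title": "title_t",
    "dc.creator": "author_s",
    "dc.subject": "keywords_t",
}


def to_solr_query(access_point, term):
    """Translate one (access point, phrase) pair into a field:"phrase" query."""
    try:
        field = ACCESS_POINTS[access_point]
    except KeyError:
        # An Explain record would let the client discover this up front.
        raise ValueError(f"unsupported access point: {access_point}") from None
    # Escape backslashes and double quotes so the phrase stays one token.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


solr_q = to_solr_query("dc.title", "bacterial growth")
```

Publishing the mapping table itself, rather than burying it in adapter code, is essentially what an Explain record does: the client learns which access points are supported before it sends a query.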

So, some use cases:

There is a PDF on "The effects of IR spectrum light on bacterial growth" in a content repository, and it has a descriptive entry in a metadata repository. How do I enable an actor typing an appropriate query into their OPAC to see a descriptive record of sufficient quality to enable them to judge it of interest and retrieve the digital item? (Let's leave aside appropriate copy and authentication for now.)

There's an item in a well structured web-site containing an animated gif of a transverse wave. For lecturers teaching sound engineering, the resource is considered an ideal supplement for wave-mechanics lectures. How do we enable actors to find and use this resource? Actually, this leads into a second, more interesting scenario: that of the old JISC concept of "Search landscapes". Does our "Sound engineering student" want their search landscape to be "The OPAC, the institutional repository, my course repository (containing more specific materials), and the repositories of my lecturers X, Y and Z"? How do we enable the actor to locate and select the searchable indexes (and the current awareness feeds for that matter)? In this case of course, there is a metadata record, but no item in a content repository.
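One way to picture a search landscape is as nothing more than a user-chosen list of targets that a query is fanned out to. The sketch below assumes that picture; the target names, the sample titles and the `make_target` and `search_landscape` helpers are all hypothetical, and each target is faked with an in-memory list where a real one would be a remote search service.

```python
def make_target(titles):
    """Fake a searchable target with a fixed list of titles."""
    def run(query):
        needle = query.lower()
        return [t for t in titles if needle in t.lower()]
    return run


# Targets the actor could choose from (names and contents are invented).
AVAILABLE_TARGETS = {
    "opac": make_target(["Wave mechanics: an introduction"]),
    "institutional-repository": make_target(["Transverse wave animation"]),
    "course-repository": make_target(["Sound engineering week 3: wave notes"]),
}


def search_landscape(selected, query):
    """Run the query against each selected target, labelling every hit."""
    results = []
    for name in selected:
        for title in AVAILABLE_TARGETS[name](query):
            results.append((name, title))
    return results


# The student's landscape: only the targets they picked get searched.
landscape = ["opac", "course-repository"]
found = search_landscape(landscape, "wave")
```

The hard part the scenario asks about, locating the targets in the first place, is exactly what this sketch skips by hard-coding `AVAILABLE_TARGETS`.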

Finally, there's the case of a known item in a content repository with no metadata description. I think this use case basically just pulls in the automatic metadata creation use case and we go from there?

Well that was a bit cathartic, and way too long but hey.


Knowledge Integration Ltd