Wednesday, 12 December 2007

JISC CRIG #2 - An Undiscovered Scenario?

I'm pretty sure one of the agendas it was hoped I would push at the CRIG Unconference was the libraries / search one. More specifically, the scenario "I'm a librarian, and I want to see results from the institutional repository in my OPAC". There are tons of variations on this one, but it boils down to exposing repository items in a way that is compatible with existing search services. I never really made it as far as putting that on a sheet of paper, mostly because I was trying to engage in other discussions and arrive at a common point where we could discuss this. Here are some of the barriers to discussing this use case...

1) The word repository... it has at least two different meanings just in terms of being "A container you can put stuff in". It can hold metadata, digital items, content packages, etc. We started from the useful perspective of "It doesn't matter, it's still just a repository", but from the perspective of an OPAC it certainly does matter to me. It matters even more if, in all scenarios, you want to be able to provide 1-click access to the actual resource. Perhaps if it doesn't matter what kind of repository it is, we need to be more specific about the classes of item we can put in a repository.

2) Metadata, packages, disaggregation... The "What sort of stuff is in a repository" issue starts to raise fundamental questions about content disaggregation. If we think of the base item, for example a PDF of a paper, as being the actual item we want to give people access to, then we need to ask how a specific metadata record becomes attached to that item: is it via a content package, or via loosely coupled URI references? I'm a programmer and I like loose coupling. The upshot of loose coupling, however, is that our OPAC really needs to access metadata repositories that can point at content repositories. Repositories of content packages can submit their metadata components for indexing, but that metadata is typically poor.

3) What word do we use for the thing that takes the metadata records and builds and maintains a searchable index? At times the word repository has been used by different communities, but that's right out now I reckon. Search index has a specific meaning in the IR community, so that's out. I've heard "Searchable Repository", but I think that muddies the water. ("Search is a repository service", I would seriously have to suggest, muddies the already murky waters.) There is certainly a need for this indexing component that takes content-metadata records (whatever the source) and points at content repositories containing the actual item.

4) SOLR is great but... when SOLR people talk about federated search they mean federated amongst SOLR instances. There are already well established protocols for remote search. I really wish the SOLR people would quit trying to create a new de facto standard, "SOLR search via URI and XML presentation", and adopt one of the more standard ones. I'd personally love it to be SRW, but OpenSearch would be good too. I tried to engage the SOLR community and offered to work on the SRW adapter, but there was absolutely no interest. Whilst I appreciate the "Just do it" nature of open source, it was incredibly hard to gain traction on this. One of the reasons for this is that SOLR has lots of really nice extensions for hit-highlighting and results categorisation that aren't present in other protocols. But many search services don't support those features. That's one of the harder things about federated search, but just using a proprietary protocol is putting off the problem until, as we say, your librarian simply wants to Z39.50 or SRW cross search the Institutional repository content alongside the catalog holdings.
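To make the "well established protocols" point a bit more concrete, here is a minimal sketch of what a standard remote search request can look like. It uses SRU, the URL-based sibling of SRW, with a CQL query. The endpoint URL and the `build_sru_url` helper are made up for illustration, and the "dc" record schema name is an assumption about what a given server would advertise.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; not a real service.
BASE_URL = "http://repository.example.org/sru"


def build_sru_url(base_url, cql_query, start=1, maximum=10, schema="dc"):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start,        # SRU record positions start at 1
        "maximumRecords": maximum,
        "recordSchema": schema,      # assumed short name; servers define their own
    }
    # urlencode takes care of escaping spaces, quotes and "=" in the CQL
    return base_url + "?" + urlencode(params)


url = build_sru_url(BASE_URL, 'dc.title = "bacterial growth"')
print(url)
```

The same request could be pointed at the OPAC, the institutional repository, or anything else that speaks the protocol, which is the whole attraction for cross search.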

Anyway, all that aside, I just want there to be a way to see an SRW or Z39.50 service which will supply me appropriately profiled metadata records pointing me at the actual resource (physical or electronic). I guess that's a challenge. The SOLR SRW project seems like an important one to me in this context; maybe the time has come to try and revive it. A good first step would certainly be the "Explain function" for repositories, which would give us a way to profile standard metadata schemes and access points against the ad-hoc indexes and metadata schemes we find in most SOLR indexes.
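As a rough sketch of what such a profile might boil down to, the snippet below maps a few standard access points onto ad-hoc index field names and rewrites a single search clause accordingly. The field names (`title_t`, `author_s`, `keywords_t`) and the `to_solr_query` helper are invented for illustration; a real profile would come from the repository's own explain record.

```python
# Hypothetical profile: standard access points -> ad-hoc index field names.
PROFILE = {
    "dc.title": "title_t",
    "dc.creator": "author_s",
    "dc.subject": "keywords_t",
}


def to_solr_query(access_point, term, profile=PROFILE):
    """Translate one (access point, term) pair into a field:"phrase" query."""
    try:
        field = profile[access_point]
    except KeyError:
        # The profile is also how we find out what a target cannot do
        raise ValueError(f"unsupported access point: {access_point}") from None
    # Escape backslashes and double quotes so the phrase stays well formed
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


print(to_solr_query("dc.title", "transverse wave"))
```

An access point that is missing from the profile raises an error rather than being silently dropped, so the caller knows the target can't answer that part of the query.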

So, some use cases:

There is a PDF on "The effects of IR spectrum light on bacterial growth" in a content repository, and it has a descriptive entry in a metadata repository. How do I enable an actor typing an appropriate query into their OPAC to see a descriptive record of sufficient quality to enable them to judge it of interest and retrieve the digital item? (Let's leave aside appropriate copy and authentication for now.)
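A minimal sketch of the loosely coupled version of this use case might look like the following: the metadata record lives in one place and simply carries a URI pointing at the item in the content repository. The `MetadataRecord` class, the `search` function and the example.org URI are all hypothetical, and the title-substring match stands in for whatever a real index would do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    title: str
    description: str
    content_uri: str  # loose coupling: a reference to the item, not a copy


# Hypothetical metadata repository holding one descriptive record.
METADATA_REPOSITORY = [
    MetadataRecord(
        title="The effects of IR spectrum light on bacterial growth",
        description="(placeholder for the descriptive entry)",
        content_uri="http://content.example.org/items/1234.pdf",  # made up
    ),
]


def search(records, term):
    """Return records whose title contains the term, ignoring case."""
    term = term.lower()
    return [r for r in records if term in r.title.lower()]


hits = search(METADATA_REPOSITORY, "bacterial growth")
for hit in hits:
    # The OPAC shows the description and links straight to the item
    print(hit.title, "->", hit.content_uri)
```

The point of the sketch is only that the searchable thing and the thing holding the PDF need not be the same system, as long as the record carries a usable pointer.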

There's an item in a well structured web-site containing an animated gif of a transverse wave. For lecturers teaching sound engineering, the resource is considered an ideal supplement for wave-mechanics lectures. How do we enable actors to find and use this resource? Actually, this leads into a second, more interesting scenario: that of the old JISC concept of "Search landscapes". Does our "Sound engineering student" want their search landscape to be "The OPAC, The Institutional Repository, my course repository (containing more specific materials), and my lecturers X, Y and Z's repositories"? How do we enable the actor to locate and select the searchable indexes (and the current awareness feeds, for that matter)? In this case, of course, there is a metadata record, but no item in a content repository.

Finally, there's the case of a known item in a content repository with no metadata description. I think this UC basically just pulls in the auto metadata creation UC and we go from there?

Well that was a bit cathartic, and way too long but hey.

Monday, 10 December 2007

JISC CRIG Unconference #1 - The Meta Stuff

Just back from the JISC CRIG Unconference


I think that about covers it. I have to completely take my hat off to the CRIG Support team for the sheer bravery in innovation they've shown with the unconference approach. The event was hugely fun, even if quite draining, and I think practice for all involved can only improve the outputs of these kinds of events in the future. JISC should take the support team and give them a huge pat on the back for their work here.

From a personal perspective, I found the unconference process entirely charming. As someone who quite seriously studied Stafford Beer and Organisational Cybernetics, experiencing the unconference was like living in the pages of Beyond Dispute.

What follows is more for my own benefit and memory, but may be of interest I suppose. The result of what I know as the "problem jostle" generated some variety, although a few soap-boxes did seem to skew the work. The team did a great job of assembling enough variety of backgrounds to try and get some emergent activity. I think there needed to be a little more attenuation and coordination as a result of the initial problem jostle; there were terms that needed to be harmonized and some topics that I think probably needed to be toned down. To this end, I think it might have been fun to have a "System 2" (in terms of the VSM) board somewhere in the room, a board where we could scribble common definitions and other coordinative activity. Again, one slight problem with the coordinative activity is that it all seemed to take place in the heads of the facilitators, which is sort of what you want when freeing the participants to think about their areas, but it did raise the slight spectre of agenda setting. In part I think this is a danger of the facilitators having expert domain knowledge. I'm not at all complaining though; I think the team did a great job. The balloons, apparently functioning as some kind of parasympathetic channel, didn't really work, I don't think. I can see where it might work in the US, but there were too many British sensitivities preventing them being useful. Actually, this is a pity because there was a need for some mechanism like this. I think if the participants had more time to gel before the actual event, it might have been less of an issue (then again, it might have been more of an issue). Overall, the outputs seemed to be quite rich, although there didn't seem to be quite as much new variety as I expected.

Well, that's enough meta-conference for now, on to the details....

Knowledge Integration Ltd