I'm pretty sure one of the agendas it was hoped I would push at the CRIG Unconference was the libraries / search one. More specifically, the scenario "I'm a librarian, and I want to see results from the institutional repository in my OPAC". There are tons of variations on this one, but it boils down to exposing repository items in a way that is compatible with existing search services. I never really made it as far as putting that on a sheet of paper, mostly because I was trying to engage in other discussions and arrive at a common point where we could discuss this. Some of the barriers to discussing this use case:
1) The word repository... it has at least two different meanings, even just in terms of being "A container you can put stuff in". The stuff can be metadata, digital items, content packages, etc. We started from the useful perspective of "It doesn't matter, it's still just a repository", but from the perspective of an OPAC it certainly does matter to me. It matters even more if, in every scenario, you want to be able to provide 1-click access to the actual resource. Perhaps if it doesn't matter what kind of repository it is, we need to be more specific about the classes of item we can put in a repository.
2) Metadata, packages, disaggregation... The "What sort of stuff is in a repository" issue starts to raise fundamental questions about content disaggregation. If we think of the base item, for example a PDF of a paper, as being the actual item we want to give people access to, then we need to ask how a specific metadata record becomes attached to that item: is it via a content package, or via loosely coupled URI references? I'm a programmer and I like loose coupling. The upshot of loose coupling, however, is that our OPAC really needs to access metadata repositories that can point at content repositories. Repositories of content packages can submit their metadata components for indexing, but that metadata is typically poor.
3) What word do we use for the thing that takes the metadata records and builds and maintains a searchable index? At times the word repository has been used by different communities, but that's right out now, I reckon. Search index has a specific meaning in the IR community, so that's out. I've heard "Searchable Repository", but I think that muddies the water. (Saying "Search is a repository service", I would seriously have to suggest, muddies the already murky waters.) There is certainly a need for this indexing component, one that takes content-metadata records (whatever the source) and points at the content repositories containing the actual item.
4) SOLR is great but... when SOLR people talk about federated search they mean federated amongst SOLR instances. There are already well established protocols for remote search. I really wish the SOLR people would quit trying to create a new de facto standard, "SOLR search via URI and XML presentation", and adopt one of the more standard ones. I'd personally love it to be SRW, but OpenSearch would be good too. I tried to engage the SOLR community and offered to work on the SRW adapter, but there was absolutely no interest. Whilst I appreciate the "Just do it" nature of open source, it was incredibly hard to gain traction on this. One of the reasons is that SOLR has lots of really nice extensions for hit-highlighting and results categorisation that aren't present in other protocols, and many search services don't support those features. That's one of the harder things about federated search, but just using a proprietary protocol only puts off the problem until, as we say, your librarian simply wants to Z39.50 or SRW cross search the institutional repository content alongside the catalog holdings.
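To make the "well established protocols" point a bit more concrete, here is a minimal sketch of a remote search request in SRU, the URL-based sibling of SRW. The parameter names (operation, version, query, maximumRecords, recordSchema) are the SRU 1.1 ones; the base URL, the sru_search_url helper and the "dc" schema short name are my assumptions for illustration, and a real server's Explain record would say which schemas and indexes it actually supports.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sru_search_url(base_url, cql_query, max_records=10, record_schema="dc"):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query.

    base_url and record_schema are placeholders; check the target's
    Explain response for the values it really supports.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,              # CQL, e.g. dc.title = "..."
        "maximumRecords": max_records,
        "recordSchema": record_schema,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical institutional repository endpoint.
url = sru_search_url(
    "http://repository.example.org/sru",
    'dc.title = "transverse wave"',
)
print(url)
```

The attraction is that the same URL shape works against any SRU target, so the OPAC does not need to know whether SOLR or something else is sitting behind it.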
Anyway, all that aside, I just want there to be a way to see an SRW or Z39.50 service which will supply me with appropriately profiled metadata records pointing me at the actual resource (physical or electronic). I guess that's a challenge. The SOLR SRW project seems like an important one to me in this context; maybe the time has come to try and revive it. A good first step would certainly be the "Explain function" for repositories, which would give us a way to profile standard metadata schemes and access points against the ad-hoc indexes and metadata schemes we find in most SOLR indexes.
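As a rough illustration of what such a profile could look like, the sketch below maps standard access points onto the ad-hoc field names of one particular SOLR index and translates a single "index = term" clause into a SOLR field query. The field names (title_t, author_s, keywords_txt) and both helper functions are hypothetical; only the field:"phrase" query form is standard SOLR/Lucene syntax. A real adapter would need a proper CQL parser rather than one clause at a time.

```python
# Hypothetical profile: standard access point -> local SOLR field name.
ACCESS_POINT_PROFILE = {
    "dc.title": "title_t",
    "dc.creator": "author_s",
    "dc.subject": "keywords_txt",
}


def supported_access_points(profile=ACCESS_POINT_PROFILE):
    """The list an Explain-style response could advertise to clients."""
    return sorted(profile)


def to_solr_query(access_point, term, profile=ACCESS_POINT_PROFILE):
    """Translate one 'access_point = term' clause into a SOLR phrase query."""
    if access_point not in profile:
        raise ValueError(f"unsupported access point: {access_point}")
    # Escape backslashes and double quotes so the term stays one phrase.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{profile[access_point]}:"{escaped}"'


print(supported_access_points())
print(to_solr_query("dc.title", "transverse wave"))  # title_t:"transverse wave"
```

The point of keeping the mapping as data is that each index can publish its own profile without the searching client ever seeing the local field names.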
So, some use cases:
There is a PDF on "The effects of IR spectrum light on bacterial growth" in a content repository, and it has a descriptive entry in a metadata repository. How do I enable an actor typing an appropriate query into their OPAC to see a descriptive record of sufficient quality to enable them to judge it of interest and retrieve the digital item? (Let's leave aside appropriate copy and authentication for now.)
There's an item in a well structured web-site containing an animated GIF of a transverse wave. For lecturers teaching sound engineering, the resource is considered an ideal supplement for wave-mechanics lectures. How do we enable actors to find and use this resource? Actually, this leads into a second, more interesting scenario: that of the old JISC concept of "Search landscapes". Does our "Sound engineering student" want their search landscape to be "The OPAC, the Institutional Repository, my course repository (containing more specific materials), and the repositories of my lecturers X, Y and Z"? How do we enable the actor to locate and select the searchable indexes (and the current awareness feeds, for that matter)? In this case, of course, there is a metadata record, but no item in a content repository.
Finally, there's the case of a known item in a content repository with no metadata description. I think this UC basically just pulls in the auto metadata creation UC and we go from there?
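A minimal sketch of how the three use cases look under the loose coupling I prefer, where a metadata record simply carries a URI pointing at the item rather than being packaged with it. Everything here is hypothetical (the MetadataRecord class, the example.org URIs, the naive substring search standing in for a real index); it is only meant to show the shape of the data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataRecord:
    """A descriptive record held in a (hypothetical) metadata repository."""
    title: str
    description: str
    content_uri: Optional[str] = None  # loose coupling: a pointer, not a package


def search(records, term):
    """Naive substring match standing in for a real searchable index."""
    needle = term.lower()
    return [r for r in records
            if needle in r.title.lower() or needle in r.description.lower()]


def undescribed_items(content_uris, records):
    """Content items no metadata record points at (the third use case)."""
    described = {r.content_uri for r in records if r.content_uri}
    return sorted(set(content_uris) - described)


records = [
    # Use case 1: the item lives in a content repository.
    MetadataRecord("The effects of IR spectrum light on bacterial growth",
                   "Paper, PDF", "http://content.example.org/items/1.pdf"),
    # Use case 2: the item lives on a web site, not in a content repository.
    MetadataRecord("Animated transverse wave",
                   "Animated GIF on a course web site",
                   "http://www.example.org/waves/transverse.gif"),
]
content_repository = ["http://content.example.org/items/1.pdf",
                      "http://content.example.org/items/2.pdf"]

hits = search(records, "bacterial growth")
print(hits[0].content_uri)                             # 1-click access target
print(undescribed_items(content_repository, records))  # needs auto metadata
```

The same record type covers the first two use cases, and the third falls out as a set difference between what the content repository holds and what the metadata repository describes.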
Well that was a bit cathartic, and way too long but hey.
Wednesday, 12 December 2007
JISC CRIG #2 - An Undiscovered Scenario?
Posted by Ibbo at 01:52
Labels: CRIG, JISC, JISC-CRIG-2007, jisc-crig-unconference-2007, Repositories, Search, SOLR
Monday, 10 December 2007
JISC CRIG Unconference #1 - The Meta Stuff
Just back from the JISC CRIG Unconference
Wow!
I think that about covers it. I have to completely take my hat off to the CRIG Support team for the sheer bravery in innovation they've shown with the unconference approach. The event was hugely fun, even if quite draining, and I think practice for all involved can only improve the outputs of these kinds of events in the future. JISC should take the support team and give them a huge pat on the back for their work here.
From a personal perspective, I found the unconference process entirely charming. As someone who quite seriously studied Stafford Beer and Organisational Cybernetics, experiencing the unconference was like living in the pages of Beyond Dispute. What follows is more for my own benefit and memory, but may be of interest I suppose.
The result of what I know as the "problem jostle" generated some variety, although a few soap-boxes did seem to skew the work. The team did a great job of assembling enough variety of backgrounds to try and get some emergent activity. I think there needed to be a little more attenuation and coordination as a result of the initial problem jostle: there were terms that needed to be harmonized and some topics that I think probably needed to be toned down. To this end, I think it might have been fun to have a "System 2" (in terms of the VSM) board somewhere in the room, a board where we could scribble common definitions and record other coordinative activity. One slight problem with the coordinative activity is that it all seemed to take place in the heads of the facilitators, which is sort of what you want when freeing the participants to think about their areas, but it did raise the slight spectre of agenda setting. In part I think this is a danger of the facilitators having expert domain knowledge. I'm not at all complaining, though; I think the team did a great job.
The balloons, apparently functioning as some kind of parasympathetic channel, didn't really work, I think. I can see where it might work in the US, but there were too many British sensitivities preventing them being useful. Actually, this is a pity, because there was a need for some mechanism like this. I think if the participants had had more time to gel before the actual event, it might have been less of an issue (then again, it might have been more of an issue). Overall, the outputs seemed to be quite rich, although there didn't seem to be quite as much new variety as I expected.
Well, that's enough meta-conference for now, on to the details...
Posted by Ibbo at 03:03
Labels: CRIG, JISC, JISC-CRIG-2007, jisc-crig-unconference-2007, Repositories
Wednesday, 22 August 2007
OpenRequest3 Beta2
OpenRequest3 Beta2 was completed today.
OpenRequest3 is the next major revision of an open source system for resource sharing. Initially, the project was devised to quickly enable smaller Library Management System vendors to provide ISO ILL messaging capabilities. In version 2, additions were made for very large scale operations wanting to replace their legacy messaging infrastructure. OpenRequest3 adds web service APIs for managing the request process, and a native web application that can be used out of the box to participate in resource sharing networks. Although the system started out life as a Java toolkit project to support ISO 10161 and ISO 10160 ILL messaging, it has evolved into a service component capable of integrating with any host system that can talk web services (WSDL is used to define the interface). At the same time, the back end has evolved into an engine capable of integrating with many different messaging protocols, including generic script, simple email and email with web-links.
Beta2 adds the following:
OR-Infrastructure
* Embedded Tomcat for web services and native web application
OR-API
* Web service support for creating new locations and endpoints from host systems
* Web service support for location inbox monitoring
* Web service support for marking messages read
* Web service support for create request
* Web service support for shipped
* Web service support for cancel
OR-Web Application
* End user registration
* End user location creation
* End user requesting
* User home page listing user locations
* View location
* View transactions
* Bulk action transactions
OpenRequest3 can be used in three distinct ways:
1) Simply as a protocol library. The jar contains all the basic BER routines to encode and decode ISO 10161 protocol messages. If you just need to be able to do ISO messaging, this library takes the leg work out of the encoding process. Nobody really uses the library in this form to the best of our knowledge.
2) As a protocol engine or ILL ASE (Application Service Environment). OpenRequest takes care of sending, receiving, and storing ILL messages. Client applications (most often the host LMS) talk to the engine via an API to arrange for messages to be sent and to pick up notifications of incoming events. In OpenRequest2 this was done with Java RMI or database integration. OpenRequest3 adds web service interfaces that can be called locally or remotely. OR3 also now uses Tomcat in embedded mode, so setup and installation are greatly simplified.
3) In order to ease testing, there is an OpenRequest web application. Users can simply use this application out of the box to manage locations, send and receive requests, or to check location/transaction/message status. Developers find the webapp useful for getting going with the protocol, slowly replacing the web interface features with calls to their own host system. Other users may wish to simply rebrand the web application and just use it out of the box to provide a fully working resource sharing messaging system in just a few minutes.
Contact ian dot ibbotson at k hyphen int dot com for more information and access to the test system. Source can be downloaded from the Knowledge Integration subversion system, and snapshot builds are uploaded to the maven2 repository.
Posted by Ibbo at 04:48
Labels: OpenRequest