Ian's work blog: June 2008

Sunday 8 June 2008

JISC CRIG DRY / IEDemonstrator BarCamp - Testing the Z3950/SOLR Bridge

The JZKit to SOLR Z3950 bridge has had its first real foray outside of internal k-int projects, providing a z39.50 server that cross-searches Intute and the Oxford University Research Archive in a single z39.50 search. I've been itching to try this for absolutely ages, and the CRIG/DRY BarCamp was the perfect opportunity to talk to the projects and quickly hack out the config needed to set up the server.

Here's the proof of the pudding: a z39.50 search for title=science, translated into solr searches against the two repositories and integrated into a single interleaved result set. It can be plugged straight into any meta-search engine implementing z39.50. Cool!


Z> open tcp:@:2100
Connecting...OK.
Sent initrequest.
Connection accepted by v3 target.
ID     : 174
Name   : JZkit generic server / JZKit Meta Search Service
Version: 3.0.0-SNAPSHOT
Options: search present delSet triggerResourceCtrl scan sort extendedServices namedResultSets negotiationModel
Elapsed: 0.006905
Z> base oxford intute
Z> format xml
Z> elements solr
Z> find @attr 1=4 science
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 803, setno 1
records returned: 0
Elapsed: 0.003859
Z> show 1+10
Sent presentRequest (1+10).
Records: 10
[SOLR]Record type: XML
No Third Party copyrightGordon L. ClarkGordon L. Clarkora:general>falsegeneral1982John Wiley & Sons, Ltd.D. T. HerbertD. T. HerbertR. J. JohnstonR. J. JohnstonSt Peter's College1982Social Sciences Division - Environment,Centre for the - Geography,School ofUniversity of OxfordGordon L. ClarkD. T. HerbertR. J. JohnstonJohn Wiley & Sons, Ltd.Transformations: Economy, Society and PlacePublisher has copyrightPeer reviewedGeographySocial Sciences Division - Environment,Centre for the - Geography,School ofClarkHerbertJohnstonGordon L. ClarkGordon L. ClarkD. T. HerbertD. T. HerbertR. J. JohnstonR. J. JohnstonJohn Wiley & Sons, Ltd.John Wiley & Sons, Ltd.Book (monograph): Section of book or chapter of bookGordon L.D. T.R. J.http://ora.ouls.ox.ac.uk:8081/10030/2034Geography and the urban environment : progress in research and applicationsuuid:89494f00-a92b-4914-8f9a-bcede5a4d9beUniversity of Oxford04711022531982enSt Peter's College41-61trueGordon L. ClarkD. T. HerbertR. J. Johnstonuuid:89494f00-a92b-4914-8f9a-bcede5a4d9beJohn Wiley & Sons, Ltd.Transformations: Economy, Society and PlacePeer reviewedPublisher has copyrightPublishedGeography2008-06-02T07:36:29.419ZPolicy science and instrumental reasontextuuid:89494f00-a92b-4914-8f9a-bcede5a4d9be5http://www.geog.ox.ac.uk/staff/glclark.phphttp://eu.wiley.com/WileyCDA/Section/index.html
[SOLR]Record type: XML
ScienceScienceurn:ISSN:0036-8075oai:epubs.cclrc.ac.uk:serial/1252oai:epubs.cclrc.ac.uk:serial/125257STFC ePublication Archive57-oai:epubs.cclrc.ac.uk:serial/1252ScienceScience
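
The two steps visible in the session above (turning `@attr 1=4 science` into a per-repository solr query, then merging the per-repository hits into one set) can be sketched roughly as follows. This is only an illustration, not JZKit's actual code: the solr field name `title`, the function names and the round-robin merge order are all assumptions. The one fixed point is that Bib-1 use attribute 4 means "title".

```python
from itertools import zip_longest

# Assumed mapping from Bib-1 use attributes to solr field names.
# Use attribute 4 is "title" in Bib-1; the solr field name is a guess.
USE_ATTR_TO_SOLR_FIELD = {4: "title"}


def to_solr_query(use_attr, term):
    """Translate a single-term attribute query into a solr q parameter."""
    field = USE_ATTR_TO_SOLR_FIELD[use_attr]
    return f"{field}:{term}"


def interleave(*result_lists):
    """Round-robin merge: first hit from each source, then the second, and so on.

    Sources that run out of hits are simply skipped from then on.
    """
    missing = object()
    merged = []
    for row in zip_longest(*result_lists, fillvalue=missing):
        merged.extend(hit for hit in row if hit is not missing)
    return merged
```

With hits `["o1", "o2", "o3"]` from one repository and `["i1"]` from another, `interleave` alternates between the two sources until the shorter list is exhausted, which matches the alternating record sources in the `show 1+10` output above.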

JISC CRIG / IEDemonstrator BarCamp - Controlled Vocabs

Beautiful sunny morning here in Sheffield, and all seems well with the world. It's taken me a day to recover from the traveling (mostly), but now that I'm feeling vaguely human again, it's time to write about bits of the CRIG / IEDemonstrator day.

Controlled Vocabularies / Terminology Services

Had a great discussion about this with the Names, HILT and STAR projects. Everyone showed a sample of the kind of vocab service they are working with, and the pattern of a pretty web app fronting a web-service back-end was pretty much the de facto standard. K-Int's interest really centers around the work we are doing with Vocman in the learning sector (see screen-shot). Although the Lexaurus suite isn't tied to any particular metadata scheme or representation, we have worked almost exclusively with ZThes to date. After talking with these projects it seems critical that we write the SKOS adapters for import and export sooner rather than later, so that's something I'm going to push for ASAP in the Vocman development plan. Hopefully, that will add another SRU searchable terminology service to the IE.

Our small prototyping group was tasked with working out how vocabulary services could be used with respect to repositories. We talked around many use cases: improved metadata creation and validation on submission (this works great for both subject headings and name authority services like NAMES), improved precision for searchers, and better current awareness and dissemination services, where subscribers follow a single controlled term and have that term translated into whatever subject scheme is in use at a given repo. The issue is that none of these work without the initial effort of improving the metadata, so (keeping in mind Paul's closing comment about not getting too hung up on the metadata) we decided to focus on ways of improving the metadata of items attached to deposited artifacts.
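
As a rough illustration of the "follow one controlled term, translate it per repository" idea, here is a minimal sketch. Everything in it is assumed for the example: the crosswalk table, the repository names and the choice of schemes are invented placeholders, and a real service would look these up from a terminology service rather than a hard-coded dict.

```python
# Hypothetical crosswalk: controlled term -> heading or notation in each scheme.
CROSSWALK = {
    "geography": {"lcsh": "Geography", "ddc": "910"},
}

# Hypothetical record of which subject scheme each repository uses.
REPO_SCHEMES = {"repo-a": "lcsh", "repo-b": "ddc"}


def translate(term, repo):
    """Return the local form of a controlled term for one repository.

    Returns None when the term has no mapping in that repository's scheme.
    """
    scheme = REPO_SCHEMES[repo]
    return CROSSWALK.get(term, {}).get(scheme)


def subscription_queries(term):
    """Build one (repo, local term) pair per repository that can express the term."""
    pairs = ((repo, translate(term, repo)) for repo in sorted(REPO_SCHEMES))
    return [(repo, local) for repo, local in pairs if local is not None]
```

A subscriber following `geography` would then generate one alert query per repository, each phrased in that repository's own scheme.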

One of our group (I'm really sorry, memory has failed me, but please comment if it was you) discussed how they have managed to put an external metadata editing page behind a repository submission page through the use of proxies. The repository is kept un-polluted by the metadata editing app, yet the presentation of the form is transparent to the depositor. So our final paper prototype extended the deposit service by adding a response parameter: the URL at which the metadata for an item could be edited. This editing environment would be pre-configured to use external vocabulary services and assist the user in selecting such terms. The tool could then post back the metadata using some repository-specific adapter, for example by adding a Datastream to a Fedora object using the REST service, or by auto-publishing into an indexing service such as Zebra.
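
The paper prototype might look something like the sketch below. It is only a guess at the shape: the `DepositReceipt` type, the editor URL and the adapter registry are all hypothetical names, and the adapter body just describes what it would do instead of calling any real repository API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DepositReceipt:
    item_id: str
    edit_metadata_url: str  # the extra response parameter from the prototype


def deposit(item_id, editor_base="https://editor.example.org/edit"):
    """Pretend deposit: hand back a receipt pointing at the external editor."""
    return DepositReceipt(item_id, f"{editor_base}/{item_id}")


# Registry of repository-specific adapters, keyed by target name.
ADAPTERS: Dict[str, Callable[[str, dict], str]] = {}


def adapter(name):
    """Decorator that registers a publish function under a target name."""
    def register(fn):
        ADAPTERS[name] = fn
        return fn
    return register


@adapter("fedora")
def to_fedora(item_id, metadata):
    # A real adapter would call the repository's REST service here.
    return f"fedora: would add a datastream to {item_id} with {len(metadata)} field(s)"


def publish(target, item_id, metadata):
    """Post edited metadata back through whichever adapter the target needs."""
    return ADAPTERS[target](item_id, metadata)
```

The point of the registry is that the editing tool never needs to know which repository it is talking to; supporting another target such as an indexing service means registering one more function.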

One interesting side note is that we ran into the old content disaggregation issues again a little when talking about how we can improve the metadata attached to a packaged item.

At k-int we've long discussed the need to take the Tagging Tool and turn it into a web application for editing metadata records using controlled vocab sources and then publishing those records using a pluggable system of adapters. The Controlled Vocab conversations have made me look at this in a new light, and I think it's about time we got to hacking something out. One for next weekend perhaps!

Sunday 1 June 2008

Upcoming Changes to JZKit Configuration

Some Sunday musings on the Configuration mechanism in the jzkit_Service module...

The JZKit_service module is the glue module that pulls together all the other components into a federated meta search component. It resolves internal collection and landscape names into a list of external z39.50/sru/srw/opensearch/SOLR/JDBC/etc. services, broadcasts a search to those services, integrates the results and provides a unified result set.
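
To make the resolve-then-broadcast flow concrete, here is a small sketch. It is not JZKit code: the registry contents, the URLs and the `search_fn` callback are hypothetical, and the real module deals with many more protocols and with result integration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical registry: a landscape names several collections,
# and each collection names one backend as (protocol, address).
LANDSCAPES = {"uk-repos": ["oxford", "intute"]}
COLLECTIONS = {
    "oxford": ("solr", "http://solr.example.org/oxford"),
    "intute": ("solr", "http://solr.example.org/intute"),
}


def resolve(names):
    """Expand landscape names into a de-duplicated, ordered list of collection codes."""
    codes = []
    for name in names:
        for code in LANDSCAPES.get(name, [name]):
            if code not in codes:
                codes.append(code)
    return codes


def broadcast(names, query, search_fn):
    """Send one query to every resolved backend in parallel.

    search_fn(backend, query) is supplied by the caller and does the real
    protocol work. Returns a dict of collection code -> that backend's hits.
    """
    codes = resolve(names)
    with ThreadPoolExecutor(max_workers=max(1, len(codes))) as pool:
        futures = [pool.submit(search_fn, COLLECTIONS[code], query) for code in codes]
        return {code: future.result() for code, future in zip(codes, futures)}
```

Passing the protocol-specific work in as `search_fn` keeps the sketch independent of any particular client library; the per-backend hit lists it returns are what would then be merged into the unified result set.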

This gives rise to a problem: for solutions like the previous Z39.50/SOLR bridge, we want a simple XML config file that a user can hack once, then leave. For more complicated applications, we need a real relational database behind the app to manage the complex config that goes along with large information network architectures.

Here's the rub: JZKit carries with it a very detailed service registry. Its purpose isn't, like the JISC IESR, to be a registry in its own right, but to support the search process. More and more, though, there is a need to make the JZKit service registry searchable in its own right (as a Z39.50 Explain database, or as an SRU Explain collection / ZeeRex records).

This has been bothering me for a while, yet the answer has pretty much been staring me in the face all along. So, here's what I'm considering for the final release of JZKit3:

1. The current "InMemoryConfig" which is loaded from XML config files will be deprecated.

2. It will be replaced by an in-memory Derby database: essentially just the current database-backed config mechanism, but with an in-memory database.

3. The XML config files will be left intact, but considered to be a "BootStrap" mechanism. At startup, JZKit will scan the config files and update/create any entries in the configuration database.

Thus, the "In-Memory" config will remain as before, but instead of being held in hashmaps the data will live inside a Derby database. This means we can define, first and foremost, a JDBC-backed datasource for the explain database and make it searchable even for static content.

--> Free explain service for any JZKit shared collections.
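
The bootstrap idea in the three steps above can be sketched in a few lines. Note the substitutions: this uses Python's built-in SQLite in place of Derby/JDBC, and the XML layout, table schema and function names are invented for the example and are not JZKit's real config format.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented bootstrap file layout; JZKit's real config files look different.
BOOTSTRAP_XML = """
<services>
  <service code="oxford" protocol="solr" url="http://solr.example.org/oxford"/>
  <service code="intute" protocol="solr" url="http://solr.example.org/intute"/>
</services>
"""


def open_config_db():
    """Create the in-memory config database (SQLite standing in for Derby)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE service (code TEXT PRIMARY KEY, protocol TEXT, url TEXT)")
    return db


def bootstrap(db, xml_text):
    """Scan the XML and create or update one row per <service> element."""
    for el in ET.fromstring(xml_text).iter("service"):
        db.execute(
            "INSERT OR REPLACE INTO service (code, protocol, url) VALUES (?, ?, ?)",
            (el.get("code"), el.get("protocol"), el.get("url")),
        )
    db.commit()


def find_services(db, protocol):
    """The registry is now queryable with plain SQL, even for static config."""
    rows = db.execute(
        "SELECT code FROM service WHERE protocol = ? ORDER BY code", (protocol,)
    )
    return [row[0] for row in rows]
```

Because `bootstrap` upserts on the primary key, running it again at every startup is harmless, and anything that can query the database (such as an explain front end) sees the same rows whether they came from a static XML file or from a full relational config.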
