Thursday 6 November 2008

Catching up with sprints and progress

Been a while since I posted; alas, blogging is not uber-high on my GTD list, so it often gets pushed back... However, I thought it might be fun to share some of the work we're doing in the lexaurus vocabulary editor in relation to SKOS vocabularies. Rob Tice has sent me some screenshots of lexaurus ingesting the SKOS vocabulary for agriculture; here are the results.

Here in image one there's the post-ingest status report saying that 28,954 terms have been imported in 269 seconds. As part of the SKOS import we have to do a fair few cross-references and referential lookups, so the performance isn't quite as blistering as the ZThes import, but it's still pretty good.
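
As an aside, for anyone wondering where that time goes: a SKOS file can reference a concept before it defines it, so the import has to work in two passes. Here's a minimal sketch of the idea; the class and method names are illustrative, not the actual lexaurus code:

// A two-pass SKOS ingest sketch: concepts arrive in arbitrary order, so
// relationship targets may not exist yet when a concept is first seen.
// These class and method names are illustrative, not the real lexaurus API.
import java.util.*;

class SkosImportSketch {

    static class Concept {
        String uri;
        List<String> broaderUris = new ArrayList<String>();
        List<Concept> broader = new ArrayList<Concept>();
    }

    void importConcepts(List<Concept> parsed) {
        // Pass 1: register every concept by URI so later lookups can succeed.
        Map<String, Concept> byUri = new HashMap<String, Concept>();
        for (Concept c : parsed) {
            byUri.put(c.uri, c);
        }
        // Pass 2: resolve skos:broader (and similar) references now that all
        // targets are known. This second pass is the extra cost that makes
        // SKOS ingest slower than a flat ZThes load.
        for (Concept c : parsed) {
            for (String target : c.broaderUris) {
                Concept broaderConcept = byUri.get(target);
                if (broaderConcept != null) {
                    c.broader.add(broaderConcept);
                }
            }
        }
    }
}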

Here are some more screenies showing the label nodes and editing pages in different languages. The data is held in the lexaurus canonical schema, so it can go out as ZThes or SKOS or whatever else we need.

More to follow, but I thought these were interesting enough to put up now, just to stimulate discussion.

Wednesday 6 August 2008

Sprint 3 - Washup and planning

A few months ago we started to experiment with scrum, sprints and backlogs in an attempt to formalise our agile development processes. I thought it might be fun to try and keep the blog updated every couple of weeks with what's moving in the k-int world... so here goes...

Hana's done some great work on the zero-functionality release for the JISC Transcoder Project and it's basically finished. I'm tasked with uploading this to the Amazon elastic cloud this week whilst Hana pushes on with having a go at SCORM 2004 to IMS CP 1.1 transcoding. Hana has also spent a couple of days fixing a drag-and-drop bug in the vocabulary editor application and getting RELOAD installed for some testing. We're on the lookout for SCORM 2004 content to test with (especially older BBC Jam content).

Liam's been doing loads of work refactoring the import and export modules in the Vocabulary Bank to make sure we can ingest and export both ZThes and SKOS and crosswalk between the two, in preparation for creating some additional export formats for specialist educational / eLearning applications. Liam's also managed to create an installer for the new release of OpenCRM, our open-source third-sector call management / CRM application, to go off to Sheffield Advice Link. We've been told that the latest released vocabulary editor can import the full 7,923 terms of the IPSV (Integrated Public Sector Vocabulary) in 17 seconds, which isn't bad going considering the amount of cross-referencing and revision management that's going on under the hood.
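
For the curious, here's a hedged sketch of the general shape such a refactor tends towards: pluggable readers and writers around the canonical model, so that any import format can crosswalk to any export format. The interface and class names below are hypothetical, not the actual Vocabulary Bank API:

// Illustrative sketch of a pluggable import/export layer around a canonical
// in-memory model. Names are hypothetical, not the real Vocabulary Bank API.
import java.io.InputStream;
import java.io.OutputStream;

interface VocabularyReader {
    CanonicalVocabulary read(InputStream in) throws Exception;
}

interface VocabularyWriter {
    void write(CanonicalVocabulary vocab, OutputStream out) throws Exception;
}

class CanonicalVocabulary {
    // Terms, labels and relationships held in the bank's canonical schema.
}

class Crosswalk {
    // Any format pairing (ZThes -> SKOS, SKOS -> ZThes, ...) falls out of
    // composing a reader with a writer via the canonical model.
    static void convert(VocabularyReader reader, InputStream in,
                        VocabularyWriter writer, OutputStream out) throws Exception {
        writer.write(reader.read(in), out);
    }
}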

I've been on holiday ;) but whilst I've been away I've made huge steps forward on a stand-alone OAI/ORE server that we can use to replace all the custom OAI servers we have dotted around the place. With the new server you just point the app at a JDBC data source; it introspects the database schema to detect primary keys and timestamps, and a few clicks later you have an OAI/ORE feed for your database. The same infrastructure can be used in online or batch mode (batch mode detects changed records with checksums rather than timestamps, which is useful for normalised schemas where changes might not update datestamps). Initially this is for an update to the People's Network Discover application, but I've high hopes it will be useful in a wide range of cultural and governmental settings. This should give us an instant linked-data capability for projects like People's Network and other related metadata-aggregator projects we're involved with at the moment.
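
The introspection step needs nothing more exotic than standard JDBC metadata calls. A minimal sketch (the class name is illustrative and the record-building side is omitted):

// Discover tables, primary keys and timestamp columns via standard JDBC
// metadata calls, so each row can become an OAI record with a stable
// identifier and a datestamp. The record-building side is omitted.
import java.sql.*;

class SchemaIntrospector {
    void introspect(Connection conn) throws SQLException {
        DatabaseMetaData md = conn.getMetaData();
        ResultSet tables = md.getTables(null, null, "%", new String[] { "TABLE" });
        while (tables.next()) {
            String table = tables.getString("TABLE_NAME");
            // Primary keys give each row a stable OAI identifier.
            ResultSet pks = md.getPrimaryKeys(null, null, table);
            while (pks.next()) {
                System.out.println(table + " pk: " + pks.getString("COLUMN_NAME"));
            }
            // Timestamp columns give OAI datestamps for selective harvesting.
            // (In batch mode, per-row checksums replace these timestamps.)
            ResultSet cols = md.getColumns(null, null, table, "%");
            while (cols.next()) {
                if (cols.getInt("DATA_TYPE") == Types.TIMESTAMP) {
                    System.out.println(table + " datestamp: " + cols.getString("COLUMN_NAME"));
                }
            }
        }
    }
}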

I've also (to my eternal shame in the office) started to do some real work on the JZKit documentation and the Maven auto-generated site, to make it easier for users of the toolkit to get to grips with it and diagnose problems. There seem to be a huge number of Z39.50 projects in the offing at the moment, for a protocol that's supposedly legacy. I think that's great!

Rob's made huge steps forward with the assisted tagging tool, which is being used in anger by many users to create descriptive metadata for literally thousands of learning resources. The new tagging tool seamlessly integrates with the bank vocabulary service, so we can update the vocabularies used without updating the tool, and the new assisted tagging feature makes it easy for users to quickly tag resources without needing in-depth knowledge of curriculum and administrative structures.

Sunday 27 July 2008

Updates and New Projects

Thought it best to post a quick update, as there's so much going on at the moment that I'm in danger of missing important announcements.

Just before I left for a week's holiday, Neil, Hana and I attended a kick-off meeting for the JISC Transcoder project. There's information about the project at that link, although I'll be posting more about this shortly. We've already had loads of interest both from content providers and from interested HE partners who want to be involved in testing and development. To cause such a stir so early on is great, IMNSHO. I'm going to try and get a project page together in the coming week, and unless anyone has any better ideas I'm planning to tag project announcements and direct links as jisc-transcoder, with a project-transcoder tag available for resources useful to the project, so we can get some feeds going.

Hana has already made huge progress on the first release (which transcodes a package into itself, but lets us test the upload/download and cloud infrastructure) and we expect to be sitting down real soon now to put some priorities on the next release.

Here's an aggregated pipe of transcoder tags and blog entries.

Monday 14 July 2008

Wanted: HTML digitisation silos needing an OAI feed

We've just been talking to a really cool Sheffield geek who's built a neat HTML screen-scraping service and integrated it into JZKit to provide dynamic meta-search across services with no machine-to-machine interface. Very cool.

However, in our discussions we realised that it might have even more value as a tool for converting HTML silos into metadata-rich repositories. By combining this service with a SWORD deposit client and some MD5 checksums, we can help digitisation projects that have large, well-organised web sites but perhaps don't have OAI or SRU/SRW: we can create a preservation hub that exposes the content using open standards.
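
The checksums are there so the deposit client only pushes new or changed pages into the hub. A minimal sketch of that idea, with a hypothetical swordDeposit hook standing in for a real SWORD client:

// Hash each scraped page so the deposit client only pushes new or changed
// content into the preservation hub. swordDeposit is a placeholder for a
// real SWORD client call, not an actual API.
import java.io.InputStream;
import java.net.URL;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

class SiloHarvester {
    private final Map<String, String> seen = new HashMap<String, String>(); // url -> md5

    void harvest(String pageUrl) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        InputStream in = new URL(pageUrl).openStream();
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md5.update(buf, 0, n);
            }
        } finally {
            in.close();
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        String checksum = hex.toString();
        // Deposit only when the page is new or its content has changed.
        if (!checksum.equals(seen.put(pageUrl, checksum))) {
            swordDeposit(pageUrl);
        }
    }

    void swordDeposit(String pageUrl) { /* SWORD deposit would go here */ }
}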

If you have an HTML silo of digitised data (or any data, for that matter) but no OAI or SRU, and you need such services, we'd love to test this idea by writing the scripts to populate an OAI and SRU repository... Any volunteers?

Going to spend some spare time throwing this and the JDBC/OAI/SRU gateway together, hopefully in time for the library mashup and BathCamp events. Maybe we can come out of those events with some newly exposed data sources contributing to the linked-data network.

Fun :)

Sunday 13 July 2008

New version of JZKit Proxy Server

An updated version of the JZKit Proxy Server (with an example configuration exposing a number of SOLR back-ends as a Z39.50 target) is available for download here.

Since the initial test at the Bath CRIG BarCamp we've improved the error handling and diagnostics, improved the mapping of Z39.50 database names, made the attribute mapping more configurable (the out-of-the-box sample config is pretty much Bath Profile level 1 OK), and improved the structure==year mappings for date search. We've also done some pretty hefty load testing, which revealed a thread leak in certain situations (it's fixed now).
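
To give a flavour of what the attribute mapping involves: the proxy has to turn Bib-1 use attributes into SOLR field queries. A toy illustration follows; the SOLR field names are assumptions, not the shipped sample config:

// Toy Bib-1 use attribute -> SOLR field mapping. In Bib-1, 1=4 is title,
// 1=1003 is author and 1=31 is date of publication; the SOLR field names
// here are assumptions, not the shipped sample configuration.
import java.util.HashMap;
import java.util.Map;

class AttributeMapper {
    private static final Map<Integer, String> USE_TO_FIELD = new HashMap<Integer, String>();
    static {
        USE_TO_FIELD.put(4, "title");
        USE_TO_FIELD.put(1003, "author");
        USE_TO_FIELD.put(31, "date");
    }

    static String toSolr(int useAttribute, String term) {
        String field = USE_TO_FIELD.get(useAttribute);
        return (field != null ? field : "text") + ":" + term;
    }
}

So a search like find @attr 1=4 science becomes title:science, which is exactly the translation at work in the BarCamp session in the post below.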

Please download, play and get in touch with any questions / comments.

Ian.

Sunday 8 June 2008

JISC CRIG DRY / IEDemonstrator BarCamp - Testing the Z39.50/SOLR Bridge

The JZKit-to-SOLR Z39.50 bridge has had its first real foray outside of internal k-int projects, providing a Z39.50 server to cross-search Intute and the Oxford University Research Archive in a single Z39.50 search. I've been itching to try this for absolutely ages, and the CRIG/DRY BarCamp was the perfect opportunity to talk to the projects and quickly hack out the config needed to set up the server.

Here's the proof of the pudding: a Z39.50 search for title=science, translated into SOLR searches against the two repositories and integrated into a single interleaved result set. This is easily integrated into any meta-search engine implementing Z39.50. Cool!


Z> open tcp:@:2100
Connecting...OK.
Sent initrequest.
Connection accepted by v3 target.
ID : 174
Name : JZkit generic server / JZKit Meta Search Service
Version: 3.0.0-SNAPSHOT
Options: search present delSet triggerResourceCtrl scan sort extendedServices namedResultSets negotiationModel
Elapsed: 0.006905
Z> base oxford intute
Z> format xml
Z> elements solr
Z> find @attr 1=4 science
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 803, setno 1
records returned: 0
Elapsed: 0.003859
Z> show 1+10
Sent presentRequest (1+10).
Records: 10
[SOLR]Record type: XML
[record XML elided: an Oxford University Research Archive record for Gordon L. Clark's book chapter "Policy science and instrumental reason", in "Geography and the urban environment: progress in research and applications", John Wiley & Sons, 1982]
[SOLR]Record type: XML
[record XML elided: an STFC ePublication Archive record for the serial "Science", ISSN 0036-8075]

JISC CRIG / IEDemonstrator BarCamp - Controlled Vocabs

Beautiful sunny morning here in Sheffield, and all seems well with the world. It's taken me a day to recover from the travelling (mostly), but now I'm feeling vaguely human again, and it's time to write about bits of the CRIG / IEDemonstrator day.

Controlled Vocabularies / Terminology Services

Had a great discussion about this with the Names, HILT and STAR projects. Everyone showed a sample of the kind of vocab service they are working with, and the pattern of a pretty web app fronting a web-service back-end was pretty much the de facto standard. K-Int's interest really centres around the work we are doing with Vocman in the learning sector (see screenshot). Although the lexaurus suite isn't tied to any particular metadata scheme or representation, we have worked almost exclusively with ZThes to date. After talking with these projects it seems critical that we write the SKOS adapters for import and export sooner rather than later, so that's something I'm going to push for ASAP in the Vocman development plan. Hopefully that will add another SRU-searchable terminology service to the IE.

Our small prototyping group was tasked with working out how vocabulary services could be used with repositories. We talked around many use cases: improved metadata creation and validation on submission (this works great for both subject headings and name authority services like Names), improved precision for searchers, and better current-awareness and dissemination services, achieved by allowing subscribers to follow a single controlled term and have that term translated into whatever subject scheme is in use at a given repository. Since none of this works without the initial effort of improved metadata (keeping in mind Paul's closing comment that we shouldn't get too hung up on the metadata), we decided to focus on ways of improving the metadata attached to deposited artifacts.

One of our group (I'm really sorry, memory has failed me, but please comment if it was you) discussed ways they have managed to put an external metadata editing page behind a repository submission page through the use of proxies. Thus the repository is kept unpolluted by the metadata editing app, but the presentation of a form is transparent to the depositor. So our final paper prototype extended the deposit service by adding a response parameter carrying the URL at which the metadata for an item could be edited. This editing environment would be pre-configured to use external vocabulary services and to assist the user in selecting terms from them. The tool could then post the metadata back using some repository-specific adapter: for example, adding a datastream to a Fedora object using its REST service, or auto-publishing into an indexing service such as Zebra.
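
To make the adapter idea concrete, here's a rough sketch of that post-back step. The Fedora-style URL shape and the ENHANCED_MD datastream name are illustrative assumptions rather than a definitive Fedora REST reference:

// Rough sketch of posting edited metadata back through a repository-specific
// adapter. The Fedora-style URL and the ENHANCED_MD datastream name are
// illustrative assumptions, not a definitive Fedora REST reference; a Zebra
// publisher would be another implementation of the same interface.
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

interface MetadataPublisher {
    void publish(String itemId, String metadataXml) throws Exception;
}

class FedoraDatastreamPublisher implements MetadataPublisher {
    private final String fedoraBase; // e.g. http://localhost:8080/fedora

    FedoraDatastreamPublisher(String fedoraBase) {
        this.fedoraBase = fedoraBase;
    }

    public void publish(String pid, String metadataXml) throws Exception {
        // Add or replace a metadata datastream on the object identified by pid.
        URL url = new URL(fedoraBase + "/objects/" + pid
                + "/datastreams/ENHANCED_MD?mimeType=text/xml");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml");
        OutputStream out = conn.getOutputStream();
        try {
            out.write(metadataXml.getBytes("UTF-8"));
        } finally {
            out.close();
        }
        if (conn.getResponseCode() >= 300) {
            throw new RuntimeException("Deposit failed: HTTP " + conn.getResponseCode());
        }
    }
}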

One interesting side note is that we ran into the old content disaggregation issues again when talking about how we can improve the metadata attached to a packaged item.

At k-int we've long discussed the need to take the Tagging Tool and turn it into a web application for editing metadata records using controlled vocab sources and then publishing those records through a pluggable system of adapters. The controlled vocab conversations have made me look at this in a new light, and I think it's about time we got to hacking something out. One for next weekend, perhaps!

Knowledge Integration Ltd