We've just been talking to a really cool Sheffield geek who's built a neat HTML screen-scraping service and integrated it into JZKit to provide dynamic meta-search across services that have no machine-to-machine interface. Very cool.
However, in our discussions we realised that it might have even more value as a tool for converting HTML silos into metadata-rich repositories. Many digitisation projects have large, well-organised web sites but perhaps no OAI or SRU/SRW interface. By combining this service with a SWORD deposit client and some MD5 checksums, we can help those projects create a preservation hub that exposes their content using open standards.
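The checksum part of this idea is the easy bit. Here's a minimal Java sketch, assuming the scraped page (or image) has already been fetched into a byte array; the class and method names are hypothetical and not part of JZKit. The idea is to record the digest at deposit time and recompute it later to spot changes or corruption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PageChecksum {

    // Returns the MD5 digest of the given bytes as a 32-character lowercase hex string.
    static String md5Hex(byte[] content) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen
            throw new IllegalStateException(e);
        }
        StringBuilder hex = new StringBuilder(32);
        for (byte b : md.digest(content)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Hypothetical scraped page; in practice this would be the fetched HTML or image bytes
        byte[] page = "<html><body>Digitised item</body></html>".getBytes(StandardCharsets.UTF_8);
        // Store this alongside the deposited item, then recompute later to detect changes or corruption
        System.out.println(md5Hex(page));
    }
}
```

MD5 is fine for spotting accidental change; it is not a defence against deliberate tampering.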
If you have an HTML silo of digitised data (or any data, for that matter) but no OAI or SRU, and you need such services, we'd love to test this idea by writing the scripts to populate an OAI and SRU repository... Any volunteers?
Going to spend some spare time throwing this and the JDBC/OAI/SRU gateway together, hopefully in time for library mashup and BathCamp. Maybe we can come out of those events with some newly exposed data sources contributing to the linked data network.
Fun :)
Monday, 14 July 2008
Wanted: HTML digitisation Silos needing an OAI feed
Posted by Ibbo at 09:45
Labels: jzkit, LinkedData, OAI-PMH, Preservation, ScreenScraping, Search
Sunday, 8 June 2008
JISC CRIG DRY / IEDemonstrator BarCamp - Testing the Z3950/SOLR Bridge
The JZKit-to-SOLR Z39.50 bridge has had its first real foray outside internal K-Int projects, providing a Z39.50 server that cross-searches Intute and the Oxford University Research Archive in a single Z39.50 search. Been itching to try this for absolutely ages, and the CRIG/DRY barcamp was the perfect opportunity to talk to the projects and quickly hack out the config needed to set up the server.
Here's the proof of the pudding: a Z39.50 search for title=science, translated into SOLR searches against the two repositories and merged into a single interleaved result set. It can easily be plugged into any meta-search engine that implements Z39.50. Cool!
Z> open tcp:@:2100
Connecting...OK.
Sent initrequest.
Connection accepted by v3 target.
ID : 174
Name : JZkit generic server / JZKit Meta Search Service
Version: 3.0.0-SNAPSHOT
Options: search present delSet triggerResourceCtrl scan sort extendedServices namedResultSets negotiationModel
Elapsed: 0.006905
Z> base oxford intute
Z> format xml
Z> elements solr
Z> find @attr 1=4 science
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 803, setno 1
records returned: 0
Elapsed: 0.003859
Z> show 1+10
Sent presentRequest (1+10).
Records: 10
[SOLR]Record type: XMLNo Third Party copyright Gordon L. Clark Gordon L. Clark ora:general >false general 1982 John Wiley & Sons, Ltd. D. T. Herbert D. T. Herbert R. J. Johnston R. J. Johnston St Peter's College 1982 Social Sciences Division - Environment,Centre for the - Geography,School of University of Oxford Gordon L. Clark D. T. Herbert R. J. Johnston John Wiley & Sons, Ltd. Transformations: Economy, Society and Place Publisher has copyright Peer reviewed Geography Social Sciences Division - Environment,Centre for the - Geography,School of Clark Herbert Johnston Gordon L. Clark Gordon L. Clark D. T. Herbert D. T. Herbert R. J. Johnston R. J. Johnston John Wiley & Sons, Ltd. John Wiley & Sons, Ltd. Book (monograph): Section of book or chapter of book Gordon L. D. T. R. J. http://ora.ouls.ox.ac.uk:8081/10030/2034 Geography and the urban environment : progress in research and applications uuid:89494f00-a92b-4914-8f9a-bcede5a4d9be University of Oxford 0471102253 1982 en St Peter's College 41-61 true Gordon L. Clark D. T. Herbert R. J. Johnston uuid:89494f00-a92b-4914-8f9a-bcede5a4d9be John Wiley & Sons, Ltd. Transformations: Economy, Society and Place Peer reviewed Publisher has copyright Published Geography 2008-06-02T07:36:29.419Z Policy science and instrumental reason text uuid:89494f00-a92b-4914-8f9a-bcede5a4d9be 5 http://www.geog.ox.ac.uk/staff/glclark.php http://eu.wiley.com/WileyCDA/Section/index.html
[SOLR]Record type: XMLScience Science urn:ISSN:0036-8075 oai:epubs.cclrc.ac.uk:serial/1252 oai:epubs.cclrc.ac.uk:serial/1252 57 STFC ePublication Archive 57-oai:epubs.cclrc.ac.uk:serial/1252 Science Science
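This post doesn't show how JZKit actually merges the two result sets, so the following is only a minimal illustration of what "interleaved" means, assuming a plain round-robin over each source's ranked hits: take the first hit from each repository, then the second from each, and so on. The class name and record identifiers are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class Interleave {

    // Round-robin merge: take the 1st hit from each source, then the 2nd from each, and so on.
    // Sources that run out of hits are simply skipped on later rounds.
    static <T> List<T> interleave(List<List<T>> sources) {
        List<Iterator<T>> iters = new ArrayList<>();
        for (List<T> source : sources) {
            iters.add(source.iterator());
        }
        List<T> merged = new ArrayList<>();
        boolean added = true;
        while (added) {
            added = false;
            for (Iterator<T> it : iters) {
                if (it.hasNext()) {
                    merged.add(it.next());
                    added = true;
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical record identifiers from the two SOLR back-ends
        List<String> oxford = Arrays.asList("ora-1", "ora-2", "ora-3");
        List<String> intute = Arrays.asList("intute-1");
        System.out.println(interleave(Arrays.asList(oxford, intute)));
        // prints [ora-1, intute-1, ora-2, ora-3]
    }
}
```

Round-robin keeps each repository's own ranking intact without trying to compare relevance scores across two differently configured SOLR indexes.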
Posted by Ibbo at 02:38
Labels: IEDemonstrator, JISC, jzkit, Repositories, Z3950
Sunday, 1 June 2008
Upcoming Changes to JZKit Configuration
Some Sunday musings on the configuration mechanism in the jzkit_service module...
The jzkit_service module is the glue that pulls all the other components together into a federated meta-search component. It resolves internal collection and landscape names into a list of external Z39.50/SRU/SRW/OpenSearch/SOLR/JDBC/etc. services, broadcasts a search to those services, integrates the results and provides a unified result set.
This gives rise to a problem: for solutions like the previous Z39.50/SOLR bridge, we want a simple XML config file that a user can hack once, then leave. For more complicated applications, we need a real relational database behind the app to manage the complex config that goes along with large information network architectures.
Here's the rub: JZKit carries with it a very detailed service registry. Unlike the JISC IESR, its purpose isn't to be a registry in its own right but to support the search process. More and more, though, there is a need to make the JZKit service registry searchable in its own right (as a Z39.50 Explain database, or as an SRU Explain collection / ZeeRex records).
This has been bothering me for a while, yet the answer has pretty much been staring me in the face all along. So, here's what I'm considering for the final release of JZKit3:
1. The current "InMemoryConfig" which is loaded from XML config files will be deprecated.
2. It will be replaced by an in-memory Derby database: essentially the current database-backed config mechanism, but running against an in-memory database.
3. The XML config files will be left intact, but considered to be a "BootStrap" mechanism. At startup, JZKit will scan the config files and update/create any entries in the configuration database.
Thus, the "In-Memory" config will remain as before, but instead of being held in hashmaps the data will be inside a Derby database. This means we can now define, first and foremost, a JDBC-backed datasource for the Explain database and make it searchable even for static content.
--> Free explain service for any JZKit shared collections.
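A minimal sketch of the proposed bootstrap step, under loudly stated assumptions: the collection element, its code/name attributes and the COLLECTION table are all hypothetical (JZKit's real config schema isn't shown in this post), and Apache Derby is only needed at runtime for the commented-out connection line. It reads entries from XML, then updates or creates the matching row in the config database.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ConfigBootstrap {

    // Reads hypothetical <collection code="..." name="..."/> elements, in document order.
    static Map<String, String> readCollections(String xml) {
        Map<String, String> result = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("collection");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                result.put(e.getAttribute("code"), e.getAttribute("name"));
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Unreadable config XML", e);
        }
        return result;
    }

    // Update-or-create: try an UPDATE first and INSERT only when no row matched.
    // Assumes a hypothetical table COLLECTION(CODE VARCHAR(64) PRIMARY KEY, NAME VARCHAR(256)).
    static void upsert(Connection conn, Map<String, String> collections) throws SQLException {
        try (PreparedStatement update = conn.prepareStatement("UPDATE COLLECTION SET NAME = ? WHERE CODE = ?");
             PreparedStatement insert = conn.prepareStatement("INSERT INTO COLLECTION (CODE, NAME) VALUES (?, ?)")) {
            for (Map.Entry<String, String> entry : collections.entrySet()) {
                update.setString(1, entry.getValue());
                update.setString(2, entry.getKey());
                if (update.executeUpdate() == 0) {
                    insert.setString(1, entry.getKey());
                    insert.setString(2, entry.getValue());
                    insert.executeUpdate();
                }
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<config><collection code=\"oxford\" name=\"ORA\"/>"
                + "<collection code=\"intute\" name=\"Intute\"/></config>";
        Map<String, String> collections = readCollections(xml);
        System.out.println(collections);
        // With Apache Derby on the classpath and the COLLECTION table created, something like:
        // upsert(java.sql.DriverManager.getConnection("jdbc:derby:memory:jzkitconfig;create=true"), collections);
    }
}
```

Doing UPDATE-then-INSERT rather than relying on a vendor-specific MERGE keeps the sketch portable, and because existing rows are updated rather than duplicated, re-running the bootstrap at every startup is harmless.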
Posted by Ibbo at 03:55
Labels: IESR, jzkit, metasearch, ServiceRegistry
Tuesday, 27 May 2008
Exposing SOLR service(s) as a Z3950 server
JZKit is a pretty large toolkit for developers of search services to embed in their own systems, and it's not always easy to get to grips with ;). Partly that's because if you're using JZKit you're probably already dealing with the Z39.50 specifications along with a host of other concerns.
What developers need are simple starter apps that they can use to hit the ground running. In JZKit3 we've decided to try and address this by putting up some sample configurations of the tool that do useful stuff out of the box. First up: making a SOLR server visible as a Z39.50 server using an easy-to-change XML config file. Why do that? Well, lots of reasons. The most common one at the moment is that SOLR is being used to provide search interfaces into lots of interesting new content, not least the whole new breed of digital object repository projects like Fedora and DSpace. What keeps coming back around is institutional librarians saying "But I want that content to be available alongside everything else".
Kewl, a problem we can do something about. For now, the gateway distro lives in our Maven repository here. If you fancy setting up a Z39.50 server to proxy for a local SOLR server, download the compressed tar file from the above URL and unpack it.
Configuration.
After unpacking, look in etc/JZKitConfig.xml; you'll see a number of
From here it should be plain sailing. Here's the output of a yaz-client session:
(N.B. XML markup in the result record is being filtered out by Blogger. The actual result record is XML (actually, in this case it's just the SOLR
yaz-client tcp:@:2100
Connecting...OK.
Sent initrequest.
Connection accepted by v3 target.
ID : 174
Name : JZkit generic server / JZKit Meta Search Service
Version: 3.0.0-SNAPSHOT
Options: search present delSet triggerResourceCtrl scan sort extendedServices namedResultSets negotiationModel
Elapsed: 0.006948
Z> base Test
Z> find @attr 1=4 Dell
Sent searchRequest.
Received SearchResponse.
Search was a success.
Number of hits: 1, setno 1
records returned: 0
Elapsed: 0.002231
Z> format xml
Z> show 1
Sent presentRequest (1+1).
Records: 1
[coll]Record type: XML
nextResultSetPosition = 2
Elapsed: 0.008594
Z> quit
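The query in the session above, @attr 1=4 Dell, uses Bib-1 use attribute 4 (Title). The gateway's real mapping lives in its XML config and isn't reproduced in this post, so here's only a hypothetical Java sketch of the idea: look up a SOLR field for the use attribute and emit a field query. The SOLR field names (title, author, subject, text) are assumptions about your schema; the Bib-1 numbers are the standard ones.

```java
import java.util.HashMap;
import java.util.Map;

public class Bib1ToSolr {

    // Bib-1 use attribute values (attribute type 1) mapped to hypothetical SOLR field names.
    // The field names depend entirely on your SOLR schema.
    private static final Map<Integer, String> USE_TO_FIELD = new HashMap<>();
    static {
        USE_TO_FIELD.put(4, "title");      // Bib-1 use 4 = Title
        USE_TO_FIELD.put(1003, "author");  // Bib-1 use 1003 = Author
        USE_TO_FIELD.put(21, "subject");   // Bib-1 use 21 = Subject heading
        USE_TO_FIELD.put(1016, "text");    // Bib-1 use 1016 = Any
    }

    // Builds a Lucene/SOLR field query such as title:"Dell" for a single-term search.
    static String toSolrQuery(int useAttribute, String term) {
        String field = USE_TO_FIELD.get(useAttribute);
        if (field == null) {
            throw new IllegalArgumentException("Unsupported Bib-1 use attribute: " + useAttribute);
        }
        // Quote the term as a phrase, escaping backslashes and double quotes inside it
        String escaped = term.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // Equivalent of the yaz-client query: find @attr 1=4 Dell
        System.out.println(toSolrQuery(4, "Dell")); // prints title:"Dell"
    }
}
```

Rejecting unknown use attributes, rather than silently falling back to a default field, mirrors the Z39.50 habit of returning an "unsupported use attribute" diagnostic to the client.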
Other components of the config file can be used to control the XML records returned, to convert SOLR records into MARC or to manage meta-searching through the gateway, but that's a subject for next time.
Happy meta-searching.
Posted by Ibbo at 09:27
Labels: jzkit, metasearch, SOLR, Z3950
Monday, 14 April 2008
I have this ASN.1 definition and this byte stream, can I use A2J?
I get about one email a week about this, and although the docs are out there, for some reason people don't seem to be able to find them. So, if you have an ASN.1 definition file and need to encode/decode a byte stream from a device or some other source, here's how to do it with A2J.
1. Get A2J. There are a couple of options: you can download the source code from http://developer.k-int.com/svn/a2j/a2j_v2/trunk/ and build it yourself, or use Maven, which is the approach I'll discuss here. The A2J libraries are available from the public Maven 2 repositories, so there's no special download or setup. Just add the following to the dependencies section of your project's pom.xml and the jar will be downloaded from one of the Maven 2 repositories:
2. You need to precompile the ASN.1 definition into codec classes. Use the following plugin:
Obviously, replace input_file with your input file, and the base package with whatever Java package you want to use. This will generate a load of Java stubs that can process input and output byte streams defined by the ASN.1 specification. An example POM can be found here: http://developer.k-int.com/svn/jzkit/jzkit3/trunk/jzkit_z3950_plugin/pom.xml
3. I want to read bytes from an input stream. Again, you can copy code from JZKit3, specifically http://developer.k-int.com/svn/jzkit/jzkit3/trunk/jzkit_z3950_plugin/src/main/java/org/jzkit/z3950/util/ZEndpoint.java, but here's the abbreviated version:
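while (running) {
  try {
    log.debug("Waiting for data on input stream.....");
    // Wrap the raw stream so the generated A2J codec can read BER-encoded APDUs from it
    BERInputStream bds = new BERInputStream(incoming_data, charset_encoding, DEFAULT_BUFF_SIZE, reg);
    PDU_type pdu = null;
    // Decode one complete "PDU" (the top-level type named in the ASN.1 definition)
    pdu = (PDU_type) codec.serialize(bds, pdu, false, "PDU");
    log.debug("Notify observers");
    notifyAPDUEvent(pdu);
    log.debug("Yield to other threads....");
    Thread.yield();
  } catch (Exception e) {
    // A try needs a catch (or finally); in this abbreviated version any decode/IO failure ends the loop
    log.warn("Problem reading PDU, closing connection", e);
    running = false;
  }
}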
incoming_data is an input stream, charset_encoding is a character set, and reg is the OID register that can be used to identify any externals / other OIDs appearing in the data.
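The generated codec hides the wire format, but what BERInputStream is consuming is a sequence of BER tag-length-value elements (ITU-T X.690). As a hypothetical illustration only (this is not A2J code), here's how the identifier and length octets of one element can be decoded in plain Java. It handles low and high tag numbers and short/long definite lengths, but not indefinite lengths.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class BerHeader {

    final int tagClass;        // 0=universal, 1=application, 2=context-specific, 3=private
    final boolean constructed; // true if the contents are themselves BER elements
    final int tagNumber;
    final int length;          // definite length of the contents, in octets

    BerHeader(int tagClass, boolean constructed, int tagNumber, int length) {
        this.tagClass = tagClass;
        this.constructed = constructed;
        this.tagNumber = tagNumber;
        this.length = length;
    }

    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) throw new EOFException("Unexpected end of BER stream");
        return b;
    }

    // Reads the identifier and length octets of one BER element.
    static BerHeader read(InputStream in) throws IOException {
        int first = readByte(in);
        int tagClass = (first >> 6) & 0x03;
        boolean constructed = (first & 0x20) != 0;
        int tagNumber = first & 0x1F;
        if (tagNumber == 0x1F) {
            // High-tag-number form (tags >= 31): base-128 digits, top bit set on all but the last
            tagNumber = 0;
            int b;
            do {
                b = readByte(in);
                tagNumber = (tagNumber << 7) | (b & 0x7F);
            } while ((b & 0x80) != 0);
        }
        int lenByte = readByte(in);
        int length;
        if ((lenByte & 0x80) == 0) {
            length = lenByte;                   // short form: 0..127
        } else {
            int count = lenByte & 0x7F;         // long form: number of following length octets
            if (count == 0 || count > 3) {
                throw new IOException("Indefinite or very long lengths are not handled in this sketch");
            }
            length = 0;
            for (int i = 0; i < count; i++) {
                length = (length << 8) | readByte(in);
            }
        }
        return new BerHeader(tagClass, constructed, tagNumber, length);
    }

    public static void main(String[] args) throws IOException {
        // 0xB4 = context-specific, constructed, tag [20] (the Z39.50 initRequest PDU);
        // 0x81 0x80 = long-form length of 128 octets
        byte[] bytes = {(byte) 0xB4, (byte) 0x81, (byte) 0x80};
        BerHeader h = read(new ByteArrayInputStream(bytes));
        System.out.println(h.tagClass + " " + h.constructed + " " + h.tagNumber + " " + h.length);
        // prints 2 true 20 128
    }
}
```

After the header, a decoder reads exactly length octets of contents and, for constructed elements, recurses; that is essentially the work the generated A2J codec classes do for each type in your ASN.1 definition.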
Have fun!