Monday 14 July 2008

Wanted: HTML digitisation silos needing an OAI feed

We've just been talking to a really cool Sheffield geek who's got a neat HTML screen-scraping service and has integrated it into JZKit to provide dynamic meta-search across services that have no machine-to-machine interface. Very cool.

However, in our discussions we realised that it might have even more value as a tool for converting HTML silos into metadata-rich repositories. Plenty of digitisation projects have large, well-organised web sites but perhaps no OAI or SRU/SRW. By combining this service with a SWORD deposit client and some MD5 checksums, we can help those projects by creating a preservation hub that exposes their content using open standards.
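To make the idea a bit more concrete, here's a rough sketch of the scrape-and-checksum half in Python. It is not the Sheffield service or anything in JZKit, just an illustration using the standard library: the names `MetaScraper` and `page_to_record` are made up, the record fields are loosely Dublin Core flavoured, and it assumes the silo's pages carry a `<title>` and ordinary `<meta name="..." content="...">` tags. The SWORD deposit step isn't shown.

```python
import hashlib
from html.parser import HTMLParser


class MetaScraper(HTMLParser):
    """Collects the <title> text and <meta name/content> pairs from one page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            # Key on the lowercased meta name so "Author" and "author" match.
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def page_to_record(url, html_bytes):
    """Build a simple metadata record plus an MD5 checksum of the raw page bytes."""
    scraper = MetaScraper()
    scraper.feed(html_bytes.decode("utf-8", errors="replace"))
    return {
        "identifier": url,
        "title": scraper.title.strip(),
        "creator": scraper.meta.get("author", ""),
        "description": scraper.meta.get("description", ""),
        # Checksum of the bytes as fetched, so later changes can be detected.
        "md5": hashlib.md5(html_bytes).hexdigest(),
    }
```

A record like this is the sort of thing a deposit client could package up and push into the repository, with the checksum kept alongside it so the hub can tell when a source page has changed.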

If you have an HTML silo of digitised data (or any data, for that matter) but no OAI or SRU, and you need such services, we'd love to test this idea by writing the scripts to populate an OAI and SRU repository... Any volunteers?

Going to spend some spare time throwing this and the JDBC/OAI/SRU gateway together, hopefully in time for library mashup and bathcamp. Maybe we can come out of those events with some newly exposed data sources contributing to the linked data network.

Fun :)

1 Comment:

Mia Ridge said...

If you posted on the MCG (museums computer group) email list you might get some takers... you'd need to explain the process and benefits in simple terms but hopefully some people would be interested in getting data they can re-use.

cheers, Mia

Knowledge Integration Ltd