- Implement local Web archive access system
- Implement / customize local instance of "Wayback" software
- Transfer selected content collections from Internet Archive
- Implement capacity for selective document-oriented archiving with local access
- Explore / implement indexing alternatives to NutchWAX
- Release local web archive search system with one or more collections (e.g., human rights collection, Columbia University website collection, historic preservation collection)
- Explore / implement selected Semantic Web technologies
- Semantically analyze human rights web collection (English-language content)
Explore text mining, named entity recognition, frequency analysis, clustering; analyze internal links & 'anchor windows'; extend analysis to relevant non-Web resources; evaluate DBpedia and Tagpedia approaches
- Develop / leverage defined human rights ontology / concept map
Investigate existing research and practice; coordinate with other institutions working with human rights content; explore applicability of semantically aware discovery and query engines; explore relevant existing taxonomies
- Semantically characterize web collection
E.g., use ORE, resource maps, linked data, RDF/XML
- Develop prototype semantically-generated research guides and other tools
Develop prototype content guides / content overviews; perform user testing; revise and extend prototype; explore functional integration with other related projects; explore enrichment of basic search and retrieval interface with semantically-generated metadata
- Create Human Rights Reference Portal
- Integrate web archive searching, reference tools, research guides
- Develop sustainable update strategy based on automated processes
- Explore extending local web archive searching to other relevant Archive-It and CDL content
- Ingest local Web archives into Fedora for long-term preservation
- Develop / implement Fedora data model for Web content
- Ingest archived Web content into Columbia's Fedora repository
- Develop / implement content curation / migration strategy
Change log:
2010-07-09: Changed to show local implementation of access-oriented Wayback software as separate task from Fedora ingest and long-term preservation of Web content; generalized description to include creating local web collections in areas other than human rights; added task of enabling selective document-oriented archiving and access.