Rev. 12/14/2011
A. Human Rights Web Portal
- Test tools for web harvesting, access
- Install and test Heritrix software locally [completed 12/2010]
- Install and test Wayback machine locally [completed 12/2010]
- Improve scripting and configurations for CUL / IA harvesting [ongoing]
- Create prototype search interface to Columbia IA HR collection
- Obtain snapshots of Columbia HR collection content [completed 4/2011]
- Explore methods for metadata + full-text indexing of content [completed 10/2011]
- Create preliminary prototype keyword + metadata search interface to Columbia IA HR collection [completed 11/15/2011]
- Develop and Launch Beta 1 prototype
- Migrate preliminary prototype to new Web Application Framework [January 2012]
- Launch Beta 1 prototype [Feb. 1, 2012]
- Internal CUL/IS testing [Feb. - Mar. 2012]
- Incorporate results of internal testing [March 2012]
- Launch Beta 2 prototype [April 2012]
- Conduct formal user testing with outside groups
- Incorporate results of testing into product
- Add selected non-Web resources to local search interface
- Identify other data sources to integrate into portal [in process 12/2011]
- Configure and do beta testing
- Create new local HR document repository and integrate into local search interface [in planning]
- Set up new Fedora collection
- Set up workflow for selectors to create metadata and upload documents
- Index metadata and full text
- Merge into composite index
- Perform user testing and assess results
- Launch Public Beta 3 [July 2012]
- Conduct formal user testing
- Incorporate results of testing into product
- Launch Public Portal Version 1.0 [Sept. 2012]
- Complete HR Portal site public interface
- Launch Version 1.0
B. Search Enhancements
(These features may not be included in the main public portal release; they may instead be made available separately for ongoing development and testing.)
- Explore / Implement Enhanced Searching / Semantic Web / Linked Data technologies
- Semantically analyze human rights web collection (English-language content)
Explore text mining, named entity recognition, frequency analysis, clustering; analyze internal links & 'anchor windows'; extend analysis to relevant non-Web resources; evaluate DBpedia and Tagpedia approaches
- Develop / leverage defined human rights ontology / concept map
Investigate existing research and practice; coordinate with other institutions working with human rights content; explore applicability of semantically aware discovery and query engines; explore relevant existing taxonomies
- Semantically characterize web collection
E.g., use ORE, resource maps, linked data, RDF/XML
- Develop prototype semantically-generated research guides and other tools
Develop prototype content guides / content overviews; perform user testing; revise and extend prototype; explore functional integration with other related projects; explore enrichment of basic search and retrieval interface with semantically-generated metadata
- Complete and Release Portal Version 2.0
- Integrate enhanced searching, semantic web and linked data functionality
- Develop sustainable update strategy based on automated processes
- Explore extending local web archive searching to other relevant Archive-It and CDL content
- Launch Version 2.0
- Ingest local Web archives into Fedora for long-term preservation
- Develop / implement Fedora data model for Web content
- Ingest archived Web content into Columbia's Fedora repository
- Develop / implement content curation / migration strategy
Change log:
2011-12-14:
- Updated completed and in-process tasks
- Integrated planned testing / beta version release schedule
- Broke out Search Enhancements as separate track
2011-10-03:
- Updated completed tasks
- Renamed releases Version 1.0, 2.0
- Added additional user testing tasks
2011-03-22:
- Removed task of local implementation of Wayback software; the local interface will now access IA content remotely.
- Added milestones for testing tools for web harvesting/access
- Added milestones for merging non-Web metadata
- Added milestone for creating and merging local document repository
- Added milestone for releasing Phase 1 portal prototype
(Previous version of project plan.)
2010-07-09: Changed to show local implementation of access-oriented Wayback software as separate task from Fedora ingest and long-term preservation of Web content; generalized description to include creating local web collections in areas other than human rights; added task of enabling selective document-oriented archiving and access.