We use Mediathread's Chrome extension to get images, videos, and sounds from the web into Mediathread while preserving the metadata and source information for those assets. The extension is a port of the Mediathread bookmarklet, which is also still in use, so for now we maintain the collection code in two places: the bookmarklet and the extension. Eventually it may make sense to move the shared logic into a common library so it's easier to maintain.
Here's how it works: when you're on a page (say, a Wikipedia article), you click either the bookmarklet or the Chrome extension's button. The code then scans the page for anything that should be imported into Mediathread.
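To illustrate that scanning step, here is a minimal JavaScript sketch. It is not the extension's actual code. The names `findAssets` and `MIN_IMAGE_SIZE`, the size threshold, and the shape of the returned records are all assumptions. The function takes element-like objects (in a content script, the result of `document.querySelectorAll('img, video, audio')`), classifies each one by tag, skips tiny images and duplicates, and attaches the page URL and title as source metadata.

```javascript
// Hypothetical sketch of the "scan the page for importable assets" step.
// In a browser you might call:
//   findAssets(document.querySelectorAll('img, video, audio'), location.href, document.title)
// It only reads tagName, src/currentSrc and image dimensions, so plain objects work too.

const MIN_IMAGE_SIZE = 100; // assumption: ignore icons and spacers smaller than this (px)

// Map an element's tag to a Mediathread-style asset type, or null if it isn't media.
function assetType(el) {
  switch (String(el.tagName).toUpperCase()) {
    case 'IMG': return 'image';
    case 'VIDEO': return 'video';
    case 'AUDIO': return 'audio';
    default: return null;
  }
}

function findAssets(elements, pageUrl, pageTitle) {
  const seen = new Set(); // de-duplicate assets that appear more than once on the page
  const assets = [];
  for (const el of elements) {
    const type = assetType(el);
    const src = el.currentSrc || el.src; // currentSrc reflects the source a <video>/<audio> chose
    if (!type || !src || seen.has(src)) continue;
    if (type === 'image') {
      const w = el.naturalWidth || el.width || 0;
      const h = el.naturalHeight || el.height || 0;
      if (w < MIN_IMAGE_SIZE || h < MIN_IMAGE_SIZE) continue; // probably decoration
    }
    seen.add(src);
    // Keep where the asset came from, so the source info survives the import.
    assets.push({ type, src, sourceUrl: pageUrl, pageTitle });
  }
  return assets;
}

// Example with plain objects standing in for DOM elements:
const found = findAssets([
  { tagName: 'IMG', src: 'https://upload.wikimedia.org/a.jpg', naturalWidth: 640, naturalHeight: 480 },
  { tagName: 'IMG', src: 'https://example.org/icon.png', naturalWidth: 16, naturalHeight: 16 },
  { tagName: 'VIDEO', src: 'https://example.org/clip.mp4' },
  { tagName: 'IMG', src: 'https://upload.wikimedia.org/a.jpg', naturalWidth: 640, naturalHeight: 480 },
], 'https://en.wikipedia.org/wiki/Example', 'Example');
console.log(found.length); // 2 (the icon and the duplicate image are skipped)
```

Taking element-like objects instead of querying `document` directly keeps the function runnable in Node, which makes it straightforward to unit test.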
My strategy for testing something like this consists of two kinds of tests: unit tests, and tests that fetch example web pages we want to be compatible with and check that they still have the expected structure. I don't have many unit tests yet, but the basic ideas for both test types are here: https://github.com/ccnmtl/mediathread-chrome/blob/master/tests/host-handler.js
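To make the two kinds of tests concrete, here is a minimal sketch in plain JavaScript with no test framework. It is not the contents of host-handler.js: the `HANDLERS` table, `handlerForUrl`, `missingMarkers`, `checkLivePage`, and the `thumbinner` marker are hypothetical. The unit tests exercise pure logic and run instantly. The structure test fetches a live page (it assumes a runtime with a global `fetch`, such as Node 18+) and fails if markers the collection code relies on have disappeared.

```javascript
// Hypothetical table mapping sites to the collection strategy used for them.
const HANDLERS = [
  { name: 'wikipedia', hostPattern: /(^|\.)wikipedia\.org$/ },
  { name: 'youtube', hostPattern: /(^|\.)youtube\.com$/ },
];

// Pure function: easy to unit test without a browser or network access.
function handlerForUrl(url) {
  const host = new URL(url).hostname;
  const match = HANDLERS.find((h) => h.hostPattern.test(host));
  return match ? match.name : 'generic';
}

// Returns the markers (regexes) that are NOT present in the page HTML.
function missingMarkers(html, markers) {
  return markers.filter((re) => !re.test(html)).map(String);
}

// Structure test: fetch a live example page and fail if expected structure is gone.
async function checkLivePage(url, markers) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`${url}: HTTP ${res.status}`);
  const missing = missingMarkers(await res.text(), markers);
  if (missing.length > 0) {
    throw new Error(`${url} is missing expected structure: ${missing.join(', ')}`);
  }
}

// Unit tests: deterministic, run on every change.
function check(cond, msg) { if (!cond) throw new Error(msg); }
check(handlerForUrl('https://en.wikipedia.org/wiki/Mona_Lisa') === 'wikipedia', 'wikipedia host');
check(handlerForUrl('https://www.youtube.com/watch?v=abc') === 'youtube', 'youtube host');
check(handlerForUrl('https://example.org/') === 'generic', 'generic fallback');
check(missingMarkers('<div class="thumbinner"><img src="a.jpg"></div>',
  [/class="thumbinner"/, /<img /]).length === 0, 'markers present');

// Structure tests depend on the network and on third-party sites, so run them
// separately (for example on a schedule), e.g.:
// checkLivePage('https://en.wikipedia.org/wiki/Mona_Lisa', [/class="thumbinner"/]);
```

Keeping the network-dependent checks apart from the unit tests means a site redesign shows up as a clear structure failure rather than as flaky unit tests.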