A lot of the time, we link to a website or an ongoing discussion rather than copying and pasting the information over onto Fanlore. But once a website or a link is dead, that data is lost and your Fanlore entry may lack context or key info.
Your best shot is to head over to the Internet Archive (Wayback Machine) and see if the website has been archived. But since the Wayback Machine crawls and archives randomly, you won't know if your citation can be resurrected until it is too late.
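(As an aside: if you'd rather check programmatically than by browsing, the Internet Archive has a public availability endpoint. Here's a minimal sketch using only the Python standard library - the page URL in the example is made up.)

```python
# Minimal sketch: ask the Internet Archive whether the Wayback Machine
# already holds a snapshot of a page, via its public availability API.
import json
import urllib.parse
import urllib.request

def wayback_snapshot(url):
    """Return the closest archived snapshot URL, or None if none exists."""
    query = urllib.parse.urlencode({"url": url})
    with urllib.request.urlopen(
            "https://archive.org/wayback/available?" + query) as resp:
        data = json.loads(resp.read())
    closest = data.get("archived_snapshots", {}).get("closest")
    return closest["url"] if closest else None

# Hypothetical example page, not a real citation:
print(wayback_snapshot("http://example.com/fandom/timeline.html"))
```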
Enter: WebCite. A service designed for scholars to create a static snapshot of a website so that you can cite it (and the page contents) for longer periods. It is user driven - you have to submit the website link before the website goes down (i.e., when you're creating your Fanlore entry). It comes with a few caveats: it won't snapshot pages that carry a 'no robots' exclusion, it won't grab locked content, and if you're grabbing a page from an adult LiveJournal community, all you may see is the 'Adults only' warning. And it is intended to be used in addition to the direct link to the website, not in place of it.
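For the curious, the submission step can also be scripted. Below is a hedged sketch of that workflow: check the target site's robots.txt first (WebCite refuses excluded pages anyway), then hand the single page URL to WebCite's query-string interface. The endpoint and its url/email parameters follow WebCite's posted instructions as I understand them, and the page URL, email address, and user-agent name are all made up - treat the details as assumptions, not gospel.

```python
# Hedged sketch of the user-driven WebCite workflow described above.
import urllib.parse
import urllib.request
import urllib.robotparser

def robots_allow(url, agent="WebCite"):
    """Check the target site's robots.txt before submitting.
    The agent name here is a guess, not WebCite's documented UA."""
    parts = urllib.parse.urlsplit(url)
    rp = urllib.robotparser.RobotFileParser(
        parts.scheme + "://" + parts.netloc + "/robots.txt")
    rp.read()
    return rp.can_fetch(agent, url)

def webcite_submit(url, email):
    """Submit one page for on-demand archiving; returns the response body.
    Endpoint/parameters per WebCite's posted instructions (assumed)."""
    query = urllib.parse.urlencode({"url": url, "email": email})
    with urllib.request.urlopen(
            "http://www.webcitation.org/archive?" + query) as resp:
        return resp.read().decode("utf-8", errors="replace")

page = "http://example.com/professionals/timeline-thread.html"  # invented
if robots_allow(page):
    print(webcite_submit(page, "editor@example.org"))  # invented address
```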
I've tested WebCite on the Professionals Fandom Timeline which pulls the bulk of its content from a few key LJ threads. We have already laboriously copied the data over to Fanlore (with permission), but it seemed like a good test candidate.
I also used WebCite to create links to a Stargate Award website that is not currently in the Wayback Machine.
If you have used this service before, or know anything more about it, please drop a note. I think it will be particularly useful for blogs and forum posts, which are prone to vanish quickly. It comes with an easy-to-use bookmarklet that will let you cite a webpage with one click.
edited: I had a brief discussion with someone about WebCite in which they expressed discomfort with the use of this tool (and about whether aspects of the Fanlore project in general could be seen as a breach of fannish community mores/trust). So I'll toss out this narrower question: how does using WebCite differ from using the Wayback Machine/Internet Archive or Google as our citation sources? Both WebCite and the Wayback Machine use the same caching process, and both store the website snapshot on their servers. What I like about WebCite is that it is much more limited - it cites only the one page and does not scrape and archive the entire website (like the Wayback Machine does). This offers us better control over what we're citing to, makes certain we give proper credit to the source of the info, and grabs the smallest portion of material. In other words, it seems (to me) to be a better form of 'fair use'.
Thoughts? Input? Other ways of looking at the 'what to link, what to quote, what to cite' question? Is any use of any tool that caches a website (ex. Google, the Wayback Machine, LJ Seek, etc.) something to avoid? I realize there may not be a single or uniform opinion, but like Fanlore, I think that plural POVs are good.
edited to add: I have to keep in mind that fandom - and Fanlore - is not operating in isolation. Scholars, other wikis, libraries, and historians are running into the same questions and evaluating the same tools. In fact, Wired recently ran an article about the US and UK digital archives and their reliance on the Oakland Archive Policy of 2001. More here.
And...a recent Library Science article discussing yet another 'caching' service: Memento Web
And...: links to legal articles on digital preservation and caching below.
no subject
I am planning on using it mainly to cite to raw data or reference links, not the actual stories or art. But I suspect that the Fanlore Wiki would fall under their definition of a scholarly journal/article and if a scholar were writing about adult themes or erotica they'd be allowed to publish...and therefore to WebCite....
no subject
Just my two cents :)
ETA: In response to your ETA, I think the biggest concern people will have is that they'll lose control of their fanworks. I realise you said you weren't planning to screencap fic or art just yet, which is probably for the best. And in terms of fair use, can you screencap an entire fic? I mean, if you cited a textbook that went out of print or whatever and nobody could get at it, I don't think you could then photocopy the entire thing so people could have it for reference. Right?
Assuming we're only caching a single page....
But after reading the objections that were raised, I started to wonder - what makes WebCite different from the Wayback Machine, or Reocities, or the cached Google pages that we link to?
So I think that even once we address the question of what these services cache or cap (an entire story? just a page? an entire website?), there remains another unspoken issue: whether we can even link to cached copies, archives, or screencaps.
I think one argument could go like this: it is OK for us to include an archive link but only if someone else (non-fannish) created it. Fans cannot create a cache or screencap a page because ....we're fans and our fannish morals say that capping or reproducing any portion of a website or a blog entry without permission is wrong - for any reason (ex. not permitted for commentary or to illustrate a point). But if someone else does it *for* us, we're off the moral hook?
The other argument might be: you cannot link to any cached copy, irrespective of who creates it. Fans should never link to archives or link to screencaps (without permission). There is no fair use exception in fandom when it comes to fannish works.
I suspect there are variations on these arguments, but it does seem to come down to: assuming it is (1) OK to document fannish history and assuming (2) you are only linking to one page, then the debate turns into (3) when can fans cache a page and, if they cannot cache it themselves, (4) when can they link to someone else's cached copy?
Fandom has a unique (and somewhat inconsistent) approach to fair use. One that is not accepted universally. The fact that services like Google are fighting for the ability to index material just proves how much 'diversity' (aka disagreement) there is on the topic of 'fair use.'
Re: Assuming we're only caching a single page....
Not sure how many people in fandom actually feel that way--it might be a small minority.
OTOH, if someone deleted a wanky post during Racefail and we were forced to rely on screencaps to continue the discussion, how many of those same people would object then? Why is fic more protected than meta?
(Though I gotta say, I'm not keen on what Google is doing.)
Re: Assuming we're only caching a single page....
I don't think the use of meta-caps in discussions necessarily is, or should be, protected, though it happens in some circles, most notably Fandom Wank. I don't believe that's the model most fiction fen prefer to base their fannish interactions on.
Re: Assuming we're only caching a single page....
Well, it works both ways too. In terms of public versus private, I think some would argue that taking down a fic is no different than if I took down my post about what happened at my cousin's BBQ, or a post in which I speculate about the Doctor Who finale. If I locked or filtered a post in my journal, I would be appalled to discover someone had screencapped it and was passing it around. Why should the fic I've posted to my journal, which I've now locked, or which I've taken down from an archive, be any different? Those are my words, and if I no longer want to share them, why would someone who claims to be part of my community try to undermine this decision? Simply because they think it's significant to fandom history?
Just playing devil's advocate.
Re: Assuming we're only caching a single page....
This is the perspective that I was approaching the question from--that fen may perceive deliberate, targeted archiving of their stories--even a single page of a story (and many stories are only a single page long)--as a transgression of fannish mores when the archiving was instigated by another fan, rather than by a bot. The larger question of whether linking to bot-collected fannish output transgresses is worthy of discussion, too, but it was the idea that this user-instigated collecting and archiving would be done by those in the same community as the collected that gave me pause and made me wonder if this approach should be examined, first.
no subject
*There are some interesting arguments based on the location of WebCite's servers, admittedly.
no subject
Last I heard, Google was struggling in the EU over their ability to cache and index websites.
Here in the US, caching/indexing was found to be legal in 2006:
http://www.practicalecommerce.com/articles/1457-Search-Engines-Indexing-and-Copyright-Law
But newspaper attorneys are trying to get the issue revisited (when profits get involved, law gets weird)
http://www.techdirt.com/articles/20091113/1357386926.shtml
The other issue is caching images. The 2006 case allowing Google to cache thumbnails is here (and I suspect it's part of what wikis use to argue they can use thumbnails - that and the fair use education exceptions):
http://en.wikipedia.org/wiki/Perfect_10,_Inc._v._Google_Inc.
If I can find articles about how the UK/EU is handling caching/archiving etc., I'll put them in another link.
no subject
http://www.out-law.com/page-10980
Info about legal disputes surrounding the Wayback Machine is a bit harder to find. It makes me wonder if digital archiving (at least *their* digital archive) is no longer low-hanging fruit (the Wayback Machine has too much legal protection for most content owners to go after it when they can just as easily use its removal process).
There is this:
http://en.wikipedia.org/wiki/Internet_Archive#Controversies_and_legal_disputes
I suspect that WebCite will align itself with the Internet Archive and focus on the educational/scholarship aspects, arguing that it falls under the fair use exception. Then there is its limited content 'grab' (also part of the fair use balancing test). That, coupled with its honoring of robots.txt flags and its removal-request process, might be an attempt to bring itself under the Safe Harbor rules.
Will keep digging.
The wiki entry on WebCite is also helpful, both in how they're positioning themselves legally as well as the fact that they're feeding the info to the Internet Archive.
http://en.wikipedia.org/wiki/WebCite
no subject
The two operations are almost complete opposites of each other. Furthermore, the Google result also depends on an implied licence, which depended on Field knowing that Google was going to crawl over his material and cache it. As I understand it, this isn't the case with WebCite; people make a choice whether or not to cache an item, and that choice is made on an item-by-item basis. Finally, the Court held that Google was entitled to the safe harbor provisions of the DMCA; again, I'm not sure if these would be available to WebCite.
I'm not arguing that WebCite's activities might not qualify for being fair use, but I don't think Field v. Google goes nearly so far to justify them as they seem to think it does.
Field/Google - Website Archiving Analysis
As to the second issue - the DMCA Safe Harbor provisions - if you find a ruling or an article that discusses them in connection with digital archives like the Wayback Machine, let me know? I'd like to compare it to Field and see if there is a difference between a Google index cache and a digital archive.
What intrigues me is how little legal analysis I am finding about digital archives like the Internet Archive. Again, back to them not being worth the effort (yet)?
edited to add: in 2008, a library coalition published their take on the Field decision and how it might - and might not - provide libraries with cover for their website archiving.
http://www.aallnet.org/aallwash/LCA_greenpapercombo_Dec%202008.pdf
no subject
Luckily, Google has enough money to buy fleets of lawyers, and in this they seem to be on the side of public access. Unlike the Google Libraries digitization project, where they want to charge for access...
no subject
Overview of Web Archiving Services/Legal Analysis
It is short on legal analysis, but it offers sections on crawling vs. user-driven archiving, commercial vs. non-profit services, etc.
This more recent article explores what the UK is discussing about website archiving:
More here
The part that applies to us here:
"Kristine Hanna, Director of Web Archiving Services told Wired: "We follow the Oakland Archive policy established in 2001, that allows a website owner/content provider to remove access from the archive, and/or prevent their content from being captured by putting up a robots.txt exclusion on their website.
"The Oakland Archive policy outlines an 'opt out' approach where, if requested, we will expeditiously remove a site from access. In most cases we find that when web site owners understand we are archiving the content for the library, and are not re-purposing the content for any other purpose (including re-sale or revenue) they decide to keep their site visible in our collections."
Last, a good overview of the two cases focusing on caching and how they would apply to website archiving (paper by a 2008 Library Coalition).
http://www.aallnet.org/aallwash/LCA_greenpapercombo_Dec%202008.pdf
no subject
I'll see if I can find a more detailed discussion of the opinion.
no subject
But WebCite (and the Archive) preserve those sites for when they get deleted, and I think that deleting a website probably counts as withdrawing consent.
Disclaimer: What I've found wrt WebCite's legality in Germany predates the BGH ruling (and is all in German. :() Generally most rulings wrt copyright used to stress the necessity of opt-in procedures instead of this opt-out, so we'll see how it goes.
no subject
http://jurist.org/paperchase/2010/04/done-germany-high-court-rule-google-did-not-violate-copyright-laws.php
no subject
I read that article and found it ill-informed and rather annoying; specifically the ad hominem attack on Sanford and Brown. I mean, take a daft statement like
I mean, you only have to Google Sanford and this is the top hit (to say nothing of the fact that I believe he was the man who had his own blog which was one of the better IP blogs on the web until people's refusal to see him as distinct from his clients forced him to shut up shop).
In context, the allegation of "non-disclosure" is so bloody ridiculous as is the assertion
Anyone who knows a blind thing about IP law will see that the bit of their article he cites is referring to a specific provision in the copyright code regarding permitted archiving, and Masnick's comments regarding fair use (which is a defence to infringement) are just laughably off base. In fact, Masnick is coming over in that article as one of those typical "IANAL and that makes me both morally superior to and inherently more in tune with the underlying subject matter" Internet weevils.
no subject
Legal precedent doesn't exist in some sort of vacuum; certainly not in the case of IP. You can have a French/civil law concept of droit moral but neither the UK nor the US does, though the UK has grafted some droit moral concepts onto its IP law because of the harmonising effects of the EU. As a result, one of the fundamental questions of IP law is economic harm. If it's difficult to see a definable economic harm, then the sensible advice is not to bother suing, irrespective of whether one theoretically has rights or not.
But if you ever got an author who was as rich as JKR and as bonkers as Diana Gabaldon, that's where you'd get "litigation in principle". And that, imho, is where you'd see precedent really get bent.
no subject
Practically speaking, Google and all the other search engines out there are so useful that courts all over seem resistant to finding infringement.
There's a lot of value for Fanlore in storing a copy of a linked-to page for future reference. That doesn't necessarily mean publishing or posting it, but having something rather than nothing is extremely valuable in really expressing a plural point of view.
Guess I come down on the side of archiving as a relative benefit to us.
no subject
I don't think anyone would disagree that, for the purposes of breach of confidence, material put up on the web ceases to be confidential. But its author does not, as a result, lose copyright, and it's the act of copying which is being looked at here.