May 18, 2009
Case Study
The University of Washington (UW) Libraries established a digitization program in the late 1990s. As one of the recipients of the first digitization grants (the Library of Congress/Ameritech grants), the UW Libraries has worked with local and regional partner organizations for over ten years to create a digital collection of more than 180,000 image, text, and sound files. While the collection is necessarily focused on the people, places, and history of the Pacific Northwest, it also includes items of broader interest, such as images of architecture around the world, nineteenth-century actors, eighteenth- and nineteenth-century fashion plates, and freshwater and marine images.
Content for our digital collections is drawn from the UW Libraries’ Special Collections and branch library collections, from faculty slide collections, and from student projects that subject librarians identify as candidates for the digital collections. Digitization at the UW Libraries is decentralized but supported by the Digital Initiatives unit, which is part of the Libraries’ Information Technology Services division. Special Collections digitizes and creates metadata for the collections it wants to make available online; the content is chosen by either the materials curators or the digitization specialist. For materials outside Special Collections, UW Libraries’ subject specialists typically work with faculty members to vet materials for the online collections. Branch libraries often have the internal resources needed to digitize some materials and can use available student funds to hire additional help, or shift students from tasks such as shelf reading to digitization. Other subject specialists can use the scanners and other resources of Digital Initiatives for their projects; if they do not have student funds to hire help, they often seek internal grants or work with external foundations to fund the project. In all cases, Digital Initiatives provides training and support when necessary. Staffing for digital projects in the UW Libraries varies with the number of projects currently underway but is generally between three and five full-time equivalent (FTE) staff members, with two FTE devoted to supporting the digital collections full-time.
In addition to the technical aspects of supporting a digitization project, the UW Libraries also endeavor to create collections that are easily searchable and accessible through well-formed metadata. To this end, the Libraries convened the Metadata Implementation Group, which creates data dictionaries and consults on metadata. The Group’s charge is to develop and promote the use of metadata standards to ensure reliable resource discovery within and across digital library projects. The Group identifies appropriate metadata and coordinates its consistent application across a variety of software environments and resource types.1
The Digital Collections site2 is indexed by Google and other search engines and heavily used by researchers of all ages around the world. The site typically receives about 150,000 visits a month.
Visitors to the Digital Collections site are directed to the site from a variety of sources including search engines, blogs, lesson plans, links from subject-specific Web pages at schools and universities, and Wikipedia. Users find our pages through very specific and focused queries, such as “federal emergency relief administration,” and queries of a broader nature, such as “fashion.” Entry points for visitors vary from the front page to collection pages to pages of individual images.
A wiki is a page or collection of pages that can be edited by a designated community. The designated community can be as small as one person or as large as the world (as is the case with Wikipedia). Pages are edited using a simplified markup language; however, there is no standard for this markup language, so different wiki software uses different syntaxes to accomplish the same task.3
Wikis can help communities manage collaborative efforts by providing a space for members to document activities, information gathering, code development, and a host of other data points. Using wiki software, it is possible to allow all members of a community access to pages for the purpose of editing; however, it is also possible to restrict access to certain pages. Some groups use wikis internally to manage projects and document problems and outcomes; other groups use wikis externally to present information to interested individuals and to collect feedback from their community as well.
Wikipedia is built on the MediaWiki software platform, which was developed specifically for Wikipedia. As such, the platform supports multilingual applications, scalability, editing of articles by subsection, and input by a large number of users.4 MediaWiki is just one of a number of wiki software programs available.
Wikipedia is organized by articles on specific topics; the articles cover every conceivable topic and then some. Some articles have a tendency to be vandalized more than others, and therefore have editors who closely monitor any changes made to the page. In these cases editors will often revert to the previous version without assessing whether the change made is vandalism or the addition of relevant content. This ability to revert to previous “incarnations” of the article is another powerful aspect of the MediaWiki software.
Suppose, for example, that a reader of this article in January 2010 wants to verify that the MediaWiki article in Wikipedia does indeed contain the content I refer to in note 4, but finds no mention on the current page of the MediaWiki platform supporting multilingual text. By clicking the “History” tab at the top of the MediaWiki article and selecting the 7 April 2009 revision (the entry closest to the access date listed in the note), the reader can check whether the article mentioned multilingual functionality at the time I viewed it.
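The same check can also be scripted. The sketch below is an illustration rather than part of our workflow: it assumes English Wikipedia’s MediaWiki API endpoint (`https://en.wikipedia.org/w/api.php`) and builds a query for the newest revision of an article saved at or before a given moment, which is the revision a reader would have seen on that date. The helper name `revision_at_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: English Wikipedia's MediaWiki API endpoint.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def revision_at_url(title: str, timestamp: str) -> str:
    """Build an API query for the latest revision of `title` saved at or
    before `timestamp` (ISO 8601, UTC), i.e. the version live at that time."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": 1,              # only the single newest matching revision
        "rvstart": timestamp,      # begin listing at this moment...
        "rvdir": "older",          # ...and walk backward in time
        "rvprop": "ids|timestamp",
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(revision_at_url("MediaWiki", "2009-04-07T23:59:59Z"))
```

Opening the printed URL returns JSON containing a revision ID; that revision can then be viewed at `https://en.wikipedia.org/w/index.php?oldid=` followed by the ID.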
As of April 3, 2009, Wikipedia had over 2,800,000 articles, and its editorial policies are correspondingly lengthy and complicated. Although Wikipedia is touted as “the free encyclopedia that anyone can edit,”5 a 2006 article in the New York Times reported that it periodically has to prevent editing of some articles because of vandalism or disagreement about their content. The list of articles with restricted editing changes over time, and some articles that were once “protected” are now editable.6 Wikipedia’s policies are themselves nearly encyclopedic, covering content, point of view, behavior, and editing, and each of these broad categories contains sub-policies. For a quick overview of the types of Wikipedia policies, see the Wikipedia article “Wikipedia: List of Policies.”7
Wikipedia’s editorial policies include a style guide for creating articles, and it is worth taking a brief look at it before turning to how we added links. Wikipedia articles typically have a lead section, which acts as a summary or introduction to the topic and is generally accompanied by a table of contents, particularly in longer articles; body sections, whose headings and subsections divide the article into manageable chunks and help guide the user to the relevant section; and appendices, which cover “see also” references, footnotes, and external links. Wikipedia editors often review an article and ask that citations be included to support its assertions. One of Wikipedia’s “core content policies”8 is verifiability, and as such, articles often contain citations to other published materials.
In 2006 I noticed, via the server statistics collected by the Libraries, that our digital collections were starting to see a great deal of traffic from Wikipedia, the “free encyclopedia anyone can edit,” a go-to resource for many researchers, and one of the top ten global Internet sites.9 An investigation led me to believe that adding links from Wikipedia to related areas of our digital collections might be an effective way to integrate our digital collections more fully into the information “flow”10 of researchers. At the time, many librarians were (and I think still are) concerned about the quality of information that researchers were finding on Wikipedia, and thus tended to steer researchers toward higher-quality, vetted resources such as the Encyclopedia Britannica. Such encouragement is certainly warranted within a library, but researchers often don’t darken the door or the chat room of libraries. In a world of readily accessible, distributed information, it has become increasingly important for libraries to be represented in the places where people actually do their research. We believed that rather than “shun” sites such as Wikipedia because of uneven quality (perceived or otherwise),11 we should use these sites to direct researchers to a place where authoritative resources are available.
In a recent report of preliminary findings by Alison Head and Michael Eisenberg of the University of Washington Information School, researchers discovered that Wikipedia is “a unique and indispensible research source for students.”12 Students tend to start their research with Wikipedia, because it provides them with an overview of their topic; they use Wikipedia to gain a better understanding, to find search terms, and because it “shows another network of research sources that exist.” Libraries need to meet students at their point of need, and including links to our collections in Wikipedia is one way to accomplish this.
The process of adding links to Wikipedia was quite lengthy. With a collection of over 180,000 images, our first step was to analyze the content of the various digital collections offered online by the University of Washington Libraries and our partners. We did this using the “canned queries” on specific topics within each collection that we had previously set up in our CONTENTdm software. The graduate student assigned to the project used these topics to identify Wikipedia articles to which our links could be added. Over two hundred links were added to Wikipedia articles, often three or four links to our collections from a single article.
An example of a link to our collection can be seen from the following screen shots (Figures 1 and 2) of the Alaska–Yukon–Pacific Exposition article in Wikipedia:
In this article, as in all articles to which we added links, pointers back to the digital collections are included in the “External links” section of the article (see Figure 2). Clicking the link “University of Washington Libraries Digital Collections—Alaska–Yukon–Pacific Exposition Photographs” (the second link in Figure 2) takes the researcher to our collection page, which provides some context for the images.
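For readers unfamiliar with MediaWiki markup, an entry in an “External links” section is usually a bulleted line in the form `* [URL label]`, where the first space inside the brackets separates the address from the visible text. The small helper below formats such a line; the function name, URL, and label are hypothetical placeholders, not the actual links we added.

```python
def external_link_line(url: str, label: str) -> str:
    """Format one bullet for an "External links" section in MediaWiki markup.

    MediaWiki renders [URL label] as a link whose visible text is `label`;
    the first space inside the brackets separates the URL from the label.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError("expected an absolute http(s) URL")
    if " " in url:
        raise ValueError("spaces in the URL must be percent-encoded (%20)")
    return f"* [{url} {label}]"


# Hypothetical URL and label, for illustration only.
print(external_link_line(
    "http://example.org/collections/ayp",
    "Alaska-Yukon-Pacific Exposition Photographs",
))
# * [http://example.org/collections/ayp Alaska-Yukon-Pacific Exposition Photographs]
```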
At times a Wikipedia article editor would question the link or decline to have it included; in those cases, we corresponded with the editor through our user talk page or the article talk page. In our correspondence with editors, we always followed their guidance and suggestions. Being able to communicate with page editors was helpful because we were relative “newbies” to the Wikipedia community and wanted to comply with its policies.
In order to keep track of the UW collections and the corresponding Wikipedia articles, the graduate student developed a spreadsheet that included the initials of the person doing the Wikipedia editing, the dates the changes were made, and columns for tracking updates. This spreadsheet continues to be an invaluable tool for us, because it is easy to have a student use the spreadsheet to check articles to see whether the links have been removed.
Since the links were added to Wikipedia in the summer of 2006, our digital collections have seen an increase in traffic from Wikipedia, indicating that we are indeed somewhat more embedded in the information flow. In 2007, 5 percent of our referrals were from Wikipedia, while in 2008, this percentage dropped to 4.7 percent. This decrease in percentage, however, does not reflect a decrease in traffic. In fact, the traffic from Wikipedia increased by 6,139 visits. Rather, the decrease in percentage can be attributed to a greater number of referrals from other sources. In short, Wikipedia continues to be useful for us.
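The apparent paradox of a falling share alongside rising traffic resolves once the share is written as a ratio. Writing $W_y$ for Wikipedia referrals and $T_y$ for total referrals in year $y$ (notation introduced here only for illustration), the reported shares give:

```latex
s_y = \frac{W_y}{T_y}, \qquad
\frac{s_{2008}}{s_{2007}}
  = \frac{W_{2008}/W_{2007}}{T_{2008}/T_{2007}}
  = \frac{4.7\%}{5.0\%} = 0.94
\quad\Longrightarrow\quad
\frac{T_{2008}}{T_{2007}} = \frac{1}{0.94}\cdot\frac{W_{2008}}{W_{2007}}
  \approx 1.064\,\frac{W_{2008}}{W_{2007}}
```

Since $W_{2008} - W_{2007} = 6{,}139 > 0$, Wikipedia referrals grew, but total referrals grew by a factor roughly 6 percent larger, which is enough to lower Wikipedia’s share.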
A student spends about four hours every six months checking the links in Wikipedia and noting changes in the spreadsheet. Over the last two and a half years, most of the links have remained in place. Those that have been removed were generally lost in a major overhaul of page content and organization, and we have not re-entered them: if a link is removed, we respect the views of the community and don’t add it back. Interestingly (to me at least), in most cases where links have been removed from an article, someone other than us has later edited the page to restore the links we originally added.
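Part of this semiannual check could be automated. The sketch below assumes that the tracking spreadsheet is exported to CSV with hypothetical `article` and `url` columns, and that English Wikipedia returns an article’s current wikitext at `index.php?title=...&action=raw`. It simply reports tracked URLs that no longer appear in the wikitext; it does not distinguish a deleted link from one rewritten with a different URL form (for example `https` instead of `http`), so a person should still review the results.

```python
import csv
import io
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumption: English Wikipedia serves an article's current wikitext at this URL.
RAW_URL = "https://en.wikipedia.org/w/index.php?title={title}&action=raw"


def fetch_wikitext(title: str) -> str:
    """Download the current wikitext of one article (needs network access)."""
    url = RAW_URL.format(title=quote(title.replace(" ", "_")))
    # Wikimedia asks clients to send a descriptive User-Agent header.
    req = Request(url, headers={"User-Agent": "link-check-sketch/0.1 (example)"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


def find_removed_links(rows, get_text=fetch_wikitext):
    """Return (article, url) pairs whose URL no longer appears in the article.

    `rows` is an iterable of dicts with hypothetical "article" and "url"
    columns, e.g. from csv.DictReader over the tracking spreadsheet.
    Each article is fetched once, even if it has several tracked links.
    """
    cache = {}
    removed = []
    for row in rows:
        title = row["article"]
        if title not in cache:
            cache[title] = get_text(title)
        if row["url"] not in cache[title]:
            removed.append((title, row["url"]))
    return removed


# Offline usage example: a dictionary lookup stands in for the network fetch.
sheet = io.StringIO(
    "article,url\n"
    "Space Needle,http://example.org/a\n"
    "Space Needle,http://example.org/b\n"
    "Bubbleator,http://example.org/c\n"
)
pages = {
    "Space Needle": "== External links ==\n* [http://example.org/a Photos]",
    "Bubbleator": "No external links here.",
}
print(find_removed_links(csv.DictReader(sheet), get_text=pages.__getitem__))
# [('Space Needle', 'http://example.org/b'), ('Bubbleator', 'http://example.org/c')]
```

Passing the fetch function as a parameter keeps the comparison logic testable without network access.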
One question on my mind since we started putting links in Wikipedia is how many people who visit a Wikipedia article eventually follow the links to our pages. Early in our exploration of Wikipedia, this data was simply not available; however, article traffic data has recently become available at the site Wikipedia Article Traffic Statistics.13 Using this site, I have been able to calculate the percentage of readers who continue on to our pages from various Wikipedia articles. Our server statistics package, Urchin, shows that in November 2008 we had ten visitors from the Bubbleator article on Wikipedia. The Wikipedia statistics site shows that the Bubbleator article received 248 visits that same month, so about 4 percent of the visitors to the Wikipedia page proceeded to our page. In contrast, the Space Needle article in Wikipedia received 18,508 visits in November 2008, and only eighteen visitors continued on to the Space Needle images on our Digital Collections site, just under 0.1 percent of the total. Not a very exciting number. Viewed in the context of our overall site statistics, however, the picture is much rosier: as I mentioned earlier, around 5 percent, or 27,500, of our visitors are downstream traffic from Wikipedia.
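The click-through figures above are simple ratios of two monthly counts. The sketch below computes them from the November 2008 numbers quoted in the text; the function name is hypothetical, and because the two counts come from different tools (Urchin referral visits and Wikipedia’s article traffic counts), the result should be read as an approximation rather than an exact conversion rate.

```python
def click_through_percent(downstream_visits: int, article_views: int) -> float:
    """Percentage of an article's visits that continued on to our site."""
    if article_views <= 0:
        raise ValueError("article_views must be positive")
    return 100 * downstream_visits / article_views


# November 2008 counts quoted above: (visits to our site, visits to the article)
bubbleator = click_through_percent(10, 248)
space_needle = click_through_percent(18, 18508)
print(f"Bubbleator: {bubbleator:.1f}%")      # Bubbleator: 4.0%
print(f"Space Needle: {space_needle:.2f}%")  # Space Needle: 0.10%
```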
As it turned out, Wikipedia considers adding links to one’s own collections a conflict of interest, although I want to reiterate that we always worked with editors to comply with any guidelines they directed us to. Spam, as in all other areas of the Internet, is prevalent in Wikipedia, and policies, guidelines, and electronic measures are all used to patrol for potential spam. Others who attempted to add links were not as fortunate as the University of Washington Libraries: many found themselves locked out of Wikipedia and labeled as spammers. Lengthy discussions on Wikipedia pages led Dirk Beetstra, a Wikipedia editor, to write a helpful page for archives interested in adding links, which includes sections such as “So why consider these link additions as spamming?” “How these link additions may be in violation of policies and guidelines,” and “Recommendations/solutions.”14 This page is enormously helpful, and I suggest that anyone considering a similar project read it.
Beyond using Wikipedia, adventurous archives could implement wiki software to let researchers ask questions and staff answer them, all in an open format. Privacy concerns notwithstanding (users would not need to add identifying information), this could be an excellent way to promote collections that are often hard to find, or to allow more casual discovery of a resource. Online finding aids are of enormous value but often function at the collection level; an archives wiki could provide a more “entry-level” look at an archives’ resources and could provide some item-level access to collections.
In the future, when we consider adding additional links to Wikipedia, we will start with the talk page for the article of interest and ask if folks think it would be a useful addition to the content and intent of the article. We might also do some deeper analysis of the article topic and our content and attempt to edit the Wikipedia article in a way that enhances the content while allowing us to create a citation link to our collections, rather than merely adding a link in the External Links section. By taking these actions, we hope to comply even more fully with the guidelines and policies of Wikipedia.
As mentioned previously, our interest in Wikipedia was piqued while analyzing server statistics for our digital collections site; we were reacting to a growing trend in the online community: the use of Wikipedia to discover information. A more proactive approach would, of course, be to look for additional sites such as Google’s Knol15 or Citizendium16 and begin working with those communities to add links to our collections, or to create articles or “knols.” With diminishing budgets, however, such resources should be examined closely for uptake by the online community: in analyzing server statistics over the last year, we have had one visit from Citizendium and only six from Knol.
Barrett, Daniel J. MediaWiki. Sebastopol, CA: O’Reilly Media, 2009.
Pressley, Lauren and Carolyn J. McCallum. “Putting the Library in Wikipedia.” Online 32, no. 5 (September 2008): 39-42.
Rosenzweig, Roy. “Can History be Open Source? Wikipedia and the Future of the Past.” The Journal of American History 93, no. 1 (June 2006): 117-146.
Woods, Dan and Peter Thoeny. Wikis for Dummies. Hoboken, NJ: Wiley, 2007.
Zentall, Lena and Camille Cloutier. “The Calisphere Wikipedia Project: Lessons Learned.” CSLA Journal 32, no. 1 (2008): 27-29.
1. “Metadata Implementation Group,” University of Washington Libraries, http://www.lib.washington.edu/Msd/mig/default.html (accessed May 18, 2009).
6. Katie Hafner, “Growing Wikipedia Refines Its ‘Anyone Can Edit’ Policy,” New York Times, June 17, 2006, http://www.nytimes.com/2006/06/17/technology/17wiki.html?emc=eta1 (accessed April 3, 2009).
7. “Wikipedia: List of Policies,” Wikipedia, http://en.wikipedia.org/wiki/Wikipedia:List_of_policies (accessed April 3, 2009).
8. “Wikipedia:Verifiability,” Wikipedia, http://en.wikipedia.org/wiki/Wikipedia:Verifiability (accessed April 10, 2009).
10. Lorcan Dempsey, “In the Flow,” Lorcan Dempsey’s Weblog On Libraries, Services and Networks, entry posted on June 24, 2005, http://orweblog.oclc.org/archives/000688.html (accessed November 26, 2006).
12. Alison J. Head and Michael B. Eisenberg, “Finding Context: What Today’s College Students Say about Conducting Research in the Digital Age” (Project Information Literacy Progress Report, February 2009).
14. Dirk Beetstra, “User:Beetstra/Archivists,” Wikipedia, http://en.wikipedia.org/wiki/User:Beetstra/Archivists (accessed April 8, 2009).