> Exploring Inter Tagger Consistency Measures
(source: Kipp, Margaret EI in 20th Annual SIG/CR Classification Research Workshop, American Society for Information Science and Technology, Vancouver, BC, 6-11 November 2009 / poster deposited on E-LIS)
“Kipp and Campbell (2006) examined tags assigned to the same URL in del.icio.us and determined that MDS and frequency graphs showed clusters of related terms as well as divergences between synonyms. Professional indexers too exhibit convergence and divergence in indexing behaviour, which has been measured in inter-indexer consistency studies. Leonard (1977) and Markey (1984) reviewed the results of multiple inter-indexer consistency studies, examining not only the levels of inconsistency, which varied widely, but also the level of indexing exhaustivity (number of terms assigned to each document), the method of collecting indexing data and the vocabulary size. The majority of inter-indexer consistency studies show high levels of inconsistency between indexers (Leonard 1977; Markey 1984). While inter-indexer consistency studies have traditionally compared the indexing terms used by a small group of indexers, it is possible to adapt some of the more common measures to be used with large groups of indexers. A number of measures were examined in this study to determine which provide the greatest ability to distinguish between different indexers. This study used Salton’s Cosine measure, the Jaccard measure (also known as Hooper and Rolling’s measures; Markey 1984), Wolfram and Olsen’s Inter-indexer Consistency Density (Wolfram and Olsen 2007) and a Pairwise Jaccard measure (which compares all indexers to each other without the need for a centroid or known good set of index terms). This study is part of a larger study examining measures of convergence and divergence in tagging systems. One goal of the larger study is to examine different ways of analysing tag data to see which methods provide the most useful analyses of the structures which develop in tagging. By calculating a number of different inter-indexer consistency measures, it may be possible to make distinctions between tag lists to provide predictive analysis of tagging patterns.”
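To make the measures named in this abstract more concrete, here is a minimal sketch in Python of the Jaccard measure, a set-based form of Salton's Cosine measure, and a pairwise Jaccard average over all taggers. It is an illustration under assumptions, not the poster's actual method: the function names, the treatment of each tagger's terms as an unweighted set, the averaging of pairwise scores, and the sample tags are all hypothetical. Wolfram and Olsen's Inter-indexer Consistency Density is left out because the abstract does not define it.

```python
from itertools import combinations
from math import sqrt


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared terms divided by all distinct terms used by either tagger."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a: set[str], b: set[str]) -> float:
    """Set-based cosine: shared terms over the geometric mean of the set sizes."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def mean_pairwise_jaccard(tag_sets: list[set[str]]) -> float:
    """Average Jaccard score over every pair of taggers (no centroid needed)."""
    pairs = list(combinations(tag_sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical tags assigned by three users to the same bookmarked URL.
taggers = {
    "user_a": {"folksonomy", "tagging", "web2.0"},
    "user_b": {"tagging", "classification", "folksonomy"},
    "user_c": {"toread", "tagging"},
}

if __name__ == "__main__":
    a, b = taggers["user_a"], taggers["user_b"]
    print(f"Jaccard(a, b) = {jaccard(a, b):.3f}")
    print(f"Cosine(a, b)  = {cosine(a, b):.3f}")
    print(f"Mean pairwise Jaccard = {mean_pairwise_jaccard(list(taggers.values())):.3f}")
```

With these sample sets, user_a and user_b share two of four distinct terms, so their Jaccard score is 0.5. The pairwise average is lower because user_c overlaps with the others on only one term.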
> Information Organisation Practices on the Web: Tagging and the Social Organisation of Information
(source: Kipp, Margaret EI / presentation deposited on E-LIS)
“This talk (the public talk for my thesis) examines the phenomenon of social tagging from its early beginnings to its current level of prominence on a wide variety of websites in a series of linked studies examining the structures and patterns of tag term use to determine whether regular patterns appear that would support information organisation and retrieval.”
> Searching with Tags: Do Tags Help Users Find Things?
(source: Kipp, Margaret EI in 20th Annual SIG/CR Classification Research Workshop, American Society for Information Science and Technology, Vancouver, BC, 6-11 November 2009 / poster deposited on E-LIS)
“In traditional library indexing systems, the indexer was an individual trained in the rules of information organisation to assign keywords for important information about the physical media and the subject matter of the content. While other groups have been involved in creating index terms (for example, journal article authors who are asked to provide keywords with their submitted articles), these keywords generally have a small circulation and are not widely used. Collaborative tagging systems such as CiteULike (http://www.citeulike.org) allow users to participate in the classification of journal articles by encouraging them to assign useful labels to the articles they bookmark. Studies comparing the terminology used in tagging journal articles to indexer assigned controlled vocabulary terms suggest that many tags are subject related and could work well as index terms or entry vocabulary (Hammond et al. 2005; Kipp 2006; Kipp and Campbell 2006; Kipp 2007a). Some authors suggest that user classification systems demonstrate what vocabulary users actually use to describe concepts and that this could be incorporated into the system as entry vocabulary to the standard thesaurus terms (Mathes 2004; Morville 2005). However, the world of folksonomies includes relationships that would never appear in a library classification or thesaurus, including time and task related tags, affective tags and the user name of the tagger (Kipp 2007b; Kipp and Campbell 2006; Kipp 2006). These short term and highly specific tags and relationships suggest important differences between user indexing systems and professional indexing systems which must be considered in examining the usability of tagging systems for resource discovery. Users searching online catalogues and databases often express admiration for the idea of controlled vocabularies and knowledge organisation systems, but find it difficult to adapt their vocabulary to the thesaurus and find the search process frustrating (Fast and Campbell 2004).
Additionally, controlled vocabulary indexing has proven costly and has not proven to be truly scalable when dealing with digital information, especially on the web. Morville (2005) suggests that tagging systems could scale along with digital information on the web, allowing for some indexing of currently unindexed web materials. This study explores how users make use of an indexing system for enabling retrieval by performing an information retrieval study on a social bookmarking system and a more traditional online database in order to examine user search behaviour on the two different systems. This study asks the following questions: Do tags appear to enhance resource discovery? Do users feel that they have found what they are looking for? How do users find searching social bookmarking sites compared to searching more classically organised sites? Do users think that tags assigned by other users are more intuitive? Do tagging structures facilitate information retrieval? How does this compare to traditional structures of supporting information retrieval? The searchers were asked to search PubMed and CiteULike for information on a specific assigned topic. Screen capture software, a think aloud protocol and an exit interview were used to capture the impressions of the users when faced with traditional classification or user tags. This data was analysed to explore the use of indexing terms by the participants as well as their use of other features in each system that support information finding and refinding. Participants selected their own keywords for searches on both tools. At the end of the search process, participants were asked to make a list of what terms they would now use if asked to search for this information again.
Three sets of data were thus available for analysis: sets of initial and final keywords selected by the user, the recording of the search session and think aloud, and recorded exit interviews after the search session, all of which can be analysed to examine user impressions of the search process and the utility of the keywords in the process. Participants tended to prefer the search experience on the system used first, regardless of previous experience with either system. All users used multi-word keywords initially, which is unsurprising as they are in training to be librarians. At the end of the search process, when users were asked to generate a new list of keywords they would now use for the search, many separated their list of final keywords by tool, showing an awareness of the need to adapt a search to different systems. Items such as the presence of full metadata, abstracts and even full text links to articles were lauded, while lack of vocabulary terms and especially missing abstracts were deemed to be impediments to search. Participants found related article links and other newer features of systems to be a significant enhancement to the search process, and some participants reported or were seen using tags or user names in CiteULike for similar purposes. Many of the participants in this study made use of the related articles links provided by PubMed and discussed the possibilities presented by MeSH in PubMed and the tags on CiteULike, but did not find that the structures were in place to fully support browsing of related items by keyword or combination of keywords. As shown by Ockerbloom (2006), these webs of related items can be built automatically using existing thesaurus structures and displayed to the user. This suggests that the use of indexing structures to link related items would be worthwhile to users if they are able to see the connections between items as they browse.”
