Victor's Final Year Project: October 2006

Tagging systems have become popular: these systems enable users to add keywords (i.e., tags) to Internet resources (e.g., web pages, images, videos) without relying on a controlled vocabulary.

Advantages: tagging can improve search, spam detection, reputation systems and personal organization while introducing new modalities of social communication and opportunities for data mining.

One approach to tagging has emerged in “social bookmarking” tools, where the act of tagging a resource is similar to categorizing personal bookmarks. In this model, tags allow users to store and collect resources and retrieve them using the tags applied. Similar keyword-based systems have existed in web browsers, photo repository applications and other collection management systems for many years; however, these tools have recently increased in popularity as elements of social interaction have been introduced, connecting individual bookmarking activities to a rich network of shared tags, resources, and users.

Social tagging systems, as we refer to them, allow users to share their tags for particular resources. In addition, each tag serves as a link to additional resources tagged the same way by others. Because of their lack of predefined taxonomic structure, social tagging systems rely on shared and emergent social structures and behaviors, as well as related conceptual and linguistic structures of the user community. Based on this observation, the popular tags in social tagging systems have recently been termed a folksonomy, a folk taxonomy of important and emerging concepts within the user group.

Another benefit of social tagging systems is that a shared pool of tagged resources enhances the metadata for all users, potentially distributing the workload for metadata creation among many contributors.

These systems may offer a way to overcome the Vocabulary Problem, first articulated by George Furnas et al., in which different users use different terms to describe the same things (or actions). This disagreement in vocabulary can lead to missed information or inefficient user interactions. The taxonomy of tagging systems articulated in this paper, and the results of our preliminary experiments on the relationship between tag overlap and social connection, both point to the possibility that thoughtful sociotechnical design of tagging systems may uncover ways to overcome the Vocabulary Problem without requiring either the rigidity and steep learning curve of tightly controlled vocabularies, or the computational complexity and relatively low success of purely automatic approaches to term disambiguation.

Resources: The relationship between resources and links is a well-researched area. Most prominently, PageRank [18] has made analysis of link structure on the web a household name.
Users: Analysis of social ties and social networks is an established subfield of sociology [25] and has received attention from physicists, computer scientists, economists and researchers in numerous other areas of study.
Tags: The aggregation and semantic aspects of tags have been discussed and debated at length [16]. This discussion has mainly focused on the quality of information produced by tagging systems and the possible tradeoffs between folksonomies and crafted ontologies [17,20]. Furthermore, the challenges of shared vocabularies for description have been studied in the information science and library science communities for many years [8].

Related work

Perhaps the most significant formal study of tagging systems appears in the work of Golder and Huberman [9]. The authors study the information dynamics in “collaborative tagging systems”, specifically the Del.icio.us system, including how the tags applied to individual resources (in the case of Del.icio.us, web resources) change, or more specifically stabilize, over time.

Golder and Huberman also discuss the semantic difficulties of tagging systems. As they point out, polysemy (when a single word has multiple related meanings) and synonymy (when different words have the same meaning) in the tag database both hinder the precision and recall of tagging systems. In addition, the different expertise and purposes of tagging participants may result in tags that use various levels of abstraction to describe resources: a photo can be tagged at the “basic level” of abstraction [14] as “cat”, at a superordinate level as “animal”, or at various subordinate levels below the basic level, such as “Persian cat”.
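To make the effect on precision and recall concrete, here is a minimal Python sketch. The function name `precision_recall` and the photo identifiers are made up for illustration and do not come from the paper. A searcher wants fruit photos and queries the tag "apple": polysemy pulls in a photo of a computer (hurting precision), while synonymy hides a photo tagged only "fruit" (hurting recall).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical tag database: resource -> set of tags.
tags = {
    "photo1": {"apple"},   # a fruit, tagged "apple"
    "photo2": {"apple"},   # a computer, also tagged "apple" (polysemy)
    "photo3": {"fruit"},   # a fruit, tagged with a different word (synonymy)
}

# Resources the searcher actually wants: the two fruit photos.
relevant = {"photo1", "photo3"}

# A plain tag lookup for "apple".
retrieved = {resource for resource, ts in tags.items() if "apple" in ts}

precision, recall = precision_recall(retrieved, relevant)
```

In this toy case both values come out at one half: one of the two retrieved photos is unwanted, and one of the two wanted photos is never found.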

Inherent in our model of tagging systems are connections or links between resources. As mentioned above, research on link-based systems in the context of the web is hardly new. The PageRank algorithm had a significant impact on the field and on the way we use the web today, by supplying a mechanism to assess the importance of web pages. Lately, link analysis has been suggested to help fight web spam [10] by identifying trusted resources. In tagging systems, similar concepts can utilize the information and trust in the social network and the links from users to resources (as well as between resources, as before) to reason about the importance and trust of users and resources.

Kleinberg [13] suggested an algorithm to identify web pages that are “hubs” and pages that are “authorities” in a linked graph of resources, given a query term. Because Kleinberg’s model takes the query term into account, the hubs and authorities approach is a step closer to our model. Chakrabarti et al. [5] extended Kleinberg’s work to include anchor text. Anchor text, the text that appears around a link to a certain resource, can be considered to have a similar role to a tag in our model. Traditionally, the anchor text is associated with the resource the link is pointing to. The exact way the text is picked and associated with the resource can vary; a tagging model may improve the comprehensiveness and accuracy of anchor-text based methods by treating the user and the resource separately in relevance metrics.
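The hubs and authorities idea can be sketched in a few lines of Python. This is a simplified illustration, not Kleinberg's full procedure: it assumes the query-specific subgraph has already been selected, it ignores anchor text, and it runs a fixed number of iterations. The names `hits` and `links` are hypothetical.

```python
from math import sqrt


def hits(links, iterations=50):
    """links: dict mapping each page to the list of pages it links to.

    Returns (hub, authority) score dicts, each normalized to unit length.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # A page's hub score is the sum of the authority scores it links to.
        hub = {p: sum(auth[d] for d in links.get(p, ())) for p in pages}
        # Normalize both vectors so the scores do not grow without bound.
        for scores in (auth, hub):
            norm = sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Two pages pointing at a third: "c" is the authority, "a" and "b" are hubs.
hub, auth = hits({"a": ["c"], "b": ["c"], "c": []})
```

In a tagging setting one could imagine replacing page-to-page links with user-to-resource links, but that mapping is only a suggestion here, not something these notes specify.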

Taxonomy of Tagging Systems

Some key dimensions of a tagging system’s design may have an immediate and considerable effect on the content and usefulness of tags generated by the system. For each dimension in our taxonomy, we note the ways in which the location of a system on this dimension may impact the behavior of the system. Some of the dimensions listed below interact; a decision along one of them may determine, or at least be correlated with, the system’s placement in another.

Tagging rights: Perhaps the most important characteristic of a tagging system design is the system’s restriction on group tagging. It can be restricted to self-tagging, where users only tag the resources they created, or allow free-for-all tagging, where any user can tag any resource. This is not the strict dichotomy it seems, as systems can allow varying levels of compromise. For instance, a system can choose the resources users are to tag or specify different levels of permissions to tag (as with the friends, family, and contact distinctions in Flickr). Likewise, systems can determine who may remove a tag, whether no one or anyone, the tag creator or the resource owner (e.g. Flickr). The implication for the nature of tags that emerge is that free-for-all systems are obviously broader, both in the magnitude of the group of tags assigned to a resource, and in the nature of tags assigned. For instance, tags that are assigned to a photo may be radically divergent depending on whether the tagging is performed by the photographers, their friends, or strangers looking at their photos.

Tagging support. The mechanism of tag entry can have great impact on tagging system behavior. Observed systems fall into three distinct categories: blind tagging, where a tagging user cannot view tags assigned to the same resource by other users while tagging (e.g. Del.icio.us); viewable tagging, where the user can see the tags already associated with a resource (Yahoo! Podcasts); and suggestive tagging, where the system suggests possible tags to the user (Yahoo! MyWeb 2.0). The suggested tags may be based on existing tags by the same user or on tags assigned to the same resource by other users. Suggested tags can also be generated from other sources of related tags, such as automatically gathered contextual metadata or machine-suggested tag synonyms. The implication of suggestive tagging may be a quicker convergence to a folksonomy: a suggestive system may help consolidate the tag usage for a resource, or in the system as a whole, much faster than a blind tagging system would. A convergent folksonomy is more likely to be generated when tagging is not blind. On the other hand, the suggestive model should be applied carefully so that the agreement is not too widespread. As for viewable tagging, one implication may be the overweighting of certain tags that were associated with the resource first, even if they would not have arisen otherwise.
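A suggestive tagging interface could be sketched roughly as follows. This is only an illustration under stated assumptions: the function `suggest_tags`, the `(user, resource, tag)` event format, and the equal weighting of the two sources are all hypothetical and not taken from any of the systems named above. It combines the two sources mentioned in the text: tags other users gave the same resource, and tags the same user has applied elsewhere.

```python
from collections import Counter


def suggest_tags(events, user, resource, limit=5):
    """events: iterable of (user, resource, tag) tuples already in the system.

    Returns up to `limit` candidate tags, most frequently supported first.
    """
    scores = Counter()
    for u, r, tag in events:
        if r == resource and u != user:
            # Tags assigned to this resource by other users.
            scores[tag] += 1
        elif u == user and r != resource:
            # Tags this user has already applied to other resources.
            scores[tag] += 1
    return [tag for tag, _ in scores.most_common(limit)]


events = [
    ("ann", "r1", "python"),
    ("bob", "r1", "python"),
    ("cat", "r2", "code"),
]
suggestions = suggest_tags(events, user="cat", resource="r1")
```

Because the first tags entered feed back into later suggestions, a scheme like this also shows how early tags can end up overweighted.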

Aggregation. Another related feature of group dynamics comes from the aggregation of tags around a given resource. The system may allow for a multiplicity of tags for the same resource, which may result in duplicate tags from different users; the authors call this approach the bag-model (e.g. Del.icio.us). Alternatively, many systems ask the group to collectively tag an individual resource, thus denying any repetition; the authors call this the set-model approach for tag input (e.g. YouTube, Flickr).

In the case of the bag-model, the system is afforded the ability to use aggregate statistics for a given resource to present users with the collective opinions of the taggers; for instance, the tags around a popular link on Del.icio.us can be shown to the user to help characterize the breadth of opinions of the taggers. Furthermore, these data can be used to more accurately find relationships between users, tags, and resources, given the added information of tag frequencies.
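The difference between the two aggregation models maps naturally onto two data structures. The sketch below is illustrative only; the function names and the `(user, resource, tag)` event format are assumptions, not part of the paper. A bag keeps per-tag counts, so frequencies are available, while a set keeps each tag once.

```python
from collections import Counter


def aggregate_bag(events):
    """Bag-model: keep every tag assignment, so duplicates become counts."""
    bags = {}
    for _user, resource, tag in events:
        bags.setdefault(resource, Counter())[tag] += 1
    return bags


def aggregate_set(events):
    """Set-model: each tag appears at most once per resource."""
    sets = {}
    for _user, resource, tag in events:
        sets.setdefault(resource, set()).add(tag)
    return sets


# Hypothetical tagging events: (user, resource, tag).
events = [
    ("ann", "url1", "python"),
    ("bob", "url1", "python"),
    ("bob", "url1", "tutorial"),
]
bags = aggregate_bag(events)
sets = aggregate_set(events)
```

Only the bag version can answer "which tag do most taggers agree on?", here via `bags["url1"].most_common(1)`.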

User Incentives

Incentives and motivations for users also play a significant role in affecting the tags that emerge from social tagging systems. Users are motivated both by personal needs and sociable interests. The motivations of some users stem from a prescribed purpose, while other users consciously repurpose available systems to meet their own needs or desires, and still others seek to contribute to a collective process.

Most analysis of tagging systems has been completed on data collected from the content bookmarking site Del.icio.us. The reason for choosing Flickr is that Flickr provides an alternative interpretation to the conclusions derived from those studies. In nearly every category within the system taxonomy, Flickr occupies an alternate space from Del.icio.us: it contains user-controlled resources as opposed to global ones; tagging rights are restricted to self-tagging (and at best permission-based, although in practice self-tagging is most prevalent) instead of a free-for-all; tags are aggregated in sets instead of bags; and finally, the interface mostly affords blind tagging instead of suggested tagging.

These design decisions shape the incentive structures that drive people to tag resources. Since Del.icio.us is largely task-focused, namely on storing bookmarks for future retrieval, organizational motivations are most dominant. While the social element of tagging is evident from the leveraging of the community contribution, a lack of communication systems (e.g. messaging or explicit social networks) deemphasizes non-organizational social incentives.

Flickr users, on the other hand, are also likely to tag for their own retrieval, but coupled with an abundance of communication mechanisms, the system design encourages gaming and exploratory tag uses. Users are primarily motivated by social incentives, including the opportunities to share and play.

Tag Usage

Tags are not mandatory in the Flickr usage model. Within a social tagging system, tags are typically an optional feature in a larger resource organization task. Like Del.icio.us, the Flickr interface prompts users for metadata about each resource identified: a title, a caption, and a list of tags. In both systems, the tag input comes third in the input interface, but it also differentiates them from other resource management tools.

In addition to tagging one’s own photos, the Flickr system also allows users to tag their friends’ photos. However, this feature is not widely used; of the 58 million tags the authors observed, only a small subset are of this type; an overwhelming majority of tags are applied by the owners of photos.

Tag usage patterns vary quite drastically among Flickr users, and as expected, so does the adoption of tagging behavior. Figure 2 shows the cumulative distribution function (CDF) for tag vocabulary size across the set of users. The value at a given point is the probability (Y-axis) that a random user has a set of distinct tags larger than the given collection size (X-axis). For example, the probability that a Flickr user has more than 750 distinct tags is roughly 0.1%. The distribution illustrates the fact that most users have very few distinct tags while a small group has extremely large sets of tags.
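A curve like the one described for Figure 2 could be computed along these lines. This is a sketch with made-up data and hypothetical function names; since the plotted value is the probability of having more than a given number of distinct tags, it is strictly a complementary CDF.

```python
def vocabulary_sizes(user_tags):
    """user_tags: dict mapping user -> iterable of tags (repeats allowed).

    Returns a dict mapping each user to their number of distinct tags.
    """
    return {user: len(set(tags)) for user, tags in user_tags.items()}


def fraction_with_more_than(sizes, threshold):
    """Empirical probability that a random user has more than `threshold` distinct tags."""
    if not sizes:
        return 0.0
    return sum(1 for s in sizes.values() if s > threshold) / len(sizes)


# Hypothetical users; "u4" has uploaded photos but never tagged.
user_tags = {
    "u1": ["cat", "cat", "pet"],
    "u2": ["cat"],
    "u3": ["cat", "pet", "dog", "park"],
    "u4": [],
}
sizes = vocabulary_sizes(user_tags)
curve = [(x, fraction_with_more_than(sizes, x)) for x in range(5)]
```

Evaluating `fraction_with_more_than(sizes, 750)` on the real data set is the kind of query behind the 0.1% figure quoted above.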

Fig. 3 shows the growth of distinct tags for 10 randomly selected users over the course of uploaded photos. The users were selected as both frequent uploaders (greater than 100 photos) and frequent taggers (greater than 100 tags). Each point on this graph shows the number of distinct tags (Y-axis) for a given user after the given photo number (X-axis). It is apparent from this plot that a number of different behaviors emerge from this social tagging system. In some cases (such as A in Fig. 3), new tags are added consistently as photos are uploaded, suggesting a supply of fresh vocabulary and a constant incentive for using tags. Sometimes only a few tags are used initially with a sudden growth spurt later on, suggesting that the user either discovered tags or found new incentives for using them, as with user B. For many users, such as those with few distinct tags in the graph, distinct tag growth declines over time, indicating either agreement on the tag vocabulary or diminishing returns on their usage. Despite the heavy usage of tags by each of the individuals whose tags are depicted in the figure, a number of classes of behavior have arisen, implying that the interaction between user, tag and utility is a varied one.
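Each line in a plot like Fig. 3 is a running count of distinct tags per user. A minimal sketch, assuming a user's photos are available as a list of tag lists in upload order (the function name and sample data are hypothetical):

```python
def distinct_tag_growth(photo_tags):
    """photo_tags: list of tag lists, one per photo, in upload order.

    Returns the number of distinct tags seen after each photo.
    """
    seen = set()
    growth = []
    for tags in photo_tags:
        seen.update(tags)
        growth.append(len(seen))
    return growth


# One hypothetical user: four photos, the third untagged.
growth = distinct_tag_growth([["cat"], ["cat", "pet"], [], ["dog", "pet"]])
```

A steadily rising list corresponds to the behavior of user A, a flat start followed by a jump to user B, and a curve that levels off to users whose vocabulary has settled.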

Vocabulary Formation

All tagging systems mentioned in this paper are arguably social in nature; in some cases the social aspect comes from leveraging the community’s collective intelligence, and in others there is explicit social interaction around use of tags.

Conclusions

Social tagging systems have the potential to improve on traditional solutions to many well-studied web and information system problems. Such problems include personalized or biased link analysis, organizing information, identifying synonyms and homonyms, building networks of trust to combat link spam, monitoring trends and drift in information systems, and more. The prospects of reasoning about tags, users and resources in unity are encouraging.

Finally, by no means do we contend that the design taxonomy and incentive taxonomy we describe are complete. New uses for tagging systems are invented every day; users of such systems appropriate them with an ever-changing set of goals, motives, and aspirations. We hope that the taxonomy can serve as a foundation for researchers and enable a more complete understanding of the constraints and affordances of tag-based information systems.

Wednesday, October 18, 2006

HT06, Tagging Paper, Taxonomy, Flickr, Academic article, To read
