Thursday, November 30, 2006

Web 2.0 from Dr Andy Chun

http://www.zdnetasia.com/blog/web2/

Saturday, November 11, 2006

Suggestions on tagging for classification

Although there are no standard guidelines on good tag selection practices, those in the folksonomy community have offered many ideas. Some “best practices” include:

  1. using plurals rather than singulars,
  2. using lower case,
  3. grouping words using an underscore,
  4. following tag conventions started by others, and
  5. adding synonyms.
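
As a rough illustration of the lower-case and underscore conventions in the list above, the following Python sketch normalizes a raw tag. The function name `normalize_tag` is hypothetical and the rule set is an assumption; plural handling and synonyms are left out because they need language-specific data.

```python
def normalize_tag(raw: str) -> str:
    """Lower-case a tag and join its words with underscores.

    Illustrative only: real systems may apply different conventions.
    """
    # str.split() with no argument splits on any run of whitespace
    # and drops leading and trailing whitespace.
    words = raw.lower().split()
    return "_".join(words)
```

With this sketch, a raw entry such as "  Web Design " would be stored as web_design.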

Many folksonomies allow users to modify their tags, and there is considerable scope for users to tidy up the entries that they have already created. Currently, tags are generally defined as single words or compound words, which means that information can be lost during the tagging process. Single-word tags lose the information that would generally be encoded in the word order of a phrase.

The commonness of compound tags, including tags that concatenate more than two words, may suggest that users miss the richness of sentence structure. A “non-breakable space” could be introduced, although many compound tags are already produced using separator characters, as in this_is_tag.

Several del.icio.us taggers have established a private pseudo-hierarchy of terms by adopting tag conventions that resemble directory structures, such as Programming/C++ and Programming/JAVA.
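
To show how such directory-style tags could be read as a hierarchy, here is a minimal Python sketch that turns slash-separated tags into nested dictionaries. The function name `build_hierarchy` and the choice of "/" as the only separator are assumptions for illustration; del.icio.us itself treats these tags as flat strings.

```python
def build_hierarchy(tags):
    """Build a nested dict from slash-separated tags.

    Each path segment becomes a key; leaves are empty dicts.
    """
    tree = {}
    for tag in tags:
        node = tree
        for part in tag.split("/"):
            if part:  # ignore empty segments such as a trailing slash
                node = node.setdefault(part, {})
    return tree
```

Given the two tags from the example, the result would group both languages under a single Programming key.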

Smart systems

Alongside educating users, there is much that system creators can do to improve the end-data their systems are helping to create. There are two main ways in which improvements can be made. Firstly, much can be done at the point at which new resources are contributed to the system. Error-checking potentially accounts for a number of tag errors, although rather fewer misspellings occur than may be expected. Furthermore, some sites already make tag suggestions when users submit resources. Scrumptious, a recent Firefox extension, offers popular tags for every URL. Systems could easily suggest synonyms, expansions of acronyms, and the like when users type in their tags.
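
One way a system might offer suggestions at submission time is fuzzy matching against tags that are already popular. The sketch below uses `difflib.get_close_matches` from the Python standard library; the function name `suggest_tags` and the 0.6 similarity cutoff are assumptions chosen for illustration, not a description of how any of the sites mentioned actually work.

```python
from difflib import get_close_matches


def suggest_tags(typed, popular_tags, limit=3):
    """Return up to `limit` popular tags that closely resemble `typed`.

    The cutoff of 0.6 is an arbitrary illustrative threshold.
    """
    return get_close_matches(typed.lower(), popular_tags, n=limit, cutoff=0.6)
```

For example, `suggest_tags("Progamming", ["programming", "python", "design"])` returns `["programming"]`, so the user could be offered the common spelling before the tag is saved.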

Secondly, improvements can be made in the way systems search for resources already in the system. Synonym suggestions could also be made here, suggesting, for example, “ladybug” instead of “ladybird”.
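
A search-time synonym suggestion could be as simple as a lookup in a table of synonym groups. The following Python sketch assumes such a table exists; `SYNONYM_GROUPS` and `suggest_synonyms` are hypothetical names, and the only group shown is the ladybug/ladybird pair mentioned above.

```python
# Hypothetical synonym groups; a real system would need a much larger,
# curated or learned table.
SYNONYM_GROUPS = [{"ladybug", "ladybird"}]


def suggest_synonyms(tag, groups=SYNONYM_GROUPS):
    """Return alternative tags from the same synonym group, sorted."""
    for group in groups:
        if tag in group:
            return sorted(group - {tag})
    return []
```

A search for ladybird would then also offer ladybug, while a tag with no known synonyms yields an empty list.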

Clay Shirky notes:

“Tagging gets better with scale. With a multiplicity of points of view the question isn’t ‘Is everyone tagging any given link “correctly”’, but rather ‘Is anyone tagging it the way I do?’ As long as at least one other person tags something the way you would, you’ll find it – using a thesaurus to force everyone’s tags into tighter synchrony would actually worsen the noise you’ll get with your signal. If there is no shelf, then even imagining that there is one right way to organize things is an error.”

Conclusions

The investigations described in the article are brief, simple and relatively unscientific, as are the numbers provided within. That the results from both del.icio.us and flickr tended to be rather similar implies that they can be trusted, but only as much as a short, seat-of-the-pants investigation can be. Only those with direct access to the del.icio.us and flickr databases can be aware of the exact state of affairs and how it has changed across the months. For research purposes, the interesting features of the tags are not in the precise percentages of usage, but in the choice of tag, the choice of structure, and the choice of language. Somewhere around a third of tags were indeed “malformed”, in that they were beyond the grasp of a multilingual spell-checker for one reason or another. Many of these were not misspelt, but mis-constructed, some of the latter in a correctable manner.

Still, possibly the real problem with folksonomies is not their chaotic tags but that they are trying to serve two masters at once: the personal collection and the collective collection. So is it possible to have the best of both worlds? At the moment, many investigations of tag data are in progress, including how tags can be used for searching. As a consequence, development in this field tends to confine itself to methods for improving the quality of the user-contributed tags for this purpose. In practice, this involves promoting commonly chosen tags above single-use or infrequently used tags by various means. It is possible that the data collected through folksonomy tagging is more complete than we had imagined. Some single-use tags are explicitly designed as such, such as the latitude/longitude markers used by geotagging. Some may be perceived as valuable or helpful to the reader. Some may be infinitely helpful for search purposes, if only the information provided therein is accessed in an appropriate manner. Is it therefore preferable, rather than attempting to stamp out single-use or sloppy tags, to suggest that each item be tagged with a mixture of approaches, including several search-friendly keywords?
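
Promoting commonly chosen tags without discarding the rare ones can be sketched with a simple frequency count. In the Python example below, `rank_tags` and the `min_count` threshold are hypothetical; the idea is only that common tags are ranked first for search while single-use tags are kept alongside them rather than stamped out.

```python
from collections import Counter


def rank_tags(user_tags, min_count=2):
    """Split an item's tags into common tags (ranked) and rare tags (kept).

    `user_tags` is a list of tag lists, one per user who tagged the item.
    """
    counts = Counter(tag for tags in user_tags for tag in tags)
    # most_common() orders by descending count.
    common = [tag for tag, n in counts.most_common() if n >= min_count]
    rare = sorted(tag for tag, n in counts.items() if n < min_count)
    return common, rare
```

A geotag used by one person would land in the second list, still available to any search that knows how to read it.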



Source: http://webdoc.sub.gwdg.de/edoc/aw/d-lib/dlib/january06/guy/01guy.html

Folksonomies
Tidying up Tags?

Marieke Guy
UKOLN


Emma Tonkin
UKOLN

from D-Lib Magazine

Draft: The Data Mining of Collaborative Bookmark-sharing System

Introduction

In recent years, online bookmark systems have appeared on the Internet. Users, namely authors, can share their web page bookmarks through a centralized portal such as Del.icio.us, and these pages are categorized by tagging, which means adding keywords to Internet resources without relying on a controlled vocabulary. Information can therefore be indexed, and users can easily access the categorized web pages based on tags. This process is called collaborative tagging, which “describes the process by which many users add metadata in the form of keywords to shared content” (Golder S.A. and Huberman B.A, 2005).

Related Work

Several data mining techniques are involved in collaborative tagging technology, such as searching different kinds of web pages in databases, interactive mining based on users' preferences, and the incorporation of background knowledge into the categorized information in order to assess the interestingness of the data. “Data mining system can uncover thousands of [interesting] patterns.” (Han J. and Kamber M., 2001). Since the content on the Internet is huge and still growing, it is difficult to manage it with traditional text-document data mining methods. In addition, the web is a “highly dynamic information source … its information is also constantly updated” (Han J. and Kamber M., 2001). Take a search engine as an example: data is categorized by an authorized party, such as the authors or a database administrator, who cannot recategorize every web page often enough to keep up with the rapid growth of web data. Han and Kamber point out that a topic of any breadth may contain hundreds of thousands of documents, which can lead to a huge number of returned entries of which only a small part is relevant. On the other hand, many documents that are highly relevant to a topic do not contain the keywords that define it.

Collaborative tagging can solve some of these problems, since useful data is categorized by both authors and users in the community. Only the data matching the relevant tags is filtered and returned by the system, and other users can then search this categorized and “cleaned” data by tag. Unfortunately, only common tags return the most relevant web data; tags usually serve personal purposes, and common tags are rarely adopted.
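
Retrieval by tag, as described above, can be pictured as a lookup in an index from tags to bookmarked URLs. The Python sketch below is an assumption about how such an index might look, not a description of Del.icio.us internals; `build_tag_index` and `search` are hypothetical names.

```python
from collections import defaultdict


def build_tag_index(bookmarks):
    """Map each tag to the set of URLs that carry it.

    `bookmarks` is an iterable of (url, tags) pairs.
    """
    index = defaultdict(set)
    for url, tags in bookmarks:
        for tag in tags:
            index[tag].add(url)
    return index


def search(index, *tags):
    """Return the URLs that carry every one of the requested tags."""
    if not tags:
        return set()
    result = set(index.get(tags[0], set()))
    for tag in tags[1:]:
        result &= index.get(tag, set())
    return result
```

A query on two tags returns only the pages that both tags agree on, which also shows the limitation noted above: a page tagged only with a personal keyword is never reached by a search on the common one.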

Tagging can eliminate word inflections if a lemmatization engine, which determines the lemma for a given word, is triggered at the input stage. “Folksonomies (Collaborative Tagging) are characterized by flaws that formal classification systems are designed to eliminate including polysemy and synonyms.” (Wikipedia, 2006). Many data collection systems use tagging to classify data because people believe that user involvement in categorization is more accurate and human-readable than an artificial classifier based on statistics. “Tagging system has the potential to improve search, spam detection, reputation systems and personal organization while introducing new modalities of social communication and opportunities for data mining.” (Marlow C., Naaman M., Boyd D. and Davis M., 2006). However, applying some of the regulation used in traditional data mining techniques to filter the tagged data can produce more accurate results than simply returning all the “unclear” tagged information created by the authors. One such technique is the data cleansing process, which can remove “low-quality, redundant or nonsense metadata”, although there are “potential risks of tidying too neatly and thereby losing the very openness that has made folksonomies so popular” (Guy M., Tonkin E., 2006).

This project tries to combine the freedom to choose the right tags for categorizing web data with the retrieval of useful information through data cleansing, in order to create some degree of relationship between content providers (authors) and users. Although a tagging system cannot replace a formal system such as a search engine, this project can improve the accuracy and relevancy of the returned information and “[treat] this as being the core quality that makes folksonomy tagging so useful.” (Guy M., Tonkin E., 2006).
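
To make the cleansing step more concrete, the following Python sketch combines a stand-in for lemmatization with the removal of redundant and nonsense tags. It is only an illustration under stated assumptions: `LEMMAS` is a hypothetical lookup table in place of a real lemmatization engine, and the `min_length` rule is an arbitrary example of a "nonsense" filter.

```python
# Hypothetical lookup table standing in for a real lemmatization engine.
LEMMAS = {"blogs": "blog", "blogging": "blog", "mice": "mouse"}


def lemmatize(tag, lemmas=LEMMAS):
    """Return the lemma for a tag, or the tag itself if none is known."""
    return lemmas.get(tag, tag)


def cleanse(tags, min_length=2):
    """Lower-case, lemmatize, and de-duplicate tags, dropping nonsense entries."""
    seen = set()
    cleaned = []
    for raw in tags:
        tag = lemmatize(raw.strip().lower())
        # Drop tags that are too short or contain no letters or digits.
        if len(tag) < min_length or not any(ch.isalnum() for ch in tag):
            continue
        if tag not in seen:  # drop redundant tags
            seen.add(tag)
            cleaned.append(tag)
    return cleaned
```

Tags the table does not know are passed through unchanged, which is one cautious way to avoid tidying too neatly.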


References:

Folksonomy – Wikipedia. [Online]. Wikipedia. Available: http://en.wikipedia.org/Folksonomy [2006, Nov. 8]

Golder S.A. and B.A. Huberman. (2006). Usage Patterns of Collaborative Tagging Systems. Journal of Information Science, 32(2). 198-208.

Guy M. and E. Tonkin. (2006, January). Folksonomies: Tidying up Tags? [Online]. D-Lib Magazine 12. Available: http://www.dlib.org/dlib/january06/guy/01guy.html

Han J. and M. Kamber. (2001). Data Mining: Concepts and Techniques. San Diego, CA: Academic Press.

Marlow C., M. Naaman, D. Boyd and M. Davis. (2006). HT06, tagging paper, taxonomy, Flickr, academic article, to read. Proceedings of the seventeenth conference on Hypertext and hypermedia. 31-40.