Saturday, November 11, 2006

Draft: The Data Mining of Collaborative Bookmark-Sharing Systems

Introduction

In recent years, online bookmarking systems have appeared on the Internet. Users, referred to here as authors, can share their web page bookmarks through centralized portals such as Del.icio.us, where the bookmarks are categorized by tagging: adding keywords to Internet resources without relying on a controlled vocabulary. The tagged information is therefore indexed, and users can easily find categorized web pages by their tags. This process is called Collaborative Tagging, which “describes the process by which many users add metadata in the form of keywords to shared content” (Golder S.A. and Huberman B.A., 2006).
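
The indexing step described above can be pictured as an inverted index from free-form tags to bookmarked URLs. The following is a minimal sketch, assuming each bookmark arrives as a (user, url, tags) record; the class TagIndex and its methods are hypothetical and do not describe how Del.icio.us itself was built.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical inverted index from tags to bookmarked URLs."""

    def __init__(self):
        # tag -> set of URLs carrying that tag (any string is a valid tag:
        # there is no controlled vocabulary)
        self._by_tag = defaultdict(set)
        # (user, url) -> set of tags that this user assigned to this URL
        self._assignments = defaultdict(set)

    def add_bookmark(self, user, url, tags):
        """Record that `user` bookmarked `url` with the free-form `tags`."""
        for tag in tags:
            self._by_tag[tag].add(url)
            self._assignments[(user, url)].add(tag)

    def urls_for(self, tag):
        """Return every URL tagged with `tag`, sorted for stable output."""
        return sorted(self._by_tag.get(tag, set()))

    def tags_of(self, user, url):
        """Return the tags one user assigned to one URL."""
        return sorted(self._assignments.get((user, url), set()))


index = TagIndex()
index.add_bookmark("alice", "http://example.com/dm", ["datamining", "toread"])
index.add_bookmark("bob", "http://example.com/dm", ["datamining"])
index.add_bookmark("bob", "http://example.com/web", ["web"])
print(index.urls_for("datamining"))  # ['http://example.com/dm']
```

Looking up a tag is then a single dictionary access, which is what lets users reach categorized pages directly by tag.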

Related Work

Several data mining techniques are involved in Collaborative Tagging, such as searching databases for different kinds of web pages, interactive mining based on users’ preferences, and incorporating background knowledge into the categorized information in order to judge how interesting the data is. “Data mining system can uncover thousands of [interesting] patterns.” (Han J. and Kamber M., 2001). Because the content of the Internet is huge and still growing, it is difficult to manage web documents with the data mining methods traditionally used for text documents. In addition, the web is a “highly dynamic information source … its information is also constantly updated” (Han J. and Kamber M., 2001). Take search engines as an example: when data is categorized by an authorized party, such as the authors or a database administrator, that party cannot keep re-categorizing every web page to cope with the rapid growth of web data. Han and Kamber point out that a topic of any breadth may contain hundreds of thousands of documents, so a search can return a huge number of document entries of which only a small part is relevant. On the other hand, many documents that are highly relevant to a topic do not contain the keywords that define it.

Collaborative Tagging can solve some of these problems, since useful data is categorized by both the authors and the users in the community. The system filters and returns only useful data based on the relevant tags searched for, and other users can then search these categorized and “cleaned” data by tag. Unfortunately, only common tags return the most relevant web data, while tags usually serve personal purposes and common tags are rarely adopted.
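
The point about common versus personal tags can be illustrated with a simple heuristic: rank bookmarks by how many distinct users applied the searched tag, and flag tags that only one user has ever applied as likely personal. This is a hypothetical sketch assuming the data is a list of (user, url, tag) triples; the source does not specify a ranking method, and the function names and the one-user threshold are illustrative assumptions.

```python
from collections import defaultdict


def rank_by_tag(assignments, query_tag):
    """Rank URLs by how many distinct users applied `query_tag` to them.

    `assignments` is an iterable of (user, url, tag) triples.
    """
    users_per_url = defaultdict(set)
    for user, url, tag in assignments:
        if tag == query_tag:
            users_per_url[url].add(user)
    # More independent taggers is treated here as stronger evidence of
    # relevance; ties are broken alphabetically by URL.
    return sorted(users_per_url, key=lambda url: (-len(users_per_url[url]), url))


def personal_tags(assignments):
    """Return tags applied by exactly one user (hypothetical threshold)."""
    users_per_tag = defaultdict(set)
    for user, _url, tag in assignments:
        users_per_tag[tag].add(user)
    return sorted(tag for tag, users in users_per_tag.items() if len(users) == 1)


triples = [
    ("alice", "http://example.com/a", "datamining"),
    ("bob", "http://example.com/a", "datamining"),
    ("carol", "http://example.com/b", "datamining"),
    ("alice", "http://example.com/b", "myproject"),
]
print(rank_by_tag(triples, "datamining"))  # ['http://example.com/a', 'http://example.com/b']
print(personal_tags(triples))  # ['myproject']
```

Counting distinct users rather than raw tag occurrences keeps a single user who tags the same page repeatedly from inflating its rank.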

Tagging can eliminate word inflections if a lemmatization engine, which determines the lemma of a given word, is triggered at the input stage. “Folksonomies [Collaborative Tagging] are characterized by flaws that formal classification systems are designed to eliminate including polysemy and synonyms.” (Wikipedia, 2006). Many data collection systems use tagging to classify data because people believe that categorization by users is more accurate and human-readable than an artificial classifier based on statistics. “Tagging system has the potential to improve search, spam detection, reputation systems and personal organization while introducing new modalities of social communication and opportunities for data mining.” (Marlow C., Naaman M., Boyd D. and Davis M., 2006).

However, applying some of the regulation used in traditional data mining techniques to filter the tagged data can produce more accurate results than simply returning all the “unclear” tagged information created by the authors. One such technique is data cleansing, which can remove “low-quality, redundant or nonsense metadata”, although Guy and Tonkin also warn of “the potential risks of tidying too neatly and thereby losing the very openness that has made folksonomies so popular” (Guy M. and Tonkin E., 2006). This project aims to combine the freedom to choose the right tags for categorizing web data with the retrieval of useful information through data cleansing, in order to create some degree of relationship between the content providers (authors) and the users. Although a tagging system cannot replace a formal system such as a search engine, this project aims to improve the accuracy and relevance of the returned information and “[treat] this as ring the core quality that makes folksonomy tagging so useful.” (Guy M. and Tonkin E., 2006).
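
A cleansing pass of the kind proposed here might normalize each tag at the input stage and then drop nonsense, personal and redundant entries. The sketch below is only an illustration under stated assumptions: normalize() is a crude suffix-stripping stand-in, not a real lemmatization engine (it would wrongly turn “news” into “new”), and PERSONAL_TAGS is a hypothetical stop list rather than part of any existing system.

```python
import re

# Hypothetical stop list of personal-organization tags (an assumption).
PERSONAL_TAGS = {"toread", "todo", "me", "mine"}


def normalize(tag):
    """Crude stand-in for a lemmatizer: lowercase, trim, strip a plural 's'.

    A real lemmatization engine would use a dictionary and morphology;
    this rule only collapses simple plurals such as 'blogs' -> 'blog'.
    """
    tag = tag.strip().lower()
    if len(tag) > 3 and tag.endswith("s") and not tag.endswith("ss"):
        tag = tag[:-1]
    return tag


def cleanse(tags):
    """Normalize tags, then drop nonsense, personal and duplicate entries."""
    cleaned = []
    seen = set()
    for raw in tags:
        tag = normalize(raw)
        if not re.fullmatch(r"[a-z0-9]+", tag):
            continue  # empty, or contains punctuation/spaces: treated as nonsense
        if tag in PERSONAL_TAGS:
            continue  # personal-organization tag, of little use to other users
        if tag in seen:
            continue  # redundant once normalized
        seen.add(tag)
        cleaned.append(tag)
    return cleaned


print(cleanse(["Blogs", "blog", "toread", "!!!", "DataMining", " web "]))
# ['blog', 'datamining', 'web']
```

The rules are kept deliberately few, in line with Guy and Tonkin’s warning that tidying too aggressively can remove the openness that makes tagging useful.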


References:

Golder S.A. and B.A. Huberman. (2006). Usage Patterns of Collaborative Tagging Systems. Journal of Information Science, 32(2), 198-208.

Guy M. and E. Tonkin. (2006, January). Folksonomies: Tidying up Tags? [Online]. D-Lib Magazine, 12. Available: http://www.dlib.org/dlib/january06/guy/01guy.html

Han J. and M. Kamber. (2001). Data Mining: Concepts and Techniques. San Diego, CA: Academic Press.

Marlow C., M. Naaman, D. Boyd and M. Davis. (2006). HT06, tagging paper, taxonomy, Flickr, academic article, to read. Proceedings of the seventeenth conference on Hypertext and hypermedia, 31-40.

Wikipedia. (2006). Folksonomy. [Online]. Available: http://en.wikipedia.org/Folksonomy [2006, Nov. 8]
