Victor's Final Year Project: January 2007

Wednesday, January 31, 2007

Meeting with supervisor

After the meeting with my supervisor, Dr Yuen, Geoffrey and Wah shared their views on my project system.

Firstly, Wah asked what happens if the URLs shared among users are identical. For example, if User A saves hk.yahoo.com as a portal site and User B saves hk.yahoo.com as a search engine, the URL will be duplicated in the database. This is not a problem because, in the YouTube system for instance, a search for "Frilled Shark" returns more than four copies of the same video clip, which indicates that the search result is accurate. Dr Yuen suggested that one development approach at my stage is to improve the accuracy of searching across the various items in the system.

Afterward, Dr Yuen asked about the selling point of my project. I could hardly name the difference, so I responded by saying "Well, compared with one-man work, I do better than them as a team :p" Seriously, a future milestone of my project will probably be support for different character sets, especially Asian ones, because, for example, FURL.net has problems saving Traditional Chinese characters. I think I will need to tackle this problem.

Dr Yuen also suggested that I integrate the two algorithms into one and use this 'new' algorithm for the association analysis. The two algorithms, Apriori and FP-Tree, take different approaches.
The Apriori algorithm is based on counting the frequency of itemsets. If the frequency of an itemset is low, it is eliminated, and the remaining itemsets enter the next stage. Finally, the resulting itemsets are the tags most likely to be related to one another.
FP-Tree, on the other hand, is based on counting how often each tag is used and storing the tags in a frequency-ordered tree with a header table. The most frequent tags get the higher positions, and less frequent items are listed under them. With this method the association patterns remain complete, and the computational power required is lower than with the Apriori method.
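The level-wise pruning that Apriori performs can be sketched roughly as follows. This is only a minimal illustration, assuming each saved link is represented as a set of tags; the function name `apriori`, the `min_support` threshold and the sample data are hypothetical and are not taken from the project code.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Stage 1: count every single tag.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Build size-k candidates from the surviving (k-1)-itemsets and keep
        # only those whose (k-1)-subsets all survived the previous stage.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count the candidates and eliminate the infrequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical tag sets attached to four saved links.
tag_sets = [
    {"yahoo", "portal", "search"},
    {"yahoo", "search"},
    {"yahoo", "portal"},
    {"google", "search"},
]
frequent = apriori(tag_sets, min_support=2)
```

With this sample data, {yahoo, portal} and {yahoo, search} each survive with a count of 2, while {portal, search} is eliminated in the second stage, so no three-tag itemset is ever counted.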

Lastly, Dr Yuen suggested that the system should ultimately require LESS user involvement, for example: 1. no tag input by the user at all, and 2. elimination of unnecessary user interaction on the site, with newly added data embedded into the system automatically. I think the first can be achieved by combining AJAX with association-pattern suggestions, as Del.icio.us does. I think this function will be implemented in Phase 3.
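One possible way to produce such suggestions is to rank tags by how often they co-occur with tags the system already knows for a link. The sketch below is an assumption about how this might look, not the Del.icio.us method or the project's implementation; `suggest_tags`, `min_confidence` and the sample data are hypothetical names.

```python
from collections import Counter

def suggest_tags(tagged_links, seed_tags, min_confidence=0.5):
    """Rank tags co-occurring with seed_tags.

    confidence = (links containing the seed tags and the candidate tag)
                 / (links containing the seed tags)
    """
    seed = frozenset(seed_tags)
    matching = [frozenset(tags) for tags in tagged_links if seed <= frozenset(tags)]
    if not matching:
        return []
    # Count every other tag that appears together with the seed tags.
    co_counts = Counter(tag for tags in matching for tag in tags - seed)
    scored = [(tag, count / len(matching)) for tag, count in co_counts.items()]
    scored = [(tag, conf) for tag, conf in scored if conf >= min_confidence]
    # Highest confidence first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical tag sets of links already saved in the system.
saved_links = [
    {"yahoo", "portal", "news"},
    {"yahoo", "portal"},
    {"yahoo", "search"},
    {"google", "search"},
]
suggestions = suggest_tags(saved_links, {"yahoo"})
```

Here only "portal" passes the 0.5 threshold, since it appears in two of the three links tagged "yahoo". An AJAX call could return such a list while the user is adding a link, so that the tags are offered rather than typed.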

Progress log in January

The project has reached a major milestone, and the link suggestion and search engine used in the site are being built. In January, I finished writing the user interface of the site, including user registration, buddy list update, adding a new link, and adding a new link through the Bookmarklet.
In February, I will focus on developing the association analysis, which is used for frequent pattern mining. Once this functionality is implemented, the project will be almost done and will enter the testing phase.

The Alpha version of my project can be accessed through this link:
http://144.214.121.62/BSMS/Default.aspx

The latest changes and updates on my progress can be checked via the following link:
http://cslabvictor.blogspot.com/

Saturday, January 27, 2007

Bookmarklet tool

http://www.bookmarklets.com/tools/categor.html

Worked on the bookmark system, e.g. removing the JavaScript in the viewing pages and sending the search value to the search engine.

Friday, January 26, 2007

ASP.NET Web: The Official Microsoft ASP.NET 2.0 Site: Videos

http://www.asp.net/learn/videos/default.aspx?tabid=63#ajax

Videos for ASP.NET 2.0 beginners and AJAX developers

http://blogs.msdn.com/mattgi/archive/2007/01/23/asp-net-ajax-validators.aspx

Known issue after installing the AJAX v1.0 extension for ASP.NET 2.0

ASP.NET AJAX Validators

ASP.NET AJAX provides new APIs for registering script with the ScriptManager. Using these APIs allows controls to work well with partial rendering. Without them, controls placed inside an UpdatePanel won't work as expected. In previous CTP releases of ASP.NET AJAX, we had a set of validator controls that derived from the v2.0 controls and used the new APIs. This made them work well with ASP.NET AJAX. WindowsUpdate will soon include a version of System.Web that can take advantage of the new APIs. So the new controls which would have been redundant have been removed. However, the update isn't available yet and ASP.NET AJAX has been released. So, in the short-term, the source code for a set of custom validator controls that work with partial rendering is available here.

Sunday, January 21, 2007

Web mining for web personalization

Web mining for web personalization
Magdalini Eirinaki and Michalis Vazirgiannis
Athens University of Economics and Business

ACM Transactions on Internet Technology, Vol. 3, No. 1, Feb. 2003, pp. 1-27

Introduction

Web personalization is defined as any action that adapts the information or services provided by a Web site to the needs of a particular user or a set of users, taking advantage of the knowledge gained from the user’s navigational behavior and individual interests, in combination with the content and the structure of the web site.

Objective:

The objective of a web personalization system is to provide users with the information they want or need, without expecting from them to ask for it explicitly.

Content management is the process of classifying the content of a web site into semantic categories in order to make information retrieval and presentation easier for the users. Content management is very important for web sites whose content is increasing on a daily basis, such as news sites or portals.

Web personalization

… the analysis of the collected data, and the determination of the actions that should be performed. The ways that are employed in order to analyze the collected data include content-based filtering, collaborative filtering, rule-based filtering and Web usage mining.

Content-based filtering systems are solely based on individual users’ preferences. The system tracks each user’s behavior and recommends items to them that are similar to items the user liked in the past.

Collaborative filtering systems invite users to rate objects or divulge their preferences and interests to them. This is based on the assumption that users with similar behavior have analogous interests.
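The assumption behind collaborative filtering can be illustrated with a small user-based sketch: users whose ratings look alike are treated as having similar interests, and items rated by similar users are recommended. The rating scale, the cosine similarity measure, and the names `cosine_similarity`, `recommend` and `ratings` are illustrative assumptions, not something prescribed by the paper.

```python
from math import sqrt

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two {item: rating} dictionaries."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in ratings_a.values()))
    norm_b = sqrt(sum(v * v for v in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(ratings, user, top_n=3):
    """Recommend items the user has not rated, weighted by similar users' ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical ratings (1 to 5) that three users gave to saved links.
ratings = {
    "A": {"yahoo": 5, "google": 4},
    "B": {"yahoo": 5, "google": 5, "flickr": 4},
    "C": {"slashdot": 5},
}
picks = recommend(ratings, "A")
```

In this sample, user B rates the same links as user A in a similar way, so B's extra link "flickr" is recommended to A, while user C shares nothing with A and contributes nothing.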

The data mining methods that are employed are: association rule mining, sequential pattern discovery, clustering and classification. This knowledge is then used by the system in order to personalize the site according to each user’s behavior and profile.

User profiling

In order to personalize a web site the system should be able to distinguish between different users or groups of users. This process is called user profiling and its objective is the creation of an information base that contains the preferences, characteristics, and activities of the users.

Log analysis and web usage mining:

By applying statistical and data mining methods to the web log data, interesting patterns concerning the user’s navigational behavior can be identified, such as users and page clustering, as well as possible correlations between web pages and user groups.

The web usage mining process can be regarded as a three-phase process, consisting of the data preparation, pattern discovery, and pattern analysis phases. In the first phase, log data are preprocessed in order to identify users’ sessions, page views and so on. In the second phase, statistical methods, as well as data mining methods (such as association rules, sequential pattern discovery, clustering and classification), are applied in order to detect interesting patterns.
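The session identification step of the first phase might look like the sketch below. It assumes that each log entry already carries a user identifier (for example from a cookie) and a timestamp in seconds, and that a gap of more than 30 minutes starts a new session; the timeout value and the names `sessionize` and `log` are assumptions made for illustration.

```python
def sessionize(log_entries, timeout_seconds=1800):
    """Group (user, timestamp, url) entries into per-user sessions split on inactivity gaps."""
    sessions = []
    last_seen = {}  # user -> (timestamp of last request, index of that user's open session)
    for user, timestamp, url in sorted(log_entries, key=lambda e: (e[0], e[1])):
        previous = last_seen.get(user)
        if previous is None or timestamp - previous[0] > timeout_seconds:
            # First request of this user, or the gap is too long: open a new session.
            sessions.append((user, [url]))
            index = len(sessions) - 1
        else:
            # Continue the user's current session.
            index = previous[1]
            sessions[index][1].append(url)
        last_seen[user] = (timestamp, index)
    return sessions

# Hypothetical preprocessed log entries: (user id, seconds since start, page).
log = [
    ("u1", 0, "/home"),
    ("u1", 60, "/search"),
    ("u2", 30, "/home"),
    ("u1", 4000, "/home"),
]
user_sessions = sessionize(log)
```

For this sample log, user u1 ends up with two sessions because the last request arrives more than 30 minutes after the previous one, and u2 has a single session. The resulting page lists are the kind of input that the pattern discovery phase works on.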

Most important of all is the user identification issue. More accurate approaches for a priori identification of unique visitors are the use of cookies or similar mechanisms, or the requirement for user registration. A potential problem with such methods may be the reluctance of users to share personal information.

Web usage mining

More advanced data mining methods and algorithms, tailored appropriately for use in the Web domain, include association rules, sequential pattern discovery, clustering and classification. Association rule mining is used in order to reveal correlations between pages accessed together during a server session. It can also reveal associations between groups of users with specific interests.

Sequential pattern discovery is an extension of association rule mining in that it reveals patterns of co-occurrence incorporating the notion of time sequence. Clustering is used to group together items that have similar characteristics. In the context of web mining, we can distinguish two cases, user clusters and page clusters.

Page clustering identifies groups of pages that seem to be conceptually related according to the users’ perception. User clustering results in groups of users that seem to behave similarly when navigating through a Web site.

Classification is a process that maps a data item into one of several predetermined classes. In the web domain, classes usually represent different user profiles, and classification is performed using selected features that describe each user’s category. The most common classification algorithms are decision trees, naïve Bayesian classifiers, neural networks, and so on.

After discovering patterns from usage data, a further analysis has to be conducted. The exact methodology that should be followed depends on the technique previously used. The most common ways of analyzing such patterns are either by using a query mechanism on a database where the results are stored, or by loading the results into a data cube and then performing OLAP operations. Additionally, visualization techniques are used for easier interpretation of the results. Combined with information concerning the web site, useful knowledge can be extracted for modifying the site according to the correlation between user and content groups.

Research initiatives

Most of the efforts focus on extracting useful patterns and rules using data mining techniques in order to understand the users’ navigational behavior, so that decisions concerning site restructuring or modification can then be made by humans. In several cases, a recommendation engine helps the user navigate through a site.

A different approach is adopted by Zaiane et al. The authors combine OLAP and data mining techniques with a multidimensional data cube to extract implicit knowledge interactively. Their WebLogMiner system, after filtering the data contained in the web log, transforms them into a relational database. In the next phase a data cube is built, each dimension representing a field with all possible values described by attributes. OLAP technology is then used in combination with data mining techniques for prediction, classification and time-series analysis of web log data.

Pattern discovery is accomplished through the use of general statistical algorithms and data mining techniques such as association rules, sequential pattern analysis, clustering and classification. The results are then analyzed through a simple knowledge query mechanism, a visualization tool, or the information filter, which makes use of the preprocessed content and structure information to automatically filter the results of the knowledge discovery algorithms.

Saturday, January 13, 2007

A function like StringTokenizer in C#

Question: I have a string and I want to split it into words on spaces. Java has StringTokenizer, which was very convenient; is there something similar in C#?

using System;
using System.Text.RegularExpressions;

class SplitTest
{
    public static void Main()
    {
        // First example: split on a literal comma.
        // (For plain spaces, s.Split(' ') also works without a regex.)
        String s = "Hello, Houston,I,am,coming";
        Console.WriteLine("First example:");
        Console.WriteLine("Original string:" + s);
        Console.WriteLine("After splitting:");
        String[] tokens = Regex.Split(s, ",");
        for (int i = 0; i < tokens.Length; i++)
            Console.WriteLine(tokens[i]);

        // Second example: split on "|" together with any surrounding whitespace.
        s = "Hello | Houston | I | am | coming";
        Console.WriteLine("Second example:");
        Console.WriteLine("Original string:" + s);
        Console.WriteLine("After splitting:");
        tokens = Regex.Split(s, @"\s*\|\s*");
        for (int i = 0; i < tokens.Length; i++)
            Console.WriteLine(tokens[i]);
    }
}

Thursday, January 11, 2007

get RETURN VALUE from SYBASE stored procedure [Also applies to MSSQL2000]

http://forums.asp.net/thread/1333707.aspx

I can handle the value returned by a stored procedure in MSSQL2000 with the following code. Note that ExecuteScalar reads the string selected by the procedure ('Success' or 'Fail'), not its RETURN code:
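ASP.NET
try
{
    // Call the stored procedure with the current user and the buddy to add.
    storedProcCommand.CommandType = CommandType.StoredProcedure;
    storedProcCommand.Parameters.Add("@userid", Session["UserName"].ToString());
    storedProcCommand.Parameters.Add("@buddyid", tbNewBuddy.Text);
    cn.Open();

    // ExecuteScalar returns the first column of the first row of the result,
    // i.e. the 'Success' or 'Fail' string selected by the procedure.
    if (Convert.ToString(storedProcCommand.ExecuteScalar()) == "Success")
    {
        lbDisplay.Text = "New buddy added.<br />";
    }
    else
    {
        lbDisplay.Text = "Sorry, there is no such buddy.<br />";
    }
}// end try
catch (Exception ex)
{
    lbDisplay.Text = ex.ToString() + @"<br />";
}
finally
{
    // Always release the connection, even when an exception was thrown.
    cn.Close();
}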

Stored Procedure in SQL2000
CREATE PROCEDURE userBuddyAdd
    @userid nvarchar(50),
    @buddyid nvarchar(50)

With Recompile
AS
Declare
    @intCheckBuddyExist int,
    @intCheckAdd int,
    @intSameUserId int

-- Check if the buddy exists in userinfo
Select @intCheckBuddyExist = count(*) from userinfo where userid = @buddyid

-- Check if the buddy id is the user's own id
-- (compared directly, so it also works when the user has no buddies yet)
Select @intSameUserId = case when @userid = @buddyid then 1 else 0 end

-- Check if the buddy has already been added
Select @intCheckAdd = count(*) from userbuddy where userid = @userid and buddyid = @buddyid

-- Add the buddy only if it has not been added yet
if @intCheckBuddyExist = 1 and @intCheckAdd = 0 and @intSameUserId = 0
begin
    INSERT INTO userbuddy VALUES(@userid, @buddyid)
    if @@Error = 0 Goto SUCCESS
end

-- Report failure if the buddy does not exist in the list
if @intCheckBuddyExist = 0
    Goto PROBLEM
Return 0

PROBLEM:
    Select 'Fail'
    Return 1

SUCCESS:
    Select 'Success'
    Return 2

GO

Monday, January 08, 2007

http://support.microsoft.com/default.aspx/kb/925336

Error message when you try to install a large Windows Installer package or a large Windows Installer patch package in Windows Server 2003 or in Windows XP: "Error 1718. File was rejected by digital signature policy"

Article ID : 925336
Last Review : October 3, 2006
Revision : 1.1
SYMPTOMS
When you try to install a large Microsoft Windows Installer (.msi) package or a large Microsoft Windows Installer patch (.msp) package on a computer that is running Microsoft Windows Server 2003 or Microsoft Windows XP, you receive the following error message:
Error 1718. File FileName was rejected by digital signature policy.
CAUSE
This problem occurs when the computer has insufficient contiguous memory for Windows Server 2003 or Windows XP to verify that the .msi package or the .msp package is correctly signed.
WORKAROUND
To work around this problem, follow these steps:
1. Click Start, click Run, type control admintools, and then click OK.
2. Double-click Local Security Policy.
3. Click Software Restriction Policies.

Note If no software restrictions are listed, right-click Software Restriction Policies, and then click Create New Policy.
4. Under Object Type, double-click Enforcement.
5. Click All users except local administrators, and then click OK.
6. Restart the computer.
Important After you follow the previous steps, local administrators can install the .msi package or the .msp package. After the package is installed, reset the enforcement level by following the previous steps. In step 5, click All users instead of All users except local administrators.

Assessment result on Interim report

You got B grade for your interim report. Here is the comment from your supervisor:

------------------------------

The related work section is good and the report is well-organized. There are some minor mistakes in the formatting and the writing; the student has done reasonably good work.

Friday, January 05, 2007

Finished building user registration, add new item and modify user settings

In the past 10 days, user registration has been finished and tested. Users can now add items under their own account, and they can add other registered users as buddies by entering their userid. Once a buddy inserts a new item, the user can view that item on the buddy's latest update page.

Users can now change their settings, such as setting the default topic to the last-used one and searching by tags, by keywords, or by both.

The next major implementation will be the search function, which I think is the most difficult part of the project. I hope to finish the basic function first, namely phase one of the search function, which returns the related items based on the tag that the user enters in the search field. After that, it will be improved in phase two by adding functions such as searching for similar information based on the user's preferences and tag tendencies, which should return better results.
I expect phase one to be finished by 15th January, 2007.
