Sunday, February 11, 2007

Code under Free Data Mining Course Code

Ref: http://www.kdkeys.net/forums/thread/2043.aspx

This is a release implemented with C# for the .NET Framework. It is an old release (2001 version).

DISCUSS THIS RELEASE

Please post all questions and discussions about this release here: http://www.kdkeys.net/forums/thread/1803.aspx





APRIORI ALGORITHM

--------------------------------------------------------------------------------

The APriori Data Mining Algorithm is used to create association rules from sets of items.

The algorithm finds patterns of items that are frequently associated together.



--------------------------------------------------------------------------------

Introduction :

A set of items is defined as an itemset and represents items found in a dataset. An example of a set of items is {Computers, Books, Videos, DVDs, Games}.

Datasets can be obtained from real-world databases (shopping carts, retail transactions, data warehouses, and sales, order and purchase tables) or created artificially.



This form of data mining is known as association rule data mining. Association rule data mining discovers associations among items in a dataset.

An example of association rule data mining is market basket analysis. Market basket analysis is a form of association rule data mining that finds associations between the items that different customers purchased during their visits to a sales outlet.



The associations discovered during market basket analysis show a pattern of items that customers tend to buy together. E.g. during a market basket analysis of shopping baskets one may discover an association between computer books and CDs, suggesting that customers who buy computer books also tend to buy CDs. This can lead to a strategic placement of CDs and computer books so that more CDs are sold when computer books are purchased.

If customers tend to purchase computer books and CDs together, then having a sale on computer books can lift the sale of CDs.





--------------------------------------------------------------------------------

APriori Algorithm :

The APriori algorithm is used to analyze a list of transactions for items that are frequently purchased together. Support and Confidence are two measures used to describe the association rules that market basket analysis creates with the APriori algorithm. Consider a rule stating that the sale of e-books increases the sale of software.

E.g. a Support measure of 1% and a Confidence measure of 50% mean that 1% of the transactions analyzed contain purchases of both e-books and software, and that 50% of the customers who bought an e-book also bought software.

A set of items is known as an itemset. An itemset which contains k items is known as a k-itemset. E.g. a set of items {Books, CD, DVD, Video} is a 4-itemset.

The number of transactions that contain an itemset is known as the Frequency or Support Count of the itemset. If the number of transactions containing an itemset meets or exceeds the specified minimum support count, then the itemset is known as a Frequent Itemset.

E.g. the 2-itemset {Books, DVD} has a support count of 5 in the database of transactions below.



The database below contains 9 transactions. Find the support and confidence for the 2-itemset {Books, DVD}.

Using the market basket analysis APriori algorithm, create an association data mining rule between {Books} and {DVD}.

Firstly, the number of transactions that contain the 2-itemset {Books, DVD} is 5. The number of transactions containing the itemset {Books} is 6.

Consequently the support for the 2-itemset {Books, DVD} is (5/9) * (100%) = 55.6%

The confidence for the rule {Books} => {DVD} is (Support Count({Books, DVD}) / Support Count({Books})) * (100%).

Consequently the confidence for the rule {Books} => {DVD} is (5/6) * (100%) = 83.3%

Transaction 1: {Books, CD, DVD}

Transaction 2: {CD, Games}

Transaction 3: {CD, DVD}

Transaction 4: {Books, CD, Games}

Transaction 5: {Books, DVD}

Transaction 6: {CD, DVD}

Transaction 7: {Books, DVD}

Transaction 8: {Books, CD, DVD, Video}

Transaction 9: {Books, CD, DVD}
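
The support and confidence figures worked out above can be checked with a few lines of code. The following is a minimal Python sketch, not the C# code of the release; the function names support_count, support and confidence are hypothetical and chosen only for this illustration.

```python
# The nine transactions listed above, each stored as a set of items.
transactions = [
    {"Books", "CD", "DVD"},
    {"CD", "Games"},
    {"CD", "DVD"},
    {"Books", "CD", "Games"},
    {"Books", "DVD"},
    {"CD", "DVD"},
    {"Books", "DVD"},
    {"Books", "CD", "DVD", "Video"},
    {"Books", "CD", "DVD"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent => consequent."""
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / support_count(antecedent, transactions)


print(support_count({"Books", "DVD"}, transactions))                   # 5
print(support_count({"Books"}, transactions))                          # 6
print(round(100 * support({"Books", "DVD"}, transactions), 1))         # 55.6
print(round(100 * confidence({"Books"}, {"DVD"}, transactions), 1))    # 83.3
```

Note that confidence is directional: the rule {DVD} => {Books} divides by the support count of {DVD} instead and so gives a different value.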





The APriori algorithm finds the support count of itemsets and the confidence of the rules derived from them, and eliminates from the final list of rules any that do not meet the minimum support count and confidence.

Considering the list of transactions above, the algorithm will perform the following steps for a minimum support count of 3 :

1. The APriori algorithm creates a 1-itemset candidate itemset from the list of unique items {Books, CD, DVD, Games, Video}.

2. The support count of each item in the list is obtained, and any item that does not satisfy the minimum support count is eliminated from further analysis, creating the 1-itemset frequent itemset.

3. The 1-itemset frequent itemset is joined with itself to create a 2-itemset candidate itemset.

4. The steps taken for the 1-itemset candidate itemset are repeated for the 2-itemset candidate itemset.

5. The steps above are repeated until the frequent itemset is empty or no new candidate itemsets can be generated.

6. A confidence measure is created for each rule generated from the frequent itemsets.
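
The steps above can be sketched in Python as follows. This is an illustrative sketch only and not the C# code of the release. The function names apriori and rules are hypothetical, the subset-pruning check in the join step is the usual Apriori refinement rather than something the text above spells out, and the minimum confidence of 0.8 is an assumed threshold picked for the example.

```python
from itertools import combinations

# The nine transactions from the example above.
transactions = [
    {"Books", "CD", "DVD"}, {"CD", "Games"}, {"CD", "DVD"},
    {"Books", "CD", "Games"}, {"Books", "DVD"}, {"CD", "DVD"},
    {"Books", "DVD"}, {"Books", "CD", "DVD", "Video"}, {"Books", "CD", "DVD"},
]


def apriori(transactions, min_support_count):
    """Return {itemset: support count} for every frequent itemset."""
    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Step 1: the 1-itemset candidates are the unique items.
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Step 2: keep only candidates that meet the minimum support count.
        level = {c: count(c) for c in candidates}
        level = {c: n for c, n in level.items() if n >= min_support_count}
        if not level:
            break
        frequent.update(level)
        # Step 3: join the frequent k-itemsets with themselves to build
        # (k+1)-itemset candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Assumed refinement: drop candidates with an infrequent subset.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for each qualifying rule."""
    found = []
    for itemset, n in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                conf = n / frequent[a]  # every subset of a frequent itemset is frequent
                if conf >= min_confidence:
                    found.append((set(a), set(itemset - a), conf))
    return found


frequent = apriori(transactions, min_support_count=3)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)

for antecedent, consequent, conf in rules(frequent, min_confidence=0.8):
    print(sorted(antecedent), "=>", sorted(consequent), round(conf, 3))
```

With a minimum support count of 3, {Games} (count 2) and {Video} (count 1) are eliminated in the first pass. The frequent itemsets are {Books} 6, {CD} 7, {DVD} 7, {Books, CD} 4, {Books, DVD} 5, {CD, DVD} 5 and {Books, CD, DVD} 3. At the assumed 0.8 confidence threshold the only surviving rule is {Books} => {DVD}, with confidence 5/6.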




