Assignments


Project #1
(Due Date: 2 AM on November 20, 2013)

You need to develop a predictive model for the given data set, which concerns credit card fraud prediction. The data set was originally released for the PAKDD 2013 data mining competition. It is in CSV format and consists of two files: a training file with 500,000 records and a testing file with 262,966 records.


Before applying the classification techniques you have learned in this course, you need to prepare the data. This includes excluding (or transforming) features with an extremely high percentage of missing values, handling the remaining missing values, discretizing certain attributes, reducing the number of features, etc.
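As an illustration outside KNIME, the preparation steps above might be sketched in Python with pandas roughly as follows. The 50% missing-value cutoff, median/mode imputation, dropping constant columns as the feature-reduction step, equal-frequency binning into four bins, and the demo column names (`amount`, `channel`, etc.) are all assumptions chosen for this example, not requirements of the project.

```python
import pandas as pd


def prepare(df: pd.DataFrame, target: str = "Target_Label",
            max_missing: float = 0.5, n_bins: int = 4) -> pd.DataFrame:
    """Minimal data-preparation sketch; all thresholds are illustrative assumptions."""
    features = df.drop(columns=[target])

    # 1. Exclude features whose fraction of missing values exceeds max_missing.
    missing_frac = features.isna().mean()
    features = features.loc[:, missing_frac <= max_missing].copy()

    # 2. Impute the remaining gaps: median for numeric columns, mode otherwise.
    for col in features.columns:
        if pd.api.types.is_numeric_dtype(features[col]):
            features[col] = features[col].fillna(features[col].median())
        else:
            features[col] = features[col].fillna(features[col].mode().iloc[0])

    # 3. Simple feature reduction: drop columns that are constant after imputation.
    features = features.loc[:, features.nunique() > 1].copy()

    # 4. Discretize numeric columns into equal-frequency (quantile) bins.
    for col in features.select_dtypes("number").columns:
        features[col] = pd.qcut(features[col], q=n_bins, duplicates="drop")

    return features.join(df[target])


# Hypothetical demo data; the real project files have different columns.
demo = pd.DataFrame({
    "amount": [10.0, None, 30.0, 40.0, 50.0, 60.0],
    "mostly_missing": [None, None, None, None, 1.0, None],
    "channel": ["web", "pos", None, "web", "web", "pos"],
    "constant": [7, 7, 7, 7, 7, 7],
    "Target_Label": [0, 0, 1, 0, 1, 0],
})
prepared = prepare(demo)
print(list(prepared.columns))  # ['amount', 'channel', 'Target_Label']
```

Here `mostly_missing` is excluded by the missing-value cutoff, `constant` is removed as uninformative, and `amount` is turned into quartile bins; the right choices for the actual data set depend on your own exploration.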

Use the F-Measure and the ROC area for class "1" of the "Target_Label" column as the benchmark.
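To make the two benchmark measures concrete, the sketch below computes them for the positive class "1" in plain Python. It assumes that "ROC value" means the area under the ROC curve, and the 0.5 decision threshold and the toy labels and scores are hypothetical. The pairwise AUC computation is quadratic and only meant to show the definition; for the full data set, a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.

```python
def f_measure(y_true, y_pred, positive=1):
    """F-Measure (F1) for the given positive class: 2PR / (P + R)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """Area under the ROC curve: probability that a random positive record
    scores higher than a random negative one (ties count as 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Quadratic pairwise comparison: fine for illustration, slow for 500,000 rows.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and a model's scores for class "1".
y_true = [1, 0, 1, 0, 0, 1]
y_score = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 threshold

f = f_measure(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(round(f, 3), round(auc, 3))  # 0.667 0.667
```

Note that the F-Measure depends on the chosen threshold, while the ROC area summarizes ranking quality across all thresholds, which is why reporting both is informative on an imbalanced fraud data set.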

You should perform most of your analysis in KNIME, but feel free to take advantage of other tools (such as Weka or Excel). Note that KNIME can connect to databases (such as MySQL or SQL Server); you may want to use that feature for operations such as removing or updating columns or rows.

You need to present your detailed findings in class.


Assignment #1
(Due Date: October 30)

You need to present a case study, white paper, or research paper describing an application of data mining. You will have 10-15 minutes to present your findings in class. Since we have not yet covered unsupervised learning, you should pick a classification application that uses classification trees, Naive Bayes, or neural networks.