CIS 788: Data Mining and Knowledge Discovery: Techniques, Systems Support and Applications
Spring 2009
Instructor: Srinivasan Parthasarathy
Office: DL 693
Phone: 292-2568
Email: srini@cse.ohio-state.edu
Class Hours: TR 10:30-12:30
Office Hours: W 9-11 or by appointment
Introduction
With the unprecedented rate at which data is being collected today in almost
all fields of human endeavor, there is an emerging economic and scientific need
to extract useful information from it. Data mining is the process of automatic
discovery of patterns, changes, associations, sequences and anomalies in
massive databases. This research seminar will survey the main topics in data
mining and knowledge discovery. Topics will be drawn from among: mining
techniques (classification, clustering, association rules, sequence similarity,
etc.); high-performance implementation issues (parallel/distributed data mining,
active resource-aware data mining); and application domains such as web mining,
scientific simulations, e-commerce and bioinformatics.
Prerequisites
There are no formal prerequisites. However, students should ideally have
taken at least one of the following courses: database systems (CIS 670),
statistics (a 500/600-level course), or parallel computing (CIS 720). The
ability to program in C/C++ and/or with statistical programming packages, and
to work on quarter-long (team/individual) projects, is expected.
Reference Text
· Introduction to Data Mining, Tan, Steinbach and Kumar, Addison Wesley, 2006.
· Data Mining: Concepts and Techniques, J. Han & M. Kamber, Morgan Kaufmann, 2006.
Class Format and Requirements
The class will be a mix of student presentations, paper discussions and a
research-oriented project. Generally, one of you will introduce a topic, and
then we'll discuss some of the latest work on that topic. As presenter, you
will have to explain and defend what the paper says, as well as present its
weaknesses and shortcomings as you see fit. The rest of the class will be
expected to contribute to the discussion as well, and some points will be
assigned for class participation. Ideally, criticisms should be constructive
in nature and include possible remedies.

You will be expected to compile an annotated bibliography covering all the
papers discussed during the quarter and submit it to me by the end of the
quarter. The best time to write each entry is as soon as possible after the
paper is discussed in class, while the points covered are still fresh.

A general list of introductory papers is available, but the papers we choose
to discuss may lie outside that list. The presentation order for the first few
weeks will be available soon. I have specifically picked some experienced
students for the first few presentations so that students who are new to this
form of course can get an idea of what to expect. Feedback forms are available
for download.

Each of you will be expected to focus on a research-oriented project. The
research component is stressed, as evidenced by the fact that many of the
projects started by students taking this course over the years have resulted
in publications in prestigious conferences and workshops. Projects may be done
individually or in groups of two (in a group, each member's tasks will be
non-overlapping). Project topics, along with relevant references, will be
discussed individually with each of you during the first week of class, based
on your interests. I will try to meet with each one of you during the first
two weeks to help determine projects. The project is expected to culminate in
a presentation during the last week of class and a report on the experimental
results obtained.
The final grade will be determined as follows:
25% Class Participation and Presentations
50% Project
25% Annotated Bibliography
Last Updated: April 2009