CSE 5249: Data Analytics Seminar, Spring 2017
Instructor: Srinivasan Parthasarathy
Office: DL 693
Phone: 292-2568
Email: srini@cse.ohio-state.edu
Class Hours: TR 11:30-12:30
Office Hours: W by appointment
Introduction
With data now being collected at an unprecedented rate in almost every field of
human endeavor, there is a growing economic and scientific need to extract
useful information from it. Data mining is the process of automatically
discovering patterns, changes, associations, sequences, and anomalies in
massive databases. This research seminar will survey the main topics in data
mining and knowledge discovery as they relate to data stored in the form of
graphs and networks. Topics will be drawn from the following: mining structured
and semi-structured data; adapting classical techniques (classification,
clustering, association rules, and sequence similarity) to such data;
high-performance implementation issues (parallel and distributed data mining);
visualization of network and graph data; and application domains such as web
mining, scientific simulations, e-commerce, and bioinformatics.
Prerequisites
There are no formal prerequisites. However, students should ideally have taken
at least one of the following courses: database systems (CIS 670), a 500/600-level
statistics course, or parallel computing (CIS 720). The ability to program in
C/C++ and/or with statistical programming packages, and to work on semester-long
(team or individual) projects, is expected.
Reference Text
· Not applicable (this semester will focus on paper readings)
Class Format and Requirements
The class will be a mix of student presentations, paper discussions, and a
research-oriented project. Generally, one of you will introduce a topic, and
then we will discuss some of the latest work on that topic. You will have to
explain and defend what the paper says, as well as point out its weaknesses and
shortcomings as you see fit. The rest of the class is expected to contribute to
the discussion as well, and some points will be assigned for class
participation. Ideally, criticisms should be constructive, including the
identification of possible solutions.
Once a paper has been discussed in class, you will be expected to include it in
an annotated bibliography covering all the papers discussed during the semester,
which you will submit to me by the end of the semester. The best time to write
each entry is as soon as possible after the discussion in class, while all the
points covered are still fresh. The presentation order for the first few weeks
is now available. I have deliberately assigned some returning students to the
first few presentations so that students who are new to this course format can
get an idea of what to expect. Feedback forms can be downloaded here.
A sample critique (a very extensive one; I will not hold you to such a high
standard) is available here.
Each of you will be expected to focus on a research-oriented project. The
research component is emphasized: many of the projects started by students in
past offerings of this course have resulted in publications at prestigious
conferences and workshops. Projects may be done individually or in groups of
two (in a group, each member's tasks must be non-overlapping). During the first
week of class, I will discuss possible project topics and relevant references
with each of you individually, based on your interests. I will try to meet with
each of you during the first two weeks to help settle on a project. The project
is expected to culminate in a presentation during the last week of class and a
written report on the experimental results obtained.
The final grade will be determined as follows:
25% Class Participation and Presentations
50% Project
25% Annotated Bibliography
Last Updated: Jan 2017