CSE 640: Graph Mining and Management (Fall 2020)

Instructor: A. Erdem Sariyuce (erdem AT buffalo.edu)
Class hours: MW 3:30-4:50, Park 146
Office hours: TBA, Davis 323

Course Description

Graphs are a universal language to deal with the complex data in science, nature, and technology. Scale and complexity of real-world networks (graphs) pose unprecedented challenges in the area of computer science. This course introduces several important problems and algorithms being used in many applications where the input data is a graph. Students will learn the challenges and problems in large-scale algorithmic graph mining, work on managing the graph data by handling the irregular computations that do not fit to traditional CPU architectures, be familiar with the recent advances in the area, and get hands-on experience with the graph mining and management research via semester-long project.

This course satisfies Master's Project requirement.

This course falls under Artificial Intelligence (AI) focus area and can be counted towards Master's or PhD's course requirements.

Prerequisites

CSE 531: Analysis of Algorithms or the permission of the instructor. Assignments have programming components, hence a fundamental ability to code in C, C++, Java, or Python is expected. Depending on the subject, course project might also require some programming.

Course Materials

No textbooks required

Grading Policy

Homeworks: 40% : 4 * 10% (Four homeworks)
Midterm: 20% (In class)
Project: 40% : 5% (proposal) + 15% (progress)+ 20% (final)



Homeworks

Questions will be about course material and occasionally require data analysis and algorithm design. Non-trivial programming skills are required in some language. Homeworks will be done individually. All assignment are due in a week after the assignment date. All homeworks should be typed and submitted by the beginning of the class on the due date.

Project

Teams of 2 might be allowed, depending on the class size. The content and scope of the project will be decided in consultation with the instructor -- project ideas will be provided. Each team will meet and update the instructor every other week. There will be three milestones that includes reports and presentations (equal weighted); proposal, progress, and final.

- Proposal: Report will be a page long description of the project. It will be introduced with a short (5 mins) in-class presentation.
- Progress: Report can be up to 9 pages and also will be presented in-class (5 mins).
- Final: Report can be up to 9 pages and each team will have a 15-20 mins presentation time in the last week.

Piazza page

Tentative Schedule

  • Week 1: Introduction and Review of Basics.

    General introduction to real-world networks, interdisciplinary network science field and why computer science matters in that context, review on fundamental concepts in graph theory, overview of linear algebra and matrix operations.

  • Week 2: PageRank.

    Link analysis in networks, hubs and authorities, HITS algorithm, degree-driven metrics to determine important nodes and edges, use of PageRank in web and beyond. (Project Proposal Reports due)

  • Week 3: Graph Traversal and Maximum Flow.

    Fundamental and practical algorithms for graph traversal, breadth-first search, depth-first search, strongly connected components, direction-optimizing BFS, maximum flows & minimum cuts. (HW-1 out)

  • Week 4: Shortest Paths and Centrality.

    Single-source shortest paths, all-pairs shortest paths, practical algorithms for Katz, eigenvector, closeness, and betweenness centrality computations, adaptations for weighted graphs.

  • Week 5: Community Detection.

    Graph clustering problem, community definition and detection algorithms, evaluation metrics, modularity, graph conductance, types of communities in real-world networks, overlapping communities.

  • Week 6: Dense Subgraphs and Midterm Exam.

    Densest subgraph problem, dense subgraph models and measures, clique and quasi-cliques, connections to graph clustering and community detection, use of higher-order structures, core, truss, and nucleus decompositions. (HW-2 out)

  • Week 7: Graph Partitioning.

    Definition of graph partitioning, applications in scientific computing and data mining, sparse matrix vector multiplication, Kernighan-Lin algorithm, load balancing, multi-level methods, streaming graph partitioning.

  • Week 8: Network Motifs.

    Subgraph patterns, mesoscale structures, triangles and higher-order structures, motif distributions per node/edge, adaptation for directed networks, motifs on bipartite graphs and limitations, connections to subgraph isomorphism. (Project Progress Reports due)

  • Week 9: Heterogeneous and Non-traditional Networks.

    Directed networks and challenges, graphs with categorical and numerical node/edge labels, bipartite networks and challenges, $k$-partite networks and applications.

  • Week 10: Temporal Graphs I.

    Temporal walks, paths, and reachability, basic graph metrics in temporal graphs such as subgraphs, connectivity, and centrality, models of temporal networks, temporal network motifs. (HW-3 out)

  • Week 11: Temporal Graphs II.

    Streaming models for graph algorithms, graph sketches, incremental methods to maintain graph analytics such as centrality, community detection, pagerank, and $k$-core computation.

  • Week 12: Machine (and Deep) Learning on Graph.

    Representation learning on graphs, applications in downstream ML tasks, embedding nodes, embedding subgraphs, bipartite graph embeddings, graph neural networks.

  • Week 13: Parallel Graph Analytics I.

    Review of parallel computing basics, shared-memory parallel programming in OpenMP and Cilk, shared-memory graph processing frameworks, GPU algorithms and frameworks for graph processing. (HW-4 out)

  • Week 14: Parallel Graph Analytics II.

    Distributed-memory graph processing frameworks, vertex-programming model, specialized distributed graph algorithms (graph coloring, centrality, $k$-core computation), connections to the graph partitioning. (Project Final Reports due)