Data Mining

2014 - 2020

Overview

By most definitions, data mining is the science of analyzing data to discover patterns and trends that support decision-making and make the data useful. Today, however, we often face big data, whose analysis requires careful management of memory and time.
Machine learning focuses on learning models, whereas this module focuses on the data itself and how to make it useful. We therefore classify some popular big data structures and their related issues, and then present methods for solving them.

Syllabus

Data mining definition
- Overlap with Databases and Machine learning
- Descriptive methods and Predictive methods
Document Retrieval
- Distance measures, Jaccard similarity, cosine similarity, TF-IDF
- Shingling, Min-Hashing, Locality-Sensitive Hashing
Clustering
- Hierarchical Clustering
- k-means
- BFR, CURE
Link Analysis
- Graph data
- Ranking nodes
- PageRank
Association rules mining
- Market-basket model
- Frequent itemsets
- A-Priori algorithm
- PCY algorithm
Recommendation Systems
- Content-Based Recommendation Systems
- Collaborative filtering
- Hybrid Methods
Mining Social-Network Graphs
- Girvan-Newman, Modularity, Betweenness
- Spectral Clustering
- AGM, BigCLAM
Large-Scale Machine Learning
- Supervised Learning
- k-Nearest Neighbor
- Perceptron
- SVM
- Decision Trees (DT)
Generative Models
- Variational Autoencoder
- GAN