Data Mining
2014 - 2020
Overview
Data mining is commonly defined as the analysis of data to discover patterns and trends that support decision-making and make the data useful. Today the data involved is often big data, so the analysis must also manage memory and running time.
In machine learning, the focus is on learning models; in this module, the focus is on the data and how to make it useful. We therefore classify some popular big data structures and their associated problems, and then present methods for solving them.
Syllabus
- Data mining definition
- Overlap with Databases and Machine learning
- Descriptive methods and Predictive methods
- Document Retrieval
- distance measures, Jaccard similarity, cosine similarity, TF-IDF
- Shingling, Min-Hashing, Locality-Sensitive Hashing
- Clustering
- Hierarchical Clustering
- k-means
- BFR, CURE
- Link Analysis
- Graph data
- Ranking nodes
- PageRank
- Association rules mining
- Market-basket model
- frequent itemsets
- A-Priori algorithm
- PCY algorithm
- Recommendation Systems
- Content-Based RS
- Collaborative filtering
- Hybrid Methods
- Mining Social-Network Graphs
- Girvan-Newman, Modularity, Betweenness
- Spectral Clustering
- AGM, BigCLAM
- Large-Scale Machine Learning
- Supervised Learning
- k-Nearest Neighbor
- Perceptron
- SVM
- Decision Trees (DT)
- Generative Models
- Variational Autoencoder
- GAN