# Machine Learning (Learning from Data)

2014 - 2020

## Overview

This is an introductory module in machine learning (ML) covering basic concepts, algorithms, and applications. ML is the science of learning from data: rather than being explicitly programmed, algorithms improve their performance as they encounter more examples. The parts of the module are arranged to develop the mathematical concepts step by step alongside their practical use.

It is an exciting journey that starts with supervised learning and regression, moves on to classification, and finally enters the world of unsupervised learning.

## Syllabus

- The Learning Problem
  - Introduction, is learning feasible?
  - Supervised, unsupervised, and reinforcement learning

- Linear Regression
  - Data and model
  - Simple linear regression model
  - Multiple regression
  - Batch vs. stochastic gradient descent
  - Probabilistic perspective
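
The gradient-descent item above can be sketched in a few lines. This is an illustrative example only (the data, learning rate, and iteration count are invented, not from the module): it fits y = w·x + b by batch gradient descent on mean squared error.

```python
def batch_gradient_descent(xs, ys, lr=0.05, steps=5000):
    """Fit y = w*x + b by batch gradient descent on mean squared error.

    Each step averages the gradient over the *whole* dataset, in
    contrast to stochastic variants that use one example at a time.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (invented for the example).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = batch_gradient_descent(xs, ys)
print(round(w, 2), round(b, 2))  # converges close to w = 2, b = 1
```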

- Assessing Performance
  - Theory of generalization; training error, generalization error, test error
  - Overfitting, learning curves
  - Bias and variance
  - Noise
  - Training/validation/test split, cross-validation
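
As a rough sketch of the cross-validation idea listed above, the following splits n example indices into k folds so each fold serves as the validation set once. The function name and splitting scheme are illustrative, not from the module.

```python
def k_fold_indices(n, k):
    """Yield (train_indices, val_indices) pairs for k-fold cross
    validation: each fold is held out as validation exactly once."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

# 10 examples, 5 folds: validation sets are [0,1], [2,3], ..., [8,9].
for train, val in k_fold_indices(10, 5):
    print(val)
```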

- Regularization
  - Curse of dimensionality
  - L2 regularization
  - Feature selection, forward and backward stepwise algorithms
  - Lasso, coordinate descent

- Kernel Regression
  - Parametric and non-parametric approaches
  - Distance metrics, nearest neighbor algorithm
  - From weighted k-NN to kernel regression
  - k-NN for classification
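
A minimal sketch of the k-NN classification item above, assuming Euclidean distance and a majority vote among neighbors (the toy data and function name are invented for illustration):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training points under Euclidean distance."""
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Two invented clusters of labelled points.
train = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
         ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b")]
print(knn_predict(train, (0.5, 0.5)))  # "a"
```

Weighted k-NN, and from there kernel regression, replaces the hard vote with distance-weighted contributions from all points.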

- Support Vector Machines
  - Hard margin and soft margin
  - Convex optimization: the Lagrangian, primal and dual problems
  - Sub-gradient descent
  - RBF kernel

- Linear Classifiers
  - Decision boundaries
  - Overfitting and regularization in logistic regression
  - Classification metrics, precision and recall

- Artificial Neural Networks
  - Definition of a neural network
  - Feedforward neural networks and backpropagation
  - Multi-layer perceptrons
  - Recurrent neural networks
  - Convolutional filters
  - Autoencoders
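
To illustrate what a feedforward pass computes, here is a minimal one-hidden-layer sketch with sigmoid activations. The weights, names, and layer sizes are invented for the example, and backpropagation (which trains these weights) is omitted.

```python
import math

def sigmoid(z):
    """Logistic activation, squashing any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def feedforward(x, W1, b1, W2, b2):
    """One forward pass through a one-hidden-layer network:
    hidden = sigmoid(W1 x + b1), output = sigmoid(W2 hidden + b2)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(W2, b2)]

# Invented weights: 2 inputs -> 2 hidden units -> 1 output.
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0]], [0.0]
print(feedforward([1.0, 0.0], W1, b1, W2, b2))
```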

- Decision Trees
  - Categorical inputs
  - Recursive greedy algorithm
  - Entropy
  - Overfitting in decision trees
  - Principle of Occam's razor
  - Ensemble classifiers and boosting
  - Random forests
  - AdaBoost
  - XGBoost
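
The entropy measure used to score decision-tree splits follows directly from its definition, H = −Σ p·log₂ p over the class proportions p at a node. A small sketch (the helper name is illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in
    `labels`: 0 for a pure node, 1 for a 50/50 binary split."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

print(entropy(["yes", "yes", "no", "no"]))  # 1.0
```

The recursive greedy algorithm picks, at each node, the split that most reduces this quantity (the information gain).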

- Handling Missing Data

- Retrieval
  - Nearest neighbor search
  - Distance metrics: Euclidean and scaled Euclidean
  - KD-tree representation, locality-sensitive hashing (LSH)

- Clustering
  - k-means, k-means++
  - MapReduce
  - Mixture models, Gaussian distributions, the EM algorithm
  - Topic modelling, Latent Dirichlet Allocation
  - Hierarchical clustering
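
As a hedged sketch of the k-means item above, here is a bare-bones Lloyd's-algorithm loop in plain Python. The data and names are invented, and the naive "first k points" initialization used here is exactly the weakness that k-means++ addresses with smarter seeding.

```python
import math

def k_means(points, k, iters=20):
    """Lloyd's algorithm: alternately assign each point to its
    nearest centroid, then move each centroid to its cluster mean."""
    centroids = list(points[:k])  # naive init; k-means++ picks smarter seeds
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster
        # (keep the old centroid if a cluster ends up empty).
        centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids

# Two well-separated invented blobs; expect one centroid per blob.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(sorted(k_means(points, 2)))
```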