'Data Scientist' is one of the hottest professions of the 21st century. Machine Learning using Python is a comprehensive course with a strong focus on skills such as statistics, data mining, data wrangling, data visualization, linear and logistic regression, decision trees, clustering, and the OSEMN methodology. The course builds an understanding of concepts such as supervised and unsupervised learning models. The program concludes with industry case studies, selected to reinforce the subject matter by building a real-life predictive model that brings together all the key aspects learned throughout the program. The skills developed in this program are intended to prepare students for career opportunities in machine learning and data science.
- Introduction to Data Science / ML = 2 Hours
- What is data science? Roles of a data scientist; other opportunities in the field;
- Differences between artificial intelligence, machine learning, and deep learning: their applications and trends;
- Differences between the statistical and ML approaches;
- Big data and common misconceptions about it;
- Data Science Toolkit = 2 Hours
- Introduction to the data life cycle; introduction to the OSEMN methodology (Obtain, Scrub, Explore, Model, iNterpret);
- Introduction to working with scikit-learn (sklearn) using sample data;
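As an illustration of the scikit-learn workflow this module introduces, the sketch below loads the iris sample dataset that ships with scikit-learn, holds out a test set, and fits a decision tree. It assumes scikit-learn is installed; the dataset, model, 25% split, and random seed are illustrative choices, not the course's own material.

```python
# A minimal sketch, assuming scikit-learn is installed (iris ships with it).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the bundled sample data as a feature matrix X and a label vector y.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows for testing; stratify keeps class proportions equal,
# and random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# scikit-learn estimators share the same fit / predict / score pattern.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out rows
print(f"Test accuracy: {accuracy:.2f}")
```

Because most scikit-learn estimators expose the same `fit` and `score` methods, swapping `DecisionTreeClassifier` for another model usually changes only the import and constructor line.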
- Obtain and Scrub = 3 Hours
- Types of data sources; obtaining the data; data preparation to achieve business objectives;
- Identifying the data requirements for solving a business problem; the importance of data quality; dealing with duplicate, missing, and incomplete data;
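To make "dealing with duplicate and missing data" concrete, here is a small standard-library sketch that drops exact duplicate records and fills missing values with the column mean. The records are hypothetical, and mean imputation is only one simple strategy (dropping rows or model-based imputation are common alternatives); in practice pandas' `DataFrame.drop_duplicates` and `fillna` do the same job.

```python
from statistics import mean

# Hypothetical raw records: row 3 duplicates row 1, and row 2 is missing "age".
raw = [
    {"id": 1, "age": 34, "segment": "A"},
    {"id": 2, "age": None, "segment": "B"},
    {"id": 1, "age": 34, "segment": "A"},
    {"id": 3, "age": 28, "segment": "A"},
]

def drop_duplicates(rows):
    """Keep the first occurrence of each identical record, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def impute_mean(rows, column):
    """Fill missing (None) values in `column` with the mean of the observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

clean = impute_mean(drop_duplicates(raw), "age")
for row in clean:
    print(row)
```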
- Exploratory Data Analysis = 3 Hours
- Introduction to EDA; data visualization with Python as a data mining tool;
- Data wrangling, data cleaning, and data preparation for modeling;
- Feature engineering (feature scaling and standardization); univariate analysis, multivariate analysis, and correlation study using Python;
- Applying descriptive statistics to a real-life dataset;
- Drawing inferences from visualizations;
- Data visualization using Python: Seaborn / Matplotlib
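Standardization and correlation, two of the EDA steps listed above, can be sketched with the standard library alone. The data below is hypothetical; the code standardizes a feature to z-scores (mean 0, population standard deviation 1) and computes the Pearson correlation as the average product of paired z-scores.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score standardization: (value - mean) / population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def pearson(x, y):
    """Pearson correlation, computed as the average product of paired z-scores."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

z_hours = standardize(hours)
r = pearson(hours, score)
print([round(z, 2) for z in z_hours])
print(f"r = {r:.3f}")
```

A value of r near +1 indicates a strong positive linear association; as the "Improving the Model" module stresses, this says nothing by itself about causation.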
- Supervised Learning Models (Part 1) = 4 Hours
- Supervised learning and the different types of supervised learning algorithms (regression and classification models);
- Logistic regression, decision trees, CART;
- Case study based on the simple linear regression algorithm;
- Case study based on the logistic regression classification algorithm;
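The two case-study algorithms can be sketched from scratch to show what a library fits under the hood. This is an illustrative sketch on hypothetical data: simple linear regression uses the closed-form least-squares solution (slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)), while one-feature logistic regression is fitted by plain batch gradient descent on the log-loss, with a learning rate and epoch count chosen arbitrarily for this example.

```python
import math
from statistics import mean

def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x (closed form)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def sigmoid(z):
    """Logistic function: maps a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_regression(x, y, lr=0.1, epochs=5000):
    """One-feature logistic regression fitted by batch gradient descent on the log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        # Gradient of the mean log-loss is the mean of (predicted - actual) * input.
        errors = [sigmoid(b0 + b1 * xi) - yi for xi, yi in zip(x, y)]
        b0 -= lr * sum(errors) / n
        b1 -= lr * sum(e * xi for e, xi in zip(errors, x)) / n
    return b0, b1

# Hypothetical data for illustration only.
b0, b1 = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(f"linear: y = {b0:.2f} + {b1:.2f} * x")  # the data lie exactly on y = 1 + 2x

hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]
c0, c1 = fit_logistic_regression(hours, passed)
print(f"P(pass | 1 hour)  = {sigmoid(c0 + c1 * 1.0):.2f}")
print(f"P(pass | 4 hours) = {sigmoid(c0 + c1 * 4.0):.2f}")
```

scikit-learn's `LinearRegression` and `LogisticRegression` solve the same problems with more robust solvers and, for logistic regression, regularization by default.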
- Supervised Learning Models (Part 2) = 8 Hours
- Training and visualizing a decision tree;
- Random forest, support vector machine (SVM), and k-nearest neighbors (KNN) models;
- Case study based on these models
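Of the models listed in this module, KNN is the simplest to sketch from scratch, which makes its logic easy to see: a new point takes the majority label of its k closest training points. The sketch below uses Euclidean distance and hypothetical two-cluster data; scikit-learn's `KNeighborsClassifier` is the library counterpart.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training pairs by Euclidean distance to the query point.
    pairs = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], query))
    nearest_labels = [label for _, label in pairs[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: two well-separated groups.
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (1.5, 1.5)))
print(knn_predict(train_X, train_y, (6.5, 6.5)))
```

Because KNN relies on distances, features on very different scales should usually be standardized first, which ties back to the feature-scaling topic in the EDA module.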
- Improving the Model = 8 Hours
- Association and dependence; differences between causation and correlation;
- The importance of Simpson's paradox;
- The curse of dimensionality (COD); bias-variance trade-off; train-test split; K-fold cross-validation;
- Evaluating various models and comparing their performance;
- Computational complexity; choosing Gini impurity or entropy as the split criterion; regularization hyperparameters;
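Two of the topics above, the Gini-versus-entropy choice and K-fold cross-validation, can be illustrated in a few lines. This is a sketch, not the course's code: `gini` and `entropy` measure the impurity of a node's labels (both are 0 for a pure node and maximal for an even split), and `k_fold_indices` is a hypothetical helper that splits sample indices into k contiguous folds, giving the first `n % k` folds one extra sample in the same way scikit-learn's `KFold` does without shuffling.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: sum(p_k * log2(1 / p_k))."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; every index lands in exactly one test fold."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

pure = ["yes"] * 4
mixed = ["yes", "yes", "no", "no"]
print(gini(pure), entropy(pure))    # a pure node has zero impurity under both measures
print(gini(mixed), entropy(mixed))  # a 50/50 binary node: Gini 0.5, entropy 1 bit

for train_idx, test_idx in k_fold_indices(10, k=3):
    print(len(train_idx), test_idx)
```

In practice the two impurity measures usually produce similar trees; Gini is slightly cheaper to compute because it avoids logarithms.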
- Project Work = 4 Hours
- Hands-on practice exercises based on industry case studies;
- 7 to 10 days to submit the project
- Unsupervised Learning = 7 Hours
- Illustration of clustering techniques (cluster by variable or factor analysis and cluster by observation);
- K-means clustering: the K-means algorithm with an example, pros and cons of K-means clustering, and hands-on problems using the K-means algorithm;
- Use Case
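The K-means algorithm covered in this module can be sketched as Lloyd's algorithm: repeatedly assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, until the centroids stop changing. The sketch below initializes centroids with randomly chosen data points and uses hypothetical two-cluster data. Its dependence on that initialization, and the need to choose k in advance, are among the usual cons of K-means; scikit-learn's `KMeans` mitigates the first with k-means++ initialization by default.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: the next assignment step would not change anything
        centroids = new_centroids
    return centroids, labels

# Hypothetical 2-D data with two obvious groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, labels = k_means(points, k=2)
print(labels)
print(centroids)
```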
- Industry-Level Real-Life Project = 4 Hours
- 7 to 10 days to submit the project
Essential Reading:
T1. Material shared by the instructor during the course
T2. All the use cases, which students must practice
Recommended Reading:
R1. Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, O'Reilly
R2. Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani, An Introduction to Statistical Learning, Springer