ML with PySpark
Introduction
- Introduction to distributed computing
- Overview of the big data environment
Spark environment
- Spark Architecture
- Resilient Distributed Datasets (RDDs)
- Spark DataFrame
- Spark installation
- Spark configuration
Machine learning on Spark
- Overview of machine learning
- PySpark SQL
- PySpark MLlib
- Data pipeline
Predictive analytics
- Linear Regression with MLlib
Classification with MLlib
- Logistic Regression Model
- Decision Tree Classifier
- Random Forest Classifier
- Gradient-Boosted Tree Classifier
Clustering
- Clustering use case
- KMeans clustering with MLlib