XGBoost
Topics
This week’s assignments will guide you through the following topics:
- Understanding the foundational principles behind XGBoost
Reading
Please read the following:
Replication task
- Prepare the heart disease data for modeling (see the sketch after this list) by:
  - Removing unhelpful variables (e.g., variables whose values are all missing or all identical)
- Transforming variables to numeric or categorical based on the data dictionary
- One-hot encoding categorical variables
- Converting the resulting dataframe to an array
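A minimal preparation sketch is below, assuming the data live in a CSV file (the file name `heart.csv` and the example column names are placeholders; substitute the actual fields from the data dictionary):

```python
import pandas as pd

# Hypothetical file name -- adjust to the actual heart disease dataset.
df = pd.read_csv("heart.csv")

# Remove unhelpful variables: columns that are entirely missing or
# contain only a single distinct value.
all_missing = df.columns[df.isna().all()]
constant = df.columns[df.nunique(dropna=True) <= 1]
df = df.drop(columns=all_missing.union(constant))

# Cast variables to numeric or categorical per the data dictionary
# (these column names are assumptions for illustration).
numeric_cols = ["age", "trestbps", "chol"]
categorical_cols = ["sex", "cp", "thal"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
df[categorical_cols] = df[categorical_cols].astype("category")

# One-hot encode the categorical variables.
df = pd.get_dummies(df, columns=categorical_cols)

# Convert the prepared dataframe to an array for modeling.
X = df.to_numpy()
```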
Tasks
Complete the following tasks:
- Answer the questions below
- Clean and prepare the heart disease dataset to mirror the paper
Weekly Questions
Answer the following questions:
- What is training loss and what is regularization? Why do we care about both of these terms in defining an objective function for a supervised learning model?
- What is an ensemble model, and how, in general terms, do decision tree ensembles work?
- What is boosting, how does it differ from bagging, and why is boosting a useful technique for building models?
- What are ‘weak learners’?
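As context for these questions (not an answer key), the sketch below shows where the training loss, the regularization terms, and the ensemble of shallow weak learners appear as parameters of XGBoost's scikit-learn interface. It assumes the prepared feature array `X` from the replication task and a hypothetical label vector `y`:

```python
import xgboost as xgb

# Assumes X (features) and y (binary labels) are already prepared.
model = xgb.XGBClassifier(
    objective="binary:logistic",  # training loss: logistic loss on the labels
    n_estimators=200,             # number of boosted trees in the ensemble
    max_depth=3,                  # shallow trees act as weak learners
    learning_rate=0.1,            # shrinkage applied to each boosting step
    reg_lambda=1.0,               # L2 regularization on leaf weights
    reg_alpha=0.0,                # L1 regularization on leaf weights
)
model.fit(X, y)
```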