XGBoost
Topics
This week’s assignments will guide you through the following topics:
- Understanding the foundational principles behind XGBoost
Reading
Please read the following:
Replication task
- Prepare the heart disease data for modeling (see the sketch after this list) by:
  - Removing unhelpful variables (e.g., variables whose values are all missing or all identical)
- Transforming variables to numeric or categorical based on the data dictionary
- One-hot encoding categorical variables
- Converting the resulting dataframe to an array
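A minimal preparation sketch is below, assuming the data live in a CSV file (the file name `heart.csv` and the example column names are placeholders; substitute the actual fields from the data dictionary):

```python
import pandas as pd

# Hypothetical file name -- adjust to the actual heart disease dataset.
df = pd.read_csv("heart.csv")

# Remove unhelpful variables: columns that are entirely missing or
# contain only a single distinct value.
all_missing = df.columns[df.isna().all()]
constant = df.columns[df.nunique(dropna=True) <= 1]
df = df.drop(columns=all_missing.union(constant))

# Cast variables to numeric or categorical per the data dictionary
# (these column names are assumptions for illustration).
numeric_cols = ["age", "trestbps", "chol"]
categorical_cols = ["sex", "cp", "thal"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")
df[categorical_cols] = df[categorical_cols].astype("category")

# One-hot encode the categorical variables.
df = pd.get_dummies(df, columns=categorical_cols)

# Convert the prepared dataframe to an array for modeling.
X = df.to_numpy()
```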
Tasks
Complete the following tasks:
- Answer the questions below
- Clean and prepare the heart disease dataset to mirror the paper
Weekly Questions
Answer the following questions:
- What is training loss and what is regularization? Why do we care about both of these terms in defining an objective function for a supervised learning model?
- What is an ensemble model, and how, in general terms, do decision tree ensembles work?
- What is boosting, how does it differ from bagging, and why is boosting a useful technique for building models?
- What are ‘weak learners’?
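As context for these questions (not an answer key), the sketch below shows where the training loss, the regularization terms, and the ensemble of shallow weak learners appear as parameters of XGBoost's scikit-learn interface. It assumes the prepared feature array `X` from the replication task and a hypothetical label vector `y`:

```python
import xgboost as xgb

# Assumes X (features) and y (binary labels) are already prepared.
model = xgb.XGBClassifier(
    objective="binary:logistic",  # training loss: logistic loss on the labels
    n_estimators=200,             # number of boosted trees in the ensemble
    max_depth=3,                  # shallow trees act as weak learners
    learning_rate=0.1,            # shrinkage applied to each boosting step
    reg_lambda=1.0,               # L2 regularization on leaf weights
    reg_alpha=0.0,                # L1 regularization on leaf weights
)
model.fit(X, y)
```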