Abstract
Rainfall-induced landslides have caused substantial economic losses and many casualties over the years. Machine learning techniques have been widely applied in recent years to assess landslide susceptibility over regions of interest. However, several challenges limit the reliability and performance of machine learning-based landslide models. In particular, class imbalance in the dataset, the selection of landslide conditioning factors, and potential extrapolation problems when predicting landslides under future conditions need to be carefully addressed. This work introduces methodologies to address these challenges, using XGBoost to train the landslide prediction model. Data resampling techniques were adopted to improve model performance on the imbalanced dataset. Various models were trained, and their performance was evaluated using a combination of metrics. The results show that the synthetic minority oversampling technique (SMOTE), combined with the proposed gridded hyperspace sampling technique, performs better than the other imbalance learning techniques tested with XGBoost. Subsequently, the extrapolation performance of the XGBoost model was evaluated, showing that the predictions remain valid for the projected climate conditions. As a case study, landslide susceptibility maps of California were generated using the developed model and compared with the historical California landslide catalog. These results suggest that the developed model can be useful for global landslide susceptibility mapping under climate change scenarios.
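To make the oversampling idea concrete, the following is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the authors' implementation: the function name `smote`, the toy feature columns (slope and rainfall), and all numeric values are hypothetical, and the proposed gridded hyperspace sampling step is not shown because the abstract does not describe it. The sketch only illustrates how synthetic minority (landslide) samples can be placed on line segments between existing minority samples and their nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Illustrative only: each synthetic point lies on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy data: each row is [slope_deg, rainfall_mm] for a landslide cell.
landslide = [[30.0, 120.0], [35.0, 150.0], [32.0, 140.0], [38.0, 160.0]]
stable_count = 10  # assumed number of non-landslide (majority) samples

# Oversample the landslide class until it matches the majority class size.
new_points = smote(landslide, n_new=stable_count - len(landslide), k=2)
balanced_minority = landslide + new_points
```

In practice the balanced training set would then be passed to a classifier such as XGBoost. Oversampling is normally applied to the training split only, so that evaluation metrics are computed on the original class distribution.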