Amir A. Nasrollahzadeh


Here I share snippets of the projects I am working on in Operations Research and Machine Learning.

View My GitHub Profile

Feature Selection with Boruta and Importance Analysis with SHAP Values

The xgBoost technique can produce feature importances, which may be used to select a subset of the more important features given an arbitrary threshold. However, these importances have been shown to be inconsistent across data sets, so thresholding on them does not yield a reliable feature subset. The Boruta selection wrapper is an algorithm that tests the importance of each feature against shuffled copies of the features (shadow features). SHAP values, developed by Scott Lundberg, are used to derive global and consistent feature importances. You can see my implementation here.
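The shadow-feature idea behind Boruta can be illustrated with a minimal, standard-library sketch. It is not the linked implementation. It makes two simplifying assumptions: the importance score is the absolute Pearson correlation with the target (a stand-in for the tree-based importances Boruta normally uses), and a feature is kept when it beats the best shuffled copy in a fixed share of rounds (a stand-in for Boruta's statistical test). The names `pearson_abs`, `boruta_like_selection`, `n_rounds`, and `min_hit_rate` are hypothetical.

```python
import random


def pearson_abs(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def boruta_like_selection(columns, target, n_rounds=50, min_hit_rate=0.9, seed=0):
    """Keep features that repeatedly beat the best shuffled (shadow) feature.

    columns: dict mapping feature name -> list of numeric values.
    target:  list of numeric values, same length as each column.
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_rounds):
        # Shadow features: shuffling a column breaks its link to the target.
        shadow_scores = []
        for values in columns.values():
            shadow = list(values)
            rng.shuffle(shadow)
            shadow_scores.append(pearson_abs(shadow, target))
        threshold = max(shadow_scores)
        # A real feature scores a hit when it beats the best shadow feature.
        for name, values in columns.items():
            if pearson_abs(values, target) > threshold:
                hits[name] += 1
    return {name for name, h in hits.items() if h / n_rounds >= min_hit_rate}


if __name__ == "__main__":
    # Synthetic data (an assumption for illustration only).
    data_rng = random.Random(42)
    n = 300
    signal = [data_rng.gauss(0, 1) for _ in range(n)]
    noise = [data_rng.gauss(0, 1) for _ in range(n)]
    target = [2.0 * s + data_rng.gauss(0, 0.5) for s in signal]
    print(sorted(boruta_like_selection({"signal": signal, "noise": noise}, target)))
```

Comparing against the maximum shadow score, rather than a fixed threshold, is what removes the arbitrary cutoff: the bar is set by how important pure noise appears on the same data.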

Classification with Extreme Gradient Boosting (xgBoost)

Implementation of the xgBoost technique on a roadside maintenance data set to identify the major factors contributing to the risk of collision for roadside maintenance workers. The data contains more than 2 million work orders from the state of California and describes work zone features, road features, traffic flow and volume, and collision characteristics. You can visit the project’s page or see the entire repository. The project is sponsored by the California Department of Transportation through the Advanced Highway Maintenance & Construction Technology research center.