hetong007 / higgsml

Licence: other
Repository for post higgs-competition model submission

Programming Languages

C++, R, Python, C

Higgs Machine Learning Model

This model was created by Tianqi Chen and Tong He.

This model achieved the best private score among all of our models without ensemble learning. Its scores on the public and private leaderboards are 3.72181 and 3.72370, respectively.

We add some physics features to the original features, then feed the augmented dataset to xgboost (mainly authored by Tianqi) for training and prediction.
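As an illustration of this kind of feature, here is a minimal Python sketch that computes the summed energy, summed momentum, and invariant mass for every subset of two or more objects. It is not the code from this repository: the function names and feature keys are hypothetical, each object is assumed to be given as (pt, eta, phi) and treated as massless, and missing jets and the transverse-only met case are not handled.

```python
import math
from itertools import combinations


def four_vector(pt, eta, phi):
    """Return (E, px, py, pz) for a massless object given (pt, eta, phi)."""
    return (
        pt * math.cosh(eta),  # energy equals |p| for a massless object
        pt * math.cos(phi),   # px
        pt * math.sin(phi),   # py
        pt * math.sinh(eta),  # pz
    )


def subset_features(objects):
    """Compute energy, momentum, and invariant mass for each subset.

    objects: dict mapping a (hypothetical) name such as "lep" to (pt, eta, phi).
    Returns a dict of features for every subset with at least two objects.
    """
    vectors = {name: four_vector(*coords) for name, coords in objects.items()}
    names = sorted(vectors)
    features = {}
    for size in range(2, len(names) + 1):
        for subset in combinations(names, size):
            # Sum the four-vectors component by component.
            e, px, py, pz = (sum(vectors[n][i] for n in subset) for i in range(4))
            p2 = px ** 2 + py ** 2 + pz ** 2
            key = "+".join(subset)
            features[key + "_energy"] = e
            features[key + "_momentum"] = math.sqrt(p2)
            # Clamp at zero to guard against tiny negative rounding errors.
            features[key + "_mass"] = math.sqrt(max(e ** 2 - p2, 0.0))
    return features
```

With four objects this yields 11 subsets and 33 features. The resulting values would then be appended as extra columns next to the original features.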

Notes

  • The major physics features we add are the summed momentum, invariant mass, and energy of arbitrary subsets of {lep, tau, jet_leading, jet_subleading}. We also include met, in which case the sums are taken in the x-y plane only.
  • We adjust several parameters to avoid overfitting:
    • eta is set to a small value, 0.01, which usually needs more rounds to converge but makes the results more stable
    • min_child_weight is set to 100, which means each leaf requires a sum of weights of at least 100, making leaf weight estimation more stable
    • colsample_bytree is set to 0.5, so in every iteration we randomly pick half of the features to construct the tree; this speeds up training and sometimes helps avoid overfitting
    • gamma is set to 0.1; this is a pruning parameter that we didn't tune carefully, but leaving it nonzero does help, because the trees in later rounds tend to be simpler, making the boosting less prone to overfitting
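For reference, the settings listed above could be collected in an xgboost parameter dictionary roughly as follows. This is only a sketch: the objective, the number of boosting rounds, and any other options used for the actual submission are not stated here and are left out.

```python
# Parameter values taken from the notes above; everything else is left at
# xgboost's defaults because it is not stated in this README.
params = {
    "eta": 0.01,               # small learning rate: more rounds, more stable results
    "min_child_weight": 100,   # minimum sum of weights required in each leaf
    "colsample_bytree": 0.5,   # sample half of the features for each tree
    "gamma": 0.1,              # pruning parameter, not carefully tuned
}

# Hypothetical usage, assuming a prepared xgboost.DMatrix named dtrain and a
# round count chosen by the user:
#   import xgboost as xgb
#   booster = xgb.train(params, dtrain, num_boost_round=num_rounds)
```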

On a laptop with an 8-thread i7 CPU, the model trains in about an hour using less than 2GB of memory. The evaluation step takes about 80 seconds and uses 3.5GB of memory.
