Aspen Open Jets: a real-world ML-ready dataset for jet physics

Amram, Oz; Anzalone, Luca; Birk, Joschka; Faroughy, Darius A.; Hallin, Anna; Kasieczka, Gregor; Krämer, Michael; Pang, Ian; Reyes-Gonzalez, Humberto; Shih, David

This dataset contains approximately 180 million boosted jets derived from open data collected by the CMS experiment at the Large Hadron Collider (LHC) in 2016 (specifically the JetHT datastream), presented in a format suitable for machine learning (ML) applications. A detailed description of the dataset and its production can be found in the companion paper, arXiv:2412.10504.

For each jet we store its transverse momentum (p_T), pseudorapidity (eta), azimuthal angle (phi), and its mass after grooming with the softdrop algorithm, as computed in the CMS reconstruction. Up to 150 jet constituents are stored per jet. For each constituent we store its four-momentum in the format (p_x, p_y, p_z, E); its transverse (d_0) and longitudinal (d_z) impact parameters with their uncertainties; its electric charge; its particle ID (PID) in the PDG numbering scheme; and its weight from the PUPPI algorithm. Note that neutral hadrons are assigned PID = 130 (that of the neutral kaon K_L^0), while positively and negatively charged hadrons are assigned the PID of the charged pion (211). We also include jet substructure quantities computed in the CMS reconstruction: the number of constituents in the jet, the N-subjettiness variables, several jet-tagging scores from the CMS implementation of ParticleNet, and the ParticleNet regression of the jet mass.
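The constituent four-momenta described above are usually converted into (p_T, eta, phi) features relative to the jet axis before being fed to a network. The following is a minimal sketch in plain Python that operates on one jet's 'PFCands' rows. It assumes that zero-padded slots are exactly the rows with E = 0, and it takes the codes for electrons (11), muons (13) and photons (22) from the standard PDG numbering. The helper names are hypothetical and not part of the AOJ processing code. The PUPPI weight is deliberately left unapplied, because it is not stated whether the stored four-momenta already include it.

```python
import math

# Hypothetical helpers (a sketch, not part of the AOJ processing code).
# Column order of one 'PFCands' row, following the dataset description.
PX, PY, PZ, E, D0, D0_ERR, DZ, DZ_ERR, CHARGE, PDG_ID, PUPPI_W = range(11)

# 130 and 211 are the hadron conventions described above; 11, 13 and 22 are
# the standard PDG codes for electrons, muons and photons. Any other code
# maps to "other". abs() makes the lookup independent of the sign convention.
PID_CATEGORY = {
    11: "electron",
    13: "muon",
    22: "photon",
    130: "neutral_hadron",
    211: "charged_hadron",
}


def is_padding(cand):
    """Assumption: zero-padded slots are the rows whose energy is exactly 0."""
    return cand[E] == 0.0


def pt_eta_phi(px, py, pz):
    """Convert a Cartesian momentum into (pT, eta, phi)."""
    pt = math.hypot(px, py)
    phi = math.atan2(py, px)
    # eta = asinh(pz / pT); it diverges for a particle along the beam axis.
    eta = math.asinh(pz / pt) if pt > 0 else math.copysign(math.inf, pz)
    return pt, eta, phi


def delta_phi(phi1, phi2):
    """Return phi1 - phi2 wrapped into the interval [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi


def constituent_features(cands, jet_eta, jet_phi):
    """List (pT, d_eta, d_phi, category) for each non-padded constituent,
    with eta and phi measured relative to the jet axis."""
    features = []
    for cand in cands:
        if is_padding(cand):
            continue  # skip zero-padded slots
        pt, eta, phi = pt_eta_phi(cand[PX], cand[PY], cand[PZ])
        category = PID_CATEGORY.get(abs(int(cand[PDG_ID])), "other")
        features.append((pt, eta - jet_eta, delta_phi(phi, jet_phi), category))
    return features
```

In practice you would pass one jet's 'PFCands' rows together with that jet's eta and phi from 'jet_kinematics'. Wrapping d_phi ensures that jets near phi = ±pi do not produce spurious jumps of 2*pi.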

Jets are stored in HDF5 (.h5) files, each containing four datasets (keys). Every row along the first axis corresponds to one jet:

  • 'event_info', shape (N_jets, 3): [Run Number, LumiBlock, Event Number]
  • 'jet_kinematics', shape (N_jets, 4): [pt, eta, phi, softdrop mass]
  • 'PFCands', shape (N_jets, 150, 11): zero-padded list of up to 150 PF candidates in the jet.
    Per-candidate fields: [px, py, pz, E, d0, d0Err, dz, dzErr, charge, PDG ID, PUPPI weight]
  • 'jet_tagging', shape (N_jets, 13): tagging information and scores for the AK8 jet.
    Per-jet fields: [nConstituents, tau1, tau2, tau3, tau4, ParticleNet H4q vs QCD, ParticleNet Hbb vs QCD, ParticleNet Hcc vs QCD, ParticleNet QCD score, ParticleNet T vs QCD, ParticleNet W vs QCD, ParticleNet Z vs QCD, ParticleNet regressed mass]
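Because each key is a plain numeric array, a small name-to-index table avoids hard-coding column numbers. The sketch below records the column order listed above under short field names. These names are our own (hypothetical) and are not stored in the files. The sketch also adds an N-subjettiness ratio helper such as tau_21 = tau_2 / tau_1, and it streams a file in chunks with h5py so that a ~2.5 GB batch never has to be read into memory at once. h5py is imported only inside the reader, and the default chunk size is an arbitrary choice.

```python
# Hypothetical short field names: only the column ORDER is taken from the
# dataset description; the names themselves are not stored in the files.
COLUMNS = {
    "event_info": ["run", "lumi_block", "event"],
    "jet_kinematics": ["pt", "eta", "phi", "msd"],
    "PFCands": ["px", "py", "pz", "E", "d0", "d0Err", "dz", "dzErr",
                "charge", "pdg_id", "puppi_weight"],
    "jet_tagging": ["n_constituents", "tau1", "tau2", "tau3", "tau4",
                    "pnet_H4q_vs_QCD", "pnet_Hbb_vs_QCD", "pnet_Hcc_vs_QCD",
                    "pnet_QCD", "pnet_T_vs_QCD", "pnet_W_vs_QCD",
                    "pnet_Z_vs_QCD", "pnet_mass"],
}


def col(key, name):
    """Index of column `name` along the last axis of dataset `key`."""
    return COLUMNS[key].index(name)


def tau_ratio(tagging_row, num="tau2", den="tau1"):
    """N-subjettiness ratio from one 'jet_tagging' row, e.g. tau21 = tau2/tau1.
    Returns nan when the denominator is zero."""
    d = tagging_row[col("jet_tagging", den)]
    return tagging_row[col("jet_tagging", num)] / d if d else float("nan")


def iter_chunks(path, chunk_size=100_000):
    """Yield dicts {key: array} holding `chunk_size` aligned jets at a time.
    Requires h5py; it is imported lazily so the tables above work without it."""
    import h5py

    with h5py.File(path, "r") as f:
        n_jets = f["jet_kinematics"].shape[0]
        for start in range(0, n_jets, chunk_size):
            stop = min(start + chunk_size, n_jets)
            # Slicing an h5py dataset reads only these rows from disk.
            yield {key: f[key][start:stop] for key in COLUMNS}
```

For example, looping `for chunk in iter_chunks("RunG_batch0.h5")` and taking `chunk["jet_kinematics"][:, col("jet_kinematics", "pt")]` gives the jet p_T values of each chunk as a NumPy array. Slicing all four keys over the same row range keeps jets, constituents and tagging scores aligned.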

The code that was used to create Aspen Open Jets from CMS open data can be found at https://github.com/OzAmram/AOJProcessing, and the code used for the OmniJet-alpha model and its training can be found at https://github.com/uhh-pd-ml/omnijet_alpha.

This work was initiated at the Aspen Center for Physics, supported by National Science Foundation grant PHY-2210452. We would like to thank Alexander Mück for discussions. The research of MK and HR-G is supported by the Deutsche Forschungsgemeinschaft (DFG) under grant 396021762 – TRR 257: Particle physics phenomenology after the Higgs discovery. JB, AH and GK are supported by the DFG under the German Excellence Initiative – EXC 2121 Quantum Universe – 390833306, and by PUNCH4NFDI – project number 460248186. DAF, IP, and DS are supported by DOE grant DOE-SC0010008. OA is supported by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy, Office of Science, Office of High Energy Physics. LA is supported by the University of Bologna. Additionally, we acknowledge support from the Maxwell computational resources at Deutsches Elektronen-Synchrotron DESY, Hamburg, Germany, and computing resources provided by RWTH Aachen University under project rwth0934. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 using NERSC award HEP-ERCAP0027491.
Files (207.3 GB)
Name             MD5                               Size
RunG_batch0.h5   bed9c27139f6651fb9511da39210e9e8  2.4 GB
RunG_batch1.h5   7c5d85bbef57d52168cbf6e7b6f9293d  2.5 GB
RunG_batch10.h5  d1d6526c38ac2e1f822e8db958151f5c  2.4 GB
RunG_batch11.h5  9b2046dbdbc26cee5754917c6c15c5c1  2.5 GB
RunG_batch12.h5  1789b978073651698cceda6a11044ae7  2.4 GB
RunG_batch13.h5  43b7600c25047e7ca09ea062fc77c3c8  2.5 GB
RunG_batch14.h5  8fa23f0dd377a61035cd5936a7fcfab7  2.5 GB
RunG_batch15.h5  4d0af0a9d38e24a1ab7ebb169d2f6484  2.4 GB
RunG_batch16.h5  8e1fb1b737fd666c4d794cc003f95813  2.6 GB
RunG_batch17.h5  d895ea7f4478819dcfb35d672c130c49  2.6 GB
RunG_batch18.h5  3ec969ce6935c16254283231117abfc1  2.4 GB
RunG_batch19.h5  09abd20e0dc46eee72c13a61c765abea  2.5 GB
RunG_batch2.h5   8845537f51e6680e36820511665715f5  2.3 GB
RunG_batch20.h5  870ac08cd2377bb1e068539a664aeda2  2.3 GB
RunG_batch21.h5  7a29a047ccecbfe18439bd466d1ea86e  2.4 GB
RunG_batch22.h5  bea18b4d90dbc7ef01f0db8c2e0772f9  2.6 GB
RunG_batch23.h5  1a9aa5f09ea8d0e5687859cf79aace59  2.5 GB
RunG_batch24.h5  c8267f474419b27f1472c750396532d4  2.4 GB
RunG_batch25.h5  bc6de5075284ea7e1db4f78eb2a8490c  2.5 GB
RunG_batch26.h5  533c98643f63abbc9c95161b39494c8b  2.5 GB
RunG_batch27.h5  94f9b1e957c6cc9f3f3489d362cc9b92  2.4 GB
RunG_batch28.h5  bff3cd0eb05b6c11f55511956cd65936  2.5 GB
RunG_batch29.h5  a3f6500c20681554361570a2e3005352  2.3 GB
RunG_batch3.h5   b81d16257b371acc4021d1a27cae5ed5  2.4 GB
RunG_batch30.h5  bdde11c6af382b8a360582999f09742f  2.4 GB
RunG_batch31.h5  d84931fec851266e8e8d08ff11ba32cf  2.4 GB
RunG_batch32.h5  cff8da1b4f62f83fdc7f4d23fd2478d9  2.5 GB
RunG_batch33.h5  b8df08c907e0d3ff880a29efb0d4b250  2.3 GB
RunG_batch34.h5  5f1f739d0be0aaa7ea667aa72257fefb  2.6 GB
RunG_batch35.h5  7a73ecf750f1c98829fb675cde4b49f9  2.5 GB
RunG_batch36.h5  278d9b722c0748dc9cebc066d677b447  2.5 GB
RunG_batch37.h5  e1931a2d81f8dd6afa6dccdf1295ba22  2.5 GB
RunG_batch38.h5  f157af70dde0dd9f9bee533adf494195  2.5 GB
RunG_batch39.h5  c893435a6b2b19e804360a86035a3577  5.3 GB
RunG_batch4.h5   b94165566ff2a3085ce30fb201cf4783  2.6 GB
RunG_batch5.h5   d81d867fc7a5b09665d725140003827c  2.6 GB
RunG_batch6.h5   d05108f7d3913bb1a204052834332e2a  2.4 GB
RunG_batch7.h5   eb522dd62f6abffd936e51cec5ef0e1b  2.3 GB
RunG_batch8.h5   96cd7df0741cae05a42005ac434bc6cf  2.4 GB
RunG_batch9.h5   9ddfc6bd753a3fdcfb53195eec492333  2.5 GB
RunH_batch0.h5   e1f80505da028e2f800a84e592c6eb61  2.4 GB
RunH_batch1.h5   8d4431e6a9e9a4ee00ab4dee8e62be9a  2.5 GB
RunH_batch10.h5  7e5fa5f3cb065a623702e2b9a6f55a82  2.5 GB
RunH_batch11.h5  40cf917c2d0e6164f5c5d3c12a0405de  2.6 GB
RunH_batch12.h5  90f2bdb87fc3adbadb161d7e9ee94191  2.6 GB
RunH_batch13.h5  eb9cf126983144ebf57e905aa263ae69  2.6 GB
RunH_batch14.h5  1ce5a82dba4253da5e1451468dc32654  2.6 GB
RunH_batch15.h5  64f7ff3bb44b4fd6169fb30490cad676  2.7 GB
RunH_batch16.h5  9c92564610df6d88a58ef3dda1a007b7  2.8 GB
RunH_batch17.h5  da3593c4840042ef655f898632bfe364  2.7 GB
RunH_batch18.h5  cbac393a793e8d39532582df63100598  2.8 GB
RunH_batch19.h5  3170d1560f1db00e55eef4e486fb2c92  2.7 GB
RunH_batch2.h5   10e9f3d65e32652fae939444eaae0154  2.4 GB
RunH_batch20.h5  15b63c56e79aa06a85edb6a74ddcd8ee  2.6 GB
RunH_batch21.h5  b159025f556277fdf0cfb20468db008d  2.5 GB
RunH_batch22.h5  93d7104826c94a1ab5506ddffaba7aa0  2.6 GB
RunH_batch23.h5  b24e8ff61f111984f512bbda88bca082  2.5 GB
RunH_batch24.h5  ecdcea8320892754f4ff1af1da3f1c02  2.6 GB
RunH_batch25.h5  848fb914bbcb2a9c91d82806f076e582  2.7 GB
RunH_batch26.h5  7efdb2313d3c56be8f7e4488e10774ad  2.6 GB
RunH_batch27.h5  4e7288e32424df99fc05984236c1e6da  2.8 GB
RunH_batch28.h5  555b12702cd2a00a3dd6eddf795f1290  2.6 GB
RunH_batch29.h5  65a26fac1db7513bfb445f9b95a06e7f  2.6 GB
RunH_batch3.h5   a46b299502af449c1d12a0082c89ee5d  2.3 GB
RunH_batch30.h5  8affe7bc849c5a221566ba0cac49abf7  2.6 GB
RunH_batch31.h5  d054602f665e15c84921c5aed0c474bd  2.6 GB
RunH_batch32.h5  1f81308608b0ca5d5651ccf89f445a0f  2.7 GB
RunH_batch33.h5  fcf3e84e3e20f7852bca0eeb606d1dfe  2.6 GB
RunH_batch34.h5  db6ddc517b92f8f95c660394ebc22596  2.6 GB
RunH_batch35.h5  80c65779443b2301e7af6f1a9ef9c2d3  2.6 GB
RunH_batch36.h5  41557437e64bc7be233092de90e1a529  2.5 GB
RunH_batch37.h5  0a2f0e51c68c646d2b703135e0423bc8  2.7 GB
RunH_batch38.h5  69494b6c28b1cb5f54a1a8bb77049b21  2.7 GB
RunH_batch39.h5  6714459031414b332cde2420ac52ddb5  5.1 GB
RunH_batch4.h5   5078689e3acb6c97616235223dfd893e  2.4 GB
RunH_batch5.h5   194597a9ca6b7f5f9d57d06ecfc88955  2.5 GB
RunH_batch6.h5   f90dda1ad19b2d66a0fd9540ecf0d9f7  2.6 GB
RunH_batch7.h5   e3b5f858b3f6b0b3cd2b0e371fa19a9b  2.6 GB
RunH_batch8.h5   926cd17ac27a2673b665998a82d5aa20  2.5 GB
RunH_batch9.h5   890444d314f7b90c89919a369ab467e6  2.3 GB
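Given the size of the download (about 207 GB in 80 files), it is worth checking each file against its listed MD5 checksum before use. The following is a minimal sketch that uses only Python's standard library. It reads each file in 1 MiB blocks, so multi-gigabyte files are never held in memory, and it accepts checksums with or without an 'md5:' prefix. The function names are illustrative. The CHECKSUMS dictionary holds two entries copied from the listing as an example and is meant to be extended with the remaining files.

```python
import hashlib
import os

# Illustrative helper names. Two entries copied from the file listing;
# extend this dictionary with the remaining files as needed.
CHECKSUMS = {
    "RunG_batch0.h5": "bed9c27139f6651fb9511da39210e9e8",
    "RunH_batch0.h5": "e1f80505da028e2f800a84e592c6eb61",
}


def md5sum(path, block_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it block by block."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def verify(path, expected):
    """True if the file's MD5 matches `expected` (an 'md5:' prefix is allowed)."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected


def verify_directory(directory, checksums=CHECKSUMS):
    """Map each listed file found in `directory` to True/False.
    Files that have not been downloaded yet are skipped."""
    results = {}
    for name, digest in checksums.items():
        path = os.path.join(directory, name)
        if os.path.exists(path):
            results[name] = verify(path, digest)
    return results
```

Running verify_directory on the download folder returns a per-file True/False map. Any False entry points to a corrupted or incomplete download that should be fetched again.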