Dataset Open Access

Aspen Open Jets: Monte Carlo

Amram, Oz; Anzalone, Luca; Birk, Joschka; Faroughy, Darius A.; Hallin, Anna; Kasieczka, Gregor; Krämer, Michael; Pang, Ian; Reyes-Gonzalez, Humberto; Shih, David


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Amram, Oz</dc:creator>
  <dc:creator>Anzalone, Luca</dc:creator>
  <dc:creator>Birk, Joschka</dc:creator>
  <dc:creator>Faroughy, Darius A.</dc:creator>
  <dc:creator>Hallin, Anna</dc:creator>
  <dc:creator>Kasieczka, Gregor</dc:creator>
  <dc:creator>Krämer, Michael</dc:creator>
  <dc:creator>Pang, Ian</dc:creator>
  <dc:creator>Reyes-Gonzalez, Humberto</dc:creator>
  <dc:creator>Shih, David</dc:creator>
  <dc:date>2026-04-29</dc:date>
  <dc:description>This dataset contains a processed version of open Monte Carlo simulation from CMS (see the full list of datasets below), presented in a format suitable for Machine Learning (ML) applications. In total it contains 300M QCD jets and 24M top jets. The dataset is complementary to, and created in the same way as, Aspen Open Jets, the dataset derived from CMS open data for the paper of the same name (Amram et al., Mach.Learn.Sci.Tech. 6 (2025) 3, 030601).

For each jet we store its transverse momentum (p_T), pseudorapidity (eta), and azimuthal angular coordinate (phi). We also store its mass, groomed with the softdrop algorithm as computed within the CMS reconstruction. Up to 150 constituents of the jet are stored. For each constituent, its 4-momentum is stored in the format (p_x, p_y, p_z, E). We additionally store its transverse impact parameter (d_0) and longitudinal impact parameter (d_z) with their uncertainties, the charge of the candidate, its particle-ID (PID) in the PDG format (note that neutral hadrons are assigned the PID=130 of the neutral kaon K_L^0, while positively/negatively charged hadrons are assigned PID=211 of the charged pion) and its weight from the PUPPI algorithm. We also include additional jet substructure quantities computed within the CMS reconstruction, including the number of constituents in the jet, N-subjettiness variables, various jet-tagging observables from the CMS implementation of ParticleNet and a regression of the jet mass from ParticleNet.

Jets are stored in HDF5 (h5) files with 4 keys:


	'event_info', shape (N_jets, 3): [Run Number, LumiBlock, Event Number]
	'jet_kinematics', shape (N_jets, 4): [pt, eta, phi, softdrop mass]
	'PFCands', shape (N_jets, 150, 11): Zero-padded list of up to 150 PF candidates inside the jet.
	Info for each candidate is [px, py, pz, E, d0, d0Err, dz, dzErr, charge, PDG ID, PUPPI weight]
	'jet_tagging', shape (N_jets, 13): Tagging info/scores for the AK8 jet.
	Info for each jet: [nConstituents, tau1, tau2, tau3, tau4, ParticleNet H4q vs QCD, ParticleNet Hbb vs QCD, ParticleNet Hcc vs QCD, ParticleNet QCD score, ParticleNet T vs QCD, ParticleNet W vs QCD, ParticleNet Z vs QCD, ParticleNet regressed mass]
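
The layout above can be unpacked as in the following minimal sketch, which assumes that padded slots in 'PFCands' are all zeros. The constant names and the function `constituent_kinematics` are illustrative and not part of this release. An array of this shape could be read with h5py, for example `h5py.File(path, "r")["PFCands"][:1000]`, where `path` is whichever file of the release you downloaded.

```python
import numpy as np

# Column order of the last axis of 'PFCands', as listed above.
PX, PY, PZ, E, D0, D0ERR, DZ, DZERR, CHARGE, PDGID, PUPPI = range(11)


def constituent_kinematics(pfcands):
    """Return (mask, pt, eta, phi) for a zero-padded (N_jets, 150, 11) array.

    Assumption: padded slots are all zeros, so a slot counts as a real
    candidate when any of its four-momentum components is nonzero.
    """
    pfcands = np.asarray(pfcands, dtype=float)
    px, py, pz = pfcands[..., PX], pfcands[..., PY], pfcands[..., PZ]
    mask = np.any(pfcands[..., PX:E + 1] != 0, axis=-1)
    pt = np.hypot(px, py)
    phi = np.arctan2(py, px)
    # eta = arcsinh(pz / pt); slots with pt == 0 (padding) are set to 0.
    safe_pt = np.where(pt != 0, pt, 1.0)
    eta = np.where(pt != 0, np.arcsinh(pz / safe_pt), 0.0)
    return mask, pt, eta, phi


# Tiny synthetic batch standing in for a slice of the real 'PFCands' array.
demo = np.zeros((1, 150, 11))
demo[0, 0, :4] = [3.0, 4.0, 0.0, 5.0]
mask, pt, eta, phi = constituent_kinematics(demo)
print(int(mask.sum()), float(pt[0, 0]))
```

The returned mask can be passed to a model as an attention or padding mask, so that the zero-filled slots do not contribute to the jet representation.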


Note: the script `pt_weights.py`, included in this release, is needed to combine the QCD jets from the different pT ranges.
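
For orientation, simulated samples generated in separate pT ranges are commonly combined by weighting each jet with the sample cross section divided by the number of generated events. The sketch below illustrates that general idea only; it is not the contents of `pt_weights.py`. The function name, the sample keys, and the numbers in the usage lines are hypothetical placeholders, and real cross sections and event counts should be taken from the released script or the CMS dataset records.

```python
def per_jet_weights(cross_sections, n_generated):
    """Map each sample name to a relative per-jet weight sigma / N.

    Both arguments are dicts keyed by sample name. The weights are
    normalised to sum to 1 over the samples, since only their
    relative sizes matter when mixing the pT ranges.
    """
    raw = {}
    for name, sigma in cross_sections.items():
        n = n_generated[name]
        if n == 0:
            raise ValueError("sample has no generated events: " + name)
        raw[name] = sigma / n
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


# Placeholder inputs, NOT physical cross sections or event counts.
weights = per_jet_weights(
    {"bin_a": 8.0, "bin_b": 2.0},
    {"bin_a": 4, "bin_b": 4},
)
print(weights)
```

Every jet from a given pT range then carries that sample's weight when filling histograms or when drawing a mixed training set.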

The code that was used to create this dataset from CMS open simulation can be found at https://github.com/OzAmram/AOJProcessing. The following datasets were used:

QCD datasets


	CMS Collaboration (2024). Simulated dataset QCD_Pt_300to470_TuneCP5_13TeV_pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.BN4O.SD1T
	CMS Collaboration (2024). Simulated dataset QCD_Pt_470to600_TuneCP5_13TeV_pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.3OWE.GOJK
	CMS Collaboration (2024). Simulated dataset QCD_Pt_600to800_TuneCP5_13TeV_pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.FBVA.7HTR
	CMS Collaboration (2024). Simulated dataset QCD_Pt_800to1000_TuneCP5_13TeV_pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.R6GX.8H9J


Top datasets


	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M900_W270_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.ATIE.JNIC
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M900_W90_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.QAJN.QZWZ
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M2000_W200_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.QL8F.A5KF
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M2000_W600_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.75FU.1FP5
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M2500_W250_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.4G7L.LCDP
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M2500_W750_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.06V7.L5A7
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M3000_W300_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.7JYL.DIKJ
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M3000_W900_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.P3VR.B8SQ
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M3500_W1050_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.J8J2.EFJ6
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M3500_W350_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.RY0T.65F2
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M4000_W1200_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.OQVG.XNIP
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M4000_W400_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.KONY.3CQH
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M1200_W120_TuneCP2_13TeV-madgraph-pythia8 in NANOAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.ABAE.BGE2
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M1000_W100_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.PDXA.DEIH
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M1000_W300_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.2LP8.EQPX
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M1200_W120_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.0F74.2EF5
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M1200_W360_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.Z5HY.G7EF
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M1400_W140_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.TE8F.JB2O
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M1400_W420_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.JJLX.AS3U
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M1600_W160_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.32CU.IFYO
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M1600_W480_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.7PRK.PAIT
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M1800_W180_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.YNA5.XC5D
	CMS Collaboration (2024). Simulated dataset ZPrimeToTT_M1800_W540_TuneCP2_13TeV-madgraph-pythia8 in MINIAODSIM format for 2016 collision data. CERN Open Data Portal. DOI:10.7483/OPENDATA.CMS.RFGG.0ZY0


 </dc:description>
  <dc:identifier>https://www.fdr.uni-hamburg.de/record/18610</dc:identifier>
  <dc:identifier>10.25592/uhhfdm.18610</dc:identifier>
  <dc:identifier>oai:fdr.uni-hamburg.de:18610</dc:identifier>
  <dc:relation>doi:10.25592/uhhfdm.16504</dc:relation>
  <dc:relation>doi:10.25592/uhhfdm.18609</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
  <dc:subject>Machine learning</dc:subject>
  <dc:subject>Foundation models</dc:subject>
  <dc:subject>Particle physics</dc:subject>
  <dc:subject>Collider physics</dc:subject>
  <dc:subject>LHC</dc:subject>
  <dc:subject>Open data</dc:subject>
  <dc:subject>Large dataset</dc:subject>
  <dc:subject>Jet physics</dc:subject>
  <dc:subject>Point clouds</dc:subject>
  <dc:subject>Jet tagging</dc:subject>
  <dc:subject>Boosted jets</dc:subject>
  <dc:title>Aspen Open Jets: Monte Carlo</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>
