Dataset Open Access

Single mutation protein structure pairs extracted from the PDB with MicroMiner

Sieg Jochen; Rarey Matthias


JSON Export

{"conceptdoi":"10.25592/uhhfdm.13410","conceptrecid":"13410","created":"2023-09-30T12:25:59.843976+00:00","doi":"10.25592/uhhfdm.13411","id":13411,"links":{"badge":"https://www.fdr.uni-hamburg.de/badge/doi/10.25592/uhhfdm.13411.svg","conceptbadge":"https://www.fdr.uni-hamburg.de/badge/doi/10.25592/uhhfdm.13410.svg","conceptdoi":"http://doi.org/10.25592/uhhfdm.13410","doi":"http://doi.org/10.25592/uhhfdm.13411"},"metadata":{"access_right":"open","access_right_category":"success","communities":[{"id":"uhh"}],"creators":[{"affiliation":"Universit\u00e4t Hamburg, ZBH - Center for Bioinformatics, Bundesstra\u00dfe 43, 20146 Hamburg, Germany","name":"Sieg Jochen","orcid":"0000-0001-5343-7255"},{"affiliation":"Universit\u00e4t Hamburg, ZBH - Center for Bioinformatics, Bundesstra\u00dfe 43, 20146 Hamburg, Germany","name":"Rarey Matthias","orcid":"0000-0002-9553-6531"}],"description":"<p>This page provides the single mutation data extracted with MicroMiner from the PDB. The data contains amino acid pairs in protein structures from the PDB, exemplifying single mutations&rsquo; local structural changes for single chains and pairs for protein&ndash;protein interfaces. Mutations to non-standard residues are also provided.<br>\nSee the MicroMiner publication for details:</p>\n\n<blockquote>\n<p><em>Sieg, J.; Rarey, M. 
Searching similar local 3D micro-environments in protein structure databases with MicroMiner, 2023 (accepted in Briefings in Bioinformatics)</em></p>\n</blockquote>\n\n<p><strong>Data content:</strong></p>\n\n<ul>\n\t<li><strong>pdb_all_monomer.tsv</strong>\n\n\t<ul>\n\t\t<li>all single mutations in monomer/single chains</li>\n\t\t<li>255853767 pairs/lines</li>\n\t\t<li>15GB</li>\n\t</ul>\n\t</li>\n\t<li><strong>filtered_single_mutations_pdb_monomer.tsv</strong>\n\t<ul>\n\t\t<li>redundancy and similarity filtered pdb_all_monomer.tsv</li>\n\t\t<li>4868765 pairs/lines</li>\n\t\t<li>324MB</li>\n\t</ul>\n\t</li>\n\t<li><strong>single_mutations_pdb_monomer_non_standard_aa.tsv</strong>\n\t<ul>\n\t\t<li>only single mutations containing non-standard residues in monomer/single chains</li>\n\t\t<li>350969 pairs/lines</li>\n\t\t<li>21MB</li>\n\t</ul>\n\t</li>\n\t<li><strong>pdb_all_ppi.tsv</strong>\n\t<ul>\n\t\t<li>all single mutations at PPIs</li>\n\t\t<li>45752145 pairs/lines</li>\n\t\t<li>2.7GB</li>\n\t</ul>\n\t</li>\n\t<li><strong>filtered_single_mutations_pdb_ppi.tsv</strong>\n\t<ul>\n\t\t<li>redundancy and similarity filtered pdb_all_ppi.tsv</li>\n\t\t<li>799130 pairs/lines</li>\n\t\t<li>54MB</li>\n\t</ul>\n\t</li>\n\t<li><strong>single_mutations_pdb_ppi_non_standard_aa.tsv</strong>\n\t<ul>\n\t\t<li>only single mutations containing non-standard residues at PPIs</li>\n\t\t<li>114671 pairs/lines</li>\n\t\t<li>6.9MB</li>\n\t</ul>\n\t</li>\n</ul>\n\n<p>A row in the TSV files describes the residue position of the single mutation in the wild-type (query) and mutant (hit). Multiple local structural and sequence similarity measures are provided, computed from the residue 3D micro-environments. The column fullSeqId contains the global sequence identity. 
The first two rows of a TSV file look like this:</p>\n\n<pre><code class=\"language-bash\">queryName    queryChain    queryAA    queryPos    hitName    hitChain    hitAA    hitPos    siteIdentity    siteBackBoneRMSD    siteAllAtomRMSD    nofSiteResidues    alignmentLDDT    fullSeqId\n10GS    A    CYS    47    2J9H    A    ALA    48    0.938    0.223    0.431    16.0    0.996    0.976    0.976</code></pre>\n\n<p><em>queryName</em>: query PDB-ID</p>\n\n<p><em>queryChain</em>: query chain ID</p>\n\n<p><em>queryAA</em>: query amino acid type (three letter code)</p>\n\n<p><em>queryPos</em>: query sequence position of the amino acid residue</p>\n\n<p><em>hitName</em>: hit PDB-ID</p>\n\n<p><em>hitChain</em>: hit chain ID</p>\n\n<p><em>hitAA</em>: hit amino acid type (three letter code)</p>\n\n<p><em>hitPos</em>: hit sequence position of the amino acid residue</p>\n\n<p><em>siteIdentity</em>: sequence identity of the aligned micro-environments</p>\n\n<p><em>siteBackBoneRMSD</em>: Calpha-RMSD of the aligned micro-environments</p>\n\n<p><em>siteAllAtomRMSD</em>: all-atom-RMSD of the aligned micro-environments</p>\n\n<p><em>nofSiteResidues</em>: number of residues in the micro-environments</p>\n\n<p><em>alignmentLDDT</em>: mean LDDT score of all residues in the aligned micro-environments</p>\n\n<p><em>fullSeqId</em>: global sequence identity of the query chain and hit chain (as specified by the chain IDs)</p>\n\n<p>&nbsp;</p>\n\n<pre><code>This work was supported by the German Federal Ministry of Education and Research as part of de.NBI [grant number 031L0105] and protP.S.I. 
[grant number 031B0405B].</code></pre>\n\n<p>&nbsp;</p>","doi":"10.25592/uhhfdm.13411","license":{"id":"CC-BY-4.0"},"publication_date":"2023-09-30","related_identifiers":[{"identifier":"10.25592/uhhfdm.13410","relation":"isVersionOf","scheme":"doi"}],"relations":{"version":[{"count":1,"index":0,"is_last":true,"last_child":{"pid_type":"recid","pid_value":"13411"},"parent":{"pid_type":"recid","pid_value":"13410"}}]},"resource_type":{"title":"Dataset","type":"dataset"},"title":"Single mutation protein structure pairs extracted from the PDB with MicroMiner"},"owners":[412],"revision":4,"updated":"2023-10-02T15:31:00.401157+00:00"}
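
Reading the TSV files (illustrative sketch)

The column layout described in the record above can be read with Python's standard csv module. The following is a minimal sketch, not part of the dataset or of MicroMiner: the function names iter_pairs and select_by_rmsd and the 0.5 threshold are hypothetical, and the files are assumed to be tab-delimited with the header row shown in the description. The sample row in the description carries one more value than there are header names, so the sketch simply discards surplus fields instead of interpreting them. Rows are streamed one at a time because the largest file is listed as 15GB.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO

# Column names taken from the record description; the similarity
# measures are parsed as floats, everything else stays a string.
NUMERIC_COLUMNS = (
    "siteIdentity", "siteBackBoneRMSD", "siteAllAtomRMSD",
    "nofSiteResidues", "alignmentLDDT", "fullSeqId",
)


def iter_pairs(handle: TextIO) -> Iterator[dict]:
    """Yield one dict per query/hit pair (hypothetical helper).

    Assumes a tab-delimited file whose first line is the header.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # DictReader stores values beyond the header under the key None;
        # drop them because their meaning is not documented.
        row.pop(None, None)
        for name in NUMERIC_COLUMNS:
            row[name] = float(row[name])
        yield row


def select_by_rmsd(pairs: Iterable[dict], max_rmsd: float) -> Iterator[dict]:
    """Keep pairs whose all-atom site RMSD is at most max_rmsd."""
    return (p for p in pairs if p["siteAllAtomRMSD"] <= max_rmsd)


# The two rows from the description, with tabs assumed as separators.
SAMPLE = (
    "queryName\tqueryChain\tqueryAA\tqueryPos\thitName\thitChain\thitAA\t"
    "hitPos\tsiteIdentity\tsiteBackBoneRMSD\tsiteAllAtomRMSD\t"
    "nofSiteResidues\talignmentLDDT\tfullSeqId\n"
    "10GS\tA\tCYS\t47\t2J9H\tA\tALA\t48\t0.938\t0.223\t0.431\t16.0\t"
    "0.996\t0.976\t0.976\n"
)

pairs = list(iter_pairs(io.StringIO(SAMPLE)))
close_pairs = list(select_by_rmsd(pairs, max_rmsd=0.5))  # arbitrary cutoff
for p in close_pairs:
    print(p["queryName"], p["queryAA"], p["queryPos"],
          "->", p["hitName"], p["hitAA"], p["hitPos"])
```

For a real file, the same generator could be fed an opened file object, for example open("filtered_single_mutations_pdb_monomer.tsv", newline=""), so that the file never has to be held in memory as a whole.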
