Dataset Open Access

Single mutation protein structure pairs extracted from the PDB with MicroMiner

Sieg Jochen; Rarey Matthias


JSON-LD (schema.org) Export

{"@context":"https://schema.org/","@id":"http://doi.org/10.25592/uhhfdm.13411","@type":"Dataset","creator":[{"@id":"https://orcid.org/0000-0001-5343-7255","@type":"Person","affiliation":"Universit\u00e4t Hamburg, ZBH - Center for Bioinformatics, Bundesstra\u00dfe 43, 20146 Hamburg, Germany","name":"Sieg Jochen"},{"@id":"https://orcid.org/0000-0002-9553-6531","@type":"Person","affiliation":"Universit\u00e4t Hamburg, ZBH - Center for Bioinformatics, Bundesstra\u00dfe 43, 20146 Hamburg, Germany","name":"Rarey Matthias"}],"datePublished":"2023-09-30","description":"<p>This page provides the single mutation data extracted with MicroMiner from the PDB. The data contains amino acid pairs in protein structures from the PDB, exemplifying single mutations&rsquo; local structural changes for single chains and pairs for protein&ndash;protein interfaces. Mutations to non-standard residues are also provided.<br>\nSee the MicroMiner publication for details:</p>\n\n<blockquote>\n<p><em>Sieg, J.; Rarey, M. 
Searching similar local 3D micro-environments in protein structure databases with MicroMiner, 2023 (accepted in Briefings in Bioinformatics)</em></p>\n</blockquote>\n\n<p><strong>Data content:</strong></p>\n\n<ul>\n\t<li><strong>pdb_all_monomer.tsv</strong>\n\n\t<ul>\n\t\t<li>all single mutations in monomer/single chains</li>\n\t\t<li>255853767 pairs/lines</li>\n\t\t<li>15GB</li>\n\t</ul>\n\t</li>\n\t<li><strong>filtered_single_mutations_pdb_monomer.tsv</strong>\n\t<ul>\n\t\t<li>redundancy- and similarity-filtered pdb_all_monomer.tsv</li>\n\t\t<li>4868765 pairs/lines</li>\n\t\t<li>324MB</li>\n\t</ul>\n\t</li>\n\t<li><strong>single_mutations_pdb_monomer_non_standard_aa.tsv</strong>\n\t<ul>\n\t\t<li>only single mutations containing non-standard residues in monomer/single chains</li>\n\t\t<li>350969 pairs/lines</li>\n\t\t<li>21MB</li>\n\t</ul>\n\t</li>\n\t<li><strong>pdb_all_ppi.tsv</strong>\n\t<ul>\n\t\t<li>all single mutations at PPIs</li>\n\t\t<li>45752145 pairs/lines</li>\n\t\t<li>2.7GB</li>\n\t</ul>\n\t</li>\n\t<li><strong>filtered_single_mutations_pdb_ppi.tsv</strong>\n\t<ul>\n\t\t<li>redundancy- and similarity-filtered pdb_all_ppi.tsv</li>\n\t\t<li>799130 pairs/lines</li>\n\t\t<li>54MB</li>\n\t</ul>\n\t</li>\n\t<li><strong>single_mutations_pdb_ppi_non_standard_aa.tsv</strong>\n\t<ul>\n\t\t<li>only single mutations containing non-standard residues at PPIs</li>\n\t\t<li>114671 pairs/lines</li>\n\t\t<li>6.9MB</li>\n\t</ul>\n\t</li>\n</ul>\n\n<p>A row in the TSV files describes the residue position of the single mutation in the wild-type (query) and mutant (hit). Multiple local structural and sequence similarity measures are provided, computed from the residue 3D micro-environments. The column fullSeqId contains the global sequence identity. 
The first two rows of a TSV file look like this:</p>\n\n<pre><code class=\"language-bash\">queryName    queryChain    queryAA    queryPos    hitName    hitChain    hitAA    hitPos    siteIdentity    siteBackBoneRMSD    siteAllAtomRMSD    nofSiteResidues    alignmentLDDT    fullSeqId\n10GS    A    CYS    47    2J9H    A    ALA    48    0.938    0.223    0.431    16.0    0.996    0.976    0.976</code></pre>\n\n<p><em>queryName</em>: query PDB-ID</p>\n\n<p><em>queryChain</em>: query chain ID</p>\n\n<p><em>queryAA</em>: query amino acid type (three-letter code)</p>\n\n<p><em>queryPos</em>: query sequence position of the amino acid residue</p>\n\n<p><em>hitName</em>: hit PDB-ID</p>\n\n<p><em>hitChain</em>: hit chain ID</p>\n\n<p><em>hitAA</em>: hit amino acid type (three-letter code)</p>\n\n<p><em>hitPos</em>: hit sequence position of the amino acid residue</p>\n\n<p><em>siteIdentity</em>: sequence identity of the aligned micro-environments</p>\n\n<p><em>siteBackBoneRMSD</em>: Calpha-RMSD of the aligned micro-environments</p>\n\n<p><em>siteAllAtomRMSD</em>: all-atom-RMSD of the aligned micro-environments</p>\n\n<p><em>nofSiteResidues</em>: number of residues in the micro-environments</p>\n\n<p><em>alignmentLDDT</em>: mean LDDT score of all residues in the aligned micro-environments</p>\n\n<p><em>fullSeqId</em>: global sequence identity of the query chain and hit chain (as specified by the chain IDs)</p>\n\n<p>&nbsp;</p>\n\n<pre><code>This work was supported by the German Federal Ministry of Education and Research as part of de.NBI [grant number 031L0105] and protP.S.I. 
[grant number 031B0405B].</code></pre>\n\n<p>&nbsp;</p>","distribution":[{"@type":"DataDownload","contentUrl":"https://www.fdr.uni-hamburg.de/api/files/ad633a26-3b2f-49b4-85b1-03d407fe1843/filtered_single_mutations_pdb_monomer.tsv.tar.gz","encodingFormat":"gz"},{"@type":"DataDownload","contentUrl":"https://www.fdr.uni-hamburg.de/api/files/ad633a26-3b2f-49b4-85b1-03d407fe1843/filtered_single_mutations_pdb_ppi.tsv.tar.gz","encodingFormat":"gz"},{"@type":"DataDownload","contentUrl":"https://www.fdr.uni-hamburg.de/api/files/ad633a26-3b2f-49b4-85b1-03d407fe1843/pdb_all_monomer.tsv.tar.gz","encodingFormat":"gz"},{"@type":"DataDownload","contentUrl":"https://www.fdr.uni-hamburg.de/api/files/ad633a26-3b2f-49b4-85b1-03d407fe1843/pdb_all_ppi.tsv.tar.gz","encodingFormat":"gz"},{"@type":"DataDownload","contentUrl":"https://www.fdr.uni-hamburg.de/api/files/ad633a26-3b2f-49b4-85b1-03d407fe1843/single_mutations_pdb_monomer_non_standard_aa.tsv.tar.gz","encodingFormat":"gz"},{"@type":"DataDownload","contentUrl":"https://www.fdr.uni-hamburg.de/api/files/ad633a26-3b2f-49b4-85b1-03d407fe1843/single_mutations_pdb_ppi_non_standard_aa.tsv.tar.gz","encodingFormat":"gz"}],"identifier":"http://doi.org/10.25592/uhhfdm.13411","license":"https://creativecommons.org/licenses/by/4.0/legalcode","name":"Single mutation protein structure pairs extracted from the PDB with MicroMiner","url":"https://www.fdr.uni-hamburg.de/record/13411"}
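
The column layout described in the record above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset or of MicroMiner: it assumes the extracted files are tab-delimited (inferred from the .tsv extension) and carry the header row shown in the description. The names `read_pairs` and `NUMERIC_COLUMNS` are hypothetical. The example row on this page lists one more value than the header has columns, so the sketch collects any surplus values under an `extra` key instead of guessing which column they belong to.

```python
import csv
import io

# Numeric columns as named in the record description (assumed to parse as floats).
NUMERIC_COLUMNS = (
    "siteIdentity", "siteBackBoneRMSD", "siteAllAtomRMSD",
    "nofSiteResidues", "alignmentLDDT", "fullSeqId",
)


def read_pairs(handle):
    """Yield one dict per query/hit pair from a tab-separated file object."""
    # restkey gathers values beyond the header length into a list under "extra".
    reader = csv.DictReader(handle, delimiter="\t", restkey="extra")
    for row in reader:
        for name in NUMERIC_COLUMNS:
            row[name] = float(row[name])
        yield row


# Header and example row copied from the record description, joined with tabs.
HEADER = (
    "queryName queryChain queryAA queryPos hitName hitChain hitAA hitPos "
    "siteIdentity siteBackBoneRMSD siteAllAtomRMSD nofSiteResidues "
    "alignmentLDDT fullSeqId"
).split()
ROW = "10GS A CYS 47 2J9H A ALA 48 0.938 0.223 0.431 16.0 0.996 0.976 0.976".split()
SAMPLE = "\t".join(HEADER) + "\n" + "\t".join(ROW) + "\n"

pairs = list(read_pairs(io.StringIO(SAMPLE)))
first = pairs[0]
print(first["queryName"], first["queryAA"], first["queryPos"],
      "->", first["hitName"], first["hitAA"], first["hitPos"])
```

For the real files, the same function can be given an open file handle after the downloaded .tar.gz archive has been extracted. Because `read_pairs` is a generator, it streams rows one at a time, which matters for the multi-gigabyte unfiltered files.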
