<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
<leader>00000nmm##2200000uu#4500</leader>
<controlfield tag="005">20231002153100.0</controlfield>
<datafield tag="260" ind1=" " ind2=" ">
<subfield code="c">2023-09-30</subfield>
</datafield>
<datafield tag="024" ind1=" " ind2=" ">
<subfield code="a">10.25592/uhhfdm.13411</subfield>
<subfield code="2">doi</subfield>
</datafield>
<datafield tag="542" ind1=" " ind2=" ">
<subfield code="l">open</subfield>
</datafield>
<datafield tag="540" ind1=" " ind2=" ">
<subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
<subfield code="a">Creative Commons Attribution 4.0 International</subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" ">
<subfield code="a"><p>This page provides the single-mutation data extracted from the PDB with MicroMiner. The data consists of amino acid pairs in PDB protein structures that exemplify the local structural changes caused by single mutations, both within single chains and at protein&ndash;protein interfaces. Mutations to non-standard residues are also provided.<br>
See the MicroMiner publication for details:</p>
<blockquote>
<p><em>Sieg, J.; Rarey, M. Searching similar local 3D micro-environments in protein structure databases with MicroMiner, 2023 (accepted in Briefings in Bioinformatics)</em></p>
</blockquote>
<p><strong>Data content:</strong></p>
<ul>
<li><strong>pdb_all_monomer.tsv</strong>
<ul>
<li>all single mutations in monomer/single chains</li>
<li>255853767 pairs/lines</li>
<li>15GB</li>
</ul>
</li>
<li><strong>filtered_single_mutations_pdb_monomer.tsv</strong>
<ul>
<li>redundancy and similarity filtered pdb_all_monomer.tsv</li>
<li>4868765 pairs/lines</li>
<li>324MB</li>
</ul>
</li>
<li><strong>single_mutations_pdb_monomer_non_standard_aa.tsv</strong>
<ul>
<li>only single mutations containing non-standard residues in monomer/single chains</li>
<li>350969 pairs/lines</li>
<li>21MB</li>
</ul>
</li>
<li><strong>pdb_all_ppi.tsv</strong>
<ul>
<li>all single mutations at protein&ndash;protein interfaces (PPIs)</li>
<li>45752145 pairs/lines</li>
<li>2.7GB</li>
</ul>
</li>
<li><strong>filtered_single_mutations_pdb_ppi.tsv</strong>
<ul>
<li>redundancy and similarity filtered pdb_all_ppi.tsv</li>
<li>799130 pairs/lines</li>
<li>54MB</li>
</ul>
</li>
<li><strong>single_mutations_pdb_ppi_non_standard_aa.tsv</strong>
<ul>
<li>only single mutations containing non-standard residues at PPIs</li>
<li>114671 pairs/lines</li>
<li>6.9MB</li>
</ul>
</li>
</ul>
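<p>Since the download links of this record end in <em>.tsv.tar.gz</em> and the largest table is about 15GB unpacked, it can be convenient to stream rows straight from the archive instead of extracting it first. The following is a minimal sketch, not part of MicroMiner. It assumes that each archive holds one tab-separated, UTF-8 encoded file with a header line. The function names <em>iter_tsv_rows</em> and <em>count_rows</em> are hypothetical.</p>

```python
import csv
import io
import tarfile
from typing import Dict, Iterator


def iter_tsv_rows(archive_path: str) -> Iterator[Dict[str, str]]:
    """Yield the rows of the first regular file in a .tar.gz archive as dicts.

    Assumption: the archive contains a single tab-separated table whose
    first line holds the column names.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories and other non-file entries
            raw = archive.extractfile(member)
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            # QUOTE_NONE: treat quote characters as ordinary data.
            yield from csv.DictReader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
            return  # only the first file is read


def count_rows(archive_path: str) -> int:
    """Count data rows (header excluded) without loading the table into memory."""
    return sum(1 for _ in iter_tsv_rows(archive_path))
```

<p>Because the rows are produced by a generator, memory use stays roughly constant regardless of the table size. Comparing <em>count_rows</em> against the pairs/lines figures listed above is one possible sanity check after a download.</p>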
<p>Each row in the TSV files describes the residue position of a single mutation in the wild type (query) and the mutant (hit). Several local structural and sequence similarity measures, computed from the 3D micro-environments of the residues, are provided. The column fullSeqId contains the global sequence identity. The first two rows of a TSV file look like this:</p>
<pre><code class="language-bash">queryName queryChain queryAA queryPos hitName hitChain hitAA hitPos siteIdentity siteBackBoneRMSD siteAllAtomRMSD nofSiteResidues alignmentLDDT fullSeqId
10GS A CYS 47 2J9H A ALA 48 0.938 0.223 0.431 16.0 0.996 0.976 0.976</code></pre>
<p><em>queryName</em>: query PDB-ID</p>
<p><em>queryChain</em>: query chain ID</p>
<p><em>queryAA</em>: query amino acid type (three letter code)</p>
<p><em>queryPos</em>: query sequence position of the amino acid residue</p>
<p><em>hitName</em>: hit PDB-ID</p>
<p><em>hitChain</em>: hit chain ID</p>
<p><em>hitAA</em>: hit amino acid type (three letter code)</p>
<p><em>hitPos</em>: hit sequence position of the amino acid residue</p>
<p><em>siteIdentity</em>: sequence identity of the aligned micro-environments</p>
<p><em>siteBackBoneRMSD</em>: Calpha-RMSD of the aligned micro-environments</p>
<p><em>siteAllAtomRMSD</em>: all-atom-RMSD of the aligned micro-environments</p>
<p><em>nofSiteResidues</em>: number of residues in the micro-environments</p>
<p><em>alignmentLDDT</em>: mean LDDT score of all residues in the aligned micro-environments</p>
<p><em>fullSeqId</em>: global sequence identity of the query chain and hit chain (as specified by the chain IDs)</p>
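<p>The column definitions above can be turned into a small reader that converts the similarity measures to numbers and keeps only closely matching pairs. This is a sketch under stated assumptions: the files are tab-separated with the header shown above, and the cut-off values 0.5 and 0.9 are purely illustrative, not recommendations from the MicroMiner publication. The names <em>read_pairs</em> and <em>select_pairs</em> are hypothetical.</p>

```python
import csv
from typing import Iterable, Iterator, TextIO

# Columns documented above as numeric similarity measures.
NUMERIC_COLUMNS = (
    "siteIdentity",
    "siteBackBoneRMSD",
    "siteAllAtomRMSD",
    "nofSiteResidues",
    "alignmentLDDT",
    "fullSeqId",
)


def read_pairs(handle: TextIO) -> Iterator[dict]:
    """Yield one dict per mutation pair with the numeric columns as floats."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Values beyond the named columns, if any, are kept by DictReader
        # under the key None and are left untouched here.
        for name in NUMERIC_COLUMNS:
            row[name] = float(row[name])
        yield row


def select_pairs(
    rows: Iterable[dict],
    max_backbone_rmsd: float = 0.5,   # illustrative threshold
    min_site_identity: float = 0.9,   # illustrative threshold
) -> Iterator[dict]:
    """Keep pairs whose micro-environments are structurally and sequentially close."""
    for row in rows:
        close_backbone = max_backbone_rmsd >= row["siteBackBoneRMSD"]
        similar_site = row["siteIdentity"] >= min_site_identity
        if close_backbone and similar_site:
            yield row
```

<p>Both functions are generators, so they can be chained on an open file handle, for example <em>select_pairs(read_pairs(handle))</em>, without holding the whole table in memory.</p>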
<p>This work was supported by the German Federal Ministry of Education and Research as part of de.NBI [grant number 031L0105] and protP.S.I. [grant number 031B0405B].</p>
</subfield>
</datafield>
<controlfield tag="001">13411</controlfield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="s">78590102</subfield>
<subfield code="u">https://www.fdr.uni-hamburg.de/record/13411/files/filtered_single_mutations_pdb_monomer.tsv.tar.gz</subfield>
<subfield code="z">md5:1992f83a3356c0dbf846d242688ee4cb</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="s">13315895</subfield>
<subfield code="u">https://www.fdr.uni-hamburg.de/record/13411/files/filtered_single_mutations_pdb_ppi.tsv.tar.gz</subfield>
<subfield code="z">md5:7e96f16aa5cf24438dae392a6f137ea0</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="s">2883340744</subfield>
<subfield code="u">https://www.fdr.uni-hamburg.de/record/13411/files/pdb_all_monomer.tsv.tar.gz</subfield>
<subfield code="z">md5:7f71061b76e5ad8583e87996cffb0bb4</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="s">517799252</subfield>
<subfield code="u">https://www.fdr.uni-hamburg.de/record/13411/files/pdb_all_ppi.tsv.tar.gz</subfield>
<subfield code="z">md5:ebd06bcc67a2785ba0f623366b3628ed</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="s">3564402</subfield>
<subfield code="u">https://www.fdr.uni-hamburg.de/record/13411/files/single_mutations_pdb_monomer_non_standard_aa.tsv.tar.gz</subfield>
<subfield code="z">md5:0c93df9fa680d0ff7bab7f423e9aa675</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="s">983956</subfield>
<subfield code="u">https://www.fdr.uni-hamburg.de/record/13411/files/single_mutations_pdb_ppi_non_standard_aa.tsv.tar.gz</subfield>
<subfield code="z">md5:1f3f15dc28fd661f772b09718c3c2816</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">dataset</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">user-uhh</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">Single mutation protein structure pairs extracted from the PDB with MicroMiner</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Sieg Jochen</subfield>
<subfield code="u">Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany</subfield>
<subfield code="0">(orcid)0000-0001-5343-7255</subfield>
</datafield>
<datafield tag="909" ind1="C" ind2="O">
<subfield code="o">oai:fdr.uni-hamburg.de:13411</subfield>
<subfield code="p">user-uhh</subfield>
</datafield>
<datafield tag="650" ind1="1" ind2="7">
<subfield code="a">cc-by</subfield>
<subfield code="2">opendefinition.org</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Rarey Matthias</subfield>
<subfield code="u">Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany</subfield>
<subfield code="0">(orcid)0000-0002-9553-6531</subfield>
</datafield>
<datafield tag="773" ind1=" " ind2=" ">
<subfield code="a">10.25592/uhhfdm.13410</subfield>
<subfield code="i">isVersionOf</subfield>
<subfield code="n">doi</subfield>
</datafield>
</record>