Dataset Open Access

Single mutation protein structure pairs extracted from the PDB with MicroMiner

Sieg Jochen; Rarey Matthias


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <controlfield tag="005">20231002153100.0</controlfield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2023-09-30</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.25592/uhhfdm.13411</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;This page provides the single mutation data extracted from the PDB with MicroMiner. The data consists of amino acid pairs in PDB protein structures that exemplify the local structural changes caused by single mutations, both within single chains and at protein&amp;ndash;protein interfaces. Mutations to non-standard residues are also provided.&lt;br&gt;
See the MicroMiner publication for details:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Sieg, J.; Rarey, M. Searching similar local 3D micro-environments in protein structure databases with MicroMiner, 2023 (accepted in Briefings in Bioinformatics)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Data content:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;&lt;strong&gt;pdb_all_monomer.tsv&lt;/strong&gt;

	&lt;ul&gt;
		&lt;li&gt;all single mutations in monomer/single chains&lt;/li&gt;
		&lt;li&gt;255853767 pairs/lines&lt;/li&gt;
		&lt;li&gt;15GB&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;filtered_single_mutations_pdb_monomer.tsv&lt;/strong&gt;
	&lt;ul&gt;
		&lt;li&gt;redundancy and similarity filtered pdb_all_monomer.tsv&lt;/li&gt;
		&lt;li&gt;4868765 pairs/lines&lt;/li&gt;
		&lt;li&gt;324MB&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;single_mutations_pdb_monomer_non_standard_aa.tsv&lt;/strong&gt;
	&lt;ul&gt;
		&lt;li&gt;only single mutations containing non-standard residues in monomer/single chains&lt;/li&gt;
		&lt;li&gt;350969 pairs/lines&lt;/li&gt;
		&lt;li&gt;21MB&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;pdb_all_ppi.tsv&lt;/strong&gt;
	&lt;ul&gt;
		&lt;li&gt;all single mutations at PPIs&lt;/li&gt;
		&lt;li&gt;45752145 pairs/lines&lt;/li&gt;
		&lt;li&gt;2.7GB&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;filtered_single_mutations_pdb_ppi.tsv&lt;/strong&gt;
	&lt;ul&gt;
		&lt;li&gt;redundancy and similarity filtered pdb_all_ppi.tsv&lt;/li&gt;
		&lt;li&gt;799130 pairs/lines&lt;/li&gt;
		&lt;li&gt;54MB&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;single_mutations_pdb_ppi_non_standard_aa.tsv&lt;/strong&gt;
	&lt;ul&gt;
		&lt;li&gt;only single mutations containing non-standard residues at PPIs&lt;/li&gt;
		&lt;li&gt;114671 pairs/lines&lt;/li&gt;
		&lt;li&gt;6.9MB&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each row in the TSV files describes the residue position of a single mutation in the wild-type (query) and the mutant (hit). Several local structural and sequence similarity measures are provided, computed from the residue 3D micro-environments. The column fullSeqId contains the global sequence identity. The first two rows of a TSV file look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-bash"&gt;queryName    queryChain    queryAA    queryPos    hitName    hitChain    hitAA    hitPos    siteIdentity    siteBackBoneRMSD    siteAllAtomRMSD    nofSiteResidues    alignmentLDDT    fullSeqId
10GS    A    CYS    47    2J9H    A    ALA    48    0.938    0.223    0.431    16.0    0.996    0.976    0.976&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;em&gt;queryName&lt;/em&gt;: query PDB-ID&lt;/p&gt;

&lt;p&gt;&lt;em&gt;queryChain&lt;/em&gt;: query chain ID&lt;/p&gt;

&lt;p&gt;&lt;em&gt;queryAA&lt;/em&gt;: query amino acid type (three letter code)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;queryPos&lt;/em&gt;: query sequence position of the amino acid residue&lt;/p&gt;

&lt;p&gt;&lt;em&gt;hitName&lt;/em&gt;: hit PDB-ID&lt;/p&gt;

&lt;p&gt;&lt;em&gt;hitChain&lt;/em&gt;: hit chain ID&lt;/p&gt;

&lt;p&gt;&lt;em&gt;hitAA&lt;/em&gt;: hit amino acid type (three letter code)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;hitPos&lt;/em&gt;: hit sequence position of the amino acid residue&lt;/p&gt;

&lt;p&gt;&lt;em&gt;siteIdentity&lt;/em&gt;: sequence identity of the aligned micro-environments&lt;/p&gt;

&lt;p&gt;&lt;em&gt;siteBackBoneRMSD&lt;/em&gt;: Calpha-RMSD of the aligned micro-environments&lt;/p&gt;

&lt;p&gt;&lt;em&gt;siteAllAtomRMSD&lt;/em&gt;: all-atom-RMSD of the aligned micro-environments&lt;/p&gt;

&lt;p&gt;&lt;em&gt;nofSiteResidues&lt;/em&gt;: number of residues in the micro-environments&lt;/p&gt;

&lt;p&gt;&lt;em&gt;alignmentLDDT&lt;/em&gt;: mean LDDT score of all residues in the aligned micro-environments&lt;/p&gt;

&lt;p&gt;&lt;em&gt;fullSeqId&lt;/em&gt;: global sequence identity of the query chain and hit chain (as specified by the chain IDs)&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;This work was supported by the German Federal Ministry of Education and Research as part of de.NBI [grant number 031L0105] and protP.S.I. [grant number 031B0405B].&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;</subfield>
  </datafield>
  <controlfield tag="001">13411</controlfield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">78590102</subfield>
    <subfield code="u">https://www.fdr.uni-hamburg.de/record/13411/files/filtered_single_mutations_pdb_monomer.tsv.tar.gz</subfield>
    <subfield code="z">md5:1992f83a3356c0dbf846d242688ee4cb</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">13315895</subfield>
    <subfield code="u">https://www.fdr.uni-hamburg.de/record/13411/files/filtered_single_mutations_pdb_ppi.tsv.tar.gz</subfield>
    <subfield code="z">md5:7e96f16aa5cf24438dae392a6f137ea0</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">2883340744</subfield>
    <subfield code="u">https://www.fdr.uni-hamburg.de/record/13411/files/pdb_all_monomer.tsv.tar.gz</subfield>
    <subfield code="z">md5:7f71061b76e5ad8583e87996cffb0bb4</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">517799252</subfield>
    <subfield code="u">https://www.fdr.uni-hamburg.de/record/13411/files/pdb_all_ppi.tsv.tar.gz</subfield>
    <subfield code="z">md5:ebd06bcc67a2785ba0f623366b3628ed</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">3564402</subfield>
    <subfield code="u">https://www.fdr.uni-hamburg.de/record/13411/files/single_mutations_pdb_monomer_non_standard_aa.tsv.tar.gz</subfield>
    <subfield code="z">md5:0c93df9fa680d0ff7bab7f423e9aa675</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">983956</subfield>
    <subfield code="u">https://www.fdr.uni-hamburg.de/record/13411/files/single_mutations_pdb_ppi_non_standard_aa.tsv.tar.gz</subfield>
    <subfield code="z">md5:1f3f15dc28fd661f772b09718c3c2816</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">user-uhh</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Single mutation protein structure pairs extracted from the PDB with MicroMiner</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Sieg Jochen</subfield>
    <subfield code="u">Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany</subfield>
    <subfield code="0">(orcid)0000-0001-5343-7255</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="o">oai:fdr.uni-hamburg.de:13411</subfield>
    <subfield code="p">user-uhh</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Rarey Matthias</subfield>
    <subfield code="u">Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany</subfield>
    <subfield code="0">(orcid)0000-0002-9553-6531</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="a">10.25592/uhhfdm.13410</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="n">doi</subfield>
  </datafield>
</record>
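
The TSV layout described in the record above can be read with the Python standard library alone. The following is a minimal sketch, not part of the dataset or of MicroMiner: the function names read_pairs and select_pairs and the two thresholds are hypothetical and chosen only for illustration. The column names and the sample row are taken from the description. Because that sample row carries one more value than the header has names, the sketch keeps surplus fields under an "extra" key instead of guessing their meaning.

```python
import csv
import io

# Columns parsed as floats; the names come from the record's description.
NUMERIC_COLUMNS = (
    "siteIdentity", "siteBackBoneRMSD", "siteAllAtomRMSD",
    "nofSiteResidues", "alignmentLDDT", "fullSeqId",
)


def read_pairs(handle):
    """Yield one dict per mutation pair from an open TSV file handle."""
    # restkey collects values beyond the named columns into a list.
    reader = csv.DictReader(handle, delimiter="\t", restkey="extra")
    for row in reader:
        for name in NUMERIC_COLUMNS:
            row[name] = float(row[name])
        yield row


def select_pairs(pairs, max_backbone_rmsd=0.5, min_full_seq_id=0.9):
    """Keep structurally close pairs; both thresholds are hypothetical."""
    return [
        p for p in pairs
        if p["siteBackBoneRMSD"] <= max_backbone_rmsd
        and p["fullSeqId"] >= min_full_seq_id
    ]


# Header and sample row as given in the description, joined with tabs.
HEADER = [
    "queryName", "queryChain", "queryAA", "queryPos",
    "hitName", "hitChain", "hitAA", "hitPos",
    "siteIdentity", "siteBackBoneRMSD", "siteAllAtomRMSD",
    "nofSiteResidues", "alignmentLDDT", "fullSeqId",
]
SAMPLE_ROW = [
    "10GS", "A", "CYS", "47", "2J9H", "A", "ALA", "48",
    "0.938", "0.223", "0.431", "16.0", "0.996", "0.976", "0.976",
]
SAMPLE_TSV = "\t".join(HEADER) + "\n" + "\t".join(SAMPLE_ROW) + "\n"

pairs = list(read_pairs(io.StringIO(SAMPLE_TSV)))
selected = select_pairs(pairs)
for p in selected:
    print(p["queryName"], p["queryAA"], p["queryPos"],
          "->", p["hitName"], p["hitAA"], p["hitPos"])
```

For a real file, pass a handle from open(path, newline="") on the extracted TSV in place of the io.StringIO sample. Since read_pairs is a generator and select_pairs accepts any iterable, the larger files can be streamed row by row without first loading them into a list.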
