Dataset Open Access

Single mutation protein structure pairs extracted from the PDB with MicroMiner

Sieg Jochen; Rarey Matthias


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-3" xsi:schemaLocation="http://datacite.org/schema/kernel-3 http://schema.datacite.org/meta/kernel-3/metadata.xsd">
  <identifier identifierType="DOI">10.25592/uhhfdm.13411</identifier>
  <creators>
    <creator>
      <creatorName>Sieg Jochen</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0001-5343-7255</nameIdentifier>
      <affiliation>Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany</affiliation>
    </creator>
    <creator>
      <creatorName>Rarey Matthias</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-9553-6531</nameIdentifier>
      <affiliation>Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstraße 43, 20146 Hamburg, Germany</affiliation>
    </creator>
  </creators>
  <titles>
    <title>Single mutation protein structure pairs extracted from the PDB with MicroMiner</title>
  </titles>
  <publisher>Universität Hamburg</publisher>
  <publicationYear>2023</publicationYear>
  <dates>
    <date dateType="Issued">2023-09-30</date>
  </dates>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://www.fdr.uni-hamburg.de/record/13411</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsPartOf">10.25592/uhhfdm.13410</relatedIdentifier>
  </relatedIdentifiers>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;This page provides the single mutation data extracted from the PDB with MicroMiner. The data contains pairs of amino acid residues in PDB protein structures that exemplify the local structural changes of single mutations, both within single chains and at protein&amp;ndash;protein interfaces. Mutations to non-standard residues are also provided.&lt;br&gt;
See the MicroMiner publication for details:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Sieg, J.; Rarey, M. Searching similar local 3D micro-environments in protein structure databases with MicroMiner, 2023 (accepted in Briefings in Bioinformatics)&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Data content:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;&lt;strong&gt;pdb_all_monomer.tsv&lt;/strong&gt;

	&lt;ul&gt;
		&lt;li&gt;all single mutations in monomer/single chains&lt;/li&gt;
		&lt;li&gt;255853767 pairs/lines&lt;/li&gt;
		&lt;li&gt;15GB&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;filtered_single_mutations_pdb_monomer.tsv&lt;/strong&gt;
	&lt;ul&gt;
		&lt;li&gt;redundancy and similarity filtered pdb_all_monomer.tsv&lt;/li&gt;
		&lt;li&gt;4868765 pairs/lines&lt;/li&gt;
		&lt;li&gt;324MB&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;single_mutations_pdb_monomer_non_standard_aa.tsv&lt;/strong&gt;
	&lt;ul&gt;
		&lt;li&gt;only single mutations containing non-standard residues in monomer/single chains&lt;/li&gt;
		&lt;li&gt;350969 pairs/lines&lt;/li&gt;
		&lt;li&gt;21MB&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;pdb_all_ppi.tsv&lt;/strong&gt;
	&lt;ul&gt;
		&lt;li&gt;all single mutations at PPIs&lt;/li&gt;
		&lt;li&gt;45752145 pairs/lines&lt;/li&gt;
		&lt;li&gt;2.7GB&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;filtered_single_mutations_pdb_ppi.tsv&lt;/strong&gt;
	&lt;ul&gt;
		&lt;li&gt;redundancy and similarity filtered pdb_all_ppi.tsv&lt;/li&gt;
		&lt;li&gt;799130 pairs/lines&lt;/li&gt;
		&lt;li&gt;54MB&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;&lt;strong&gt;single_mutations_pdb_ppi_non_standard_aa.tsv&lt;/strong&gt;
	&lt;ul&gt;
		&lt;li&gt;only single mutations containing non-standard residues at PPIs&lt;/li&gt;
		&lt;li&gt;114671 pairs/lines&lt;/li&gt;
		&lt;li&gt;6.9MB&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each row in the TSV files describes the residue position of a single mutation in the wild-type (query) and the mutant (hit). Multiple local structural and sequence similarity measures are provided, computed from the residues' 3D micro-environments. The column fullSeqId contains the global sequence identity. The first two rows of a TSV file look like this:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-bash"&gt;queryName    queryChain    queryAA    queryPos    hitName    hitChain    hitAA    hitPos    siteIdentity    siteBackBoneRMSD    siteAllAtomRMSD    nofSiteResidues    alignmentLDDT    fullSeqId
10GS    A    CYS    47    2J9H    A    ALA    48    0.938    0.223    0.431    16.0    0.996    0.976    0.976&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;&lt;em&gt;queryName&lt;/em&gt;: query PDB-ID&lt;/p&gt;

&lt;p&gt;&lt;em&gt;queryChain&lt;/em&gt;: query chain ID&lt;/p&gt;

&lt;p&gt;&lt;em&gt;queryAA&lt;/em&gt;: query amino acid type (three letter code)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;queryPos&lt;/em&gt;: query sequence position of the amino acid residue&lt;/p&gt;

&lt;p&gt;&lt;em&gt;hitName&lt;/em&gt;: hit PDB-ID&lt;/p&gt;

&lt;p&gt;&lt;em&gt;hitChain&lt;/em&gt;: hit chain ID&lt;/p&gt;

&lt;p&gt;&lt;em&gt;hitAA&lt;/em&gt;: hit amino acid type (three letter code)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;hitPos&lt;/em&gt;: hit sequence position of the amino acid residue&lt;/p&gt;

&lt;p&gt;&lt;em&gt;siteIdentity&lt;/em&gt;: sequence identity of the aligned micro-environments&lt;/p&gt;

&lt;p&gt;&lt;em&gt;siteBackBoneRMSD&lt;/em&gt;: Calpha-RMSD of the aligned micro-environments&lt;/p&gt;

&lt;p&gt;&lt;em&gt;siteAllAtomRMSD&lt;/em&gt;: all-atom-RMSD of the aligned micro-environments&lt;/p&gt;

&lt;p&gt;&lt;em&gt;nofSiteResidues&lt;/em&gt;: number of residues in the micro-environments&lt;/p&gt;

&lt;p&gt;&lt;em&gt;alignmentLDDT&lt;/em&gt;: mean LDDT score of all residues in the aligned micro-environments&lt;/p&gt;

&lt;p&gt;&lt;em&gt;fullSeqId&lt;/em&gt;: global sequence identity of the query chain and hit chain (as specified by the chain IDs)&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;

&lt;p&gt;This work was supported by the German Federal Ministry of Education and Research as part of de.NBI [grant number 031L0105] and protP.S.I. [grant number 031B0405B].&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;</description>
  </descriptions>
</resource>
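
The row layout described in the abstract above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset or of MicroMiner: it assumes the fields are tab-separated (the abstract only shows them whitespace-aligned), and the names `iter_pairs`, `mutation_label`, `extra`, and the RMSD cutoff are hypothetical choices made for illustration. Note that the example data row in the abstract carries one more value than the header has column names; the sketch keeps any such surplus values under `extra` rather than guessing what they mean.

```python
import csv
import io

# Column names copied from the header line shown in the abstract.
COLUMNS = [
    "queryName", "queryChain", "queryAA", "queryPos",
    "hitName", "hitChain", "hitAA", "hitPos",
    "siteIdentity", "siteBackBoneRMSD", "siteAllAtomRMSD",
    "nofSiteResidues", "alignmentLDDT", "fullSeqId",
]
# The six similarity measures are numeric; IDs and positions stay as text.
FLOAT_COLUMNS = COLUMNS[8:]


def iter_pairs(handle):
    """Yield one dict per query/hit pair, reading the file line by line."""
    # Assumption: fields are tab-separated. Values beyond the named
    # columns are collected in a list under the hypothetical key "extra".
    reader = csv.DictReader(handle, delimiter="\t", restkey="extra")
    for row in reader:
        for name in FLOAT_COLUMNS:
            row[name] = float(row[name])
        yield row


def mutation_label(row):
    """Format a pair as 'PDB:chain:AApos -> PDB:chain:AApos'."""
    query = f"{row['queryName']}:{row['queryChain']}:{row['queryAA']}{row['queryPos']}"
    hit = f"{row['hitName']}:{row['hitChain']}:{row['hitAA']}{row['hitPos']}"
    return f"{query} -> {hit}"


# In-memory stand-in for a TSV file, built from the two example lines.
EXAMPLE_VALUES = ["10GS", "A", "CYS", "47", "2J9H", "A", "ALA", "48",
                  "0.938", "0.223", "0.431", "16.0", "0.996", "0.976", "0.976"]
SAMPLE = "\t".join(COLUMNS) + "\n" + "\t".join(EXAMPLE_VALUES) + "\n"

pairs = list(iter_pairs(io.StringIO(SAMPLE)))

# Hypothetical filter: keep pairs whose all-atom RMSD is at most 0.5.
MAX_ALL_ATOM_RMSD = 0.5
close_pairs = [p for p in pairs if p["siteAllAtomRMSD"] <= MAX_ALL_ATOM_RMSD]

for pair in close_pairs:
    print(mutation_label(pair), pair["siteAllAtomRMSD"])
```

For a downloaded file, the same generator can be fed with `open(path, newline="")` in place of the `io.StringIO` object. Because `iter_pairs` yields rows one at a time, the larger files (up to 15GB) can be filtered in a loop without first collecting every row into a list as the small example does.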
