Dataset Open Access
Graef, Joel;
Ehrt, Christiane;
Reim, Thorben;
Rarey, Matthias
<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
<leader>00000nmm##2200000uu#4500</leader>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Ehrt, Christiane</subfield>
<subfield code="u">ZBH Center for Bioinformatics, Universität Hamburg, Bundesstraße 43, 20146 Hamburg, Germany</subfield>
<subfield code="0">(orcid)0000-0003-1428-0042</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Reim, Thorben</subfield>
<subfield code="u">ZBH Center for Bioinformatics, Universität Hamburg, Bundesstraße 43, 20146 Hamburg, Germany</subfield>
<subfield code="0">(orcid)0009-0002-7712-8515</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Rarey, Matthias</subfield>
<subfield code="u">ZBH Center for Bioinformatics, Universität Hamburg, Bundesstraße 43, 20146 Hamburg, Germany</subfield>
<subfield code="0">(orcid)0000-0002-9553-6531</subfield>
</datafield>
<datafield tag="540" ind1=" " ind2=" ">
<subfield code="u">https://creativecommons.org/licenses/by/4.0/legalcode</subfield>
<subfield code="a">Creative Commons Attribution 4.0 International</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="s">18752879777</subfield>
<subfield code="u">https://www.fdr.uni-hamburg.de/record/13228/files/OptimizationAndEvaluationDatasetsForPiMine.zip</subfield>
<subfield code="z">md5:4ee71e9384ee2ef4e751c24e4ce67fa9</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Protein-Protein Interaction</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Structure-Based Drug Design</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Similarity Search</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Database</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Computational Molecular Design</subfield>
</datafield>
<datafield tag="909" ind1="C" ind2="O">
<subfield code="o">oai:fdr.uni-hamburg.de:13228</subfield>
<subfield code="p">user-uhh</subfield>
</datafield>
<datafield tag="773" ind1=" " ind2=" ">
<subfield code="a">10.25592/uhhfdm.13227</subfield>
<subfield code="i">isVersionOf</subfield>
<subfield code="n">doi</subfield>
</datafield>
<controlfield tag="001">13228</controlfield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">Optimization and Evaluation Datasets for PiMine</subfield>
</datafield>
<datafield tag="500" ind1=" " ind2=" ">
<subfield code="a">This work was supported by the German Federal Ministry of Education and Research as part of CompLS and de.NBI [031L0172, 031L0105]. C.E. is funded by Data Science in Hamburg – Helmholtz Graduate School for the Structure of Matter (Grant-ID: HIDSS-0002).</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Graef, Joel</subfield>
<subfield code="u">ZBH Center for Bioinformatics, Universität Hamburg, Bundesstraße 43, 20146 Hamburg, Germany</subfield>
<subfield code="0">(orcid)0000-0001-8327-4936</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">user-uhh</subfield>
</datafield>
<datafield tag="260" ind1=" " ind2=" ">
<subfield code="c">2023-09-11</subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" ">
<subfield code="a"><p>The protein-protein interface comparison software PiMine was developed to provide fast comparisons against databases of known protein-protein complex structures. Its application domains range from the prediction of interfaces and potential interaction partners to the identification of potential small molecule modulators of protein-protein interactions [1].</p>
<p>The protein-protein evaluation datasets are a collection of five datasets that were used for the parameter optimization (<em>ParamOptSet</em>), enrichment assessment (<em>Dimer597</em> set, <em>Keskin</em> set, <em>PiMineSet</em>), and runtime analyses (<em>RunTimeSet</em>) of protein-protein interface comparison tools. The evaluation datasets contain pairs of interfaces of protein chains that either share sequence and structural similarities or are unrelated in both sequence and structure. They enable comparative benchmark studies for tools designed to identify interface similarities.</p>
<p>Dataset description:</p>
<p>The <em>ParamOptSet</em> was designed based on a study on improving the benchmark datasets for the evaluation of protein-protein docking tools [2]. It was used to optimize and fine-tune the geometric search parameters of PiMine.</p>
<p>The <em>Dimer597</em> [3] and <em>Keskin</em> [4] sets were published previously. We used them to evaluate PiMine&rsquo;s performance in identifying structurally and sequentially related interface pairs as well as interface pairs with prominent similarity whose constituent chains are sequentially unrelated.</p>
<p>The <em>PiMineSet</em> [1] was constructed to assess different quality criteria for reliable interface comparison. It consists of pairs of similar protein-protein complexes in which two chains are sequentially and structurally highly related while the other two chains are unrelated and show different folds. It enables performance assessment when only the interfaces of the apparently unrelated chains are available. Furthermore, the similar chains provide reliable interface-interface alignments that can be used for alignment performance assessments.</p>
<p>Finally, the <em>RunTimeSet</em> [1] comprises protein-protein complexes from the PDB that were predicted to be biologically relevant. It enables the comparison of typical runtimes of comparison methods and is also an interesting dataset to screen for interface similarities.</p>
<p>References:</p>
<p>[1] Graef, J.; Ehrt, C.; Reim, T.; Rarey, M. Database-driven identification of structurally similar protein-protein interfaces (submitted).<br>
[2] Barradas-Bautista, D.; Almajed, A.; Oliva, R.; Kalnis, P.; Cavallo, L. Improving classification of correct and incorrect protein-protein docking models by augmenting the training set. Bioinform. Adv. 2023, 3, vbad012.<br>
[3] Gao, M.; Skolnick, J. iAlign: a method for the structural comparison of protein&ndash;protein interfaces. Bioinformatics 2010, 26, 2259&ndash;2265.<br>
[4] Keskin, O.; Tsai, C.-J.; Wolfson, H.; Nussinov, R. A new, structurally nonredundant, diverse data set of protein&ndash;protein interfaces and its implications. Protein Sci. 2004, 13, 1043&ndash;1055.</p></subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">dataset</subfield>
</datafield>
<datafield tag="542" ind1=" " ind2=" ">
<subfield code="l">open</subfield>
</datafield>
<datafield tag="650" ind1="1" ind2="7">
<subfield code="a">cc-by</subfield>
<subfield code="2">opendefinition.org</subfield>
</datafield>
<controlfield tag="005">20240125125124.0</controlfield>
<datafield tag="024" ind1=" " ind2=" ">
<subfield code="a">10.25592/uhhfdm.13228</subfield>
<subfield code="2">doi</subfield>
</datafield>
</record>