<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-3" xsi:schemaLocation="http://datacite.org/schema/kernel-3 http://schema.datacite.org/meta/kernel-3/metadata.xsd">
<identifier identifierType="DOI">10.25592/uhhfdm.1875</identifier>
<creators>
<creator>
<creatorName>Baumann, Timo</creatorName>
<nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0003-2203-1783</nameIdentifier>
</creator>
</creators>
<titles>
<title>The Spoken Wikipedia Corpora</title>
</titles>
<publisher>Universität Hamburg</publisher>
<publicationYear>2017</publicationYear>
<subjects>
<subject>linguistics</subject>
<subject>English</subject>
<subject>German</subject>
<subject>Dutch</subject>
</subjects>
<contributors>
<contributor contributorType="DataCurator">
<contributorName>Stegen, Florian</contributorName>
</contributor>
<contributor contributorType="DataCurator">
<contributorName>Baumann, Timo</contributorName>
<nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0003-2203-1783</nameIdentifier>
</contributor>
<contributor contributorType="DataCurator">
<contributorName>Köhn, Arne</contributorName>
<nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-4880-2016</nameIdentifier>
</contributor>
</contributors>
<dates>
<date dateType="Issued">2017-10-27</date>
</dates>
<resourceType resourceTypeGeneral="Dataset"/>
<alternateIdentifiers>
<alternateIdentifier alternateIdentifierType="url">https://www.fdr.uni-hamburg.de/record/1875</alternateIdentifier>
</alternateIdentifiers>
<relatedIdentifiers>
<relatedIdentifier relatedIdentifierType="DOI" relationType="IsPartOf">10.25592/uhhfdm.1874</relatedIdentifier>
</relatedIdentifiers>
<version>2.0</version>
<rightsList>
<rights rightsURI="https://creativecommons.org/licenses/by-sa/4.0/legalcode">Creative Commons Attribution Share Alike 4.0 International</rights>
<rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
</rightsList>
<descriptions>
<description descriptionType="Abstract"><p>The Spoken Wikipedia project unites volunteer readers of Wikipedia articles. Hundreds of spoken articles in multiple languages are available to users who are &ndash; for one reason or another &ndash; unable or unwilling to consume the written version of the article. Our resource, the Spoken Wikipedia Corpus, consolidates the Spoken Wikipediae, adding text segmentation, normalization, time-alignment and further annotations, making it accessible for research and fostering new ways of interacting with the material.</p>
<p>Timo Baumann, Arne K&ouml;hn, Felix Hennig. 2018. The Spoken Wikipedia Corpus Collection: Harvesting, Alignment and an Application to Hyperlistening, in Language Resources and Evaluation, Special Issue representing significant contributions of LREC 2016.</p>
<p>Arne K&ouml;hn, Florian Stegen, Timo Baumann. 2016. Mining the Spoken Wikipedia for Speech Data and Beyond, in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016).</p>
<p><strong>CLARIN Metadata summary for The Spoken Wikipedia Corpora (CMDI-based)</strong></p>
<p><strong>Title: </strong>The Spoken Wikipedia Corpora<br>
<strong>Publication date: </strong>2017<br>
<strong>Data owner: </strong> Timo Baumann - Universit&auml;t Hamburg<br>
<strong>Contributors: </strong> Timo Baumann (author), Arne K&ouml;hn (author), Florian Stegen (author)<br>
<strong>Languages: </strong> <a href="https://www.ethnologue.com/language/eng">English (eng)</a>, <a href="https://www.ethnologue.com/language/deu">German (deu)</a>, <a href="https://www.ethnologue.com/language/nld">Dutch (nld)</a><br>
<strong>Size: </strong> 5397 articles, 1005 hours<br>
<strong>Segmentation units: </strong> other<br>
<strong>Genre: </strong> encyclopedia<br>
<strong>Modality: </strong> spoken<br>
<strong>References: </strong> Timo Baumann; Arne K&ouml;hn; Felix Hennig (2018) The Spoken Wikipedia Corpus Collection: Harvesting, Alignment and an Application to Hyperlistening<br>
<strong>References: </strong> Arne K&ouml;hn; Florian Stegen; Timo Baumann (2016) Mining the Spoken Wikipedia for Speech Data and Beyond</p>
</description>
</descriptions>
</resource>