<?xml version='1.0' encoding='utf-8'?> <resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-3" xsi:schemaLocation="http://datacite.org/schema/kernel-3 http://schema.datacite.org/meta/kernel-3/metadata.xsd"> <identifier identifierType="DOI">10.25592/uhhfdm.1875</identifier> <creators> <creator> <creatorName>Baumann, Timo</creatorName> <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0003-2203-1783</nameIdentifier> </creator> </creators> <titles> <title>The Spoken Wikipedia Corpora</title> </titles> <publisher>Universität Hamburg</publisher> <publicationYear>2017</publicationYear> <subjects> <subject>linguistics</subject> <subject>English</subject> <subject>German</subject> <subject>Dutch</subject> </subjects> <contributors> <contributor contributorType="DataCurator"> <contributorName>Stegen, Florian</contributorName> </contributor> <contributor contributorType="DataCurator"> <contributorName>Baumann, Timo</contributorName> <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0003-2203-1783</nameIdentifier> </contributor> <contributor contributorType="DataCurator"> <contributorName>Köhn, Arne</contributorName> <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-4880-2016</nameIdentifier> </contributor> </contributors> <dates> <date dateType="Issued">2017-10-27</date> </dates> <resourceType resourceTypeGeneral="Dataset"/> <alternateIdentifiers> <alternateIdentifier alternateIdentifierType="url">https://www.fdr.uni-hamburg.de/record/1875</alternateIdentifier> </alternateIdentifiers> <relatedIdentifiers> <relatedIdentifier relatedIdentifierType="DOI" relationType="IsPartOf">10.25592/uhhfdm.1874</relatedIdentifier> </relatedIdentifiers> <version>2.0</version> <rightsList> <rights rightsURI="https://creativecommons.org/licenses/by-sa/4.0/legalcode">Creative Commons Attribution Share Alike 4.0 International</rights> <rights 
rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights> </rightsList> <descriptions> <description descriptionType="Abstract"><p>The Spoken Wikipedia project unites volunteer readers of Wikipedia articles. Hundreds of spoken articles in multiple languages are available to users who are – for one reason or another – unable or unwilling to consume the written version of the article. Our resource, the Spoken Wikipedia Corpus, consolidates the Spoken Wikipediae, adding text segmentation, normalization, time-alignment and further annotations, making it accessible for research and fostering new ways of interacting with the material.</p> <p>Timo Baumann, Arne Köhn, Felix Hennig. 2018. The Spoken Wikipedia Corpus Collection: Harvesting, Alignment and an Application to Hyperlistening, in Language Resources and Evaluation, Special Issue representing significant contributions of LREC 2016.</p> <p>Arne Köhn, Florian Stegen, Timo Baumann. 2016. Mining the Spoken Wikipedia for Speech Data and Beyond, in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016).</p> <p><strong>CLARIN Metadata summary for The Spoken Wikipedia Corpora (CMDI-based)</strong></p> <p><strong>Title: </strong>The Spoken Wikipedia Corpora<br/> <strong>Description: </strong> The Spoken Wikipedia project unites volunteer readers of Wikipedia articles. Hundreds of spoken articles in multiple languages are available to users who are – for one reason or another – unable or unwilling to consume the written version of the article. 
Our resource, the Spoken Wikipedia Corpus, consolidates the Spoken Wikipediae, adding text segmentation, normalization, time-alignment and further annotations, making it accessible for research and fostering new ways of interacting with the material.<br/> <strong>Publication date: </strong>2017<br/> <strong>Data owner: </strong> Timo Baumann - Universität Hamburg<br/> <strong>Contributors: </strong> Timo Baumann (author), Arne Köhn (author), Florian Stegen (author)<br/> <strong>Languages: </strong> <a href="https://www.ethnologue.com/language/eng">English (eng)</a>, <a href="https://www.ethnologue.com/language/deu">German (deu)</a>, <a href="https://www.ethnologue.com/language/nld">Dutch (nld)</a><br/> <strong>Size: </strong> 5397 articles, 1005 hours<br/> <strong>Segmentation units: </strong> other<br/> <strong>Genre: </strong> encyclopedia<br/> <strong>Modality: </strong> spoken<br/> <strong>References: </strong> Timo Baumann; Arne Köhn; Felix Hennig (2018) The Spoken Wikipedia Corpus Collection: Harvesting, Alignment and an Application to Hyperlistening<br/> <strong>References: </strong> Arne Köhn; Florian Stegen; Timo Baumann (2016) Mining the Spoken Wikipedia for Speech Data and Beyond</p></description> </descriptions> </resource>