Dataset Open Access

The Spoken Wikipedia Corpora

Baumann, Timo


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-3" xsi:schemaLocation="http://datacite.org/schema/kernel-3 http://schema.datacite.org/meta/kernel-3/metadata.xsd">
  <identifier identifierType="DOI">10.25592/uhhfdm.1875</identifier>
  <creators>
    <creator>
      <creatorName>Baumann, Timo</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0003-2203-1783</nameIdentifier>
    </creator>
  </creators>
  <titles>
    <title>The Spoken Wikipedia Corpora</title>
  </titles>
  <publisher>Universität Hamburg</publisher>
  <publicationYear>2017</publicationYear>
  <subjects>
    <subject>linguistics</subject>
    <subject>English</subject>
    <subject>German</subject>
    <subject>Dutch</subject>
  </subjects>
  <contributors>
    <contributor contributorType="DataCurator">
      <contributorName>Stegen, Florian</contributorName>
    </contributor>
    <contributor contributorType="DataCurator">
      <contributorName>Baumann, Timo</contributorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0003-2203-1783</nameIdentifier>
    </contributor>
    <contributor contributorType="DataCurator">
      <contributorName>Köhn, Arne</contributorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-4880-2016</nameIdentifier>
    </contributor>
  </contributors>
  <dates>
    <date dateType="Issued">2017-10-27</date>
  </dates>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://www.fdr.uni-hamburg.de/record/1875</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsPartOf">10.25592/uhhfdm.1874</relatedIdentifier>
  </relatedIdentifiers>
  <version>2.0</version>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by-sa/4.0/legalcode">Creative Commons Attribution Share Alike 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;The Spoken Wikipedia project unites volunteer readers of Wikipedia articles. Hundreds of spoken articles in multiple languages are available to users who are &amp;ndash; for one reason or another &amp;ndash; unable or unwilling to consume the written version of the article. Our resource, the Spoken Wikipedia Corpus, consolidates the Spoken Wikipediae, adding text segmentation, normalization, time-alignment and further annotations, making it accessible for research and fostering new ways of interacting with the material.&lt;/p&gt;

&lt;p&gt;Timo Baumann, Arne K&amp;ouml;hn, Felix Hennig. 2018. The Spoken Wikipedia Corpus Collection: Harvesting, Alignment and an Application to Hyperlistening, in Language Resources and Evaluation, Special Issue representing significant contributions of LREC 2016.&lt;/p&gt;

&lt;p&gt;Arne K&amp;ouml;hn, Florian Stegen, Timo Baumann. 2016. Mining the Spoken Wikipedia for Speech Data and Beyond, in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016).&lt;/p&gt;


&lt;p&gt;&lt;strong&gt;CLARIN Metadata summary for The Spoken Wikipedia Corpora (CMDI-based)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Title: &lt;/strong&gt;The Spoken Wikipedia Corpora&lt;br&gt;
&lt;strong&gt;Description: &lt;/strong&gt; The Spoken Wikipedia project unites volunteer readers of Wikipedia articles. Hundreds of spoken articles in multiple languages are available to users who are &amp;ndash; for one reason or another &amp;ndash; unable or unwilling to consume the written version of the article. Our resource, the Spoken Wikipedia Corpus, consolidates the Spoken Wikipediae, adding text segmentation, normalization, time-alignment and further annotations, making it accessible for research and fostering new ways of interacting with the material.&lt;br&gt;
&lt;strong&gt;Publication date: &lt;/strong&gt;2017&lt;br&gt;
&lt;strong&gt;Data owner: &lt;/strong&gt; Timo Baumann - Universit&amp;auml;t Hamburg&lt;br&gt;
&lt;strong&gt;Contributors: &lt;/strong&gt; Timo Baumann (author), Arne K&amp;ouml;hn (author), Florian Stegen (author)&lt;br&gt;
&lt;strong&gt;Languages: &lt;/strong&gt; &lt;a href="https://www.ethnologue.com/language/eng"&gt;English (eng)&lt;/a&gt;, &lt;a href="https://www.ethnologue.com/language/deu"&gt;German (deu)&lt;/a&gt;, &lt;a href="https://www.ethnologue.com/language/nld"&gt;Dutch (nld)&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Size: &lt;/strong&gt; 5397 articles, 1005 hours&lt;br&gt;
&lt;strong&gt;Segmentation units: &lt;/strong&gt; other&lt;br&gt;
&lt;strong&gt;Genre: &lt;/strong&gt; encyclopedia&lt;br&gt;
&lt;strong&gt;Modality: &lt;/strong&gt; spoken&lt;br&gt;
&lt;strong&gt;References: &lt;/strong&gt; Timo Baumann; Arne K&amp;ouml;hn; Felix Hennig (2018) The Spoken Wikipedia Corpus Collection: Harvesting, Alignment and an Application to Hyperlistening&lt;br&gt;
&lt;strong&gt;References: &lt;/strong&gt; Arne K&amp;ouml;hn; Florian Stegen; Timo Baumann (2016) Mining the Spoken Wikipedia for Speech Data and Beyond&lt;/p&gt;

&lt;p&gt;&amp;nbsp;&lt;/p&gt;</description>
  </descriptions>
</resource>
