Dataset Open Access

INEL Enets Corpus

Shluinsky, Andrey; Khanina, Olesya; Wagner-Nagy, Beáta


DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-3" xsi:schemaLocation="http://datacite.org/schema/kernel-3 http://schema.datacite.org/meta/kernel-3/metadata.xsd">
  <identifier identifierType="DOI">10.25592/uhhfdm.18195</identifier>
  <creators>
    <creator>
      <creatorName>Shluinsky, Andrey</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-2553-7213</nameIdentifier>
      <affiliation>Universität Hamburg</affiliation>
    </creator>
    <creator>
      <creatorName>Khanina, Olesya</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0001-5930-4656</nameIdentifier>
      <affiliation>University of Helsinki</affiliation>
    </creator>
    <creator>
      <creatorName>Wagner-Nagy, Beáta</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-6801-1895</nameIdentifier>
      <affiliation>Universität Hamburg</affiliation>
    </creator>
  </creators>
  <titles>
    <title>INEL Enets Corpus</title>
  </titles>
  <publisher>Universität Hamburg</publisher>
  <publicationYear>2025</publicationYear>
  <subjects>
    <subject>Uralic</subject>
    <subject>Samoyedic</subject>
    <subject>Enets</subject>
    <subject>Forest Enets</subject>
    <subject>Tundra Enets</subject>
    <subject>endangered language</subject>
    <subject>language contact</subject>
    <subject>language documentation</subject>
    <subject>legacy data</subject>
    <subject>INEL</subject>
    <subject>AdWHH</subject>
    <subject>text corpus</subject>
    <subject>speech corpus</subject>
    <subject>parallel texts</subject>
    <subject>folklore</subject>
    <subject>tales</subject>
    <subject>narrative</subject>
    <subject>dialogue</subject>
    <subject>song</subject>
    <subject>transcription</subject>
    <subject>time-aligned</subject>
    <subject>audio</subject>
    <subject>video</subject>
    <subject>morphological glossing</subject>
    <subject>part-of-speech</subject>
    <subject>borrowings</subject>
    <subject>code-switching</subject>
    <subject>English translation</subject>
    <subject>Russian translation</subject>
    <subject>EXMARaLDA</subject>
    <subject>ELAN</subject>
    <subject>XML</subject>
    <subject>ISO/TEI</subject>
  </subjects>
  <contributors>
    <contributor contributorType="Editor">
      <contributorName>Arkhipov, Alexandre</contributorName>
      <affiliation>Universität Hamburg</affiliation>
    </contributor>
    <contributor contributorType="Editor">
      <contributorName>Wagner-Nagy, Beáta</contributorName>
      <affiliation>Universität Hamburg</affiliation>
    </contributor>
    <contributor contributorType="DataManager">
      <contributorName>Lazarenko, Elena</contributorName>
      <affiliation>Universität Hamburg</affiliation>
    </contributor>
    <contributor contributorType="DataManager">
      <contributorName>Riaposov, Aleksandr</contributorName>
      <affiliation>Universität Hamburg</affiliation>
    </contributor>
    <contributor contributorType="DataManager">
      <contributorName>Lehmberg, Timm</contributorName>
      <affiliation>Universität Hamburg</affiliation>
    </contributor>
  </contributors>
  <dates>
    <date dateType="Issued">2025-12-31</date>
  </dates>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://www.fdr.uni-hamburg.de/record/18195</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="Handle" relationType="IsCitedBy">11022/0000-0008-005C-1</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsPartOf">10.25592/uhhfdm.16181</relatedIdentifier>
  </relatedIdentifiers>
  <version>1.1</version>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode">Creative Commons Attribution Non Commercial Share Alike 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;&lt;strong&gt;Corpus Citation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Shluinsky, Andrey; Khanina, Olesya; Wagner-Nagy, Be&amp;aacute;ta&lt;/em&gt;. 2025. INEL Enets Corpus. Version 1.1. Publication date 2025-12-31. &lt;a href="https://hdl.handle.net/11022/0000-0008-005C-1"&gt;https://hdl.handle.net/11022/0000-0008-005C-1&lt;/a&gt;. Archived at Universit&amp;auml;t Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. &lt;a href="https://hdl.handle.net/11022/0000-0007-F45A-1"&gt;https://hdl.handle.net/11022/0000-0007-F45A-1&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Corpus Description&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The INEL Enets corpus has been created within the long-term INEL project (&amp;quot;Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages&amp;quot;), 2016&amp;ndash;2033.&lt;/p&gt;

&lt;p&gt;The corpus includes texts recorded between 1962 and 2017 in both Enets lects &amp;ndash; Forest Enets and Tundra Enets. The sources of the corpus (see the user documentation, section 2.2, for details) are:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Audio recordings done by Olesya Khanina, Maria Ovsjannikova, Andrey Shluinsky, Natalia Stoynova and Sergey Trubetskoy,&lt;/li&gt;
	&lt;li&gt;Legacy audio recordings done by Vera Bettu, Nina N. Bolina, Dar`ya S. Bolina, Zoya N. Bolina, Oksana E. Dobzhanskaya, Valentin Gusev, Eugene Helimski&amp;dagger;, Kazimir I. Labanauskas&amp;dagger;, Larisa Leisi&amp;ouml;, Marina Lyublinskaya, Kaur M&amp;auml;gi, Viktor N. Pal`chin, Marina N. Pal`china, Irina P. Sorokina&amp;dagger;, Anna Urmanchieva, Be&amp;aacute;ta Wagner-Nagy and possibly other people,&lt;/li&gt;
	&lt;li&gt;Published audio recordings,&lt;/li&gt;
	&lt;li&gt;Texts published by Dar`ya S. Bolina, Yaroslav A. Gluxij&amp;dagger; and Vasilij A. Susekov&amp;dagger;, Eugene Helimski&amp;dagger;, Kazimir I. Labanauskas&amp;dagger;, Tibor Mikola&amp;dagger;, J&amp;aacute;nos Pusztay, Irina P. Sorokina&amp;dagger;, Anna Urmanchieva,&lt;/li&gt;
	&lt;li&gt;Legacy manuscript transcriptions and self-transcriptions done and/or edited by Dar`ya S. Bolina, Galina S. Bolina, Zoya N. Bolina, Valentin Gusev, Eugene Helimski&amp;dagger;, Kazimir I. Labanauskas&amp;dagger;, Larisa Leisi&amp;ouml;, Marina Lyublinskaya, Vasilij F. Ly`rmin&amp;dagger;, Anton N. Pal`chin, Viktor N. Pal`chin, Ivan I. Silkin&amp;dagger;, Irina P. Sorokina&amp;dagger;, Natal`ya M. Tere&amp;scaron;čenko&amp;dagger;, Anna Urmanchieva and possibly other people.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All texts in the corpus are provided with interlinear morpheme-by-morpheme glosses and translations into English and Russian. All texts for which audio recordings were accessible are time-aligned with them. Video recordings are also included in the corpus where available.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
&lt;strong&gt;New in release 1.1&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Annotation of syntactic functions (tier category &amp;quot;SyF&amp;quot;) is now available for 55 additional texts, of which 52 are folklore and 3 are narrative;&lt;/li&gt;
	&lt;li&gt;For texts originating from published and archival sources, as well as manuscripts, detailed references were added to the &amp;quot;Citation&amp;quot; section of the documentation and the respective field in the corpus metadata file.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Corpus size&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Forest Enets: &lt;strong&gt;541&lt;/strong&gt; texts, &lt;strong&gt;41,396&lt;/strong&gt; sentences, &lt;strong&gt;173,380&lt;/strong&gt;&amp;nbsp;tokens&lt;/li&gt;
	&lt;li&gt;Tundra Enets: &lt;strong&gt;137&lt;/strong&gt; texts, &lt;strong&gt;12,737&lt;/strong&gt; sentences, &lt;strong&gt;45,331&lt;/strong&gt; tokens&lt;/li&gt;
	&lt;li&gt;Total: &lt;strong&gt;678&lt;/strong&gt; texts, &lt;strong&gt;54,133&lt;/strong&gt; sentences, &lt;strong&gt;218,711&lt;/strong&gt;&amp;nbsp;tokens&lt;/li&gt;
	&lt;li&gt;Total duration of audio: &lt;strong&gt;43&lt;/strong&gt; hours &lt;strong&gt;26&lt;/strong&gt; minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Funding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies&amp;rsquo; Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies&amp;rsquo; Programme is coordinated by the Union of the German Academies of Sciences and Humanities.&lt;/p&gt;

&lt;p&gt;Preliminary glossing work included in this corpus was supported by the Endangered Languages Documentation Programme (ELDP) and by the Max Planck Institute for Evolutionary Anthropology (MPI-EVA). See the documentation&amp;nbsp;file below, section 1.6, for more details on financial support.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contributions/Acknowledgements&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Dozens of people and many institutions contributed to the corpus (see more details in the documentation&amp;nbsp;file below, section 1.6). We are especially grateful to:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Enets speakers who generously shared their knowledge, especially those who spent many days working with us: Aleksandr S. Bolin&amp;dagger;, Leonid D. Bolin&amp;dagger;, Viktor N. Bolin, Nadezhda K. Bolina, Nina N. Bolina, Ekaterina S. Glibchenko, Gennadij A. Ivanov&amp;dagger;, Irina P. Koshkaryova&amp;dagger;, Valentina P. Nader, Lyudmila P. Novosyolova, Svetlana A. Roslyakova&amp;dagger;, Ivan I. Silkin&amp;dagger;, Nikolaj I. Silkin, Alevtina S. Silkina, Zoya A. Turutina, Tat`yana Ch. Yar,&lt;/li&gt;
	&lt;li&gt;In particular, Zoya N. Bolina and Viktor N. Pal`chin, who also collaborated in the ELDP project and extensively transcribed Enets recordings,&lt;/li&gt;
	&lt;li&gt;Natalia Stoynova, Sergey Trubetskoy and, above all, Maria Ovsjannikova, who recorded and transcribed Enets texts,&lt;/li&gt;
	&lt;li&gt;Institutions and private individuals who shared legacy data: the Institute for Linguistic Studies RAS, the Taymyr House of National Arts, the Dudinka branch of GTRK &amp;ldquo;Norilsk&amp;rdquo;; Dar`ya S. Bolina, Oksana E. Dobzhanskaya, Valentin Gusev, Larisa Leisi&amp;ouml;, Viktor N. Pal`chin, Irina P. Sorokina&amp;dagger;, Anna Urmanchieva,&lt;/li&gt;
	&lt;li&gt;Marina Lyublinskaya and Anna Urmanchieva, who kindly permitted us to include texts they had processed in the corpus,&lt;/li&gt;
	&lt;li&gt;Dar`ya S. Bolina, who provided extensive consultation during the compilation of the corpus.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Searching the corpus&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the &lt;a href="https://exmaralda.org/"&gt;EXMARaLDA&lt;/a&gt; software or, alternatively, &lt;a href="https://archive.mpi.nl/tla/elan"&gt;ELAN&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Online search with the Tsakorpus platform is available at &lt;a href="https://inel.corpora.uni-hamburg.de/EnetsCorpus/search"&gt;https://inel.corpora.uni-hamburg.de/EnetsCorpus/search&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Remote search with EXMARaLDA is also possible without downloading all the files (see &lt;a href="https://inel.corpora.uni-hamburg.de/portal/help/en/index.php#search"&gt;https://inel.corpora.uni-hamburg.de/portal/help/en/index.php#search&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;See the user documentation&amp;nbsp;(section 3) for details on transcription, annotation tiers and annotation tags.&lt;br&gt;
Find further information and links on the Enets Corpus page at the INEL Resources portal: &lt;a href="https://inel.corpora.uni-hamburg.de/portal/corpora/enets/"&gt;https://inel.corpora.uni-hamburg.de/portal/corpora/enets/&lt;/a&gt;.&lt;/p&gt;</description>
  </descriptions>
</resource>
