INEL Enets Corpus

Shluinsky, Andrey; Khanina, Olesya; Wagner-Nagy, Beáta

doi:10.25592/uhhfdm.18195

December 31, 2025 Dataset Open Access

DataCite XML Export

<?xml version='1.0' encoding='utf-8'?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-3" xsi:schemaLocation="http://datacite.org/schema/kernel-3 http://schema.datacite.org/meta/kernel-3/metadata.xsd">
  <identifier identifierType="DOI">10.25592/uhhfdm.18195</identifier>
  <creators>
    <creator>
      <creatorName>Shluinsky, Andrey</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-2553-7213</nameIdentifier>
      <affiliation>Universität Hamburg</affiliation>
    </creator>
    <creator>
      <creatorName>Khanina, Olesya</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0001-5930-4656</nameIdentifier>
      <affiliation>University of Helsinki</affiliation>
    </creator>
    <creator>
      <creatorName>Wagner-Nagy, Beáta</creatorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-6801-1895</nameIdentifier>
      <affiliation>Universität Hamburg</affiliation>
    </creator>
  </creators>
  <titles>
    <title>INEL Enets Corpus</title>
  </titles>
  <publisher>Universität Hamburg</publisher>
  <publicationYear>2025</publicationYear>
  <subjects>
    <subject>Uralic</subject>
    <subject>Samoyedic</subject>
    <subject>Enets</subject>
    <subject>Forest Enets</subject>
    <subject>Tundra Enets</subject>
    <subject>endangered language</subject>
    <subject>language contact</subject>
    <subject>language documentation</subject>
    <subject>legacy data</subject>
    <subject>INEL</subject>
    <subject>AdWHH</subject>
    <subject>text corpus</subject>
    <subject>speech corpus</subject>
    <subject>parallel texts</subject>
    <subject>folklore</subject>
    <subject>tales</subject>
    <subject>narrative</subject>
    <subject>dialogue</subject>
    <subject>song</subject>
    <subject>transcription</subject>
    <subject>time-aligned</subject>
    <subject>audio</subject>
    <subject>video</subject>
    <subject>morphological glossing</subject>
    <subject>part-of-speech</subject>
    <subject>borrowings</subject>
    <subject>code-switching</subject>
    <subject>English translation</subject>
    <subject>Russian translation</subject>
    <subject>EXMARaLDA</subject>
    <subject>ELAN</subject>
    <subject>XML</subject>
    <subject>ISO/TEI</subject>
  </subjects>
  <contributors>
    <contributor contributorType="Editor">
      <contributorName>Arkhipov, Alexandre</contributorName>
      <affiliation>Universität Hamburg</affiliation>
    </contributor>
    <contributor contributorType="Editor">
      <contributorName>Wagner-Nagy, Beáta</contributorName>
      <affiliation>Universität Hamburg</affiliation>
    </contributor>
    <contributor contributorType="DataManager">
      <contributorName>Lazarenko, Elena</contributorName>
      <affiliation>Universität Hamburg</affiliation>
    </contributor>
    <contributor contributorType="DataManager">
      <contributorName>Riaposov, Aleksandr</contributorName>
      <affiliation>Universität Hamburg</affiliation>
    </contributor>
    <contributor contributorType="DataManager">
      <contributorName>Lehmberg, Timm</contributorName>
      <affiliation>Universität Hamburg</affiliation>
    </contributor>
  </contributors>
  <dates>
    <date dateType="Issued">2025-12-31</date>
  </dates>
  <resourceType resourceTypeGeneral="Dataset"/>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="url">https://www.fdr.uni-hamburg.de/record/18195</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="Handle" relationType="IsCitedBy">11022/0000-0008-005C-1</relatedIdentifier>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsPartOf">10.25592/uhhfdm.16181</relatedIdentifier>
  </relatedIdentifiers>
  <version>1.1</version>
  <rightsList>
    <rights rightsURI="https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode">Creative Commons Attribution Non Commercial Share Alike 4.0 International</rights>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
  </rightsList>
  <descriptions>
    <description descriptionType="Abstract">&lt;p&gt;&lt;strong&gt;Corpus Citation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Shluinsky, Andrey; Khanina, Olesya; Wagner-Nagy, Be&amp;aacute;ta&lt;/em&gt;. 2025. INEL Enets Corpus. Version 1.1. Publication date 2025-12-31. &lt;a href="https://hdl.handle.net/11022/0000-0008-005C-1"&gt;https://hdl.handle.net/11022/0000-0008-005C-1&lt;/a&gt;. Archived at Universit&amp;auml;t Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. &lt;a href="https://hdl.handle.net/11022/0000-0007-F45A-1"&gt;https://hdl.handle.net/11022/0000-0007-F45A-1&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Corpus Description&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The INEL Enets corpus has been created within the long-term INEL project (&amp;quot;Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages&amp;quot;), 2016&amp;ndash;2033.&lt;/p&gt;

&lt;p&gt;The corpus includes texts recorded between 1962 and 2017 in both Enets lects &amp;ndash; Forest Enets and Tundra Enets. The sources of the corpus (see the user documentation, section 2.2, for details) are:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Audio recordings done by Olesya Khanina, Maria Ovsjannikova, Andrey Shluinsky, Natalia Stoynova and Sergey Trubetskoy,&lt;/li&gt;
	&lt;li&gt;Legacy audio recordings done by Vera Bettu, Nina N. Bolina, Dar`ya S. Bolina, Zoya N. Bolina, Oksana E. Dobzhanskaya, Valentin Gusev, Eugene Helimski&amp;dagger;, Kazimir I. Labanauskas&amp;dagger;, Larisa Leisi&amp;ouml;, Marina Lyublinskaya, Kaur M&amp;auml;gi, Viktor N. Pal`chin, Marina N. Pal`china, Irina P. Sorokina&amp;dagger;, Anna Urmanchieva, Be&amp;aacute;ta Wagner-Nagy and possibly other people,&lt;/li&gt;
	&lt;li&gt;Published audio recordings,&lt;/li&gt;
	&lt;li&gt;Texts published by Dar`ya S. Bolina, Yaroslav A. Gluxij&amp;dagger; and Vasilij A. Susekov&amp;dagger;, Eugene Helimski&amp;dagger;, Kazimir I. Labanauskas&amp;dagger;, Tibor Mikola&amp;dagger;, J&amp;aacute;nos Pusztay, Irina P. Sorokina&amp;dagger;, Anna Urmanchieva,&lt;/li&gt;
	&lt;li&gt;Legacy manuscript transcriptions and self-transcriptions done and/or edited by Dar`ya S. Bolina, Galina S. Bolina, Zoya N. Bolina, Valentin Gusev, Eugene Helimski&amp;dagger;, Kazimir I. Labanauskas&amp;dagger;, Larisa Leisi&amp;ouml;, Marina Lyublinskaya, Vasilij F. Ly`rmin&amp;dagger;, Anton N. Pal`chin, Viktor N. Pal`chin, Ivan I. Silkin&amp;dagger;, Irina P. Sorokina&amp;dagger;, Natal`ya M. Tere&amp;scaron;čenko&amp;dagger;, Anna Urmanchieva and possibly other people.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All texts in the corpus are provided with interlinear morpheme-by-morpheme glosses and translations into English and Russian. All texts for which audio recordings were accessible are time-aligned with them. Video recordings are also included in the corpus where available.&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
&lt;strong&gt;New in release 1.1&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Annotation of syntactic functions (tier category &amp;quot;SyF&amp;quot;) is now available for 55 additional texts, of which 52 are folklore and 3 are narrative;&lt;/li&gt;
	&lt;li&gt;For texts originating from published and archival sources, as well as manuscripts, detailed references were added to the &amp;quot;Citation&amp;quot; section of the documentation and the respective field in the corpus metadata file.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Corpus size&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Forest Enets: &lt;strong&gt;541&lt;/strong&gt; texts, &lt;strong&gt;41,396&lt;/strong&gt; sentences, &lt;strong&gt;173,380&lt;/strong&gt;&amp;nbsp;tokens&lt;/li&gt;
	&lt;li&gt;Tundra Enets: &lt;strong&gt;137&lt;/strong&gt; texts, &lt;strong&gt;12,737&lt;/strong&gt; sentences, &lt;strong&gt;45,331&lt;/strong&gt; tokens&lt;/li&gt;
	&lt;li&gt;Total: &lt;strong&gt;678&lt;/strong&gt; texts, &lt;strong&gt;54,133&lt;/strong&gt; sentences, &lt;strong&gt;218,711&lt;/strong&gt;&amp;nbsp;tokens&lt;/li&gt;
	&lt;li&gt;Total duration of audio: &lt;strong&gt;43 &lt;/strong&gt;hours &lt;strong&gt;26 &lt;/strong&gt;minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Funding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies&amp;rsquo; Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies&amp;rsquo; Programme is coordinated by the Union of the German Academies of Sciences and Humanities.&lt;/p&gt;

&lt;p&gt;Preliminary glossing work included in this corpus was supported by the Endangered Languages Documentation Programme (ELDP) and by the Max Planck Institute for Evolutionary Anthropology (MPI-EVA). See more details on financial support in the documentation&amp;nbsp;file below, section 1.6.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Contributions/Acknowledgements&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Dozens of people and many institutions contributed to the corpus (see more details in the documentation&amp;nbsp;file below, section 1.6). We are especially grateful to:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;Enets speakers who generously shared their knowledge, especially those who spent many days working with us: Aleksandr S. Bolin&amp;dagger;, Leonid D. Bolin&amp;dagger;, Viktor N. Bolin, Nadezhda K. Bolina, Nina N. Bolina, Ekaterina S. Glibchenko, Gennadij A. Ivanov&amp;dagger;, Irina P. Koshkaryova&amp;dagger;, Valentina P. Nader, Lyudmila P. Novosyolova, Svetlana A. Roslyakova&amp;dagger;, Ivan I. Silkin&amp;dagger;, Nikolaj I. Silkin, Alevtina S. Silkina, Zoya A. Turutina, Tat`yana Ch. Yar,&lt;/li&gt;
	&lt;li&gt;In particular, Zoya N. Bolina and Viktor N. Pal`chin, who also collaborated in the ELDP project and extensively transcribed Enets recordings,&lt;/li&gt;
	&lt;li&gt;Natalia Stoynova, Sergey Trubetskoy and, above all, Maria Ovsjannikova, who made recordings and transcriptions of Enets texts,&lt;/li&gt;
	&lt;li&gt;Institutions and private individuals who shared legacy data: the Institute for Linguistic Studies RAS, the Taymyr House of National Arts, the Dudinka branch of GTRK &amp;ldquo;Norilsk&amp;rdquo;; Dar`ya S. Bolina, Oksana E. Dobzhanskaya, Valentin Gusev, Larisa Leisi&amp;ouml;, Viktor N. Pal`chin, Irina P. Sorokina&amp;dagger;, Anna Urmanchieva,&lt;/li&gt;
	&lt;li&gt;Marina Lyublinskaya and Anna Urmanchieva, who kindly permitted texts processed by them to be included in the corpus,&lt;/li&gt;
	&lt;li&gt;Dar`ya S. Bolina, who gave extensive advice during the compilation of the corpus.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Searching the corpus&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the &lt;a href="https://exmaralda.org/"&gt;EXMARaLDA&lt;/a&gt; software or, alternatively, &lt;a href="https://archive.mpi.nl/tla/elan"&gt;ELAN&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Online search with the Tsakorpus platform is available at &lt;a href="https://inel.corpora.uni-hamburg.de/EnetsCorpus/search"&gt;https://inel.corpora.uni-hamburg.de/EnetsCorpus/search&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Remote search with EXMARaLDA is also possible without downloading all the files (see &lt;a href="https://inel.corpora.uni-hamburg.de/portal/help/en/index.php#search"&gt;https://inel.corpora.uni-hamburg.de/portal/help/en/index.php#search&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;See the user documentation&amp;nbsp;(section 3) for details on transcription, annotation tiers and annotation tags.&lt;br&gt;
Find further information and links on the Enets Corpus page at the INEL Resources portal: &lt;a href="https://inel.corpora.uni-hamburg.de/portal/corpora/enets/"&gt;https://inel.corpora.uni-hamburg.de/portal/corpora/enets/&lt;/a&gt;.&lt;/p&gt;</description>
  </descriptions>
</resource>
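
A record like the one above can be read programmatically. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree` module and a shortened copy of the export embedded as `SAMPLE`; the helper name `summarize` and the choice of fields are illustrative and not part of the repository's tooling.

```python
import xml.etree.ElementTree as ET

# Namespace declared on <resource> in the export (DataCite kernel-3).
NS = {"dc": "http://datacite.org/schema/kernel-3"}

# Shortened copy of the export above, for illustration only.
SAMPLE = """<?xml version='1.0' encoding='utf-8'?>
<resource xmlns="http://datacite.org/schema/kernel-3">
  <identifier identifierType="DOI">10.25592/uhhfdm.18195</identifier>
  <creators>
    <creator><creatorName>Shluinsky, Andrey</creatorName></creator>
    <creator><creatorName>Khanina, Olesya</creatorName></creator>
    <creator><creatorName>Wagner-Nagy, Beáta</creatorName></creator>
  </creators>
  <subjects>
    <subject>Forest Enets</subject>
    <subject>Tundra Enets</subject>
  </subjects>
  <version>1.1</version>
</resource>"""


def summarize(xml_text: str) -> dict:
    """Hypothetical helper: pull a few core fields out of a DataCite record."""
    # Parse from bytes so the declared encoding is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {
        "doi": root.findtext("dc:identifier", namespaces=NS),
        "version": root.findtext("dc:version", namespaces=NS),
        "creators": [
            e.text
            for e in root.findall("dc:creators/dc:creator/dc:creatorName", NS)
        ],
        "subjects": [e.text for e in root.findall("dc:subjects/dc:subject", NS)],
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Every element lookup needs the namespace prefix because the export declares a default namespace on `<resource>`; an unqualified path such as `identifier` would match nothing.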

Publication date:

December 31, 2025

Keyword(s):

Uralic, Samoyedic, Enets, Forest Enets, Tundra Enets, endangered language, language contact, language documentation, legacy data, INEL, AdWHH, text corpus, speech corpus, parallel texts, folklore, tales, narrative, dialogue, song, transcription, time-aligned, audio, video, morphological glossing, part-of-speech, borrowings, code-switching, English translation, Russian translation, EXMARaLDA, ELAN, XML, ISO/TEI

Related identifiers:

Cited by:
11022/0000-0008-005C-1

License (for files):

Creative Commons Attribution Non Commercial Share Alike 4.0 International

Versions

Version 1.1 10.25592/uhhfdm.18195	Dec 31, 2025
Version 1.0 10.25592/uhhfdm.16182	Nov 30, 2024

Cite all versions? You can cite all versions by using the DOI 10.25592/uhhfdm.16181. This DOI represents all versions, and will always resolve to the latest one.

DOI Badge

Markdown

[![DOI](https://www.fdr.uni-hamburg.de/badge/DOI/10.25592/uhhfdm.18195.svg)](https://doi.org/10.25592/uhhfdm.18195)

reStructuredText

.. image:: https://www.fdr.uni-hamburg.de/badge/DOI/10.25592/uhhfdm.18195.svg
   :target: https://doi.org/10.25592/uhhfdm.18195

HTML

<a href="https://doi.org/10.25592/uhhfdm.18195"><img src="https://www.fdr.uni-hamburg.de/badge/DOI/10.25592/uhhfdm.18195.svg" alt="DOI"></a>

Image URL

https://www.fdr.uni-hamburg.de/badge/DOI/10.25592/uhhfdm.18195.svg

Target URL

https://doi.org/10.25592/uhhfdm.18195
