<?xml version='1.0' encoding='UTF-8'?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2026-04-07T02:47:06Z</responseDate>
  <request verb="ListRecords" set="user-inel" metadataPrefix="oai_dc">https://www.fdr.uni-hamburg.de/oai2d</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:9722</identifier>
        <datestamp>2025-09-22T12:44:01Z</datestamp>
        <setSpec>user-inel</setSpec>
        <setSpec>user-adwhh</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Wagner-Nagy, Beáta</dc:contributor>
          <dc:contributor>Arkhipov, Alexandre</dc:contributor>
          <dc:contributor>Brykina, Maria</dc:contributor>
          <dc:contributor>Orlova, Svetlana</dc:contributor>
          <dc:contributor>Ferger, Anne</dc:contributor>
          <dc:contributor>Jettka, Daniel</dc:contributor>
          <dc:contributor>Lehmberg, Timm</dc:contributor>
          <dc:creator>Brykina, Maria</dc:creator>
          <dc:creator>Orlova, Svetlana</dc:creator>
          <dc:creator>Wagner-Nagy, Beáta</dc:creator>
          <dc:date>2018-12-31</dc:date>
          <dc:description>Corpus Citation

Brykina, Maria; Orlova, Svetlana; Wagner-Nagy, Beáta. 2018. INEL Selkup Corpus. Version 0.1. Publication date 2018-12-31. Archived in Hamburger Zentrum für Sprachkorpora. https://hdl.handle.net/11022/0000-0007-CAE5-3. In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, Daniel; Lehmberg, Timm (eds.). 2018. The INEL corpora of indigenous Northern Eurasian languages.

Corpus Description

The INEL Selkup corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus makes possible typologically aware corpus-based grammatical research on the Selkup language and expands the documentation of the lesser described indigenous languages of Northern Eurasia.

The INEL Selkup corpus is composed of texts from the archive of Angelina Ivanovna Kuzmina (1924–2002), who between 1962 and 1977 gathered a large amount of material on Selkup in almost all regions where the Selkup people lived. Most texts in the corpus originate from the handwritten part of the archive; the others come from sound recordings made by A.I. Kuzmina, which were transcribed and translated within the INEL project.

Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of Russian borrowings. Some texts also have annotations for syntactic functions, semantic roles and information status.

Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Contributions/Acknowledgements

Sound materials of Angelina Kuzmina were transcribed and translated by native speakers of Selkup:


	Svetlana Nikitichna Sankevich (Kunina), oral transcription and Russian translation of texts in Northern dialects
	Evgeniya Sergeevna Smorgunova (Irikova), oral and written transcription and Russian translation of audio texts in Northern dialects
	Valentina Vladimirovna Tamel`kina, oral transcription and Russian translation of audio texts in Northern dialects


The web-based search interface uses the Tsakonian Corpus platform developed by Dr. Timofey Arkhangelskiy, Humboldt Research Fellow at IFUU, Hamburg University.</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/9722</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.9722</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:9722</dc:identifier>
          <dc:language>sel</dc:language>
          <dc:relation>handle:11022/0000-0007-CAE5-3</dc:relation>
          <dc:relation>doi:10.25592/uhhfdm.9721</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode</dc:rights>
          <dc:subject>endangered language</dc:subject>
          <dc:subject>indigenous language</dc:subject>
          <dc:subject>L1 data</dc:subject>
          <dc:subject>language contact</dc:subject>
          <dc:subject>language documentation</dc:subject>
          <dc:subject>INEL</dc:subject>
          <dc:subject>folklore</dc:subject>
          <dc:subject>narrative</dc:subject>
          <dc:subject>monologue</dc:subject>
          <dc:subject>annotated</dc:subject>
          <dc:subject>morphological glossing</dc:subject>
          <dc:subject>borrowings</dc:subject>
          <dc:subject>code-switching</dc:subject>
          <dc:subject>semantic roles</dc:subject>
          <dc:subject>syntactic functions</dc:subject>
          <dc:subject>information status</dc:subject>
          <dc:subject>English translation</dc:subject>
          <dc:subject>German translation</dc:subject>
          <dc:subject>Russian translation</dc:subject>
          <dc:title>INEL Selkup Corpus</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:10973</identifier>
        <datestamp>2024-04-09T11:42:55Z</datestamp>
        <setSpec>user-inel</setSpec>
        <setSpec>user-adwhh</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Wagner-Nagy, Beáta</dc:contributor>
          <dc:contributor>Arkhipov, Alexandre</dc:contributor>
          <dc:contributor>Budzisch, Josefiina</dc:contributor>
          <dc:contributor>Orlova, Svetlana</dc:contributor>
          <dc:contributor>Lehmberg, Timm</dc:contributor>
          <dc:creator>Wagner-Nagy, Beáta</dc:creator>
          <dc:date>2022-11-11</dc:date>
          <dc:description>This record comprises the digitized manuscripts collected by Angelina Ivanovna Kuzmina (1924–2002) between 1962 and 1977, plus additional structured information. The attached dataset contains metadata on individuals and locations, as well as indexing and keywording with respect to content type and grammatical information.</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/10973</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.10973</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:10973</dc:identifier>
          <dc:language>sel</dc:language>
          <dc:relation>doi:10.25592/uhhfdm.10972</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
          <dc:subject>endangered language</dc:subject>
          <dc:subject>manuscript fieldnotes</dc:subject>
          <dc:subject>L1 data</dc:subject>
          <dc:subject>language documentation</dc:subject>
          <dc:subject>INEL</dc:subject>
          <dc:subject>folklore</dc:subject>
          <dc:subject>narrative</dc:subject>
          <dc:subject>monologue</dc:subject>
          <dc:subject>Russian translation</dc:subject>
          <dc:subject>lexicon</dc:subject>
          <dc:subject>Russian</dc:subject>
          <dc:subject>Central Selkup</dc:subject>
          <dc:subject>Southern Selkup</dc:subject>
          <dc:subject>Northern Selkup</dc:subject>
          <dc:title>Kuzmina Archive - Manuscripts</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:11165</identifier>
        <datestamp>2025-09-12T12:10:05Z</datestamp>
        <setSpec>user-inel</setSpec>
        <setSpec>user-adwhh</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Wagner-Nagy, Beáta</dc:contributor>
          <dc:contributor>Arkhipov, Alexandre</dc:contributor>
          <dc:contributor>Däbritz, Chris Lasse</dc:contributor>
          <dc:contributor>Kudryakova, Nina</dc:contributor>
          <dc:contributor>Stapert, Eugénie</dc:contributor>
          <dc:contributor>Ferger, Anne</dc:contributor>
          <dc:contributor>Jettka, Daniel</dc:contributor>
          <dc:contributor>Lazarenko, Elena</dc:contributor>
          <dc:contributor>Lehmberg, Timm</dc:contributor>
          <dc:contributor>Riaposov, Aleksandr</dc:contributor>
          <dc:creator>Däbritz, Chris Lasse</dc:creator>
          <dc:creator>Kudryakova, Nina</dc:creator>
          <dc:creator>Stapert, Eugénie</dc:creator>
          <dc:date>2022-11-30</dc:date>
          <dc:description>Corpus Citation

Däbritz, Chris Lasse; Kudryakova, Nina; Stapert, Eugénie. 2022. INEL Dolgan Corpus. Version 2.0. Publication date 2022-11-30. https://hdl.handle.net/11022/0000-0007-F9A7-4. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1. 

Corpus Description

The INEL Dolgan corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus makes possible typologically aware corpus-based grammatical research on the Dolgan language and expands the documentation of the lesser described indigenous languages of Northern Eurasia.

The INEL Dolgan corpus is composed of texts from different sources: 1. Published folklore texts from an edited volume ("Fol'klor Dolgan", P.E. Efremov 2000), 2. Transcripts of recordings obtained from the Taymyr House of Folk Art (TDNT) in Dudinka (1970s–2000s), 3. Transcripts from the collection of Dr. Eugénie Stapert recorded on several fieldwork trips in 2007–2010, 4. Transcripts of recordings made on a fieldwork trip in 2017. The first group as well as parts of the third group were already transcribed and translated; the rest of the recordings were transcribed and translated within the INEL project.

Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of Russian borrowings. Some texts also have annotations for syntactic functions, semantic roles and information structure/information status.

New in release 2.0


	20 glossed transcripts (2864 utterances, 19989 tokens) with 03:33:14 hours of corresponding sound
	37 audio files with 10:00:36 hours of sound without glossed transcripts
	Corrections of grammatical analyses and glossing according to the findings in Däbritz’s (2022) grammar, as well as cross-corpora harmonizations
	Additional corpus-wide annotation of Mongolic borrowings
	Additional corpus-wide annotation of existential, locative and possessive predication
	Corrections in further annotations, translations and metadata


Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/11165</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.11165</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:11165</dc:identifier>
          <dc:language>dlg</dc:language>
          <dc:relation>handle:11022/0000-0007-F9A7-4</dc:relation>
          <dc:relation>doi:10.25592/uhhfdm.9746</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode</dc:rights>
          <dc:subject>endangered language</dc:subject>
          <dc:subject>indigenous language</dc:subject>
          <dc:subject>L1 data</dc:subject>
          <dc:subject>language contact</dc:subject>
          <dc:subject>language documentation</dc:subject>
          <dc:subject>INEL</dc:subject>
          <dc:subject>folklore</dc:subject>
          <dc:subject>narrative</dc:subject>
          <dc:subject>monologue</dc:subject>
          <dc:subject>annotated</dc:subject>
          <dc:subject>morphological glossing</dc:subject>
          <dc:subject>borrowings</dc:subject>
          <dc:subject>code-switching</dc:subject>
          <dc:subject>semantic roles</dc:subject>
          <dc:subject>syntactic functions</dc:subject>
          <dc:subject>information status</dc:subject>
          <dc:subject>English translation</dc:subject>
          <dc:subject>German translation</dc:subject>
          <dc:subject>Russian translation</dc:subject>
          <dc:subject>existential predication</dc:subject>
          <dc:subject>locative predication</dc:subject>
          <dc:subject>non-verbal predication</dc:subject>
          <dc:title>INEL Dolgan Corpus</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:13882</identifier>
        <datestamp>2025-12-17T14:29:58Z</datestamp>
        <setSpec>user-inel</setSpec>
        <setSpec>user-adwhh</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Wagner-Nagy, Beáta</dc:contributor>
          <dc:contributor>Arkhipov, Alexandre</dc:contributor>
          <dc:contributor>Gusev, Valentin</dc:contributor>
          <dc:contributor>Klooster, Tiina</dc:contributor>
          <dc:contributor>Ferger, Anne</dc:contributor>
          <dc:contributor>Jettka, Daniel</dc:contributor>
          <dc:contributor>Lehmberg, Timm</dc:contributor>
          <dc:creator>Gusev, Valentin</dc:creator>
          <dc:creator>Klooster, Tiina</dc:creator>
          <dc:creator>Wagner-Nagy, Beáta</dc:creator>
          <dc:date>2023-12-29</dc:date>
          <dc:description>Corpus Citation

Gusev, Valentin; Klooster, Tiina; Wagner-Nagy, Beáta. 2023. “INEL Kamas Corpus.” Version 2.0. Publication date 2023-12-31. http://hdl.handle.net/11022/0000-0007-FC25-4. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1.

Corpus Description

The INEL Kamas corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus makes possible typologically aware corpus-based grammatical research on the Kamas language and expands the documentation of the lesser described indigenous languages of Northern Eurasia.

The INEL Kamas corpus consists of two parts: folklore texts collected by Kai Donner in 1912–1914, and transcribed audio recordings of the last speaker of Kamas, Klavdiya Plotnikova, made between 1964 and 1970.

Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of syntactic functions, semantic roles, Russian borrowings and code-switching. Some texts also have annotations for information status.

New in release 2.0


	In texts from Donner’s collection, phonetic transcription according to Klumpp's edition of Donner’s manuscripts has been added (as stl tier)
	Five texts that were originally split between different tapes have been merged, together with the respective parts of the recordings. Sentences in each resulting text are numbered consecutively throughout
	
		PKZ_196X_Alenushka_flk + PKZ_196X_Alenushka_continuation_flk &gt; PKZ_196X_Alenushka_flk
		End of PKZ_196X_SU0226 starting from PKZ_196X_SU0226.203 (210) + PKZ_196X_Alenushka2_continuation_flk &gt; PKZ_196X_Alenushka2_flk
		PKZ_196X_BlacksmithAndMerchant_flk + PKZ_196X_BlacksmithAndMerchant_cont_flk &gt; PKZ_196X_BlacksmithAndMerchant_flk
		PKZ_196X_Finist_flk + PKZ_196X_Finist_continuation_flk &gt; PKZ_196X_Finist_flk
		PKZ_196X_StupidWolf_flk + PKZ_196X_StupidWolf_continuation_flk &gt; PKZ_196X_StupidWolf_flk
	
	
	Some of the texts are now annotated for existential, locative and possessive predication (ExLocPoss tier, by C.L. Däbritz)
	Numerous corrections in glosses, other annotations and transcriptions, including:
	
		Fuller and more consistent transcription, glossing and annotations of borrowings
		Vowel length is marked in mp tier in baːzoʔ ‘again’, büːzʼe ‘man’ and saːgər ‘black’
		Corrections in disambiguation of polysemous or homonymous morphemes: 
		-ziʔ "INS"/"COM", -də "LAT"/"3SG", mo- "can/become/want | мочь/стать/хотеть"
		Possessive suffix unmarked for case: "NOM/GEN/ACC" &gt; "POSS"
		Glosses for personal pronouns were changed to uniform labels: "I | я" &gt; "PRO1SG", "we | мы" &gt; "PRO1PL", "you | ты" &gt; "PRO2SG", "you.PL | вы" &gt; "PRO2PL"
		Fuller annotations of code-switching and calques (CS tier)
	
	
	Added ELAN *.eaf as a supplementary end-user file format for all transcripts


Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Contributions/Acknowledgements


	
	Recordings of Kamas speech made by Ago Künnap in Abalakovo and by Tiit-Rein Viitso in Tartu provided by the Archive of Estonian Dialects and Kindred Languages of the University of Tartu, Estonia (AEDKL, or TÜEMSA).
	
	
	Recordings of Klavdiya Plotnikova made by Jaakko Yli-Paavola in Tallinn in 1970 provided by the Institute for the Languages of Finland archive, Helsinki (KOTUS).
	
	
	Scanned pages from Kai Donners Kamassisches Wörterbuch (Joki 1944) containing texts collected by Kai Donner, published online courtesy of the Finno-Ugrian Society.
	
	
	The web-based search interface uses the Tsakonian Corpus platform developed by Dr. Timofey Arkhangelskiy.
	
</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/13882</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.13882</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:13882</dc:identifier>
          <dc:language>xas</dc:language>
          <dc:relation>handle:11022/0000-0007-FC25-4</dc:relation>
          <dc:relation>doi:10.25592/uhhfdm.9740</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode</dc:rights>
          <dc:subject>endangered language</dc:subject>
          <dc:subject>indigenous language</dc:subject>
          <dc:subject>L1 data</dc:subject>
          <dc:subject>language contact</dc:subject>
          <dc:subject>language documentation</dc:subject>
          <dc:subject>INEL</dc:subject>
          <dc:subject>folklore</dc:subject>
          <dc:subject>narrative</dc:subject>
          <dc:subject>monologue</dc:subject>
          <dc:subject>annotated</dc:subject>
          <dc:subject>morphological glossing</dc:subject>
          <dc:subject>borrowings</dc:subject>
          <dc:subject>code-switching</dc:subject>
          <dc:subject>semantic roles</dc:subject>
          <dc:subject>syntactic functions</dc:subject>
          <dc:subject>information status</dc:subject>
          <dc:subject>English translation</dc:subject>
          <dc:subject>German translation</dc:subject>
          <dc:subject>Russian translation</dc:subject>
          <dc:title>INEL Kamas Corpus</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:16605</identifier>
        <datestamp>2024-12-30T16:11:21Z</datestamp>
        <setSpec>user-adwhh</setSpec>
        <setSpec>user-inel</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Wagner-Nagy, Beáta</dc:contributor>
          <dc:contributor>Arkhipov, Alexandre</dc:contributor>
          <dc:contributor>Ferger, Anne</dc:contributor>
          <dc:contributor>Jettka, Daniel</dc:contributor>
          <dc:contributor>Lazarenko, Elena</dc:contributor>
          <dc:contributor>Lehmberg, Timm</dc:contributor>
          <dc:contributor>Riaposov, Aleksandr</dc:contributor>
          <dc:creator>Däbritz, Chris Lasse</dc:creator>
          <dc:creator>Gusev, Valentin</dc:creator>
          <dc:creator>Stoynova, Natalia</dc:creator>
          <dc:date>2024-12-31</dc:date>
          <dc:description>Corpus Citation

Däbritz, Chris Lasse; Gusev, Valentin; Stoynova, Natalia. 2024. INEL Evenki Corpus. Version 2.0. Publication date 2024-12-31. Archived at Universität Hamburg. https://hdl.handle.net/11022/0000-0007-FE38-D. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1

Corpus Description

The INEL Evenki Corpus has been created within the long-term INEL project (Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages), 2016–2033.
The corpus makes possible typologically aware corpus-based grammatical research on the Evenki (&lt; Tungusic) language and expands the documentation of the lesser described indigenous languages of Northern Eurasia.
The INEL Evenki Corpus covers Northern (Taimyr, Khantayskoe Ozero, Ilimpi, Yerbogachyon) and Southern (Sym, Barhahan, and to a smaller extent Stony Tunguska and Nepa) Evenki dialects. These are exactly the dialects that are or were in contact with other languages included in the INEL project, that is, first and foremost, Dolgan and Selkup. The INEL Evenki Corpus contains texts from different sources:


	Published texts from several text collections: Vasilevich (1936): the Ilimpi, Yerbogachyon, Sym, Nepa dialects; Anisimov (1936): the Stony Tunguska dialect; Brodskaya (1967): the Khantayskoe Ozero dialect.
	Transcripts of recordings obtained from the Taimyr House of National Arts (TDNT) in Dudinka (2000s) as well as transcripts of recordings made by and of Tat`yana V. Bolina, all of them representing the Khantayskoe Ozero dialect. For these texts, corresponding time-aligned audio files are available.
	Texts from the handwritten archive of the Russian ethnographer and linguist Konstantin M. Rychkov recorded in the 1900s/1910s, covering the Taimyr, Ilimpi, Sym, and Barhahan dialects.


Each text in the corpus is provided with morphological glossing, translation into English, Russian, and German, as well as annotation of Russian borrowings. Some texts also have annotations for syntactic functions, semantic roles, information status, as well as for existential, locative, and possessive predication.

Corpus size


	Northern dialects (Ilimpi, Yerbogachyon, Khantayskoe Ozero, Taimyr):
	176 texts, 7,091 sentences, 34,931 tokens
	Southern “sh” dialects (Sym, Barhahan):
	425 texts, 12,395 sentences, 55,674 tokens
	Southern “s” dialects (Stony Tunguska, Nepa):
	11 texts, 445 sentences, 2,659 tokens
	Total: 612 texts, 19,931 sentences, 93,264 tokens
	Total duration of audio: 3 hours 58 minutes (69 texts)


New in release 2.0


	The total size of the corpus has roughly doubled (from 47,708 to 93,264 tokens):
	
		new texts in the Sym dialect from the Rychkov archive have been added (15,495 tokens); the entire Sym collection from the archive is now included in the corpus
		a text collection in the Barhahan dialect from the Rychkov archive has been included in the corpus (30,061 tokens)
	
	
	Some errors in glossing have been fixed
	Glossing has been unified at some points (e.g. the analysis of finite past tense forms as finite verbs vs. participles: all such forms are now glossed as finite verbs)
	Many glossing labels have been changed; in particular, most ambiguous grammatical glosses have been disambiguated by numbers and/or by semantic specifications: e.g. DIM for four affixes  ⇒  DIM1, DIM2, DIM3, DIM4; NMLZ ⇒ NMLZ.TMP, NMLZ.PT, etc.
	The structure of metadata has been slightly modified (e.g. fields for the source type and availability of audio files have been added)


Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Contributions/Acknowledgements


	The Taimyr House of National Arts (TDNT) provided valuable audio material (see above).
	Tat`yana V. Bolina (TDNT Leading Methodologist for Evenki folklore and culture) recorded further Evenki material in 2018 and 2019.
	The Institute of Oriental Manuscripts of the Russian Academy of Sciences (IOM RAS / IVR; Институт восточных рукописей РАН) in Saint Petersburg provided scanned manuscripts from the Rychkov archive (The Archives of the Orientalists of IOM RAS, Coll. 49, inv. 1, items 4, 5, 6а, 6б, 6в).


Searching the corpus

The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN.

Online search with the Tsakorpus platform is available at https://inel.corpora.uni-hamburg.de/EvenkiCorpus/search.

Remote search with EXMARaLDA is also possible without downloading all the files (see https://inel.corpora.uni-hamburg.de/portal/help/en/index.php#search).

See the user documentation (section 3) for details on transcription, annotation tiers and annotation tags. Find further information and links on the Evenki Corpus page at the INEL Resources portal: https://inel.corpora.uni-hamburg.de/portal/corpora/evenki/.</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/16605</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.16605</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:16605</dc:identifier>
          <dc:language>evn</dc:language>
          <dc:relation>handle:11022/0000-0007-F43C-3</dc:relation>
          <dc:relation>doi:10.25592/uhhfdm.9627</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode</dc:rights>
          <dc:subject>Tungusic</dc:subject>
          <dc:subject>Evenki</dc:subject>
          <dc:subject>endangered language</dc:subject>
          <dc:subject>language contact</dc:subject>
          <dc:subject>language documentation</dc:subject>
          <dc:subject>legacy data</dc:subject>
          <dc:subject>INEL</dc:subject>
          <dc:subject>AdWHH</dc:subject>
          <dc:subject>text corpus</dc:subject>
          <dc:subject>speech corpus</dc:subject>
          <dc:subject>parallel texts</dc:subject>
          <dc:subject>folklore</dc:subject>
          <dc:subject>tales</dc:subject>
          <dc:subject>narrative</dc:subject>
          <dc:subject>conversation</dc:subject>
          <dc:subject>song</dc:subject>
          <dc:subject>transcription</dc:subject>
          <dc:subject>time-aligned</dc:subject>
          <dc:subject>audio</dc:subject>
          <dc:subject>morphological glossing</dc:subject>
          <dc:subject>part-of-speech</dc:subject>
          <dc:subject>borrowings</dc:subject>
          <dc:subject>code-switching</dc:subject>
          <dc:subject>semantic roles</dc:subject>
          <dc:subject>syntactic functions</dc:subject>
          <dc:subject>information status</dc:subject>
          <dc:subject>existential predication</dc:subject>
          <dc:subject>locative predication</dc:subject>
          <dc:subject>possessive predication</dc:subject>
          <dc:subject>English translation</dc:subject>
          <dc:subject>German translation</dc:subject>
          <dc:subject>Russian translation</dc:subject>
          <dc:subject>EXMARaLDA</dc:subject>
          <dc:subject>ELAN</dc:subject>
          <dc:subject>XML</dc:subject>
          <dc:subject>ISO/TEI</dc:subject>
          <dc:title>INEL Evenki Corpus</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:17513</identifier>
        <datestamp>2025-05-28T09:59:42Z</datestamp>
        <setSpec>user-uhh</setSpec>
        <setSpec>user-inel</setSpec>
        <setSpec>user-adwhh</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Wagner-Nagy, Beáta</dc:contributor>
          <dc:contributor>Arkhipov, Alexandre</dc:contributor>
          <dc:contributor>Brykina, Maria</dc:contributor>
          <dc:contributor>Lazarenko, Elena</dc:contributor>
          <dc:contributor>Riaposov, Aleksandr</dc:contributor>
          <dc:creator>Wagner-Nagy, Beáta</dc:creator>
          <dc:creator>Sipőcz, Katalin</dc:creator>
          <dc:date>2025-05-15</dc:date>
          <dc:description>Corpus Citation

Sipőcz, Katalin &amp; Wagner-Nagy, Beáta. 2025. INEL Tavda Mansi Corpus. Version 1.0. Publication date 2025-05-15. https://hdl.handle.net/11022/0000-0007-FE69-6. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1

Corpus Description
The present corpus of Tavda Mansi has been created as part of the long-term research project INEL (“Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages”) in the context of the Academies’ Programme, coordinated by the Union of the German Academies of Sciences and Humanities.

The INEL Tavda Mansi corpus at hand fills a gap in the documentation of the indigenous languages of Northern Eurasia and makes possible further descriptions of the language. Mansi is a relatively well-described language: there are numerous descriptions, and a corpus is also available. However, the Tavda variety is not included in the existing corpora.

The analysis of materials from the Tavda variety has already been conducted by Norbert Szilágyi, but he did not produce a corpus that could be searched and evaluated electronically. However, he has made his materials available at https://norbertszilagyi91.wixsite.com/tawdamansi. In the material published in the INEL corpus, the analyses differ significantly from Szilágyi's analysis. For the sake of comparison, the texts analysed by Szilágyi are appended to the corpus, and the Hungarian translations he provided have been retained, although some passages have been corrected.

The INEL Tavda Mansi Corpus contains texts from different sources:


	Kannisto, Artturi and Matti Liimola 1951: Wogulische Volksdichtung gesammelt und übersetzt von Artturi Kannisto, bearbeitet und herausgegeben von Matti Liimola Volume I. Texte mythischen Inhalts. [Mémoires de la Société Finno-Ougrienne 101]. Helsinki: Suomalais-Ugrilainen Seura.
	Kannisto, Artturi and Matti Liimola 1955: Wogulische Volksdichtung gesammelt und übersetzt von Artturi Kannisto, bearbeitet und herausgegeben von Matti Liimola Volume II. Kriegs- und Heldensagen. [Mémoires de la Société Finno-Ougrienne 109]. Helsinki: Suomalais-Ugrilainen Seura.
	Kannisto, Artturi and Matti Liimola 1956: Wogulische Volksdichtung gesammelt und übersetzt von Artturi Kannisto, bearbeitet und herausgegeben von Matti Liimola Volume III. Märchen. [Mémoires de la Société Finno-Ougrienne 111]. Helsinki: Suomalais-Ugrilainen Seura.
	Kannisto, Artturi and Matti Liimola 1958: Wogulische Volksdichtung gesammelt und übersetzt von Artturi Kannisto, bearbeitet und herausgegeben von Matti Liimola Volume IV. Bärenlieder. [Mémoires de la Société Finno-Ougrienne 114]. Helsinki: Suomalais-Ugrilainen Seura.
	Kannisto, Artturi and Matti Liimola 1963: Wogulische Volksdichtung gesammelt und übersetzt von Artturi Kannisto, bearbeitet und herausgegeben von Matti Liimola Volume VI. Schicksalslieder, Klagelieder, Kinderreime, Rätsel, Verschiedenes. [Mémoires de la Société Finno-Ougrienne 134]. Helsinki: Suomalais-Ugrilainen Seura.
	Munkácsi, Bernát 1896: Vogul népköltési gyűjtemény IV. Életképek. Budapest: Magyar Tudományos Akadémia.


Corpus size

The corpus currently contains 29 transcripts with 2,042 utterances and 11,879 tokens.

Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Searching the corpus

The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN.

Online search with the Tsakorpus platform is available at https://inel.corpora.uni-hamburg.de/TavdaMansiCorpus/search.

Remote search with EXMARaLDA is also possible without downloading all the files (see https://inel.corpora.uni-hamburg.de/portal/help/en/index.php).

See the user documentation (section 3) for details on transcription, annotation tiers and annotation tags. Find further information and links on the Mansi Corpus page at the INEL Resources portal: https://inel.corpora.uni-hamburg.de/portal/corpora/mansi/.</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/17513</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.17513</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:17513</dc:identifier>
          <dc:language>mns</dc:language>
          <dc:relation>info:eu-repo/semantics/altIdentifier/handle/11022/0000-0007-FE69-6</dc:relation>
          <dc:relation>doi:10.25592/uhhfdm.17512</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode</dc:rights>
          <dc:subject>Uralic</dc:subject>
          <dc:subject>Mansi</dc:subject>
          <dc:subject>Tavda Mansi</dc:subject>
          <dc:subject>endangered language</dc:subject>
          <dc:subject>language contact</dc:subject>
          <dc:subject>language documentation</dc:subject>
          <dc:subject>legacy data</dc:subject>
          <dc:subject>INEL</dc:subject>
          <dc:subject>AdWHH</dc:subject>
          <dc:subject>text corpus</dc:subject>
          <dc:subject>parallel texts</dc:subject>
          <dc:subject>folklore</dc:subject>
          <dc:subject>tales</dc:subject>
          <dc:subject>narrative</dc:subject>
          <dc:subject>song</dc:subject>
          <dc:subject>transcription</dc:subject>
          <dc:subject>morphological glossing</dc:subject>
          <dc:subject>part-of-speech</dc:subject>
          <dc:subject>borrowings</dc:subject>
          <dc:subject>dialogue</dc:subject>
          <dc:subject>English translation</dc:subject>
          <dc:subject>Russian translation</dc:subject>
          <dc:subject>EXMARaLDA</dc:subject>
          <dc:subject>ELAN</dc:subject>
          <dc:subject>XML</dc:subject>
          <dc:subject>ISO/TEI</dc:subject>
          <dc:subject>German translation</dc:subject>
          <dc:subject>Hungarian translation</dc:subject>
          <dc:subject>existential predication</dc:subject>
          <dc:subject>locative predication</dc:subject>
          <dc:subject>possessive predication</dc:subject>
          <dc:subject>Ob-Ugric languages</dc:subject>
          <dc:subject>semantic role</dc:subject>
          <dc:subject>syntactic function</dc:subject>
          <dc:title>INEL Tavda Mansi Corpus</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:9741</identifier>
        <datestamp>2025-09-22T13:14:12Z</datestamp>
        <setSpec>user-inel</setSpec>
        <setSpec>user-adwhh</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Wagner-Nagy, Beáta</dc:contributor>
          <dc:contributor>Arkhipov, Alexandre</dc:contributor>
          <dc:contributor>Gusev, Valentin</dc:contributor>
          <dc:contributor>Klooster, Tiina</dc:contributor>
          <dc:contributor>Ferger, Anne</dc:contributor>
          <dc:contributor>Jettka, Daniel</dc:contributor>
          <dc:contributor>Lehmberg, Timm</dc:contributor>
          <dc:creator>Gusev, Valentin</dc:creator>
          <dc:creator>Klooster, Tiina</dc:creator>
          <dc:date>2018-12-31</dc:date>
          <dc:description>Corpus Citation

Gusev, Valentin; Klooster, Tiina. 2018. “INEL Kamas Corpus.” Version 0.1. Publication date 2018-12-31. https://hdl.handle.net/11022/0000-0007-CAE6-2. Archived in Hamburger Zentrum für Sprachkorpora. In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, Daniel; Lehmberg, Timm (eds.). 2018. The INEL corpora of indigenous Northern Eurasian languages.

Corpus Description

The INEL Kamas corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus makes possible typologically aware corpus-based grammatical research on the Kamas language and expands the documentation of the lesser described indigenous languages of Northern Eurasia.

The INEL Kamas corpus consists of two parts: folklore texts collected by Kai Donner in 1912–1914, and transcribed audio recordings of the last speaker of Kamas, Klavdiya Plotnikova, made between 1964 and 1970.

Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of Russian borrowings. Some texts also have annotations for syntactic functions, semantic roles and information status.

Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Contributions/Acknowledgements


	Recordings of Kamas speech made by Ago Künnap in Abalakovo and by Tiit-Rein Viitso in Tartu, as well as the digitized fragment of the surviving copy of Kai Donner’s phonograph recording provided by the Archive of Estonian Dialects and Kindred Languages of the University of Tartu, Estonia (AEDKL, or TÜEMSA).
	Recordings of Klavdiya Plotnikova made by Jaakko Yli-Paavola in Tallinn in 1970 provided by KOTUS Archive, Helsinki.
	Scanned pages from [Joki 1944] containing texts collected by Kai Donner published online courtesy of the Finno-Ugrian Society.
</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/9741</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.9741</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:9741</dc:identifier>
          <dc:language>xas</dc:language>
          <dc:relation>handle:11022/0000-0007-CAE6-2</dc:relation>
          <dc:relation>doi:10.25592/uhhfdm.9740</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode</dc:rights>
          <dc:subject>endangered language</dc:subject>
          <dc:subject>indigenous language</dc:subject>
          <dc:subject>L1 data</dc:subject>
          <dc:subject>language contact</dc:subject>
          <dc:subject>language documentation</dc:subject>
          <dc:subject>INEL</dc:subject>
          <dc:subject>folklore</dc:subject>
          <dc:subject>narrative</dc:subject>
          <dc:subject>monologue</dc:subject>
          <dc:subject>annotated</dc:subject>
          <dc:subject>morphological glossing</dc:subject>
          <dc:subject>borrowings</dc:subject>
          <dc:subject>code-switching</dc:subject>
          <dc:subject>semantic roles</dc:subject>
          <dc:subject>syntactic functions</dc:subject>
          <dc:subject>information status</dc:subject>
          <dc:subject>English translation</dc:subject>
          <dc:subject>German translation</dc:subject>
          <dc:subject>Russian translation</dc:subject>
          <dc:title>INEL Kamas Corpus</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:9752</identifier>
        <datestamp>2023-12-29T16:54:14Z</datestamp>
        <setSpec>user-inel</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Wagner-Nagy, Beáta</dc:contributor>
          <dc:contributor>Arkhipov, Alexandre</dc:contributor>
          <dc:contributor>Gusev, Valentin</dc:contributor>
          <dc:contributor>Klooster, Tiina</dc:contributor>
          <dc:contributor>Ferger, Anne</dc:contributor>
          <dc:contributor>Jettka, Daniel</dc:contributor>
          <dc:contributor>Lehmberg, Timm</dc:contributor>
          <dc:creator>Gusev, Valentin</dc:creator>
          <dc:creator>Klooster, Tiina</dc:creator>
          <dc:creator>Wagner-Nagy, Beáta</dc:creator>
          <dc:date>2019-12-15</dc:date>
          <dc:description>Corpus Citation

Gusev, Valentin; Klooster, Tiina; Wagner-Nagy, Beáta. 2019. "INEL Kamas Corpus." Version 1.0. Publication date 2019-12-15. http://hdl.handle.net/11022/0000-0007-DA6E-9. Archived in Hamburger Zentrum für Sprachkorpora. In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, Daniel; Lehmberg, Timm (eds.). The INEL corpora of indigenous Northern Eurasian languages.

Corpus Description

The INEL Kamas corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus makes possible typologically aware corpus-based grammatical research on the Kamas language and expands the documentation of the lesser described indigenous languages of Northern Eurasia.

The INEL Kamas corpus consists of two parts: folklore texts collected by Kai Donner in 1912–1914, and transcribed audio recordings of the last speaker of Kamas, Klavdiya Plotnikova, made between 1964 and 1970.

Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of syntactic functions, semantic roles, Russian borrowings and code-switching. Some texts also have annotations for information status.

New in release 1.0


	All of Klavdiya Plotnikova’s transcripts are now published, including all the tapes from the KOTUS archive, as well as the two recordings of Aleksandra Semyonova (21 more texts in total).
	All the texts are now annotated for syntactic functions and semantic roles.
	Numerous corrections in glosses and other annotations.


Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Contributions/Acknowledgements


	
	Recordings of Kamas speech made by Ago Künnap in Abalakovo and by Tiit-Rein Viitso in Tartu provided by the Archive of Estonian Dialects and Kindred Languages of the University of Tartu, Estonia (AEDKL, or TÜEMSA).
	Recordings of Klavdiya Plotnikova made by Jaakko Yli-Paavola in Tallinn in 1970 provided by the Institute for the Languages of Finland archive, Helsinki (KOTUS).
	Scanned pages from Kai Donners Kamassisches Wörterbuch (Joki 1944) containing texts collected by Kai Donner published online courtesy of the Finno-Ugrian Society.
	The web-based search interface uses the Tsakonian Corpus platform developed by Dr. Timofey Arkhangelskiy.
	


Partner Organizations
The INEL project benefited greatly from cooperation with our partner institutions:


	Institute of World Culture, M.V. Lomonosov Moscow State University, Moscow
	Department of Languages of the Peoples of Siberia, Tomsk State Pedagogical University, Tomsk
	Institute of Philology, Siberian Branch of the Russian Academy of Sciences, Novosibirsk
	Taymyr House of Folk Art, Dudinka
	Arctic State Institute of Culture and Arts, Yakutsk
</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/9752</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.9752</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:9752</dc:identifier>
          <dc:language>xas</dc:language>
          <dc:relation>handle:11022/0000-0007-DA6E-9</dc:relation>
          <dc:relation>doi:10.25592/uhhfdm.9740</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode</dc:rights>
          <dc:subject>endangered language</dc:subject>
          <dc:subject>indigenous language</dc:subject>
          <dc:subject>L1 data</dc:subject>
          <dc:subject>language contact</dc:subject>
          <dc:subject>language documentation</dc:subject>
          <dc:subject>INEL</dc:subject>
          <dc:subject>folklore</dc:subject>
          <dc:subject>narrative</dc:subject>
          <dc:subject>monologue</dc:subject>
          <dc:subject>annotated</dc:subject>
          <dc:subject>morphological glossing</dc:subject>
          <dc:subject>borrowings</dc:subject>
          <dc:subject>code-switching</dc:subject>
          <dc:subject>semantic roles</dc:subject>
          <dc:subject>syntactic functions</dc:subject>
          <dc:subject>information status</dc:subject>
          <dc:subject>English translation</dc:subject>
          <dc:subject>German translation</dc:subject>
          <dc:subject>Russian translation</dc:subject>
          <dc:title>INEL Kamas Corpus</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:16518</identifier>
        <datestamp>2024-12-19T10:48:03Z</datestamp>
        <setSpec>user-uhh</setSpec>
        <setSpec>user-adwhh</setSpec>
        <setSpec>user-inel</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Wagner-Nagy, Beáta</dc:contributor>
          <dc:contributor>Arkhipov, Alexandre</dc:contributor>
          <dc:contributor>Lazarenko, Elena</dc:contributor>
          <dc:contributor>Riaposov, Aleksandr</dc:contributor>
          <dc:contributor>Lehmberg, Timm</dc:contributor>
          <dc:creator>Budzisch, Josefina</dc:creator>
          <dc:creator>Wagner-Nagy, Beáta</dc:creator>
          <dc:date>2024-12-31</dc:date>
          <dc:description>Corpus Citation

Budzisch, Josefina; Wagner-Nagy, Beáta. 2024. INEL Nenets Corpus. Version 1.0. Publication date 2024-12-31. https://hdl.handle.net/11022/0000-0007-FE37-E. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1

Corpus Description

The INEL Nenets corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033.

The corpus includes texts recorded between 1940 and 2011 in both Nenets lects, Forest Nenets and Tundra Nenets. The majority of texts in this corpus originate from published works, which are cited in the relevant sections of the metadata. In particular, the following publications were used; full references can be found in the reference section of the documentation:


	Barmich 2018
	Burkova 2008
	Burkova 2012
	Burkova et al. 2003
	Hajdú 1968
	Koshkareva et al. 2007
	Labanauskas 2001
	Logany &amp; Logany 2016
	Lyublinskaya 2022
	Pusztay 1976
	Tereshchenko 1956
	Tereshchenko 1990
	Turutina 2003
	Yangasova 2018


Svetlana Burkova kindly shared a collection of her Forest Nenets data including an original sound recording (Agan dialect), transcripts and glosses as Toolbox files and Word documents (Agan and Pur dialects), as well as published texts in Pur (Turutina 2003) and Numto (Logany &amp; Logany 2016) dialects.

All texts in the corpus are provided with interlinear morpheme-by-morpheme glosses and translations into English, German and Russian. An audio recording is also provided for one text.

Corpus size


	Forest Nenets: 80 texts, 3,709 sentences, 23,597 tokens
	Tundra Nenets: 56 texts, 6,545 sentences, 37,681 tokens
	Total: 136 texts, 10,254 sentences, 61,278 tokens
	Total duration of audio: 44 minutes 45 seconds


Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Searching the corpus

The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN.

Online search with the Tsakorpus platform is available at https://inel.corpora.uni-hamburg.de/NenetsCorpus/search.

Remote search with EXMARaLDA is also possible without downloading all the files (see https://inel.corpora.uni-hamburg.de/portal/help/en/index.php).

See the user documentation (section 3) for details on transcription, annotation tiers and annotation tags. Find further information and links on the Nenets Corpus page at the INEL Resources portal: https://inel.corpora.uni-hamburg.de/portal/corpora/nenets/.</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/16518</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.16518</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:16518</dc:identifier>
          <dc:language>yrk</dc:language>
          <dc:relation>handle:11022/0000-0007-FE37-E</dc:relation>
          <dc:relation>doi:10.25592/uhhfdm.16517</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode</dc:rights>
          <dc:subject>Uralic</dc:subject>
          <dc:subject>Samoyedic</dc:subject>
          <dc:subject>Nenets</dc:subject>
          <dc:subject>Forest Nenets</dc:subject>
          <dc:subject>Tundra Nenets</dc:subject>
          <dc:subject>endangered language</dc:subject>
          <dc:subject>language contact</dc:subject>
          <dc:subject>language documentation</dc:subject>
          <dc:subject>legacy data</dc:subject>
          <dc:subject>INEL</dc:subject>
          <dc:subject>AdWHH</dc:subject>
          <dc:subject>text corpus</dc:subject>
          <dc:subject>speech corpus</dc:subject>
          <dc:subject>parallel texts</dc:subject>
          <dc:subject>folklore</dc:subject>
          <dc:subject>tales</dc:subject>
          <dc:subject>narrative</dc:subject>
          <dc:subject>elicitation</dc:subject>
          <dc:subject>song</dc:subject>
          <dc:subject>transcription</dc:subject>
          <dc:subject>time-aligned</dc:subject>
          <dc:subject>audio</dc:subject>
          <dc:subject>morphological glossing</dc:subject>
          <dc:subject>part-of-speech</dc:subject>
          <dc:subject>borrowings</dc:subject>
          <dc:subject>code-switching</dc:subject>
          <dc:subject>existential predication</dc:subject>
          <dc:subject>locative predication</dc:subject>
          <dc:subject>possessive predication</dc:subject>
          <dc:subject>English translation</dc:subject>
          <dc:subject>German translation</dc:subject>
          <dc:subject>Russian translation</dc:subject>
          <dc:subject>EXMARaLDA</dc:subject>
          <dc:subject>ELAN</dc:subject>
          <dc:subject>XML</dc:subject>
          <dc:subject>ISO/TEI</dc:subject>
          <dc:title>INEL Nenets Corpus</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:17419</identifier>
        <datestamp>2025-05-12T13:47:32Z</datestamp>
        <setSpec>user-uhh</setSpec>
        <setSpec>user-adwhh</setSpec>
        <setSpec>user-inel</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Lazarenko, Elena</dc:contributor>
          <dc:contributor>Riaposov, Aleksandr</dc:contributor>
          <dc:contributor>Lehmberg, Timm</dc:contributor>
          <dc:contributor>Wagner-Nagy, Beáta</dc:contributor>
          <dc:contributor>Arkhipov, Alexandre</dc:contributor>
          <dc:creator>Brykina, Maria</dc:creator>
          <dc:creator>Gusev, Valentin</dc:creator>
          <dc:creator>Szeverényi, Sándor</dc:creator>
          <dc:creator>Wagner-Nagy, Beáta</dc:creator>
          <dc:date>2025-05-02</dc:date>
          <dc:description>Corpus Citation

Brykina, Maria; Gusev, Valentin; Szeverényi, Sándor; Wagner-Nagy, Beáta. 2025. INEL Nganasan Corpus. Version 1.0. Publication date 2025-05-02. https://hdl.handle.net/11022/0000-0007-FE63-C. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1

Corpus Description

The INEL Nganasan corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus is largely based on the Nganasan Spoken Language Corpus, which has been adapted to the INEL standards and supplemented with new texts. The corpus makes possible typologically oriented corpus-based research on Nganasan and expands the documentation of the lesser described indigenous languages of Northern Eurasia.

The INEL Nganasan corpus consists of two parts. The glossed (searchable) part of the corpus includes texts provided with source media files (whenever available) and annotated transcripts. The archival part of the corpus contains non-glossed texts, represented either by audio recordings (optionally – with preliminary transcriptions) or scanned pages of the manuscripts or publications.

The corpus includes Nganasan texts recorded between 1933 and 2019. The sources of the corpus are:


	Audio recordings done by Maria Brykina, Valentin Gusev, Sándor Szeverényi and Beáta Wagner-Nagy.
	Legacy audio recordings done by A. Aksyonova, Svetlana S. Aksyonova, Josefina Budzisch, Michael Daniel, Oksana E. Dobzhanskaya, Eugene Helimski, Nadezhda T. Kosterkina, Jean-Luc Lambert, Marina D. Lyublinskaya, N. A. Popov, Florian Sobanski, Eugénie Stapert, Larisa Y. Turdagina, Zsuzsa Várnai, Peter Voliak, Tatjana Zhdanova and possibly other people.
	Legacy manuscript transcriptions done by Ekaterina P. Boldt, Eugene Helimski, Nadezhda T. Kosterkina, I. E. Machkinis, E. P. Nojfeld, A. K. Stolyarova, Natalia M. Tereshchenko and Tatjana Zhdanova.
	Texts published by Ekaterina P. Boldt, I. E. Machkinis, Tibor Mikola, Georgij N. Prokofiev and A. K. Stolyarova.


Corpus size

The glossed (searchable) part of the corpus contains 236 texts, 34,872 sentences and 221,747 tokens. The total duration of the audio recordings is 49 hours 53 minutes.

The archival part of the corpus contains 98 hours of audio material (210 texts) and 30 manuscripts.

Funding

The INEL Nganasan corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

The Nganasan Spoken Language Corpus, which was integrated into the INEL Nganasan corpus, was created as part of the project Corpus based grammatical studies on Nganasan at the Institute of Finno-Ugric/Uralic Studies of Universität Hamburg. The project was supported by the Deutsche Forschungsgemeinschaft under grant number WA3153/2-1 between 2014 and 2017.

Contributions/Acknowledgements


	Many native speakers shared their knowledge of Nganasan and thus made the existence of this corpus possible (see the documentation file below, Appendix A1). We are especially grateful to those who spent days and sometimes months working with us: Svetlana S. Aksyonova, Zinaida S. Chebodaeva, Nikolai S. Chunanchar, Nina D. Chunanchar, Yuliya M. Goricheva, Ekaterina Ch. Kokore, Ekaterina S. Kosterkina, Nadezhda T. Kosterkina, Svetlana M. Kudryakova, Serafima M. Kupchik, Tat`yana T. Kuzenko, Aleksandr Ch. Momde, Dar`ya Ch. Momde, Vera L. Momde, Vasilij F. Porbin, Evdokiya D. Porbina, Mariya M. Porbina, Zoya Ch. Porbina, Galina F. Porotova, Ekaterina N. Sovalova, Lodun N. Turdagina, Nadezhda K. Turdagina, Tat`yana D. Turkina, Mariya D. Yarotskaya, Sy`ku M. Yarotskaya.
	The Department of Siberian Indigenous Languages of Tomsk State Pedagogical University and the Institute for Linguistic Studies RAS kindly provided access to their archives.
	The Dudinka branch of GTRK “Norilsk” generously provided access to the Nganasan part of its extensive audio archive.
	The Taimyr House of National Arts and the City Centre of National Arts in Dudinka helped and supported us during our field trips.


Searching the corpus

The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN.

Online search with the Tsakorpus platform is available at https://inel.corpora.uni-hamburg.de/NganasanCorpus/search.

Remote search with EXMARaLDA is also possible without downloading all the files (see https://inel.corpora.uni-hamburg.de/portal/help/en/index.php).

See the user documentation (section 3) for details on transcription, annotation tiers and annotation tags. Find further information and links on the Nganasan Corpus page at the INEL Resources portal: https://inel.corpora.uni-hamburg.de/portal/corpora/nganasan/.</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/17419</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.17419</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:17419</dc:identifier>
          <dc:language>nio</dc:language>
          <dc:relation>info:eu-repo/semantics/altIdentifier/handle/11022/0000-0007-FE63-C</dc:relation>
          <dc:relation>doi:10.25592/uhhfdm.17418</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode</dc:rights>
          <dc:subject>Uralic</dc:subject>
          <dc:subject>Samoyedic</dc:subject>
          <dc:subject>Nganasan</dc:subject>
          <dc:subject>endangered language</dc:subject>
          <dc:subject>language contact</dc:subject>
          <dc:subject>language documentation</dc:subject>
          <dc:subject>legacy data</dc:subject>
          <dc:subject>INEL</dc:subject>
          <dc:subject>AdWHH</dc:subject>
          <dc:subject>text corpus</dc:subject>
          <dc:subject>speech corpus</dc:subject>
          <dc:subject>parallel texts</dc:subject>
          <dc:subject>folklore</dc:subject>
          <dc:subject>tales</dc:subject>
          <dc:subject>narrative</dc:subject>
          <dc:subject>song</dc:subject>
          <dc:subject>transcription</dc:subject>
          <dc:subject>time-aligned</dc:subject>
          <dc:subject>audio</dc:subject>
          <dc:subject>morphological glossing</dc:subject>
          <dc:subject>part-of-speech</dc:subject>
          <dc:subject>borrowings</dc:subject>
          <dc:subject>code-switching</dc:subject>
          <dc:subject>existential predication</dc:subject>
          <dc:subject>locative predication</dc:subject>
          <dc:subject>possessive predication</dc:subject>
          <dc:subject>English translation</dc:subject>
          <dc:subject>Russian translation</dc:subject>
          <dc:subject>EXMARaLDA</dc:subject>
          <dc:subject>ELAN</dc:subject>
          <dc:subject>XML</dc:subject>
          <dc:subject>ISO/TEI</dc:subject>
          <dc:title>INEL Nganasan Corpus</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:731</identifier>
        <datestamp>2025-09-22T11:04:26Z</datestamp>
        <setSpec>user-inel</setSpec>
        <setSpec>user-adwhh</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Daniel Jettka</dc:contributor>
          <dc:creator>Beáta Wagner-Nagy</dc:creator>
          <dc:creator>Alexandre Arkhipov</dc:creator>
          <dc:date>2019-12-19</dc:date>
          <dc:description>The bibliography comprises 2056 entries, including references to all relevant linguistic and ethnological publications on the Selkup and Kamas languages, as well as numerous references for Dolgan, Evenki, Nenets, Nganasan, Tatar and Enets. It is constantly supplemented and revised by the members of the INEL project. A web-based, searchable version is available online.</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/731</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.731</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:731</dc:identifier>
          <dc:relation>doi:10.25592/uhhfdm.730</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by/4.0/legalcode</dc:rights>
          <dc:subject>bibliography</dc:subject>
          <dc:subject>linguistics</dc:subject>
          <dc:subject>ethnology</dc:subject>
          <dc:subject>selkup</dc:subject>
          <dc:subject>kamas</dc:subject>
          <dc:subject>dolgan</dc:subject>
          <dc:subject>evenki</dc:subject>
          <dc:subject>nenets</dc:subject>
          <dc:subject>nganasan</dc:subject>
          <dc:subject>tatar</dc:subject>
          <dc:subject>enets</dc:subject>
          <dc:title>INEL Bibliographie</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:9753</identifier>
        <datestamp>2021-12-22T08:22:36Z</datestamp>
        <setSpec>user-inel</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Wagner-Nagy, Beáta</dc:contributor>
          <dc:contributor>Arkhipov, Alexandre</dc:contributor>
          <dc:contributor>Brykina, Maria</dc:contributor>
          <dc:contributor>Orlova, Svetlana</dc:contributor>
          <dc:contributor>Ferger, Anne</dc:contributor>
          <dc:contributor>Jettka, Daniel</dc:contributor>
          <dc:contributor>Lehmberg, Timm</dc:contributor>
          <dc:creator>Brykina, Maria</dc:creator>
          <dc:creator>Orlova, Svetlana</dc:creator>
          <dc:creator>Wagner-Nagy, Beáta</dc:creator>
          <dc:date>2020-06-30</dc:date>
          <dc:description>Corpus Citation

Brykina, Maria; Orlova, Svetlana; Wagner-Nagy, Beáta. 2020. INEL Selkup Corpus. Version 1.0. Publication date 2020-06-30. Archived in Hamburger Zentrum für Sprachkorpora. http://hdl.handle.net/11022/0000-0007-E1D5-A. In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, Daniel; Lehmberg, Timm (eds.). The INEL corpora of indigenous Northern Eurasian languages.

Corpus Description

The INEL Selkup corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus enables typologically aware corpus-based grammatical research on the Selkup language and expands the documentation of the lesser described indigenous languages of Northern Eurasia.

The INEL Selkup corpus is composed of texts from the archive of Angelina Ivanovna Kuzmina (1924–2002), who gathered a large amount of material on Selkup in almost all regions where the Selkup people lived between 1962 and 1977. The archive was transferred by A.I. Kuzmina to Eugen Helimski and acquired by the Universität Hamburg in 2001. Most texts in the corpus originate from the handwritten part of the archive; the others come from sound recordings made by A.I. Kuzmina, transcribed and translated within the INEL project.

Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Contributions/Acknowledgements

Audio recordings made by Angelina Kuzmina were transcribed and translated by native speakers of Selkup:


	Irina Anatolyevna Korobejnikova, written transcription and Russian translation of audio in Central and Southern dialects
	Natalya Platonovna Izhenbina, written transcription and Russian translation of audio in Southern dialects
	Svetlana Nikitichna Sankevich (Kunina), oral transcription and Russian translation of audio in Northern dialects
	Evgeniya Sergeevna Smorgunova (Irikova), oral and written transcription and Russian translation of audio in Northern dialects
	Valentina Vladimirovna Tamelkina, oral transcription and Russian translation of audio in Northern dialects


For individual contributions to the collecting, transcribing and analyzing of individual texts, please refer to the user documentation and to the corpus metadata.

The web-based search interface uses the Tsakonian Corpus platform developed by Dr. Timofey Arkhangelskiy, Humboldt Research Fellow at IFUU, Hamburg University.

New in release 1.0


	The corpus now contains 264 texts from 74 speakers, representing the dialects of Middle Taz, Upper Tolka, Baikha (Northern), Narym and Tym (Central), Upper and Middle Ob, Chaya, Upper and Middle Ket (Southern). These contain 7887 sentences and 42466 words in total.
	Many texts have been provided with annotations for syntactic functions and semantic roles.
	Corrections to audio transcriptions, glossing and other annotations.
</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/9753</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.9753</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:9753</dc:identifier>
          <dc:language>sel</dc:language>
          <dc:relation>handle:11022/0000-0007-E1D5-A</dc:relation>
          <dc:relation>doi:10.25592/uhhfdm.9721</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode</dc:rights>
          <dc:subject>endangered language</dc:subject>
          <dc:subject>indigenous language</dc:subject>
          <dc:subject>L1 data</dc:subject>
          <dc:subject>language contact</dc:subject>
          <dc:subject>language documentation</dc:subject>
          <dc:subject>INEL</dc:subject>
          <dc:subject>folklore</dc:subject>
          <dc:subject>narrative</dc:subject>
          <dc:subject>monologue</dc:subject>
          <dc:subject>annotated</dc:subject>
          <dc:subject>morphological glossing</dc:subject>
          <dc:subject>borrowings</dc:subject>
          <dc:subject>code-switching</dc:subject>
          <dc:subject>semantic roles</dc:subject>
          <dc:subject>syntactic functions</dc:subject>
          <dc:subject>information status</dc:subject>
          <dc:subject>English translation</dc:subject>
          <dc:subject>German translation</dc:subject>
          <dc:subject>Russian translation</dc:subject>
          <dc:title>INEL Selkup Corpus</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:9754</identifier>
        <datestamp>2024-04-09T11:43:35Z</datestamp>
        <setSpec>user-inel</setSpec>
        <setSpec>user-adwhh</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Wagner-Nagy, Beáta</dc:contributor>
          <dc:contributor>Arkhipov, Alexandre</dc:contributor>
          <dc:contributor>Brykina, Maria</dc:contributor>
          <dc:contributor>Orlova, Svetlana</dc:contributor>
          <dc:contributor>Ferger, Anne</dc:contributor>
          <dc:contributor>Jettka, Daniel</dc:contributor>
          <dc:contributor>Lazarenko, Elena</dc:contributor>
          <dc:contributor>Lehmberg, Timm</dc:contributor>
          <dc:contributor>Riaposov, Aleksandr</dc:contributor>
          <dc:creator>Brykina, Maria</dc:creator>
          <dc:creator>Orlova, Svetlana</dc:creator>
          <dc:creator>Wagner-Nagy, Beáta</dc:creator>
          <dc:date>2021-12-31</dc:date>
          <dc:description>Corpus Citation

Brykina, Maria; Orlova, Svetlana; Wagner-Nagy, Beáta. 2021. “INEL Selkup Corpus.” Version 2.0. Publication date 2021-12-31. https://hdl.handle.net/11022/0000-0007-F4D9-1. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1

Corpus Description

The INEL Selkup corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus enables typologically aware corpus-based grammatical research on the Selkup language and expands the documentation of the lesser described indigenous languages of Northern Eurasia.

The INEL Selkup corpus is composed of texts from the archive of Angelina Ivanovna Kuzmina (1924–2002), who gathered a large amount of material on Selkup in almost all regions where the Selkup people lived between 1962 and 1977. The archive was transferred by A.I. Kuzmina to Eugen Helimski and acquired by the Universität Hamburg in 2001. Most texts in the corpus originate from the handwritten part of the archive; the others come from sound recordings made by A.I. Kuzmina, transcribed and translated within the INEL project.

Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Contributions/Acknowledgements

Audio recordings made by Angelina Kuzmina were transcribed and translated by native speakers of Selkup:


	Irina Anatolyevna Korobejnikova, written transcription and Russian translation of audio in Central and Southern dialects
	Natalya Platonovna Izhenbina, written transcription and Russian translation of audio in Southern dialects
	Svetlana Nikitichna Sankevich (Kunina), oral transcription and Russian translation of audio in Northern dialects
	Evgeniya Sergeevna Smorgunova (Irikova), oral and written transcription and Russian translation of audio in Northern dialects
	Valentina Vladimirovna Tamelkina, oral transcription and Russian translation of audio in Northern dialects


For individual contributions to the collecting, transcribing and analyzing of individual texts, please refer to the user documentation and to the corpus metadata.

The web-based search interface uses the Tsakonian Corpus platform developed by Dr. Timofey Arkhangelskiy, Humboldt Research Fellow at IFUU, Hamburg University.

New in release 2.0


	The corpus now contains 352 transcripts from 89 speakers, representing the dialects of Taz, Upper Tolka, Baikha (Northern), Narym and Tym (Central), Middle Ob, Chaya and Ket (Southern). These contain 14509 sentences and 81498 words in total.
	Many texts have been provided with annotations for syntactic functions and semantic roles. 
	Corrections to audio transcriptions, glossing and other annotations.
	Dialectal attribution of several speakers has been revised.
	The remaining non-glossed texts from the Kuzmina archive have also been added to the corpus for completeness. These include 3 texts from the written part of the archive and 40 audio recordings, for 20 of which a preliminary transcription is provided.
</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/9754</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.9754</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:9754</dc:identifier>
          <dc:language>sel</dc:language>
          <dc:relation>handle:11022/0000-0007-F4D9-1</dc:relation>
          <dc:relation>doi:10.25592/uhhfdm.9721</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode</dc:rights>
          <dc:subject>endangered language</dc:subject>
          <dc:subject>indigenous language</dc:subject>
          <dc:subject>L1 data</dc:subject>
          <dc:subject>language contact</dc:subject>
          <dc:subject>language documentation</dc:subject>
          <dc:subject>INEL</dc:subject>
          <dc:subject>folklore</dc:subject>
          <dc:subject>narrative</dc:subject>
          <dc:subject>monologue</dc:subject>
          <dc:subject>annotated</dc:subject>
          <dc:subject>morphological glossing</dc:subject>
          <dc:subject>borrowings</dc:subject>
          <dc:subject>code-switching</dc:subject>
          <dc:subject>semantic roles</dc:subject>
          <dc:subject>syntactic functions</dc:subject>
          <dc:subject>information status</dc:subject>
          <dc:subject>English translation</dc:subject>
          <dc:subject>German translation</dc:subject>
          <dc:subject>Russian translation</dc:subject>
          <dc:title>INEL Selkup Corpus</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:9628</identifier>
        <datestamp>2024-12-30T16:11:20Z</datestamp>
        <setSpec>user-inel</setSpec>
        <setSpec>user-adwhh</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Wagner-Nagy, Beáta</dc:contributor>
          <dc:contributor>Arkhipov, Alexandre</dc:contributor>
          <dc:contributor>Gusev, Valentin</dc:contributor>
          <dc:contributor>Däbritz, Chris Lasse</dc:contributor>
          <dc:contributor>Ferger, Anne</dc:contributor>
          <dc:contributor>Jettka, Daniel</dc:contributor>
          <dc:contributor>Lazarenko, Elena</dc:contributor>
          <dc:contributor>Lehmberg, Timm</dc:contributor>
          <dc:contributor>Riaposov, Aleksandr</dc:contributor>
          <dc:creator>Däbritz, Chris Lasse</dc:creator>
          <dc:creator>Gusev, Valentin</dc:creator>
          <dc:date>2021-12-31</dc:date>
          <dc:description>Corpus Citation

Däbritz, Chris Lasse &amp; Gusev, Valentin. 2021. INEL Evenki Corpus. Version 1.0. Publication date 2021-12-31. Archived at Universität Hamburg. https://hdl.handle.net/11022/0000-0007-F43C-3. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1

Corpus Description

The INEL Evenki Corpus has been created within the long-term INEL project (Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages), 2016–2033.
The corpus makes possible typologically aware corpus-based grammatical research on the Evenki (&lt; Tungusic) language and expands the documentation of the lesser described indigenous languages of Northern Eurasia.
The INEL Evenki Corpus covers Northern (Taimyr, Khantayskoe Ozero, Ilimpi, Erbogachon) and Southern (Sym) Evenki dialects, which have or had contacts with other languages dealt with in the INEL project, that is, first and foremost Dolgan and Selkup. The INEL Evenki Corpus is composed of texts from different sources:


	Published texts from different text collections, inter alia "Sbornik materialov po evenkijskomu (tungusskomu) fol'kloru" (Vasilevich 1936), covering all named dialects.
	Transcripts of recordings obtained from the Taimyr House of National Arts (TDNT) in Dudinka (2000s) as well as transcripts of recordings made by and from Tat’yana V. Bolina, either of them representing the Khantayskoe Ozero dialect.
	Texts from the handwritten archive of the Russian ethnographer and linguist Konstantin M. Rychkov recorded in the 1900s/1910s, covering the Taimyr, Ilimpi and Sym dialects.


Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of Russian borrowings. Some texts also have annotations for syntactic functions, semantic roles and information status.

Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Contributions/Acknowledgements


	The Taimyr House of National Arts (TDNT) provided valuable audio material (see above).
	Tat’yana V. Bolina (TDNT Leading Methodologist for Evenki folklore &amp; culture) recorded some further Evenki material in 2018 and 2019.
	The Institute of Oriental Manuscripts of the Russian Academy of Sciences (IOM RAS; Институт восточных рукописей РАН) in Saint Petersburg provided scanned manuscripts from the Rychkov archive (The Archives of the Orientalists of IOM RAS, Coll. 49, inv. 1, items 4, 5, 6а, 6б, 6в).
	The web-based search interface uses the Tsakonian Corpus platform developed by Dr. Timofey Arkhangelskiy.
</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/9628</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.9628</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:9628</dc:identifier>
          <dc:language>evn</dc:language>
          <dc:relation>handle:11022/0000-0007-F43C-3</dc:relation>
          <dc:relation>doi:10.25592/uhhfdm.9627</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode</dc:rights>
          <dc:subject>endangered language</dc:subject>
          <dc:subject>indigenous language</dc:subject>
          <dc:subject>L1 data</dc:subject>
          <dc:subject>language contact</dc:subject>
          <dc:subject>language documentation</dc:subject>
          <dc:subject>INEL</dc:subject>
          <dc:subject>folklore</dc:subject>
          <dc:subject>narrative</dc:subject>
          <dc:subject>monologue</dc:subject>
          <dc:subject>annotated</dc:subject>
          <dc:subject>morphological glossing</dc:subject>
          <dc:subject>borrowings</dc:subject>
          <dc:subject>code-switching</dc:subject>
          <dc:subject>semantic roles</dc:subject>
          <dc:subject>syntactic functions</dc:subject>
          <dc:subject>information status</dc:subject>
          <dc:subject>English translation</dc:subject>
          <dc:subject>German translation</dc:subject>
          <dc:subject>Russian translation</dc:subject>
          <dc:title>INEL Evenki Corpus</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:17676</identifier>
        <datestamp>2025-07-22T10:59:13Z</datestamp>
        <setSpec>user-inel</setSpec>
        <setSpec>user-adwhh</setSpec>
        <setSpec>user-uhh</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Lazarenko, Elena</dc:contributor>
          <dc:contributor>Riaposov, Aleksandr</dc:contributor>
          <dc:contributor>Arkhipov, Alexandre</dc:contributor>
          <dc:creator>Baranova, Vlada</dc:creator>
          <dc:date>2025-07-17</dc:date>
          <dc:description>Corpus citation

Baranova, Vlada. 2025. INEL Kalmyk Corpus. Version 1.0. Publication date 2025-07-17. https://hdl.handle.net/11022/0000-0007-FFB1-2. Archived at Universität Hamburg. In: The INEL Corpora of Indigenous Northern Eurasian Languages. https://hdl.handle.net/11022/0000-0007-F45A-1.

Corpus Description

The INEL Kalmyk Corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033.

The corpus consists of transcribed audio recordings collected in the Republic of Kalmykia between 2007 and 2018 in the Ketchenerovsky District (Derbet and Torgut dialects).

All texts in the corpus are provided with interlinear morpheme-by-morpheme glosses and translation into English and Russian. All texts for which the audio recordings were accessible are time-aligned with them. 

Corpus Size

The corpus contains 55 texts, 2,076 sentences, and 19,742 tokens. The total duration of the audio recordings is 4 hours and 23 minutes.

Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Contributions / Acknowledgements

Native speakers generously shared their knowledge of Kalmyk, making the creation of this corpus possible. Zamira Xejchieva and Galina Cabdy`rova assisted with oral transcription and the Russian translation of the audio materials.

Part of the materials were recorded during joint expeditions of St. Petersburg University and the Institute for Linguistic Studies of the Russian Academy of Sciences in 2007–2008, under the direction of Elena Perekhvalskaya and Sergey Say.

This corpus primarily follows the transcription system and partially adopts the glossing conventions developed by a research team led by Sergey Say, with input from other expedition participants.

Searching the corpus

The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN.

Online search with the Tsakorpus platform is available at https://inel.corpora.uni-hamburg.de/KalmykCorpus/search.

Remote search with EXMARaLDA is also possible without downloading all the files (see https://inel.corpora.uni-hamburg.de/portal/help/en/index.php).

See the user documentation (section 3) for details on transcription, annotation tiers and annotation tags.
Find further information and links on the Kalmyk Corpus page at the INEL Resources portal: https://inel.corpora.uni-hamburg.de/portal/corpora/kalmyk/.</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/17676</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.17676</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:17676</dc:identifier>
          <dc:language>xal</dc:language>
          <dc:relation>info:eu-repo/semantics/altIdentifier/handle/11022/0000-0007-FFB1-2</dc:relation>
          <dc:relation>doi:10.25592/uhhfdm.17675</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode</dc:rights>
          <dc:subject>endangered language</dc:subject>
          <dc:subject>indigenous language</dc:subject>
          <dc:subject>language contact</dc:subject>
          <dc:subject>language documentation</dc:subject>
          <dc:subject>INEL</dc:subject>
          <dc:subject>folklore</dc:subject>
          <dc:subject>narrative</dc:subject>
          <dc:subject>monologue</dc:subject>
          <dc:subject>morphological glossing</dc:subject>
          <dc:subject>English translation</dc:subject>
          <dc:subject>Russian translation</dc:subject>
          <dc:subject>EXMARaLDA</dc:subject>
          <dc:subject>ELAN</dc:subject>
          <dc:subject>XML</dc:subject>
          <dc:subject>ISO/TEI</dc:subject>
          <dc:subject>Mongolic languages</dc:subject>
          <dc:subject>annotated corpus</dc:subject>
          <dc:title>INEL Kalmyk Corpus</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:9747</identifier>
        <datestamp>2025-09-12T12:09:37Z</datestamp>
        <setSpec>user-inel</setSpec>
        <setSpec>user-adwhh</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Wagner-Nagy, Beáta</dc:contributor>
          <dc:contributor>Arkhipov, Alexandre</dc:contributor>
          <dc:contributor>Däbritz, Chris Lasse</dc:contributor>
          <dc:contributor>Kudryakova, Nina</dc:contributor>
          <dc:contributor>Stapert, Eugénie</dc:contributor>
          <dc:creator>Däbritz, Chris Lasse</dc:creator>
          <dc:creator>Kudryakova, Nina</dc:creator>
          <dc:creator>Stapert, Eugénie</dc:creator>
          <dc:date>2019-08-31</dc:date>
          <dc:description>Corpus Citation

Däbritz, Chris Lasse; Kudryakova, Nina; Stapert, Eugénie. 2019. "INEL Dolgan Corpus." Version 1.0. Publication date 2019-08-31. https://hdl.handle.net/11022/0000-0007-CAE7-1. Archived in Hamburger Zentrum für Sprachkorpora. In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, Daniel; Lehmberg, Timm (eds.). The INEL corpora of indigenous Northern Eurasian languages.

Corpus Description

The INEL Dolgan corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus makes possible typologically aware corpus-based grammatical research on the Dolgan language and expands the documentation of the lesser described indigenous languages of Northern Eurasia.

The INEL Dolgan corpus is composed of texts from different sources: 1. Published folklore texts from an edited volume ("Fol'klor Dolgan", P.E. Efremov 2000), 2. Transcripts of recordings obtained from the Taymyr House of Folk Art (TDNT) in Dudinka (1970s–2000s), 3. Transcripts from the collection of Dr. Eugénie Stapert recorded on several fieldwork trips in 2007–2010, 4. Transcripts of recordings made on a fieldwork trip in 2017. The first group as well as parts of the third group were already transcribed and translated; the remaining recordings were transcribed and translated within the INEL project.

Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of Russian borrowings. Some texts also have annotations for syntactic functions, semantic roles and information structure/information status.

Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/9747</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.9747</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:9747</dc:identifier>
          <dc:relation>handle:11022/0000-0007-CAE7-1</dc:relation>
          <dc:relation>doi:10.25592/uhhfdm.9746</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode</dc:rights>
          <dc:subject>endangered language</dc:subject>
          <dc:subject>indigenous language</dc:subject>
          <dc:subject>L1 data</dc:subject>
          <dc:subject>language contact</dc:subject>
          <dc:subject>language documentation</dc:subject>
          <dc:subject>INEL</dc:subject>
          <dc:subject>folklore</dc:subject>
          <dc:subject>narrative</dc:subject>
          <dc:subject>monologue</dc:subject>
          <dc:subject>annotated</dc:subject>
          <dc:subject>morphological glossing</dc:subject>
          <dc:subject>borrowings</dc:subject>
          <dc:subject>code-switching</dc:subject>
          <dc:subject>semantic roles</dc:subject>
          <dc:subject>syntactic functions</dc:subject>
          <dc:subject>information status</dc:subject>
          <dc:subject>English translation</dc:subject>
          <dc:subject>German translation</dc:subject>
          <dc:subject>Russian translation</dc:subject>
          <dc:title>INEL Dolgan Corpus</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:fdr.uni-hamburg.de:16182</identifier>
        <datestamp>2025-12-22T10:31:51Z</datestamp>
        <setSpec>user-uhh</setSpec>
        <setSpec>user-adwhh</setSpec>
        <setSpec>user-inel</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:contributor>Arkhipov, Alexandre</dc:contributor>
          <dc:contributor>Wagner-Nagy, Beáta</dc:contributor>
          <dc:contributor>Lazarenko, Elena</dc:contributor>
          <dc:contributor>Riaposov, Aleksandr</dc:contributor>
          <dc:contributor>Lehmberg, Timm</dc:contributor>
          <dc:creator>Shluinsky, Andrey</dc:creator>
          <dc:creator>Khanina, Olesya</dc:creator>
          <dc:creator>Wagner-Nagy, Beáta</dc:creator>
          <dc:date>2024-11-30</dc:date>
          <dc:description>Corpus Citation

Shluinsky, Andrey; Khanina, Olesya; Wagner-Nagy, Beáta. 2024. INEL Enets Corpus. Version 1.0. Publication date 2024-11-30. https://hdl.handle.net/11022/0000-0007-FE1D-C. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1

Corpus Description

The INEL Enets corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033.

The corpus includes texts recorded between 1962 and 2017 in both Enets lects – Forest Enets and Tundra Enets. The sources of the corpus (see more details in the user documentation, section 2.2) are:


	Audio recordings done by Olesya Khanina, Maria Ovsjannikova, Andrey Shluinsky, Natalia Stoynova and Sergey Trubetskoy,
	Legacy audio recordings done by Vera Bettu, Nina N. Bolina, Dar`ya S. Bolina, Zoya N. Bolina, Oksana E. Dobzhanskaya, Valentin Gusev, Eugene Helimski†, Kazimir I. Labanauskas†, Larisa Leisiö, Marina Lyublinskaya, Kaur Mägi, Viktor N. Pal`chin, Marina N. Pal`china, Irina P. Sorokina†, Anna Urmanchieva, Beáta Wagner-Nagy and possibly other people,
	Published audio recordings,
	Texts published by Dar`ya S. Bolina, Yaroslav A. Gluxij† and Vasilij A. Susekov†, Eugene Helimski†, Kazimir I. Labanauskas†, Tibor Mikola†, János Pusztay, Irina P. Sorokina†, Anna Urmanchieva,
	Legacy manuscript transcriptions and self-transcriptions done and/or edited by Dar`ya S. Bolina, Galina S. Bolina, Zoya N. Bolina, Valentin Gusev, Eugene Helimski†, Kazimir I. Labanauskas†, Larisa Leisiö, Marina Lyublinskaya, Vasilij F. Ly`rmin†, Anton N. Pal`chin, Viktor N. Pal`chin, Ivan I. Silkin†, Irina P. Sorokina†, Natal`ya M. Tereščenko†, Anna Urmanchieva and possibly other people.


All texts in the corpus are provided with interlinear morpheme-by-morpheme glosses and translations into English and Russian. All texts for which audio recordings were accessible are time-aligned with them. Video recordings are also included in the corpus where available.

Corpus size


	Forest Enets: 541 texts, 41,396 sentences, 173,379 tokens
	Tundra Enets: 137 texts, 12,737 sentences, 45,331 tokens
	Total: 678 texts, 54,133 sentences, 218,710 tokens
	Total duration of audio: 43 hours 26 minutes


Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Preliminary glossing work included in this corpus was supported by the Endangered Languages Documentation Programme (ELDP) and by the Max Planck Institute for Evolutionary Anthropology (MPI-EVA). See more details on financial support in the documentation file below, section 1.6.

Contributions/Acknowledgements

Dozens of people and many institutions contributed to the corpus (see more details in the documentation file below, section 1.6). We are especially grateful to:


	Enets speakers who generously shared their knowledge, especially those who spent many days working with us: Aleksandr S. Bolin†, Leonid D. Bolin†, Viktor N. Bolin, Nadezhda K. Bolina, Nina N. Bolina, Ekaterina S. Glibchenko, Gennadij A. Ivanov†, Irina P. Koshkaryova†, Valentina P. Nader, Lyudmila P. Novosyolova, Svetlana A. Roslyakova†, Ivan I. Silkin†, Nikolaj I. Silkin, Alevtina S. Silkina, Zoya A. Turutina, Tat`yana Ch. Yar,
	In particular, Zoya N. Bolina and Viktor N. Pal`chin, who also collaborated in the ELDP project and extensively transcribed Enets recordings,
	Natalia Stoynova, Sergey Trubetskoy and, above all, Maria Ovsjannikova, who recorded and transcribed Enets texts,
	Institutions and private individuals who shared legacy data: the Institute for Linguistic Studies RAS, the Taymyr House of National Arts, the Dudinka branch of GTRK “Norilsk”; Dar`ya S. Bolina, Oksana E. Dobzhanskaya, Valentin Gusev, Larisa Leisiö, Viktor N. Pal`chin, Irina P. Sorokina†, Anna Urmanchieva,
	Marina Lyublinskaya and Anna Urmanchieva, who kindly permitted us to include in the corpus texts that they had processed,
	Dar`ya S. Bolina, who provided extensive consultation during the compilation of the corpus.


Searching the corpus

The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN.

Online search with the Tsakorpus platform is available at https://inel.corpora.uni-hamburg.de/EnetsCorpus/search.

Remote search with EXMARaLDA is also possible without downloading all the files (see https://inel.corpora.uni-hamburg.de/portal/help/en/index.php#search).

See the user documentation (section 3) for details on transcription, annotation tiers and annotation tags.
Find further information and links on the Enets Corpus page at the INEL Resources portal: https://inel.corpora.uni-hamburg.de/portal/corpora/enets/.</dc:description>
          <dc:identifier>https://www.fdr.uni-hamburg.de/record/16182</dc:identifier>
          <dc:identifier>10.25592/uhhfdm.16182</dc:identifier>
          <dc:identifier>oai:fdr.uni-hamburg.de:16182</dc:identifier>
          <dc:relation>handle:11022/0000-0007-FE1D-C</dc:relation>
          <dc:relation>doi:10.25592/uhhfdm.16181</dc:relation>
          <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
          <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode</dc:rights>
          <dc:subject>Uralic</dc:subject>
          <dc:subject>Samoyedic</dc:subject>
          <dc:subject>Enets</dc:subject>
          <dc:subject>Forest Enets</dc:subject>
          <dc:subject>Tundra Enets</dc:subject>
          <dc:subject>endangered language</dc:subject>
          <dc:subject>language contact</dc:subject>
          <dc:subject>language documentation</dc:subject>
          <dc:subject>legacy data</dc:subject>
          <dc:subject>INEL</dc:subject>
          <dc:subject>AdWHH</dc:subject>
          <dc:subject>text corpus</dc:subject>
          <dc:subject>speech corpus</dc:subject>
          <dc:subject>parallel texts</dc:subject>
          <dc:subject>folklore</dc:subject>
          <dc:subject>tales</dc:subject>
          <dc:subject>narrative</dc:subject>
          <dc:subject>dialogue</dc:subject>
          <dc:subject>song</dc:subject>
          <dc:subject>transcription</dc:subject>
          <dc:subject>time-aligned</dc:subject>
          <dc:subject>audio</dc:subject>
          <dc:subject>video</dc:subject>
          <dc:subject>morphological glossing</dc:subject>
          <dc:subject>part-of-speech</dc:subject>
          <dc:subject>borrowings</dc:subject>
          <dc:subject>code-switching</dc:subject>
          <dc:subject>English translation</dc:subject>
          <dc:subject>Russian translation</dc:subject>
          <dc:subject>EXMARaLDA</dc:subject>
          <dc:subject>ELAN</dc:subject>
          <dc:subject>XML</dc:subject>
          <dc:subject>ISO/TEI</dc:subject>
          <dc:title>INEL Enets Corpus</dc:title>
          <dc:type>info:eu-repo/semantics/other</dc:type>
          <dc:type>dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
