Dataset Open Access

INEL Nenets Corpus

Budzisch, Josefina; Wagner-Nagy, Beáta


Dublin Core Export

<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:contributor>Wagner-Nagy, Beáta</dc:contributor>
  <dc:contributor>Arkhipov, Alexandre</dc:contributor>
  <dc:contributor>Lazarenko, Elena</dc:contributor>
  <dc:contributor>Riaposov, Aleksandr</dc:contributor>
  <dc:contributor>Lehmberg, Timm</dc:contributor>
  <dc:creator>Budzisch, Josefina</dc:creator>
  <dc:creator>Wagner-Nagy, Beáta</dc:creator>
  <dc:date>2024-12-31</dc:date>
  <dc:description>Corpus Citation

Budzisch, Josefina; Wagner-Nagy, Beáta. 2024. INEL Nenets Corpus. Version 1.0. Publication date 2024-12-31. https://hdl.handle.net/11022/0000-0007-FE37-E. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1

Corpus Description

The INEL Nenets corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033.

The corpus includes texts recorded between 1940 and 2011 in both Nenets lects: Forest Nenets and Tundra Nenets. The majority of texts in this corpus originate from published works, which are cited in the relevant sections of the metadata. In particular, the following publications were used; full bibliographic information can be found in the reference section of the documentation:


	Barmich 2018
	Burkova 2008
	Burkova 2012
	Burkova et al. 2003
	Hajdú 1968
	Koshkareva et al. 2007
	Labanauskas 2001
	Logany &amp; Logany 2016
	Lyubinskaya 2022
	Pusztay 1976
	Tereshchenko 1956
	Tereshchenko 1990
	Turutina 2003
	Yangasova 2018


Svetlana Burkova kindly shared a collection of her Forest Nenets data, including an original sound recording (Agan dialect), transcripts and glosses as Toolbox files and Word documents (Agan and Pur dialects), as well as published texts in the Pur (Turutina 2003) and Numto (Logany &amp; Logany 2016) dialects.

All texts in the corpus are provided with interlinear morpheme-by-morpheme glosses and translations into English, German and Russian. An audio recording is also provided for one text.

Corpus size


	Forest Nenets: 80 texts, 3,709 sentences, 23,597 tokens
	Tundra Nenets: 56 texts, 6,545 sentences, 37,681 tokens
	Total: 136 texts, 10,254 sentences, 61,278 tokens
	Total duration of audio: 44 minutes 45 seconds


Funding

The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.

Searching the corpus

The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN.

Online search with the Tsakorpus platform is available at https://inel.corpora.uni-hamburg.de/NenetsCorpus/search.

Remote search with EXMARaLDA is also possible without downloading all the files (see https://inel.corpora.uni-hamburg.de/portal/help/en/index.php).

See the user documentation (section 3) for details on transcription, annotation tiers and annotation tags. Find further information and links on the Nenets Corpus page at the INEL Resources portal: https://inel.corpora.uni-hamburg.de/portal/corpora/nenets/.</dc:description>
  <dc:identifier>https://www.fdr.uni-hamburg.de/record/16518</dc:identifier>
  <dc:identifier>10.25592/uhhfdm.16518</dc:identifier>
  <dc:identifier>oai:fdr.uni-hamburg.de:16518</dc:identifier>
  <dc:language>yrk</dc:language>
  <dc:relation>handle:11022/0000-0007-FE37-E</dc:relation>
  <dc:relation>doi:10.25592/uhhfdm.16517</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode</dc:rights>
  <dc:subject>Uralic</dc:subject>
  <dc:subject>Samoyedic</dc:subject>
  <dc:subject>Nenets</dc:subject>
  <dc:subject>Forest Nenets</dc:subject>
  <dc:subject>Tundra Nenets</dc:subject>
  <dc:subject>endangered language</dc:subject>
  <dc:subject>language contact</dc:subject>
  <dc:subject>language documentation</dc:subject>
  <dc:subject>legacy data</dc:subject>
  <dc:subject>INEL</dc:subject>
  <dc:subject>AdWHH</dc:subject>
  <dc:subject>text corpus</dc:subject>
  <dc:subject>speech corpus</dc:subject>
  <dc:subject>parallel texts</dc:subject>
  <dc:subject>folklore</dc:subject>
  <dc:subject>tales</dc:subject>
  <dc:subject>narrative</dc:subject>
  <dc:subject>elicitation</dc:subject>
  <dc:subject>song</dc:subject>
  <dc:subject>transcription</dc:subject>
  <dc:subject>time-aligned</dc:subject>
  <dc:subject>audio</dc:subject>
  <dc:subject>morphological glossing</dc:subject>
  <dc:subject>part-of-speech</dc:subject>
  <dc:subject>borrowings</dc:subject>
  <dc:subject>code-switching</dc:subject>
  <dc:subject>existential predication</dc:subject>
  <dc:subject>locative predication</dc:subject>
  <dc:subject>possessive predication</dc:subject>
  <dc:subject>English translation</dc:subject>
  <dc:subject>German translation</dc:subject>
  <dc:subject>Russian translation</dc:subject>
  <dc:subject>EXMARaLDA</dc:subject>
  <dc:subject>ELAN</dc:subject>
  <dc:subject>XML</dc:subject>
  <dc:subject>ISO/TEI</dc:subject>
  <dc:title>INEL Nenets Corpus</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>
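
An export like the one above can be read with any namespace-aware XML parser. The following is a minimal sketch in Python using only the standard library. The function name `read_dc_record` and the shortened `SAMPLE` string are illustrative assumptions rather than part of the repository; the sample repeats only a few elements from the record above.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the export above.
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"

# Shortened copy of the record (illustrative; the real export has more fields).
SAMPLE = """<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
  <dc:creator>Budzisch, Josefina</dc:creator>
  <dc:creator>Wagner-Nagy, Beáta</dc:creator>
  <dc:date>2024-12-31</dc:date>
  <dc:identifier>https://www.fdr.uni-hamburg.de/record/16518</dc:identifier>
  <dc:identifier>10.25592/uhhfdm.16518</dc:identifier>
  <dc:language>yrk</dc:language>
  <dc:title>INEL Nenets Corpus</dc:title>
</oai_dc:dc>"""


def read_dc_record(xml_text):
    """Collect Dublin Core elements into a dict of lists (hypothetical helper).

    Repeatable elements such as dc:creator or dc:identifier keep all values
    in document order.
    """
    root = ET.fromstring(xml_text)
    record = {}
    for element in root:
        # ElementTree expands tags to the form "{namespace-uri}localname".
        namespace, _, local_name = element.tag[1:].partition("}")
        if namespace != DC_NAMESPACE:
            continue  # skip anything outside the dc: namespace
        record.setdefault(local_name, []).append((element.text or "").strip())
    return record


record = read_dc_record(SAMPLE)
print(record["title"][0])
print("; ".join(record["creator"]))
print(record["identifier"])
```

Storing every field as a list reflects that Dublin Core elements are repeatable, so the same lookup works for single-valued fields such as `dc:title` and multi-valued ones such as `dc:subject`.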
