2026-05-27T22:57:39Z https://www.fdr.uni-hamburg.de/oai2d

oai:fdr.uni-hamburg.de:9722 2025-09-22T12:44:01Z user-inel user-adwhh

Wagner-Nagy, Beáta Arkhipov, Alexandre Brykina, Maria Orlova, Svetlana Ferger, Anne Jettka, Daniel Lehmberg, Timm Brykina, Maria Orlova, Svetlana Wagner-Nagy, Beáta 2018-12-31 Corpus Citation Brykina, Maria; Orlova, Svetlana; Wagner-Nagy, Beáta. 2018. INEL Selkup Corpus. Version 0.1. Publication date 2018-12-31. Archived in Hamburger Zentrum für Sprachkorpora. https://hdl.handle.net/11022/0000-0007-CAE5-3. In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, Daniel; Lehmberg, Timm (eds.). 2018. The INEL corpora of indigenous Northern Eurasian languages. Corpus Description The INEL Selkup corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus makes possible typologically aware corpus-based grammatical research on the Selkup language and expands the documentation of the lesser described indigenous languages of Northern Eurasia. The INEL Selkup corpus is composed of texts from the archive of Angelina Ivanovna Kuzmina (1924–2002), who gathered a large amount of material on Selkup in almost all regions where the Selkup people lived in 1962–1977. Most texts in the corpus originate from the handwritten part of the archive, the others come from sound recordings made by A.I. Kuzmina, transcribed and translated within the INEL project. Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of Russian borrowings. Some texts also have annotations for syntactic functions, semantic roles and information status. Funding The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. 
The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities. Contributions/Acknowledgements Sound materials of Angelina Kuzmina were transcribed and translated by native speakers of Selkup: Svetlana Nikitichna Sankevich (Kunina), oral transcription and Russian translation of texts in Northern dialects Evgeniya Sergeevna Smorgunova (Irikova), oral and written transcription and Russian translation of audio texts in Northern dialects Valentina Vladimirovna Tamel`kina, oral transcription and Russian translation of audio texts in Northern dialects The web-based search interface is using the Tsakonian Corpus platform developed by Dr. Timofey Arkhangelskiy, Humboldt Research Fellow at IFUU, Hamburg University https://www.fdr.uni-hamburg.de/record/9722 10.25592/uhhfdm.9722 oai:fdr.uni-hamburg.de:9722 sel handle:11022/0000-0007-CAE5-3 doi:10.25592/uhhfdm.9721 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode endangered language indigenous language L1 data language contact language documentation INEL folklore narrative monologue annotated morphological glossing borrowings code-switching semantic roles syntactic functions information status English translation German translation Russian translation INEL Selkup Corpus info:eu-repo/semantics/other dataset

oai:fdr.uni-hamburg.de:10973 2024-04-09T11:42:55Z user-inel user-adwhh

Wagner-Nagy, Beáta Arkhipov, Alexandre Budzisch, Josefina Orlova, Svetlana Lehmberg, Timm Wagner-Nagy, Beáta 2022-11-11 This record comprises the digitized manuscript collected by Angelina Ivanovna Kuzmina (1924–2002) between 1962 and 1977 plus additional structured information. The attached dataset contains metadata on individuals and locations, indexing and keywording with respect to content type and grammatical information. https://www.fdr.uni-hamburg.de/record/10973 10.25592/uhhfdm.10973 oai:fdr.uni-hamburg.de:10973 sel doi:10.25592/uhhfdm.10972 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by/4.0/legalcode endangered language manuscript fieldnotes L1 data language documentation INEL folklore narrative monologue Russian translation lexicon Russian Central Selkup Southern Selkup Northern Selkup Kuzmina Archive - Manuscripts info:eu-repo/semantics/other dataset

oai:fdr.uni-hamburg.de:11165 2025-09-12T12:10:05Z user-inel user-adwhh

Wagner-Nagy, Beáta Arkhipov, Alexandre Däbritz, Chris Lasse Kudryakova, Nina Stapert, Eugénie Ferger, Anne Jettka, Daniel Lazarenko, Elena Lehmberg, Timm Riaposov, Aleksandr Däbritz, Chris Lasse Kudryakova, Nina Stapert, Eugénie 2022-11-30 Corpus Citation Däbritz, Chris Lasse; Kudryakova, Nina; Stapert, Eugénie. 2022. INEL Dolgan Corpus. Version 2.0. Publication date 2022-11-30. https://hdl.handle.net/11022/0000-0007-F9A7-4. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1. Corpus Description The INEL Dolgan corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus makes possible typologically aware corpus-based grammatical research on the Dolgan language and expands the documentation of the lesser described indigenous languages of Northern Eurasia. The INEL Dolgan corpus is composed of texts from different sources: 1. Published folklore texts from an edited volume ("Fol'klor Dolgan", P.E. Efremov 2000), 2. Transcripts of recordings obtained from the Taymyr House of Folk Art (TDNT) in Dudinka (1970s-2000s), 3. Transcripts from the collection of Dr. Eugénie Stapert recorded on several fieldwork trips in 2007-2010, 4. Transcripts of recordings made on a fieldwork trip in 2017. The first group as well as parts of the third group were already transcribed and translated; the rest of the recordings were transcribed and translated within the INEL project. Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of Russian borrowings. Some texts also have annotations for syntactic functions, semantic roles and information structure/information status. 
New in release 2.0 20 glossed transcripts (2864 utterances, 19989 tokens) with 03:33:14 hours of corresponding sound 37 audio files with 10:00:36 hours of sound without glossed transcripts Corrections of grammatical analyses and glossing according to the findings in Däbritz’s (2022) grammar, as well as cross-corpora harmonizations Additional corpus-wide annotation of Mongolic borrowings Additional corpus-wide annotation of existential, locative and possessive predication Corrections in further annotations, translations and metadata Funding The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities. https://www.fdr.uni-hamburg.de/record/11165 10.25592/uhhfdm.11165 oai:fdr.uni-hamburg.de:11165 dlg handle:11022/0000-0007-F9A7-4 doi:10.25592/uhhfdm.9746 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode endangered language indigenous language L1 data language contact language documentation INEL folklore narrative monologue annotated morphological glossing borrowings code-switching semantic roles syntactic functions information status English translation German translation Russian translation existential predication locative predication non-verbal predication INEL Dolgan Corpus info:eu-repo/semantics/other dataset

oai:fdr.uni-hamburg.de:13882 2025-12-17T14:29:58Z user-inel user-adwhh

Wagner-Nagy, Beata Arkhipov, Alexandre Gusev, Valentin Klooster, Tiina Ferger, Anne Jettka, Daniel Lehmberg, Timm Gusev, Valentin Klooster, Tiina Wagner-Nagy, Beáta 2023-12-29 Corpus Citation Gusev, Valentin; Klooster, Tiina; Wagner-Nagy, Beáta. 2023. “INEL Kamas Corpus.” Version 2.0. Publication date 2023-12-31. http://hdl.handle.net/11022/0000-0007-FC25-4. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1. Corpus Description The INEL Kamas corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus makes possible typologically aware corpus-based grammatical research on the Kamas language and expands the documentation of the lesser described indigenous languages of Northern Eurasia. The INEL Kamas corpus consists of two parts: folklore texts collected by Kai Donner in 1912–1914, and transcribed audio recordings of the last speaker of Kamas, Klavdiya Plotnikova, made between 1964 and 1970. Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of syntactic functions, semantic roles, Russian borrowings and code-switching. Some texts also have annotations for information status. New in release 2.0 In texts from Donner’s collection, phonetic transcription according to Klumpp's edition of Donner’s manuscripts has been added (as stl tier) Five texts which were originally split between different tapes have been merged, as well as respective parts of recordings. 
Sentences in each resulting text are numbered throughout PKZ_196X_Alenushka_flk + PKZ_196X_Alenushka_continuation_flk > PKZ_196X_Alenushka_flk End of PKZ_196X_SU0226 starting from PKZ_196X_SU0226.203 (210) + PKZ_196X_Alenushka2_continuation_flk > PKZ_196X_Alenushka2_flk PKZ_196X_BlacksmithAndMerchant_flk + PKZ_196X_BlacksmithAndMerchant_cont_flk > PKZ_196X_BlacksmithAndMerchant_flk PKZ_196X_Finist_flk + PKZ_196X_Finist_continuation_flk > PKZ_196X_Finist_flk PKZ_196X_StupidWolf_flk + PKZ_196X_StupidWolf_continuation_flk > PKZ_196X_StupidWolf_flk Part of the texts are now annotated for existential, locative and possessive predication (ExLocPoss tier, by C.L. Däbritz) Numerous corrections in glosses, other annotations and transcriptions, including: Fuller and more consistent transcription, glossing and annotations of borrowings Vowel length is marked in mp tier in baːzoʔ ‘again’, büːzʼe ‘man’ and saːgər ‘black’ Corrections in disambiguation of polysemous or homonymous morphemes: -ziʔ "INS"/"COM", -də "LAT"/"3SG", mo- "can/become/want | мочь/стать/хотеть" Possessive suffix unmarked for case: "NOM/GEN/ACC" > "POSS" Glosses for personal pronouns were changed to uniform labels: "I | я" > "PRO1SG", "we | мы" > "PRO1PL", "you | ты" > "PRO2SG", "you.PL | вы" > "PRO2PL" Fuller annotations of code-switching and calques (CS tier) Added ELAN *.eaf as a supplementary end-user file format for all transcripts Funding The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities. 
Contributions/Acknowledgements Recordings of Kamas speech made by Ago Künnap in Abalakovo and by Tiit-Rein Viitso in Tartu provided by the Archive of Estonian Dialects and Kindred Languages of the University of Tartu, Estonia (AEDKL, or TÜEMSA). Recordings of Klavdiya Plotnikova made by Jaakko Yli-Paavola in Tallinn in 1970 provided by the Institute for the Languages of Finland archive, Helsinki (KOTUS). Scanned pages from the Kai Donners Kamassisches Wörterbuch (Joki 1944) containing texts collected by Kai Donner published online courtesy of the Finno-Ugrian Society. The web-based search interface is using the Tsakonian Corpus platform developed by Dr. Timofey Arkhangelskiy. https://www.fdr.uni-hamburg.de/record/13882 10.25592/uhhfdm.13882 oai:fdr.uni-hamburg.de:13882 xas handle:11022/0000-0007-FC25-4 doi:10.25592/uhhfdm.9740 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode endangered language indigenous language L1 data language contact language documentation INEL folklore narrative monologue annotated morphological glossing borrowings code-switching semantic roles syntactic functions information status English translation German translation Russian translation INEL Kamas Corpus info:eu-repo/semantics/other dataset

oai:fdr.uni-hamburg.de:16605 2024-12-30T16:11:21Z user-adwhh user-inel

Wagner-Nagy, Beáta Arkhipov, Alexandre Ferger, Anne Jettka, Daniel Lazarenko, Elena Lehmberg, Timm Riaposov, Aleksandr Däbritz, Chris Lasse Gusev, Valentin Stoynova, Natalia 2024-12-31 Corpus Citation Däbritz, Chris Lasse; Gusev, Valentin; Stoynova, Natalia. 2024. INEL Evenki Corpus. Version 2.0. Publication date 2024-12-31. Archived at Universität Hamburg. https://hdl.handle.net/11022/0000-0007-FE38-D. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1 Corpus Description The INEL Evenki Corpus has been created within the long-term INEL project (Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages), 2016–2033. The corpus makes possible typologically aware corpus-based grammatical research on the Evenki (< Tungusic) language and expands the documentation of the lesser described indigenous languages of Northern Eurasia. The INEL Evenki Corpus covers Northern (Taimyr, Khantayskoe Ozero, Ilimpi, Yerbogachyon) and Southern (Sym, Barhahan, and to a smaller extent Stony Tunguska and Nepa) Evenki dialects. These are exactly the dialects which are or were in contact with other languages included in the INEL project, that is first and foremost Dolgan and Selkup. The INEL Evenki Corpus contains texts from different sources: Published texts from several text collections: Vasilevich (1936): the Ilimpi, Yerbogachyon, Sym, Nepa dialects; Anisimov (1936): the Stony Tunguska dialect; Brodskaya (1967): the Khantayskoe Ozero dialect. Transcripts of recordings obtained from the Taimyr House of National Arts (TDNT) in Dudinka (2000s) as well as transcripts of recordings made by and from Tat`yana V. Bolina, all of them representing the Khantayskoe Ozero dialect. For these texts, corresponding time-aligned audio files are available. Texts from the handwritten archive of the Russian ethnographer and linguist Konstantin M. 
Rychkov recorded in the 1900s/1910s, covering the Taimyr, Ilimpi, Sym, and Barhahan dialects. Each text in the corpus is provided with morphological glossing, translation into English, Russian, and German, as well as annotation of Russian borrowings. Some texts also have annotations for syntactic functions, semantic roles, information status, as well as for existential, locative, and possessive predication. Corpus size Northern dialects (Ilimpi, Yerbogachyon, Khantayskoye Ozero, Taimyr): 176 texts, 7,091 sentences, 34,931 tokens Southern “sh” dialects (Sym, Barhahan): 425 texts, 12,395 sentences, 55,674 tokens Southern “s” dialects (Stony Tunguska, Nepa): 11 texts, 445 sentences, 2,659 tokens Total: 612 texts, 19,931 sentences, 93,264 tokens Total duration of audio: 3 hours 58 minutes (69 texts) New in release 2.0 The total size of the corpus has increased about twice (from 47,708 to 93,264 tokens): new texts in the Sym dialect from the Rychkov archive have been added (15,495 tokens), the entire Sym collection from the archive is now included in the corpus a text collection in the Barhahan dialect from the Rychkov archive has been included in the corpus (30,061 tokens) Some errors in glossing have been fixed Glossing has been unified at some points (e.g. the analysis of finite past tense forms as finite verbs vs. participles: all such forms are now glossed as finite verbs) Many glossing labels have been changed; in particular, most ambiguous grammatical glosses have been disambiguated by numbers and/or by semantic specifications: e.g. DIM for four affixes ⇒ DIM1, DIM2, DIM3, DIM4; NMLZ ⇒ NMLZ.TMP, NMLZ.PT, etc. The structure of metadata has been slightly modified (e.g. 
fields for the source type and availability of audio files have been added) Funding The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities. Contributions/Acknowledgements The Taimyr House of National Arts (TDNT) provided valuable audio material (see above). Tat`yana V. Bolina (TDNT Leading Methodologist for Evenki folklore and culture) recorded further Evenki material in 2018 and 2019. The Institute of Oriental Manuscripts of the Russian Academy of Sciences (IOM RAS / IVR; Институт восточных рукописей РАН) in Saint Petersburg provided scanned manuscripts from the Rychkov archive (The Archives of the Orientalists of IOM RAS, Coll. 49, inv. 1, items 4, 5, 6а, 6б, 6в). Searching the corpus The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN. Online search with Tsakorpus platform is available at https://inel.corpora.uni-hamburg.de/EvenkiCorpus/search. Remote search with EXMARaLDA is also possible without downloading all the files (see https://inel.corpora.uni-hamburg.de/portal/help/en/index.php#search). See the user documentation (section 3) for details on transcription, annotation tiers and annotation tags. Find further information and links on the Evenki Corpus page at the INEL Resources portal: https://inel.corpora.uni-hamburg.de/portal/corpora/evenki/. 
https://www.fdr.uni-hamburg.de/record/16605 10.25592/uhhfdm.16605 oai:fdr.uni-hamburg.de:16605 evn handle:11022/0000-0007-F43C-3 doi:10.25592/uhhfdm.9627 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode Tungusic Evenki endangered language language contact language documentation legacy data INEL AdWHH text corpus speech corpus parallel texts folklore tales narrative conversation song transcription time-aligned audio morphological glossing part-of-speech borrowings code-switching semantic roles syntactic functions information status existential predication locative predication possessive predication English translation German translation Russian translation EXMARaLDA ELAN XML ISO/TEI INEL Evenki Corpus info:eu-repo/semantics/other dataset
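
The corpus size figures in the INEL Evenki Corpus record above can be cross-checked with a few lines of Python. This is only a minimal sketch: the numbers are copied from the record's "Corpus size" and "New in release 2.0" paragraphs, while the variable and function names (`groups`, `column_totals`, and so on) are hypothetical and not part of any INEL tool or file format.

```python
# Figures copied from the INEL Evenki Corpus record (version 2.0).
# Each tuple is (texts, sentences, tokens). All names here are illustrative only.
groups = {
    "Northern (Ilimpi, Yerbogachyon, Khantayskoye Ozero, Taimyr)": (176, 7091, 34931),
    "Southern 'sh' (Sym, Barhahan)": (425, 12395, 55674),
    "Southern 's' (Stony Tunguska, Nepa)": (11, 445, 2659),
}


def column_totals(rows):
    """Sum texts, sentences and tokens column by column."""
    return tuple(sum(column) for column in zip(*rows))


totals = column_totals(groups.values())

# Totals as stated in the record.
stated_totals = (612, 19931, 93264)

# Release 2.0 additions from the Rychkov archive, in tokens (Sym + Barhahan).
added_tokens = 15495 + 30061
previous_tokens = 47708

# Ratio of the new token count to the previous one ("increased about twice").
growth = stated_totals[2] / previous_tokens

print("computed totals:", totals)
print("matches stated totals:", totals == stated_totals)
print("previous + added tokens:", previous_tokens + added_tokens)
print("growth factor:", round(growth, 2))
```

The per-dialect rows add up to the stated totals, and the previous token count plus the two Rychkov additions equals the new total, so the figures in the record are internally consistent.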

oai:fdr.uni-hamburg.de:17513 2025-05-28T09:59:42Z user-uhh user-inel user-adwhh

Wagner-Nagy, Beáta Arkhipov, Alexandre Brykina, Maria Lazarenko, Elena Riaposov, Aleksandr Wagner-Nagy, Beáta Sipőcz, Katalin 2025-05-15 Corpus Citation Sipőcz, Katalin & Wagner-Nagy, Beáta. 2025. INEL Tavda Mansi Corpus. Version 1.0. Publication date 2025-05-15. https://hdl.handle.net/11022/0000-0007-FE69-6. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1 Corpus Description The present corpus of Tavda Mansi has been created as part of the long-term research project INEL (“Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages”) in the context of the Academies’ Programme, coordinated by the Union of the German Academies of Sciences and Humanities. The INEL Tavda Mansi corpus at hand fills a gap in the documentation of the indigenous languages of Northern Eurasia and makes possible further descriptions of the language. Mansi is a relatively well described language: there are numerous descriptions and a corpus is also available; however, the Tavda variety is not included in the existing corpora. The analysis of materials from the Tavda variety has already been conducted by Norbert Szilágyi, but he did not produce a corpus that could be searched and evaluated electronically. However, he has made his materials available under the URL: https://norbertszilagyi91.wixsite.com/tawdamansi. In the material published in the INEL corpus, the analyses differ significantly from Szilágyi's analysis. For the sake of comparison, the texts analysed by Szilágyi are appended to the corpus, and the Hungarian translations he provided have been retained, but some places have been corrected. The INEL Tavda Mansi Corpus contains texts from different sources: Kannisto, Artturi and Matti Liimola 1951: Wogulische Volksdichtung gesammelt und übersetzt von Artturi Kannisto, bearbeitet und herausgegeben von Matti Liimola Volume I. 
Texte mythischen Inhalts. [Mémoires de la Société Finno-Ougrienne 101]. Helsinki: Suomalais-Ugrilainen Seura. Kannisto, Artturi and Matti Liimola 1955: Wogulische Volksdichtung gesammelt und übersetzt von Artturi Kannisto, bearbeitet und herausgegeben von Matti Liimola Volume II. Kriegs und Heldensagen. [Mémoires de la Société Finno-Ougrienne 109]. Helsinki: Suomalais-Ugrilainen Seura. Kannisto, Artturi and Matti Liimola 1956: Wogulische Volksdichtung gesammelt und übersetzt von Artturi Kannisto, bearbeitet und herausgegeben von Matti Liimola Volume III. Märchen. [Mémoires de la Société Finno-Ougrienne 111]. Helsinki: Suomalais-Ugrilainen Seura. Kannisto, Artturi and Matti Liimola 1958: Wogulische Volksdichtung gesammelt und übersetzt von Artturi Kannisto, bearbeitet und herausgegeben von Matti Liimola Volume IV. Bärenlieder. [Mémoires de la Société Finno-Ougrienne 114]. Helsinki: Suomalais-Ugrilainen Seura. Kannisto, Artturi and Matti Liimola 1963: Wogulische Volksdichtung gesammelt und übersetzt von Artturi Kannisto, bearbeitet und herausgegeben von Matti Liimola Volume VI. Schicksalslieder, Klagelieder, Kinderreime, Rätsel, Verschiedenes. [Mémoires de la Société Finno-Ougrienne 134]. Helsinki: Suomalais-Ugrilainen Seura. Munkácsi, Bernát 1896: Vogul népköltési gyűjtemény IV. Életképek. Budapest: Magyar Tudományos Akadémia. Corpus size The corpus currently contains 29 transcripts with 2,042 utterances and 11,879 tokens. Funding The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities. 
Searching the corpus The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN. Online search with Tsakorpus platform is available at https://inel.corpora.uni-hamburg.de/TavdaMansiCorpus/search. Remote search with EXMARaLDA is also possible without downloading all the files (see https://inel.corpora.uni-hamburg.de/portal/help/en/index.php). See the user documentation (section 3) for details on transcription, annotation tiers and annotation tags. Find further information and links on the Mansi Corpus page at the INEL Resources portal: https://inel.corpora.uni-hamburg.de/portal/corpora/mansi/. https://www.fdr.uni-hamburg.de/record/17513 10.25592/uhhfdm.17513 oai:fdr.uni-hamburg.de:17513 mns info:eu-repo/semantics/altIdentifier/handle/11022/0000-0007-FE69-6 doi:10.25592/uhhfdm.17512 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode Uralic Mansi Tavda Mansi endangered language language contact language documentation legacy data INEL AdWHH text corpus parallel texts folklore tales narrative song transcription morphological glossing part-of-speech borrowings dialogue English translation Russian translation EXMARaLDA ELAN XML ISO/TEI German translation Hungarian translation existential predication locative predication possessive predication Ob-Ugric languages semantic role syntactic function INEL Tavda Mansi Corpus info:eu-repo/semantics/other dataset

oai:fdr.uni-hamburg.de:9741 2025-09-22T13:14:12Z user-inel user-adwhh

Wagner-Nagy, Beata Arkhipov, Alexandre Gusev, Valentin Klooster, Tiina Ferger, Anne Jettka, Daniel Lehmberg, Timm Gusev, Valentin Klooster, Tiina 2018-12-31 Corpus Citation Gusev, Valentin; Klooster, Tiina. 2018. “INEL Kamas Corpus.” Version 0.1. Publication date 2018-12-31. https://hdl.handle.net/11022/0000-0007-CAE6-2. Archived in Hamburger Zentrum für Sprachkorpora. In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, Daniel; Lehmberg, Timm (eds.). 2018. The INEL corpora of indigenous Northern Eurasian languages. Corpus Description The INEL Kamas corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus makes possible typologically aware corpus-based grammatical research on the Kamas language and expands the documentation of the lesser described indigenous languages of Northern Eurasia. The INEL Kamas corpus consists of two parts: folklore texts collected by Kai Donner in 1912–1914, and transcribed audio recordings of the last speaker of Kamas, Klavdiya Plotnikova, made between 1964 and 1970. Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of Russian borrowings. Some texts also have annotations for syntactic functions, semantic roles and information status. Funding The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities. 
Contributions/Acknowledgements Recordings of Kamas speech made by Ago Künnap in Abalakovo and by Tiit-Rein Viitso in Tartu, as well as the digitized fragment of the surviving copy of Kai Donner’s phonograph recording provided by the Archive of Estonian Dialects and Kindred Languages of the University of Tartu, Estonia (AEDKL, or TÜEMSA). Recordings of Klavdiya Plotnikova made by Jaakko Yli-Paavola in Tallinn in 1970 provided by KOTUS Archive, Helsinki. Scanned pages from [Joki 1944] containing texts collected by Kai Donner published online courtesy of the Finno-Ugrian Society. https://www.fdr.uni-hamburg.de/record/9741 10.25592/uhhfdm.9741 oai:fdr.uni-hamburg.de:9741 xas handle:11022/0000-0007-CAE6-2 doi:10.25592/uhhfdm.9740 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode endangered language indigenous language L1 data language contact language documentation INEL folklore narrative monologue annotated morphological glossing borrowings code-switching semantic roles syntactic functions information status English translation German translation Russian translation INEL Kamas Corpus info:eu-repo/semantics/other dataset

oai:fdr.uni-hamburg.de:9752 2023-12-29T16:54:14Z user-inel

Wagner-Nagy, Beata Arkhipov, Alexandre Gusev, Valentin Klooster, Tiina Ferger, Anne Jettka, Daniel Lehmberg, Timm Gusev, Valentin Klooster, Tiina Wagner-Nagy, Beáta 2019-12-15 Corpus Citation Gusev, Valentin; Klooster, Tiina; Wagner-Nagy, Beáta. 2019. "INEL Kamas Corpus." Version 1.0. Publication date 2019-12-15. http://hdl.handle.net/11022/0000-0007-DA6E-9. Archived in Hamburger Zentrum für Sprachkorpora. In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, Daniel; Lehmberg, Timm (eds.). The INEL corpora of indigenous Northern Eurasian languages. Corpus Description The INEL Kamas corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus makes possible typologically aware corpus-based grammatical research on the Kamas language and expands the documentation of the lesser described indigenous languages of Northern Eurasia. The INEL Kamas corpus consists of two parts: folklore texts collected by Kai Donner in 1912–1914, and transcribed audio recordings of the last speaker of Kamas, Klavdiya Plotnikova, made between 1964 and 1970. Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of syntactic functions, semantic roles, Russian borrowings and code-switching. Some texts also have annotations for information status. New in release 1.0 The totality of Klavdiya Plotnikova’s transcripts are now published, including all the tapes from the KOTUS archive, as well as the two recordings of Aleksandra Semyonova (21 more texts in total). All the texts are now annotated for syntactic functions and semantic roles. Numerous corrections in glosses and other annotations. 
Funding The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities. Contributions/Acknowledgements Recordings of Kamas speech made by Ago Künnap in Abalakovo and by Tiit-Rein Viitso in Tartu provided by the Archive of Estonian Dialects and Kindred Languages of the University of Tartu, Estonia (AEDKL, or TÜEMSA). Recordings of Klavdiya Plotnikova made by Jaakko Yli-Paavola in Tallinn in 1970 provided by the Institute for the Languages of Finland archive, Helsinki (KOTUS). Scanned pages from the Kai Donners Kamassisches Wörterbuch (Joki 1944) containing texts collected by Kai Donner published online courtesy of the Finno-Ugrian Society. The web-based search interface is using the Tsakonian Corpus platform developed by Dr. Timofey Arkhangelskiy. Partner Organizations The INEL project benefited greatly from cooperation with our partner institutions: Institute of the World Culture, M.V. 
Lomonosov Moscow State University, Moscow Department of Languages of the Peoples of Siberia, Tomsk State Pedagogical University, Tomsk Institute of Philology, Siberian Branch of Russian Academy of Sciences, Novosibirsk Taymyr House of Folk Art, Dudinka Arctic State Institute of Culture and Arts, Yakutsk https://www.fdr.uni-hamburg.de/record/9752 10.25592/uhhfdm.9752 oai:fdr.uni-hamburg.de:9752 xas handle:11022/0000-0007-DA6E-9 doi:10.25592/uhhfdm.9740 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode endangered language indigenous language L1 data language contact language documentation INEL folklore narrative monologue annotated morphological glossing borrowings code-switching semantic roles syntactic functions information status English translation German translation Russian translation INEL Kamas Corpus info:eu-repo/semantics/other dataset

oai:fdr.uni-hamburg.de:16518 2024-12-19T10:48:03Z user-uhh user-adwhh user-inel

Wagner-Nagy, Beáta Arkhipov, Alexandre Lazarenko, Elena Riaposov, Aleksandr Lehmberg, Timm Budzisch, Josefina Wagner-Nagy, Beáta 2024-12-31 Corpus Citation Budzisch, Josefina; Wagner-Nagy, Beáta. 2024. INEL Nenets Corpus. Version 1.0. Publication date 2024-12-31. https://hdl.handle.net/11022/0000-0007-FE37-E. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1 Corpus Description The INEL Nenets corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus includes texts recorded between 1940 and 2011 in both Nenets lects – Forest Nenets and Tundra Nenets. The majority of texts in this corpus originate from published works, which are appropriately cited in the relevant sections of the metadata. In particular, the following publications were used; the full information can be found in the reference section of the documentation: Barmich 2018 Burkova 2008 Burkova 2012 Burkova et al. 2003 Hajdú 1968 Koshkareva et al. 2007 Labanauskas 2001 Logany & Logany 2016 Lyubinskaya 2022 Pusztay 1976 Tereshchenko 1956 Tereshchenko 1990 Turutina 2003 Yangasova 2018 Svetlana Burkova kindly shared a collection of her Forest Nenets data including an original sound recording (Agan dialect), transcripts and glosses as Toolbox files and Word documents (Agan and Pur dialects), as well as published texts in Pur (Turutina 2003) and Numto (Logany & Logany 2016) dialects. All texts in the corpus are provided with interlinear morpheme-by-morpheme glosses and translation into English, German and Russian. Audio recording is also provided for one text. 
Corpus size Forest Nenets: 80 texts, 3,709 sentences, 23,597 tokens Tundra Nenets: 56 texts, 6,545 sentences, 37,681 tokens Total: 136 texts, 10,254 sentences, 61,278 tokens Total duration of audio: 44 minutes 45 seconds Funding The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities. Searching the corpus The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN. Online search with Tsakorpus platform is available at https://inel.corpora.uni-hamburg.de/NenetsCorpus/search. Remote search with EXMARaLDA is also possible without downloading all the files (see https://inel.corpora.uni-hamburg.de/portal/help/en/index.php). See the user documentation (section 3) for details on transcription, annotation tiers and annotation tags. Find further information and links on the Nenets Corpus page at the INEL Resources portal: https://inel.corpora.uni-hamburg.de/portal/corpora/nenets/. 
https://www.fdr.uni-hamburg.de/record/16518 10.25592/uhhfdm.16518 oai:fdr.uni-hamburg.de:16518 yrk handle:11022/0000-0007-FE37-E doi:10.25592/uhhfdm.16517 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode Uralic Samoyedic Nenets Forest Nenets Tundra Nenets endangered language language contact language documentation legacy data INEL AdWHH text corpus speech corpus parallel texts folklore tales narrative elicitation song transcription time-aligned audio morphological glossing part-of-speech borrowings code-switching existential predication locative predication possessive predication English translation German translation Russian translation EXMARaLDA ELAN XML ISO/TEI INEL Nenets Corpus info:eu-repo/semantics/other dataset
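The "Corpus size" figures in the Nenets record above can be cross-checked by summing the two sub-corpora and comparing the result with the stated total. The following is a minimal sketch in Python; the names `SIZES`, `STATED_TOTAL` and `sum_parts` are illustrative and not part of the record, and the numbers are copied by hand from the description.

```python
# Figures copied from the "Corpus size" paragraph of the INEL Nenets record.
SIZES = {
    "Forest Nenets": {"texts": 80, "sentences": 3709, "tokens": 23597},
    "Tundra Nenets": {"texts": 56, "sentences": 6545, "tokens": 37681},
}
STATED_TOTAL = {"texts": 136, "sentences": 10254, "tokens": 61278}


def sum_parts(parts):
    """Add up each counter (texts, sentences, tokens) across all sub-corpora."""
    totals = {}
    for counts in parts.values():
        for key, value in counts.items():
            totals[key] = totals.get(key, 0) + value
    return totals


computed = sum_parts(SIZES)

# Collect every counter whose computed sum differs from the stated total.
mismatches = {
    key: (computed[key], STATED_TOTAL[key])
    for key in STATED_TOTAL
    if computed[key] != STATED_TOTAL[key]
}

print("computed totals:", computed)
print("mismatches:", mismatches)
```

An empty `mismatches` dictionary means the per-lect figures and the stated total agree. The same check can be applied to any other record that lists per-lect figures alongside a total.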

oai:fdr.uni-hamburg.de:17419 2025-05-12T13:47:32Z user-uhh user-adwhh user-inel

Lazarenko, Elena Riaposov, Aleksandr Lehmberg, Timm Wagner-Nagy, Beáta Arkhipov, Alexandre Brykina, Maria Gusev, Valentin Szeverényi, Sándor Wagner-Nagy, Beáta 2025-05-02 Corpus Citation Brykina, Maria; Gusev, Valentin; Szeverényi, Sándor; Wagner-Nagy, Beáta. INEL Nganasan Corpus. Version 1.0. Publication date 2025-05-02. https://hdl.handle.net/11022/0000-0007-FE63-C. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1 Corpus Description The INEL Nganasan corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus is largely based on the Nganasan Spoken Language Corpus, which has been adapted to the INEL standards and supplemented with new texts. The corpus makes possible typologically oriented corpus-based research on Nganasan and expands the documentation of the lesser described indigenous languages of Northern Eurasia. The INEL Nganasan corpus consists of two parts. The glossed (searchable) part of the corpus includes texts provided with source media files (whenever available) and annotated transcripts. The archival part of the corpus contains non-glossed texts, represented either by audio recordings (optionally – with preliminary transcriptions) or scanned pages of the manuscripts or publications. The corpus includes texts recorded between 1933–2019 in Nganasan. The sources of the corpus are: Audio recordings done by Maria Brykina, Valentin Gusev, Sándor Szeverényi and Beáta Wagner-Nagy. Legacy audio recordings done by A. Aksyonova, Svetlana S. Aksyonova, Josefina Budzisch, Michael Daniel, Oksana E. Dobzhanskaya, Eugene Helimski, Nadezhda T. Kosterkina, Jean-Luc Lambert, Marina D. Lyublinskaya, N. A. Popov, Florian Sobanski, Eugénie Stapert, Larisa Y. Turdagina, Zsuzsa Várnai, Peter Voliak, Tatjana Zhdanova and possibly other people. 
Legacy manuscript transcriptions done by Ekaterina P. Boldt, Eugene Helimski, Nadezhda T. Kosterkina, I. E. Machkinis, E. P. Nojfeld, A. K. Stolyarova, Natalia M. Tereshchenko and Tatjana Zhdanova. Texts published by Ekaterina P. Boldt, I. E. Machkinis, Tibor Mikola, Georgij N. Prokofiev and A. K. Stolyarova. Corpus size The glossed (searchable) part of the corpus contains 236 texts, 34,872 sentences and 221,747 tokens. The total duration of the audio recordings is 49 hours 53 minutes. The archival part of the corpus contains 98 hours of audio material (210 texts) and 30 manuscripts. Funding The INEL Nganasan corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities. The Nganasan Spoken Language Corpus, which was integrated into the INEL Nganasan corpus, was created as part of the project Corpus based grammatical studies on Nganasan at the Institute of Finno-Ugric/Uralic Studies of Universität Hamburg. The project was supported by the Deutsche Forschungsgemeinschaft under grant number WA3153/2-1 between 2014 and 2017. Contributions/Acknowledgements Many native speakers shared their knowledge of Nganasan and thus made the existence of this corpus possible (see the documentation file below, Appendix A1). We are especially grateful to those who spent days and sometimes months working with us: Svetlana S. Aksyonova, Zinaida S. Chebodaeva, Nikolai S. Chunanchar, Nina D. Chunanchar, Yuliya M. Goricheva, Ekaterina Ch. Kokore, Ekaterina S. Kosterkina, Nadezhda T. Kosterkina, Svetlana M. Kudryakova, Serafima M. Kupchik, Tat`yana T. Kuzenko, Aleksandr Ch. Momde, Dar`ya Ch. Momde, Vera L. Momde, Vasilij F. Porbin, Evdokiya D. Porbina, Mariya M. Porbina, Zoya Ch. 
Porbina, Galina F. Porotova, Ekaterina N. Sovalova, Lodun N. Turdagina, Nadezhda K. Turdagina, Tat`yana D. Turkina, Mariya D. Yarotskaya, Sy`ku M. Yarotskaya. The Department of Siberian Indigenous Languages of Tomsk State Pedagogical University and the Institute for Linguistic Studies RAS kindly provided access to their archives. The Dudinka branch of GTRK “Norilsk” generously provided access to the Nganasan part of its extensive audio archive. The Taimyr House of National Arts and the City Centre of National Arts in Dudinka helped and supported us during our field trips. Searching the corpus The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN. Online search with Tsakorpus platform is available at https://inel.corpora.uni-hamburg.de/NganasanCorpus/search. Remote search with EXMARaLDA is also possible without downloading all the files (see https://inel.corpora.uni-hamburg.de/portal/help/en/index.php). See the user documentation (section 3) for details on transcription, annotation tiers and annotation tags. Find further information and links on the Nganasan Corpus page at the INEL Resources portal: https://inel.corpora.uni-hamburg.de/portal/corpora/nganasan/. 
https://www.fdr.uni-hamburg.de/record/17419 10.25592/uhhfdm.17419 oai:fdr.uni-hamburg.de:17419 nio info:eu-repo/semantics/altIdentifier/handle/11022/0000-0007-FE63-C doi:10.25592/uhhfdm.17418 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode Uralic Samoyedic Nganasan endangered language language contact language documentation legacy data INEL AdWHH text corpus speech corpus parallel texts folklore tales narrative song transcription time-aligned audio morphological glossing part-of-speech borrowings code-switching existential predication locative predication possessive predication English translation Russian translation EXMARaLDA ELAN XML ISO/TEI INEL Nganasan Corpus info:eu-repo/semantics/other dataset
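The records in this harvest write the persistent handle in two ways: as `handle:11022/...` (for example in the Nenets record) and as `info:eu-repo/semantics/altIdentifier/handle/11022/...` (in the Nganasan record above). A minimal sketch of how both forms might be normalised to the `https://hdl.handle.net/` URLs used in the corpus citations is given below; the function name `handle_to_url` and the prefix list are assumptions made for illustration, not part of any INEL tooling.

```python
# The two handle notations observed in the identifier lines of this harvest.
HANDLE_PREFIXES = (
    "info:eu-repo/semantics/altIdentifier/handle/",
    "handle:",
)


def handle_to_url(identifier):
    """Return the hdl.handle.net URL for a handle identifier.

    Returns None when the identifier is not in one of the known
    handle notations (for example a DOI or an OAI identifier).
    """
    for prefix in HANDLE_PREFIXES:
        if identifier.startswith(prefix):
            return "https://hdl.handle.net/" + identifier[len(prefix):]
    return None


for raw in (
    "handle:11022/0000-0007-FE37-E",
    "info:eu-repo/semantics/altIdentifier/handle/11022/0000-0007-FE63-C",
    "doi:10.25592/uhhfdm.17418",
):
    print(raw, "->", handle_to_url(raw))
```

Returning `None` for non-handle identifiers keeps the helper safe to run over every token of an identifier line without filtering first.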

oai:fdr.uni-hamburg.de:731 2025-09-22T11:04:26Z user-inel user-adwhh

Daniel Jettka Beáta Wagner-Nagy Alexandre Arkhipov 2019-12-19 The bibliography comprises 2056 entries, including references to all relevant linguistic and ethnological publications on the Selkup and Kamas languages, as well as numerous references on Dolgan, Evenki, Nenets, Nganasan, Tatar and Enets. It is continuously supplemented and revised by the members of the INEL project. A searchable web-based version is available online. https://www.fdr.uni-hamburg.de/record/731 10.25592/uhhfdm.731 oai:fdr.uni-hamburg.de:731 doi:10.25592/uhhfdm.730 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by/4.0/legalcode bibliography linguistics ethnology selkup kamas dolgan evenki nenets nganasan tatar enets INEL Bibliographie info:eu-repo/semantics/other dataset

oai:fdr.uni-hamburg.de:9753 2021-12-22T08:22:36Z user-inel

Wagner-Nagy, Beáta Arkhipov, Alexandre Brykina, Maria Orlova, Svetlana Ferger, Anne Jettka, Daniel Lehmberg, Timm Brykina, Maria Orlova, Svetlana Wagner-Nagy, Beáta 2020-06-30 Corpus Citation Brykina, Maria; Orlova, Svetlana; Wagner-Nagy, Beáta. 2020. INEL Selkup Corpus. Version 1.0. Publication date 2020-06-30. Archived in Hamburger Zentrum für Sprachkorpora. http://hdl.handle.net/11022/0000-0007-E1D5-A. In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, Daniel; Lehmberg, Timm (eds.). The INEL corpora of indigenous Northern Eurasian languages. Corpus Description The INEL Selkup corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus enables typologically aware corpus-based grammatical research on the Selkup language and expands the documentation of the lesser described indigenous languages of Northern Eurasia. The INEL Selkup corpus is composed of texts from the archive of Angelina Ivanovna Kuzmina (1924–2002), who gathered a large amount of material on Selkup in almost all regions where the Selkup people lived between 1962 and 1977. The archive was transferred by A.I. Kuzmina to Eugen Helimski and acquired by the Universität Hamburg in 2001. Most texts in the corpus originate from the handwritten part of the archive; the others come from sound recordings made by A.I. Kuzmina, transcribed and translated within the INEL project.
Funding The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities. Contributions/Acknowledgements Audio recordings made by Angelina Kuzmina were transcribed and translated by native speakers of Selkup: Irina Anatolyevna Korobejnikova, written transcription and Russian translation of audio in Central and Southern dialects; Natalya Platonovna Izhenbina, written transcription and Russian translation of audio in Southern dialects; Svetlana Nikitichna Sankevich (Kunina), oral transcription and Russian translation of audio in Northern dialects; Evgeniya Sergeevna Smorgunova (Irikova), oral and written transcription and Russian translation of audio in Northern dialects; Valentina Vladimirovna Tamelkina, oral transcription and Russian translation of audio in Northern dialects. For contributions to the collection, transcription and analysis of individual texts, please refer to the user documentation and to the corpus metadata. The web-based search interface uses the Tsakonian Corpus platform developed by Dr. Timofey Arkhangelskiy, Humboldt Research Fellow at IFUU, Hamburg University. New in release 1.0 The corpus now contains 264 texts from 74 speakers, representing the dialects of Middle Taz, Upper Tolka, Baikha (Northern), Narym and Tym (Central), Upper and Middle Ob, Chaya, Upper and Middle Ket (Southern). These contain 7887 sentences and 42466 words in total. Many texts have been provided with annotations for syntactic functions and semantic roles. Corrections to audio transcriptions, glossing and other annotations.
https://www.fdr.uni-hamburg.de/record/9753 10.25592/uhhfdm.9753 oai:fdr.uni-hamburg.de:9753 sel handle:11022/0000-0007-E1D5-A doi:10.25592/uhhfdm.9721 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode endangered language indigenous language L1 data language contact language documentation INEL folklore narrative monologue annotated morphological glossing borrowings code-switching semantic roles syntactic functions information status English translation German translation Russian translation INEL Selkup Corpus info:eu-repo/semantics/other dataset

oai:fdr.uni-hamburg.de:9754 2024-04-09T11:43:35Z user-inel user-adwhh

Wagner-Nagy, Beáta Arkhipov, Alexandre Brykina, Maria Orlova, Svetlana Ferger, Anne Jettka, Daniel Lazarenko, Elena Lehmberg, Timm Riaposov, Aleksandr Brykina, Maria Orlova, Svetlana Wagner-Nagy, Beáta 2021-12-31 Corpus Citation Brykina, Maria; Orlova, Svetlana; Wagner-Nagy, Beáta. 2021. “INEL Selkup Corpus.” Version 2.0. Publication date 2021-12-31. https://hdl.handle.net/11022/0000-0007-F4D9-1. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1 Corpus Description The INEL Selkup corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus enables typologically aware corpus-based grammatical research on the Selkup language and expands the documentation of the lesser described indigenous languages of Northern Eurasia. The INEL Selkup corpus is composed of texts from the archive of Angelina Ivanovna Kuzmina (1924–2002), who gathered a large amount of material on Selkup in almost all regions where the Selkup people lived between 1962 and 1977. The archive was transferred by A.I. Kuzmina to Eugen Helimski and acquired by the Universität Hamburg in 2001. Most texts in the corpus originate from the handwritten part of the archive; the others come from sound recordings made by A.I. Kuzmina, transcribed and translated within the INEL project. Funding The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.
Contributions/Acknowledgements Audio recordings made by Angelina Kuzmina were transcribed and translated by native speakers of Selkup: Irina Anatolyevna Korobejnikova, written transcription and Russian translation of audio in Central and Southern dialects; Natalya Platonovna Izhenbina, written transcription and Russian translation of audio in Southern dialects; Svetlana Nikitichna Sankevich (Kunina), oral transcription and Russian translation of audio in Northern dialects; Evgeniya Sergeevna Smorgunova (Irikova), oral and written transcription and Russian translation of audio in Northern dialects; Valentina Vladimirovna Tamelkina, oral transcription and Russian translation of audio in Northern dialects. For contributions to the collection, transcription and analysis of individual texts, please refer to the user documentation and to the corpus metadata. The web-based search interface uses the Tsakonian Corpus platform developed by Dr. Timofey Arkhangelskiy, Humboldt Research Fellow at IFUU, Hamburg University. New in release 2.0 The corpus now contains 352 transcripts from 89 speakers, representing the dialects of Taz, Upper Tolka, Baikha (Northern), Narym and Tym (Central), Middle Ob, Chaya and Ket (Southern). These contain 14509 sentences and 81498 words in total. Many texts have been provided with annotations for syntactic functions and semantic roles. Corrections to audio transcriptions, glossing and other annotations. Dialectal attribution of several speakers has been revised. The remaining non-glossed texts from the Kuzmina archive have also been added to the corpus for completeness. These include 3 texts from the written part of the archive and 40 audio recordings, for 20 of which a preliminary transcription is provided.
https://www.fdr.uni-hamburg.de/record/9754 10.25592/uhhfdm.9754 oai:fdr.uni-hamburg.de:9754 sel handle:11022/0000-0007-F4D9-1 doi:10.25592/uhhfdm.9721 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode endangered language indigenous language L1 data language contact language documentation INEL folklore narrative monologue annotated morphological glossing borrowings code-switching semantic roles syntactic functions information status English translation German translation Russian translation INEL Selkup Corpus info:eu-repo/semantics/other dataset
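The "New in release" notes of the two Selkup records (versions 1.0 and 2.0) give headline counts that can be compared directly. The sketch below computes the growth between the releases; the dictionary keys are labels chosen here for illustration. Note that release 1.0 counts "texts" while release 2.0 counts "transcripts", so treating the two as the same unit is an assumption.

```python
# Headline counts copied from the "New in release" notes of the two records.
# Assumption: "texts" (1.0) and "transcripts" (2.0) denote comparable units.
RELEASE_1_0 = {"texts": 264, "speakers": 74, "sentences": 7887, "words": 42466}
RELEASE_2_0 = {"texts": 352, "speakers": 89, "sentences": 14509, "words": 81498}


def growth(old, new):
    """Return the absolute increase and the growth factor for each counter."""
    return {
        key: {"added": new[key] - old[key], "factor": new[key] / old[key]}
        for key in old
    }


delta = growth(RELEASE_1_0, RELEASE_2_0)
for key, change in delta.items():
    print(f"{key}: +{change['added']} (x{change['factor']:.2f})")
```

The counts cover only the figures quoted in the release notes; the 43 non-glossed items added in release 2.0 are described separately in the record and are not assumed to be part of these totals.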

oai:fdr.uni-hamburg.de:9628 2024-12-30T16:11:20Z user-inel user-adwhh

Wagner-Nagy, Beáta Arkhipov, Alexandre Gusev, Valentin Däbritz, Chris Lasse Ferger, Anne Jettka, Daniel Lazarenko, Elena Lehmberg, Timm Riaposov, Aleksandr Däbritz, Chris Lasse Gusev, Valentin 2021-12-31 Corpus Citation Däbritz, Chris Lasse & Gusev, Valentin. 2021. INEL Evenki Corpus. Version 1.0. Publication date 2021-12-31. Archived at Universität Hamburg. https://hdl.handle.net/11022/0000-0007-F43C-3. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1 Corpus Description The INEL Evenki Corpus has been created within the long-term INEL project (Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages), 2016–2033. The corpus makes possible typologically aware corpus-based grammatical research on the Evenki (< Tungusic) language and expands the documentation of the lesser described indigenous languages of Northern Eurasia. The INEL Evenki Corpus covers Northern (Taimyr, Khantayskoe Ozero, Ilimpi, Erbogachon) and Southern (Sym) Evenki dialects, which have or had contacts with other languages dealt with in the INEL project, that is, first and foremost Dolgan and Selkup. The INEL Evenki Corpus is composed of texts from different sources: Published texts from different text collections, inter alia "Sbornik materialov po evenkijskomu (tungusskomu) fol'kloru" (Vasilevich 1936), covering all named dialects. Transcripts of recordings obtained from the Taimyr House of National Arts (TDNT) in Dudinka (2000s) as well as transcripts of recordings made by and from Tat’yana V. Bolina, both representing the Khantayskoe Ozero dialect. Texts from the handwritten archive of the Russian ethnographer and linguist Konstantin M. Rychkov recorded in the 1900s/1910s, covering the Taimyr, Ilimpi and Sym dialects. Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of Russian borrowings.
Some texts also have annotations for syntactic functions, semantic roles and information status. Funding The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities. Contributions/Acknowledgements The Taimyr House of National Arts (TDNT) provided valuable audio material (see above). Tat’yana V. Bolina (TDNT Leading Methodologist for Evenki folklore & culture) recorded some further Evenki material in 2018 and 2019. The Institute of Oriental Manuscripts of the Russian Academy of Sciences (IOM RAS; Институт восточных рукописей РАН) in Saint Petersburg provided scanned manuscripts from the Rychkov archive (The Archives of the Orientalists of IOM RAS, Coll. 49, inv. 1, items 4, 5, 6а, 6б, 6в). The web-based search interface is using the Tsakonian Corpus platform developed by Dr. Timofey Arkhangelskiy. https://www.fdr.uni-hamburg.de/record/9628 10.25592/uhhfdm.9628 oai:fdr.uni-hamburg.de:9628 evn handle:11022/0000-0007-F43C-3 doi:10.25592/uhhfdm.9627 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode endangered language indigenous language L1 data language contact language documentation INEL folklore narrative monologue annotated morphological glossing borrowings code-switching semantic roles syntactic functions information status English translation German translation Russian translation INEL Evenki Corpus info:eu-repo/semantics/other dataset

oai:fdr.uni-hamburg.de:17676 2025-07-22T10:59:13Z user-inel user-adwhh user-uhh

Lazarenko, Elena Riaposov, Aleksandr Arkhipov, Alexandre Baranova, Vlada 2025-07-17 Corpus citation Baranova, Vlada. 2025. INEL Kalmyk Corpus. Archived at Universität Hamburg. Version 1.0. Publication date 2025-07-17. https://hdl.handle.net/11022/0000-0007-FFB1-2. Archived at Universität Hamburg. In: The INEL Corpora of Indigenous Northern Eurasian Languages. https://hdl.handle.net/11022/0000-0007-F45A-1. Corpus Description The INEL Kalmyk Corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus consists of transcribed audio recordings collected in the Republic of Kalmykia between 2007 and 2018 in the Ketchenerovsky District (Derbet and Torgut dialect). All texts in the corpus are provided with interlinear morpheme-by-morpheme glosses and translation into English and Russian. All texts for which the audio recordings were accessible are time-aligned with them. Corpus Size The corpus contains 55 texts, 2,076 sentences, and 19,742 tokens. The total duration of the audio recordings is 4 hours and 23 minutes. Funding The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities. Contributions / Acknowledgements Native speakers generously shared their knowledge of Kalmyk, making the creation of this corpus possible. Zamira Xejchieva and Galina Cabdy`rova assisted with oral transcription and the Russian translation of the audio materials. Part of the materials were recorded during joint expeditions of St. 
Petersburg University and the Institute for Linguistic Studies of the Russian Academy of Sciences in 2007–2008, under the direction of Elena Perekhvalskaya and Sergey Say. This corpus primarily follows the transcription system and partially adopts the glossing conventions developed by a research team led by Sergey Say, with input from other expedition participants. Searching the corpus The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN. Online search with Tsakorpus platform is available at https://inel.corpora.uni-hamburg.de/KalmykCorpus/search. Remote search with EXMARaLDA is also possible without downloading all the files (see https://inel.corpora.uni-hamburg.de/portal/help/en/index.php). See the user documentation (section 3) for details on transcription, annotation tiers and annotation tags. Find further information and links on the Kalmyk Corpus page at the INEL Resources portal: https://inel.corpora.uni-hamburg.de/portal/corpora/kalmyk/. https://www.fdr.uni-hamburg.de/record/17676 10.25592/uhhfdm.17676 oai:fdr.uni-hamburg.de:17676 xal info:eu-repo/semantics/altIdentifier/handle/11022/0000-0007-FFB1-2 doi:10.25592/uhhfdm.17675 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode endangered language indigenous language language contact language documentation INEL folklore narrative monologue morphological glossing English translation Russian translation EXMARaLDA ELAN XML ISO/TEI Mongolic languages annotated corpus INEL Kalmyk Corpus info:eu-repo/semantics/other dataset

oai:fdr.uni-hamburg.de:9747 2025-09-12T12:09:37Z user-inel user-adwhh

Wagner-Nagy, Beáta Arkhipov, Alexandre Däbritz, Chris Lasse Kudryakova, Nina Stapert, Eugénie Däbritz, Chris Lasse Kudryakova, Nina Stapert, Eugénie 2019-08-31 Corpus Citation Däbritz, Chris Lasse; Kudryakova, Nina; Stapert, Eugénie. 2019. "INEL Dolgan Corpus." Version 1.0. Publication date 2019-08-31. https://hdl.handle.net/11022/0000-0007-CAE7-1. Archived in Hamburger Zentrum für Sprachkorpora. In: Wagner-Nagy, Beáta; Arkhipov, Alexandre; Ferger, Anne; Jettka, Daniel; Lehmberg, Timm (eds.). The INEL corpora of indigenous Northern Eurasian languages. Corpus Description The INEL Dolgan corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus makes possible typologically aware corpus-based grammatical research on the Dolgan language and expands the documentation of the lesser described indigenous languages of Northern Eurasia. The INEL Dolgan corpus is composed of texts from different sources: 1. Published folklore texts from an edited volume ("Fol'klor Dolgan", P.E. Efremov 2000), 2. Transcripts of recordings obtained from the Taymyr House of Folk Art (TDNT) in Dudinka (1970s-2000s), 3. Transcripts from the collection of Dr. Eugénie Stapert recorded on several fieldwork trips in 2007-2010, 4. Transcripts of recordings made on a fieldwork trip in 2017. The first group as well as parts of the third group were already transcribed and translated; the rest of the recordings were transcribed and translated within the INEL project. Each text in the corpus is provided with morphological glossing, translation into English, Russian and German, as well as annotation of Russian borrowings. Some texts also have annotations for syntactic functions, semantic roles and information structure/information status.
Funding The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities. https://www.fdr.uni-hamburg.de/record/9747 10.25592/uhhfdm.9747 oai:fdr.uni-hamburg.de:9747 handle:11022/0000-0007-CAE7-1 doi:10.25592/uhhfdm.9746 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode endangered language indigenous language L1 data language contact language documentation INEL folklore narrative monologue annotated morphological glossing borrowings code-switching semantic roles syntactic functions information status English translation German translation Russian translation INEL Dolgan Corpus info:eu-repo/semantics/other dataset

oai:fdr.uni-hamburg.de:16182 2025-12-22T10:31:51Z user-uhh user-adwhh user-inel

Arkhipov, Alexandre Wagner-Nagy, Beáta Lazarenko, Elena Riaposov, Aleksandr Lehmberg, Timm Shluinsky, Andrey Khanina, Olesya Wagner-Nagy, Beáta 2024-11-30 Corpus Citation Shluinsky, Andrey; Khanina, Olesya; Wagner-Nagy, Beáta. 2024. INEL Enets Corpus. Version 1.0. Publication date 2024-11-30. https://hdl.handle.net/11022/0000-0007-FE1D-C. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1 Corpus Description The INEL Enets corpus has been created within the long-term INEL project ("Grammatical Descriptions, Corpora and Language Technology for Indigenous Northern Eurasian Languages"), 2016–2033. The corpus includes texts recorded between 1962–2017 in both Enets lects – Forest Enets and Tundra Enets. The sources of the corpus (see more details in the user documentation, section 2.2) are: Audio recordings done by Olesya Khanina, Maria Ovsjannikova, Andrey Shluinsky, Natalia Stoynova and Sergey Trubetskoy, Legacy audio recordings done by Vera Bettu, Nina N. Bolina, Dar`ya S. Bolina, Zoya N. Bolina, Oksana E. Dobzhanskaya, Valentin Gusev, Eugene Helimski†, Kazimir I. Labanauskas†, Larisa Leisiö, Marina Lyublinskaya, Kaur Mägi, Viktor N. Pal`chin, Marina N. Pal`china, Irina P. Sorokina†, Anna Urmanchieva, Beáta Wagner-Nagy and possibly other people, Published audio recordings, Texts published by Dar`ya S. Bolina, Yaroslav A. Gluxij† and Vasilij A. Susekov†, Eugene Helimski†, Kazimir I. Labanauskas†, Tibor Mikola†, János Pusztay, Irina P. Sorokina†, Anna Urmanchieva, Legacy manuscript transcriptions and self-transcriptions done and/or edited by Dar`ya S. Bolina, Galina S. Bolina, Zoya N. Bolina, Valentin Gusev, Eugene Helimski†, Kazimir I. Labanauskas†, Larisa Leisiö, Marina Lyublinskaya, Vasilij F. Ly`rmin†, Anton N. Pal`chin, Viktor N. Pal`chin, Ivan I. Silkin†, Irina P. Sorokina†, Natal`ya M. Tereščenko†, Anna Urmanchieva and possibly other people. 
All texts in the corpus are provided with interlinear morpheme-by-morpheme glosses and translation into English and Russian. All texts for which the audio recordings were accessible are time-aligned with them. Video recordings are also included into the corpus if available. Corpus size Forest Enets: 541 texts, 41,396 sentences, 173,379 tokens Tundra Enets: 137 texts, 12,737 sentences, 45,331 tokens Total: 678 texts, 54,133 sentences, 218,710 tokens Total duration of audio: 43 hours 26 minutes Funding The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities. Preliminary glossing work included into this corpus was supported by Endangered Languages Documentation Programme (ELDP) and by Max Planck Institute for Evolutionary Anthropology (MPI-EVA). See more details on financial support in the documentation file below, section 1.6. Contributions/Acknowledgements Dozens of people and many institutions contributed to the corpus (see more details in the documentation file below, section 1.6). We are especially grateful to: Enets speakers who generously shared their knowledge, especially those who spent many days working with us: Aleksandr S. Bolin†, Leonid D. Bolin†, Viktor N. Bolin, Nadezhda K. Bolina, Nina N. Bolina, Ekaterina S. Glibchenko, Gennadij A. Ivanov†, Irina P. Koshkaryova†, Valentina P. Nader, Lyudmila P. Novosyolova, Svetlana A. Roslyakova†, Ivan I. Silkin†, Nikolaj I. Silkin, Alevtina S. Silkina, Zoya A. Turutina, Tat`yana Ch. Yar, In particular, Zoya N. Bolina and Viktor N. 
Pal`chin who also collaborated in ELDP project and extensively transcribed Enets recordings, Natalia Stoynova, Sergey Trubetskoy and foremostly Maria Ovsjannikova who did recordings and transcriptions of Enets texts, Institutions and private individuals who shared legacy data: the Institute for Linguistic Studies RAS, the Taymyr House of National Arts, the Dudinka branch of GTRK “Norilsk”; Dar`ya S. Bolina, Oksana E. Dobzhanskaya, Valentin Gusev, Larisa Leisiö, Viktor N. Pal`chin, Irina P. Sorokina†, Anna Urmanchieva, Marina Lyublinskaya and Anna Urmanchieva who kindly permitted to include texts processed by them into the corpus, Dar`ya S. Bolina who consulted a lot in the process of compilation of the corpus. Searching the corpus The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN. Online search with Tsakorpus platform is available at https://inel.corpora.uni-hamburg.de/EnetsCorpus/search. Remote search with EXMARaLDA is also possible without downloading all the files (see https://inel.corpora.uni-hamburg.de/portal/help/en/index.php#search). See the user documentation (section 3) for details on transcription, annotation tiers and annotation tags. Find further information and links on the Enets Corpus page at the INEL Resources portal: https://inel.corpora.uni-hamburg.de/portal/corpora/enets/. 
https://www.fdr.uni-hamburg.de/record/16182 10.25592/uhhfdm.16182 oai:fdr.uni-hamburg.de:16182 handle:11022/0000-0007-FE1D-C doi:10.25592/uhhfdm.16181 info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode Uralic Samoyedic Enets Forest Enets Tundra Enets endangered language language contact language documentation legacy data INEL AdWHH text corpus speech corpus parallel texts folklore tales narrative dialogue song transcription time-aligned audio video morphological glossing part-of-speech borrowings code-switching English translation Russian translation EXMARaLDA ELAN XML ISO/TEI INEL Enets Corpus info:eu-repo/semantics/other dataset
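Each record in this harvest repeats the same numeric record ID in three places: the landing-page URL, the version DOI and the OAI identifier (for the Enets record above: `record/16182`, `10.25592/uhhfdm.16182`, `oai:fdr.uni-hamburg.de:16182`). The sketch below derives these strings from a record ID and builds an OAI-PMH `GetRecord` request URL against the endpoint named in the harvest header. The function names are illustrative, and `oai_dc` is assumed as the metadata prefix because it is the format every OAI-PMH repository must support; the repository may offer other formats as well. No request is sent.

```python
from urllib.parse import urlencode

# OAI-PMH endpoint as given in the header of this harvest.
OAI_BASE = "https://www.fdr.uni-hamburg.de/oai2d"


def record_identifiers(record_id):
    """Derive the three ID-based strings that appear in each record."""
    return {
        "landing_page": f"https://www.fdr.uni-hamburg.de/record/{record_id}",
        "doi": f"10.25592/uhhfdm.{record_id}",
        "oai_identifier": f"oai:fdr.uni-hamburg.de:{record_id}",
    }


def get_record_url(record_id, metadata_prefix="oai_dc"):
    """Build (but do not send) an OAI-PMH GetRecord request URL."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": metadata_prefix,
            "identifier": record_identifiers(record_id)["oai_identifier"],
        }
    )
    return f"{OAI_BASE}?{query}"


ids = record_identifiers(16182)
print(ids)
print(get_record_url(16182))
```

The second `doi:` value in each record (for example `doi:10.25592/uhhfdm.16181`) carries a different number and cannot be derived from the record ID this way, so it is deliberately left out of the helper.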