Dataset Open Access
Wagner-Nagy, Beáta; Sipőcz, Katalin
Corpus Citation
Sipőcz, Katalin & Wagner-Nagy, Beáta. 2026. INEL Mansi Corpus. Version 1.0. Publication date 2026-06-05. https://hdl.handle.net/11022/0000-0008-00F5-3. Archived at Universität Hamburg. In: The INEL corpora of indigenous Northern Eurasian languages. https://hdl.handle.net/11022/0000-0007-F45A-1
Corpus Description
The INEL Mansi Corpus has been created as part of the long-term project INEL (“Grammatical Descriptions, Corpora, and Language Technology for Indigenous Northern Eurasian Languages”) in the context of the Academies’ Programme, coordinated by the Union of the German Academies of Sciences and Humanities.
Mansi is a relatively well-documented language, with numerous grammatical descriptions and an existing corpus. However, not all varieties have been represented in previously available corpora. The present corpus addresses this gap by incorporating materials from the Tavda variety, alongside a number of texts from the Western dialect group. Most of the corpus data originate from the Northern dialect group.
The INEL Mansi Corpus comprises texts drawn from the following sources:
All texts in the corpus are provided with interlinear morpheme-by-morpheme glosses. All texts for which audio recordings are available have been time-aligned with the corresponding recordings.
Corpus size
The corpus contains 196 texts from 47 speakers, 6,179 sentences and 48,145 tokens. The total duration of the audio recordings is 1 hour 36 minutes.
Funding
The corpus has been produced in the context of the joint research funding of the German Federal Government and Federal States in the Academies’ Programme, with funding from the Federal Ministry of Education and Research and the Free and Hanseatic City of Hamburg. The Academies’ Programme is coordinated by the Union of the German Academies of Sciences and Humanities.
Searching the corpus
The corpus can be downloaded from the ZFDM Repository using the links provided below and browsed or searched locally using the EXMARaLDA software or, alternatively, ELAN.
Online search with the Tsakorpus platform is available at https://inel.corpora.uni-hamburg.de/MansiCorpus/search.
Remote search with EXMARaLDA is also possible without downloading all the files (see https://inel.corpora.uni-hamburg.de/portal/help/en/index.php).
See the user documentation (section 3) for details on transcription, annotation tiers and annotation tags.
Find further information and links on the Mansi Corpus page at the INEL Resources portal: https://inel.corpora.uni-hamburg.de/portal/corpora/mansi/.
| Name | Size | MD5 checksum |
|---|---|---|
| mansi-1.0-documentation.pdf | 1.1 MB | 63473a38445166005158f577875e5c8a |
| mansi-1.0-lite.zip | 24.3 MB | 546e9937795b291f81a50e96c69637ee |
| mansi-1.0-mp3.zip | 390.3 MB | 77e33f69cdc3234abf96a8f0975072b4 |
| mansi-1.0-standard.zip | 1.1 GB | f56840f9086b6cdfa85a986026f26303 |
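
After downloading, the files can be checked against the MD5 checksums listed in the file table above. The following is a minimal sketch, not part of the INEL tooling: the function names `md5_of_file` and `check_downloads` are hypothetical, and it assumes the files were saved under their original names in a single directory. Only the file names and checksums are taken from this page.

```python
import hashlib
from pathlib import Path

# File names and checksums copied from the file table on this page.
EXPECTED_MD5 = {
    "mansi-1.0-documentation.pdf": "63473a38445166005158f577875e5c8a",
    "mansi-1.0-lite.zip": "546e9937795b291f81a50e96c69637ee",
    "mansi-1.0-mp3.zip": "77e33f69cdc3234abf96a8f0975072b4",
    "mansi-1.0-standard.zip": "f56840f9086b6cdfa85a986026f26303",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reading keeps memory use low for the 1.1 GB archive.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(directory="."):
    """Map each listed file to True (match), False (mismatch) or None (not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = md5_of_file(path) == expected if path.is_file() else None
    return results


if __name__ == "__main__":
    # Assumption: the downloads are in the current working directory.
    for name, ok in check_downloads().items():
        status = {True: "OK", False: "MISMATCH", None: "not found"}[ok]
        print(f"{name}: {status}")
```

A mismatch usually points to an incomplete or corrupted download, in which case the file should be fetched again before it is opened in EXMARaLDA or ELAN.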