Schwiebert, Gerald; Weber, Cornelius; Qu, Leyuan; Siqueira, Henrique; Wermter, Stefan
<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
<leader>00000nmm##2200000uu#4500</leader>
<datafield tag="542" ind1=" " ind2=" ">
<subfield code="l">open</subfield>
</datafield>
<datafield tag="245" ind1=" " ind2=" ">
<subfield code="a">GLips - German Lipreading Dataset</subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">user-uhh</subfield>
</datafield>
<datafield tag="909" ind1="C" ind2="O">
<subfield code="o">oai:fdr.uni-hamburg.de:10048</subfield>
<subfield code="p">user-uhh</subfield>
</datafield>
<datafield tag="260" ind1=" " ind2=" ">
<subfield code="c">2022-03-01</subfield>
</datafield>
<datafield tag="773" ind1=" " ind2=" ">
<subfield code="a">arXiv:2202.13403</subfield>
<subfield code="i">isReferencedBy</subfield>
<subfield code="n">arxiv</subfield>
</datafield>
<datafield tag="773" ind1=" " ind2=" ">
<subfield code="a">10.25592/uhhfdm.10047</subfield>
<subfield code="i">isVersionOf</subfield>
<subfield code="n">doi</subfield>
</datafield>
<datafield tag="540" ind1=" " ind2=" ">
<subfield code="u">https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode</subfield>
<subfield code="a">Creative Commons Attribution Non Commercial No Derivatives 4.0 International</subfield>
</datafield>
<datafield tag="856" ind1="4" ind2=" ">
<subfield code="s">11712886631</subfield>
<subfield code="u">https://www.fdr.uni-hamburg.de/record/10048/files/GLips.zip</subfield>
<subfield code="z">md5:ce223c2b63df6f0c94a847578b2a2414</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Weber, Cornelius</subfield>
<subfield code="u">University of Hamburg</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Qu, Leyuan</subfield>
<subfield code="u">University of Hamburg</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Siqueira, Henrique</subfield>
<subfield code="u">University of Hamburg</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Wermter, Stefan</subfield>
<subfield code="u">University of Hamburg</subfield>
</datafield>
<datafield tag="999" ind1="C" ind2="5">
<subfield code="x">Gerald Schwiebert, Cornelius Weber, Leyuan Qu, Henrique Siqueira, Stefan Wermter (2022). A Multimodal German Dataset for Automatic Lip Reading Systems and Transfer Learning</subfield>
</datafield>
<datafield tag="999" ind1="C" ind2="5">
<subfield code="x">arXiv:2202.13403</subfield>
</datafield>
<datafield tag="500" ind1=" " ind2=" ">
<subfield code="a">Copyright of original data: Hessian Parliament (https://hessischer-landtag.de).
If you use this dataset, you agree to use it for research purposes only and to cite the following reference in any work that makes use of the dataset.
Reference:
Gerald Schwiebert, Cornelius Weber, Leyuan Qu, Henrique Siqueira, Stefan Wermter (2022). A Multimodal German Dataset for Automatic Lip Reading Systems and Transfer Learning. arXiv:2202.13403</subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" ">
<subfield code="a"><p>The German Lipreading dataset (GLips) consists of 250,000 publicly available videos of the faces of speakers in the Hessian Parliament, which were processed for word-level lip reading using an automatic pipeline. The format is similar to that of the English-language Lip Reading in the Wild (LRW) dataset: each H264-compressed MPEG-4 video encodes one word of interest in a context of 1.16 seconds, which makes the two datasets compatible for studying transfer learning. Choosing video material based on naturally spoken language in a natural environment ensures more robust results for real-world applications than artificially generated datasets with as little noise as possible. Each of the 500 different spoken words, which range from 4 to 18 characters in length, has 500 instances and separate MPEG-4 audio and text metadata files; the material originates from 1018 parliamentary sessions. The complete TextGrid files containing the segmentation information of those sessions are also included. The uncompressed dataset is 16 GB in size.</p></subfield>
</datafield>
<datafield tag="980" ind1=" " ind2=" ">
<subfield code="a">dataset</subfield>
</datafield>
<controlfield tag="005">20220503084804.0</controlfield>
<datafield tag="024" ind1=" " ind2=" ">
<subfield code="a">10.25592/uhhfdm.10048</subfield>
<subfield code="2">doi</subfield>
</datafield>
<datafield tag="650" ind1="1" ind2="7">
<subfield code="a">cc-by</subfield>
<subfield code="2">opendefinition.org</subfield>
</datafield>
<controlfield tag="001">10048</controlfield>
<datafield tag="041" ind1=" " ind2=" ">
<subfield code="a">deu</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Computer Vision</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Pattern Recognition</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Machine Learning</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Deep Learning</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Language</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Dataset</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Automatic Speech Recognition</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Transfer Learning</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Lip Reading</subfield>
</datafield>
<datafield tag="653" ind1=" " ind2=" ">
<subfield code="a">Corpus</subfield>
</datafield>
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Schwiebert, Gerald</subfield>
<subfield code="u">University of Hamburg</subfield>
</datafield>
</record>