Dataset Open Access
Schwiebert, Gerald; Weber, Cornelius; Qu, Leyuan; Siqueira, Henrique; Wermter, Stefan
<?xml version='1.0' encoding='utf-8'?>
<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:creator>Schwiebert, Gerald</dc:creator>
  <dc:creator>Weber, Cornelius</dc:creator>
  <dc:creator>Qu, Leyuan</dc:creator>
  <dc:creator>Siqueira, Henrique</dc:creator>
  <dc:creator>Wermter, Stefan</dc:creator>
  <dc:date>2022-03-01</dc:date>
  <dc:description>The German Lipreading dataset consists of 250,000 publicly available videos of the faces of speakers in the Hessian Parliament, processed for word-level lip reading by an automatic pipeline. The format is similar to that of the English-language Lip Reading in the Wild (LRW) dataset: each H264-compressed MPEG-4 video encodes one word of interest within a context of 1.16 seconds, which makes the dataset suitable for studying transfer learning between the two datasets. Because the material consists of naturally spoken language recorded in a natural environment, it should yield more robust results for real-world applications than artificially generated datasets recorded with as little noise as possible. The dataset covers 500 distinct spoken words, 4 to 18 characters long, with 500 instances each (500 × 500 = 250,000 videos); each instance has separate MPEG-4 audio and text metadata files, and all material originates from 1018 parliamentary sessions. The complete TextGrid files containing the segmentation information of those sessions are also included.

The size of the uncompressed dataset is 16 GB.</dc:description>
  <dc:identifier>https://www.fdr.uni-hamburg.de/record/10048</dc:identifier>
  <dc:identifier>10.25592/uhhfdm.10048</dc:identifier>
  <dc:identifier>oai:fdr.uni-hamburg.de:10048</dc:identifier>
  <dc:language>deu</dc:language>
  <dc:relation>arxiv:arXiv:2202.13403</dc:relation>
  <dc:relation>doi:10.25592/uhhfdm.10047</dc:relation>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode</dc:rights>
  <dc:subject>Computer Vision</dc:subject>
  <dc:subject>Pattern Recognition</dc:subject>
  <dc:subject>Machine Learning</dc:subject>
  <dc:subject>Deep Learning</dc:subject>
  <dc:subject>Language</dc:subject>
  <dc:subject>Dataset</dc:subject>
  <dc:subject>Automatic Speech Recognition</dc:subject>
  <dc:subject>Transfer Learning</dc:subject>
  <dc:subject>Lip Reading</dc:subject>
  <dc:subject>Corpus</dc:subject>
  <dc:title>GLips - German Lipreading Dataset</dc:title>
  <dc:type>info:eu-repo/semantics/other</dc:type>
  <dc:type>dataset</dc:type>
</oai_dc:dc>
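The record above follows the OAI-PMH `oai_dc` schema, so its fields can be read with any namespace-aware XML parser. The following is a minimal sketch using only Python's standard `xml.etree.ElementTree`. The helper names `dc_values` and `summarize`, and the choice of which fields to collect, are hypothetical illustrations and not part of any tooling shipped with the dataset. The embedded `RECORD` string is an abbreviated copy of the export above; the namespace URIs are copied from its root element.

```python
import xml.etree.ElementTree as ET

# Namespace URIs exactly as declared on the <oai_dc:dc> root element of the record.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Abbreviated copy of the record: the XML declaration, xsi attributes,
# description, relations and most subjects are left out to keep it short.
RECORD = """<oai_dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/">
  <dc:creator>Schwiebert, Gerald</dc:creator>
  <dc:creator>Weber, Cornelius</dc:creator>
  <dc:creator>Qu, Leyuan</dc:creator>
  <dc:creator>Siqueira, Henrique</dc:creator>
  <dc:creator>Wermter, Stefan</dc:creator>
  <dc:date>2022-03-01</dc:date>
  <dc:identifier>https://www.fdr.uni-hamburg.de/record/10048</dc:identifier>
  <dc:identifier>10.25592/uhhfdm.10048</dc:identifier>
  <dc:identifier>oai:fdr.uni-hamburg.de:10048</dc:identifier>
  <dc:language>deu</dc:language>
  <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
  <dc:rights>https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode</dc:rights>
  <dc:subject>Lip Reading</dc:subject>
  <dc:subject>Transfer Learning</dc:subject>
  <dc:title>GLips - German Lipreading Dataset</dc:title>
</oai_dc:dc>"""


def dc_values(root, field):
    """Hypothetical helper: text of every direct dc:<field> child, in document order."""
    return [el.text for el in root.findall(f"dc:{field}", NS)]


def summarize(xml_text):
    """Hypothetical helper: collect commonly needed fields from an OAI-DC record."""
    root = ET.fromstring(xml_text)
    identifiers = dc_values(root, "identifier")
    return {
        "title": root.findtext("dc:title", namespaces=NS),
        "creators": dc_values(root, "creator"),
        "date": root.findtext("dc:date", namespaces=NS),
        # A bare DOI starts with the "10." directory prefix; the other
        # identifiers are the landing-page URL and the OAI identifier.
        "doi": next((i for i in identifiers if i.startswith("10.")), None),
        # Keep only the license URL; the other dc:rights value is an access flag.
        "license": next(
            (r for r in dc_values(root, "rights") if r.startswith("https://")), None
        ),
        "subjects": dc_values(root, "subject"),
    }


if __name__ == "__main__":
    info = summarize(RECORD)
    print(info["title"])  # GLips - German Lipreading Dataset
    print(info["doi"])    # 10.25592/uhhfdm.10048
```

To parse the full export instead of the abbreviated copy, reading the file as bytes (for example with `pathlib.Path.read_bytes()`) and passing those bytes to `summarize` lets the parser honour the `encoding='utf-8'` declaration.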