Dataset Open Access

Annotated subset of RDR notebooks for CVC development

Hussein Mohammed; Quang-Vinh Dang


JSON Export

{"conceptdoi":"10.25592/uhhfdm.17931","conceptrecid":"17931","created":"2025-09-05T11:45:44.007934+00:00","doi":"10.25592/uhhfdm.17932","id":17932,"links":{"badge":"https://www.fdr.uni-hamburg.de/badge/doi/10.25592/uhhfdm.17932.svg","conceptbadge":"https://www.fdr.uni-hamburg.de/badge/doi/10.25592/uhhfdm.17931.svg","conceptdoi":"http://doi.org/10.25592/uhhfdm.17931","doi":"http://doi.org/10.25592/uhhfdm.17932"},"metadata":{"access_right":"open","access_right_category":"success","communities":[{"id":"csmc"},{"id":"uhh"}],"creators":[{"affiliation":"Universit\u00e4t Hamburg","name":"Hussein Mohammed","orcid":"0000-0001-5020-3592"},{"affiliation":"Universit\u00e4t Hamburg","name":"Quang-Vinh Dang","orcid":"0000-0002-6715-7112"}],"description":"<p>This&nbsp;dataset is structured into four components, each serving a distinct role in the development of a&nbsp;document analysis system.</p>\n\n<ol>\n\t<li>\n\t<p><strong>Word-level annotations</strong> are provided in the file <code>word_annotations_for_cropped_images.json</code>. These annotations describe the images contained in the <code>cropped_images</code> folder. Each entry specifies the location of a word as a polygon, together with its orientation (horizontal, vertical, or tilted) and the type of writing implement used (ink or pencil). Additional metadata, such as bounding boxes and segmentation areas, is also included.</p>\n\t</li>\n\t<li>\n\t<p><strong>Cropped images</strong> are stored in the <code>cropped_images</code> folder. This set comprises 50 images, each containing only the primary page extracted from the corresponding full notebook scans.</p>\n\t</li>\n\t<li>\n\t<p><strong>Full images</strong> are located in the <code>full_images</code> folder. 
This collection also contains 50 items, representing the complete notebook scans in which the primary page appears alongside other material.</p>\n\t</li>\n\t<li>\n\t<p><strong>Page-level annotations</strong> are contained in the <code>page_annotations</code> folder. These are provided in YOLO format, with a single class (<code>page</code>) defined in <code>classes.txt</code>. Each annotation file specifies the bounding box of the primary page within the corresponding image in the <code>full_images</code> folder.</p>\n\t</li>\n</ol>\n\n<p>Examples illustrate the annotation structure. In the JSON file, a typical word annotation records polygon coordinates, the attribute <code>&quot;orientation&quot;: &quot;horizontal&quot;</code>, and <code>&quot;writing_tool&quot;: &quot;pencil&quot;</code>. In the YOLO annotations, a sample entry such as <code>0 0.499023 0.500776 0.777344 0.816912</code> denotes the normalised coordinates of the primary page bounding box.</p>\n\n<p><strong>Acknowledgement:</strong></p>\n\n<p>The research for this work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany&rsquo;s Excellence Strategy - EXC 2176 &lsquo;Understanding Written Artefacts: Material, Interaction and Transmission in Manuscript Cultures&rsquo;, project no. 390893796. 
The research was conducted within the scope of the Centre for the Study of Manuscript Cultures (CSMC) at Universit&auml;t Hamburg.</p>\n\n<p>We thank Hui Xu for her support in annotating the images.</p>","doi":"10.25592/uhhfdm.17932","keywords":["page detection","word detection","colour recognition","recognition of writing implement","visual navigation","computational visual cataloguing"],"language":"eng","license":{"id":"CC-BY-4.0"},"publication_date":"2025-09-05","related_identifiers":[{"identifier":"10.25592/uhhfdm.17809","relation":"isSupplementedBy","scheme":"doi"},{"identifier":"10.25592/uhhfdm.17613","relation":"isSupplementedBy","scheme":"doi"},{"identifier":"10.25592/uhhfdm.17615","relation":"isSupplementedBy","scheme":"doi"},{"identifier":"10.25592/uhhfdm.17931","relation":"isVersionOf","scheme":"doi"}],"relations":{"version":[{"count":1,"index":0,"is_last":true,"last_child":{"pid_type":"recid","pid_value":"17932"},"parent":{"pid_type":"recid","pid_value":"17931"}}]},"resource_type":{"title":"Dataset","type":"dataset"},"title":"Annotated subset of RDR notebooks for CVC development","version":"1"},"owners":[96],"revision":3,"updated":"2025-09-06T08:28:38.003438+00:00"}
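
The description above gives a sample page annotation, `0 0.499023 0.500776 0.777344 0.816912`. The following is a minimal sketch of how such a line could be converted to a pixel bounding box. It assumes the common YOLO convention of `class x_center y_center width height`, with all four values normalised to the image size. The function name `yolo_to_pixel_box` and the 1000 x 1000 image size are hypothetical and chosen only for illustration; the real dimensions should be read from the matching file in `full_images`.

```python
def yolo_to_pixel_box(line, image_width, image_height):
    """Convert one YOLO annotation line to (class_id, (left, top, right, bottom)).

    Assumes the common YOLO layout: class x_center y_center width height,
    with the four box values normalised to the range [0, 1].
    """
    parts = line.split()
    class_id = int(parts[0])
    x_center, y_center, width, height = (float(v) for v in parts[1:5])

    # Scale the normalised centre and size to pixel corner coordinates.
    left = (x_center - width / 2) * image_width
    top = (y_center - height / 2) * image_height
    right = (x_center + width / 2) * image_width
    bottom = (y_center + height / 2) * image_height
    return class_id, (left, top, right, bottom)


# Sample entry quoted in the dataset description.
sample = "0 0.499023 0.500776 0.777344 0.816912"

# Hypothetical image size, for illustration only.
class_id, box = yolo_to_pixel_box(sample, image_width=1000, image_height=1000)
print(class_id, [round(v, 3) for v in box])
```

Class `0` corresponds to the single class `page` listed in `classes.txt`. The resulting box could then be used to crop the primary page from a full scan, which is presumably how the files in `cropped_images` relate to those in `full_images`.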
