A PDF Test-Set for Well-Formedness Validation in JHOVE - The Good, the Bad and the Ugly

dc.bibliographicCitation.bookTitle: Proceedings of iPRES Conference, Kyoto, Japan, September 2017 (iPRES 2017) [eng]
dc.contributor.author: Lindlar, Michelle
dc.contributor.author: Tunnat, Yvonne
dc.contributor.author: Wilson, Carl
dc.date.accessioned: 2017-12-22T08:49:00Z
dc.date.available: 2019-06-28T07:29:50Z
dc.date.issued: 2017
dc.description.abstract: Digital preservation and active software stewardship are both cyclical processes. While digital preservation strategies have to be reevaluated regularly to ensure that they still meet technological and organizational requirements, software needs to be tested with every new release to ensure that it functions correctly. JHOVE is an open source format validation tool which plays a central role in many digital preservation workflows and the PDF module is one of its most important features. Unlike tools such as Adobe PreFlight or veraPDF which check against requirements at profile level, JHOVE’s PDF-module is the only tool that can validate the syntax and structure of PDF files. Despite JHOVE’s widespread and long-standing adoption, the underlying validation rules are not formally or thoroughly tested, leading to bugs going undetected for a long time. Furthermore, there is no ground-truth data set which can be used to understand and test PDF validation at the structural level. The authors present a corpus of light-weight files designed to test the validation criteria of JHOVE’s PDF module against “well-formedness”. We conclude by measuring the code coverage of the test corpus within JHOVE PDF validation and by feeding detected inconsistencies of the PDF-module back into the open source development process. [eng]
dc.description.version: publishedVersion [eng]
dc.format: application/pdf
dc.identifier.uri: https://doi.org/10.34657/4797
dc.identifier.uri: https://oa.tib.eu/renate/handle/123456789/1223
dc.language.iso: eng [eng]
dc.publisher: Zenodo [eng]
dc.relation.doi: https://doi.org/10.5281/zenodo.1115541
dc.rights.license: CC BY 4.0 Unported [eng]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [eng]
dc.subject.ddc: 620 [eng]
dc.subject.other: digital preservation [eng]
dc.subject.other: file format validation [eng]
dc.subject.other: test data [eng]
dc.subject.other: quality assurance [eng]
dc.subject.other: JHOVE [eng]
dc.title: A PDF Test-Set for Well-Formedness Validation in JHOVE - The Good, the Bad and the Ugly [eng]
dc.type: ConferenceObject [eng]
dc.type: Text [eng]
tib.accessRights: openAccess [eng]
wgl.contributor: TIB [eng]
wgl.contributor: ZBW [eng]
wgl.subject: Ingenieurwissenschaften [eng]
wgl.type: Konferenzbeitrag [eng]
Files
Original bundle
Name: 35.pdf
Size: 152.3 KB
Format: Adobe Portable Document Format