
First set of candidate veraPDF corpus files delivered

Duff Johnson // May 18, 2015


As the veraPDF project gets under way, it is generating its first test files for PDF/A-1, complementing the Isartor test suite.

Dual Lab, the veraPDF consortium’s lead developer, has uploaded the first set of 49 candidate test files to the public veraPDF GitHub repository.

The test files can be found at the veraPDF corpus for PDF/A-1b (under development) along with the wiki page describing the set.

All test files follow the pattern of the Isartor Test Suite:

  • naming convention refers to the corresponding subsection in ISO 19005-
  • they are all atomic
  • they are self-documented via PDF bookmarks

However, unlike Isartor, these files also contain “pass” tests.
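
As a rough illustration of the naming convention, the sketch below splits a test file name into its parts. The pattern is an assumption inferred only from the single file name quoted in this article (6-1-12-t07-fail-a); the corpus wiki page remains the authoritative description, and `parse_corpus_name` and `CorpusName` are hypothetical names used here for illustration.

```python
import re
from typing import NamedTuple

# Assumed pattern, inferred from the one name quoted in the article:
#   <section digits joined by hyphens>-t<test number>-<pass|fail>-<variant letter>
NAME_RE = re.compile(
    r"^(?P<section>\d+(?:-\d+)*)-t(?P<test>\d+)-(?P<outcome>pass|fail)-(?P<variant>[a-z])$"
)


class CorpusName(NamedTuple):
    section: str   # subsection of the standard, e.g. "6.1.12"
    test: int      # test number within that subsection
    outcome: str   # expected validation result: "pass" or "fail"
    variant: str   # variant letter


def parse_corpus_name(name: str) -> CorpusName:
    """Split a corpus file name (without extension) into its components."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name does not follow the assumed pattern: {name!r}")
    return CorpusName(
        section=match["section"].replace("-", "."),
        test=int(match["test"]),
        outcome=match["outcome"],
        variant=match["variant"],
    )
```

With this sketch, `parse_corpus_name("6-1-12-t07-fail-a")` yields section "6.1.12", test 7, expected outcome "fail" and variant "a". Real corpus names may deviate from the assumed pattern, in which case the function raises `ValueError`.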

One file is particularly noteworthy:

6-1-12-t07-fail-a: The maximum number of indirect objects (8,388,607) in a PDF file is exceeded (the file is about 40 MB zipped)

[Image: screenshot of the “File Being Repaired” dialog]

The document’s cross-reference table contains more than the maximum allowed number of records, violating PDF/A-1 implementation limits.
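
The limit quoted above, 8,388,607, equals 2^23 - 1. A minimal sketch of the corresponding check is shown below. It assumes the number of indirect objects has already been obtained by some other means; `exceeds_object_limit` is a hypothetical helper and is not part of veraPDF.

```python
# Limit as quoted in this article; numerically it is 2**23 - 1.
MAX_INDIRECT_OBJECTS = 8_388_607


def exceeds_object_limit(object_count: int) -> bool:
    """Return True if the given count is above the quoted limit."""
    return object_count > MAX_INDIRECT_OBJECTS
```

A file with exactly 8,388,607 indirect objects is still within the limit under this sketch; one more object exceeds it.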

Warning: Be careful when trying to validate this file in Adobe Acrobat! It will probably open after 30 seconds of thrashing, but it will hang during preflight checks.


ABOUT THE AUTHORS

Duff Johnson

Duff serves the PDF industry as ISO Project co-Leader and US TAG chair for both ISO 32000 (the PDF specification) and ISO 14289 (PDF/UA). As Executive Director of the PDF Association, Duff coordinates several working groups, speaks at a wide variety of industry events and promotes the advancement and adoption of PDF technology worldwide. An independent consultant, Duff Johnson is a veteran …

