Converting a two-pages-per-sheet PDF to one-page-per-sheet

If you have a PDF scan of a book with two pages per sheet, such as this one (incidentally, this is a manuscript dating from 1449: “Tractato de septe peccati mortali” by Frate Antonino, image copyright Houghton Library, Harvard University, Cambridge, Mass.)

and you wish to convert it to a one-page-per-sheet PDF:

then you can use these steps on Debian Linux:

  1. Query the PDF for the number of pages and the resolution:

    pdfinfo ugly.pdf

    Look at the “Pages” line in the output of this command. Now type:

    pdftoppm -gray -l 1 ugly.pdf test

    Then inspect the resulting test-001.pgm file with an image editor to find its dimensions in pixels. I got 223 pages and an image size of 1650 x 1275 pixels, so these numbers are used in the following steps; adapt them to your own results.
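
    If you would rather not open an image editor, both numbers can be read on the command line. The following is only a sketch: the function names page_count and pnm_size are hypothetical helpers, not part of poppler, and pnm_size assumes the plain header layout (magic number, then width and height, then the maximum value) with no comment lines in between.

    ```shell
    # Hypothetical helpers; the names are made up for this example.

    # Print the page count from `pdfinfo` output supplied on stdin.
    page_count() {
        awk '/^Pages:/ { print $2 }'
    }

    # Print "WIDTH HEIGHT" in pixels from the header of a PGM/PPM file.
    # Assumes the header is "magic / width height / maxval" on separate
    # lines, with no comment lines before the dimensions.
    pnm_size() {
        {
            read -r magic
            read -r width height
        } < "$1"
        printf '%s %s\n' "$width" "$height"
    }

    # Usage, with the file names from this step:
    #   pdfinfo ugly.pdf | page_count
    #   pnm_size test-001.pgm
    ```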

  2. Create a bash script to process a single page:

    cat >
    page=`printf '%03d' $1`
    pagenew=`printf '%03d_' $1`
    gs -sDEVICE=pdfwrite -dNOPAUSE -dQUIET -dBATCH -dFirstPage=$1 -dLastPage=$1 -sOutputFile="$page.pdf" ugly.pdf
    pdftoppm -gray "$page.pdf" > "$page.pgm"
    convert -crop 825x1275 "$page.pgm" "$pagenew.pgm"
    rm "$page.pgm"
    chmod u+x

    Note that for the width in the -crop geometry of the convert command, I enter half (825) of the horizontal size found above (1650); in this way each pgm is split into a left and a right half. The pdftoppm command also has a -mono option to produce monochrome images, and a -r option to set the resolution.
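
    The crop geometry can also be computed from the image size instead of being typed in by hand. This is a minimal sketch; half_geometry is a hypothetical helper name. It rounds the half width up, on the assumption that you want exactly two tiles: with an odd width, a rounded-down value would make convert emit a third tile one pixel wide.

    ```shell
    # Hypothetical helper: print the -crop geometry for one half of a
    # two-page scan that is WIDTH x HEIGHT pixels.
    # The half width is rounded up so an odd width still gives two tiles.
    half_geometry() {
        width=$1
        height=$2
        printf '%dx%d\n' $(( (width + 1) / 2 )) "$height"
    }

    # Usage with the numbers from step 1:
    #   convert -crop "$(half_geometry 1650 1275)" "$page.pgm" "$pagenew.pgm"
    ```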

  3. Run the bash script on all pages:

    seq 223 | xargs -n1 ./

  4. Finally, concatenate the pages to get the converted PDF:

    convert *_.pgm nice.pdf

    or do that in two steps:

    for i in *_.pgm; do convert -compress fax "$i" "$(basename "$i" .pgm).pdf"; done
    gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=nice.pdf *_.pdf
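
    Both variants of this step rely on the shell expanding the glob in sorted order, which is why the script in step 2 zero-pads the page number. The '%03d' format is enough for up to 999 pages. For longer documents the padding could be derived from the page count, as in this sketch (page_name is a hypothetical helper name):

    ```shell
    # Hypothetical helper: zero-pad page number $1 to as many digits as
    # the total page count $2 has, so that glob order equals page order.
    # Pass plain decimal numbers (as produced by seq), without leading zeros.
    page_name() {
        printf "%0${#2}d\n" "$1"
    }

    # Usage:
    #   page=$(page_name "$1" 223)
    ```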