Version 1 of images2pdf

Updated 2016-03-11 08:04:47 by pooryorick

images2pdf is a command-line utility that uses pdf4tcl to store multiple images in a PDF file without converting or otherwise processing the images.

Synopsis

images2pdf ?option value ...? ?images filename ...?

Options

outfile
The name of the output file. By default, this is derived from outprefix.
height
The maximum height of any page. The default is -1, which means the height of each image is used.
width
The maximum width of any page. The default is 800.
image
The name of a file containing an image to include.
outprefix
A prefix to use for any files produced. If not provided, one is chosen: the shortest run of 0 characters that no existing file name in the current directory begins with.
images
Must be the last option provided, as it signifies that the remaining arguments are image filenames.
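
To illustrate how the options above are interpreted, here is a minimal sketch of the key/value parsing, written as a standalone procedure. The name parseArgs and the file names in the example call are made up for illustration; the script in the Implementation section does the same work inline.

```tcl
# Hypothetical helper: splits an argument list into option settings and
# image filenames, following the rules listed above.
proc parseArgs {argv} {
    # Defaults as documented: no height limit, width limit of 800.
    set opts [dict create height -1 width 800]
    set images {}
    while {[llength $argv]} {
        # Take the next key/value pair off the front of the list.
        set argv [lassign $argv key val]
        switch -- $key {
            image {
                lappend images $val
            }
            images {
                # Everything after "images" is an image filename,
                # which is why it must be the last option.
                lappend images $val {*}$argv
                set argv {}
            }
            outfile - outprefix - height - width {
                dict set opts $key $val
            }
            default {
                return -code error [list {unknown option} $key]
            }
        }
    }
    return [list $opts $images]
}

# Example call with made-up file names.
lassign [parseArgs {width 1024 outfile album.pdf images a.jpg b.png}] opts images
puts "options: $opts"
puts "images:  $images"
```

Note that option names are given without a leading dash, and that each option other than images takes exactly one value.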

Description

PYK 2016-03-10: The script on this page inserts jpeg and png images as-is into a pdf document without converting or transcoding the images. To verify this, extract them again with a utility like pdfimages -all that just extracts images without processing them. The extracted images should be identical, bit-for-bit, to the original images.
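
Once the images have been extracted, the bit-for-bit comparison can itself be done in Tcl. The following is a minimal sketch, assuming Tcl 8.6 for try/finally; the procedure name sameBytes is hypothetical, and the two file names are supplied by the caller on the command line.

```tcl
# Hypothetical helper: returns 1 if the two files have identical contents,
# 0 otherwise.
proc sameBytes {a b} {
    # Files of different size cannot be identical.
    if {[file size $a] != [file size $b]} {
        return 0
    }
    # Open both files in binary mode so that no translation is applied.
    set fa [open $a rb]
    set fb [open $b rb]
    try {
        set same [string equal [read $fa] [read $fb]]
    } finally {
        close $fa
        close $fb
    }
    return $same
}

# When run as a script with two file names, report the result.
if {[info exists argv] && [llength $argv] == 2} {
    lassign $argv original extracted
    puts [expr {[sameBytes $original $extracted] ? "identical" : "different"}]
}
```

Each file is read into memory whole, which is adequate for typical image sizes; comparing in chunks would be the obvious change for very large files.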

Each image is output as a separate page, and unless a constraint applies, the width and height of the page are the width and height of the image. To constrain one or the other, use the height or width options; an image that exceeds a limit has both of its dimensions halved until it fits, so the image width/height ratio is preserved (up to integer rounding). The constraint only affects how the image is dynamically resized for presentation, not how the image is stored in the PDF file, which for png and jpg files is always a bit-for-bit identical copy of the original image. Currently, only jpeg and png files are supported.
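
The page-size calculation can be sketched on its own. The procedure name fitDims is hypothetical; the loop is the same halving loop the script uses, with -1 meaning "no limit" for either dimension.

```tcl
# Hypothetical helper: halve both dimensions until they fit within the
# given limits. A limit of -1 means that dimension is unconstrained.
proc fitDims {width height maxwidth maxheight} {
    while {($maxwidth > -1 && $width > $maxwidth)
        || ($maxheight > -1 && $height > $maxheight)} {
        # Integer division: odd values are rounded down.
        set width [expr {$width / 2}]
        set height [expr {$height / 2}]
    }
    return [list $width $height]
}

puts [fitDims 3000 2000 800 -1]   ;# prints: 750 500
puts [fitDims 640 480 800 -1]     ;# prints: 640 480
```

Because the dimensions are halved rather than scaled to the limit exactly, a 3000x2000 image with the default width of 800 ends up on a 750x500 page, not an 800x533 one.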

Some PDF readers don't provide controls to zoom out to a size smaller than the dimensions of the page that contains the image, but they do provide controls to magnify the image, so a default value of 800 is a good choice to ensure that the initial image fits into a reasonable display width, while still making it possible to zoom in for greater detail.

Implementation

#! /usr/bin/env tclsh

package require fileutil::magic::filetype

package require pdf4tcl

proc main {argv0 argv} {
    set dims {}
    set ftypes {}
    set images {}
    set orient {}
    set maxheight -1
    set maxwidth 800
    while {[llength $argv]} {
        set argv [lassign $argv[set argv {}] key val]
        switch $key {
            outfile {
                set outfile $val
            }
            height {
                set maxheight $val
            }
            image {
                lappend images $val
            }
            images {
                lappend images $val {*}$argv[set argv {}]
            }
            outprefix {
                set outprefix $val
            }
            width {
                set maxwidth $val
            }
            default {
                return -code error [list {unknown option} $key]
            }
        }
    }
    if {[info exists outprefix]} {
        while {[file exists $outprefix-[incr outi]]} {}
    } else {
        while {[llength [glob -nocomplain [set outprefix [
            string repeat 0 [incr outi]]]*]]} {}
    }

    pdf4tcl::new mypdf
    foreach image $images {
        set ftype [fileutil::magic::filetype $image]
        if {[string match {JPEG *} $ftype]} {
            set ftype jpeg
        } elseif {[string match {PNG *} $ftype]} {
            set ftype png
        } else {
            return -code error [list {unknown file type} $ftype]
        }
        lappend ftypes $ftype
        # first run is just to get image dimensions
        set id [mypdf addImage $image -type $ftype]
        set width [mypdf getImageWidth $id]
        set height [mypdf getImageHeight $id]
        puts stderr [list image $image type $ftype  height $height width $width]
        while {($maxwidth > -1 && $width > $maxwidth) 
            || ($maxheight > -1 && $height > $maxheight)} {

            set height [expr {$height / 2}]
            set width [expr {$width / 2}]
        }
        lappend dims [list $width $height]
    }
    mypdf destroy

    set imagefiles {}
    set idx -1
    foreach image $images dim $dims ftype $ftypes {
        pdf4tcl::new mypdf -paper $dim
        set id [mypdf addImage $image -type $ftype]
        mypdf putImage $id 0 0 -width [lindex $dim 0] -height [lindex $dim 1]
        set fname $outprefix-[incr idx].pdf
        mypdf write -file $fname 
        lappend imagefiles $fname
        mypdf destroy
    }
    if {[llength $imagefiles] > 1} {
        # choose a default output name before it is needed by catPdf
        if {![info exists outfile]} {
            set outfile $outprefix-[incr idx].pdf
        }
        pdf4tcl::catPdf {*}$imagefiles $outfile
        file delete {*}$imagefiles
    } else {
        if {[info exists outfile]} {
            file rename [lindex $imagefiles 0] $outfile
        }
    }
}

main $argv0 $argv