tclMuPdf

ABU 20-Dec-2018 - tclMuPdf 1.5 released.

tclMuPdf is a Tcl port of the MuPDF framework (see mupdf.com) for fast, high-quality rendering of PDF pages.

History

  • 13-dec-2016 - Version 1.0b1 (beta) released. No support for MacOS
  • 15-dec-2016 - Version 1.0 - Support for MacOS. API unchanged, but big internal optimization for reusing opened pages (read the documentation).
  • 28-jan-2017 - Version 1.1 (Withdrawn) - new commands: fields, field, anchor, mupdf::libinfo. Added package mupdf-notk for Tcl-only usage (read the documentation). Aligned with core library MuPDF v.1.10a
  • 23-feb-2017 - Version 1.2 (Withdrawn) with a lot of new features:
    • commands for extracting and searching text
    • commands for extracting images from PDF pages (experimental)
    • first steps towards PDF manipulations: you can add new signature fields, then save your changes.
    • many minor auxiliary commands
  • 26-feb-2017 - Version 1.2.1 - Bug fixes - replaces Versions 1.1 and 1.2
    • a bug in the saveImage command, introduced in 1.1, was fixed here.
    • Versions 1.1 and 1.2 were withdrawn
  • 29-sep-2017 - Version 1.3 - image extraction
    • subcommands for extracting images, previously released as experimental.images, are now official, more powerful, and thoroughly tested with many different kinds of images.
    • Aligned with core library MuPDF v.1.11
  • 29-oct-2017 - Version 1.4 - portfolio & password-protected files
    • commands for working with PDF-portfolio (embedded files)
    • ability to work with password-protected PDFs.
    • Bug fix: removed the 130-character limit on the full path name when extracting images
  • 20-Dec-2018 - Version 1.5 - field get&set, new export options, graft & embed pages.
    • Working with fields: you can now also change field values (support for Unicode characters has been fixed)
    • Saving/exporting PDF: added the -decrypt and -flatten options
    • Grafting & embedding pages: you can graft a page taken from a PDF and place it over or below other pages (watermarks, stamps, ...)

Download

Starting from version 1.2.1, you can download tclMuPdf in a pre-built package with multi-platform support or, if you only need support for a single platform, you can download a lighter package.

Note that platform support (e.g. "Linux 32") refers to the architecture of the Tcl/Tk interpreter, not to that of the host operating system. For example, if you run a 32-bit Tcl/Tk interpreter on 64-bit Linux, you need the tclMuPdf package for linux-x32.

Version 1.5 (December 2018)

  • [1 ] FULL (Win 32/64, Linux 32/64, MacOS) (Warning: 33 MB)
  • [2 ] (Win 32/64) (12 MB)
  • [3 ] (Win 32 only) (6 MB)
  • [4 ] (Win 64 only) (6 MB)
  • [5 ] (Linux 32/64) (15 MB)
  • [6 ] (Linux 32 only) (8 MB)
  • [7 ] (Linux 64 only) (7 MB)
  • [8 ] (MacOS 64 only) (7 MB)

Version 1.4 (October 2017)

  • [9 ] FULL (Win 32/64, Linux 32/64, MacOS) (Warning: 28 MB)
  • [10 ] (Win 32/64) (10 MB)
  • [11 ] (Win 32 only) (5 MB)
  • [12 ] (Win 64 only) (6 MB)
  • [13 ] (Linux 32/64) (12 MB)
  • [14 ] (Linux 32 only) (6 MB)
  • [15 ] (Linux 64 only) (6 MB)
  • [16 ] (MacOS 64 only) (6 MB)
  • ---
  • [17 ] tclMuPdf Development-Kit. For developers/maintainers.

Examples

code:   $page0 savePNG img0.png -zoom 0.5
output: (image: tclmupdf-thumb)

code:   $page2 savePNG x2.png -zoom 2.75 -from 50 450 200 700
output: (image: tclmupdf-particular)

  Reference manual

tclMuPDF 1.5 Tcl meets MuPDF

SYNOPSIS

package require Tcl 8.5

package require mupdf ?1.5?

  • mupdf::open filename ?-password password?
  • pdfHandle quit
  • pdfHandle close ?-flatten bool? ?-decrypt bool?
  • pdfHandle fullname
  • pdfHandle authentication
  • pdfHandle version
  • pdfHandle npages
  • pdfHandle getpage n
  • pdfHandle anchor anchorName
  • pdfHandle openedpages
  • pdfHandle closeallpages
  • pdfHandle fields
  • pdfHandle field fieldname
  • pdfHandle field fieldname value
  • pdfHandle signatures
  • pdfHandle haschanges
  • pdfHandle canbesavedincrementally
  • pdfHandle export filename ?-flatten bool? ?-decrypt bool?
  • pdfHandle portfolio list
  • pdfHandle portfolio extract i ?-dir pathname? ?-as filename?
  • pdfHandle search needle ?-startpage pagenum? ?-max hits?
  • pdfHandle search..more ?-max hits?
  • pdfHandle graft pageHandle
  • pageHandle embed graftID ?...options...?
  • pageHandle close
  • pageHandle size
  • pageHandle docref
  • pageHandle pagenumber
  • pageHandle savePNG filename ?-zoom zoom? ?-from x0 y0 x1 y1?
  • pageHandle saveImage image ?-zoom zoom? ?-from x0 y0 x1 y1? ?-to x0 y0?
  • pageHandle addsigfield fieldname x0 y0 x1 y1
  • pageHandle search needle ?-fromtop true/false? ?-max hits?
  • pageHandle images list ?-id imageID?
  • pageHandle images extract ?-id imageID? ?-dir pathname? ?-as pattern? ?-transparency boolean?
  • mupdf::imagenamesformat ?pattern?
  • mupdf::close handle (**DEPRECATED**)
  • mupdf::quit pdfHandle (**DEPRECATED**)
  • mupdf::isobject handle
  • mupdf::type handle
  • mupdf::documents
  • mupdf::documentnames
  • mupdf::isopen filename
  • mupdf::cli_passwordhelper ?helperProc?
  • mupdf::tk_passwordhelper ?helperProc?
  • mupdf::libinfo

DESCRIPTION

Package mupdf integrates the MuPDF framework into Tcl. The focus of MuPDF is on speed, small code size, and high-quality anti-aliased rendering. The main goal of this integration is to render PDF pages as images, either as .png files or directly into Tk photo images. Thanks to its speed, mupdf can be used for building interactive PDF viewers with high-quality, real-time zooming. mupdf is a binary package, distributed in a multi-platform bundle, i.e. it can be used on

  • Windows 32/64 bit
  • Linux 32/64 bit
  • MacOS 64 bit

Just an example to get the flavor of how to use mupdf:

    # open a file and save 1st page as a .png file

    package require mupdf
    set pdf [mupdf::open /mydir/sample.pdf]
    set page [$pdf getpage 0]   ;# 0 is the 1st page
    $page savePNG /mydir/page0.png
    $pdf quit

MuPDF with and without Tk

Starting from version 1.1, you can also run mupdf from a tclsh interpreter, without loading Tk. The following command

package require mupdf-notk

can be used in a tclsh interpreter to load the package without requiring Tk support. You will still be able to save images as PNG files, but of course the subcommands that depend on Tk (e.g. saveImage) won't be available. The command

package require mupdf

loads the full package (and requires Tk).

mupdf Commands

mupdf supports the following commands:

mupdf::open filename ?-password password?
This is the main command: it opens the PDF file filename and returns a pdfHandle to be used in subsequent operations. If filename is password-protected, you may specify a password with the -password option; if -password is not specified, mupdf will ask for a password. Read more about it in the section Working with Password Protected Files.
pdfHandle quit
close and destroy the pdfHandle without saving the changes. All the related resources (e.g. opened pages) will be closed.
pdfHandle close ?-flatten bool? ?-decrypt bool?
close and destroy the pdfHandle and save the changes. All the related resources (e.g. opened pages) will be closed. See export for the meaning of the various options.
pdfHandle fullname
return the fully normalized pathname of the pdf-file.
pdfHandle authentication
return the current authentication mode for pdfHandle. It may be none (no auth required), user (opened with user's password), owner (opened with owner's password).
pdfHandle version
return the document's internal PDF-version.
pdfHandle npages
return the number of pages.
pdfHandle getpage n
return a pageHandle to be used in subsequent operations. Note that the first page is page 0. If the requested page is already open, getpage reuses the handle of the opened page.
pdfHandle anchor anchorName
return the location of anchorName as a list of 3 numbers:
  • a page number ( -1 if anchorName is not found )
  • x displacement
  • y displacement
x and y are hints for locating the anchor: they represent the displacement from the upper-left corner of the page (0,0).
pdfHandle openedpages
return the list of all pageHandles currently opened related to pdfHandle
pdfHandle closeallpages
close all currently opened pages related to pdfHandle
pdfHandle fields
return a list of field-records. Each field-record is a list of three elements:
  • the field-name
  • the field-type (pushbutton, radiobutton, checkbox, text, combobox, listbox, signature or unknown)
  • the field-value
Note that for a signature field, if a signature is present the returned field-value is simply the fixed string <<signature>>.
pdfHandle field fieldname
return the field's value, or raise an error if fieldname is not a valid field. Warning: field names with accented characters, or in general characters from Latin alphabets, are accepted (though not encouraged). Field names with characters from non-Latin alphabets (Greek, Russian, Chinese, ...) are not supported.
pdfHandle field fieldname value
set the field's value, or raise an error if fieldname is not a valid field. Warning: see limitations in Notes and limitations about field names and values for details.
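The fields and field subcommands can be combined to fill several fields at once. The following is only a sketch: fillFields is a hypothetical helper, not part of mupdf, and it assumes that the first element of each field-record is the name accepted by "pdfHandle field".

```tcl
# Hypothetical helper (not part of the mupdf package).
# values is a dict mapping field names to new values.
# Returns the list of names that are not fields of the document,
# instead of letting "field" raise an error on them.
proc fillFields {pdf values} {
    # collect the names of the existing fields
    set known {}
    foreach rec [$pdf fields] {
        lappend known [lindex $rec 0]
    }
    set skipped {}
    dict for {name value} $values {
        if {$name in $known} {
            $pdf field $name $value
        } else {
            lappend skipped $name
        }
    }
    return $skipped
}

# Usage (requires the mupdf package; file and field names are made up):
#   set pdf [mupdf::open /mydir/form.pdf]
#   fillFields $pdf [dict create firstname "Ada" lastname "Lovelace"]
#   $pdf export /mydir/form-filled.pdf -flatten true
#   $pdf quit
```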
pdfHandle signatures
return a list of signature field-records. Each field-record is a list of two elements:
  • the field-name
  • the field-value
Empty signature fields (blank signatures) have a field-value equal to "" (empty string). Currently, filled signature fields are simply denoted with the fixed string <<signature>>.
pdfHandle haschanges
return 1 if pdfHandle has been changed, otherwise 0.
pdfHandle canbesavedincrementally
return 1 if pdfHandle can be saved in incremental-mode, otherwise 0.
pdfHandle export filename ?-flatten bool? ?-decrypt bool?
save the current document and its changes to an alternative filename. Valid options are:
  • -flatten bool - flatten all form fields and annotations (default is false)
  • -decrypt bool - if bool is true, remove the password protection from an encrypted pdf (default is true)
Note that the target filename should be different from the original pdf-file related to pdfHandle and, in general, different from the name of any pdf-file currently opened in this process. When a pdfHandle is closed (see mupdf::close or "pdfHandle close"), all the changes are saved to its original pdf-file (see also mupdf::quit or "pdfHandle quit" for closing without saving changes).
pdfHandle portfolio list
return the list of the embedded files
pdfHandle portfolio extract i ?-dir pathname? ?-as filename?
extract the i-th embedded file ( i >= 0 ). If option -dir is not specified, the embedded file is saved in the current directory. If option -as is not specified, the embedded file is saved with its original name.
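To extract every embedded file, the two portfolio subcommands can be combined in a loop. This is a minimal sketch: extractAllEmbedded is a hypothetical helper, and it assumes that "portfolio list" returns one list element per embedded file, so that the list length gives the valid range of i.

```tcl
# Hypothetical helper (not part of the mupdf package).
# Extracts all the embedded files of pdf into directory dir,
# keeping their original names. Returns the number of files extracted.
# Assumption: "portfolio list" has one element per embedded file.
proc extractAllEmbedded {pdf dir} {
    set n [llength [$pdf portfolio list]]
    for {set i 0} {$i < $n} {incr i} {
        $pdf portfolio extract $i -dir $dir
    }
    return $n
}

# Usage (requires the mupdf package; paths are made up):
#   set pdf [mupdf::open /mydir/portfolio.pdf]
#   puts "extracted [extractAllEmbedded $pdf /mydir/attachments] files"
#   $pdf quit
```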
pdfHandle search needle ?-startpage pagenum? ?-max hits?
search the string needle starting from page-number pagenum (default is page 0) and return up to hits results (default is 10). The result of the search subcommand is a list of page-positions. Each page-position is a list of two elements:
  • the page-number
  • a list with the 4 coords of the box enclosing the searched needle
Further results can be retrieved with the search..more subcommand.
pdfHandle search..more ?-max hits?
return a list of the next hits elements matching the last given needle. The result of search..more subcommand is a list of page-positions similar to the list returned by search.
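The search and search..more subcommands can be chained to collect every match in a document. The sketch below uses a hypothetical helper, allHits, and assumes that search..more returns an empty list once no more matches remain; this behaviour is not stated above and should be verified.

```tcl
# Hypothetical helper (not part of the mupdf package).
# Returns all the page-positions of needle, fetched in batches of chunk hits.
# Assumption: search..more returns an empty list when nothing is left.
proc allHits {pdf needle {chunk 10}} {
    set hits [$pdf search $needle -max $chunk]
    set batch $hits
    while {[llength $batch] > 0} {
        set batch [$pdf search..more -max $chunk]
        lappend hits {*}$batch
    }
    return $hits
}

# Usage (requires the mupdf package):
#   foreach hit [allHits $pdf "invoice"] {
#       lassign $hit pageNum box
#       puts "page $pageNum: box $box"
#   }
```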
pdfHandle graft pageHandle
import the full content of the page referred to by pageHandle into the open document referred to by pdfHandle. The page contents are only imported, not displayed; their placement is controlled by a subsequent embed command. graft returns a graftID to be passed to embed in order to display the image of the source page within one or more pages of the document. Note: this page-image is not a raster image; it is vector-based and hence remains accurate across zoom levels. Note: graftID is unique within each opened document pdfHandle and remains valid until pdfHandle is closed.
pageHandle embed graftID ?...options...?
put a copy of a grafted page in the page referred to by pageHandle. pageHandle should be a page of the same opened document that returned that graftID. Valid options are:
  • -over bool - display the contents of the grafted page over (or below) the contents of the current page (default is true).
  • -zoom factor - enlarge/shrink the grafted image by a factor of factor (default is 1.0).
  • -from x0 y0 ?x1 y1? - copy just a rectangular sub-region of the grafted image. (x0,y0) and (x1,y1) specify diagonally opposite corners of the rectangle; (0,0) is the upper-left corner of the (visible portion of the) grafted page. If x1 and y1 are not specified, the default value is the bottom-right corner of the grafted page.
  • -to x0 y0 ?x1 y1? - place the grafted page in a rectangular sub-region of the destination page. If x0 and y0 are not specified, the grafted page is placed at (0,0), i.e. at the upper-left corner of the destination page. If x1 and y1 are not specified, the default value is the bottom-right corner of the destination page.
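A typical use of graft and embed is stamping a watermark on every page. The following is a sketch under stated assumptions: stampAllPages is a hypothetical helper, and it assumes that a page can be closed right after embed without losing the embedded content.

```tcl
# Hypothetical helper (not part of the mupdf package).
# Grafts stampPage into pdf once, then embeds it over every page.
# Returns the number of pages stamped.
# Assumption: closing a page after "embed" keeps the change in the document.
proc stampAllPages {pdf stampPage} {
    set graftID [$pdf graft $stampPage]
    set n [$pdf npages]
    for {set i 0} {$i < $n} {incr i} {
        set page [$pdf getpage $i]
        $page embed $graftID -over true
        $page close
    }
    return $n
}

# Usage (requires the mupdf package; file names are made up):
#   set doc   [mupdf::open /mydir/report.pdf]
#   set stamp [mupdf::open /mydir/watermark.pdf]
#   stampAllPages $doc [$stamp getpage 0]
#   $doc export /mydir/report-stamped.pdf
#   $doc quit
#   $stamp quit
```

Grafting once and embedding many times reuses the same graftID, which is valid for the whole lifetime of the destination document.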
pageHandle close
close and free all the resources of the page referred to by pageHandle. The handle pageHandle is destroyed.
pageHandle size
return the physical size of the page as a list of two decimal numbers. Note that page size is expressed in points, i.e. 1/72 inch.
pageHandle docref
return a reference to the related pdf-document as a pdfHandle
pageHandle pagenumber
return the page number of pageHandle
pageHandle savePNG filename ?-zoom zoom? ?-from x0 y0 x1 y1?
render the page in a .png file named filename. With the default -zoom factor of 1.0, a page whose size is W x H points is rendered as a raster image of W x H pixels. If -zoom is specified, the resulting image size is scaled by a factor of zoom. By default the whole page is rendered; the -from option allows you to render only a given rectangular area of the page. x0 y0 are the coords of the top-left corner and x1 y1 those of the bottom-right corner. These coords must be expressed in terms of the physical size of the page, i.e. in points. Note that if these coords lie outside the page, only the intersection of this area with the page area is rendered.
  ...
  set page [$pdf getpage 0]   ;# 0 is the 1st page
  lassign [$page size] dx dy
   # save just the upper half of the page
  $page savePNG /mydir/page0.png -zoom 2.25 -from 0 0 $dx [expr {$dy/2}]
  ...
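Since a zoom of 1.0 maps one point to one pixel, and a point is 1/72 inch, a zoom of 1.0 corresponds to 72 pixels per inch. The sketch below turns a target resolution into a zoom factor; zoomForDpi and pixelSize are hypothetical helper names, not part of mupdf, and the rounding in pixelSize is only an estimate of what the library produces.

```tcl
# Hypothetical helpers (not part of the mupdf package).

# zoom 1.0 renders 1 point as 1 pixel and 1 point = 1/72 inch,
# so the zoom needed for a target resolution is dpi/72.
proc zoomForDpi {dpi} {
    return [expr {$dpi / 72.0}]
}

# Estimated raster size (in pixels) of a W x H point page at a given zoom.
# The library may round differently.
proc pixelSize {widthPt heightPt zoom} {
    return [list [expr {round($widthPt * $zoom)}] [expr {round($heightPt * $zoom)}]]
}

# Usage with an opened page (requires the mupdf package):
#   lassign [$page size] w h
#   puts "expected size: [pixelSize $w $h [zoomForDpi 300]]"
#   $page savePNG /mydir/page0-300dpi.png -zoom [zoomForDpi 300]
```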
pageHandle saveImage image ?-zoom zoom? ?-from x0 y0 x1 y1? ?-to x0 y0?
render the page in an existing Tk photo image. The width and/or height of image are left unchanged if the user has explicitly set them (with the -width and/or -height configuration options, respectively). For the -zoom and -from options, the same rules as for savePNG apply. Option -to allows you to place the resulting raster image at the x0 y0 coords of the destination image; the default is -to 0.0 0.0. NOTE: this command is not available with the package mupdf-notk.
pageHandle addsigfield fieldname x0 y0 x1 y1
add a blank signature field in a rectangular box at coords x0 y0 x1 y1. fieldname must be unique among the existing field names.
pageHandle search needle ?-fromtop true/false? ?-max hits?
search the string needle in the current page and return up to hits positions (default is 10). By default the search starts from the top of the page. If you need to find the next hits, use option -fromtop false. The result of the search subcommand is a list of positions. Each position is a list with the 4 coords of the box enclosing the searched needle.
pageHandle images list ?-id imageID?
return a list of all the images contained in the page referred to by pageHandle. The result of the images list subcommand is a list of image-records. Each image-record is a list of six elements:
  • image-ID (unique for each page)
  • image's width (in pixels)
  • image's height (in pixels)
  • image's colorspace (e.g. DeviceRGB, ICCBased, ...)
  • image's bits per component (number of color components may be inferred from colorspace)
  • mask-flag : 1 means that the image has a pixel-mask (i.e. some transparent pixels)
If option -id is present, the resulting list is limited to the image-record for imageID.
pageHandle images extract ?-id imageID? ?-dir pathname? ?-as pattern? ?-transparency boolean?
if option -id is specified, extract and save the image referred to by imageID (see images list). If option -id is missing, all the images contained in the page are extracted and saved. If option -dir is not specified, images are saved in the current directory. If option -as is not specified, images are saved with a name derived from the default mupdf::imagenamesformat (see below); otherwise images are saved as pattern (for pattern rules see mupdf::imagenamesformat below). If option -transparency is true, the (semi)transparent pixels (if any) are saved; otherwise transparent pixels (if any) are rendered as white pixels. extract returns a list of extracted-records. Each extracted-record is a list of three elements:
  • image-ID (unique for each page)
  • image-name (pdf page's internal name)
  • saved filename (empty string if the image was skipped, e.g. because of an unknown format)
mupdf::imagenamesformat ?pattern?
a pattern is a parametric filename specification similar to the format specification used by the C printf function. If pattern is not specified, return the currently defined default pattern. If pattern is specified, set the default pattern. When the "pageHandle images extract ..." command is called, all the images are saved with a filename based on a pattern (this pattern can be explicit, if option -as is present, or implicit, based on the default mupdf::imagenamesformat). A pattern is a simple filename specification (just the base-name, since the extension of the extracted images (png, jpg, ...) is determined automatically) with zero or more special symbols like the following ones:
  • %p : page number (first page is 1)
  • %P : total number of pages
  • %i : image number - images in a page are numbered starting from 1
  • %I : total number of images (in the current page)
Special symbols may also be written with a padding specification like %5P; this notation means that the symbol %P should be padded as a 5-character string with leading '0's. If more than one image is extracted in a single operation and pattern does not contain the %i symbol (the image number), then -%i is implicitly appended in order to avoid a filename collision. Assuming that the current page is page 123 (i.e. the 124th page), the following command
  $pageHandle images extract -as "IMG_%5p"

will generate IMG_00124-1.jpg, IMG_00124-2.jpg ..... (note: the file extension may be different)

  $pageHandle images extract -as "Z%5p(%2i)"

will generate Z00124(01).jpg, Z00124(02).jpg ..... (note: the file extension may be different)

mupdf::close handle (**DEPRECATED**)
This command has been deprecated since mupdf 1.5. Use "pdfHandle close ...." or "pageHandle close". If handle refers to a pageHandle, close the page. If handle refers to a pdfHandle, close the pdf (updating its original pdf-file) and all its opened pages.
mupdf::quit pdfHandle (**DEPRECATED**)
close and destroy the pdfHandle without saving the changes. This command has been deprecated since mupdf 1.5. Use "pdfHandle quit".
mupdf::isobject handle
return 1 if handle is a valid reference to a pdf or a page.
mupdf::type handle
return document or page if handle is a valid reference, else raise an error.
mupdf::documents
return a list of the currently opened pdfHandles
mupdf::documentnames
return a list of pdf-filenames currently opened (fully normalized filenames).
mupdf::isopen filename
check if filename is among the currently opened pdf-files.
mupdf::cli_passwordhelper ?helperProc?
get/set the helper procedure for shell-like applications. If helperProc is "" (empty string), then the default helper is re-set. See section Working with Password Protected Files.
mupdf::tk_passwordhelper ?helperProc?
get/set the helper procedure for applications with a Tk graphical interface. If helperProc is "" (empty string), then the default helper is re-set. See section Working with Password Protected Files.
mupdf::libinfo
return specific attributes of the underlying MuPDF library as a list of keywords and their values. The provided keywords are:
version
The version of the underlying MuPDF library
..more to come ..

Working with Password Protected Files

You can open a password-protected PDF either non-interactively or interactively. In the first, non-interactive way, you must provide a password in advance:

  set pdf [mupdf::open book.pdf -password "123open"]

If the password is wrong, an error is raised; look at the error message and check the errorcode: it should be MUPDF WRONGPASSWORD. Note that there's no distinction between the owner's password and the user's password; if you provide the right owner's password, the PDF is opened in owner-mode, and if you provide the right user's password, the PDF is opened in user-mode. You can check whether the PDF has been opened in user-mode or owner-mode by calling the authentication command.

  set pdf [mupdf::open book.pdf -password "123open"]
   # if we are here, it means that password was OK
  set mode [$pdf authentication]
   # "none" means that book.pdf was not password-protected
   # "user" means that the supplied password was the user's password
   # "owner" means ...
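
A wrong password can be told apart from other failures by inspecting the errorcode. The following sketch works with Tcl 8.5; openWithPassword is a hypothetical helper, and it assumes that the errorcode list starts with the two words MUPDF WRONGPASSWORD, as described above.

```tcl
# Hypothetical helper (not part of the mupdf package).
# Returns a pdfHandle, or "" if the password is wrong.
# Any other error is re-raised unchanged.
# Assumption: the errorcode list starts with MUPDF WRONGPASSWORD.
proc openWithPassword {filename password} {
    if {[catch {mupdf::open $filename -password $password} result opts]} {
        set code [dict get $opts -errorcode]
        if {[lrange $code 0 1] eq {MUPDF WRONGPASSWORD}} {
            return ""
        }
        # not a password problem: propagate the original error
        return -options $opts $result
    }
    return $result
}

# Usage (requires the mupdf package):
#   set pdf [openWithPassword book.pdf "123open"]
#   if {$pdf eq ""} { puts stderr "wrong password for book.pdf" }
```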

The second, interactive way does not require an explicit password; if the PDF is password-protected, mupdf will ask for a password. Depending on the nature of your application (with or without Tk), mupdf will select and call a predefined (yet customizable) helper procedure. These predefined, built-in procedures can be changed with the mupdf::cli_passwordhelper or mupdf::tk_passwordhelper command.

   # template for a (shell-like) password-helper procedure
   proc mypswdhelper {filename} {
       ...  ask for a password ....
       .... get the password ....
       ....
       return $password    
   }
  mupdf::cli_passwordhelper mypswdhelper

   # template for a (GUI) password-helper procedure
   proc myGUIpswdhelper {filename} {
       ...  raise some popup ...
       ...  ask for a password ....
       .... get the password ...
       .... close the popup
       return $password    
   }
  mupdf::tk_passwordhelper myGUIpswdhelper
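
As a concrete illustration of the shell-like template, here is a minimal sketch of a helper that prompts on the terminal. promptPassword is a hypothetical name; the optional channel arguments are there only to make the procedure easy to test, since mupdf is assumed to call the helper with the filename alone, as in the templates above.

```tcl
# Hypothetical helper (not part of the mupdf package).
# Prompts for a password and returns the line typed by the user.
# Note: the typed password is echoed; hiding it is platform-specific.
proc promptPassword {filename {in stdin} {out stderr}} {
    puts -nonewline $out "Password for [file tail $filename]: "
    flush $out
    gets $in password
    return $password
}

# Register it as the shell helper (requires the mupdf package):
#   mupdf::cli_passwordhelper promptPassword
```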

Removing protection from password-protected files

If you open a password-protected file and export it, by default the resulting PDF is saved without a password. You can explicitly control this behaviour by appending the -decrypt option to the export or close commands (default is -decrypt true).

Notes and limitations about field names and values

Due to current limitations of the underlying MuPdf library, field names and values should be limited to latin alphabets only. See the note enclosed with MuPdf's BUG 696478 (Nov 2018):

... "There is, however, a limitation with our form filling support that still needs to be addressed: it only supports PDFDocEncoding, not Unicode.
...   "We can currently fill out the form fields in any language, but the display code still only handles the Latin alphabet."

This means that you can enter text in a form field in any alphabet (tested with Russian, Chinese, Greek, ...) but the rendered text for the non-Latin alphabets will appear wrong. Another major limitation is the rendering of fields other than 'text' widgets, i.e. checkboxes, radiobuttons, ... For these kinds of widgets the support of the underlying library is still incomplete, in particular for values expressed in non-ASCII characters (e.g. accented letters ...).

LIMITATIONS

Full support for portfolio management is still incomplete; commands for adding/removing/reordering embedded files will require a more robust mupdf-core implementation. Support for password management is still incomplete; currently you can open password-protected PDFs, remove passwords, but there's no way to add/change passwords.

KEYWORDS

pdf, photo

CATEGORY

pdf parsing and rendering

COPYRIGHT

 Copyright (c) 2018, by A.Buratti