[ABU] 29-sep-2017 - tclMuPdf 1.3 released.

'''tclMuPdf''' is a port of the MuPDF framework (see http://mupdf.com%|%mupdf.com%|%) for fast, high-quality rendering of PDF pages.

***History ***

* 13-dec-2016 - '''Version 1.0b1''' (beta) released. No support for MacOS
* 15-dec-2016 - '''Version 1.0''' - Support for MacOS. API unchanged, but big internal optimization for reusing opened pages (read
* 28-jan-2017 - '''Version 1.1 (Withdrawn)''' - new commands: '''fields''', '''field''', '''anchor''', '''mupdf::libinfo'''. Added package mupdf-notk for tcl-only usage (read the documentation). Aligned with core library MuPDF v.1.10a
* 23-feb-2017 - '''Version 1.2 (Withdrawn)''' with a lot of new features:
** commands for extracting and searching text
** commands for extracting images from PDF pages (experimental)
** first steps towards PDF manipulation: you can add new signature fields, then save your changes.
** many minor auxiliary commands
* 26-feb-2017 - '''Version 1.2.1''' - Bug fixing - replaces Versions 1.1 and 1.2
** a bug related to the saveImage command, introduced in 1.1, was fixed here.
** Versions 1.1 and 1.2 were withdrawn
* 29-sep-2017 - '''Version 1.3''' - image extraction
** subcommands for extracting images, previously released as '''experimental.images''', are now official, more powerful and thoroughly tested with many different kinds of images.
** Aligned with core library MuPDF v.1.11

***Download ***

Starting from version 1.2.1, you can download tclMuPdf as a pre-built package with multi-platform support or, if you only need support for a single platform, as a lighter package.

Note that a specific platform support (e.g. "Linux 32") does not refer to the architecture of the host O.S., but to the architecture of the TclTk interpreter. E.g. if you have a 32-bit TclTk interpreter running on a 64-bit Linux, you need the tclMuPdf package for linux-x32.
'''Version 1.3''' (September 2017)

* [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.3/tclMuPdf-full-1.3.zip/download] '''FULL (Win 32/64, Linux 32/64, MacOS)''' (Warning: 26 MB)
* [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.3/tclMuPdf-win-1.3.zip/download] '''(Win 32/64)''' (10 MB)
* [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.3/tclMuPdf-win32-1.3.zip/download] '''(Win 32 only)''' (5 MB)
* [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.3/tclMuPdf-win64-1.3.zip/download] '''(Win 64 only)''' (6 MB)
* [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.3/tclMuPdf-linux-1.3.zip/download] '''(Linux 32/64)''' (12 MB)
* [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.3/tclMuPdf-linux32-1.3.zip/download] '''(Linux 32 only)''' (6 MB)
* [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.3/tclMuPdf-linux64-1.3.zip/download] '''(Linux 64 only)''' (6 MB)
* [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.3/tclMuPdf-mac64-1.3.zip/download] '''(MacOS 64 only)''' (6 MB)
* ---
* [https://sourceforge.net/p/irrational-numbers/code/HEAD/tree/pkgs/tclMuPdf-devkit] tclMuPdf Development-Kit. For developers/maintainers.
'''Version 1.2.1''' (February 2017)

* [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.2.1/tclMuPdf-full-1.2.1.zip/download] '''FULL (Win 32/64, Linux 32/64, MacOS)''' (Warning: 28 MB)
* [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.2.1/tclMuPdf-win-1.2.1.zip/download] '''(Win 32/64)''' (11 MB)
* [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.2.1/tclMuPdf-win32-1.2.1.zip/download] '''(Win 32 only)''' (6 MB)
* [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.2.1/tclMuPdf-win64-1.2.1.zip/download] '''(Win 64 only)''' (6 MB)
* [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.2.1/tclMuPdf-linux-1.2.1.zip/download] '''(Linux 32/64)''' (11 MB)
* [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.2.1/tclMuPdf-linux32-1.2.1.zip/download] '''(Linux 32 only)''' (6 MB)
* [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.2.1/tclMuPdf-linux64-1.2.1.zip/download] '''(Linux 64 only)''' (6 MB)
* [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.2.1/tclMuPdf-mac64-1.2.1.zip/download] '''(MacOS 64 only)''' (6 MB)
* ---
* [https://sourceforge.net/p/irrational-numbers/code/HEAD/tree/pkgs/tclMuPdf-devkit] tclMuPdf Development-Kit. For developers/maintainers.
***Examples***

%|code|output|%
&|$page0 savePNG img0.png -zoom 0.5|[Image tclmupdf-thumb]|&
&|$page2 savePNG x2.png -zoom 2.75 -from 50 450 200 700|[Image tclmupdf-particular]|&

<> Reference manual

----

'''tclMuPDF 1.3''' ''Tcl meets MuPDF''

**SYNOPSIS**

package require '''Tcl 8.5'''

package require '''mupdf ?1.3?'''

* '''mupdf::open''' ''filename''
* ''pdfHandle'' '''fullname'''
* ''pdfHandle'' '''version'''
* ''pdfHandle'' '''npages'''
* ''pdfHandle'' '''getpage''' ''n''
* ''pdfHandle'' '''anchor''' ''anchorName''
* ''pdfHandle'' '''openedpages'''
* ''pdfHandle'' '''closeallpages'''
* ''pdfHandle'' '''fields'''
* ''pdfHandle'' '''field''' ''fieldname''
* ''pdfHandle'' '''signatures'''
* ''pdfHandle'' '''haschanges'''
* ''pdfHandle'' '''export''' ''filename''
* ''pdfHandle'' '''search''' ''needle'' ?'''-startpage''' ''pagenum''? ?'''-max''' ''hits''?
* ''pdfHandle'' '''search..more''' ?'''-max''' ''hits''?
* ''pageHandle'' '''size'''
* ''pageHandle'' '''docref'''
* ''pageHandle'' '''pagenumber'''
* ''pageHandle'' '''savePNG''' ''filename'' ?'''-zoom''' ''zoom''? ?'''-from''' ''x0'' ''y0'' ''x1'' ''y1''?
* ''pageHandle'' '''saveImage''' ''image'' ?'''-zoom''' ''zoom''? ?'''-from''' ''x0'' ''y0'' ''x1'' ''y1''? ?'''-to''' ''x0'' ''y0''?
* ''pageHandle'' '''addsigfield''' ''fieldname'' ''x0'' ''y0'' ''x1'' ''y1''
* ''pageHandle'' '''search''' ''needle'' ?'''-fromtop''' ''true/false''? ?'''-max''' ''hits''?
* ''pageHandle'' '''images''' '''list''' ?'''-id''' ''imageID''?
* ''pageHandle'' '''images''' '''extract''' ?'''-id''' ''imageID''? ?'''-dir''' ''pathname''? ?'''-as''' ''pattern''? ?'''-transparency''' ''boolean''?
* '''mupdf::imagenamesformat''' ?''pattern''?
* '''mupdf::close''' ''handle''
* '''mupdf::quit''' ''pdfHandle''
* '''mupdf::isobject''' ''handle''
* '''mupdf::type''' ''handle''
* '''mupdf::documents'''
* '''mupdf::documentnames'''
* '''mupdf::isopen''' ''filename''
* '''mupdf::libinfo'''

**DESCRIPTION**

Package '''mupdf''' integrates the http://mupdf.com%|%MuPDF%|% framework in Tcl. The focus of http://mupdf.com%|%MuPDF%|% is on speed, small code size, and high-quality anti-aliased rendering.

The main goal of this integration is to generate images of PDF pages, either as .png files or directly in a Tk photo image. Thanks to its speed, '''mupdf''' can be used for building interactive PDF viewers with high-quality, real-time zooming.

'''mupdf''' is a binary package, distributed in a multi-platform bundle, i.e. it can be used on

* Windows 32/64 bit
* Linux 32/64 bit
* MacOS 64 bit

Just an example to get the flavor of how to use '''mupdf''':

======
# open a file and save 1st page as a .png file
package require mupdf
set pdf [mupdf::open /mydir/sample.pdf]
set page [$pdf getpage 0] ;# 0 is the 1st page
$page savePNG /mydir/page0.png
mupdf::close $pdf
======

**MuPDF with and without Tk**

Starting from version 1.1, you can also run '''mupdf''' from a tclsh interpreter, without loading '''Tk'''. The following command

======
package require mupdf-notk
======

can be used in a '''tclsh''' interpreter to load the package without requiring '''Tk''' support. You will still be able to save images as PNG files, but of course the subcommands related to Tk won't be available (e.g. '''saveImage''').

The command

======
package require mupdf
======

loads the full package (and requires '''Tk''').

**mupdf Commands**

'''mupdf''' supports the following commands:

'''mupdf::open''' ''filename'': This is the main command: it opens the PDF file ''filename'' and returns a ''pdfHandle'' to be used in subsequent operations.

''pdfHandle'' '''fullname''': return the fully normalized pathname of the pdf-file.
''pdfHandle'' '''version''': return the document's internal PDF-version.

''pdfHandle'' '''npages''': return the number of pages.

''pdfHandle'' '''getpage''' ''n'': return a ''pageHandle'' to be used in subsequent operations. Note that the first page is page 0. If the requested page is already open, '''getpage''' reuses the handle of the opened page.

''pdfHandle'' '''anchor''' ''anchorName'': return the location of ''anchorName'' as a list of 3 numbers:

* a page number ( -1 if ''anchorName'' is not found )
* x displacement
* y displacement

x and y are hints for displaying the page: they represent the displacement of the top-left corner of the page relative to the top-left corner of the window.

''pdfHandle'' '''openedpages''': return a list of all currently opened ''pageHandles'' related to ''pdfHandle''

''pdfHandle'' '''closeallpages''': close all currently opened pages related to ''pdfHandle''

''pdfHandle'' '''fields''': return a list of field-records. Each field-record is a list of three elements:

* the field-name
* the field-type ('''pushbutton''', '''radiobutton''', '''checkbox''', '''text''', '''combobox''', '''listbox''', '''signature''' or '''unknown''')
* the field-value

Note that for a '''signature''' field, if a signature is present the returned field-value is simply the fixed string '''<>'''.

Warning: field-names with accented characters, or in general with non-ASCII characters, may require special care when used. See the section '''Notes about field-names with accented letters''' for details.

''pdfHandle'' '''field''' ''fieldname'': return the field's value, or raise an error if ''fieldname'' is not a valid field.

Warning: field-names with accented characters, or in general with non-ASCII characters, may require special care when used. See the section '''Notes about field-names with accented letters''' for details.

''pdfHandle'' '''signatures''': return a list of signature field-records.
Each field-record is a list of two elements:

* the field-name
* the field-value

Empty signature fields (blank signatures) have a field-value equal to "" (empty string). Currently, filled signature fields are simply denoted with the fixed string '''<>'''.

''pdfHandle'' '''haschanges''': return '''1''' if ''pdfHandle'' has been changed, otherwise '''0'''.

''pdfHandle'' '''export''' ''filename'': save the current document and its changes to an alternative ''filename''. Note that the target ''filename'' should be different from the original pdf-file related to ''pdfHandle'' and, in general, different from the name of any pdf-file currently opened in this process.

When a ''pdfHandle'' is closed (see '''mupdf::close'''), all the changes are saved to its original pdf-file (see also '''mupdf::quit''' for closing without saving changes).

''pdfHandle'' '''search''' ''needle'' ?'''-startpage''' ''pagenum''? ?'''-max''' ''hits''?: search for the string ''needle'' starting from page-number ''pagenum'' (default is page 0) and return up to ''hits'' results (default is 10). The result of the '''search''' subcommand is a list of ''page-positions''. Each ''page-position'' is a list of two elements:

* the page-number
* a list with the 4 coords of the box enclosing the searched ''needle''

Further results can be retrieved with the '''search..more''' subcommand.

''pdfHandle'' '''search..more''' ?'''-max''' ''hits''?: return a list of the next ''hits'' elements matching the last given ''needle''. The result of the '''search..more''' subcommand is a list of ''page-positions'' similar to the list returned by '''search'''.

''pageHandle'' '''size''': return the physical size of the page as a list of two decimal numbers. Note that the page size is expressed in ''points'', i.e. 1/72 inch.
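Because page sizes are in points (1/72 inch) and, as stated for '''savePNG''', a zoom of 1.0 renders one point as one pixel, the zoom needed for a given resolution can be worked out by hand. The following is only a sketch of that arithmetic: the helper names `zoomForDpi` and `pixelSize` are hypothetical (not part of the package), and rounding up to whole pixels is an assumption, so the real output may differ by a pixel.

```tcl
# Hypothetical helper: zoom factor that yields the requested resolution,
# given that zoom 1.0 corresponds to 72 pixels per inch.
proc zoomForDpi {dpi} {
    expr {$dpi / 72.0}
}

# Hypothetical helper: estimated raster size (in pixels) of a page whose
# size is given in points. Rounding up is an assumption of this sketch.
proc pixelSize {widthPt heightPt zoom} {
    list [expr {int(ceil($widthPt * $zoom))}] \
         [expr {int(ceil($heightPt * $zoom))}]
}

# An A4-like page of 595 x 842 points rendered at 144 dpi
set zoom [zoomForDpi 144]          ;# 2.0
puts [pixelSize 595 842 $zoom]     ;# 1190 1684
```

With a real page, the two sizes would come from `lassign [$page size] w h`, and the computed zoom would be passed to the '''-zoom''' option.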
''pageHandle'' '''docref''': return a reference to the related pdf-document as a ''pdfHandle''

''pageHandle'' '''pagenumber''': return the page number of ''pageHandle''

''pageHandle'' '''savePNG''' ''filename'' ?'''-zoom''' ''zoom''? ?'''-from''' ''x0'' ''y0'' ''x1'' ''y1''?: render the page in a .png file named ''filename''. With the default '''-zoom''' factor of '''1.0''', a page whose size is W x H ''points'' is rendered as a raster image of W x H ''pixels''. If '''-zoom''' is specified, the resulting image size is scaled by a factor of ''zoom''.

By default the whole page is rendered; the '''-from''' option allows you to render only a given rectangular area of the page. ''x0'' ''y0'' are the coords of the top-left corner and ''x1'' ''y1'' are the coords of the bottom-right corner. These coords must be expressed in terms of the physical size of the page, i.e. in ''points''. Note that if these coords lie outside the page, only the ''intersection'' of this area with the page area is rendered.

======
...
set page [$pdf getpage 0] ;# 0 is the 1st page
lassign [$page size] dx dy
# save just the upper half of the page
$page savePNG /mydir/page0.png -zoom 2.25 -from 0 0 $dx [expr {$dy/2}]
mupdf::close $pdf
======

''pageHandle'' '''saveImage''' ''image'' ?'''-zoom''' ''zoom''? ?'''-from''' ''x0'' ''y0'' ''x1'' ''y1''? ?'''-to''' ''x0'' ''y0''?: render the page in an existing Tk photo ''image''. The width and/or height of ''image'' are unchanged if the user has set an explicit image width or height on it (with the -width and/or -height configuration options, respectively). For the '''-zoom''' and '''-from''' options, the same rules as for '''savePNG''' apply. Option '''-to''' allows you to place the resulting raster image at the ''x0'' ''y0'' coords of the destination ''image''. The default is '''-to''' '''0.0''' '''0.0'''.

NOTE: this command is not available with the package '''mupdf-notk'''.
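As an illustration of how '''size''' and '''saveImage''' might be combined in a viewer, here is a minimal sketch that scales a page to a requested pixel width. The proc name `renderToWidth` is hypothetical; the only package calls it relies on are the '''size''' and '''saveImage''' ... '''-zoom''' subcommands described above.

```tcl
# Hypothetical helper: render $page into the Tk photo $img so that the
# rendered page is about $targetWidth pixels wide. Returns the zoom used.
proc renderToWidth {page img targetWidth} {
    # page size is in points; at zoom 1.0 one point maps to one pixel
    lassign [$page size] widthPt heightPt
    set zoom [expr {double($targetWidth) / $widthPt}]
    $page saveImage $img -zoom $zoom
    return $zoom
}

# Possible usage in wish, with the full 'mupdf' package loaded
# (file name is a placeholder):
#   set pdf  [mupdf::open /mydir/sample.pdf]
#   set page [$pdf getpage 0]
#   set img  [image create photo]
#   renderToWidth $page $img 800
#   label .l -image $img ; pack .l
```

Passing the handle in as an argument keeps the helper independent of any global state, so it can be reused when the user zooms or switches page.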
''pageHandle'' '''addsigfield''' ''fieldname'' ''x0'' ''y0'' ''x1'' ''y1'': add a blank signature field in a rectangular box at coords ''x0'' ''y0'' ''x1'' ''y1''. ''fieldname'' must be unique among the existing field names.

''pageHandle'' '''search''' ''needle'' ?'''-fromtop''' ''true/false''? ?'''-max''' ''hits''?: search for the string ''needle'' in the current page and return up to ''hits'' ''positions'' (default is 10). By default the search starts from the top of the page. If you need to find the next hits, use option '''-fromtop''' '''false'''. The result of the '''search''' subcommand is a list of ''positions''. Each ''position'' is a list with the 4 coords of the box enclosing the searched needle.

''pageHandle'' '''images''' '''list''' ?'''-id''' ''imageID''?: return a list of all the images contained in the page referred to by ''pageHandle''. The result of the '''images''' '''list''' subcommand is a list of ''image-records''. Each ''image-record'' is a list of six elements:

* image-ID (unique for each page)
* image's width (in pixels)
* image's height (in pixels)
* image's colorspace (e.g. DeviceRGB, ICCBased, ...)
* image's bits per component (the number of color components may be inferred from the colorspace)
* mask-flag : '''1''' means that the image has a pixel-mask (i.e. some transparent pixels)

If option '''-id''' is present, the resulting list is limited to the ''image-record'' for ''imageID''.

''pageHandle'' '''images''' '''extract''' ?'''-id''' ''imageID''? ?'''-dir''' ''pathname''? ?'''-as''' ''pattern''? ?'''-transparency''' ''boolean''?: if option '''-id''' is specified, extract and save the image referred to by ''imageID'' (see '''images''' '''list'''). If option '''-id''' is missing, all the images contained in the page are extracted and saved.

If option '''-dir''' is not specified, images are saved in the current directory.
If option '''-as''' is not specified, images are saved with a name derived from the default '''mupdf::imagenamesformat''' (see below), otherwise images are saved as ''pattern'' (for the ''pattern'' rules see '''mupdf::imagenamesformat''' below).

If option '''-transparency''' is '''true''', the (semi)transparent pixels (if any) are saved, otherwise transparent pixels (if any) are rendered as white pixels.

'''extract''' returns a list of ''extracted-records''. Each ''extracted-record'' is a list of three elements:

* image-ID (unique for each page)
* image-name (pdf page's internal name)
* saved filename (empty string if the image was skipped, e.g. because of an unknown format)

'''mupdf::imagenamesformat''' ?''pattern''?: a ''pattern'' is a parametric filename specification similar to the format specification used by the C printf function. If ''pattern'' is not specified, return the currently defined default pattern. If ''pattern'' is specified, set the default pattern.

When the "''pageHandle'' '''images''' '''extract''' ..." command is called, all the images are saved with a filename based on a ''pattern''. This pattern can be explicit, if option '''-as''' is present, or implicit, based on the default '''mupdf::imagenamesformat'''.

A ''pattern'' is a simple filename specification (just the base name, since the extension of the extracted images (png, jpg, ...) is determined automatically) with zero or more of the following special symbols:

* %p : page number (first page is 1)
* %P : total number of pages
* %i : image number - images in a page are numbered starting from 1
* %I : total number of images (in the current page)

Special symbols may also be written with a padding-specification like '''%5P'''; this notation means that the symbol %P should be padded to a 5-character string with leading '0's.
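To make the symbol and padding rules above concrete, here is a small pure-Tcl sketch that expands a pattern by hand. It is not the package's own implementation: the proc name `expandPattern` and its argument order are hypothetical, and it covers only the four symbols and the numeric padding described above (no file extension is added).

```tcl
# Hypothetical re-implementation of the pattern rules, for illustration only.
# pageNo and imgNo are 1-based, as in the symbol list above.
proc expandPattern {pattern pageNo nPages imgNo nImages} {
    set value [dict create p $pageNo P $nPages i $imgNo I $nImages]
    set result ""
    set start 0
    # look for the next %<width><symbol> occurrence, e.g. %p or %5P
    while {[regexp -indices -start $start {%([0-9]*)([pPiI])} $pattern whole wIdx sIdx]} {
        lassign $whole first last
        # copy the literal text preceding the symbol
        append result [string range $pattern $start [expr {$first - 1}]]
        set width  [string range $pattern {*}$wIdx]
        set number [dict get $value [string range $pattern {*}$sIdx]]
        if {[scan $width %d w] == 1} {
            append result [format %0*d $w $number]  ;# pad with leading zeros
        } else {
            append result $number                   ;# no padding requested
        }
        set start [expr {$last + 1}]
    }
    # copy whatever follows the last symbol
    append result [string range $pattern $start end]
    return $result
}

# first image of 3 on page number 124 of a 200-page document
puts [expandPattern "Z%5p(%2i)" 124 200 1 3]   ;# Z00124(01)
```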
If more than one image is extracted in a single operation, and ''pattern'' does not contain the '''%i''' symbol (the image number), then '''-%i''' is implicitly appended in order to avoid a filename collision.

Assuming that the current page is page 123 (i.e. the 124th page), the following command

======
$pageHandle images extract -as "IMG_%5p"
======

will generate IMG_00124-1.jpg, IMG_00124-2.jpg ..... (note: the file extension may be different)

======
$pageHandle images extract -as "Z%5p(%2i)"
======

will generate Z00124(01).jpg, Z00124(02).jpg ..... (note: the file extension may be different)

'''mupdf::close''' ''handle'': if ''handle'' refers to a ''pageHandle'', close the page. If ''handle'' refers to a ''pdfHandle'', close the pdf (updating its original pdf-file) and all its opened pages.

'''mupdf::quit''' ''pdfHandle'': close the pdfHandle without saving the changes.

'''mupdf::isobject''' ''handle'': return 1 if ''handle'' is a valid reference to a pdf or a page.

'''mupdf::type''' ''handle'': return '''document''' or '''page''' if ''handle'' is a valid reference, else raise an error.

'''mupdf::documents''' : return a list of the currently opened ''pdfHandles''

'''mupdf::documentnames''' : return a list of the currently opened pdf-filenames (fully normalized filenames).

'''mupdf::isopen''' ''filename'': check if ''filename'' is among the currently opened pdf-files.

'''mupdf::libinfo''' : return specific attributes of the underlying MuPDF library as a list of keywords and their values. The provided keywords are:

'''version''': The version of the underlying MuPDF library

..more to come ..:

**Notes about field-names with accented letters**

Field-names returned by [[''pdfHandle'' '''fields''']] are not standard (Unicode) strings; they are binary strings.
For 'good plain' field names like "City", there is no difference: the returned binary string and the literal (Unicode) string "City" are byte-by-byte identical.

Now let's consider a field-name like '''"Città"''' (Italian for "City"): the returned field name, even if it is ''displayed'' as "Città", is not comparable with the literal Unicode string "Città", because the former byte-array is made of 5 bytes (plus a '\0' string-terminator), whereas the Unicode string, encoded as UTF-8, is made of 6 bytes (plus a '\0' string-terminator).

This may produce strange results when comparing these values; in particular, it may cause the '''field''' subcommand to produce unexpected results.

Let's try with this interactive example:

======
...
set fields [$pdf fields]
# let's suppose the 0-th returned record is about the field "Città"
....
# try to get the field value:
set value [$pdf field Città] ;# --> error !!!
# Workaround: let's take back the fieldname from the $fields list ...
set fieldname [lindex $fields 0 0] ;# field name is the 0-th elem of the 0-th record
puts "$fieldname" ;# --> Città
set value [$pdf field $fieldname] ;# --> ... ok
======

**KEYWORDS**

pdf, photo

**CATEGORY**

pdf parsing and rendering

<> <> PDF | Graphics