[ABU] 10-August-2021 - tclMuPDF 2.1 released.

'''tclMuPDF''' is a port of the MuPDF framework (see http://mupdf.com%|%mupdf.com%|%) to Tcl, for fast and high-quality rendering of [PDF] pages.

<>History
* 13-dec-2016 - '''Version 1.0b1''' (beta) released. No support for MacOS
* 15-dec-2016 - '''Version 1.0''' - Support for MacOS. API unchanged, but big internal optimization for reusing opened pages (read the documentation)
* 28-jan-2017 - '''Version 1.1 (Withdrawn)''' - new commands: '''fields''', '''field''', '''anchor''', '''mupdf::libinfo'''. Added package mupdf-notk for tcl-only usage (read the documentation). Aligned with core library MuPDF v.1.10a
* 23-feb-2017 - '''Version 1.2 (Withdrawn)''' with a lot of new features:
** commands for extracting and searching text
** commands for extracting images from PDF pages (experimental)
** first steps towards PDF manipulation: you can add new signature fields, then save your changes.
** many minor auxiliary commands
* 26-feb-2017 - '''Version 1.2.1''' - bug fixes - replaces Versions 1.1 and 1.2
** a bug related to the saveImage command, introduced in 1.1, was fixed here.
** Versions 1.1 and 1.2 were withdrawn
* 29-sep-2017 - '''Version 1.3''' - image extraction
** subcommands for extracting images, previously released as '''experimental.images''', are now official, more powerful and deeply tested with many different kinds of images.
** Aligned with core library MuPDF v.1.11
* 29-oct-2017 - '''Version 1.4''' - portfolio & password-protected files
** commands for working with PDF-portfolio (embedded files)
** ability to work with password-protected PDFs.
** BUG-FIX removed the 130-character limit on the full path name used when extracting images
* 20-Dec-2018 - '''Version 1.5''' - field get&set, new export options, graft & embed pages.
** Working with fields: now you can also change field values (with bug-fixed support for Unicode characters)
** Saving/exporting PDF: added the -decrypt and -flatten options
** Grafting & embedding pages: you can graft a page taken from a PDF and put it over/below other pages (watermarks, stamps, ...)
* 8-Jul-2020 - '''Version 1.6''' - passwords, forms & fields, annotations, blocks & lines ...
** Password protected files
*** added commands for adding/changing passwords on a PDF. See methods \[''pdfHandle'' '''upwd'''] and \[''pdfHandle'' '''opwd''']
** Forms and Fields
*** added commands for changing field attributes. See method \[''pdfHandle'' '''fieldattrib''']
*** added commands for flattening single fields. See method \[''pdfHandle'' '''flattenfield''']
** Annotations
*** added methods for listing, adding, changing and removing annotations. Currently only a limited set of annotation types is supported: '''highlight''', '''underline''', '''strikeout''', '''squiggly'''. See methods \[''pageHandle'' '''annots'''] and \[''pageHandle'' '''annot '''...]
** Page and text structure
*** added methods for extracting the image/text "blocks" and the bounding box of each text line. These methods are the basic building blocks - along with the new Annotation methods - for building an interactive PDF editor. See methods \[''pageHandle'' '''blocks'''] and \[''pageHandle'' '''lines''']
* 11-mar-2021 - '''Version 2.0''' New major release (breaks compatibility with 1.x)
* 10-aug-2021 - '''Version 2.1''' (using MuPDF-core 1.18 features as of 2021-08-08)
** text extraction
*** added options for extracting text and block/line bounding boxes.
** BUG-FIX - fixed PDFColorToTkColor()
<>

*** Download ***

Starting from version 1.2.1, you can download tclMuPdf as a pre-built package with multi-platform support or, if you only need support for a single platform, as a lighter single-platform package. Note that a specific platform support (e.g. "Linux 32") does not refer to the hosting O.S.
architecture, but it's referred to the architecture of the TclTk interpreter. E.g. if you have a 32-bit TclTk interpreter running on a 64-bit Linux, you need the tclMuPdf package for linux-x32. '''Version 2.1''' (Aug 2021) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPDF-2.1/tclMuPdf-full-2.1.zip/download] '''FULL (Win x64, Linux x64, MacOS)''' (Warning: 20 MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPDF-2.1/tclMuPdf-win64-2.1.zip/download] '''Windows x64''' (7 MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPDF-2.1/tclMuPdf-linux64-2.1.zip/download] '''Linux x64''' (8 MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPDF-2.1/tclMuPdf-mac64-2.1.zip/download] '''Mac OS''' (8 MB) '''Version 1.6''' (July 2020) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.6/tclMuPdf-full-1.6.zip/download] '''FULL (Win 32/64, Linux 32/64, MacOS)''' (Warning: 33MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.6/tclMuPdf-win-1.6.zip/download]'''(Win 32/64)''' (12MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.6/tclMuPdf-win32-1.6.zip/download] '''(Win 32 only)''' (6 MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.6/tclMuPdf-win64-1.6.zip/download] '''(Win 64 only)''' (6 MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.6/tclMuPdf-linux-1.6.zip/download] '''(Linux 32/64)''' (15MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.6/tclMuPdf-linux32-1.6.zip/download] '''(Linux 32 only)''' (8 MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.6/tclMuPdf-linux64-1.6.zip/download] '''(Linux 64 only)''' (7MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.6/tclMuPdf-mac64-1.6.zip/download] '''(MacOS 64 only)''' (7 MB) '''Version 1.5''' (December 2018) * 
[https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.5/tclMuPdf-full-1.5.zip/download] '''FULL (Win 32/64, Linux 32/64, MacOS)''' (Warning: 33MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.5/tclMuPdf-win-1.5.zip/download] '''(Win 32/64)''' (12MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.5/tclMuPdf-win32-1.5.zip/download] '''(Win 32 only)''' (6 MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.5/tclMuPdf-win64-1.5.zip/download] '''(Win 64 only)''' (6 MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.5/tclMuPdf-linux-1.5.zip/download] '''(Linux 32/64)''' (15MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.5/tclMuPdf-linux32-1.5.zip/download] '''(Linux 32 only)''' (8 MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.5/tclMuPdf-linux64-1.5.zip/download] '''(Linux 64 only)''' (7MB) * [https://sourceforge.net/projects/irrational-numbers/files/tclMuPdf-1.5/tclMuPdf-mac64-1.5.zip/download] '''(MacOS 64 only)''' (7 MB) * --- * [https://sourceforge.net/p/irrational-numbers/code/HEAD/tree/pkgs/tclMuPdf-devkit/trunk] tclMuPdf version 1.x Development-Kit. For developers/maintainers. * [https://sourceforge.net/p/irrational-numbers/code/HEAD/tree/pkgs/tclMuPdf-devkit/branches/2.x] tclMuPdf version 2.x Development-Kit. For developers/maintainers. *** Examples *** %|code|output|% &|$page0 savePNG img0.png -zoom 0.5|[Image tclmupdf-thumb]|& &|$page2 savePNG x2.png -zoom 2.75 -from 50 450 200 700|[Image tclmupdf-particular]|& <> Reference manual - tclMuPDF 1.6 ---- '''tclMuPDF 1.6''' ''Tcl meets MuPDF'' Tcl meets MuPDF **SYNOPSIS** package require '''Tcl 8.5''' package require '''mupdf ?1.6?''' * '''mupdf::open''' ''filename'' ?'''-password''' ''password''? 
* ''pdfHandle'' '''upwd''' ''user-password'' * ''pdfHandle'' '''opwd''' ''owner-password'' * ''pdfHandle'' '''quit''' * ''pdfHandle'' '''close''' ?'''-flatten''' ''bool''? ?'''-decrypt''' ''bool''? * ''pdfHandle'' '''fullname''' * ''pdfHandle'' '''authentication''' * ''pdfHandle'' '''version''' * ''pdfHandle'' '''npages''' * ''pdfHandle'' '''getpage''' ''n'' * ''pdfHandle'' '''anchor''' ''anchorName'' * ''pdfHandle'' '''openedpages''' * ''pdfHandle'' '''closeallpages''' * ''pdfHandle'' '''fields''' * ''pdfHandle'' '''field''' ''fieldname'' * ''pdfHandle'' '''field''' ''fieldname'' ''value'' * ''pdfHandle'' '''fieldattrib''' ''fieldname'' * ''pdfHandle'' '''fieldattrib''' ''fieldname'' ?''options''? * ''pdfHandle'' '''flattenfield''' ''fieldname'' ?...? * ''pdfHandle'' '''signatures''' * ''pdfHandle'' '''haschanges''' * ''pdfHandle'' '''canbesavedincrementally''' * ''pdfHandle'' '''export''' ''filename'' ?'''-flatten''' ''bool''? ?'''-decrypt''' ''bool''? * ''pdfHandle'' '''portfolio''' '''list''' * ''pdfHandle'' '''portfolio''' '''extract''' ''i'' ?'''-dir''' ''pathname''? ?'''-as''' ''filename''? * ''pdfHandle'' '''search''' ''needle'' ?'''-startpage''' ''pagenum''? ?'''-max''' ''hits''? * ''pdfHandle'' '''search..more''' ?'''-max''' ''hits''? * ''pdfHandle'' '''graft''' ''pageHandle'' * ''pageHandle'' '''embed''' ''graftID'' ?...options...? * ''pageHandle'' '''annots''' * ''pageHandle'' '''annot''' ''annotID'' * ''pageHandle'' '''annot''' ''annotID'' ''attribute'' * ''pageHandle'' '''annot''' ''annotID'' ''attribute'' ''value'' ?''attribute'' ''value'' ...? * ''pageHandle'' '''annot''' ''annotID'' '''delete''' * ''pageHandle'' '''annot''' '''create''' ''type'' ?''attribute'' ''value'' ...? * ''pageHandle'' '''blocks''' * ''pageHandle'' '''close''' * ''pageHandle'' '''size''' * ''pageHandle'' '''docref''' * ''pageHandle'' '''lines''' * ''pageHandle'' '''pagenumber''' * ''pageHandle'' '''savePNG''' ''filename'' ?'''-zoom''' ''zoom''? 
?'''-from''' ''x0'' ''y0'' ''x1'' ''y1''? * ''pageHandle'' '''saveImage''' ''image'' ?'''-zoom''' ''zoom''? ?'''-from''' ''x0'' ''y0'' ''x1'' ''y1''? ?'''-to''' ''x0'' ''y0''? * ''pageHandle'' '''addsigfield''' ''fieldname'' ''x0'' ''y0'' ''x1'' ''y1'' * ''pageHandle'' '''search''' ''needle'' ?'''-fromtop''' ''true/false''? ?'''-max''' ''hits''? * ''pageHandle'' '''images''' '''list''' ?'''-id''' ''imageID''? * ''pageHandle'' '''images''' '''extract''' ?'''-id''' ''imageID''? ?'''-dir''' ''pathname''? ?'''-as''' ''pattern''? ?'''-transparency''' ''boolean''? * '''mupdf::imagenamesformat''' ?''pattern''? * '''mupdf::close''' ''handle'' (**DEPRECATED**) * '''mupdf::quit''' ''pdfHandle'' (**DEPRECATED**) * '''mupdf::isobject''' ''handle'' * '''mupdf::type''' ''handle'' * '''mupdf::documents''' * '''mupdf::documentnames''' * '''mupdf::isopen''' ''filename'' * '''mupdf::cli_passwordhelper''' ?'''helperProc'''? * '''mupdf::tk_passwordhelper''' ?'''helperProc'''? * '''mupdf::libinfo''' **DESCRIPTION** Package '''mupdf''' integrates the http://mupdf.com%|%MuPDF%|% framework in Tcl. The focus of http://mupdf.com%|%MuPDF%|% is on speed, small code size, and high-quality anti-aliased rendering. The main goal of this integration is to generate images of the pdf pages, in a .png format, or directly in a Tk's photo image type. Thanks to its speed '''mupdf''' can be used for building interactive pdf-viewers with high-quality and real-time zooming. '''mupdf''' is a binary package, distributed in a multi-platform bundle, i.e. 
it can be used on
* Windows 32/64 bit
* Linux 32/64 bit
* MacOS 64 bit

A short example to give the flavor of '''mupdf''':

======
# open a file and save 1st page as a .png file
package require mupdf
set pdf [mupdf::open /mydir/sample.pdf]
set page [$pdf getpage 0] ;# 0 is the 1st page
$page savePNG /mydir/page0.png
$pdf quit
======

**MuPDF with and without Tk**

Starting from version 1.1, you can also run '''mupdf''' from a tclsh interpreter, without loading '''Tk'''. The following command

======
package require mupdf-notk
======

can be used in a '''tclsh''' interpreter to load the package without requiring '''Tk''' support. You will still be able to save images as PNG files, but subcommands that depend on Tk (e.g. '''saveImage''') won't be available.

The command

======
package require mupdf
======

loads the full package (and requires '''Tk''').

**mupdf Commands**

'''mupdf''' supports the following commands:

'''mupdf::open''' ''filename'' ?'''-password''' ''password''?: This is the main command: it opens the PDF-file ''filename'' and returns a ''pdfHandle'' to be used in subsequent operations. If ''filename'' is password-protected, you may supply a ''password'' with the option '''-password'''; if option '''-password''' is not specified, '''mupdf''' will ask for a password. Read more about it in the section '''Working with Password Protected Files'''.

''pdfHandle'' '''upwd''' ''user-password'':
''pdfHandle'' '''opwd''' ''owner-password'': add or change the user-password or the owner-password, respectively. To reset a password, set it to "". Read more about it in the section '''Working with Password Protected Files'''.

''pdfHandle'' '''quit''': close and destroy the ''pdfHandle'' without saving the changes. All the related resources (e.g. opened pages) will be closed.

''pdfHandle'' '''close''' ?'''-flatten''' ''bool''? ?'''-decrypt''' ''bool''?: close and destroy the ''pdfHandle'' and save the changes. All the related resources (e.g.
opened pages) will be closed. See '''export''' for the meaning of the various options.

''pdfHandle'' '''fullname''': return the fully normalized pathname of the pdf-file.

''pdfHandle'' '''authentication''': return the current authentication mode for ''pdfHandle''. It may be '''none''' (no authentication required), '''user''' (opened with the user's password) or '''owner''' (opened with the owner's password).

''pdfHandle'' '''version''': return the document's internal PDF-version.

''pdfHandle'' '''npages''': return the number of pages.

''pdfHandle'' '''getpage''' ''n'': return a ''pageHandle'' to be used in subsequent operations. Note that the first page is page 0. If the requested page is already open, '''getpage''' reuses the handle of the opened page.

''pdfHandle'' '''anchor''' ''anchorName'': return the location of ''anchorName'' as a list of 3 numbers:
* a page number ( -1 if ''anchorName'' is not found )
* x displacement
* y displacement
x and y are hints for locating the anchor: they represent the displacement from the upper-left corner of the page (0,0).

''pdfHandle'' '''openedpages''': return the list of all ''pageHandles'' currently opened for ''pdfHandle''.

''pdfHandle'' '''closeallpages''': close all currently opened pages of ''pdfHandle''.

''pdfHandle'' '''fields''': return a list of field-records. Each field-record is a list of three elements:
* the field-name
* the field-type ('''button''', '''radiobutton''', '''checkbox''', '''text''', '''combobox''', '''listbox''', '''signature''' or '''unknown''')
* the field-value
Note that for a '''signature''' field, if a signature is present the returned field-value is simply the fixed string '''<>'''.

''pdfHandle'' '''field''' ''fieldname'': return the field's value, or raise an error if ''fieldname'' is not a valid field. Warning: field-names with accented characters, or in general characters from Latin alphabets, are accepted (though not encouraged).
Field-names with characters from non-Latin alphabets (Greek, Russian, Chinese, ...) are not supported.

''pdfHandle'' '''field''' ''fieldname'' ''value'': set the field's value, or raise an error if ''fieldname'' is not a valid field. Warning: see '''Notes and limitations about field names and values''' for details.

''pdfHandle'' '''fieldattrib''' ''fieldname'': get all the field's attributes as a list of ''options''/''values'', or raise an error if ''fieldname'' is not a valid field. Currently supported ''options'' are:
* '''-readonly'''

''pdfHandle'' '''fieldattrib''' ''fieldname'' ?''options''?: set the field's attributes, or raise an error if ''fieldname'' is not a valid field. Currently supported ''options'' are:
* '''-readonly''' ''bool''

''pdfHandle'' '''flattenfield''' ''fieldname'' ?...?: flatten the appearances of all the listed fields (and remove them from the field-table). Note that flattening N fields with a single command is far more efficient than running N flattening commands; this is especially true for documents with many pages.

''pdfHandle'' '''signatures''': return a list of signature field-records. Each field-record is a list of two elements:
* the field-name
* the field-value
Empty signature fields (blank signatures) have a field-value equal to "" (empty string). Currently, filled signature fields are simply denoted with the fixed string '''<>'''.

''pdfHandle'' '''haschanges''': return '''1''' if ''pdfHandle'' has been changed, otherwise '''0'''.

''pdfHandle'' '''canbesavedincrementally''': return '''1''' if ''pdfHandle'' can be saved in incremental-mode, otherwise '''0'''.

''pdfHandle'' '''export''' ''filename'' ?'''-flatten''' ''bool''? ?'''-decrypt''' ''bool''?: save the current document and its changes under an alternative ''filename''.
Valid options are:
* '''-flatten''' ''bool'' - (**DEPRECATED**) flatten all form fields and annotations (default is '''false''')
* '''-decrypt''' ''bool'' - (**DEPRECATED**) if ''bool'' is '''true''', remove the password protection from an encrypted pdf. If ''bool'' is '''false''', keep the original password protection.
Note that the target ''filename'' should be different from the original pdf-file related to ''pdfHandle'' and, in general, different from the name of any pdf-file currently opened in this process.
When a ''pdfHandle'' is closed (see '''mupdf::close''' or "''pdfHandle'' '''close'''"), all the changes are saved to its original pdf-file (see also '''mupdf::quit''' or "''pdfHandle'' '''quit'''" for closing without saving changes).
Saving with the '''-flatten''' option is now (**DEPRECATED**). You should use "''pdfHandle'' '''flattenfield''' ''fieldname1'' ..." and then save/export the document.
Saving with the '''-decrypt''' option is now (**DEPRECATED**). The default is to remove the password protection, unless the '''opwd''' or '''upwd''' commands explicitly set new passwords.

''pdfHandle'' '''portfolio''' '''list''': return the list of the embedded files.

''pdfHandle'' '''portfolio''' '''extract''' ''i'' ?'''-dir''' ''pathname''? ?'''-as''' ''filename''?: extract the ''i-th'' embedded file ( i >= 0 ). If option '''-dir''' is not specified, the embedded file is saved in the current directory. If option '''-as''' is not specified, the embedded file is saved with its original name.

''pdfHandle'' '''search''' ''needle'' ?'''-startpage''' ''pagenum''? ?'''-max''' ''hits''?: search for the string ''needle'' starting from page-number ''pagenum'' (default is page 0), returning up to ''hits'' results (default is 10). The result of the '''search''' subcommand is a list of ''page-positions''.
Each ''page-position'' is a list of two elements:
* the page-number
* a list with the 4 coords of the box enclosing the searched ''needle''
Further results can be retrieved with the '''search..more''' subcommand.

''pdfHandle'' '''search..more''' ?'''-max''' ''hits''?: return a list of the next ''hits'' elements matching the last given ''needle''. The result of the '''search..more''' subcommand is a list of ''page-positions'' similar to the list returned by '''search'''.

''pdfHandle'' '''graft''' ''pageHandle'': import into the currently open document referred to by ''pdfHandle'' the full content of the page referred to by ''pageHandle''. The page contents are only imported, not displayed; displaying them is controlled by a subsequent '''embed''' command. The result of '''graft''' is a ''graftID'' to be passed to '''embed ...''' in order to display the image of the source page within one or more pages of the document.
Note: this page-image is not a raster-image; it is vector based and hence remains accurate across zooming levels.
Note: ''graftID'' is unique within each ''opened'' document ''pdfHandle'' and is valid until ''pdfHandle'' is closed.

''pageHandle'' '''embed''' ''graftID'' ?...options...?: put a copy of a grafted page in the page referred to by ''pageHandle''. ''pageHandle'' should be a page of the same opened document that returned that ''graftID''. Valid options are:
* '''-over''' ''bool'' - display the contents of the grafted page over (or below) the contents of the current page (default is '''true''').
* '''-angle''' ''theta'' - rotate the grafted image by ''theta'' degrees (-360.0 .. +360.0). Default is '''0.0'''.
* '''-zoom''' ''factor'' - enlarge/shrink the grafted image by a factor of ''factor'' (default is '''1.0''').
* '''-from''' ''x0'' ''y0'' ?''x1'' ''y1''? - just copy a rectangular sub-region of the grafted image.
(x0,y0) and (x1,y1) specify diagonally opposite corners of the rectangle; (0,0) is the upper-left corner of the (visible portion of the) grafted page. If ''x1'' and ''y1'' are not specified, the default value is the bottom-right corner of the grafted page.
* '''-to''' ''x0'' ''y0'' ?''x1'' ''y1''? - place the grafted page in a rectangular subregion of the destination page. If ''x0'' and ''y0'' are not specified, the grafted page is placed at (0,0), i.e. at the upper-left corner of the destination page. If ''x1'' and ''y1'' are not specified, the default value is the bottom-right corner of the destination page.

''pageHandle'' '''annots''': return the list of annotation IDs on ''pageHandle''.

''pageHandle'' '''annot''' ''annotID'': return the list of the (so far recognized) attributes of ''annotID''. Two special read-only attributes, '''-type''' and '''-rect''', are provided for each annotation.

''pageHandle'' '''annot''' ''annotID'' ''attribute'': get the value of ''attribute'' for the annotation ''annotID''.

''pageHandle'' '''annot''' ''annotID'' ''attribute'' ''value'' ?''attribute'' ''value'' ...?: set the value of one or more attributes of the annotation ''annotID''.

''pageHandle'' '''annot''' ''annotID'' '''delete''': delete the annotation ''annotID''.

''pageHandle'' '''annot''' '''create''' ''type'' ?''attribute'' ''value'' ...?: create a new annotation. Currently supported types are: '''highlight''', '''underline''', '''strikeout''', '''squiggly'''.
Common attributes for each supported type are:
* -color { R G B } - R, G, B must be decimal numbers between 0.0 and 1.0
Specific options for '''highlight''', '''underline''', '''strikeout''', '''squiggly''' are:
* -vertices {''x0'' ''y0'' ''x1'' ''y1'' ... } - a list of 4*N integers (four per box) denoting the bounding boxes of the text to be marked. Note: these coordinates are not necessarily related to the text on a page.
A new ''annotID'' is returned.
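The annotation subcommands can be combined with the page-level '''search''' subcommand to mark every occurrence of a word on a page. The following is only a sketch: the procedure names `boxesToVertices` and `highlightAll` are hypothetical, and it assumes that the boxes returned by '''search''' can be passed unchanged as the '''-vertices''' value (this manual does not say whether the coordinates must be integral).

```tcl
# Hypothetical helper: flatten a list of {x0 y0 x1 y1} boxes into the
# single coordinate list expected by the -vertices attribute.
proc boxesToVertices {boxes} {
    set vertices {}
    foreach box $boxes {
        if {[llength $box] != 4} {
            error "expected a box of 4 coords, got \"$box\""
        }
        lappend vertices {*}$box
    }
    return $vertices
}

# Hypothetical helper: highlight up to $max occurrences of $needle on a page.
# Returns the new annotation ID, or "" if nothing was found.
# Assumption: the search boxes are acceptable as -vertices values.
proc highlightAll {page needle {max 10}} {
    set boxes [$page search $needle -max $max]
    if {[llength $boxes] == 0} {
        return ""
    }
    return [$page annot create highlight \
        -color {1.0 1.0 0.0} \
        -vertices [boxesToVertices $boxes]]
}

# Usage (requires the mupdf package and an existing PDF file):
#   package require mupdf
#   set pdf  [mupdf::open /mydir/sample.pdf]
#   set page [$pdf getpage 0]
#   highlightAll $page "Tcl"
#   $pdf close   ;# close saves the changes to the original file
```

Keeping the flattening step in its own procedure makes it easy to adjust (for example, to round the coordinates) should the library turn out to require integers.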
''pageHandle'' '''blocks''': return the list of image/text blocks in ''pageHandle''. A block is a list of 5 elements with the following format: '''textblock'''|'''imageblock''' ''x0'' ''y0'' ''x1'' ''y1''

''pageHandle'' '''close''': close and free all the resources of the page referred to by ''pageHandle''. The handle ''pageHandle'' is destroyed.

''pageHandle'' '''size''': return the physical size of the page as a list of two decimal numbers. Note that the page size is expressed in ''points'', i.e. 1/72 inch.

''pageHandle'' '''docref''': return a reference to the related pdf-document as a ''pdfHandle''.

''pageHandle'' '''lines''': return the list of the bounding boxes of the text lines in ''pageHandle''. A bbox is a list of 4 elements with the following format: ''x0'' ''y0'' ''x1'' ''y1''

''pageHandle'' '''pagenumber''': return the page number of ''pageHandle''.

''pageHandle'' '''savePNG''' ''filename'' ?'''-zoom''' ''zoom''? ?'''-from''' ''x0'' ''y0'' ''x1'' ''y1''?: render the page to a .png file named ''filename''. With the default '''-zoom''' factor of '''1.0''', a page whose size is W x H ''points'' is rendered as a raster image of W x H ''pixels''. If '''-zoom''' is specified, the resulting image size is scaled by a factor of ''zoom''. By default the whole page is rendered; the '''-from''' option allows you to render only a given rectangular area of the page. ''x0'' ''y0'' are the coords of the top-left corner and ''x1'' ''y1'' those of the bottom-right corner. These coords must be expressed in terms of the physical size of the page, i.e. in ''points''. Note that if these coords lie outside the page, only the ''intersection'' of this area with the page area is rendered.

======
# ...
set page [$pdf getpage 0] ;# 0 is the 1st page
lassign [$page size] dx dy
# save just the upper half of the page
$page savePNG /mydir/page0.png -zoom 2.25 -from 0 0 $dx [expr {$dy/2}]
# ...
======

''pageHandle'' '''saveImage''' ''image'' ?'''-zoom''' ''zoom''?
?'''-from''' ''x0'' ''y0'' ''x1'' ''y1''? ?'''-to''' ''x0'' ''y0''?: render the page into an existing Tk photo ''image''. The width and/or height of ''image'' are unchanged if the user has set an explicit image width or height on it (with the -width and/or -height configuration options, respectively). For the '''-zoom''' and '''-from''' options, the same rules as for '''savePNG''' apply. Option '''-to''' allows you to place the resulting raster image at the ''x0'' ''y0'' coords of the destination ''image''. The default is '''-to''' '''0.0''' '''0.0'''.
NOTE: this command is not available with the package '''mupdf-notk'''.

''pageHandle'' '''addsigfield''' ''fieldname'' ''x0'' ''y0'' ''x1'' ''y1'': add a blank signature field in a rectangular box at coords ''x0'' ''y0'' ''x1'' ''y1''. ''fieldname'' must be unique among the existing field names.

''pageHandle'' '''search''' ''needle'' ?'''-fromtop''' ''true/false''? ?'''-max''' ''hits''?: search for the string ''needle'' in the current page and return up to ''hits'' ''positions'' (default is 10). By default the search starts from the top of the page. If you need to find the next hits, use option '''-fromtop''' '''false'''. The result of the '''search''' subcommand is a list of ''positions''. Each ''position'' is a list with the 4 coords of the box enclosing the searched needle.

''pageHandle'' '''images''' '''list''' ?'''-id''' ''imageID''?: return a list of all the images contained in the page referred to by ''pageHandle''. The result of the '''images''' '''list''' subcommand is a list of ''image-records''. Each ''image-record'' is a list of six elements:
* image-ID (unique for each page)
* image's width (in pixels)
* image's height (in pixels)
* image's colorspace (e.g. DeviceRGB, ICCBased, ...)
* image's bits per component (the number of color components may be inferred from the colorspace)
* mask-flag : '''1''' means that the image has a pixel-mask (i.e.
some transparent pixels)
If option '''-id''' is present, the resulting list is limited to the ''image-record'' for ''imageID''.

''pageHandle'' '''images''' '''extract''' ?'''-id''' ''imageID''? ?'''-dir''' ''pathname''? ?'''-as''' ''pattern''? ?'''-transparency''' ''boolean''?: if option '''-id''' is specified, extract and save the image referred to by ''imageID'' (see '''images''' '''list'''). If option '''-id''' is missing, all the images contained in the page are extracted and saved. If option '''-dir''' is not specified, images are saved in the current directory. If option '''-as''' is not specified, images are saved with a name derived from the default '''mupdf::imagenamesformat''' (see below), otherwise images are saved as ''pattern'' (for ''pattern'' rules see '''mupdf::imagenamesformat''' below). If option '''-transparency''' is '''true''', the (semi)transparent pixels (if any) are saved as such, otherwise transparent pixels (if any) are rendered as white pixels.
'''extract''' returns a list of ''extracted-records''. Each ''extracted-record'' is a list of three elements:
* image-ID (unique for each page)
* image-name (pdf page's internal name)
* saved filename (empty string if the image was skipped (unknown format...))

'''mupdf::imagenamesformat''' ?''pattern''?: a ''pattern'' is a parametric filename specification similar to the format specification used by the C printf function. If ''pattern'' is not specified, return the currently defined default pattern. If ''pattern'' is specified, set the default pattern.
When the "''pageHandle'' '''images''' '''extract''' ..." command is called, all the images are saved with a filename based on a ''pattern'' (this pattern can be explicit, if option '''-as''' is present, or implicit, based on the default '''mupdf::imagenamesformat''').
A ''pattern'' is a simple filename specification (just the base-name, since the extension of the extracted images (png, jpg, ...) is automatically determined)
with zero or more special symbols like the following ones:
* %p : page number (first page is 1)
* %P : total number of pages
* %i : image number - images in a page are numbered starting from 1
* %I : total number of images (in the current page)
Special symbols may also be written with a padding-specification like '''%5P'''; this notation means that the symbol %P should be padded to a 5-character string with leading '0's.
If more than one image is extracted in a single operation and ''pattern'' does not contain the '''%i''' symbol (the image number), then '''-%i''' is implicitly appended in order to avoid a filename collision.
Assuming that the current page is page 123 (i.e. the 124th page), the following command

======
$pageHandle images extract -as "IMG_%5p"
======

will generate IMG_00124-1.jpg, IMG_00124-2.jpg ..... (note: the file extension may be different)

======
$pageHandle images extract -as "Z%5p(%2i)"
======

will generate Z00124(01).jpg, Z00124(02).jpg ..... (note: the file extension may be different)

'''mupdf::close''' ''handle'' (**DEPRECATED**): This command has been deprecated since mupdf 1.5. Use "''pdfHandle'' '''close''' ''....''" or "''pageHandle'' '''close'''". If ''handle'' refers to a ''pageHandle'', close the page. If ''handle'' refers to a ''pdfHandle'', close the pdf (updating its original pdf-file) and all its opened pages.

'''mupdf::quit''' ''pdfHandle'' (**DEPRECATED**): close and destroy the ''pdfHandle'' without saving the changes. This command has been deprecated since mupdf 1.5. Use "''pdfHandle'' '''quit'''".

'''mupdf::isobject''' ''handle'': return 1 if ''handle'' is a valid reference to a pdf or a page.

'''mupdf::type''' ''handle'': return '''document''' or '''page''' if ''handle'' is a valid reference, else raise an error.

'''mupdf::documents''': return a list of the currently opened ''pdfHandles''.

'''mupdf::documentnames''': return a list of the pdf-filenames currently opened (fully normalized filenames).
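The handle-inspection commands ('''mupdf::isobject''', '''mupdf::type''', '''mupdf::documents''') make it possible to clean up at the end of a script without tracking every handle by hand. A minimal sketch, relying only on the behaviour described in this manual; the procedure names `describeHandle` and `quitAllDocuments` are hypothetical:

```tcl
# Hypothetical helper: describe a handle without raising an error.
# mupdf::type raises an error for invalid handles, so the handle is
# checked with mupdf::isobject first.
proc describeHandle {handle} {
    if {![mupdf::isobject $handle]} {
        return "invalid"
    }
    return [mupdf::type $handle]   ;# "document" or "page"
}

# Hypothetical helper: discard every open document without saving.
# Returns the number of documents that were closed.
proc quitAllDocuments {} {
    set count 0
    foreach pdf [mupdf::documents] {
        $pdf quit   ;# quit also closes the opened pages, dropping changes
        incr count
    }
    return $count
}

# Usage (requires the mupdf package):
#   package require mupdf
#   set pdf [mupdf::open /mydir/sample.pdf]
#   puts [describeHandle $pdf]
#   quitAllDocuments
```

Replace '''quit''' with '''close''' in the loop if pending changes should be written back to the original files instead of being discarded.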
'''mupdf::isopen''' ''filename'': check if ''filename'' is among the currently opened pdf-files.

'''mupdf::cli_passwordhelper''' ?'''helperProc'''?: get/set the helper procedure for shell-like applications. If '''helperProc''' is "" (the empty string), the default helper is restored. See section '''Working with Password Protected Files'''.

'''mupdf::tk_passwordhelper''' ?'''helperProc'''?: get/set the helper procedure for applications with a Tk graphical interface. If '''helperProc''' is "" (the empty string), the default helper is restored. See section '''Working with Password Protected Files'''.

'''mupdf::libinfo''': return specific attributes of the underlying MuPDF library as a list of keywords and their values. The provided keywords are:
'''version''': the version of the underlying MuPDF library
(more to come)

**Working with Password Protected Files**

You can open a password-protected PDF in a non-interactive or in an interactive way. In the first, non-interactive way, you must provide a password in advance:

======
set pdf [mupdf::open book.pdf -password "123open"]
======

If the password is wrong, an error is raised; look at the error message and check the errorcode: it should be '''MUPDF WRONGPASSWORD'''.
Note that '''-password''' makes no distinction between the owner's password and the user's password: if you provide the right owner's password, the PDF is opened in owner-mode; if you provide the right user's password, the PDF is opened in user-mode. You can check whether the PDF has been opened in user-mode or owner-mode by calling the '''authentication''' command.

======
set pdf [mupdf::open book.pdf -password "123open"]
# if we are here, it means that password was OK
set mode [$pdf authentication]
# "none" means that book.pdf was not password-protected
# "user" means that the supplied password was the user's password
# "owner" means ...
======

The second, interactive way does not require you to provide an explicit password; if the PDF is password-protected, '''mupdf''' will ask for a password.
Depending on the nature of your application (with or without Tk), '''mupdf''' will select and call a predefined (yet customizable) helper procedure.
These predefined, built-in procedures can be changed with the '''mupdf::cli_passwordhelper''' or '''mupdf::tk_passwordhelper''' command.

======
# template for a (shell-like) password-helper procedure
proc mypswdhelper {filename} {
    # ask for a password ...
    puts -nonewline "Password for $filename: "
    flush stdout
    # ... get the password and return it
    gets stdin password
    return $password
}
mupdf::cli_passwordhelper mypswdhelper

# template for a (GUI) password-helper procedure
proc myGUIpswdhelper {filename} {
    # raise some popup asking for a password ...
    set ::pswd ""
    toplevel .pswd
    wm title .pswd "Password for [file tail $filename]"
    pack [entry .pswd.e -show * -textvariable ::pswd]
    pack [button .pswd.ok -text OK -command {destroy .pswd}]
    # ... wait until the popup is closed, then return the password
    tkwait window .pswd
    return $::pswd
}
mupdf::tk_passwordhelper myGUIpswdhelper
======

**Removing protection from password-protected files**

If you open a password-protected file and export it, by default the resulting PDF is saved without a password. You can explicitly control this behaviour by appending the '''-decrypt''' option to the '''export''' or '''close''' commands (default is '''-decrypt''' '''true''').

**Adding or changing a password**

After opening a PDF, you can add or change the user/owner password with the '''upwd''' or the '''opwd''' commands, as in the following example:

======
set pdf [mupdf::open mybook.pdf]
# if mybook.pdf is protected, insert the password
$pdf upwd "my user-password"
$pdf opwd "my owner-password"
$pdf close  ;# ..or  $pdf export ...
======

**Notes and limitations about field names and values**

Due to current limitations of the underlying MuPDF library, field names and values should be limited to Latin alphabets only.
See the note enclosed with MuPDF's BUG 696478 (Nov 2018):

'' ... "There is, however, a limitation with our form filling support that still needs to be addressed: it only supports PDFDocEncoding, not Unicode. ''

'' ... "We can currently fill out the form fields in any language, but the display code still only handles the Latin alphabet."
''

This means that you can enter text in a form field in any alphabet (tested with Russian, Chinese, Greek, ...), but the rendered text for non-Latin alphabets will appear wrong.

Another major limitation is the rendering of fields other than 'text' widgets, i.e. checkboxes, radiobuttons... For these kinds of widgets the support of the underlying library is still incomplete, in particular for values expressed in non-ASCII characters (e.g. accented letters ...).

**Limitations**

Full support for '''portfolio''' management is still incomplete; commands for adding/removing/reordering embedded files will require a more robust mupdf-core implementation.

**KEYWORDS**

pdf, photo

**CATEGORY**

pdf parsing and rendering

**COPYRIGHT**

Copyright (c) 2020, by A.Buratti

<> Reference manual - tclMuPDF 2.1
----
'''tclMuPDF 2.1'''

Tcl meets MuPDF

**SYNOPSIS**

package require '''Tcl 8.6'''

package require '''MuPDF ?2.1?'''

   * '''mupdf::open''' ''filename'' ?'''-password''' ''password''?
   * ''pdfObj'' '''authentication'''
   * ''pdfObj'' '''upwd''' ''user-password''
   * ''pdfObj'' '''opwd''' ''owner-password''
   * ''pdfObj'' '''removepassword'''
   * ''pdfObj'' '''quit'''
   * ''pdfObj'' '''close'''
   * ''pdfObj'' '''fullname'''
   * ''pdfObj'' '''version'''
   * ''pdfObj'' '''anchor''' ''anchorName''
   * ''pdfObj'' '''npages'''
   * ''pdfObj'' '''getpage''' ''n''
   * ''pdfObj'' '''openedpages'''
   * ''pdfObj'' '''ispageopened''' ''pageNum''
   * ''pdfObj'' '''closepage''' ''pageNum''
   * ''pdfObj'' '''closeallpages'''
   * ''pdfObj'' '''fields'''
   * ''pdfObj'' '''field''' ''fieldname''
   * ''pdfObj'' '''field''' ''fieldname'' ''value''
   * ''pdfObj'' '''fieldattrib''' ''fieldname''
   * ''pdfObj'' '''fieldattrib''' ''fieldname'' ?''options''?
   * ''pdfObj'' '''flattenfield''' ''fieldname'' ?...?
   * ''pdfObj'' '''signatures'''
   * ''pdfObj'' '''addsigfield''' ''fieldname'' ''pageNumber'' ''x0'' ''y0'' ''x1'' ''y1''
   * ''pdfObj'' '''haschanges'''
   * ''pdfObj'' '''export''' ''filename''
   * ''pdfObj'' '''portfolio''' '''list'''
   * ''pdfObj'' '''portfolio''' '''extract''' ''i'' ?'''-dir''' ''pathname''? ?'''-as''' ''filename''?
   * ''pdfObj'' '''newsearch'''
   * ''pdfObj'' '''graft''' ''pageObj''
   * ''pdfObj'' '''grafts'''
   * ''pdfObj'' '''embed''' ''graftID'' ''pageNumber'' ?...options...?
   * ''searchObj'' '''destroy'''
   * ''searchObj'' '''find''' ''searchStr'' ?'''-max''' ''hits''? ?'''-currpageonly''' ''boolean''?
   * ''searchObj'' '''currpage'''
   * ''searchObj'' '''currpage''' ''n''
   * ''searchObj'' '''docref'''
   * ''pageObj'' '''annots'''
   * ''pageObj'' '''annot''' '''get''' ''annotID''
   * ''pageObj'' '''annot''' ''annotID''
   * ''pageObj'' '''annot''' '''get''' ''annotID'' ''attribute''
   * ''pageObj'' '''annot''' ''annotID'' ''attribute''
   * ''pageObj'' '''annot''' '''set''' ''annotID'' ''attribute'' ''value'' ?''attribute'' ''value'' ...?
   * ''pageObj'' '''annot''' ''annotID'' ''attribute'' ''value'' ?''attribute'' ''value'' ...?
   * ''pageObj'' '''annot''' '''delete''' ''annotID'' ?''annotID'' ...?
   * ''pageObj'' '''annot''' '''flatten''' ''annotID'' ?''annotID'' ...?
   * ''pageObj'' '''annot''' '''create''' ''type'' ?''attribute'' ''value'' ...?
   * ''pageObj'' '''blocks'''
   * ''pageObj'' '''close'''
   * ''pageObj'' '''destroy'''
   * ''pageObj'' '''size'''
   * ''pageObj'' '''docref'''
   * ''pageObj'' '''pagenumber'''
   * ''pageObj'' '''lines'''
   * ''pageObj'' '''text''' ?'''-bbox''' '''none'''|'''blocks'''|'''lines'''?
   * ''pageObj'' '''savePNG''' ''filename'' ?'''-zoom''' ''zoom''? ?'''-from''' ''x0'' ''y0'' ''x1'' ''y1''?
   * ''pageObj'' '''saveImage''' ''image'' ?'''-zoom''' ''zoom''? ?'''-from''' ''x0'' ''y0'' ''x1'' ''y1''? ?'''-to''' ''x0'' ''y0''?
   * ''pageObj'' '''images''' '''list''' ?'''-id''' ''imageID''?
   * ''pageObj'' '''images''' '''extract''' ?'''-id''' ''imageID''?
?'''-dir''' ''pathname''? ?'''-as''' ''pattern''? ?'''-transparency''' ''boolean''?
   * '''mupdf::imagenamesformat''' ?''pattern''?
   * '''mupdf::isobject''' ''obj''
   * '''mupdf::classes'''
   * '''mupdf::classinfo''' ''obj''
   * '''mupdf::documents'''
   * '''mupdf::documentnames'''
   * '''mupdf::isopen''' ''filename''
   * '''mupdf::Doc''' '''names'''
   * '''mupdf::Page''' '''names'''
   * '''mupdf::TextSearch''' '''names'''
   * '''mupdf::cli_passwordhelper''' ?'''helperProc'''?
   * '''mupdf::tk_passwordhelper''' ?'''helperProc'''?
   * '''mupdf::libinfo'''

**DESCRIPTION**

Package '''MuPDF''' integrates the http://mupdf.com%|%MuPDF%|% framework in Tcl.

The focus of http://mupdf.com%|%MuPDF%|% is on speed, small code size, and high-quality anti-aliased rendering.

The main goal of this integration is to generate images of the pdf pages, either as .png files or directly as Tk photo images.

MuPDF also provides partial support for editing PDFs: you can fill form fields and work with annotations.

Thanks to its speed, '''MuPDF''' can be used for building interactive pdf-viewers with high-quality, real-time zooming.

'''MuPDF''' is a binary package, distributed in a multi-platform bundle, i.e. it can be used on
   * Windows 64 bit
   * Linux 64 bit
   * MacOS 64 bit

A short example gives the flavor of how to use '''MuPDF''':

======
# open a file and save 1st page as a .png file
package require tclMuPDF
set pdf [mupdf::open /mydir/sample.pdf]
set page [$pdf getpage 0]  ;# 0 is the 1st page
$page savePNG /mydir/page0.png
$pdf quit
======

***MuPDF 2.x and mupdf 1.x***

Please note that MuPDF-2.x is a new major release that breaks backward compatibility with 1.x.
To support the new features of this release, and those planned for the next ones, MuPDF-2.x has been totally redesigned.
In the Appendix you can find a quick summary of the changes and how to adapt your old code.
See '''From mupdf 1.x to MuPDF 2.x'''.

***MuPDF with and without Tk***

'''MuPDF''' is provided in two variants:
   * '''tkMuPDF''' (or simply '''MuPDF''') is the full package; it requires '''Tk'''.
   * '''tclMuPDF''' does not require '''Tk'''. You will still be able to save images as PNG files, but of course some subcommands related to Tk won't be available (e.g. '''saveImage''').

**MuPDF Commands**

The MuPDF package is based on two main classes, '''mupdf::Doc''' and '''mupdf::Page''', whose instances correspond to a PDF-document and its pages, plus an auxiliary class, '''mupdf::TextSearch''', whose purpose is to manage progressive searches across all the pages of a document.

***mupdf::Doc methods***

   '''mupdf::open''' ''filename'' ?'''-password''' ''password''?:   This is the main command: it opens the PDF-file ''filename'' and returns a ''pdfObj'' to be used in subsequent operations. If ''filename'' is password-protected, you may specify a ''password'' by adding the option '''-password'''; if option '''-password''' is not specified, '''MuPDF''' will ask for a password. Read more about it in the section '''Working with Password Protected Files'''.
   ''pdfObj'' '''authentication''':   return the current authentication mode for ''pdfObj''. It may be '''none''' (no auth required), '''user''' (opened with user's password), '''owner''' (opened with owner's password).
   ''pdfObj'' '''upwd''' ''user-password'':
   ''pdfObj'' '''opwd''' ''owner-password'':   add or change the user-password or the owner-password, respectively. To reset a password, set it to "". Read more about it in the section '''Working with Password Protected Files'''.
   ''pdfObj'' '''removepassword''':   reset both the user-password and the owner-password.
   ''pdfObj'' '''quit''':   close and destroy the ''pdfObj'' without saving the changes. All the related resources (e.g. opened pages ..) will be destroyed.
   ''pdfObj'' '''close''':   save the changes (if any) and then close and destroy the ''pdfObj''. All the related resources (e.g.
opened pages ..) will be destroyed.
   ''pdfObj'' '''fullname''':   return the fully normalized pathname of the pdf-file.
   ''pdfObj'' '''version''':   return the document's internal PDF-version.
   ''pdfObj'' '''anchor''' ''anchorName'':   return the location of ''anchorName'' as a list of 3 numbers:
   * a page number ( -1 if ''anchorName'' is not found )
   * x displacement
   * y displacement
x and y are hints for locating the anchor: they represent the displacement from the upper-left corner of the page (0,0).
   ''pdfObj'' '''npages''':   return the number of pages.
   ''pdfObj'' '''getpage''' ''n'':   return a ''pageObj'' to be used in subsequent operations. See '''mupdf::Page methods'''. Note that the first page is page 0. Note that if the requested page is currently opened, '''getpage''' reuses the handle of the opened page.
   ''pdfObj'' '''openedpages''':   return the list of all currently opened pages (as page-numbers) related to ''pdfObj''.
   ''pdfObj'' '''ispageopened''' ''pageNum'':   return 1 if page ''pageNum'' is currently opened, else 0.
   ''pdfObj'' '''closepage''' ''pageNum'':   close the page referred to by ''pageNum''. No error is returned if ''pageNum'' does not refer to an opened page.
   ''pdfObj'' '''closeallpages''':   close all currently opened pages related to ''pdfObj''.
   ''pdfObj'' '''fields''':   return a list of field-records. Each field-record is a list of three elements:
   * the field-name
   * the field-type ('''button''', '''radiobutton''', '''checkbox''', '''text''', '''combobox''', '''listbox''', '''signature''' or '''unknown''')
   * the field-value
Note that for a '''signature''' field, if a signature is present the returned field-value is simply the fixed string '''<>'''.
   ''pdfObj'' '''field''' ''fieldname'':   return the field's value, or raise an error if ''fieldname'' is not a valid field. Warning: field-names with accented characters, or in general characters from Latin alphabets, are accepted (though not encouraged).
Field-names with characters from non-Latin alphabets (Greek, Russian, Chinese, ...) are not supported.
   ''pdfObj'' '''field''' ''fieldname'' ''value'':   set the field's value, or raise an error if ''fieldname'' is not a valid field. Warning: see '''Notes and limitations about field names and values''' for details.
   ''pdfObj'' '''fieldattrib''' ''fieldname'':   get all the field's attributes, as a list of ''options''/''values'', or raise an error if ''fieldname'' is not a valid field. Currently supported ''options'' are:
   * '''-readonly'''
   ''pdfObj'' '''fieldattrib''' ''fieldname'' ?''options''?:   set the field's attributes, or raise an error if ''fieldname'' is not a valid field. Currently supported ''options'' are:
   * '''-readonly''' ''bool''
   ''pdfObj'' '''flattenfield''' ''fieldname'' ?...?:   flatten the appearances of all the listed fields (and remove the listed fields from the field-table). Note that flattening N fields with a single command is far more efficient than running N flattening commands; this is especially true for documents with many pages.
   ''pdfObj'' '''signatures''':   return a list of signature field-records. Each field-record is a list of two elements:
   * the field-name
   * the field-value
Empty signature fields (blank signatures) have a field-value equal to "" (empty string). Currently, filled signature fields are simply denoted with the fixed string '''<>'''.
   ''pdfObj'' '''addsigfield''' ''fieldname'' ''pageNumber'' ''x0'' ''y0'' ''x1'' ''y1'':   add a blank signature field on page number ''pageNumber'' in a rectangular box at coords ''x0'' ''y0'' ''x1'' ''y1''. ''fieldname'' must be unique among the existing field names.
   ''pdfObj'' '''haschanges''':   return '''1''' if ''pdfObj'' has been changed, otherwise '''0'''.
   ''pdfObj'' '''export''' ''filename'':   save the current document and its changes in an alternative ''filename''.
Note that in general, ''filename'' should be different from the name of any pdf-file currently opened in this process. The only exception is when ''filename'' is the same as the name of ''pdfObj'', but in this case you MUST quit ''pdfObj'' just after export.
   ''pdfObj'' '''portfolio''' '''list''':   return the list of the embedded files.
   ''pdfObj'' '''portfolio''' '''extract''' ''i'' ?'''-dir''' ''pathname''? ?'''-as''' ''filename''?:   extract the ''i-th'' embedded file ( i >= 0 ). If option '''-dir''' is not specified, the embedded file is saved in the current directory. If option '''-as''' is not specified, the embedded file is saved with its original name.
   ''pdfObj'' '''newsearch''':   return a new ''searchObj'' object for performing text-search on the ''pdfObj'' document. See '''mupdf::TextSearch methods'''.
   ''pdfObj'' '''graft''' ''pageObj'':   import into the currently open document referred to by ''pdfObj'' the full content of the page referred to by ''pageObj''. The page contents are just imported, not displayed, and should be inserted in ''pdfObj'' via a subsequent '''embed''' command. The result of '''graft''' is a ''graftID'' that should be used by the next '''embed ...''' command for inserting the source page within one or more pages of the document. Note: this page-image is not a raster-image; it's vector based and hence remains accurate across zooming levels. Note: ''graftID'' is unique for each ''opened'' document ''pdfObj'' and it's valid until ''pdfObj'' is closed.
   ''pdfObj'' '''grafts''':   return the list of the currently grafted pages.
   ''pdfObj'' '''embed''' ''graftID'' ''pageNumber'' ?...options...?:   embed a copy of a grafted page in the page referred to by ''pageNumber''. Valid options are:
   * '''-over''' ''bool'' - display the contents of the grafted page over (or below) the contents of the current page (default is '''true''').
   * '''-angle''' ''theta'' - rotate the grafted image by ''theta'' degrees (-360.0 ..+360.0). Default is '''0.0'''.
   * '''-zoom''' ''factor'' - enlarge/shrink the grafted image by a factor of ''factor'' (default is '''1.0''').
   * '''-from''' ''x0'' ''y0'' ?''x1'' ''y1''? - just copy a rectangular sub-region of the grafted image. (x0,y0) and (x1,y1) specify diagonally opposite corners of the rectangle; (0,0) is the upper-left corner of the (visible portion of the) grafted page. If ''x1'' and ''y1'' are not specified, the default value is the bottom-right corner of the grafted page.
   * '''-to''' ''x0'' ''y0'' ?''x1'' ''y1''? - place the grafted page in a rectangular subregion of the destination page. If ''x0'' and ''y0'' are not specified, the grafted page is placed at (0,0), i.e. at the upper-left corner of the destination page. If ''x1'' and ''y1'' are not specified, the default value is the bottom-right corner of the destination page.

***mupdf::TextSearch methods***

TextSearch objects allow you to perform text-search on a given ''pdfObj'' document.

A TextSearch object can be created in two ways:

======
set searchObj [mupdf::TextSearch new $pdfObj]
# or
set searchObj [$pdfObj newsearch]
======

List of TextSearch methods:

   ''searchObj'' '''destroy''':   destroy the ''searchObj'' object.
   ''searchObj'' '''find''' ''searchStr'' ?'''-max''' ''hits''? ?'''-currpageonly''' ''boolean''?:   search for the string ''searchStr'' starting from the current search-position. Return a list of up to ''hits'' matches (default is 10); the current position is then advanced accordingly. The result of the '''find''' subcommand is a list of ''page-positions''. Each ''page-position'' is a list of two elements:
   * the page-number
   * a list with the 4 coords of the box enclosing the searched ''searchStr''
If option '''-currpageonly''' is true, the search is limited to the page holding the current search-position.
   ''searchObj'' '''currpage''':   return the current search-position as a page-number.
   ''searchObj'' '''currpage''' ''n'':   set the current search-position at the beginning of page ''n''.
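The list returned by '''find''' is flat, so a viewer usually needs to regroup it by page before highlighting the matches. The following is only a sketch in plain Tcl: the helper name `hitsByPage` and the sample coordinates are made up for illustration, and the input is assumed to follow the ''page-position'' format described above (a page-number followed by a list of 4 coords).

```tcl
# Group the page-positions returned by "$searchObj find ..." by page number.
# Each page-position is assumed to be {pageNumber {x0 y0 x1 y1}}.
# Returns a dict: pageNumber -> list of boxes.
proc hitsByPage {pagePositions} {
    set byPage [dict create]
    foreach pos $pagePositions {
        lassign $pos pageNum box
        dict lappend byPage $pageNum $box
    }
    return $byPage
}

# Sample data in the documented format (made-up coordinates)
set hits {
    {0 {72.0 100.0 120.5 112.0}}
    {0 {72.0 300.0 120.5 312.0}}
    {3 {90.0 50.0 138.5 62.0}}
}
dict for {pageNum boxes} [hitsByPage $hits] {
    puts "page $pageNum: [llength $boxes] match(es)"
}
```

With a real document, the input list would be the result of a call such as `$searchObj find "Tcl" -max 50`.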
   ''searchObj'' '''docref''':   return a reference to the related pdf-document as a ''pdfObj''.

***mupdf::Page methods***

A Page object can be created starting from a ''pdfObj'' document with the following command:

======
set pageObj [$pdfObj getpage $n]
======

If the requested page is currently opened, '''getpage''' reuses the handle of the opened page.

Note that when ''pdfObj'' is closed, all its opened pages are automatically closed/destroyed.

   ''pageObj'' '''annots''':   return the list of the annotation IDs on ''pageObj''. Annotation IDs are unique within the whole PDF.
   ''pageObj'' '''annot''' '''get''' ''annotID'':
   ''pageObj'' '''annot''' ''annotID'':   return the list of the attributes of ''annotID''. Two special read-only attributes, '''-type''' and '''-rect''', are provided for each annotation.
   ''pageObj'' '''annot''' '''get''' ''annotID'' ''attribute'':
   ''pageObj'' '''annot''' ''annotID'' ''attribute'':   get the value of ''attribute'' for the annotation ''annotID''.
   ''pageObj'' '''annot''' '''set''' ''annotID'' ''attribute'' ''value'' ?''attribute'' ''value'' ...?:
   ''pageObj'' '''annot''' ''annotID'' ''attribute'' ''value'' ?''attribute'' ''value'' ...?:   set the value of one or more attributes of the annotation ''annotID''.
   ''pageObj'' '''annot''' '''delete''' ''annotID'' ?''annotID'' ...?:   delete one or more annotations.
   ''pageObj'' '''annot''' '''flatten''' ''annotID'' ?''annotID'' ...?:   flatten one or more annotations.
   ''pageObj'' '''annot''' '''create''' ''type'' ?''attribute'' ''value'' ...?:   create a new annotation. Currently supported types are: '''highlight''', '''underline''', '''strikeout''', '''squiggly''' and '''stamp'''.
Common attributes for each supported type are:
   * '''-color''' tkcolor - the color must be a symbolic name (lightblue) or #RRGGBB
Specific options for '''highlight''', '''underline''', '''strikeout''', '''squiggly''' are:
   * '''-vertices''' {''x0'' ''y0'' ''x1'' ''y1'' ...
} - a list of 4N numbers (four coords per box) denoting the bounding boxes of the text to be marked. Note: these coordinates are not necessarily related to the text on a page.
Specific options for '''stamp''' are:
   * '''-xobject''' ''xObjectID'' - the ID of a grafted page (see the '''graft''' method)
   * '''-rotate''' ''angle'' - the rotation angle (in degrees)
   * '''-scale''' ''s'' - the zoom factor
   * '''-to''' {''x0'' ''y0''} - a list of 2 numbers.
A new ''annotationID'' is returned.
   ''pageObj'' '''blocks''':   return the list of image/text blocks in ''pageObj''. A block is a list of 5 elements with the following format: '''textblock'''|'''imageblock''' ''x0'' ''y0'' ''x1'' ''y1''
   ''pageObj'' '''close''':
   ''pageObj'' '''destroy''':   close and free all the resources of the page referred to by ''pageObj''. The object ''pageObj'' is destroyed.
   ''pageObj'' '''size''':   return the physical size of the page as a list of two decimal numbers. Note that the page size is expressed in ''points'', i.e. 1/72 inch.
   ''pageObj'' '''docref''':   return a reference to the related pdf-document as a ''pdfObj''.
   ''pageObj'' '''pagenumber''':   return the page number of ''pageObj''.
   ''pageObj'' '''lines''':   return the list of the bboxes of the text lines in ''pageObj''. A bbox is a list of 4 elements with the following format: ''x0'' ''y0'' ''x1'' ''y1''
   ''pageObj'' '''text''' ?'''-bbox''' '''none'''|'''blocks'''|'''lines'''?:   extract the plain text from the page ''pageObj'' with the (optional) bounding-box of each block or line.
   * If '''-bbox''' is missing or equal to '''none''', this method returns just the plain text.
   * If '''-bbox''' is equal to '''blocks''', this method returns a list like the following

======
{bbox of the 1st block} {plain text of the 1st block}
...
{bbox of the Nth block} {plain text of the Nth block}
======

   * If '''-bbox''' is equal to '''lines''', this method returns a list like the following

======
{bbox of the 1st block} {
   {bbox of the 1st line} {plain text of the 1st line}
   ...
   {bbox of the Nth line} {plain text of the Nth line}
}
...
{bbox of the Nth block} {
   {bbox of the 1st line} {plain text of the 1st line}
   ...
   {bbox of the Nth line} {plain text of the Nth line}
}
======

   ''pageObj'' '''savePNG''' ''filename'' ?'''-zoom''' ''zoom''? ?'''-from''' ''x0'' ''y0'' ''x1'' ''y1''?:   render the page in a .png file named ''filename''. With the default '''-zoom''' factor of '''1.0''', a page whose size is W x H ''points'' is rendered as a raster image of W x H ''pixels''. If '''-zoom''' is specified, the resulting image size is scaled by a factor of ''zoom''.
By default the whole page is rendered; the '''-from''' option allows you to render only a given rectangular area of the page. ''x0'' ''y0'' are the coords of the top-left corner and ''x1'' ''y1'' are the coords of the bottom-right corner. These coords must be expressed in terms of the physical size of the page, i.e. in ''points''.
Note that if these coords lie outside the page, only the ''intersection'' of this area with the page area is rendered.

======
# ...
set page [$pdf getpage 0]  ;# 0 is the 1st page
lassign [$page size] dx dy
# save just the upper half of the page
$page savePNG /mydir/page0.png -zoom 2.25 -from 0 0 $dx [expr {$dy/2.0}]
# ...
======

   ''pageObj'' '''saveImage''' ''image'' ?'''-zoom''' ''zoom''? ?'''-from''' ''x0'' ''y0'' ''x1'' ''y1''? ?'''-to''' ''x0'' ''y0''?:   render the page in an existing Tk photo ''image''. The width and/or height of ''image'' are unchanged if the user has set an explicit image width or height on it (with the -width and/or -height configuration options, respectively).
For the '''-zoom''' and '''-from''' options, the same rules as for '''savePNG''' apply. Option '''-to''' allows you to place the resulting raster image at the ''x0'' ''y0'' coords of the destination ''image''. The default is '''-to''' '''0.0''' '''0.0'''.
NOTE: this command is not available with the package '''tclMuPDF'''.
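Since a page is measured in ''points'' (1/72 inch) and a '''-zoom''' of '''1.0''' maps one point to one pixel, the zoom factor for a given resolution or target width is simple arithmetic. The two helpers below are a sketch in plain Tcl; their names (`zoomForDpi`, `zoomForWidth`) are hypothetical and not part of the package.

```tcl
# Zoom factor needed to render at a given resolution.
# At -zoom 1.0 one point (1/72 inch) maps to one pixel, i.e. 72 dpi.
proc zoomForDpi {dpi} {
    expr {$dpi / 72.0}
}

# Zoom factor that makes a page of width $pageWidth points
# come out about $targetPixels pixels wide.
proc zoomForWidth {pageWidth targetPixels} {
    expr {double($targetPixels) / $pageWidth}
}

puts [zoomForDpi 144]           ;# prints 2.0
puts [zoomForWidth 612.0 1224]  ;# prints 2.0
```

A possible use, assuming `$page` is a ''pageObj'': `$page savePNG /mydir/page0.png -zoom [zoomForDpi 150]`. The exact pixel size of the result may differ by a pixel because of rounding.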
   ''pageObj'' '''images''' '''list''' ?'''-id''' ''imageID''?:   return a list of all the images contained in the page referred to by ''pageObj''. The result of the '''images''' '''list''' subcommand is a list of ''image-records''. Each ''image-record'' is a list of six elements:
   * image-ID (unique for each page)
   * image's width (in pixels)
   * image's height (in pixels)
   * image's colorspace (e.g. DeviceRGB, ICCBased, ...)
   * image's bits per component (the number of color components may be inferred from the colorspace)
   * mask-flag : '''1''' means that the image has a pixel-mask (i.e. some transparent pixels)
If option '''-id''' is present, the resulting list is limited to the ''image-record'' for ''imageID''.
   ''pageObj'' '''images''' '''extract''' ?'''-id''' ''imageID''? ?'''-dir''' ''pathname''? ?'''-as''' ''pattern''? ?'''-transparency''' ''boolean''?:   if option '''-id''' is specified, extract and save the image referred to by ''imageID'' (see '''images''' '''list'''). If option '''-id''' is missing, all the images contained in the page are extracted and saved.
If option '''-dir''' is not specified, images are saved in the current directory.
If option '''-as''' is not specified, images are saved with a name derived from the default '''mupdf::imagenamesformat''' (see below); otherwise images are saved as ''pattern'' (for the ''pattern'' rules see '''mupdf::imagenamesformat''' below).
If option '''-transparency''' is '''true''', the (semi)transparent pixels (if any) are saved; otherwise transparent pixels (if any) are rendered as white pixels.
'''extract''' returns a list of ''extracted-records''. Each ''extracted-record'' is a list of three elements:
   * image-ID (unique for each page)
   * image-name (pdf page's internal name)
   * saved filename (empty string if the image was skipped (unknown format...))
   '''mupdf::imagenamesformat''' ?''pattern''?:   a ''pattern'' is a parametric filename specification similar to the format specification used by the C printf function.
If ''pattern'' is not specified, return the currently defined default pattern. If ''pattern'' is specified, set the default pattern.
When the "''pageObj'' '''images''' '''extract''' ..." command is called, all the images are saved with a filename based on a ''pattern''. (This pattern can be explicit, if option '''-as''' is present, or implicit, based on the default '''mupdf::imagenamesformat'''.)
A ''pattern'' is a simple filename specification (just the base-name, since the extension of the extracted images (png, jpg, ...) is determined automatically) with zero or more special symbols like the following ones:
   * %p : page number (first page is 1)
   * %P : total number of pages
   * %i : image number - images in a page are numbered starting from 1
   * %I : total number of images (in the current page)
Special symbols may also be written with a padding-specification like '''%5P'''; this notation means that the symbol %P should be padded as a 5-character string with leading '0's.
If more than one image is extracted in a single operation, and ''pattern'' does not contain the '''%i''' symbol (the image number), then '''-%i''' is implicitly appended in order to avoid a filename collision.
Assuming that the current page is page 123 (i.e. the 124th page), the following command

======
$pageObj images extract -as "IMG_%5p"
======

will generate IMG_00124-1.jpg, IMG_00124-2.jpg ..... (note: the file extension may be different)

======
$pageObj images extract -as "Z%5p(%2i)"
======

will generate Z00124(01).jpg, Z00124(02).jpg ..... (note: the file extension may be different)

***MuPDF introspection commands***

   '''mupdf::isobject''' ''obj'':   return 1 if ''obj'' is an oo-object (not limited to MuPDF objects).
   '''mupdf::classes''':   return the list of the oo-classes of '''MuPDF'''.
   '''mupdf::classinfo''' ''obj'':   return the oo::class of ''obj'' (not limited to MuPDF objects).
   '''mupdf::documents''':   return a list of the currently opened ''pdfObjs''.
   '''mupdf::documentnames''':   return a list of the pdf-filenames currently opened (fully normalized filenames).
   '''mupdf::isopen''' ''filename'':   check whether ''filename'' is among the currently opened pdf-files.
   '''mupdf::Doc''' '''names''':
   '''mupdf::Page''' '''names''':
   '''mupdf::TextSearch''' '''names''':   return a list of all the instances of Doc, Page, TextSearch. Note that when a Doc is destroyed, all its related Pages and TextSearch objects are destroyed, too.
   '''mupdf::cli_passwordhelper''' ?'''helperProc'''?:   get/set the helper procedure for shell-like applications. If '''helperProc''' is the empty string, the default helper is restored. See section '''Working with Password Protected Files'''.
   '''mupdf::tk_passwordhelper''' ?'''helperProc'''?:   get/set the helper procedure for applications with a Tk graphical interface. If '''helperProc''' is the empty string, the default helper is restored. See section '''Working with Password Protected Files'''.
   '''mupdf::libinfo''':   return specific attributes of the underlying MuPDF library as a list of keywords and their values. The provided keywords are:
   '''version''':   the version of the underlying MuPDF library
   ..more to come ..:

**Working with Password Protected Files**

You can open a password-protected PDF either non-interactively or interactively.

In non-interactive mode, you must provide the password in advance:

======
set pdf [mupdf::open book.pdf -password "123open"]
======

If the password is wrong, an error is raised; check the error message and the errorcode, which should be '''MUPDF WRONGPASSWORD'''.
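Because the failure is reported through the errorcode, it can be trapped selectively with Tcl 8.6's `try ... trap`. The sketch below tries a list of candidate passwords in turn; the proc name `openWithCandidates` and its `opener` argument are hypothetical. `opener` defaults to '''mupdf::open''' and exists only so that the sketch can be exercised with a stand-in command when no protected PDF is at hand.

```tcl
# Try each candidate password in turn; return the pdfObj on success.
# Only errors whose errorcode starts with {MUPDF WRONGPASSWORD} are
# swallowed; any other error (e.g. missing file) propagates unchanged.
proc openWithCandidates {filename candidates {opener mupdf::open}} {
    foreach pw $candidates {
        try {
            return [$opener $filename -password $pw]
        } trap {MUPDF WRONGPASSWORD} {} {
            # wrong password: try the next candidate
            continue
        }
    }
    return -code error -errorcode {MUPDF WRONGPASSWORD} \
        "no candidate password opens $filename"
}
```

A possible use, with made-up passwords: `set pdf [openWithCandidates book.pdf {"123open" "letmein"}]`.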
Note that there's no distinction between the owner's password and the user's password: if you provide the right owner's password, the PDF is opened in owner-mode; if you provide the right user's password, the PDF is opened in user-mode.
You can check whether the PDF has been opened in user-mode or owner-mode by calling the '''authentication''' command.

======
set pdf [mupdf::open book.pdf -password "123open"]
# if we are here, it means that password was OK
set mode [$pdf authentication]
# "none" means that book.pdf was not password-protected
# "user" means that the supplied password was the user's password
# "owner" means ...
======

In interactive mode, you do not provide an explicit password; if the PDF is password-protected, '''MuPDF''' will ask for a password.
Depending on the nature of your application (with or without Tk), '''MuPDF''' will select and call a predefined (yet customizable) helper procedure.
These predefined, built-in procedures can be changed with the '''mupdf::cli_passwordhelper''' or '''mupdf::tk_passwordhelper''' command.

======
# template for a (shell-like) password-helper procedure
proc mypswdhelper {filename} {
    # ask for a password ...
    puts -nonewline "Password for $filename: "
    flush stdout
    # ... get the password and return it
    gets stdin password
    return $password
}
mupdf::cli_passwordhelper mypswdhelper

# template for a (GUI) password-helper procedure
proc myGUIpswdhelper {filename} {
    # raise some popup asking for a password ...
    set ::pswd ""
    toplevel .pswd
    wm title .pswd "Password for [file tail $filename]"
    pack [entry .pswd.e -show * -textvariable ::pswd]
    pack [button .pswd.ok -text OK -command {destroy .pswd}]
    # ... wait until the popup is closed, then return the password
    tkwait window .pswd
    return $::pswd
}
mupdf::tk_passwordhelper myGUIpswdhelper
======

**Removing protection from password-protected files**

If you open a password-protected file and export it, by default the resulting PDF is saved without a password. You can explicitly control this behaviour by appending the '''-decrypt''' option to the '''export''' or '''close''' commands (default is '''-decrypt''' '''true''').
**Adding or changing a password**

After opening a PDF, you can add or change the user/owner password with the '''upwd''' or the '''opwd''' commands, as in the following example:

======
set pdf [mupdf::open mybook.pdf]
# if mybook.pdf is protected, insert the password
$pdf upwd "my user-password"
$pdf opwd "my owner-password"
$pdf close  ;# ..or  $pdf export ...
======

**Notes and limitations about field names and values**

Due to current limitations of the underlying MuPDF library, field names and values should be limited to Latin alphabets only.
See the note enclosed with MuPDF's BUG 696478 (Nov 2018):

'' ... "There is, however, a limitation with our form filling support that still needs to be addressed: it only supports PDFDocEncoding, not Unicode. ''

'' ... "We can currently fill out the form fields in any language, but the display code still only handles the Latin alphabet." ''

This means that you can enter text in a form field in any alphabet (tested with Russian, Chinese, Greek, ...), but the rendered text for non-Latin alphabets will appear wrong.

Another major limitation is the rendering of fields other than 'text' widgets, i.e. checkboxes, radiobuttons... For these kinds of widgets the support of the underlying library is still incomplete, in particular for values expressed in non-ASCII characters (e.g. accented letters ...).

**Limitations**

Full support for '''portfolio''' management is still incomplete; commands for adding/removing/reordering embedded files will require a more robust mupdf-core implementation.

**From mupdf 1.x to MuPDF 2.x**

***New commands and methods***

   ''pdfObj'' '''removepassword''':
   ''pdfObj'' '''ispageopened''':
   ''pdfObj'' '''closepage''':
   ''pdfObj'' '''flatten''':
   ''pdfObj'' '''grafts''':
   '''mupdf::classes''':
   '''mupdf::classinfo''' ''obj'':

***Changed commands and methods***

   '''package require mupdf''':
   '''package require mupdf-notk''':   With release 2.x these package names are no longer valid.
Use:
   * package require tclMuPDF ;# replacement for "mupdf-notk"
   * package require tkMuPDF  ;# replacement for "mupdf"
   * package require MuPDF    ;# replacement for "mupdf"

   ''pdfObj'' '''openedpages''':   now returns a list of page-numbers. Formerly it returned a list of ''pageObj''
   ''pdfObj'' '''close''' options:   options '''-flatten''' and '''-decrypt''' are no longer supported. In case you need to flatten the fields or remove the password, you should call the new "''pdfObj'' '''flatten'''" or "''pdfObj'' '''removepassword'''" methods before ''pdfObj'' '''close'''
   ''pdfObj'' '''flattenfield''' ...:   has been replaced by "''pdfObj'' '''flatten''' ..."
   ''pdfObj'' '''addsigfield''' ''fieldname'' ''pageNum'' ''x0 y0 x1 y1'':   changed parameter order and signature
   ''pdfObj'' '''embed''' ...:   changed parameter order and signature
   ''pdfObj'' '''export''' ...:   options '''-flatten''' and '''-decrypt''' have been removed. In case you need to flatten the fields or remove the password, you should call the new "''pdfObj'' '''flatten'''" or "''pdfObj'' '''removepassword'''" methods before ''pdfObj'' '''export'''
   ''pageObj'' '''annot''' '''delete''' ''annotID'' ...:   replaces "''pageObj'' '''annot''' ''annotID'' '''delete'''"
   ''pageObj'' '''annot''' ...:   Formerly colors were expressed as a triple {R G B} (0<=R,G,B<=1.0). Now colors should be expressed as Tk colors (e.g. "lightblue" or #223312).

***Removed commands and methods***

   32 bit:   Removed support for Linux 32-bit and Windows 32-bit
   '''package require mupdf''':
   '''package require mupdf-notk''':
   ''pdfObj'' '''flattenfield''' ...:   Renamed as "''pdfObj'' '''flatten''' ..."
   ''pdfObj'' '''canbesavedincrementally''':   Still present but deprecated.
   ''pdfObj'' '''search''' ...:   Use the "''pdfObj'' '''newsearch'''" command and the '''mupdf::TextSearch''' methods.
   ''pageObj'' '''embed''' ...:   Replaced by "''pdfObj'' '''embed''' ..."
   '''mupdf::quit''' ''pdfObj'':
   '''mupdf::close''' ''pdfObj'':
   '''mupdf::close''' ''pageObj'':
   '''mupdf::type''' ''obj'':   Use "'''mupdf::classinfo''' ''obj''"

**KEYWORDS**

PDF

**CATEGORY**

pdf parsing and rendering

**COPYRIGHT**

Copyright (c) 2021, by A.Buratti

<> <> PDF | Graphics