([TR] 2021-04-22)

Pandoc is a universal document converter that converts between a large number of formats. Among other things, it can convert [markdown] to any of the supported output formats. Pandoc has its own extended markdown syntax (see also [commonmark]).

One of its strengths is the use of external filters. A document in one format is first converted to Pandoc's native representation, a so-called 'abstract syntax tree' (AST), which can be output in [JSON] format. A filter (some external program) reads this JSON, changes it as desired, and returns the modified JSON representation. Pandoc then takes over and converts it to the output format. E.g., the following call of Pandoc will read the markdown file ''input.md'', apply the filter ''myfilter'' and finally output the document in MS Word format as ''output.docx'' (ooxml):


======
pandoc -s input.md -t docx -o output.docx --filter myfilter
======

The filter program can be written in any language. It just has to be an executable that reads from stdin and writes to stdout. So, we can use Tcl to manipulate the AST. The use cases are manifold. E.g., such a filter could evaluate the Tcl code included in code sections of a markdown document and add the results of the evaluations (much like [tmdoc::tmdoc]). Or, a code section could contain an [https://pikchr.org/home/doc/trunk/homepage.md%|%image coded in pikchr%|%] which gets inserted instead of the actual code. Or, a filter could extract the hierarchy of section headers and produce an outline from it, listing the length of each section in words and characters, and include that as an annex to the document being converted.


This is a minimal example of a ''myfilter.tcl'' that just passes the AST through unchanged and can serve as a skeleton (note that the file must be executable):


======
#! /usr/bin/env tclsh

# read the JSON AST from stdin
# (test for ">= 0": gets returns -1 at end of file, 0 for an empty line)
set jsonData {}
while {[gets stdin line] >= 0} {
   append jsonData $line
}


# and give it back as is (unchanged)
puts $jsonData
======

When we want to do something with the AST, it is easiest to use the [json] package from [tcllib], change the resulting dict representation, and convert it back to JSON. However, the last step is not trivial: Tcl values carry no type information, so the json::write package cannot automagically restore the original JSON types (arrays, objects, strings, numbers):


======
#! /usr/bin/env tclsh

package require json
package require json::write

# read the JSON AST from stdin
# (test for ">= 0": gets returns -1 at end of file, 0 for an empty line)
set jsonData {}
while {[gets stdin line] >= 0} {
   append jsonData $line
}

set astDict [::json::json2dict $jsonData]

# do some processing of the data and then ...

# give it back as json again (the following is not trivial and needs extensive coding :-()
puts [::json::write object ... ...]
======
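
To illustrate why the way back needs explicit coding, here is a small sketch using only [tcllib]'s json and json::write packages. It first shows that parsing drops the JSON types, and then rebuilds the minimal AST by stating every type by hand. The proc name ''minimalAst'' is made up for this example, and the hard-coded API version is an assumption taken from the sample further down this page. A real filter would have to walk the whole dict and know the expected type at each position.

======
package require json
package require json::write

# compact output without indentation or key alignment
json::write indented 0
json::write aligned 0

# Parsing drops the JSON types: the empty array "c" becomes an empty
# Tcl value, which could just as well be an empty string or object.
set node [json::json2dict {{"t":"Para","c":[]}}]
puts $node

# Rebuilding therefore has to name the type of every value explicitly.
# (minimalAst is a hypothetical helper, not part of any package.)
proc minimalAst {} {
    set para [json::write object \
        t [json::write string Para] \
        c [json::write array]]
    return [json::write object \
        pandoc-api-version [json::write array 1 22] \
        meta   [json::write object] \
        blocks [json::write array $para]]
}

puts [minimalAst]
======

The values passed to ''json::write object'' and ''json::write array'' must already be JSON-formatted, which is why the string ''Para'' goes through ''json::write string'' first.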


This is what a minimal AST looks like:


======
{
   "pandoc-api-version":[1,22],
   "meta":{},
   "blocks":[
      {
         "t":"Para",
         "c":[]
      }
   ]
}
======

The AST JSON is a single object with three elements (key-value pairs): *pandoc-api-version*, *meta* and *blocks*. The main part is the *blocks* element, which contains an array of objects where each object is one part of the document (in this case just an empty paragraph, "Para").
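
Reading the *blocks* array does not need the way back to JSON, so it can be sketched with the json package alone. The following example collects the section headers as an indented outline. It makes some assumptions: the sample document is made up, the procs ''inlineText'' and ''outline'' are hypothetical helpers, and the layout of the "Header" node (level, attributes, list of inline elements) is the one used by API version 1.22 as far as I know it. Only the inline types "Str", "Space" and "SoftBreak" are handled; others are silently skipped.

======
package require json

# Made-up sample AST: two headers and one paragraph
set jsonData {{"pandoc-api-version":[1,22],"meta":{},"blocks":[
  {"t":"Header","c":[1,["intro",[],[]],[{"t":"Str","c":"Intro"}]]},
  {"t":"Para","c":[{"t":"Str","c":"Hello"},{"t":"Space"},{"t":"Str","c":"world"}]},
  {"t":"Header","c":[2,["details",[],[]],
     [{"t":"Str","c":"More"},{"t":"Space"},{"t":"Str","c":"details"}]]}
]}}

# Concatenate the plain text of a list of inline elements
proc inlineText {inlines} {
    set text ""
    foreach inl $inlines {
        switch -- [dict get $inl t] {
            Str       { append text [dict get $inl c] }
            Space     -
            SoftBreak { append text " " }
        }
    }
    return $text
}

# Return one line per header, indented by two spaces per level
proc outline {astDict} {
    set result {}
    foreach block [dict get $astDict blocks] {
        if {[dict get $block t] eq "Header"} {
            lassign [dict get $block c] level attr inlines
            set indent [string repeat "  " [expr {$level - 1}]]
            lappend result $indent[inlineText $inlines]
        }
    }
    return $result
}

set astDict [json::json2dict $jsonData]
puts [join [outline $astDict] \n]
======

In a real filter the JSON would come from stdin instead of the sample string, and since stdout is reserved for the AST that goes back to Pandoc, such an outline would have to be written to stderr or to a file.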

... to be continued ...