A little man page viewer

Richard Suchenwirth 2004-01-21 - As a little evening project, here's a viewer for man pages that understands (some of) the roff format. Highly experimental, nothing guaranteed, but a nice distraction:

WikiDbImage man2text.jpg


 package require Tk

 set font {Times 10}
 set file [lindex $argv 0]
 wm title . "man [file tail  [file root $file]]"
 pack [scrollbar .s -command ".t yview"] -side right -fill y
 pack [text .t -wrap word -yscrollcommand ".s set" \
               -padx 10 -font $font] \
         -side left -fill both -expand 1
 foreach style {bold italic} {
     .t tag config $style -font [concat $font $style]
 }
 .t tag config heading -font {Helvetica 12 bold}
 .t tag config monospace -font {Courier 10}
 .t tag config right -justify right
 .t tag config elide -elide 1

 set fp [open $file]
 set tab ""
 set state normal
 foreach line [split [read $fp] \n] {
     if {$line eq ".CS"} {
         .t insert end \n
         set state preformatted
         set stab $tab
         set tab $tab\t
         continue
     }
     if {$line eq ".CE"} {
         set state normal
         set tab $stab
         continue
     }
     switch $state {
         normal {
             set tag ""
             set tail " "
         }
         preformatted {
             set tag monospace
             set tail \n
         }
     }
     if {[string match {'\\\"*} $line]} {continue}
     if {$line eq ".EN"} {continue}
     if {$line eq ".BS"} {continue}
     if {$line eq ".BE"} {continue}
     if {$line eq ".VE"} {continue}
     if {$line eq ".so man.macros"} {continue}
     if {[regexp {^\.VS} $line]} {continue}
     if {$line eq ".PP"} {set line \n\n}
     if {$line eq ".br"} {set line \n}
     if {$line eq ".ti 8"} {set line \n\t}
     if {$line eq ".in 8"} {set tab \t; continue}
     if {$line eq ".in 0"} {set tab ""; continue}
     if {[regexp {^\.SH ([^""].*)} $line -> line]} {
         set line \n\n$line
         set tag heading
         set tail \n
     }
     if {[regexp {^\.SH "(.*)"} $line -> line]} {
         set line \n\n$line
         set tag heading
         set tail \n
     }
     if {[regexp {^\.TH (.+)} $line -> line]} {set tag right}
     if {[regexp {^\.I (.+)} $line -> line]}  {set tag italic}
     if {[regexp {^\.B (.+)} $line -> line]}  {set tag bold}
     if {[regexp {^\.BI (.+)} $line -> line]} {set tag {bold italic}}
     if {[regexp {^\.TP} $line]} {set line \n\t}

     set line [string map {\\- - \\b \\} $line]
     .t insert end "$tab$line$tail" $tag
 }
 set idx 1.0
 set state R
 while {1} {
     set idx2 [.t search -regexp -forward {\\f[BPIR]} $idx]
     if {$idx2 eq ""} {break}
     if {[.t compare $idx >= $idx2]} {break}
     if {$state eq "B"} {
         .t tag add bold $idx $idx2
     }
     if {$state eq "I"} {
         .t tag add italic $idx $idx2
     }
     set state [.t get $idx2+2c]
     set idx $idx2+1c
 }
 # Hide the font escapes themselves (including \fP, which the loop above also matches)
 set idx end
 while 1 {
     set idx [.t search -backward -regexp {\\f[BPIR]} $idx]
     if {$idx eq ""} {break}
     .t tag add elide $idx $idx+3c
 }
 close $fp

escargo 21 Jan 2004 - Could you give some examples of roff codes that might be encountered that would not be handled? (Some of these formatting commands look like they are one per line. Would some elseif structures be appropriate here?)

RS: I don't have the roff syntax at hand, but I think it contains more than I've covered. I did this purely data-driven, from a man page that came with some downloaded software (from 1992) and that I could not otherwise render. Soon afterwards I was told about nroff -man, which does it, but at that point I already wanted to do text layout... elseifs would be OK, but the current way is not dangerous, as roff dot commands come only one per line. Also, some evidently cut-and-pasted code could be factored out into a foreach loop... unknown dot commands could be highlighted in red (to aid debugging)... etc.
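The factoring and the "highlight unknown commands" idea could be sketched roughly as follows. This is only an illustration: the proc name classifyLine and its return format are made up here, and the tables cover just a few of the requests the script above handles (so .PP, .br and friends would come back as unknown until they are added).

```tcl
# Hypothetical helper, not part of the viewer above.
# Returns a two-element list {action text}.
proc classifyLine {line} {
    # Requests that are silently skipped.
    foreach skip {.EN .BS .BE .VE {.so man.macros}} {
        if {$line eq $skip} {return [list skip ""]}
    }
    if {[string match {.VS*} $line]} {return [list skip ""]}

    # Requests with an argument that map to one or more text tags.
    foreach {request tag} {.TH right .I italic .B bold .BI {bold italic}} {
        set prefix "$request "
        set n [string length $prefix]
        if {[string first $prefix $line] == 0 && [string length $line] > $n} {
            return [list $tag [string range $line $n end]]
        }
    }

    # Any other line starting with a dot is a request this sketch does not know.
    if {[string index $line 0] eq "."} {return [list unknown $line]}
    return [list text $line]
}
```

In the viewer, lines classified as unknown could then be inserted with a tag set up by something like .t tag config unknown -foreground red.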

DKF: An interesting one to do would be the Tcl/Tk manual pages, but you'll need quite a bit of extra work to do it.

DKF: OK, here's an update that's capable of taking on most Tcl language commands.

escargo 22 Jan 2004 - I tried it out on the man page for Nut, a nutrition database program. I found the following problems.

  • Comments were not ignored (lines starting with .\").
  • Paragraph breaks indicated by empty lines were ignored.
  • Apparently a primitive table layout (.ta) was not recognized.
  • Changing the sentinel character that starts commands was not recognized (.cc).
  • Of course, the new sentinel character was not recognized (,cc).
  • Some other commands whose purposes I don't know were not recognized (.nf, .fi, .LP).
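The first two points could be handled before the main loop sees the lines. A minimal sketch, assuming a hypothetical proc named prefilter, and assuming it is acceptable to treat an empty input line like .PP (roff itself emits a break plus one blank line, so this is an approximation):

```tcl
# Hypothetical pre-filter, not part of the viewer above.
proc prefilter {lines} {
    set out {}
    foreach line $lines {
        # Both .\" and '\" start a comment line.
        if {[string match {[.']\\"*} $line]} {continue}
        # Approximate an empty input line by a paragraph request.
        if {[string trim $line] eq ""} {set line .PP}
        lappend out $line
    }
    return $out
}
```

The viewer's loop could then iterate over [prefilter [split [read $fp] \n]]. Note that this also drops the '\" t style first line, so any preprocessor hint would have to be read before filtering. The .cc and .ta requests are not addressed here.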

RS: Well, the above code can be elaborated to any depth to fully match the roff specification (plus some frequent deviations, I suppose...). .nf is "no fill": keep hard line breaks. .fi is "fill": append each input line without a line break until .br or .pp, if I remember correctly (boy, I worked with this markup style in the early 1980s...).
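The fill/no-fill behaviour described here can be sketched in plain Tcl, without any widget. The proc name fillText is made up, and the handling is simplified: only .nf, .fi and .br are looked at, and line length is ignored because the text widget does the wrapping.

```tcl
# Hypothetical sketch of fill (.fi) versus no-fill (.nf) mode.
proc fillText {lines} {
    set fill 1
    set out ""
    foreach line $lines {
        switch -- $line {
            .nf {
                # Switching to no-fill ends the line being filled.
                if {$fill} {append out \n}
                set fill 0
                continue
            }
            .fi {set fill 1; continue}
            .br {append out \n; continue}
        }
        # Filled lines are joined with a space, unfilled ones keep their break.
        append out $line [expr {$fill ? " " : "\n"}]
    }
    return $out
}
```

This mirrors what the viewer already does for .CS and .CE, where the tail switches between a space and a newline.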

LV: Certainly many man pages are written in more than just the an macro language and nroff. Often you can tell this by looking at the very first line, which typically looks something like:

 '\" t

where the letter can be:

   e  eqn/neqn
   r  refer
   t  tbl
   v  vgrind

and indicates that the appropriate preprocessor should be run.
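A viewer could at least detect this line and report which preprocessors a page expects. A small sketch, with a hypothetical proc name and the assumption that several letters may appear together after the space:

```tcl
# Hypothetical helper: map the letters on a '\" first line to preprocessor names.
proc preprocessors {firstLine} {
    array set names {e eqn r refer t tbl v vgrind}
    set result {}
    if {[regexp {^'\\" ([ertv]+)} $firstLine -> letters]} {
        foreach c [split $letters ""] {
            lappend result $names($c)
        }
    }
    return $result
}
```

For a first line such as '\" t this returns tbl; for any line that does not follow the pattern it returns an empty list.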