Version 9 of HTML2text

Updated 2005-10-19 12:57:44 by LES

What: HTML2text

 Where: From the contact
 Description: Tcl script which reads an HTML document and outputs the plain
        text of the document.  Designed to make it relatively easy for the
        user to configure how the program should mark specific HTML tags.
 Updated: 10/1997
 Contact: mailto:[email protected] (Joe Moss)

CM May 14th 03 - Get the source from Joe Moss's current home page: http://www.psg.com/~joem/tcl/


Roi Dayan posted, on the Tcl'ers Chat, a method for stripping HTML tags, with an option to keep any tag that matches one of a list of regular expressions:

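      # Return the tag unchanged if it matches any pattern in ignore;
      # otherwise return an empty string so that the tag is dropped.
      proc strip-html-ignore {text {ignore {}}} {
          foreach i $ignore {if {[regexp -- $i $text]} {return $text}}
          return ""
      }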

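      proc strip-html {html {ignore {}}} {
          # Escape the characters that subst or the quoted argument would interpret,
          # so that text and attribute values in the document are never evaluated.
          set html [string map {\\ \\\\ [ \\[ ] \\] $ \\$ \" \\\"} $html]
          # Replace each tag with a call to strip-html-ignore, then run the calls.
          regsub -all -- {<[^>]*>} $html "\[strip-html-ignore \"&\" [list $ignore]\]" html
          return [subst $html]
      }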

Syntax: strip-html text [list ignore1 ignore2]

Example:

      set a {<pre><a href=bla>roi<hr></a></pre><br>}
      puts [strip-html $a [list <br> <a.*>]]

will output:

      <a href=bla>roi<br> 
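
The method above turns the document into a script and hands it to subst, so it relies on every character that is special to Tcl being handled correctly. As a minimal sketch, the same filtering can be done without subst by walking the tag positions directly. The name strip-html-safe is hypothetical and not part of Roi Dayan's code, and lassign assumes Tcl 8.5 or later:

```tcl
# Hypothetical alternative to strip-html: walks the tags by index instead of
# building a script for subst, so document text is never evaluated.
proc strip-html-safe {html {ignore {}}} {
    set out ""
    set pos 0
    # Locate every <...> span as a {first last} pair of character indices
    foreach range [regexp -all -inline -indices -- {<[^>]*>} $html] {
        lassign $range first last
        # Copy the plain text that precedes this tag
        append out [string range $html $pos [expr {$first - 1}]]
        set tag [string range $html $first $last]
        # Keep the tag only if it matches one of the ignore patterns
        foreach pattern $ignore {
            if {[regexp -- $pattern $tag]} {
                append out $tag
                break
            }
        }
        set pos [expr {$last + 1}]
    }
    # Copy whatever follows the last tag
    append out [string range $html $pos end]
    return $out
}
```

It takes the same arguments as strip-html, so the example above can be run through it unchanged: puts [strip-html-safe $a [list <br> <a.*>]]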


Joe Moss | [Category Application]