[EKB] This is a follow-up to the discussion on [Is Tcl Different!]. Here's the wonderful [HTML] parser in 10 lines, as posted on that page and originally suggested by [Stephen Uhler]:

----

 ############################################
 # Turn HTML into TCL commands
 #   html    A string containing an html document
 #   cmd     A command to run for each html tag found
 #   start   The name of the dummy html start/stop tags

 proc HMparse_html {html {cmd HMtest_parse} {start hmstart}} {
     regsub -all \{ $html {\&ob;} html
     regsub -all \} $html {\&cb;} html
     set w " \t\r\n"            ;# white space
     proc HMcl x {return "\[$x\]"}
     set exp <(/?)([HMcl ^$w>]+)[HMcl $w]*([HMcl ^>]*)>
     set sub "\}\n$cmd {\\2} {\\1} {\\3} \{"
     regsub -all $exp $html $sub html
     eval "$cmd {$start} {} {} \{ $html \}"
     eval "$cmd {$start} / {} {}"
 }

But it was missing the default value for ''cmd'', ''HMtest_parse'', so I wrote one and applied it to a sample bit of HTML:

 proc HMtest_parse {tag state props body} {
     if {$state == ""} {
         set msg "Start $tag"
         if {$props != ""} {
             set msg "$msg with args: $props"
         }
         set msg "$msg\n$body"
     } else {
         set msg "End $tag"
     }
     puts $msg
 }

 HMparse_html {<html>
 <p class="bubba">This is my very first paragraph. How do you like it? I think it has a lot to recommend it.</p>
 <p class="louielouie">This is my second paragraph, which is OK, but not as nice as my first one.</p>
 </html>}

----

This gives the following output:

 Start hmstart
 Start html
 Start p with args: class="bubba"
 This is my very first paragraph. How do you like it? I think it has a lot to recommend it.
 End p
 Start p with args: class="louielouie"
 This is my second paragraph, which is OK, but not as nice as my first one.
 End p
 End html
 End hmstart

In fact, the code is not HTML-specific, and can handle simple [XML] (i.e., XML that doesn't use the self-closing

 This is my very first paragraph. How do you like it? I think it has a lot to recommend it.
 This is my second paragraph, which is OK, but not as nice as my first one.
 }

----

This is the output:

 Let's get going!
 This is my very first paragraph. How do you like it? I think it has a lot to recommend it.
 This is my second paragraph, which is OK, but not as nice as my first one.
 That's all, folks!

----

The problem with using snit (or [incr tcl]) is that you have to declare handlers for all tags, or you will end up with a runtime error (for example, "method body not found"). I myself use the following mechanism with some success:

 proc HMtest_parse {tag state props body} {
     # Dispatch only to tags that have a handle_<tag> proc; ignore the rest
     if {[info procs handle_$tag] ne ""} {
         handle_$tag $state $props $body
     }
 }

 proc handle_a {state props body} {
     # ... process <a> tags here ...
 }

 proc handle_img {state props body} {
     # ... process <img> tags here ...
 }

This way, you only have to declare handlers for the tags that you care about.

Hai Vu

----

[WHD]: Actually, Snit allows you to define a method that receives all unknown methods:

 delegate method * using {%s UnknownMethod %m}

 method UnknownMethod {methodName args} {
     # ... handle (or ignore) the unknown method here ...
 }

!!!!!!
%| [Category HTML] | [Category XML] | [Category Parsing] | [Category Word and Text Processing] | [Category String Processing] | [Category Internet] |%
!!!!!!
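----

HMparse_html is so short because of its final regsub. Each tag is replaced by a closing brace, a newline, and the start of a new ''cmd'' call whose last argument opens a brace. As a result, the text between two tags becomes the ''body'' argument of the preceding tag's call, and the whole document turns into a Tcl script that ''eval'' can run. The sketch below makes that intermediate script visible. HMparse_script is a hypothetical helper, not part of Uhler's code. It performs the same substitutions but returns the generated script instead of evaluating it. It also uses ''string map'' in place of the two brace-escaping regsubs, which has the same effect.

```tcl
# HMparse_script: a hypothetical helper (not part of Uhler's parser) that
# builds the same script HMparse_html evaluates, but returns it instead.
proc HMparse_script {html {cmd HMtest_parse} {start hmstart}} {
    # Escape literal braces so they cannot unbalance the generated script
    # (same effect as the two regsub calls in HMparse_html)
    set html [string map {\{ {&ob;} \} {&cb;}} $html]
    set w " \t\r\n"                        ;# white space
    # <(/?)(tag name)(white space)*(attributes)>, bracket expressions inline
    set exp "<(/?)(\[^$w>\]+)\[$w\]*(\[^>\]*)>"
    # Each tag closes the previous call's body and opens a new call
    # whose arguments are the tag name, the "/" (if any) and the attributes
    set sub "\}\n$cmd {\\2} {\\1} {\\3} \{"
    regsub -all $exp $html $sub html
    # Wrap everything in the dummy start tag's opening and closing calls
    return "$cmd {$start} {} {} \{ $html \}\n$cmd {$start} / {} {}"
}

# Print the script generated for a tiny document
puts [HMparse_script {<p class="x">Hi</p>}]
```

For `<p class="x">Hi</p>` this prints four commands: `HMtest_parse {hmstart} {} {} { }`, then `HMtest_parse {p} {} {class="x"} {Hi}`, then `HMtest_parse {p} {/} {} { }`, and finally `HMtest_parse {hmstart} / {} {}`. Evaluating that script with a handler defined is what HMparse_html does in its two ''eval'' calls.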