''[JM] 4 Dec 2012'' - Here is a minimal example of [Web scraping] using [htmlparse].

As I am an [RS] fan, the example fetches the list of all his recent projects.

   * This is unfinished code, meant only to show the overall mechanism.
   * Notice that I am getting just one link per bullet. For example, I am missing the link for [A pocket Wiki], which is the second link on the 5th bullet; only [Profiling with execution traces] is listed.
   * Also notice the error message "node "" does not exist in tree "t"", which is printed in place of the title when a bullet contains no link, as in "simplicite".

Getting every link in each bullet, rather than only the first, could be a good exercise for the reader.
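
One possible approach to that exercise is sketched below. It is not part of the original script: `linksUnder` is a hypothetical helper, and it assumes that [htmlparse] stores each tag name in the node attribute `type`, the tag's attributes in `data`, and text in child nodes of type `PCDATA`. The demo parses a small inline HTML string instead of fetching the wiki page.

======tcl
package require struct::tree
package require htmlparse

# Hypothetical helper: return a list of {attributes text} pairs,
# one pair for every <a> node found anywhere below $node in $tree.
proc linksUnder {tree node} {
    set result {}
    foreach child [$tree children $node] {
        if {[string equal -nocase [$tree get $child type] a]} {
            # Join the text children of the link to get its visible text.
            set text ""
            foreach sub [$tree children $child] {
                if {[$tree get $sub type] eq "PCDATA"} {
                    append text [$tree get $sub data]
                }
            }
            lappend result [list [$tree get $child data] $text]
        } else {
            # Not a link: keep looking further down the tree.
            foreach found [linksUnder $tree $child] {
                lappend result $found
            }
        }
    }
    return $result
}

# Demo on an inline snippet; one bullet holding two links.
::struct::tree demo
htmlparse::2tree {<ul><li><a href="/one">One</a>, see also <a href="/two">Two</a></li></ul>} demo
foreach link [linksUnder demo root] {
    puts [join $link ": "]
}
demo destroy
======

Inside the `while` loop of the script on this page, `linksUnder t $bulletIx` would return all links of the current bullet, and an empty list for a bullet without links.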

[wsWithImg1]

As a side note, I used [LemonTree branch] to easily find the location of the bulleted list block that I am parsing.

[wsWithImg2]
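
Without a graphical browser, the same location can also be found by printing the parsed tree. The sketch below is an addition, not part of the original script: `dumpPaths` is a hypothetical helper that prints, for every node, the list of child indexes leading to it from the root, followed by the node type and the first characters of its data. It assumes the `type` and `data` node attributes that the script on this page reads.

======tcl
package require struct::tree
package require htmlparse

# Hypothetical helper: print every node below $node together with the
# index path (relative to the starting node) needed to reach it.
proc dumpPaths {tree {node root} {path {}}} {
    set idx 0
    foreach child [$tree children $node] {
        set childPath [linsert $path end $idx]
        # Show at most 40 characters of data to keep lines short.
        set data [string range [$tree get $child data] 0 39]
        puts "{$childPath} [$tree get $child type]: $data"
        dumpPaths $tree $child $childPath
        incr idx
    }
}

# Demo on an inline snippet instead of a fetched page.
::struct::tree demo
htmlparse::2tree {<html><body><p>Intro</p><ul><li><a href="/one">One</a></li></ul></body></html>} demo
dumpPaths demo
demo destroy
======

A path printed this way has the same form as the `{1 15 0}` list used in the script, where each element is the index of a child at one level of the tree.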

======tcl
 # The console command only exists in wish on some platforms; ignore the error elsewhere.
 catch {console show}
   
 package require struct
 package require htmlparse
 package require http
   
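 proc parse {} {
     # Tree that will hold the parsed page.
     ::struct::tree t

     # Fetch the page, then release the HTTP token once the body is copied.
     set url "http://wiki.tcl.tk/1683"
     set http  [::http::geturl $url]
     set html  [::http::data $http]
     ::http::cleanup $http

     # Build the tree and drop formatting-only tags and form definitions.
     htmlparse::2tree $html t
     htmlparse::removeVisualFluff t
     htmlparse::removeFormDefs t

     # Path to the bulleted list, found by hand; it is specific to this page.
     set base [walk {1 15 0}]
     puts "data: [t get $base data]"
     puts "type(tag): [t get $base type]\n"

     # First bullet of the list.
     set bulletIx [walkf $base {0}]

     while {$bulletIx ne {}} {
         # First child of the bullet: the link, when the bullet has one.
         set link [t get [walkf $bulletIx {0}] data]
         # First grandchild: the link text. When that node is missing,
         # catch stores the error message in title instead.
         catch {t get [walkf $bulletIx {0 0}] data} title
         puts "$link: $title"
         update
         set bulletIx [t next $bulletIx]
     }

     t destroy
     return
 }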
   
 proc walkf {n p} {
     foreach idx $p {
         if {$n == ""} {break}
         set n [lindex [t children $n] $idx]
     }
     return $n
 }
   
 proc walk {p} {
     return [walkf root $p]
 }
   
 parse
======
<<categories>> Internet