Aug. 2009 by [rmax]

http://mikrocontroller.net/%|%mikrocontroller.net%|% is a popular German forum for people working with microcontrollers such as AVR or PIC. You can subscribe to discussion threads to get a notification email whenever something new has been posted. Unfortunately, these emails contain only a link to the new posting, not the posted text.

This script can be used as a filter in a http://procmail.org/%|%procmail%|% rule to replace the notification body with the actual text of the new posting. It uses the Tcl core's [http] package to fetch the discussion page and the [tdom] package to parse the HTML.

----

======
package require http
package require tdom

fconfigure stdout -encoding utf-8

# Copy the mail header through unchanged. It ends at the first empty line
# (gets returns 0); also stop at EOF, where gets returns -1.
while {[gets stdin l] > 0} {
    puts $l
}

# Find the link to the new posting in the body:
# U = whole URL, f = "://host/path" of the thread page, a = posting number.
regexp {https?(://[^\#]*)\#([0-9]+)} [read stdin] U f a

# Fetch the thread page (always via plain http, even for an https link).
set t [http::geturl http$f]
set d [http::data $t]
http::cleanup $t

# XPath for the box of the new posting: the div containing <a name="$a">.
set b "//div\[@class=\"post box gainlayout \" and .//a\[@name=\"$a\"\]\]"
set p [[[dom parse -html $d doc] documentElement] selectNodes $b]

# Author and date, with runs of whitespace collapsed to single spaces.
set A [[$p selectNodes {.//div[@class="author"]}] asText]
puts \n[regsub -all {\s+} [string trim $A] { }]
set D [[$p selectNodes {.//div[@class="date"]}] asText]
puts [regsub -all {\s+} [string trim $D] { }]

# One line per attachment.
foreach F [$p selectNodes {.//div[@class="attachment"]}] {
    puts [regsub -all {\s+} [string trim [$F asText]] { }]
}

# The text of the posting, followed by the original link.
puts "\n[[$p selectNodes {.//div[contains(@class,"text")]}] asText]\n\n$U"
======

<>Web Scraping
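The header/body split and the link extraction done by the filter can be tried offline, without procmail or network access. The following is a minimal sketch: the proc `parseNotification` and the sample mail (including its thread URL) are hypothetical and made up for illustration, but the regular expression is the same one the filter uses, and the `fetch` value is the URL the filter would pass to `http::geturl`.

```tcl
# Hypothetical helper, not part of the filter: split a notification mail
# into header and body, then extract the thread URL and posting anchor
# using the same regexp as the filter.
proc parseNotification {mail} {
    # The header ends at the first empty line.
    set idx [string first "\n\n" $mail]
    if {$idx < 0} {
        return -code error "no header/body separator found"
    }
    set header [string range $mail 0 [expr {$idx - 1}]]
    set body [string range $mail [expr {$idx + 2}] end]
    # f = "://host/path" of the thread page, a = posting number.
    if {![regexp {https?(://[^\#]*)\#([0-9]+)} $body url f a]} {
        return -code error "no posting link found"
    }
    # Like the filter, always fetch via plain http.
    return [dict create header $header url $url fetch http$f anchor $a]
}

# Made-up sample mail; the sender and URL are illustrative only.
set mail "Subject: test\nFrom: someone\n\nNew post:\nhttp://www.mikrocontroller.net/topic/12345#67890\n"
set r [parseNotification $mail]
puts [dict get $r fetch]   ;# http://www.mikrocontroller.net/topic/12345
puts [dict get $r anchor]  ;# 67890
```

The anchor number is what the filter plugs into its XPath expression to pick out the one posting box on the thread page.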