[Marty Backe] - 30 Dec 2003

To keep current with the various open source projects, I subscribe to the [Freshmeat] e-mail newsletter, a daily digest of open source announcements. Here's an example entry from the newsletter:

 [[066]] - TclCurl 0.10.8 (Development)
   by Andres Garcia (http://freshmeat.net/users/andresgarci/)
   Monday, December 29th 2003 09:55

 Internet :: File Transfer Protocol (FTP)
 Internet :: WWW/HTTP
 Software Development :: Libraries

 About: TclCurl provides a binding for libcurl. It makes it possible to
 download and upload files using protocols like FTP, HTTP, HTTPS, LDAP,
 telnet, dict, gopher, and file.

 Changes: The binding was updated for libcurl 7.0.18.

 License: BSD License (revised)

 URL: http://freshmeat.net/projects/tclcurl/

Now say I'm interested in finding out a bit more about this project. I click on the URL link, which takes me to a [Freshmeat] web page, from which I can then click a link to get to the project's homepage. That's a lot of clicking!

I wrote a filter application (see listing below) that visits the [Freshmeat] page for each project in the newsletter, extracts the homepage URL, and adds it to the newsletter below the URL line. Since I run my own mail server, I am able to insert this filter application between my mail delivery agent (Procmail) and my mailbox. Now as I read the newsletter, I'm just one click away from any given homepage.

I used [Snit] primarily because I wanted to gain a little exposure to its use. It's certainly a very simple [Snit] application.

To see it in action you can grab a newsletter from the archive (see http://freshmeat.net/newsletter) and pass it through the filter:

 cat newsletter.txt | FreshmeatMailFilter.tcl > converted.txt

----

 #!/bin/sh
 #\
 exec tclsh8.4 "$0" "$@"

 ################################################################################
 ################################################################################
 #
 # Written by Marty Backe
 #
 # Freshmeat newsletter filter.
 #
 # Rev    Date         Changed By  Comments
 # -----  -----------  ----------  -----------------------------------------------
 # 1.0    30 Dec 2003  M Backe     Initial release.
 #
 ################################################################################
 ################################################################################

 #
 # Load required packages
 # Snit is only used because I wanted to get acquainted with it.
 #
 set packageList {
     {http}
     {snit}
 }

 foreach package $packageList {
     if {[catch {eval package require $package} errorMsg]} {
         #
         # Report on stderr so the message does not end up in the filtered
         # mail, and exit with a non-zero status to signal the failure.
         #
         puts stderr "FreshmeatMailFilter requires '$package' or above. The following"
         puts stderr "error occurred: \"$errorMsg\""
         exit 1
     }
 }

 # ------------------------------------------------------------------------------
 #
 # Type:    FreshmeatMailFilter
 #
 # Summary: This program is designed as a filter for the daily Freshmeat
 #          e-mail newsletters. The URLs for each project currently
 #          specify a Freshmeat webpage. This requires the reader of the
 #          Freshmeat newsletter to first go to the Freshmeat page, find
 #          the project homepage link, and then click on that to get to the
 #          project homepage.
 #          This program extracts the actual project homepage and adds it
 #          as an additional link in the newsletter, below the existing
 #          URL link.
 #
 #          This program reads stdin, looks for lines that contain
 #          the URL, retrieves the necessary Freshmeat webpages to
 #          extract the project homepage URL, and inserts the URL in
 #          a line below the URL line.
 #
 # Usage:   From a Procmail recipe, pipe the e-mail through this program.
 #          Example:
 #              :0:
 #              | /home/johndoe/MailFilters/FreshmeatMailFilter.tcl |
 #                  /usr/local/bin/dmail +"Mail/Freshmeat"
 #          Example:
 #              cat freshmeat_message.txt | FreshmeatMailFilter.tcl >
 #                  freshmeat_message2.txt
 #
 # ------------------------------------------------------------------------------
 ::snit::type FreshmeatMailFilter {

     variable mailMessage ""

     constructor {} {
         set mailMessage [$self readInput]

         foreach line $mailMessage {
             #
             # Look for the URL line that is provided for each project.
             #
             if {[string first "URL: http://freshmeat.net/" $line 0] == 0} {
                 #
                 # Use regexp here but not above because 'string first' is
                 # much faster and therefore is a better choice if used on every
                 # line of the file, which it is in this case.
                 #
                 set urlString ""
                 regexp {^URL: (http://.*)$} $line matchString urlString

                 puts $line
                 if {$urlString ne ""} {
                     set homepageUrl [$self getHomepageUrl $urlString]
                     if {$homepageUrl ne ""} {
                         puts "Homepage: $homepageUrl"
                     }
                 }
             } else {
                 puts $line
             }
         }
     }

     # --------------------------------------------------------------------------
     #
     # Method:  readInput
     #
     # Summary: Reads stdin. A list is built, where each list item
     #          is a line from the stdin.
     #
     # Input:
     # Output:  A list
     #
     # Uses:
     #
     # --------------------------------------------------------------------------
     method readInput {} {
         set tmpFileBuffer ""

         while {-1 != [gets stdin inputline]} {
             lappend tmpFileBuffer $inputline
         }
         close stdin

         return $tmpFileBuffer
     }

     # --------------------------------------------------------------------------
     #
     # Method:  getHomepageUrl
     #
     # Summary: The provided URL is that which corresponds to the Project
     #          URL provided in the newsletter.
     #          The Freshmeat URL is redirected (http return code 302) to
     #          a webpage that contains another redirected URL. Following
     #          that URL gets us to the actual project homepage website.
     #
     #          Therefore, acquiring the actual project homepage requires
     #          three downloads from Freshmeat.
     #
     #          If any errors occur along the way (invalid url, timeouts,
     #          etc.)
     #          a null string is returned.
     #
     # Input:   url - the Freshmeat project URL taken from the newsletter
     # Output:  The project homepage URL, or a
     #          null string if the project homepage URL could not be found
     #
     # Uses:
     #
     # --------------------------------------------------------------------------
     method getHomepageUrl {url} {
         #
         # Get the webpage specified in the Newsletter URL link. This is
         # expected to be a redirect (http return code 302).
         #
         if {![catch {set urlToken [http::geturl $url -timeout 10000]} \
                 errorMsg]} {
             if {[http::status $urlToken] ne "ok"} {
                 #
                 # We get here if a timeout occurred.
                 #
                 http::cleanup $urlToken
                 return ""
             }

             if {[http::ncode $urlToken] == 302} {
                 #
                 # A redirection occurred (which is expected for these URLs).
                 # Grab the new URL and retrieve the webpage contents. If this
                 # times out or some other error occurs, give up.
                 #
                 upvar #0 $urlToken state       ;# See docs for ::http
                 array set meta $state(meta)    ;# See docs for ::http
                 http::cleanup $urlToken
                 if {[catch {set urlToken [http::geturl $meta(Location) \
                         -timeout 10000]} errorMsg]} {
                     return ""
                 } else {
                     if {[http::status $urlToken] ne "ok"} {
                         http::cleanup $urlToken
                         return ""
                     }
                 }
             }

             set webpage [http::data $urlToken]
             http::cleanup $urlToken

             set url ""
             #
             # Search the webpage for the homepage URL. Note that Freshmeat
             # again provides a URL that causes redirection.
             #
             regexp {(?:Homepage:
[[:space:]]*http://.*
)+?} $webpage match url
             set url "http://freshmeat.net$url"

             if {[catch {set urlToken [http::geturl $url -timeout 10000]} \
                     errorMsg]} {
                 return ""
             }

             if {[http::ncode $urlToken] != 302} {
                 http::cleanup $urlToken
                 return ""
             }

             #
             # The redirected URL is the one we finally want.
             #
             upvar #0 $urlToken state
             array set meta $state(meta)
             http::cleanup $urlToken

             return $meta(Location)
         } else {
             #
             # There was an error (catch thrown) in retrieving the Freshmeat
             # webpage.
             #
             return ""
         }
     }
 }

 FreshmeatMailFilter freshmeat

----

[Category Application], [Snit]