[HJG] Someone has uploaded a lot of pictures to [Flickr], and I want to show them someplace where no internet is available. The pages at Flickr have a lot of links, icons etc., so a simple recursive download with e.g. wget would fetch lots of unwanted stuff. Of course, I could tweak the parameters for calling wget (-accept, -reject, etc.), but doing roughly the same thing in Tcl looks like more fun :-) Moreover, with a Tcl script I can also get the titles and descriptions of the images.

So the first step is to download the html pages from that person, extract the links to the photos from them, then download the photo pages (containing titles and complete descriptions), and the pictures in the selected size (Thumbnail=100x75, Small=240x180, Medium=500x375, Large=1024x768, Original=as taken). Then we can make a [Flickr Offline Photoalbum] out of them, or just use a program like IrfanView [http://www.irfanview.com] to present the pictures as a slideshow.

Second draft for the download:

 package require http

 proc getPage { url } {
     set token [::http::geturl $url]
     set data  [::http::data $token]
     ::http::cleanup $token
     return $data
 }

 catch {console show}    ;## show console, e.g. when running under wish

 #set url      http://www.flickr.com/photos/siegfrieden
 #set filename "s01.html"
 set url      http://www.flickr.com/photos/siegfrieden/page2
 set filename "s02.html"

 # Fetch the page from the web and save it to a file:
 if 1 {
     set data [getPage $url]
     #puts "$data" ;##
     set fileId [open $filename "w"]
     puts -nonewline $fileId $data
     close $fileId
 }

 # Or re-read a previously saved file (handy while testing offline):
 if 0 {
     set fileId [open $filename r]
     set data [read $fileId]
     close $fileId
 }

 set n 0
 foreach line [split $data \n] {

     # <title>Flickr: Photos from XXX</title>
     if {[regexp -- "<title>" $line]} {
         #puts "1: $line"; incr n
         set p1 [string first ":" $line 0];         incr p1 14
         set p2 [string first "</title" $line $p1]; incr p2 -1
         set sT [string range $line $p1 $p2]
         puts "Title: $p1 $p2: '$sT'"
     }

     # <h4>XXX</h4>
     if {[regexp -- "<h4>" $line]} {
         #puts "2: $line"; incr n
         set p1 [string first "<h4>" $line 0];    incr p1 4
         set p2 [string first "</h4>" $line $p1]; incr p2 -1
         set sH [string range $line $p1 $p2]
         puts "\nHeader: $p1 $p2: '$sH'"
     }

     # <p class="Photo"><a href="/photos/XXX/9999/"><img src="http://static.flickr.com/99/9999_8888_m.jpg" width="240" height="180" /></a></p>
     if {[regexp -- (class="Photo") $line]} {
         #puts "3: $line"; incr n
         set p1 [string first "href=" $line 0]; incr p1 6
         set p2 [string first "img" $line $p1]; incr p2 -4
         set sL [string range $line $p1 $p2]
         puts "Link : $p1 $p2: '$sL'"

         set p1 [string first "src=" $line 0]; incr p1 5
         set p2 [string first "jpg" $line $p1]; incr p2 2
         set sP [string range $line $p1 $p2]
         puts "Photo: $p1 $p2: '$sP'"
     }

     # <p class="Desc">XXX</p>
     if {[regexp -- (class="Desc") $line]} {
         #puts "4: $line"; incr n
         set p1 [string first "Desc" $line 0];   incr p1 6
         set p2 [string first "</p>" $line $p1]; incr p2 -1
         set sD [string range $line $p1 $p2]
         puts "Descr: $p1 $p2: '$sD'"
     }

     # <a href="/photos/XXX/page12/" class="end">12</a>
     if {[regexp -- (class="end") $line]} {
         #puts "5: $line"; incr n;
         set p1 [string first "page" $line 0]; incr p1 4
         set p2 [string first "/" $line $p1];  incr p2 -1
         set s9 [string range $line $p1 $p2]
         puts "\nEnd: $p1 $p2: '$s9'"
         break
     }
 }
 puts "# $n"

This will only get one html-page, so the next step is to also get the other pages of the album, extract the information we need, and finally fetch the pictures.
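A minimal sketch (untested) of how that next step might look, reusing the string positions from the draft above: it reads the last page number from the class="end" link, loops over all album pages, collects the preview-image URLs, and saves each picture to disk. The savePhoto helper and the file-naming scheme are made up for this example; error handling is left out.

 package require http

 proc getPage { url } {
     set token [::http::geturl $url]
     set data  [::http::data $token]
     ::http::cleanup $token
     return $data
 }

 # Hypothetical helper: save one picture to disk.
 # http::geturl with -channel writes the body straight into the file.
 proc savePhoto { url filename } {
     set fileId [open $filename "w"]
     fconfigure $fileId -translation binary
     set token [::http::geturl $url -channel $fileId]
     close $fileId
     ::http::cleanup $token
 }

 set baseUrl http://www.flickr.com/photos/siegfrieden  ;# example album

 # Page 1 contains the link to the last page, e.g. class="end">12:
 set data [getPage $baseUrl]
 set lastPage 1
 foreach line [split $data \n] {
     if {[regexp -- (class="end") $line]} {
         set p1 [string first "page" $line 0]; incr p1 4
         set p2 [string first "/" $line $p1];  incr p2 -1
         set lastPage [string range $line $p1 $p2]
         break
     }
 }

 # Walk all album pages and collect the preview-image URLs:
 set photos {}
 for {set page 1} {$page <= $lastPage} {incr page} {
     if {$page > 1} { set data [getPage $baseUrl/page$page] }
     foreach line [split $data \n] {
         if {[regexp -- (class="Photo") $line]} {
             set p1 [string first "src=" $line 0]; incr p1 5
             set p2 [string first "jpg" $line $p1]; incr p2 2
             lappend photos [string range $line $p1 $p2]
         }
     }
 }

 # Finally fetch the pictures, numbering the files:
 set i 0
 foreach url $photos {
     savePhoto $url [format "photo%03d.jpg" [incr i]]
     puts "saved: $url"
 }

This only grabs the preview size (the _m jpg, 240x180, as in the example above); the photo-page links (sL) could be fetched the same way to get titles, full descriptions, and the other sizes.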
Strings to look for:
 * "<title>" - Title for album
 * "<h4>" - Title for an image (but also an occurrence of "Search by tag" below all images)
 * (class="Photo") - Links to preview-image and page with a single photo
 * (class="Desc") - Description for an image (if present)
 * (class="end") - Link to last album-page
 * -
 * "profile" - Link to profile-page of album-owner
 * "/page" - Links to more album-pages ...

CJL wonders whether the Flickr-generated RSS feeds for an album might be a quicker way of getting at the required set of image URLs.

----
See also:
 * [http] - [Download file via HTTP] - [Polling web images with Tk]
 * [A little file searcher]
 * [Serializing an array]

----
[Category Internet] - [Category File]