[HJG] Someone has uploaded a lot of pictures to [Flickr], and I want to show them someplace where no internet is available. The pages at Flickr have a lot of links, icons etc., so a simple recursive download with e.g. wget would fetch lots of unwanted stuff. Of course, I could tweak the parameters for calling wget (--accept, --reject, etc.), or get the html-pages, then filter their contents with [awk] or perl, but doing roughly the same thing in Tcl looks like more fun :-) Moreover, with a Tcl-script I can also get the titles and descriptions of the images.

So the first step is to download the html-pages from that person, extract the links to the photos from them, then download the photo-pages (containing titles and complete descriptions), and the pictures in the selected size (Thumbnail=100x75, Small=240x180, Medium=500x375, Large=1024x768, Original=as taken).

Then we can make a [Flickr Offline Photoalbum] out of them, or just use a program like IrfanView [http://www.irfanview.com] to present the pictures as a slideshow.

This is the alpha-version of the downloader:

----
 #!/bin/sh
 # Restart with tcl: -*- mode: tcl; tab-width: 4; -*- \
 exec wish $0 ${1+"$@"}

 # FlickrDownload.tcl - HaJo Gurt - 2006-01-20 - http://wiki.tcl.tk/15303
 #: Download webpages and images for a photo-album from flickr.com
 #
 # 2005-11-22 First Version
 # Todo:
 # * Save infos to file for next stage (album-maker)
 # * Bug: !! End of Multiline-Descriptions not detected

 package require Tk
 package require Img
 package require http

 #########1#########2#########3#########4#########5#########6#########7#####
 proc Print { Str {tag ""} } {
 #: Output to text-window
   # puts $Str
   .txt1 insert end "\n"
   .txt1 insert end "$Str" $tag
   .txt1 see end   ;# scroll to bottom
   update
 }

 #########1#########2#########3#########4#########5#########6#########7#####
 proc getPage { url } {
 #: Fetch a webpage from the web
   set token [::http::geturl $url]
   set data  [::http::data $token]
   ::http::cleanup $token
   return $data
 }

 proc FetchImage { url fname } {
 #: Fetch a picture from the web / see also: [Polling web images with Tk]
   #puts -nonewline "Fetch: \"$url\" "
   Print "Fetch: \"$url\" " DL

   set f [open $fname w]
   fconfigure $f -translation binary
   set imgtok [http::geturl $url -binary true -channel $f]
   flush $f
   close $f
   http::cleanup $imgtok

   Print " ok." Ok
 }
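
 # Note on flickr's image-sizes (used by Analyse1 below): the album-pages
 # link the small ("_m") size of each photo; the other sizes are reached
 # by just swapping that suffix in the image-url, e.g. (made-up numbers):
 #   http://static.flickr.com/42/9999_8888_t.jpg  = 100x75   Thumbnail
 #   http://static.flickr.com/42/9999_8888_m.jpg  = 240x180  Small
 #   http://static.flickr.com/42/9999_8888.jpg    = 500x375  Medium
 #   http://static.flickr.com/42/9999_8888_b.jpg  = 1024x768 Large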
 #########1#########2#########3#########4#########5#########6#########7#####
 proc Analyse1 { url1 page } {
 #: Analyse flickr album-webpage,
 #  like http://www.flickr.com/photos/PERSON
 #  or   http://www.flickr.com/photos/PERSON/page2
   global PicNr

   set filename [format "s%02d.html" $page]
   if {$page == 1} {
     set url $url1
   } else {
     set url [format "$url1/page%d" $page]
   }

   set base $url1
   set p1 [ string first "//" $url 0   ]; incr p1 2
   set p2 [ string first "/"  $url $p1 ]; incr p2 -1
   set p1 0
   set base [ string range $url $p1 $p2 ]
   #Print "$base: $p1 $p2: '$base'"

   Print "# filename: $filename" ;##
   Print "# url : $url"  ;##
   Print "# base: $base" ;##

   # Deactivate for offline-testing:
   if 1 {
     set data [ getPage $url ]
     #puts "$data" ;##

     set fileId [open $filename "w"]
     puts -nonewline $fileId $data
     close $fileId
   }

   set fileId [open $filename r]
   set data [read $fileId]
   close $fileId

   foreach line [split $data \n] {

     # <title>Flickr: Photos from ALBUM</title>
     if {[regexp -- "<title>" $line]} {
       #Print "1: $line";
       set p1 [ string first ":"       $line 0   ]; incr p1 14
       set p2 [ string first "</title" $line $p1 ]; incr p2 -1
       set sA [ string range $line $p1 $p2 ]
       Print "Album: $p1 $p2: '$sA'"
     }

     # <h4>stilles Seitental</h4>
     if {[regexp -- "<h4>" $line]} {
       #Print "2: $line";
       incr PicNr
       set p1 [ string first "<h4>"  $line 0   ]; incr p1 4
       set p2 [ string first "</h4>" $line $p1 ]; incr p2 -1
       set sH [ string range $line $p1 $p2 ]
       Print "\n"
       Print "$PicNr - Header: $p1 $p2: '$sH'" Hi
     }

     # <p class="Photo"><a href="/photos/PERSON/9999/">
     # <img src="http://static.flickr.com/42/9999_8888_m.jpg" width="240" height="180" /></a></p>
     if {[regexp -- (class="Photo") $line]} {
       #Print "3: $line";
       set p1 [ string first "href=" $line 0   ]; incr p1 6
       set p2 [ string first "img"   $line $p1 ]; incr p2 -4
       set sL [ string range $line $p1 $p2 ]
       Print "Link : $p1 $p2: '$sL'"

       set p1 [ string first "src=" $line 0   ]; incr p1 5
       set p2 [ string first "jpg"  $line $p1 ]; incr p2 2
       set sP [ string range $line $p1 $p2 ]
       Print "Photo: $p1 $p2: '$sP'"

       set url2 $sL   ;# /photos/PERSON/9999/
       set nr   $url2 ;# /photos/PERSON/9999/
       #Print "#> '$nr'"
       set p2 [ string last "/" $url2     ]; incr p2 -1
       set p1 [ string last "/" $url2 $p2 ]; incr p1 1
       set nr [ string range $url2 $p1 $p2 ]
       Print "#>Nr: $p1 $p2 '$nr'"

       #set filename [format "p%04d.html" $PicNr ]
       set filename [format "%s.html" $nr ]       ;# Filename for local photo-page

       set filename0 [format "%s_0100.jpg" $nr ]  ;# 100x75 - Thumbnail
       set sP0 [ string map {_m _t} $sP ]
       FetchImage $sP0 $filename0

       set filename1 [format "%s_0240.jpg" $nr ]  ;# 240x180 - Small
       #FetchImage $sP $filename1

       set filename2 [format "%s_0500.jpg" $nr ]  ;# 500x375 - Medium
       set sP2 [ string map {_m ""} $sP ]
       #FetchImage $sP2 $filename2

       set filename3 [format "%s_1024.jpg" $nr ]  ;# 1024x768 - Large
       set sP3 [ string map {_m _b} $sP ]
       FetchImage $sP3 $filename3

       #break ;##
     }

     # <p class="Desc">im Khao Sok</p>
     # <p class="Desc">Figuren aus dem alten China, auf<a href="/photos/PERSON/9999/">...</a></p>
     if {[regexp -- (class="Desc") $line]} {
       #Print "4: $line";
       set p1 [ string first "Desc" $line 0   ]; incr p1 6
       set p2 [ string first "<"    $line $p1 ]; incr p2 -1
       set sD [ string range $line $p1 $p2 ]
       Print "Descr: $p1 $p2: '$sD'"
     }

     # <a href="/photos/PERSON/page12/" class="end">12</a>
     # <a href="/photos/PERSON/"       class="end">1</a>
     #-if {[regexp -- (class="end") $line]} { ...
     #- }
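
     # The "page..."-link with class="end" (see examples above) holds
     # the highest page-number; Analyse1 returns it, so Main knows how
     # many album-pages to fetch: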
     if {[regexp -- (page.*class="end") $line]} {
       #Print "5: $line";
       set p1 [ string first "page" $line 0   ]; incr p1 4
       set p2 [ string first "/"    $line $p1 ]; incr p2 -1
       set s9 [ string range $line $p1 $p2 ]
       Print "\nEnd: $p1 $p2: '$s9'"
       return [incr s9 0]
       #break
     }

     # <p class="Activity">
     if {[regexp -- (class="Activity") $line]} { ;# now get photo-page
       Analyse2 $base $sL $filename
       #break
     }

     # <!-- ### MAIN NAVIGATION ### -->
     if {[regexp -- "<!-- ### MAIN" $line]} {
       break  ;# Nothing interesting beyond this point
     }
   }
   return 0
 }

 #########1#########2#########3#########4#########5#########6#########7#####
 proc Analyse2 { url1 url2 filename } {
 #: Analyse a flickr photo-webpage,
 #  like http://www.flickr.com/photos/PERSON/9999/
   global PicNr

   set url "$url1$url2"

   # Deactivate for offline-testing:
   if 1 {
     set data [ getPage $url ]
     #Print "$data" ;##

     set fileId [open $filename "w"]
     puts -nonewline $fileId $data
     close $fileId
   }

   set fileId [open $filename r]
   set data [read $fileId]
   close $fileId

   foreach line [split $data \n] {
     # strings to look for:

     # page_current_url
     if {[regexp -- "page_current_url" $line]} {
       #Print "1>> $line";
     }

     # <li class="Stats">
     # Taken with an Olympus C5050Z.
     if {[regexp -- "Taken with an" $line]} {
       #Print "2>> $line";
       set p1 [ string first "with"  $line 0   ]; incr p1 8
       set p2 [ string first "<br /" $line $p1 ]; incr p2 -3
       set sC [ string range $line $p1 $p2 ]
       Print ">> Camera: $p1 $p2: '$sC'"
     }

     # <p class="DateTime"> Uploaded on <a href="/photos/PERSON/archives/date-posted/2006/01/07/"
     #    style="text-decoration: none;">Jan 7, 2006</a></p>
     if {[regexp -- "Uploaded on" $line]} {
       #Print "3>> $line";
       set p1 [ string first "date-posted" $line 0   ]; incr p1 12
       set p2 [ string first "style"       $line $p1 ]; incr p2 -4
       set sU [ string range $line $p1 $p2 ]
       set sU [ string map {/ -} $sU ]
       Print ">> Upload: $p1 $p2: '$sU'"
     }

     # Taken on <a href="/photos/PERSON/archives/date-taken/2006/01/10/"
     #    style="text-decoration: none;">January 10, 2006</a>
     if {[regexp -- "archives/date-taken" $line]} {
       #Print "4>> $line";
       set p1 [ string first "date-taken" $line 0   ]; incr p1 11
       set p2 [ string first "style"      $line $p1 ]; incr p2 -4
       set sS [ string range $line $p1 $p2 ]
       set sS [ string map {/ -} $sS ]
       Print ">> Shot: $p1 $p2: '$sS'"
     }

     # <h1 id="title_div87654321">stilles Seitental</h1>
     if {[regexp -- "<h1" $line]} {
       #Print "H1: $line";
       set p1 [ string first ">"     $line 0   ]; incr p1 1
       set p2 [ string first "</h1>" $line $p1 ]; incr p2 -1
       set sH [ string range $line $p1 $p2 ]
       Print ">> $PicNr - Header: $p1 $p2: '$sH'"
     }

     # <div id="description_div87654321" class="photoDescription">im Khao Sok</div>
     # <div id="description_div73182923" class="photoDescription">Massiert wird überall und immer...,
     #   viel Konkurrenz bedeutet kleine Preise: 1h Fußmassage = 120Bt (3€)<br />
     #   Es massieren die Frauen, die tragende Säule der Gesellschaft.</div>
     #
     if {[regexp -- (class="photoDescription") $line]} {
       #Print "D: $line";
       set p1 [ string first "Desc"   $line 0   ]; incr p1 13
       set p2 [ string first "</div>" $line $p1 ]
       # !! Multiline-Descriptions: get at least the first line:
       if {$p2 > $p1} {
         incr p2 -1
       } else {
         set p2 [string length $line]
       }
       set sD [ string range $line $p1 $p2 ]
       Print ">> Descr: $p1 $p2: '$sD'"
     }
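
     # Untested sketch for the multiline-bug from the Todo-list: when
     # "</div>" is not on the same line, the description continues on
     # the following lines; they would have to be collected until the
     # closing "</div>" shows up, roughly like this ("Collecting" is a
     # hypothetical flag, to be set in the branch above when "</div>"
     # is missing):
     #- if {$Collecting} {
     #-   append sD "\n" $line
     #-   if {[string first "</div>" $line] >= 0} { set Collecting 0 }
     #- }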

     # Abort scanning of current file (nothing of interest below):
     if {[regexp -- "upload_form_container" $line]} { break }
   }
 }

 #########1#########2#########3#########4#########5#########6#########7#####
 #: Main :
 #catch {console show} ;##

 text .txt1 -yscrollcommand ".scr1 set" -width 100 -height 40 -bg white -wrap word
 scrollbar .scr1 -command ".txt1 yview"
 pack .scr1 -side right -fill y
 pack .txt1 -side right
 wm title . "Flickr-Download"
 focus -force .txt1

 .txt1 tag configure "Hi" -background red       -foreground white
 .txt1 tag configure "DL" -background lightblue -underline 1
 .txt1 tag configure "Ok" -background green     -underline 0

 set url http://www.flickr.com/photos/siegfrieden
 #set url http://www.flickr.com/photos/siegfrieden/page2
 set StartPage 1

 Print "Flickr-Download from $url" Hi
 set PicNr 0

 set filename [ format "s%02d.html" $StartPage ] ;# s01.html
 set MaxPage  [ Analyse1 $url $StartPage ]
 incr StartPage 1
 #set MaxPage 2 ;##

 if 1 {
   for {set page $StartPage} {$page <= $MaxPage} {incr page} {
     Analyse1 $url $page
   }
 }
 Print "Done !" Hi
----
Everything is hardwired, and quite a few debugging aids are left in, but it did the job...

CJL wonders whether the Flickr-generated RSS feeds for an album might be a quicker way of getting at the required set of image URLs.

HJG: I don't think so - the RSS feed lists only the most recently uploaded images, it misses some details (e.g. the date when a picture was taken), and the description-field looks messy.

Here is a more quick'n'dirty way to just get the pictures, using wget and awk:
 * Visit the first page of the album with a browser
 * Save this page as s01.html (html-only is enough)
 * At the bottom of the album-page, right-click each "pageX"-link, and save-as s02.html, etc.
 * awk -f flickr.awk s*.html > links.txt
 * wget -w1 -i links.txt

With this minimal flickr.awk (i.e. it does not extract descriptions etc.):
 BEGIN { Found=0; FS="\""
         print "# flickr-Download:"
 }
 #
 /class="Photo/ { Found++;
         sub( "^.*http", "http", $0)
         sub( "_m", "_b", $1)
         print $1
         next;
 }
 #
 END { print "# Found:", Found }
----
See also:
 * [http] - [Download file via HTTP] - [Polling web images with Tk]
 * [TclCurl]
 * [A little file searcher] - [owh - a fileless tclsh]
 * [Serializing an array]
----
[Category Internet] - [Category File] - [Category String Processing]