Version 22 of Downloading pictures from Flickr

Updated 2006-01-21 21:53:54

HJG Someone has uploaded a lot of pictures to Flickr, and I want to show them someplace where no internet is available.

The pages at Flickr have a lot of links, icons etc., so a simple recursive download with e.g. wget would fetch lots of unwanted stuff. Of course, I could tweak the parameters for calling wget (--accept, --reject, etc.), but doing roughly the same thing in Tcl looks like more fun :-) Moreover, with a Tcl script I can also get the titles and descriptions of the images.

So the first step is to download the HTML pages of that person's album, extract the links to the photos from them, and then download the photo pages (containing titles and complete descriptions) and the pictures in the selected size (Thumbnail=100x75, Small=240x180, Medium=500x375, Large=1024x768, Original=as taken).
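The size names correspond to file-name suffixes in the picture URL. The following is a minimal sketch, assuming the Flickr URL convention of the time (`_t` thumbnail, `_m` small, no suffix for medium, `_b` large, `_o` original). Only `_m` is confirmed by the sample line quoted in the script's comments; the other suffixes are an assumption, the original may also use a different file type, and the proc name `sizeUrl` is made up for illustration.

```tcl
# Hypothetical helper: turn the URL of a "Small" preview (..._m.jpg)
# into the URL of another size by swapping the suffix.
# The suffix table is an assumption, not taken from the pages themselves.
proc sizeUrl {smallUrl size} {
    array set suffix {
        Thumbnail _t
        Small     _m
        Medium    {}
        Large     _b
        Original  _o
    }
    if {![info exists suffix($size)]} {
        error "unknown size '$size'"
    }
    # Replace the trailing "_m.jpg" only; leave the rest of the URL alone.
    if {![regsub {_m\.jpg$} $smallUrl "$suffix($size).jpg" result]} {
        error "not a Small preview URL: $smallUrl"
    }
    return $result
}

puts [sizeUrl http://static.flickr.com/99/9999_8888_m.jpg Medium]
puts [sizeUrl http://static.flickr.com/99/9999_8888_m.jpg Large]
```

Whether a given size actually exists for a picture would still have to be checked when downloading.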

Then we can make a Flickr Offline Photoalbum out of them, or just use a program like IrfanView to present the pictures as a slideshow.

Second draft for the download:

  package require http

  proc getPage { url } {
       set token [::http::geturl $url]
       set data [::http::data $token]
       ::http::cleanup $token
       return $data
  }

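  catch {console show}        ;# open the console under wish; harmless elsewhere

  # Pick one album page and a local file to cache it in.
  # The second pair overrides the first; comment it out to fetch page 1.
  set url http://www.flickr.com/photos/siegfrieden
  set filename "s01.html"

  set url http://www.flickr.com/photos/siegfrieden/page2
  set filename "s02.html"

  # With "if 1", download the page and cache it in $filename ...
  if 1 {

    set data  [ getPage $url ]
   #puts "$data"                ;##
    set fileId [open $filename "w"]
    puts -nonewline $fileId $data
    close $fileId

  }

  # ... or switch this block to "if 1" (and the one above to "if 0")
  # to re-read the cached copy without touching the network.
  if 0 {

    set fileId [open $filename r]
    set data [read $fileId]
    close $fileId

  }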

  set n 0
  foreach line [split $data \n] {
    # <title>Flickr: Photos from XXX</title>
    if {[regexp -- "<title>"        $line]} { 
      #puts "1: $line";
       incr n
       set p1 [ string first ":"       $line  0  ]; incr p1 14
       set p2 [ string first "</title" $line $p1 ]; incr p2 -1
       set sT [ string range $line $p1 $p2 ]
       puts "Title: $p1 $p2: '$sT'"
    }
    # <h4>XXX</h4>
    if {[regexp -- "<h4>"           $line]} { 
      #puts "2: $line";
       incr n
       set p1 [ string first "<h4>"    $line  0  ]; incr p1  4
       set p2 [ string first "</h4>"   $line $p1 ]; incr p2 -1
       set sH [ string range $line $p1 $p2 ]
       puts "\nHeader: $p1 $p2: '$sH'"
    }
    # <p class="Photo"><a href="/photos/XXX/9999/"><img src="http://static.flickr.com/99/9999_8888_m.jpg" width="240" height="180" /></a></p>
    if {[regexp -- (class="Photo")  $line]} { 
      #puts "3: $line";
       incr n
       set p1 [ string first "href=" $line  0  ]; incr p1  6
       set p2 [ string first "img"   $line $p1 ]; incr p2 -4
       set sL [ string range $line $p1 $p2 ]
       puts "Link : $p1 $p2: '$sL'"

       set p1 [ string first "src=" $line  0  ]; incr p1  5
       set p2 [ string first "jpg"  $line $p1 ]; incr p2  2
       set sP [ string range $line $p1 $p2 ]
       puts "Photo: $p1 $p2: '$sP'"
    }
    # <p class="Desc">XXX</p>
    if {[regexp -- (class="Desc")   $line]} { 
      #puts "4: $line"; 
       incr n
       set p1 [ string first "Desc"    $line  0  ]; incr p1  6
       set p2 [ string first "</p>"    $line $p1 ]; incr p2 -1
       set sD [ string range $line $p1 $p2 ]
       puts "Descr: $p1 $p2: '$sD'"
    }
    # <a href="/photos/XXX/page12/" class="end">12</a>
    if {[regexp -- (class="end")    $line]} { 
      #puts "5: $line";
       incr n; 
       set p1 [ string first "page" $line  0  ]; incr p1  4
       set p2 [ string first "/"    $line $p1 ]; incr p2 -1
       set s9 [ string range $line $p1 $p2 ]
       puts "\nEnd: $p1 $p2: '$s9'"
       break
    }
  }
  puts "# $n"

This only fetches one HTML page, so the next step is to get the other pages of the album as well, extract the information we need, and finally fetch the pictures.
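One possible shape for that next step, as a sketch only: build the list of album-page URLs from the last page number (the value found via class="end"), and save a picture through a binary channel. The proc names `albumPageUrls` and `savePicture` are hypothetical, the page count 3 is an arbitrary example value, and the /pageN pattern is taken from the page-2 URL used in the script above.

```tcl
package require http

# Hypothetical helper: list the URLs of all album pages, given the
# album URL and the last page number found via class="end".
proc albumPageUrls {albumUrl lastPage} {
    set urls [list $albumUrl]            ;# page 1 has no /pageN suffix
    for {set i 2} {$i <= $lastPage} {incr i} {
        lappend urls $albumUrl/page$i
    }
    return $urls
}

# Hypothetical helper: save one picture. Pictures are binary data,
# so the output channel must not translate line endings.
proc savePicture {url filename} {
    set fileId [open $filename w]
    fconfigure $fileId -translation binary
    set token [::http::geturl $url -channel $fileId]
    close $fileId
    set status [::http::status $token]
    ::http::cleanup $token
    return $status
}

# Example: print the page URLs for an album with 3 pages (made-up count).
foreach pageUrl [albumPageUrls http://www.flickr.com/photos/siegfrieden 3] {
    puts $pageUrl
}
```

Each page URL could then be passed to getPage and run through the same line-by-line scan, and each extracted picture URL to savePicture. Error handling (timeouts, redirects, missing pictures) is left out here.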

Strings to look for:

  • "<title>" - Title for album
  • "<h4>" - Title for an image (but it also occurs once for "Search by tag" below all images)
  • (class="Photo") - Links to preview-image and page with a single photo
  • (class="Desc") - Description for an image (if present)
  • (class="end") - Link to last album-page
  • -
  • "profile" - Link to profile-page of album-owner
  • "/page" - Links to more album-pages
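The index arithmetic with string first and string range in the script can also be written with regexp capture groups. The following sketch uses the sample line from the script's comments; the proc name `parsePhotoLine` is hypothetical, and the pattern assumes the tags appear exactly in that layout, which real pages may not guarantee.

```tcl
# Hypothetical helper: pull the photo-page link and the preview-image
# URL out of one class="Photo" line. Returns an empty list on no match.
proc parsePhotoLine {line} {
    set re {class="Photo"><a href="([^"]+)"><img src="([^"]+\.jpg)"}
    if {[regexp -- $re $line -> link src]} {
        return [list $link $src]
    }
    return {}
}

# Sample line as shown in the script's comments.
set line {<p class="Photo"><a href="/photos/XXX/9999/"><img src="http://static.flickr.com/99/9999_8888_m.jpg" width="240" height="180" /></a></p>}
foreach {link src} [parsePhotoLine $line] {
    puts "Link : $link"
    puts "Photo: $src"
}
```

The same approach should work for the other marker strings, with one pattern per marker.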

...

CJL wonders whether the Flickr-generated RSS feeds for an album might be a quicker way of getting at the required set of image URLs.
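If a feed (or any other downloaded text) contains the picture URLs, they could be collected without knowing its exact layout. This is a sketch under that assumption only: the proc name `imageUrls` is hypothetical, the sample text is made up, and nothing here is taken from the actual Flickr feed format.

```tcl
# Hypothetical helper: collect every static.flickr.com JPEG URL that
# appears anywhere in a block of text, dropping duplicates and
# keeping the order of first appearance.
proc imageUrls {text} {
    set urls {}
    foreach url [regexp -all -inline {http://static\.flickr\.com/[^"'<>&\s]+\.jpg} $text] {
        if {[lsearch -exact $urls $url] < 0} {
            lappend urls $url
        }
    }
    return $urls
}

# Made-up sample text with one repeated URL.
set sample {<img src="http://static.flickr.com/99/9999_8888_m.jpg" /> <img src="http://static.flickr.com/99/9999_8888_m.jpg" /> <a href="http://static.flickr.com/98/7777_6666_m.jpg">x</a>}
puts [join [imageUrls $sample] \n]
```

Titles and descriptions would still need to be matched to the pictures separately.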


Category Internet - Category File