Reading JPEG image dimensions

See jpeg for more details.


From mailto:[email protected]

original code from "Odeen", mailto:[email protected]

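 proc GetJPGDimensions { filename } {
    set fd [ open $filename ]
    fconfigure $fd -translation binary
    set ch1 "00"
    while { ! [ eof $fd ] } {
       # binary scan returns 0 at end of file, when no byte was read
       if { [ binary scan [ read $fd 1 ] "H2" ch2 ] != 1 } break
       if { ( $ch1 == "ff" ) && ( $ch2 >= "c0" ) && ($ch2 <= "c3" ) } {
          # skip segment length (2 bytes) and precision (1 byte),
          # then read height and width as 16-bit big-endian integers
          binary scan [ read $fd 7 ] "x3SS" height width
          close $fd
          return [ list $height $width ]
       }
       set ch1 $ch2
    }
    # close the file on the failure path too
    close $fd
    error "Couldn't find JPG header for $filename"
 }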
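
As a small illustration of what the "x3SS" format unpacks, here is a sketch that decodes a made-up 7-byte SOF payload. The byte values are invented for the example (they are not taken from a real file), and the variable name payload is only for illustration.

```tcl
# Hypothetical 7 bytes that follow an SOF marker:
#   00 11  segment length (17)
#   08     sample precision (8 bits)
#   01 e0  height (480)
#   02 80  width (640)
set payload [binary format H* 00110801e00280]

# x3 skips the length and precision bytes;
# each S reads one big-endian 16-bit integer
binary scan $payload x3SS height width
puts "$height $width"   ;# prints: 480 640
```

Note that S reads a signed value, so a dimension above 32767 would come out negative; Tcl 8.5 and later accept Su for an unsigned read.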

It turns out that the GetJPGDimensions code above does not work with images from most digital cameras, which usually embed a thumbnail for display on the LCD. The new code will work on any JPEG that follows the standard. It comes from a script that I wrote to produce thumbnailed HTML indexes and captioned display pages automatically. You can find the entire script at http://perrigoue.com. Many thanks to Odeen for his help.

We need to find the SOF (start of frame) marker. The marker is a two-byte code, ffcx, where x is usually a value from 0 to 3 (some rare SOF types use other values of x). As Odeen noted, we can't just search for "ffcx", because most digital cameras embed a thumbnail for display on the camera's LCD. The thumbnail is stored inside an earlier segment of the file and contains all the same markers as the full-size image. If we simply search for "ffcx", we find the thumbnail's marker first and read the wrong dimensions. This would not be a good thing.

We avoid this by reading the length bytes of each segment and skipping ahead by that many bytes to the next marker. Because the thumbnail lies inside a segment that we skip, its markers are never examined. We keep doing this until we find the SOF marker of the full-size image. This will make A LOT more sense if you read the JPEG FAQ at: http://www.faqs.org/faqs/jpeg-faq/part1/

proc get_jpg_dimensions {filename} {
  # open the file in binary mode (VERY important) -- no need for write access
  set img [open $filename rb]

  # read in first two bytes to check if this is a JPEG
  if {[read $img 2] eq "\xff\xd8"} {
    while {![eof $img]} {
      # search for the next marker, read the marker type byte, and throw out
      # any extra "ff"'s; stop at end of file, where read returns ""
      # (without the eof test a truncated file would loop forever)
      while {[read $img 1] ne "\xff" && ![eof $img]} {}
      while {[set byte [read $img 1]] eq "\xff"} {}

      if {$byte in {\xc0 \xc1 \xc2 \xc3 \xc5 \xc6 \xc7
                    \xc9 \xca \xcb \xcd \xce \xcf}} {
        # this is the SOF marker; read a chunk of data containing the
        # dimensions: skip the 2 length bytes and 1 precision byte, then read
        # height and width as unsigned 16-bit big-endian integers
        binary scan [read $img 7] x3Su2 size
        break
      } else {
        # this is not the SOF marker; read in the offset of the next marker,
        # giving up if the file ends before the length bytes
        if {[binary scan [read $img 2] S offset] != 1} break

        # the offset includes its own two bytes, so subtract them, then move
        # ahead to the next marker
        seek $img [expr {($offset & 0xffff) - 2}] current
      }
    }
  }

  close $img

  # size is a two-element list: height, then width
  if {![info exists size]} {
    error "invalid JPEG"
  } else {
    return $size
  }
}
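
To make the skipping visible, here is a rough sketch that lists each marker and its segment length up to the first SOF. The proc name jpeg_segments, the demo file name, and the synthetic bytes are all made up for this example. The file is not a decodable image; it only mimics the layout of a camera JPEG, with a decoy "ffc0" placed inside an APP1 segment where a thumbnail would be.

```tcl
# jpeg_segments is a hypothetical helper written for this illustration.
# It returns a list of {marker length} pairs, ending with the first SOF.
proc jpeg_segments {filename} {
  set img [open $filename rb]
  set result {}
  if {[read $img 2] eq "\xff\xd8"} {
    while {1} {
      # a marker is one or more ff bytes followed by a type byte
      while {[read $img 1] ne "\xff" && ![eof $img]} {}
      while {[set byte [read $img 1]] eq "\xff"} {}
      # the segments seen here start with a 2-byte big-endian length;
      # stop if the file ends before the type byte or the length
      if {[binary scan $byte H2 marker] != 1 ||
          [binary scan [read $img 2] Su length] != 1} break
      lappend result [list ff$marker $length]
      # stop at the first start-of-frame marker
      if {$marker in {c0 c1 c2 c3 c5 c6 c7 c9 ca cb cd ce cf}} break
      # the length counts its own two bytes, so skip length - 2 more
      seek $img [expr {$length - 2}] current
    }
  }
  close $img
  return $result
}

# Synthetic data: SOI, an APP1 segment (length 11) whose 9-byte payload
# contains a decoy ffc0, then the start of the real SOF0 segment.
set path [file join [pwd] jpeg_segments_demo.jpg]
set out [open $path wb]
puts -nonewline $out [binary format H* \
    ffd8ffe1000bffc000110800100010ffc000110801e00280]
close $out

set segments [jpeg_segments $path]
file delete $path
puts $segments   ;# prints: {ffe1 11} {ffc0 17}
```

Only one ffc0 entry appears in the output, because the decoy sits inside the APP1 payload that the seek jumps over. A byte-by-byte search would have stopped at the decoy instead.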

JOB: Removed some unnecessary code (regarding the "SOF" marker) and it still works.

AMG: That code is necessary for skipping thumbnails. Without it, incorrect dimensions are read for some files, such as the full-resolution version of this image: [L1 ].

I went ahead and made a few optimizations to the code, posted above. Plus I added support for some rare SOF marker types.


http://zdnet.com.com/2100-1104-945735.html is a discussion about patent issues over JPEG algorithms.


See also