CHMvfs

Daniele Alberto Galliano

This VFS is based upon TclVFS, and allows accessing the content of a Compiled Html (CHM) file, as those used as help for Windows applications.

My intent was allowing the deployment of a whole site, by mean of a single file, yet available in such format. E.g. I don't want to download the whole html help for ActiveTcl, create a war, and deploy it. This comes from the need to distribute useful documentation among my development team, and also from the habit of downloading sites to my Palm, using Plucker and the like.

CHMvfs is not thoroughly tested, but worked fine with version ActiveTcl8.4.11.0.162119-html and ActiveTcl8.4.12.0.226725-html of documentation.

Main namespace is related to chm, as a filesystem. I'd like to add access to chm-specific resources, like search for indexes.

lzx namespace is a porting from chmtools, implemented by Matthew T. Russotto, whom I thank a lot, because his work was absolutely necessary to end this program. This LZX decruncher was pulled out of the program cabextract 0.2 by Stuart Caie <[email protected]> and modified to be useful as an LZX decruncher outside the context of CAB files. I simplified it even more.

The difference between CHMvfs and chmtools, is that I needed a Random Access, which I achieved using the block index at the end of the Reset Table: reading the whole file each time, uncompressing it, required a huge time.

If You like it, You can extend with all the things I forgot, and pulling all the loose strings. You can also use this as a starting point for a CABvfs.

I am not interested in using chm files other than for reading, so it is a read-only filesystems, like zips and other archives.

Code

  package provide vfs::chm 0.5
  set ::verbose 0
  package require vfs 1.0
  package provide chmvfs 0.5
  
  set ::vfs::debug 1
  # Basic idea is having access to a site structure as saved in a chm file, then
  # serving it via a tcl-based http server.
  
  namespace eval vfs::chm {
    variable status
    array set status [list / /]
  }
  
  namespace eval ::lzx {
    variable bitbuf ""
    variable bufsiz 0
    variable bufpos 0
    variable status
    array set status [list / /]
    variable LZX_FRAME_SIZE 32768
    variable LZX_CHECK_BLOCK [expr (32768 + 6144)*8]
    variable LZX_MIN_MATCH                2
    variable LZX_MAX_MATCH                257
    variable LZX_NUM_CHARS                256
  #define LZX_BLOCKTYPE_VERBATIM       (1)
  #define LZX_BLOCKTYPE_ALIGNED        (2)
  #define LZX_BLOCKTYPE_UNCOMPRESSED   (3)
    variable LZX_PRETREE_NUM_ELEMENTS  20
    variable LZX_ALIGNED_NUM_ELEMENTS  8
    variable LZX_NUM_PRIMARY_LENGTHS   7
    variable LZX_NUM_SECONDARY_LENGTHS    249
    variable LZX_LENGTH_MAXSYMBOLS   [expr $LZX_NUM_SECONDARY_LENGTHS + 1]
    variable LZX_LENGTH_TABLEBITS    12
    variable LZX_PRETREE_MAXSYMBOLS  $LZX_PRETREE_NUM_ELEMENTS
    variable LZX_PRETREE_TABLEBITS   6
    variable LZX_MAINTREE_MAXSYMBOLS [expr $LZX_NUM_CHARS + 50 * 8]
    variable LZX_MAINTREE_TABLEBITS  12
    variable LZX_ALIGNED_MAXSYMBOLS  $LZX_ALIGNED_NUM_ELEMENTS
    variable LZX_ALIGNED_TABLEBITS   7
    variable extra_bits
    variable position_base
  
    # Initialize LZX lookup tables
    for { set i 0; set j 0} { $i < 51 } { incr i 2 } {
      # 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7...
      set extra_bits($i) $j
      set extra_bits([expr $i + 1]) $j
      # 0,0,1,2,3,4...15,16,17,17,17,17...
      if { ($i != 0) && ($j < 17) } { incr j }
    }
  
    for { set i 0; set j 0 } { $i < 51 } { incr i } {
      # 0,1,2,3,4,6,8,12,16,24,32,...
      set position_base($i) $j
      # 1,1,1,1,2,2,4,4,8,8,16,16,32,32,...
      incr j [expr 1 << $extra_bits($i)]
    }
  }
  
  proc vfs::chm::Mount {chm local} {
    if {![file exists $chm] || ![file isfile $chm] } {
      error "No such file $chm"
    }
    
    set fileHeader(size) 56
    set fileHeader(structure) {        {n type f a4} {n version f i}
        {n totLength f i} {n unknown f i}
        {n timestamp f i} {n langId f i}
        {n guid1 f h32} {n guid2 f h32}}
  
    set recordFormat "h32"
    set fl [::open $chm r]
    fconfigure $fl -translation binary
  
    set fmt ""
    set varList {}
    foreach headerElement $fileHeader(structure) {
        array set element $headerElement
        set fmt "$fmt$element(f)"
        set varList [concat $varList fileHeader($element(n))]
    }
  
    set content [read $fl $fileHeader(size)]
    eval binary scan \$content $fmt $varList
  
    if { $fileHeader(version) != 2 } {
        set hst(size) 40
        set hst(structure) { {n offs0 f w} {n len0 f w}
            {n offs1 f w} {n len1 f w} {n offs2 f w}}
    } else {
        set hst(size) 32
        set hst(structure) { {n offs0 f w} {n len0 f w}
            {n offs1 f w} {n len1 f w}}
    }
  
    set fmt ""
    set varList {}
    foreach headerElement $hst(structure) {
        array set element $headerElement
        set fmt "$fmt$element(f)"
        set varList [concat $varList hst($element(n))]
    }
  
    set content [read $fl $hst(size)]
    set offset $fileHeader(size)
    eval binary scan \$content $fmt $varList
  
    set hSect0(size) $hst(len0)
    set hSect0(structure) {        {n unkn0  f H4} {n unkn1 f i}
        {n fileSize f w } {n unkn2 f i} {n unkn3 f i}}
  
    set fmt ""
    set varList {}
    foreach headerElement $hSect0(structure) {
        array set element $headerElement
        set fmt "$fmt$element(f)"
        set varList [concat $varList hSect0($element(n))]
    }
  
    set content [read $fl $hSect0(size)]
    eval binary scan \$content $fmt $varList
  
    set hSect1(size) 84
    set hSect1(structure) { { n type f a4 } { n version f i } { n length f i } { n unkn0 f i }
        { n chunksz f i } { n density f i } { n depth f i } { n rtchunkno f i } { n PGML1 f i }
        { n PMGLls f i } { n unkn1 f i } { n dirChunks f i } { n wLangId f i } { n guid f h32 }
        { n len2 f i } { n unkn2 f i } { n unkn3 f i } { n unkn4 f i } }
  
    set fmt ""
    set varList {}
    foreach headerElement $hSect1(structure) {
        array set element $headerElement
        set fmt "$fmt$element(f)"
        set varList [concat $varList hSect1($element(n))]
    }
  
    seek $fl $hst(offs1)
    set content [read $fl $hSect1(size)]
    eval binary scan \$content $fmt $varList
    set offset [expr $hst(offs1) + $hSect1(length)]
    seek $fl $offset
  
    for {set chunk 0} { $chunk < $hSect1(dirChunks) } { incr chunk } {
      set chOffset 0
      set content [read $fl $hSect1(chunksz)]
      incr offset $hSect1(chunksz)
      set fmt a4
      eval binary scan \$content $fmt chunkType
  
      if { $chunkType == "PMGL" } {
        set begLstChunk(size) 20
        set begLstChunk(structure) { { n type f a4 } { n length f i } { n zero f i }
            { n prev f i } { n next f i } }
  
        set fmt ""
        set varList {}
        foreach headerElement $begLstChunk(structure) {
            array set element $headerElement
            set fmt "$fmt$element(f)"
            set varList [concat $varList begLstChunk($element(n))]
        }
  
        eval binary scan \$content $fmt $varList
        set quickref [expr $chOffset + $hSect1(chunksz) - $begLstChunk(length)]
        set end [expr $chOffset + $hSect1(chunksz) - 2]
        set fmt s
        eval binary scan \$content @${end}$fmt begLstChunk(entryNum)
        lappend begLstChunk(structure) {n entryNum f s}
  
        set chunks($chunk) [array get begLstChunk]
  
        incr chOffset $begLstChunk(size)
        set chunkList {}
        for {set entryno 0} { $entryno < $begLstChunk(entryNum) } {incr entryno} {
          set chOffset [readEntry $content $chOffset entry]
          array set temp $entry
          set name [string trim $temp(NAME) /]
          if { $name == "/" } { set name "/" }
          set ::vfs::chm::tree${chm}($name) $entry
          lappend chunkList $entry
        }
        set chunks(lst$chunk) $chunkList
  
      } else {
        if { $chunkType == "PMGI" } {
        } else {
          error "Unknown chunk type $chunkType"
        }
      }
    }
  
    if { $fileHeader(version) != 2 } {
      set offset $hst(offs2)
    } else {
      incr offset $hSect1(chunksz)
    }
    close $fl
  
    if { $fileHeader(version) != 2 } {
      set contentStart $hst(offs2)
    } else {
      set contentStart $offset
    }
    set ::vfs::chm::tree${chm}(contentStart) $contentStart
    set nlfid [open $chm ::DataSpace/NameList r 438]
    set content [read $nlfid]
    close $nlfid
    binary scan $content ss len num
    set offset 4
    for { set idx 0 } { $idx < $num } { incr idx } {
      binary scan $content @${offset}s fileNameLen
      incr offset 2
      set beg $offset
      incr offset [expr 2 * $fileNameLen]
      if { $idx > 0 } {
        set fname [string range $content $beg $offset]
        set fname [join [split $fname "\0"] ""]
        array set sectData [set ::vfs::chm::tree${chm}(::DataSpace/Storage/$fname/Content)]
        incr sectData(OFFSET) $contentStart
        set ctrlfd [open $chm ::DataSpace/Storage/$fname/ControlData r 438]
        set control [read $ctrlfd]
        close $ctrlfd
        binary scan $control ia4iiiii wnum sectData(SIGN) sectData(VERSION) sectData(RSTINT) sectData(WINSZ) sectData(CACSZ) zero
        if { $sectData(VERSION) == 2 } {
          set sectData(WINSZ) [expr $sectData(WINSZ) * 32768]
          set sectData(RSTINT) [expr $sectData(RSTINT) * 32768]
        }
        switch $sectData(WINSZ) {
           32768 { set sectData(WINBT) 15 }
           65536 { set sectData(WINBT) 16 }
          131072 { set sectData(WINBT) 17 }
          262144 { set sectData(WINBT) 18 }
          524288 { set sectData(WINBT) 19 }
         1048576 { set sectData(WINBT) 20 }
         2097152 { set sectData(WINBT) 21 }
          default { error "bad controldata window size" }
        }
        set ctrlFileName "::DataSpace/Storage/$fname/Transform/{7FC28940-9D31-11D0-9B27-00A0C91E9C7C}/InstanceData/ResetTable"
        set ctrlfd [open $chm $ctrlFileName r 438]
        set control [read $ctrlfd]
        close $ctrlfd
        binary scan $control iiiiwwwww sectData(unk) sectData(RSTBEN) sectData(RSTBSZ) sectData(RSTBLN) sectData(RSTBULN) sectData(RSTBCLN) sectData(RSTBBSZ) sectData(RSTBZRO) sectData(RSTBBND)
        array set rstTable [set ::vfs::chm::tree${chm}($ctrlFileName)]
        set sectData(RSTBLOFFS) $rstTable(OFFSET)
        set ::vfs::chm::tree${chm}(Section$idx) [array get sectData]
      }
      incr offset 2
    }
    set ::vfs::chm::status($local) $chm
    set ::vfs::chm::tree${chm}(mountPoint) $local
    ::vfs::log "chm $chm mounted at $local"
    vfs::filesystem mount $local [list vfs::chm::handler $chm]
    vfs::RegisterMount $local [list vfs::chm::Unmount]
    return $local
  }
  
  proc vfs::chm::Unmount {local} {
    set chm $::vfs::chm::status($local)
    unset ::vfs::chm::tree$chm
    unset ::vfs::chm::status($local)
    vfs::filesystem unmount $local
  }
  
  proc vfs::chm::handler {chm cmd root relative actualpath args} {
    if {$cmd == "matchindirectory"} {
      eval [list $cmd $chm $relative $actualpath] $args
    } else {
      eval [list $cmd $chm $relative] $args
    }
  }
  
  # If we implement the commands below, we will have a perfect
  # virtual file system for Compiled HTML archives.
  
  proc vfs::chm::stat {chm name} {
      ::vfs::log "stat $name"
      if { $name == ""} {
        return [list type directory size 0 mode 0555 \
          ino -1 depth 0 name $name atime 0 ctime 0 mtime 0 dev -1 \
          uid -1 gid -1 nlink 1]
            }
      if {[catch "set ::vfs::chm::tree${chm}($name)" x]} {
        error "No such file: $x"
      }
      array set entry $x
      if { $entry(LENGTH) == 0 } {
        return [list type directory size 0 mode 0111 \
          ino -1 depth 0 name $name atime 0 ctime 0 mtime 0 dev -1 \
          uid -1 gid -1 nlink 1]
      }
      return [list type file size $entry(LENGTH) mode 0111 \
        ino -1 depth 0 name $name atime 0 ctime 0 mtime 0 dev -1 \
        uid -1 gid -1 nlink 1]
  }
  
  proc vfs::chm::access {chm name mode} {
    if {$mode & 2} {
      vfs::filesystem posixerror $::vfs::posix(EROFS)
    }
    if { $name == ""} { return 1 }
    if {[catch "set ::vfs::chm::tree${chm}($name)" x]} {
      vfs::filesystem posixerror $::vfs::posix(ENOENT)
    }
          return 1
  }
  
  proc vfs::chm::exists {chm name} {
    if { $name == ""} { return 1 }
    if {[catch "set ::vfs::chm::tree${chm}($name)" x]} {
      vfs::filesystem posixerror $::vfs::posix(ENOENT)
    }
          return 1
  }
  
  proc vfs::chm::matchindirectory {chm path actualpath pattern type} {
  
      set pattern [file join $path $pattern]
      set biggest [array name ::vfs::chm::tree${chm} $pattern]
      set root [set ::vfs::chm::tree${chm}(mountPoint)]
      set bigger [list]
      foreach p $biggest {
        if { [string match "${pattern}/*" $p ]} continue
        lappend bigger [file join $root $p]
      }
      #::vfs::log "got $newres"
      return [::vfs::matchCorrectTypes $type $bigger]
  }
  
  proc vfs::chm::open {chm name mode permissions} {
    switch -- $mode {
      "" -
      "r" {
        if {[catch "set ::vfs::chm::tree${chm}($name)" x]} {
          vfs::filesystem posixerror $::vfs::posix(ENOENT)
        }
        array set entry $x
        if { !$entry(LENGTH) } {
          # There are no empty files: they are folders!
          vfs::filesystem posixerror $::vfs::posix(EPERM)
        }
  
        set nfd [vfs::memchan]
        fconfigure $nfd -translation binary
        set chmfd [::open $chm r]
        fconfigure $chmfd -translation binary
  
        if { $entry(SECTION) } {
          # needs decompression
          array set sectData [set ::vfs::chm::tree${chm}(Section${entry(SECTION)})]
          set entryNum [expr $entry(OFFSET) / $sectData(RSTBBSZ) / 2]
          set entry(NUM) [expr $entryNum * 2]
          if { $entryNum > $sectData(RSTBEN) } {
            error "Entry $entryNum not present in Reset Table($sectData(RSTBEN))"
          }
          set entryOffset [expr [set ::vfs::chm::tree${chm}(contentStart)] + $sectData(RSTBLOFFS) + $sectData(RSTBLN) + $entry(NUM) * $sectData(RSTBSZ)]
          set fl [::open $chm r]
          fconfigure $fl -translation binary
          seek $fl $entryOffset
          set addr [read $fl $sectData(RSTBSZ)]
          if { $sectData(RSTBSZ) == 4 } {
            binary scan "$addr" i compOffset
          } else {
            binary scan "$addr" w compOffset
          }
          set entry(OFFS) $compOffset
          set skip [expr $entry(OFFSET) - $entry(NUM) * $sectData(RSTBBSZ)]
          seek $chmfd $sectData(OFFSET)
          seek $chmfd $entry(OFFS) current
          ::lzx::decompress $chmfd $nfd $entry(LENGTH) $skip $sectData(WINBT)
        } else {
          seek $chmfd [set ::vfs::chm::tree${chm}(contentStart)] start
          seek $chmfd $entry(OFFSET) current
          set data [read $chmfd $entry(LENGTH)]
          puts -nonewline $nfd $data
        }
  
        fconfigure $nfd -translation auto
        seek $nfd 0
        return [list $nfd]
      }
      default {
        vfs::filesystem posixerror $::vfs::posix(EROFS)
      }
    }
  }
  
  proc vfs::chm::createdirectory {chm name} {
      #::vfs::log "createdirectory $name"
      vfs::filesystem posixerror $::vfs::posix(EROFS)
  }
  
  proc vfs::chm::removedirectory {chm name recursive} {
      #::vfs::log "removedirectory $name"
      vfs::filesystem posixerror $::vfs::posix(EROFS)
  }
  
  proc vfs::chm::deletefile {chm name} {
      #::vfs::log "deletefile $name"
      vfs::filesystem posixerror $::vfs::posix(EROFS)
  }
  
  proc vfs::chm::fileattributes {chm name args} {
      #::vfs::log "fileattributes $args"
      switch -- [llength $args] {
          0 {
              # list strings
              return [list NAME SECTION OFFSET LENGTH]
            }
          1 {
              # get value
        if {[catch "set ::vfs::chm::tree${chm}($name)" x]} {
          vfs::filesystem posixerror $::vfs::posix(ENOENT)
        }
        array set entry $x
              set index [lindex $args 0]
              set name [lindex [list NAME SECTION OFFSET LENGTH] $index]
              return $entry($name)
          }
          2 {
              # set value
              set index [lindex $args 0]
              set val [lindex $args 1]
              vfs::filesystem posixerror $::vfs::posix(EROFS)
          }
      }
  }
  
  proc vfs::chm::utime {fd path actime mtime} {
      vfs::filesystem posixerror $::vfs::posix(EROFS)
  }
  
  proc vfs::chm::readEntry { content offset lst } {
    set l 0
    set b -1
    while { $b < 0 } {
      eval binary scan \$content @${offset}c1 b
      incr offset
      set l [expr ( $l << 7 ) | ($b & 127) ]
    }
    eval binary scan \$content @${offset}a$l n
    incr offset $l
    set s 0
    set b -1
    while { $b < 0 } {
      eval binary scan \$content @${offset}c1 b
      incr offset
      set s [expr ( $s << 7 ) | ($b & 127) ]
    }
    set o 0
    set b -1
    while { $b < 0 } {
      eval binary scan \$content @${offset}c1 b
      incr offset
      set o [expr ( $o << 7 ) | ($b & 127) ]
    }
    set l 0
    set b -1
    while { $b < 0 } {
      eval binary scan \$content @${offset}c1 b
      incr offset
      set l [expr ( $l << 7 ) | ($b & 127) ]
    }
    upvar $lst x
    set x [list NAME $n SECTION $s OFFSET $o LENGTH $l]
    return $offset
  }
  
  proc ::lzx::checkBuffer { fds } {
    variable bufpos
    variable bufsiz
    variable bitbuf
  
    # How many bytes to be skipped?
    set bigSkip [expr ($bufpos >> 3 ) & -2]
    # How many bits to be skipped?
    set smlSkip [expr $bufpos & 15]
    # In case of exhausted buffer, read some
    # We should have at least LZX_CHECK_BLOCK = ( LZX_FRAME_SIZE + 6k ) * 8 bits,
    # the maximum size an uncompressed block can achieve, but we
    # read a MegaByte at the time
    if { $bufsiz < $::lzx::LZX_CHECK_BLOCK } {
      set bytes [read $fds 1048576]
      set bitbuf [string range $bitbuf $bigSkip end]
      append bitbuf $bytes
      # Use real bytes length, in case of less input data
      incr bufsiz [expr [string length $bytes] * 8]
      set bufpos $smlSkip
      set bigSkip 0
    }
  }
  
  proc ::lzx::bits { len } {
    variable bufpos
    variable bufsiz
    variable bitbuf
    
    # How many bytes to be skipped?
    set bigSkip [expr ($bufpos >> 3 ) & -2]
    # How many bits to be skipped?
    set smlSkip [expr $bufpos & 15]
    # I read an entire DWORD: it should be enough
    if { [binary scan $bitbuf @${bigSkip}ss num1 num2] < 2} {
      error "Exhausted input"
    }
    set num [expr ($num1 << 16) | ($num2 & 65535)]
    set res [expr $num >> ( 32 - $smlSkip - $len ) ]
    set res [expr $res & ( ( 1 << $len ) - 1 ) ]
    incr bufpos $len
    incr bufsiz -$len
    return $res
  }
  
  proc ::lzx::build_table { obj } {
    set nsyms [set ::lzx::LZX_${obj}_MAXSYMBOLS ]
    set nbits [set ::lzx::LZX_${obj}_TABLEBITS ]
    set table_mask  [expr 1 << $nbits]
    set bit_mask [expr $table_mask >> 1]
    set next_symbol $bit_mask
    set bit_num 1
    set pos 0
  
    while { $bit_num <= $nbits } {
      for { set sym 0 } { $sym < $nsyms } { incr sym } {
        if { [set ::lzx::status(${obj}_len$sym)] == $bit_num } {
          set leaf $pos
  
          incr pos $bit_mask
          if { $pos > $table_mask } {
            error "Table overrun"
          }
  
          set fill $bit_mask
          while { $fill > 0 } {
            set ::lzx::status(${obj}_table$leaf) $sym
            incr fill -1
            incr leaf
          }
        }
      }
      set bit_mask [expr $bit_mask >> 1]
      incr bit_num
    }
  
    if { $pos != $table_mask } {
      for { set sym $pos } { $sym < $table_mask } { incr sym } {
        set ::lzx::status(${obj}_table$sym) 0
      }
  
      set pos [expr $pos << 16]
      set table_mask [expr $table_mask << 16]
      set bit_mask $::lzx::LZX_FRAME_SIZE
  
      while { $bit_num <= 16 } {
        for { set sym 0 } { $sym < $nsyms } { incr sym } {
          if { [set ::lzx::status(${obj}_len$sym)] == $bit_num } {
            set leaf [expr $pos >> 16]
            for { set fill 0 } { $fill < $bit_num - $nbits } { incr fill } {
              if { $::lzx::status(${obj}_table$leaf) == 0 } {
                set s [expr $next_symbol << 1]
                set ::lzx::status(${obj}_table$s) 0
                incr s
                set ::lzx::status(${obj}_table$s) 0
                set ::lzx::status(${obj}_table$leaf) $next_symbol
                incr next_symbol
              }
              set leaf [expr $::lzx::status(${obj}_table$leaf) << 1]
              if { ($pos >> (15-$fill)) & 1 } { incr leaf }
            }
            set ::lzx::status(${obj}_table$leaf) $sym
            incr pos $bit_mask
            if { $pos > $table_mask } { error "table overflow" }
          }
        }
        set bit_mask [expr $bit_mask >> 1]
        incr bit_num
      }
    }
  
    if { $pos == $table_mask } return
  }
  
  proc ::lzx::read_huffsym { obj } {
    set mp $::lzx::bufpos
    set ms $::lzx::bufsiz
    set bits [set ::lzx::LZX_${obj}_TABLEBITS ]
    set r [::lzx::bits $bits]
  
    set i $::lzx::status(${obj}_table$r)
    if { $i >= [set ::lzx::LZX_${obj}_MAXSYMBOLS ] } {
      set j [expr 1 << (32 - $bits)]
      set go 1
      while { $go || $i >= [set ::lzx::LZX_${obj}_MAXSYMBOLS ] } {
        set j [expr $j >> 1]
        set i [expr $i << 1]
        set i [expr $i | [::lzx::bits 1]]
        if {!$j} { error "Illegal data" }
        set go 0
        set i $::lzx::status(${obj}_table$i)
      }
    }
    set j $::lzx::status(${obj}_len$i)
    set ::lzx::bufpos $mp
    set ::lzx::bufsiz $ms
    incr ::lzx::bufpos $j
    incr ::lzx::bufsiz -$j
    return $i
  }
  
  proc ::lzx::read_lengths { obj first last } {
    for { set x 0 } { $x < 20 } { incr x } {
      set ::lzx::status(PRETREE_len$x) [::lzx::bits 4]
    }
    build_table PRETREE
  
    for { set x $first } { $x < $last} {} {
      set z [::lzx::read_huffsym PRETREE]
      if { $z == 17 } {
        set y [::lzx::bits 4]; incr y 4
        while { $y} {
          incr y -1
          set ::lzx::status(${obj}_len$x) 0
          incr x
        }
      } else {
        if { $z == 18 } {
          set y [::lzx::bits 5]; incr y 20
          while { $y} {
            incr y -1
            set ::lzx::status(${obj}_len$x) 0
            incr x
          }
        } else {
          if { $z == 19 } {
            set y [::lzx::bits 1]; incr y 4
            set z [::lzx::read_huffsym PRETREE]
            set z [expr $::lzx::status(${obj}_len$x) - $z]
            if { $z < 0 } { incr z 17 }
            while { $y} {
              incr y -1
              set ::lzx::status(${obj}_len$x) $z
              incr x
            }
          } else {
            set z [expr $::lzx::status(${obj}_len$x) - $z]
            if { $z < 0 } { incr z 17 }
            set ::lzx::status(${obj}_len$x) $z
            incr x
          }
        }
      }
    }
  }
  
  proc ::lzx::decompress {infd askfd len skip wndbit } {
    variable bufsiz
    variable bufpos
    variable bitbuf
  
    set bufsiz 0
    set bufpos 0
    set bitbuf ""
  
    if { $wndbit == 20 } {
      set posn_slots 42
    } else {
      if { $wndbit == 21 } {
        set posn_slots 50
      } else {
        set posn_slots [expr $wndbit * 2]
      }
    }
  
    if { $skip } {
      set outfd [vfs::memchan]
      fconfigure $outfd -translation binary
    } else {
      set outfd $askfd
    }
  
    # decompress as much as needed, but outputs only the content, skip previous
    # the idea is to start with initial values: LZX status is not used nor needed
  
    set window_posn 0
    set last_window_posn 0
    set window_size [expr 1 << $wndbit]
    set bufsiz 0
    set main_elements [expr $::lzx::LZX_NUM_CHARS + ($posn_slots << 3)]
  
    set todo [expr $len + $skip]
  
    while { $todo > 0 } {
    ::lzx::checkBuffer $infd
      if { $window_posn == $window_size } {
        set window_posn 0
      }
      if { $window_posn == 0 } {
              set header_read 0
              for { set i 0 } { $i < $::lzx::LZX_MAINTREE_MAXSYMBOLS } { incr i } {
          set ::lzx::status(MAINTREE_len$i) 0 }
              for { set i 0 } { $i < $::lzx::LZX_LENGTH_MAXSYMBOLS } { incr i } {
          set ::lzx::status(LENGTH_len$i) 0 }
        set R0 1; set R1 1; set R2 1
      }
      if { !$header_read } {
        set i 0; set j 0
        set k [::lzx::bits 1]
        set intel_filesize 0
        if { $k } { set intel_filesize [::lzx::bits 32] }
        set header_read 1
      }
      set block_type [::lzx::bits 3]
      set block_remaining [::lzx::bits 24]
      set block_length $block_remaining
  
      switch $block_type {
        1 { # LZX_BLOCKTYPE_VERBATIM
          read_lengths MAINTREE 0 256
          read_lengths MAINTREE 256 $main_elements
          build_table MAINTREE
          if { [set ::lzx::status(MAINTREE_len232)] != 0} { set intel_started 1 }
  
          read_lengths LENGTH 0 $::lzx::LZX_NUM_SECONDARY_LENGTHS
          build_table LENGTH
        }
        2 { # LZX_BLOCKTYPE_ALIGNED
          for { set i 0 } { $i < 8 } { incr i } {
            set ::lzx::status(ALIGNED_len$i) [::lzx::bits 3]
          }
          build_table ALIGNED
          read_lengths MAINTREE 0 256
          read_lengths MAINTREE 256 $main_elements
          build_table MAINTREE
          if { [set ::lzx::status(MAINTREE_len232)] != 0} { set intel_started 1 }
  
          read_lengths LENGTH 0 $::lzx::LZX_NUM_SECONDARY_LENGTHS
          build_table LENGTH
        }
        3 { # LZX_BLOCKTYPE_UNCOMPRESSED
          set intel_started 1
          set b [expr $::lzx::bufpos & 15]
          if { $b } {
            set b [expr 16 - $b]
            incr bufpos $b
            incr bufsiz -$b
          }
          set R0 [::lzx::bits 32]
          set R1 [::lzx::bits 32]
          set R2 [::lzx::bits 32]
        }
        default {
          error "Illegal block type $block_type"
        }
      }
      set this_run $block_remaining
      while { $this_run > 0 && $todo > 0 } {
        if { $this_run > $todo } { set this_run $todo }
        incr todo -$this_run
        incr block_remaining -$this_run
  
        set last_window_posn $window_posn
        set window_posn [expr $window_posn & ( $window_size - 1 )]
  
        if { ($window_posn + $this_run) > $window_size } {
          error "Invalid format"
        }
  
        switch $block_type {
          1 { # LZX_BLOCKTYPE_VERBATIM
            while { $this_run > 0 } {
              set main_element [read_huffsym MAINTREE]
              if { $main_element < $::lzx::LZX_NUM_CHARS } {
                puts -nonewline $outfd [binary format c $main_element]
                incr window_posn
                incr this_run -1
              } else {
                incr main_element -$::lzx::LZX_NUM_CHARS
  
                set match_length [expr $main_element & $::lzx::LZX_NUM_PRIMARY_LENGTHS]
                if { $match_length == $::lzx::LZX_NUM_PRIMARY_LENGTHS } {
                  incr match_length [read_huffsym LENGTH]
                }
                incr match_length $::lzx::LZX_MIN_MATCH
  
                set match_offset [expr $main_element >> 3]
  
                if { $match_offset > 2 } {
                  if { $match_offset != 3 } {
                    set extra $::lzx::extra_bits($match_offset)
                    set verbatim_bits [::lzx::bits $extra]
                    set match_offset [expr $::lzx::position_base($match_offset) - 2 + $verbatim_bits]
                  } else {
                    set match_offset 1
                  }
  
                  set R2 $R1; set R1 $R0; set R0 $match_offset
                } elseif { $match_offset == 0 } {
                    set match_offset $R0
                } elseif { $match_offset == 1 } {
                    set match_offset $R1
                    set R1 $R0; set R0 $match_offset
                } else {
                  set match_offset $R2
                  set R2 $R0; set R0 $match_offset
                }
  
                set length $match_length
                while { $length } {
                  set rundest [tell $outfd]
                  seek $outfd -$match_offset end
                  set match [read $outfd $length]
                  seek $outfd $rundest
                  puts -nonewline $outfd $match
                  set l [string length $match]
                  incr length -$l
                }
                incr window_posn $match_length
                incr this_run -$match_length
              }
                    if { ($window_posn % 32768 ) == 0 && $window_posn != 0 } {
                set b [expr $::lzx::bufpos & 15]
                if { $b } {
                  set b [expr 16 - $b]
                  incr ::lzx::bufpos $b
                  incr bufsiz -$b
                }
                    }
            }
          }
          2 { # LZX_BLOCKTYPE_ALIGNED
            while { $this_run > 0 } {
              set main_element [read_huffsym MAINTREE]
  
              if { $main_element < $::lzx::LZX_NUM_CHARS } {
                puts -nonewline $outfd [binary format c $main_element]
                incr window_posn
                incr this_run -1
              } else {
                incr main_element -$::lzx::LZX_NUM_CHARS
  
                set match_length [expr $main_element & $::lzx::LZX_NUM_PRIMARY_LENGTHS]
                if { $match_length == $::lzx::LZX_NUM_PRIMARY_LENGTHS } {
                  incr match_length [read_huffsym LENGTH]
                }
                incr match_length $::lzx::LZX_MIN_MATCH
  
                set match_offset [expr $main_element >> 3]
  
                if { $match_offset > 2 } {
                  # It is not a repeated offset
                  set extra $::lzx::extra_bits($match_offset)
                  set match_offset [expr $::lzx::position_base($match_offset) - 2]
                  if { $extra > 3 } {
                    incr extra -3
                    set verbatim_bits [::lzx::bits $extra]
                    incr match_offset [expr $verbatim_bits << 3]
                    set aligned_bits [::lzx::read_huffsym ALIGNED]
                    incr match_offset $aligned_bits
                  } elseif { $extra == 3 } {
                    set aligned_bits [::lzx::read_huffsym ALIGNED]
                    incr match_offset $aligned_bits
                  } elseif { $extra > 0 } {
                    set verbatim_bits [::lzx::bits $extra]
                    incr match_offset $verbatim_bits
                  } else {
                    set match_offset 1
                  }
  
                  set R2 $R1; set R1 $R0; set R0 $match_offset
                } elseif { $match_offset == 0 } {
                  set match_offset $R0
                } elseif { $match_offset == 1 } {
                  set match_offset $R1
                  set R1 $R0; set R0 $match_offset
                } else {
                  set match_offset $R2
                  set R2 $R0; set R0 $match_offset
                }
  
                set length $match_length
                while { $length } {
                  set rundest [tell $outfd]
                  seek $outfd -$match_offset end
                  set match [read $outfd $length]
                  seek $outfd $rundest
                  puts -nonewline $outfd $match
                  set l [string length $match]
                  incr length -$l
                }
                incr window_posn $match_length
                incr this_run -$match_length
               }
                    if { ($window_posn % 32768 ) == 0 && $window_posn != 0 } {
                set b [expr $::lzx::bufpos & 15]
                if { $b } {
                  set b [expr 16 - $b]
                  incr bufpos $b
                  incr bufsiz -$b
                }
                    }
                  }
          }
          3 { # LZX_BLOCKTYPE_UNCOMPRESSED
            set tt [expr $::lzx::bufpos / 8]
            set part [string range $::lzx::bitbuf $tt [expr $tt + $this_run - 1]]
            puts -nonewline $outfd $part
            incr ::lzx::bufpos [expr $this_run * 8]
            incr bufsiz [expr -$this_run * 8]
            incr window_posn $this_run
            if { $::lzx::bufpos & 8 } {
              incr ::lzx::bufpos 8
              incr bufsiz -8
            }
          }
          default {
            error "Illegal block type $block_type"
          }
        }
        set this_run $block_remaining
      }
    }
    if { $skip } {
      seek $outfd $skip
      set tmp [read $outfd $len]
      puts -nonewline $askfd $tmp
    }
  }
  
  proc ::vfs::chm::test {} {
    ::vfs::chm::Mount c:/Tcl/doc/ActiveTclHelp8.4.chm c:/chm
    cd c:/chm
    set d [glob -types d -directory c:/chm -tails ActiveTcl8.*]
    cd $d
    puts [pwd]
    puts [join [lsort [glob *]] \n]
    catch "file mkdir test" err
    puts $err
    set fid [::open c:/chm/$d/aspn.css r]
    set content [read $fid]
    close $fid
    set fid [::open c:/aspn.css w]
    fconfigure $fid -translation binary
    puts $fid $content
    close $fid
    cd c:/
    ::vfs::chm::Unmount c:/chm
  }

CHM Web server

This is just a DustMote modification to browse the CHM file as a site.

  package require vfs::chm
  #source chmvfs.tcl
  
    ::vfs::chm::Mount c:/Tcl/doc/ActiveTclHelp8.4.chm c:/chm
  
   set d [glob -types d -directory c:/chm -tails ActiveTcl8.*]
  
   set root "c:/chm/$d"
   set default "at.toc.html"
   set port 80
  
   proc bgerror {trouble} {puts stdout "bgerror: $trouble"}
  
   proc answer {socketChannel host2 port2} {
     fileevent $socketChannel readable [list readIt $socketChannel]
   }
  
   proc readIt {socketChannel} {
     global root default
     fconfigure $socketChannel -blocking 0
     set gotLine [gets $socketChannel]
     if { [fblocked $socketChannel] } then {return}
     fileevent $socketChannel readable ""
     set shortName "/"
     regexp {/[^ ]*} $gotLine shortName
     set many [string length $shortName]
     set last [string index $shortName [expr $many-1] ]
     if {$last=="/"} then {set shortName $shortName$default }
     set wholeName $root$shortName
  set err ""
  puts "Serving $wholeName"
     if [catch {set fileChannel [open $wholeName RDONLY] } err ] {
       puts $socketChannel "HTTP/1.0 404 Not found"
       puts $socketChannel ""
       puts $socketChannel "<html><head><title><No such URL.></title></head>"
       puts $socketChannel "<body><center>"
       puts $socketChannel "The URL you requested does not exist on this site."
       puts $socketChannel "</center>$err</body></html>"
  puts $err
     } else {
       fconfigure $fileChannel -translation binary
       fconfigure $socketChannel -translation binary -buffering full
       puts $socketChannel "HTTP/1.0 200 OK"
       puts $socketChannel ""
       set work [read $fileChannel]
  
       puts $socketChannel $work
       close $fileChannel
     }
  
     close $socketChannel
   }
  
   socket -server answer $port
   vwait forEver

Discussion


AK: A number of questions.

  • The history of the lzx code is not fully clear to me from the description. Does it come from cabextract, or from chmtools?
  • Am I right in my assumption that with porting you mean that the original code is in C(++), and you ported it to Tcl?
  • What licenses was the original code under, and what license is your code under?
  • Do you know where it is possible to find an lzx compressor in Tcl or C(++) ? (For people interested in doing the write part of the VFS).
  • Can you provide links to web pages which describe the structure of CHM and CAB files in detail? I.e., a good specification of the file format.

DAG: A number of answers.

  • It does come from chmtools. In the source code I found, and copied, the reference.
  • You're perfectly right: it was coded in C, and I ported in Tcl. It was just for fun, and to make it much more portable, without the need of a binary library: I love Tcl-only packages.
  • It was GNU GPL by Caie. Russotto didn't claim any right on modifications. For me is just the same.
  • See below.
  • My ending point, after a long research was [1 ], the reason why I thanked Matthew.
  • Here is another place of information [2 ]. It is more related to CHM format, and can be used to implement next features.

DDG: Very interesting work. However I can't read the aspn.css file. Getting the "illegal block type 4" error. The file exists however as the glob results shows. Any suggestions?


DAG: How unfortunate! May I know which version of file do you use? I am afraid the versions I own have only VERBATIM and ALIGNED blocks, so UNCOMPRESSED block part could be not tested at all.

Anyway, last night I found that there is still a bug related to Random Access: it seems probably something about odd blocks. I am afraid I access those with some misalignment. I am looking into it: next tests will copy all files from archive to another location.

Another thing I fixed is the buffer reset at the beginning of decompress proc.


DDG Thanks for your response. I use the file D:/ActiveTcl8.4.2/doc/ActiveTclHelp.chm. I can give you a link for downloading "my" chm file. It would be nice to use chmvfs for my dgHelpBrowser like the metakit files.


DAG: That's a good idea. Have a try with this last version: it works fine on both my versions, but for XOTcl pdf files. I'm working on it, but I think that probably it is still related to UNCOMPRESSED blocks. Even chmtools seems to have some problems with those pdf files, though.

I also added a small modification of DustMote, which I use to browse the CHM file as a site. In case it is not working for You, let me know where I can download Your file from.


DDG It now works also for my chm file. Great! I will see that I update dgHelpBrowser for reading chm-files on Unixes,Win and OS-X etc. I once tried to adjust chmlib for this purpose using Swig but I failed. Thanks.


DAG: I thought I found the last problem in UNCOMPRESSED blocks handling, realigning to 16-bit boundaries after each of them, if needed. Nonetheless, it seems that pdf extracted are not always identical to stored files. Still investigating.


DAG: I did it. At last I found the bug in UNCOMPRESSED blocks handling. Nevertheless, I discovered a .chm that puzzled me again. Filenames are stored in all lower-cases, but referred also with upper cases. This means that I need to add an option to mount the VFS in both modes: case-sensitive or case-insensitive. I think that, coming from Windows, the latter should be the default.