directory size

Fred Limouzin: 2005/04/11.

The file size command currently (i.e. Tcl8.4.9) does not support directories. A pure-Tcl solution involves adding recursively the size of files in the current and in sub-directories.

Here's a Tk/GUI script that I wrote this week-end, which fits my need. I wanted a gradient-color indication for the size of directories. You can position the orange/yellow scale (size at which the color indicator is orange/yellow; below that value the indicator is gradient going toward green (size=0)) and the red scale (size from which the indicator is red!). A scale selector also lets you decide how many levels of sub-directories to display.

Note: at times it returns slight discrepencies with the info returned by the "properties" context-menu (on WinXp that is; haven't tested on unix yet). Since hidden files are covered in the script below, the discrepencies may be caused by system files. Generally speaking these differences -when they exist - are negligible.

Note: this doesn't yet support links (especially links on dir). Should be easy to add as long as we consider that a link on a dir has a size of 0 (in fact it's usually the length of the string representing the path of the object linked), but remember the script was used to find big directories, so again I take that as negligible!). Bear in mind this script fited my needs ;-).

Screenshot: http://dire.straits.free.fr/vertigo/dirsize2.gif

The code can also be found at http://dire.straits.free.fr/vertigo ([L1 ]).


 #!/bin/sh
 # Frederic Limouzin Copyrights (c)2005; All rights reserved \
 exec tclsh "$0" ${1+"$@"}

 package require Tk

 set tcl_precision 17

 catch {console hide} r ; unset r

 ###############################################################################

 set dir(root) C:/ ;# [file join C:/ Temp]
 set dir(log)  [file join [pwd] [file tail [file rootname $argv0]].log]
 set dir(sel)  $dir(root)

 if {[file exists $dir(log)]} {
     file rename -force $dir(log) $dir(log).bck
 }
 set Log [open $dir(log) w]

 set ::MAXLVL 2 ;# level of sub dir to display
 set ::KILO 1024.0
 set ::MID 50  ;# 50% (yellow) at mid in Mbytes
 set ::MAX 250 ;# 100% (red) at and above max in Mbytes
 set ::HIDE(green) false
 set ::HIDE(orange) false

 ###############################################################################

 button .xit -text {Exit} -command {Quit}
 frame .d
 button .d.dirsel -text {Select Directory} -command {SelDir} -font Courier
 entry .d.dir -textvariable dir(sel)
 button .sze -text {Calc Size} -command {DirSize} -relief raised -font Courier
 scale .s -from 1 -to 30 -length 300 -resolution 1    \
   -label {Max SubLevel to display:} -variable MAXLVL -command {} \
   -orient horizontal -tickinterval 4 -showvalue true -relief groove
 scale .smid -from 1 -to [expr {int(1.0 * $::KILO)}] -length 300 -resolution 1 \
   -label {50% (yellow) at size (in Mbytes):} -variable ::MID -command {} \
   -orient horizontal -tickinterval 100 -showvalue true -relief groove
 scale .smax -from 1 -to [expr {int(1.0 * $::KILO)}] -length 300 -resolution 1 \
   -label {100% (red) at size (in Mbytes):} -variable ::MAX -command {} \
   -orient horizontal -tickinterval 100 -showvalue true -relief groove
 frame .hide
 checkbutton .hide.green  -text {Hide Green}  -variable ::HIDE(green) \
         -onvalue true -offvalue false -command {UpdateCB green} -relief raised
 checkbutton .hide.orange -text {Hide Orange} -variable ::HIDE(orange) \
         -onvalue true -offvalue false -command {UpdateCB orange} -relief raised
 pack .hide.green -side left -fill x
 pack .hide.orange -side left -fill x
 frame .t
 set Txt .t.txt
 text $Txt -width 80 -height 15 -wrap none -font Courier \
      -yscrollcommand {.t.scrolly set} -xscrollcommand {.t.scrollx set}
 scrollbar .t.scrollx -relief flat -orient horizontal -command {$Txt xview}
 scrollbar .t.scrolly -relief flat -orient vertical   -command {$Txt yview}
 label .cprght -text {Copyrights (c)2005 Fred-Phenix, Fred Limouzin}

 pack .d.dirsel -side left
 pack .d.dir -side right -fill x -expand true
 pack .cprght -side bottom -fill x
 pack .t.scrollx -side bottom -fill x
 pack $Txt -side left -fill both -expand true
 pack .t.scrolly -side right -fill y
 pack .xit -side bottom -fill x
 pack .t -side bottom -fill both -expand true
 pack .d -side top -fill x
 pack .smid -side top -fill x
 pack .smax -side top -fill x
 pack .s -side top -fill x
 pack .hide -side top -fill x
 pack .sze -side top

 ###############################################################################
 proc UpdateCB {c} {
     if {($c eq {green}) && !$::HIDE($c)} {
         set ::HIDE(orange) false
     }
     if {($c eq {orange}) && $::HIDE($c)} {
         set ::HIDE(green) true
     }
 }

 ###############################################################################
 # for input 12345678
 # if m=0 : result = 12345678 bytes
 # if m=1 : result = 12,345,678 bytes (to be done)
 # if m=2 : result = 11.77 Mbytes
 proc Norma {v {m 0}} {
     if {$m == 0} {
         set rv "[expr {wide($v)}] bytes"
     } elseif {$m == 1} {
         ;# tbd
         set rv {to be done}
     } else {
         array set unitArr {0 bytes 1 kbytes 2 Mbytes 3 Gbytes 4 Tbytes}
         set idx 0
         set nv $v
         while {true} {
             set dv $nv
             set nv [expr {1.0 * wide($dv) / $::KILO}]
             if {$nv > 1.0} {
                 incr idx
             } else {
                 break
             }
         }
         set rv [format {%3.2f %s} $dv $unitArr($idx)]
     }
     return $rv
 }

 ###############################################################################
 proc Clamp {v {min 0} {max 255}} {
     if {$v < $min} {
         return $min
     } elseif {$v > $max} {
         return $max
     } else {
         return $v
     }
 }

 ###############################################################################
 proc CalcColor {y} {
     set y [Clamp $y 0.0 1.0]
     set blu 0
     ;#set gre [Clamp [expr {int(255.0 * (1.0 - $y))}]] ;# for mid=orange
     set gre [Clamp [expr {int(2.0 * 255.0 * (1.0 - $y))}]] ;# for mid=yellow
     set red [Clamp [expr {int(255.0 * 2.0 * $y)}]]
     return [format {#%02X%02X%02X} $red $gre $blu]
 }

 ###############################################################################
 proc GetColor_Square {x} { ;# x in bytes
     set x [Clamp [expr {1.0 * wide($x) / ($::KILO * $::KILO)}] 0.0 $::MAX] ;# x in Mb
     set a [expr {(1.0 * $::MAX - (2.0 * $::MID)) / (2.0 * $::MID * $::MAX * ((1.0 * $::MID) - $::MAX))}]
     set b [expr {(1.0 / (2.0 * $::MID)) - (1.0 * $a * $::MID)}]
     set y [expr {(1.0 * $a * wide($x * $x)) + (1.0 * $b *$x)}]
     return [CalcColor $y]
 }

 ###############################################################################
 proc GetColor_Linear {x} { ;# x in bytes
     set x [Clamp [expr {1.0 * wide($x) / ($::KILO * $::KILO)}] 0.0 $::MAX] ;# x in Mb
     if {$x < $::MID} {
         set a [expr {1.0 / (2.0 * $::MID)}]
         set b 0.0
     } else {
         set a [expr {1.0 / (2.0 * ($::MAX - $::MID))}]
         set b [expr {1.0 - ($a * $::MAX)}]
     }
     set y [expr {(1.0 * $a * wide($x)) + (1.0 * $b)}]
     return [CalcColor $y]
 }

 ###############################################################################
 proc GetColor {x {mode 0}} {
     if {$mode == 0} {
         return [GetColor_Linear $x]
     } else {
         return [GetColor_Square $x]
     }
 }

 ###############################################################################
 proc SelDir {} {
     global dir
     set tmp [tk_chooseDirectory -title "Choose Root directory" -initialdir $dir(sel)]
     if {$tmp ne {}} {
         set dir(sel) $tmp
     }
     return $dir(sel)
 }

 set Tagidx 0

 ###############################################################################
 proc log {txt {clr #FFFFFF}} {
     global Log
     global Txt
     global Tagidx
     puts $Log $txt
     ;#puts $txt
     $Txt tag configure tagn($Tagidx) -background $clr
     $Txt insert end "___" tagn($Tagidx)
     $Txt insert end $txt\n
     $Txt see end
     incr Tagidx
     update idletasks
     return 0
 }

 ###############################################################################
 proc DirSize_Recurs {dir {level 0}} {
     set nextLevel [expr {$level + 1}]
     catch {cd $dir} res
     if {$res ne {}} {
         return 0
     }
     ;#set dirlst [glob -nocomplain *]
     set dirlst [concat [glob -nocomplain *] [glob -type hidden -nocomplain *]]
     set size 0
     foreach e $dirlst {
         set ndir [file join $dir $e]
         if {![file exists $ndir]} {
             continue
         }
         if {[file isdirectory $ndir]} {
             set s [DirSize_Recurs [file join $dir $ndir] $nextLevel]
         } else {
             set s [file size $ndir]
         }
         set size [expr {wide($size + $s)}]
     }
     if {$level < $::MAXLVL} {
         set clr [GetColor $size]
         if {(!($::HIDE(green) && ($size < ($::KILO * $::KILO * $::MID)))) &&
             (!(($::HIDE(orange)) && ($size < ($::KILO * $::KILO * $::MAX))))} {
             log [format {%14s %20s : %s} [Norma $size 2] ([Norma $size]) $dir] $clr
         }
     }
     return $size
 }

 ###############################################################################
 proc DirSize {} {
     global dir
     log [string repeat - 60]
     DirSize_Recurs $dir(sel)
     return 0
 }

 ###############################################################################
 proc Quit {} {
     global Log
     close $Log
     exit
 }

LV What you have written here is a useful function. However, it isn't what _I_ would think of if someone asked me the file size of a directory. Instead, I'd expect that they wanted to know the number of bytes that the directory's name/inode (on unix anyways) contained. On Unix, the above functionality would be provided by the du command, right?

Fred: Hi Larry. I am not sure I got your point, but yes what I wanted was the disk usage (with links not being followed). In other words the sum of the files' sizes in bytes in the directory and its sub-directories (again, not following links). (I didn't want the reserved space in blocks, but the used space only.) I usually work under Solaris, but I have to admit that that script was for my notebook under WinXP, so I haven't thought through the issues in unix. It did what I wanted, but I'm not saying it'll fit everyone's needs.

I just noticed that your du was a link (I first thought you refered to the unix command), and it is more likely that it was indeed what I was after. Oh well! :-).


KPV: Here's a simpler way of computing all the bytes used in a directory and all its subdirectories. It uses the fileutil module from tcllib.

  package require fileutil
  set total 0
  foreach fileName [::fileutil::find .] {
    incr total [file size $fileName]
  }

NB. this has one draw back in that it generates a list of all files which can be expensive. A better way would be to use the filtercmd option to ::fileutil::find but there's a design bug in that interface in that the filtercmd only gets passed the file name with no directory info.


Fred 20050411: So long as we can agree on what directory size means, would a Tcl command doing that be useful? Is it worth TIP'ing it? Could this functionality be added to the file size command? Even if the C code itself has to be a recursive function adding up the sub-files, I still think it'd be worth having it available as a command rather than the above recursive procedure. Comments/Points-of-view most welcome!


escargo 11 Apr 2005 - It might be worth taking a look at Tree Size [L2 ] for comparison purposes.

Fred 20050412: - Nice! It has the color indicator/status as well! The script actually didn't do 'too bad' vs. TreeSize (i.e., 'acceptable' considering that one will seldom run the task). Plus I certainly did not try to optimize the script any more than with coding style. On WinXP, Pentium-IV-HT, 3.06GHz, to fetch information for 33Gbytes of used-space (needless to say not reading 33G!), on a 56GB drive (forgot the speed of the drive, but the ratio is more important here):

  • script: roughly 30sec; (using Tcl8.4.9)
  • TreeSize: roughly 10sec.

Fred 20050512:- Added the "hide green" and "hide orange" options.


MB 17 08 2006 I experienced two problems with the previous script under Linux. The line

    set dir(root) C:/

can be easily replaced by :

    set volumes [file volumes]
    set dir(root) [lindex $volumes 0]

which works both under Windows and Linux systems. The other problem under Linux is that the "glob" command returns "." and ".." as proper directories, which creates an infinite loop. I suggest the following modification in DirSize_Recurs :

    foreach e $dirlst {
        # On Unix, "." or ".." is not a valid directory - just skip it.
        if {$e!="." && $e!=".."} then {
            set ndir [file join $dir $e]
            if {![file exists $ndir]} {
                continue
            }
            if {[file isdirectory $ndir]} {
                set s [DirSize_Recurs [file join $dir $ndir] $nextLevel]
            } else {
                set s [file size $ndir]
            }
            set size [expr {wide($size + $s)}]
        }
    }

With these two mods, the script work great under Linux and is very useful.

MG While using file volumes "works" on Windows (in that it doesn't raise an error), for me the drives are sorted alphabetically, which means the first element is the A drive, my floppy drive. That renders the script pretty much useless, because (even when I have a disk in the drive), it's not likely to find many large directories on a 1.44 mb disk :) Perhaps a better solution would be something like

  set dir(root) [tk_chooseDirectory]
  if { $dir(root) == "" } {
       exit;
     }

which lets people pick the correct drive (or directory) on any OS, without having to edit the script every time to enter the path they want to check.


See also: