How can I calculate how much disk space is being used in a directory

This is a relatively common request: someone wants, for one reason or another, to determine how much disk space is currently in use.

On Linux/Unix, the "df" command is the best way to go if you need information for the entire disk. The "du" command would be the tool to use for a subdirectory, but not for the whole drive.
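From a Tcl script on a Unix-like system, both commands can be driven through exec. A minimal sketch (the output columns vary by platform, and du's options differ between GNU and BSD versions):

```tcl
# Unix-like systems only; assumes df and du are on the PATH.
# Filesystem-level usage for the filesystem holding the current directory:
puts [exec df -k .]
# Tree usage for one directory, summarized, in kilobytes:
puts [exec du -sk /tmp]
```

The output is plain text, so anything beyond a quick eyeball check means parsing it, which is what the proc below does for df.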

I frequently need to know how much space remains on various data acquisition systems, and I use the following code:


   #---------------------------------------------------------------
   # $Id: 4187,v 1.39 2007-01-17 19:00:44 jcw Exp $
   # Original W. Wright 5/14/2004 
   # Contact via: http://lidar.wff.nasa.gov
   # <<<<<<<< Linux/Unix only >>>>>>>>>>>>>>
   #
   # Return current disk drive information from
   # Linux/Unix based systems.
   #
   # This proc simply runs the "df" command to capture the
   # system disk drive usage information.  df is very fast.
   #
   # Examples:
   #
   # disk info                       
   # Returns info for all drives.  Use this to get the $info data
   # used by the mount and partition commands.
   #
   # disk partition [disk info] /dev/hda1 
   # Returns a list containing information for partition /dev/hda1
   # 
   # disk mount [disk info] /
   # Returns a list containing information for the partition
   # mounted on /
   #
   # To use to gather drive information for multiple partitions
   # or mounts use it as follows:
   # set lst [ disk info ]
   # disk partition $lst /dev/hda1
   # disk partition $lst /dev/hdb1
   # disk partition $lst /dev/ram0
   #
   #---------------------------------------------------------------
    proc disk { cmd args } {
      switch -glob $cmd {
        info {
          set f [ open "|/bin/df" "r" ]
          regsub -all { +} [ read $f ] " " lst
          close $f
          regsub -all {%} $lst "" lst
          return [ lrange [ split $lst "\n\r" ] 1 end ]
        }
        m* -
        p* {
          set lst [ lindex $args 0 ]
          set a   [ lindex $args 1 ]
          foreach p $lst {
            switch -glob $cmd {
              p* { set d [ lindex $p 0   ] }
              m* { set d [ lindex $p end ] }
            }
            if { [ string equal [ lindex $d 0 ] $a ] } {
              return $p
            }
          }
        }
      }
    }

While on Unix one could simply say "exec /bin/du", that doesn't help with a cross-platform solution.

Here's an attempt to provide some Tcl code to do this. However, I'm uncertain whether the calculation is correct. Perhaps some fellow Tcl'ers can take a look and determine whether the algorithm is right.


 #! /usr/tcl84/bin/tclsh8.4
 # Name: du.tcl
 # Purpose: 
 # Given a directory, calculate the number of bytes of plain files within
 # the directory
 # Author: [email protected]
 # Date: Sept. 26, 2002
 # Version: 1.0
 
 package require log 
 package require fileutil
 
 log::lvChannel debug stderr
 
 proc dirsize {directory} {
        if { [file exists $directory ] == 0 } {
                return 0
        }
        if { [file readable $directory ] == 0 } {
                return 0
        }
        set size 0
        set noaccess {}
        foreach element [glob -nocomplain -directory $directory -types f *] {
                set path [file join $directory $element]
                if { [file readable $path] } {
                        incr size [file size $path]
                } else {
                        lappend noaccess $path
                }
        }
        if { [llength $noaccess] != 0 } {
                log::log debug $noaccess
        }
        return $size
 }
 
 proc isdir {path} {
        return [file isdirectory $path]
 }
 
 proc dir_totalsize {directory} {
        if { [file exists $directory ] == 0 } {
                return 0
        }
        if { [file readable $directory ] == 0 } {
                return 0
        }
 
        set size 0
        set noaccess {}
        foreach element [::fileutil::find $directory isdir] {
                set path [file join $directory $element]
                if { [file readable $path] } {
                        incr size [dirsize $element]
                } else {
                        lappend noaccess $path
                }
        }
        if { [llength $noaccess] != 0 } {
                log::log debug $noaccess
        }
        return $size
 } 
 
 # Test out implementation
 
 if { [file exists /tmp/small] == 0 } {
        exec mkdir /tmp/small
        exec cp /etc/motd /tmp/small/motd
 }
 
 puts [format "Size of /tmp/small is %d" [dirsize /tmp/small] ]
 puts [format "Size of %s is %d" $env(HOME) [dirsize $env(HOME)] ]
 puts [format "Size of /not/present is %d" [dirsize /not/present] ]
 puts [format "Total size of %s is %d" $env(HOME) [dir_totalsize $env(HOME)] ]

See also du

HJG Using "exec mkdir" only works on unix-like systems.


Martin Lemburg - 27.09.2002:

The proc du on the du page is something completely different from "dirsize" and "dir_totalsize". But I tried out your procs and got results that differ completely from reality. I tried the following:

    % dirsize g:/programme
    22351
    % dir_totalsize g:/programme
    12651754

With the following proc "dirSize" ...

 proc dirSize {obj {recursive 0}} {
    set size    0;

    foreach subObj [glob -nocomplain \
                                [file join $obj *] \
                                [file join $obj {.[a-zA-Z0-9]*}]] {
        if {$recursive && [file isdirectory $subObj]} {
            incr size   [dirSize $subObj 1];
        } else {
            incr size    [file size $subObj];
        }
    }

    return $size;
 }

... I got the following:

    % dirSize g:/programme
    22351
    % dirSize g:/programme 1
    279744410

My Explorer tells me (including all hidden files/directories) 289,795,331 bytes.

So something goes wrong in your proc "dir_totalsize".


Yes, unfortunately the dir_totalsize doesn't recurse. And I'm having computer problems today which prevent me from making adjustments to the script...


Okay, Version 2 attempts to recurse, but there still appear to be differences.
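For comparison, here is a sketch of a recursive pure-Tcl counter along the lines of Martin's dirSize (the name dirBytes is made up). It also picks up hidden entries and skips entries it cannot stat, but it still sums apparent file sizes, so it will not match Explorer's allocated-block totals exactly, and it does not guard against symlink loops:

```tcl
# Sketch: total bytes of all plain files in a directory tree.
proc dirBytes {dir} {
    set size 0
    # "*" and ".*" together pick up normal and hidden entries;
    # -nocomplain quietly yields nothing for unreadable directories.
    foreach entry [glob -nocomplain -directory $dir * .*] {
        set tail [file tail $entry]
        if {$tail eq "." || $tail eq ".."} { continue }
        if {[file isdirectory $entry]} {
            incr size [dirBytes $entry]
        } elseif {![catch {file size $entry} n]} {
            # catch skips broken links and permission-denied files
            incr size $n
        }
    }
    return $size
}
```

Usage: `puts [dirBytes /tmp/small]`.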


Martin Lemburg - 30.09.2002:

Sorry, but why so complicated? Why use tcllib where it is not needed? Why use external libraries where they are not needed? Core Tcl provides enough capabilities to solve this kind of problem, "how much space is used by (or in) a directory", and enough capabilities for "slim" solutions, without always reinventing the wheel! Recursing through a structure to collect information is a common problem; why use a library if it is more of a design-pattern problem? Isn't it useful to store a kind of code snippet instead of relying on external code, libraries, and packages?


LV: Why use Tcllib? Because why rewrite code that has already been written? When you want to go on a trip, do you first build a vehicle, or sew your own clothes and make your own shoes? Many people have better things to do than write every piece of code from scratch...


Martin Lemburg - it's not about rewriting from scratch! I work in a software company where people are suspicious of external libraries and packages unless they are a bought product (warranties, updates, maintenance, ...). And if the products delivered to our customers should rely as little as possible on libraries that are not from our company, we have to collect code snippets and patterns and reuse them. It is (in my eyes) not always useful to need the complete tcllib just to be able to use one package (e.g. fileutil) to solve a simple problem. And if it is a runtime-critical problem, it may be much worse to use external packages that are not optimized, or that rely on an unknown number of other packages. Sometimes the use of external packages/libraries is, as we say in German, "mit Kanonen auf Spatzen schießen" - "shooting cannons at sparrows".

That's why I like the Tcl'lers Wiki! It's the best chance to collect code (snippets) in a central place, to make it available to all Tcl'lers!

This could also be a discussion about "all batteries included". We, and I think many other companies, can only use pure Tcl, without any other package! So some published solutions for common problems should be as simple as possible and not rely on anything other than the pure Tcl core.


RS I fully agree. Simple solutions (and many are simple in Tcl) need not be administered in a library on which one's app depends. I earlier started to collect a library of utilities ("max", "lrevert", "every"... that kind of stuff), but recently I prefer to paste those in where I need them - or even reinvent them (or rewrite from memory). A single file is still the most robust unit of deployment, even if not a Starkit...


LV I never have understood this approach. Why cut and paste in code, resulting in many more places I have to fix when bugs are found, instead of creating a single copy in a library that then everyone uses and which only has to be fixed once, documented once, etc.

CL responds: we live in different worlds. Some days, yes, I think, 'twould be insanity to duplicate code; other days, when working with other clients or in other situations, it's craziness not to write-from-scratch-to-achieve-a-small-self-contained-result. Why cut and paste? Sometimes, for bad reasons: lack of discipline, ignorance, other character flaws. Sometimes, because it fits the deployment model at hand quite perfectly.


LES suggests this cross-platform approach (adjustments still needed):

  proc p.du {myDir} {
    global myDU

    if {![info exists myDU]} {
        set myDU 0
    }

    cd $myDir
    #puts "DIR : $myDir";  update
    foreach myType {f d {f hidden} {d hidden}} {
        foreach myFile [glob -nocomplain -types $myType *] {
            incr myDU [file size [file normalize $myFile]]
            if {[file isdirectory $myFile]} {
                p.du [file normalize $myFile]
                cd ..
            }
        }
    }

    return "directory $myDir contains $myDU bytes"
  }

 # Demo:
  catch {console show}
  catch {wm withdraw .}
  puts "Disk usage:";  update
  puts [p.du $env(HOME)]
 #puts [p.du "C:/"]

HJG Seems to work, but when run on Windows as a non-admin user, it stops with errors such as "permission denied" for certain files and directories.
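One way to keep such errors from aborting the traversal is to wrap each file size call in catch, so protected or vanished entries are simply counted as zero. A sketch (safeSize is a made-up helper name):

```tcl
# Returns the file's size in bytes, or 0 if it cannot be stat'ed
# (permission denied, broken symlink, file removed mid-scan, ...).
proc safeSize {path} {
    if {[catch {file size $path} bytes]} {
        return 0
    }
    return $bytes
}
```

Replacing the bare `file size` calls in p.du with safeSize lets the walk continue past protected entries instead of raising an error.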


TV Maybe superfluous, but possibly useful for someone: on Linux (and other Unixes, though perhaps minus the -- options) and on Cygwin, the du command is available:

   exec du -h .

which, in human-readable form, states the disk usage for the current directory and all the subdirectories it has access permission to.

One can also limit the depth to which it reports a size/name line for each directory:

   exec du -h --max-depth 1 /somedir

will list only /somedir's immediate subdirectories, no deeper than that, plus the total size of all of /somedir's and its lower directories' data.

To get the total of a single dir (not counting files in its subdirectories), use

   exec ls -l

be sure to consult 'man ls' to know for sure in which unit the total is stated.
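The same single-directory total can also be had in pure Tcl, without parsing ls output or worrying about its block unit. A sketch (flatSize is a made-up name; like ls without -a, it skips hidden files, and it reports apparent sizes, not allocated blocks):

```tcl
# Sum the sizes of the plain files directly inside one directory;
# subdirectories and their contents are not descended into.
proc flatSize {dir} {
    set total 0
    foreach f [glob -nocomplain -directory $dir -types f *] {
        # catch skips files that vanish or cannot be stat'ed
        catch {incr total [file size $f]}
    }
    return $total
}
```

Usage: `puts [flatSize /tmp]`.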


LES du is no good because it is not cross-platform. Think of all the non-geeky people you know. How many of them have even heard of Cygwin?