Power set of a list

Arjen Markus asked how to generate the set of subsets containing 1, 2, 3 ... elements. For example:

   {a b c d e}

will give rise to:

   {{a} {b} {c} {d} {e}}
   {{a b} {a c} {a d} {a e} {b c} {b d} {b e} {c d} {c e} {d e}}
   ...

It does not seem quite trivial.


Kevin Kenny replies:

It's not quite clear from your question whether you need the entire power set (the set of all subsets), organized by size, or whether you simply want, for some k, to enumerate the combinations of the elements of a given list taken k at a time.

If you want the entire power set, you can enumerate it by counting from 0 up to 2**[llength $list] - 1:

 proc subsets { list } {
 
     # Check list size
 
     if { [llength $list] > 32 } {
         error "You don't really expect to list all 2**[llength $list] subsets, do you?"
     }
 
     # Calculate total number of subsets
 
     set n [expr { 1 << [llength $list] }]
 
     # Enumerate the subsets.  If the list is { a b c }, then element
     # 0 is the empty set, 1 is the set { a }, 2 is the set { b },
     # 3 is { a b }, 4 is { c }, 5 is { a c }, 6 is { b c }, and 7
     # is { a b c }.
 
     for { set i 0 } { $i < $n } { incr i } {
 
         set subset {}
 
         for { set j 0 } { $j < [llength $list] } { incr j } {
             if { $i & ( 1 << $j ) } {
                 lappend subset [lindex $list $j]
             }
         }
 
         # Markus asks for the subsets to be sorted by size.  Collate
         # them into an array indexed by subset size.
 
         lappend subsets([llength $subset]) $subset
 
     }
 
     set retval {}
     for { set k 0 } { $k <= [llength $list] } { incr k } {
         lappend retval $subsets($k)
     }
 
     return $retval
 }
 
 puts [join [subsets { a b c d }] \n]
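
For {a b c d} this prints one line per subset size, in the order in which the subsets are enumerated:

 {}
 a b c d
 {a b} {a c} {b c} {a d} {b d} {c d}
 {a b c} {a b d} {a c d} {b c d}
 {a b c d}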

Here's a shorter version:

 proc subsets {l} {
     set subsets [list [list]]
     foreach e $l {
         # foreach iterates over the value $subsets had when this inner
         # loop started, so the subsets appended below are not revisited
         # until the next element is processed.
         foreach subset $subsets {
             lappend subsets [lappend subset $e]
         }
     }
     return $subsets
 }
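
A quick interactive check (the empty set comes first, then the subsets in the order they were generated):

 % subsets {a b c}
 {} a b {a b} c {a c} {b c} {a b c}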

If you want to enumerate the combinations of the elements of $list taken $size at a time, then it is fairly trivial to enumerate them recursively:

 proc subsets2 { list size } {
     if { $size == 0 } {
         return [list [list]]
     }
     set retval {}
     for { set i 0 } { ($i + $size) <= [llength $list] } { incr i } {
         set firstElement [lindex $list $i]
         set remainingElements [lrange $list [expr { $i + 1 }] end]
         foreach subset [subsets2 $remainingElements [expr { $size - 1 }]] {
             lappend retval [linsert $subset 0 $firstElement]
         }
     }
     return $retval
 }
 
 for { set i 0 } { $i <= 4 } { incr i } {
     puts [subsets2 {a b c d} $i]
 }
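
That loop should print:

 {}
 a b c d
 {a b} {a c} {a d} {b c} {b d} {c d}
 {a b c} {a b d} {a c d} {b c d}
 {a b c d}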

There's also a neat iterative solution:

 proc subsets2b {l n} {
     set subsets [list [list]]
     set result [list]
     foreach e $l {
         foreach subset $subsets {
             # Extend each partial subset with the new element; collect it
             # once it reaches size n, otherwise keep it for further growth.
             lappend subset $e
             if {[llength $subset] == $n} {
                 lappend result $subset
             } else {
                 lappend subsets $subset
             }
         }
     }
     return $result
 }
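
It produces the same combinations as subsets2, though in a different order, for example:

 % subsets2b {a b c d} 2
 {a b} {a c} {b c} {a d} {b d} {c d}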

Here's another recursive version that seems to be the fastest of the three algorithms (6 versus 24 versus 31 seconds for N=20, M=10).

 proc subsets2c {myList size {prefix {}}} {
    ;# End recursion when size is 0 or equals our list size
    if {$size == 0} {return [list $prefix]}
    if {$size == [llength $myList]} {return [list [concat $prefix $myList]]}

    set first [lindex $myList 0]
    set rest [lrange $myList 1 end]

    ;# Combine solutions w/ first element and solutions w/o first element
    set ans1 [subsets2c $rest [expr {$size-1}] [concat $prefix $first]]
    set ans2 [subsets2c $rest $size $prefix]
    return [concat $ans1 $ans2]
 }
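
For example (in the same order subsets2 produces):

 % subsets2c {a b c d} 2
 {a b} {a c} {a d} {b c} {b d} {c d}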

PN 2007.10.2 I get:

 for N=20 M=10 subsets2 ~5.5 secs   subsets2b ~1.5 secs subsets2c ~2.5 secs. 
 for N=22 M=11 subsets2 ~ 23 secs   subsets2b ~ 6 secs  subsets2c ~ 9 secs
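
Timings like these can be reproduced with a small harness built on [time]; the snippet below is only a sketch (the element names are placeholders, not PN's actual script):

 set l {}
 for {set i 0} {$i < 20} {incr i} {lappend l e$i}
 foreach cmd {subsets2 subsets2b subsets2c} {
     puts "$cmd: [time [list $cmd $l 10] 1]"
 }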

and I thought I would throw in the Gray-order version, which means just changing the line

 foreach subset $subsets {

in [subsets] above, to

 foreach subset [reverse $subsets] {

but since we don't have a standard [reverse], I did this instead:

 proc gray {l} {
     set subsets [list [list]]
     foreach e $l {
         # Walk the subsets gathered so far in reverse order, appending
         # each of them extended with the new element.
         for {set i [expr {[llength $subsets] - 1}]} {$i >= 0} {incr i -1} {
             lappend subsets [concat [lindex $subsets $i] $e]
         }
     }
     return $subsets
 }

Gray order has the property that there is only a single element change from one subset to the next.
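
For example:

 % gray {a b c}
 {} a {a b} b {b c} {a b c} {a c} c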

Anyway, all these routines are extremely subversive and they should all be banned! Applying them to even a little list like the 26 letters of the alphabet, which fits on one line, will crash your computer. DON'T EVEN THINK OF TRYING IT!

So: these routines should NEVER be used in public; use them only in the privacy of your own bedroom.

Having said that, and having looked through Knuth's recently published Sections 7.2.1.1 and 7.2.1.2, I decided to create an iterator-style object that delivers the goods one piece at a time.


Arjen Markus My question had to do with the second possibility. I had thought of a similar piece of code, but had not yet found time to test it, and I was a bit troubled over some aspects, hence my remark that it did not seem quite trivial.

The background to this (again, just something I have been playing with) is the following question: given a number of half-spaces, is it possible to determine whether their intersection has a finite volume or not?

Possibly this is too obscure, but look at it this way:

  • In 3D, I have the set of all points (x,y,z) that fulfill all of the following conditions:
   x+y+z  >= 1
   2x-y+z >= 0
   y-z    >= 1  
   x+2z   >= 0
  • Does this set of conditions specify something like a tetrahedron or not? How easy is it to determine whether it does?

One way (I thought) the question could be solved is by inspecting all subsets of 3 inequalities (because we have a 3D space). Just a puzzle to keep me occupied; I should consult a book about this type of problem first.

When I saw the Wiki page on the cartesian product of lists, I thought it might be nice to add the above question.


Lars Hellstrom: It would not be possible to solve the problem quite that easily. Consider as a simple case that the inequalities define the unit cube in R^3. Then you need six inequalities (one for each side). Any 5-subset of the inequalities will, however, define a set with infinite volume.
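
Concretely, the unit cube is the set of points satisfying

   x >= 0     -x >= -1
   y >= 0     -y >= -1
   z >= 0     -z >= -1

and dropping any one of these six, say -x >= -1, leaves an infinite square prism rather than a bounded volume.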

I would recommend looking in a book on linear programming. An enormous amount of ingenuity has gone into considering questions about sets defined by linear inequalities over the years, and there are surprisingly many ways in which problems can be reformulated; I suspect that with the right twist to it your problem will be quickly solved by e.g. the simplex algorithm or Karmarkar's algorithm. (The ellipsoid method is fun too, but the cases in which that is best are rather odd.)

DKF notes that: Four inequalities is the minimum number to get a finite space, and will form a (usually irregular) tetrahedron if a solution exists at all...


Arjen Markus A book I have right now focuses on "computational geometry". A few chapters deal with linear programming. I do realize that the problem is not simple, certainly not if you want to use a purely geometrical method.

Arjen Markus The method I was thinking about is discussed on the Geometry page...


KBK notes: If you have a set of constraint inequalities of the form

   ax + by + cz >= d

then you can certainly use any linear-programming method to test each of the six problems

 Minimize x subject to constraints; maximize x subject to constraints;
 minimize y subject to constraints; maximize y subject to constraints;
 minimize z subject to constraints; maximize z subject to constraints.

If any of the six programs is insoluble, all six are, and the constraints are inconsistent.

If any of the six problems is unbounded, the constraints fail to define a polytope, and the unbounded problem identifies an axis along which the constrained region is infinite in extent.

Otherwise, the constraints form a polytope and the bounding parallelepiped is defined by the solutions to the six linear problems. (And you probably have uses for the bounding parallelepiped, I suspect!)
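
In Tcl, that test could be driven by a loop along the following lines. Note that lp::optimize below is a hypothetical solver interface, not an existing package, so this is only a sketch of the bookkeeping around the six problems:

 # Sketch only: lp::optimize is a hypothetical linear-programming command.
 # It is assumed to take a direction (min/max), an objective vector, and
 # constraint rows {a b c d} meaning a*x + b*y + c*z >= d, and to return
 # "infeasible", "unbounded", or the optimal value of the objective.
 proc boundingBox {constraints} {
     set box {}
     foreach axis {{1 0 0} {0 1 0} {0 0 1}} {
         foreach dir {min max} {
             set r [lp::optimize $dir $axis $constraints]
             if {$r eq "infeasible"} {return "constraints are inconsistent"}
             if {$r eq "unbounded"} {return "region is infinite along axis $axis"}
             lappend box $r
         }
     }
     # xmin xmax ymin ymax zmin zmax of the bounding parallelepiped
     return $box
 }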


For the special case of ordered pairs, RS has this minicode:

 proc pairs L {
    set res {}
    foreach i $L {
        foreach j $L {
            if {$i != $j} {lappend res $i-$j}
        }
    }
    set res
 }
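
For example, this yields both orderings of each pair:

 % pairs {a b c}
 a-b a-c b-a b-c c-a c-b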

The power set of {a b c d e}

will give rise to:

   {{}}
   {{a} {b} {c} {d} {e}}
   {{a b} {a c} {a d} {a e} {b c} {b d} {b e} {c d} {c e} {d e}}
   ...
   {{a b c d e}}

TV (12 jun 03: I saw half a sentence in this piece and more typos than average, so a little update.) Just a remark, because some ideas here seem relevant, though I'm sure some would easily lead astray: the linear programming area may be of importance to supercomputer programmers (an area in which I'm only mildly knowledgeable), and for 'dividing up' a programming problem, usually in the context of parallel/distributed computing. I'm convinced that usually it then offers not so much in the sense of quicker or more insightful problem solving; it can merely transform your equations and, as the terminology seems to be, partition your domain. The actual computations, for instance where each of the axes of your index space has some meaning, usually do not get simplified or shortened at all, except maybe for some smart iterative solutions (I could try NURBS). So there is not much algorithmic insight to be gained or computational solution to be found.

Of course, when you really do want to, and have some problem which is a set of inequalities, you could use a simplex-like method to find solutions, and I guess that could be practical. In practice I'd say the problems are often not exactly of that nature, and one is more likely to be successful with better analysis than by expecting some awe-inspiring answer from such constraint analysis.

In the physicist's sense, no reasonably informed physicist is going to waste all too much attention on practically doing group-theoretic considerations, about which a lot of literature exists, much of it not all too practical for really interesting or hip problems. The whole idea of using many such mathematical constructs or 'tools' comes from a century ago, when pen and paper were the physicist's only aid. Proof-wise, in the area of computers I wouldn't expect the holy grail from it anywhere, and certainly not when the subject in question has no understanding of Turing Machines and the von Neumann architecture, which are of a later date and were made for practical computer use. After groups came Schroedinger-equation-solving stuff, I'd sum up, and after computational geometry, integral calculus, which are both IMO interesting. Unless of course it is about adopting the mindset of thinking about hyperplane-divided multidimensional spaces; but then I'd say that in practice problems often boil down to Traveling Salesman kinds of considerations, because the integer coordinates would often act as a sort of network links, and you'd be interested in how certain computations, for instance obviously formulated as nested loops (like in matrix computations), can be organized to make good use of a processor's registers, ALU/FPU units, and communication infrastructure (memory/processor bandwidth bottleneck, parallelism like in DSPs with multiple multiply/add units, that stuff). There are various projects on how to solve shortest-path-analysis-related problems in networks, for instance by smart repeated cutting, which I guess could apply.

Then again, a good algebraic manipulator and a good mind using it are interesting.