Rough sets

if { 0 } { Arjen Markus (13 february 2004). In an e-mail enabled brainstrom session with Steve Blinkhorn that started off on the idea of fuzzy sets, I remembered reading about rough sets.

Rough sets are an (at least superficially) simple extension of ordinary sets, alike and quite different from fuzzy sets. The idea can be formulated like this (see [L1 ]):

  • All items in a set A have one or more properties
  • The items in a subset B of this set also have these properties, but it is unclear if items in subset B can be distinguished via the values of these properties from the items not in B.
  • So, first, identify which items in B have common values for one or more of these properties, the lower approximation to B.
  • And, second, identify which items in A are somewhat alike those in B.

This is done via equivalence classes: each item in such a class has the same properties as all the others.

We will try to clarify the concept with a small example (extended from the above reference):

   Name            Education         Age         Chance for a job?
   --------        -----------       ---------   -----------------
   Joe             High School       Old         No
   Mary            High School       Young       Yes
   Peter           Elementary        Young       No
   Paul            University        Young       Yes
   Cathy           Doctorate         Old         Yes

What can we deduce about the set of people with good job perspectives? We try with: education only, age only, and both.

}

 #
 # Set up the database
 #
 set props(Joe)    {HighSchool Old No}
 set props(Mary)   {HighSchool Young Yes}
 set props(Peter)  {Elementary Young Yes}
 set props(Paul)   {University Young Yes}
 set props(Cathy)  {Doctorate Old Yes}

 #
 # Procedure to determine the equivalence classes
 #
 proc equivClasses {which_props} {
    global props
    global class

    array unset class *

    foreach person [array names props] {
       set cvalues {}
       foreach propno $which_props {
          lappend cvalues [lindex $props($person) $propno]
       }

       lappend class($cvalues) $person
    }
 }

 #
 # Print the results so far
 #

 puts "Equivalence classes:"
 #
 # Only education 
 # 
 puts "Education only - Mary, Paul, Cathy?"
 equivClasses 0
 parray class

 #
 # Only age
 #
 puts "\nAge only - Mary, Paul, Cathy?" 
 equivClasses 1
 parray class

 #
 # Education and age
 # 
 puts "\nEducation and age - Mary, Paul, Cathy?"
 equivClasses [list 0 1]
 parray class

if { 0 } { Studying the output, we see, that no criterium has an equivalence class that is identical to the complete set of successful persons.

So, now the next step: can we find a criterium that gives a "good" fit? For this, we look for equivalence classes that have only successful people. }

 #
 # Determine the set of classes that contain only persons from the
 # group we consider 
 #
 proc subsetOnlyClasses {subset} {
    global class

    set equiv_classes {}
    set found_persons {}
    foreach c [array names class] {
       set okay 1
       foreach p $class($c) {
          if { [lsearch $subset $p] < 0 } {
             set okay 0
             break
          }
       }
       if { $okay } {
          set     found_persons [concat $found_persons $class($c)]
          lappend equiv_classes $c
       }
    }

    set found_persons [lsort -unique $found_persons]

    return [list $found_persons $equiv_classes]
 }

 puts "Classes with successful persons only:"
 #
 # Education?
 #
 puts "\nEducation only?"
 equivClasses 0 
 puts "[subsetOnlyClasses {Mary Paul Cathy}]"

 #
 # Age?
 #
 puts "\nAge only?"
 equivClasses 1 
 puts "[subsetOnlyClasses {Mary Paul Cathy}]"

 #
 # Education and age?
 #
 puts "\nEducation and age?"
 equivClasses {0 1}
 puts "[subsetOnlyClasses {Mary Paul Cathy}]"

if { 0 } { From this output we see that education only can explain the job chances for Cathy and Paul, but not for Mary. Combining education with age solves this: these two "explain" the subset B of people with a good chance for a job exactly.

What however about the classes that might play a role? That is the last part ... }

 # 
 # Classes that possibly play any role
 #
 proc possibleClasses {subset} {
    global class

    set equiv_classes {}
    set found_persons {}
    foreach c [array names class] {
       set okay 0
       foreach p $class($c) {
          if { [lsearch $subset $p] >= 0 } {
             set okay 1
             break
          }
       }
       if { $okay } {
          set     found_persons [concat $found_persons $class($c)]
          lappend equiv_classes $c
       }
    }

    set found_persons [lsort -unique $found_persons]

    return [list $found_persons $equiv_classes]
 }

 puts "Classes with at least one successful person:"
 #
 # Education?
 #
 puts "\nEducation only?"
 equivClasses 0 
 puts "[possibleClasses {Mary Paul Cathy}]"

 #
 # Age?
 #
 puts "\nAge only?"
 equivClasses 1 
 puts "[possibleClasses {Mary Paul Cathy}]"

 #
 # Education and age?
 #
 puts "\nEducation and age?"
 equivClasses {0 1}
 puts "[possibleClasses {Mary Paul Cathy}]"

if { 0 } { I probably should have used different names for the procedures - subsetOnlyClasses seems to deal with possible classes and possibleClasses with necessary classes. Oh well ...

Conclusion:

  • Age alone has no predictive capability (phew!)
  • Education does, but not exactly: Cathy and Paul have a high enough education, it would seem, but Mary not. So the lower approximation is {Cathy Paul} and the upper approximation is {Cathy Paul, Mary, Joe}
  • Education and age together give an exact prediction

Well, this is a completely artificial example and a very small one too. But it shows what rough sets are all about.

Now, rthe next step could be to formulate this in terms of relational algebra and use Ratcl!

29 juni 2007 Harm Olthof - changed unset class to array unset class, so that it also works when class isn't initialized at ::

29 juni 2007 Harm Olthof - case "Education and age" was a copy of "age only", fixed that }


AM (9 july 2009) I found this article on an extension to Tcl to deal with rough sets:

 Rough set extension of Tcl for data mining
 Knowledge-Based Systems, Volume 11, Issues 3-4, 12 November 1998, Pages 249-253
 G. Griffin, Z. Chen


GAM -- 12 July 2009

See Rough sets a la Relations for one relational take on this very interesting idea.


[ Category Concept ]