[Silas] - 2005-12-8 - The objective of this page is to discuss alghorithms to help in "query synthesys" (I don't know if "synthesys" and "synthesize" are the best word to discuss a stuff like that. If not, please, rename or remove this page).

Suppose you have a list with redundat data:

  {Owner Car}

  {Silas Ferrari}
  {Silas Porche}
  {John  Honda}
  {Smith Renaut}
  {Smith Ford}

See this list has redundant data. If I want to show it in a report, it probably be confuse to the viewer. So it's a good idea to take off this junk data:

  {Owner Car}

  {Silas Ferrari}
  {{}    Porche}
  {John  Honda}
  {Smith Renaut}
  {{}    Ford}

This kind of information is common from queries with JOINs. I developed a cool proc which takes off this junk data. It is in version 0.1 yet, because I'm still trying it... not 100% stable. Needs some fixes. I'll update it as I note errors. For now, it conforms to my purposes:

'''Hey! [RS]'s code (or the modified version) on the bottom is the best way to achieve the objective. Should we take the proc below off here?'''

  proc synthesize {List reference Fields} {
    # synthesize: Receive a list of lists and synthesizes a query with rows 
    # returned by a JOIN. Returns the synthesized list
    #
    # List: List to be synthesized
    # reference: fields to be compared
    # Fields: Fields which will be used in synthesize. It's a list of indexes
    # version 0.1 - 2005-12-06
    # by [Silas] Justiniano - silasdb at gmail dot com

    set i 0

    while {$i < [llength $List]} {
      for {set k 0} {$k < [llength [lindex $List 0]]} {incr k} {
	set aux($k) {}
      }

      set j [expr $i+1]
      while {[lindex $List $i $reference] eq [lindex $List $j $reference]} {
	for {set k 0} {$k < [llength [lindex $List 0]]} {incr k} {
	  if {[lsearch $aux($k) [lindex $List $j $k]] == -1
	   && [lsearch $Fields $k] != -1} {
	    lappend aux($k) [lindex $List $j $k]
	  }
	}
	incr j
	if {[expr $j-1] == [llength $List]} {return $List}
      }

      set max 0
      foreach valor [array names aux] {
	if {[lindex $aux($valor) 0] == [lindex [lindex $List $i] $valor]} {
	  set aux($valor) [lreplace aux($valor) 0 0]
	}
	if {$max < [llength $aux($valor)]} {
	  set max [llength $aux($valor)]
	}
      }

      for {set k $i} {$k <= [expr $max+$i]} {incr k} {
	for {set l 0} {$l < [llength [lindex $List 0]]} {incr l} {
	  if {$k == $i} {
	    lappend Itens [lindex [lindex $List $i] $l]
	  } else {
	    if {[lsearch $Fields $l] == -1} {
	      lappend Itens {}
	    } else {
	      lappend Itens [lindex $aux($l) 0]
	      set aux($l) [lreplace $aux($l) 0 0]
	    }
	  }
	}
	set List [linsert $List $k $Itens]
	unset -nocomplain Itens
      }

      set List [lreplace $List [expr $i+$max+1] [expr $j+$max]]

      if {$max == 0 && $i == [llength $List]-1} break
      incr i [expr $max+1]
    }

    return $List
  }

#Testing...:

  set A {
    {Silas Ferrari}
    {Silas Porche}
    {John  Honda}
    {Smith Renaut}
    {Smith Ford}
  }
  puts [synthesize $A 0 1]

Notes the first argument is the list, the second is the "field" that you will use as reference to synthesize and the last is a list of indexes to be deleted when found more than one (is this explanation fine? :) ).

As I said, the last argument can receive a list of indexes. So in the following list:

  {Owner Car     Color}

  {Silas Porche  Blue}
  {Silas Porche  Black}
  {Silas Ferrari Red}
  {John  Honda   Blue}
  {Smith Renaut  White}
  {Smith Renaut  Gray}
  {Smith Ford    Green}

Each owner has a car, some have more than one, some of them are not different, but have different color. A synthesized list would be:

  {Owner Car     Color}

  {Silas Porche  Blue}
  {{}    {}      Black}
  {{}    Ferrari Red}
  {John  Honda   Blue}
  {Smith Renaut  White}
  {{}    {}      Gray}
  {{}    Ford    Green}

I reached it using:
  puts [synthesize $A 0 {1 2}]

Please, help to improve this proc.
----
[RS] experimented with this (which has no column-width formatting):
 proc synthesize llist {
   set last {}
   set ress {}
   foreach list $llist {
	set reslist {}
        foreach e $list f $last {
           lappend reslist [expr {$e eq $f? {}: $e}]
        }
        lappend ress $reslist
        set last $list
   }
   set ress
 }
Testing:
 set data {
  {Owner Car     Color}

  {Silas Porche  Blue}
  {Silas Porche  Black}
  {Silas Ferrari Red}
  {John  Honda   Blue}
  {Smith Renaut  White}
  {Smith Renaut  Gray}
  {Smith Ford    Green}
 }
 % join [synthesize $data] \n
 Owner Car Color
 Silas Porche Blue
 {} {} Black
 {} Ferrari Red
 John Honda Blue
 Smith Renaut White
 {} {} Gray
 {} Ford Green

----
[Silas] - wow! [RS]'s code is really better than mine! It requires a list whose repeated values comes firstly. For example:

 {Kevin Honda Green}
 {Kevin Ford Green}

"Kevin" is the repeated value here. "Green" is repeated too, but it is a property of the car, which is not repeated. Unfortunatelly [RS]'s code doesn't works fine with this kind of list:

 % synthesize {{Kevin Honda Green} {Kevin Ford Green}}
 {Kevin Honda Green} {{} Ford {}

So I modified [RS]'s code:

 proc synthesize llist {
   set last {}
   set ress {}
   foreach list $llist {
	set reslist {}
	set found 0
        foreach e $list f $last {
	   if {$e eq $f && $found == 0} {
	     lappend reslist {}
	   } else {
	     lappend reslist $e
	     set found 1
	   }
        }
        lappend ress $reslist
        set last $list
   }
   return $ress
 }

#Testing...:

 % puts [synthesize {{Kevin Honda Green} {Kevin Ford Green}}]
 {Kevin Honda Green} {{} Ford Green}

Although it requires a list whose "repeated" values comes firstly, I think this [RS]'s modified code is much better than my first synthesize proc. It's easier, better and faster to get a list in this format directly from SQL and re-order it later than try to address the problem with a "non-ordered" list in this format.

----
[Silas] - 2005-12-09 - I noticed that queries results have a interesting organization form. They looks like a [fractal]. See:

 A B D
 A B E
 A C D
 A C E

It's the basic form of a query with two join. So:

 SELECT
  Books.name,
  Subjects.name,
  Authors.name
 FROM Books
 INNER JOIN Books_Subject
 ON Books.book_id = Books_Subject.book_id
 INNER JOIN Subjects
 ON Books_Subject.subject_id = Subjects.subject_id
 INNER JOIN Books_Authors
 ON Books.book_id = Books_Authors.book_id
 INNER JOIN Authors
 ON Books_Authors.author_id = Authors.author_id;

I think many-to-many (using a intermediate table) relationships are the best example here. The query above would return (for one book inserted in the database):

 {{The comunist manifest} {Politics} {Karl Marx}}
 {{The comunist manifest} {Politics} {Friedrich Engels}}
 {{The comunist manifest} {History} {Karl Marx}}
 {{The comunist manifest} {History} {Friedrich Engels}}

[RS]'s modified code would return:

 {{The comunist manifest} Politics {Karl Marx}}
 {{} {} {Friedrich Engels}}
 {{} History {Karl Marx}}
 {{} {} {Friedrich}}

It's still repeating some values. The next proc, makes it better:

 proc synthesize List {
 
   for {set i 0} {$i < [llength $List]} {incr i} {
     set First [lindex $List $i]
     set j [expr $i+1]
 
     while {[lindex $First 0] eq [lindex $List $j 0]} {
       for {set k 0} {$k < [llength $First]} {incr k} {
         if ![info exists aux($k)] {set aux($k) {}}
         if {[lsearch $aux($k) [lindex $List $j $k]] == -1} {
 	  lappend aux($k) [lindex $List $j $k]
 	}
       }
       incr j
     }
 
     if {$j > $i+1} {
       set List [lreplace $List $i $j-1]
 
       set max 0
       foreach valor [array names aux] {
         if {[llength $aux($valor)] > $max} {set max [llength $aux($valor)]}
       }
 
       for {set j 0} {$j < $max} {incr j} {
 	for {set k 0} {$k < [llength $First]} {incr k} {
 	  lappend Items [lindex $aux($k) 0]
 	  set aux($k) [lreplace $aux($k) 0 0]
 	}
 	set List [linsert $List [expr $i+1] $Items]
 	unset Items
       }
     }
     
     incr i [expr $j-1]
   }
   return $List
 }

#Testing...:

 % synthesize $A
 {{The comunist manifest} Politics {Friedrich Engels}} {{} History {Karl Marx}

It's another big proc (like [RS], I don't like big procs). Probably it can be improved and diminished... if someone has time, please, work on it.

----

[Category Algorithm]