yahoogroups-reader

NAME

yahoogroups-reader - terminal application to read archived Yahoo Groups files from https://archive.org, for instance: https://archive.org/details/yahoo-groups-2016-04-23T12-23-47Z-6e2391

DESCRIPTION

This command line application lets you read old Yahoo Groups mailing lists in your terminal, provided they were public at the time, using archive files downloaded from https://archive.org.

To read the group mailings, do the following:

  • search https://archive.org/ for metadata using the terms yahoogroups GROUPNAME
  • if you have a hit for an archive, click on the search result
  • on the right you see the files available for download
  • click on the "WEB ARCHIVE GZ FILES" section
  • download the GROUPNAME.ID.warc.gz file for your group
  • gunzip the file, for instance with gzip -d GROUPNAME.ID.warc.gz
  • then use the script warc-reader.tcl like this: warc-reader.tcl GROUPNAME.ID.warc | less
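
The last two steps can also be scripted. The following is only a minimal sketch: it assumes that gzip is on the PATH and that warc-reader.tcl sits in the current directory. The proc names warcName and unpackAndRead are made up for this illustration and are not part of the script in the CODE section.

```tcl
#!/usr/bin/env tclsh
# Sketch only: automate the "gunzip, then read" steps from the list above.
# Assumptions: gzip is on the PATH, warc-reader.tcl is in the current directory.

# Derive the uncompressed name, e.g. GROUPNAME.ID.warc.gz -> GROUPNAME.ID.warc
proc warcName {gzfile} {
    if {[file extension $gzfile] ne ".gz"} {
        return -code error "expected a .gz file, got '$gzfile'"
    }
    return [file rootname $gzfile]
}

# Decompress the archive next to the original and run the reader on it.
proc unpackAndRead {gzfile} {
    set warc [warcName $gzfile]
    if {![file exists $warc]} {
        # gzip -dc writes to stdout, so the .gz file is left untouched.
        exec gzip -dc $gzfile > $warc
    }
    # Use the same Tcl interpreter and pass the reader's output straight through.
    exec [info nameofexecutable] warc-reader.tcl $warc >@ stdout 2>@ stderr
}

if {[info exists argv] && [llength $argv] > 0} {
    unpackAndRead [lindex $argv 0]
}
```

Saved under a hypothetical name such as unpack-and-read.tcl, it could be used like this: tclsh unpack-and-read.tcl GROUPNAME.ID.warc.gz | less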

CODE

#!/usr/bin/env tclsh
package require json

proc processWarc {filename} {
    set html_mapping {&quot; {"} &apos; ' &amp; & &lt; < &gt; > &#92; \\ &#39; '} ;#"
    if {![file exists $filename]} {
        return -code error "Error: File '$filename' does not exist!"
    }
    if {[catch {open $filename r} infh]} {
        puts stderr "Cannot open $filename: $infh"
        exit 1
    } else {
        set x 0
        set flag null
        while {[gets $infh line] >= 0} {
            if {[regexp {^WARC-Target-URI.+group/([^/]+)/message/([0-9]+)/info} $line -> group msg]} {
                #puts "Message: $group $msg"
                set flag info
            } elseif {[regexp {^WARC-Target-URI.+group/([^/]+)/message/([0-9]+)/raw} $line -> group msgid]} {
                #puts "Message: $group $msg"
                set json ""
            } elseif {[regexp {^."topicId"} $line]} {
                append json "$line\n"
                set flag raw
            } elseif {$flag eq "raw" && [regexp {[a-z]} $line]} {
                append json "$line\n"
            } elseif {$flag eq "raw" && [regexp {^WARC/} $line]} {
                set d [json::json2dict $json]
                set msgflag false
                puts "[string repeat ## 40]\nMessage: $msgid"
                foreach msg [split [dict get $d rawEmail] "\n"] {
                    if {[regexp {^Date: } $msg]} {
                        set msgflag true
                    } 
                    if {$msgflag} {
                        set msg [string map $html_mapping $msg]
                        # Join quoted-printable soft line breaks ("=" at end of line).
                        if {[regexp {=\r?$} $msg]} {
                            puts -nonewline [regsub {=\r?$} $msg {}]
                        } else {
                            puts $msg
                        }
                    }
                }
                set flag null
            }
        }

        close $infh

    }
}

if {[info exists argv] && [llength $argv] > 0} {
    
    processWarc [lindex $argv 0]
} else {
    puts "Usage: [info script] WARCFILE"
}
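
To illustrate what the inner loop does with each line of the rawEmail field, here is a small sketch of that step in isolation. It applies the same entity mapping and joins lines that end in a quoted-printable soft line break ("="). The proc name decodeLines and the sample text are made up for this illustration. Unlike processWarc, the sketch returns a string instead of printing, does not skip the headers before "Date: ", and strips a trailing carriage return from each line.

```tcl
# Sketch only: the per-line decoding step of processWarc, isolated.
# decodeLines and the sample input below are hypothetical.
proc decodeLines {rawEmail} {
    set html_mapping {&quot; {"} &apos; ' &amp; & &lt; < &gt; > &#92; \\ &#39; '}
    set out ""
    foreach line [split $rawEmail "\n"] {
        # Replace HTML entities with the characters they stand for.
        set line [string map $html_mapping $line]
        if {[regexp {=\r?$} $line]} {
            # Soft line break: drop the trailing "=" and stay on the same line.
            append out [regsub {=\r?$} $line {}]
        } else {
            append out [string trimright $line "\r"] "\n"
        }
    }
    return $out
}

puts -nonewline [decodeLines "a &lt;b&gt; &amp; c=\r\nontinued\r\n&quot;done&quot;"]
```

For the sample input, the first two lines are joined into one, and the entities come out as <, >, & and ".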

Here is a screenshot from the terminal:

yahoogroups-reader-image

LINKS

DISCUSSION

Please discuss here.

DDG - 2023-03-30 - I could not find a tool that decodes the JSON-encoded messages in a usable way, so I wrote my own.