Hacker News

Hacker News, also known as HN, is a startup and technology news website created by Paul Graham. It can be found at https://news.ycombinator.com/

Postings about Tcl

I Can't Believe I'm Praising Tcl, {2020 09 06}
Tcl/tk vs. the web, {2016 01 31}
Tcl the Misunderstood, {2014 01 16}

Scraping Hacker News

To run the following script, you need Tcllib and tls installed, plus a copy of the treeselect module in the same directory as the script. You can download treeselect with wiki-reaper, using the command: wiki-reaper 41023 0 8 > treeselect-0.3.1.tm

Note: This is only a demonstration. Hacker News also has a JSON API, which you should use instead of scraping for any serious application.

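# version 0.0.1
# Look for Tcl modules (treeselect-*.tm) in the current directory.
::tcl::tm::path add .
package require treeselect

# Fetch the front page and parse it into a tree.
set tree [::treeselect::url-to-tree https://news.ycombinator.com/news]

# Run every query against all nodes of the tree.
set nodes [$tree nodes]

# Story titles: the text inside links in td.title cells.
set titles [::treeselect::get $tree [
    ::treeselect::query $tree "td.title a PCDATA" $nodes] data]
# Scores: the text of elements with the class "score".
set scores [::treeselect::get $tree [
    ::treeselect::query $tree ".score PCDATA" $nodes] data]
# Link targets: the href attribute of the same title links.
set links [lmap x [::treeselect::get $tree [
    ::treeselect::query $tree "td.title a" $nodes] data] {
    dict get [::treeselect::parse-attributes $x] href
}]

# Combine the three lists by position and keep only entries with a score.
# This relies on the lists lining up, which the page markup does not guarantee.
set stories {}
foreach title $titles score $scores link $links {
    if {$score ne {}} {
        lappend stories $title $score $link
    }
}

foreach {title score link} $stories {
    puts "($score) $title - $link"
}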

Sample output

(43 points) Marty.js – A JavaScript library for state management in React applications - http://martyjs.org/
(25 points) Show HN: Metamon, a Vagrant/Ansible toolkit for kickstarting Django apps - http://blog.tryolabs.com/2015/01/20/introducing-metamon-for-kickstarting-django-development/
(151 points) Emacs Is My New Window Manager - http://www.howardism.org/Technical/Emacs/new-window-manager.html
(51 points) You’ll Always Miss Being in the Basement - http://zachholman.com/posts/the-basement/
(25 points) First LibreOffice for Android app released - https://libreoffice-from-collabora.com/libreoffice-for-android-released/
(...)
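
Using the JSON API instead

The note above recommends the JSON API over scraping. What follows is a minimal sketch of that approach, not a tested replacement for the script above. It assumes Tcl 8.6, the official Hacker News API endpoints at https://hacker-news.firebaseio.com/v0 (topstories.json and item/<id>.json), Tcllib's json package, and a tls version recent enough to accept -autoservername. The proc names fetch-json, format-story and top-stories are made up for this example.

```tcl
package require Tcl 8.6
package require http
package require tls
package require json  ;# from Tcllib

# Let the http package speak HTTPS through tls.
# Assumption: tls is new enough (1.7.11 or later) to support -autoservername.
::http::register https 443 [list ::tls::socket -autoservername true]

# Fetch a URL and decode its JSON body into a Tcl list or dict.
proc fetch-json {url} {
    set token [::http::geturl $url -timeout 10000]
    try {
        if {[::http::status $token] ne "ok" || [::http::ncode $token] != 200} {
            error "GET $url failed: [::http::status $token] [::http::code $token]"
        }
        return [::json::json2dict [::http::data $token]]
    } finally {
        ::http::cleanup $token
    }
}

# Format one item dict in the same style as the scraper's output.
# Items without a url (such as Ask HN posts) fall back to their HN page.
proc format-story {item} {
    set url https://news.ycombinator.com/item?id=[dict get $item id]
    if {[dict exists $item url]} {
        set url [dict get $item url]
    }
    set score 0
    if {[dict exists $item score]} {
        set score [dict get $item score]
    }
    return "($score points) [dict get $item title] - $url"
}

# Return formatted lines for the first $count top stories.
# Each story costs one extra HTTP request.
proc top-stories {{count 5}} {
    set base https://hacker-news.firebaseio.com/v0
    set ids [lrange [fetch-json $base/topstories.json] 0 [expr {$count - 1}]]
    lmap id $ids {
        format-story [fetch-json $base/item/$id.json]
    }
}

# Only hit the network when this file is run as a script, not when sourced.
if {[info exists ::argv0] &&
    [file normalize [info script]] eq [file normalize $::argv0]} {
    puts [join [top-stories 5] \n]
}
```

Unlike the scraper, this does not depend on the page markup, and the title, score and link of a story all come from the same item, so they cannot get out of step with each other.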

See Also

Web Scraping with htmlparse