[[clt postings from jooky and [David Welton] and [Larry Virden].]]

Web scraping is the practice of getting information from a web page and reformatting it. Some reasons one might do this are to send updates to a pager or WAP phone, to e-mail one's personal account status to an e-mail account, or to create data files for reading on a PDA.

See projects like Sitescooper (http://Sitescooper.org/ ) or Plucker (http://www.plkr.org/ ) for non-[Tcl] tools for scraping the web. http://www.Sitescraper.co.uk provides a reliable scraping service.

"Web Scraping ..." [http://cedar.intel.com/cgi-bin/ids.dll/content/content.jsp?cntKey=Generic+Editorial%3a%3aws_scraping&cntType=IDS_EDITORIAL&cat=CJA]

"Web scraping is easy" [http://www.unixreview.com/documents/s=7822/ur0302h/]

----

   * [An HTTP Robot in Tcl]
   * [websearch]
   * [Getting stock quotes over the internet]
   * Also see '''[tcllib]/examples/oscon''', which uses the [htmlparse] module (among others) to parse the schedule pages for OSCON 2001 and convert them into [CSV] files usable by [Excel] and other applications.
   * See also the eBay web scraping example, which uses tDOM's HTML parser and XPath expressions, presented at the [First European Tcl/Tk Users Meeting] ( http://sdf.lonestar.org/~loewerj/tdom2001.ppt and http://www.tu-harburg.de/skf/tcltk/tclum2001.pdf.gz )
   * [TkChat] is a Tcl/Tk web scraper for this web site's chat room!
   * [TcLeo] allows querying the English <=> German web dictionary at http://dict.leo.org from the command line.
   * [Getleft]
   * [wiki-reaper]

----

Apt comments on the technical and business difficulties of web scraping, along with mentions of WebL and NQL, appear here [http://lambda.weblogs.com/discuss/msgReader$3567].

----

An alternative to web scraping is to cooperate with the web host on the details of a [Web Service] that would provide the useful information programmatically.

----

[Category Internet] [Dan Razzell]