Version 16 of uri

Updated 2004-04-27 20:00:21

RFC 2396 [L1 ] expands URI as "Uniform Resource Identifier" and defines it as "a compact string of characters for identifying an abstract or physical resource." The most common type of URI is the URL. URIs are exactly what the XML world calls "system IDs".

tcllib supplies a package called "uri" to parse RFC-compliant values. Its documentation appears at http://tcllib.sourceforge.net/doc/uri.html .

[examples]


uri::split is the command to break a uri up into its component parts.

The parts identified depend on the scheme of the URI being split. For instance, for ftp, the pieces identified are

  • host
  • path
  • port
  • pwd
  • scheme
  • type
  • user
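
As a rough illustration of the list above, the sketch below splits an ftp URL and loads the result into an array. It assumes tcllib is installed; the host name, port and path are made up for the example, and the exact set of keys returned may differ between tcllib versions.

```tcl
package require uri

# Hypothetical URL, used only for illustration.
set url "ftp://anonymous@ftp.example.org:2121/pub/tcl/README"

# uri::split returns a key/value list suitable for 'array set'.
array set parts [uri::split $url]

# Print whichever parts this version of the package reports.
foreach key [lsort [array names parts]] {
    puts [format "%-8s %s" $key $parts($key)]
}
```

Loading the result with array set keeps the lookup independent of the order in which the package happens to return the keys.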

uri doesn't yet support the data: (or file:?) protocols. 'Twould be fun to add that (those) in.

[This'd be a good place for examples.]

LV What does "doesn't yet support" mean?

AK That the code does not know how to handle such URLs (split/join). There are no regexp patterns for them either.

LV but I see this behavior with tcl8.4 and tcllib 1.0:

 % package require uri
 1.0
 % set name "file://home/lwv26/myfile.txt"
 file://home/lwv26/myfile.txt
 % uri::split $name
 path /lwv26/myfile.txt scheme file host home

So it LOOKS like file is supported.
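
One plausible reading of the session above is that file: URLs are parsed, but that the two-slash form treats the first path element as the host, which is how "home" ended up under host. The sketch below compares the two-slash and three-slash spellings so the difference can be checked against whatever tcllib is installed. No output is shown because it may vary with the tcllib version.

```tcl
package require uri

# Compare how the two spellings of a file URL are split.
foreach url {
    file://home/lwv26/myfile.txt
    file:///home/lwv26/myfile.txt
} {
    # Start from an empty array for each URL.
    array unset parts
    array set parts [uri::split $url]
    puts $url
    foreach key [lsort [array names parts]] {
        puts "    $key = $parts($key)"
    }
}
```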


Here is a urn: scheme handler. (AK: This handler was added to tcllib after the 1.1.0 release.) DMA: So this outdated code could be deleted, I'd suggest...

 # urn-scheme.tcl - Copyright (C) 2001 Pat Thoyts <[email protected]>
 #
 # extend the uri package to deal with URN (RFC 2141)
 # see http://www.normos.org/ietf/rfc/rfc2141.txt
 #
 # Released under the tcllib license.
 #
 # $Id: 850,v 1.17 2004-04-28 06:00:09 jcw Exp $
 # -------------------------------------------------------------------------

 package require uri 1.0
 package provide uri::urn 1.0

 namespace eval uri {
     namespace eval urn {
         variable NIDpart {[a-zA-Z0-9][a-zA-Z0-9-]{0,31}}
         variable esc {%[0-9a-fA-F]{2}}
         # '-' is listed last so it is a literal rather than a range operator.
         variable trans {a-zA-Z0-9$_.+!*'(,):=@;-}
         variable NSSpart "($esc|\[$trans\])+"
         variable URNpart "($NIDpart):($NSSpart)"
         variable url "urn:$NIDpart:$NSSpart"

         lappend [namespace parent]::schemes urn URN
     }
 }

 # -------------------------------------------------------------------------

 # Description:
 #   Called by uri::split with a url to split into its parts.
 #
 proc uri::SplitUrn {uri} {
     #@c Split the given uri into the URN component parts
     #@a uri: the URI to split without its scheme part.
     #@r List of the component parts suitable for 'array set'

     upvar \#0 [namespace current]::urn::URNpart pattern
     array set parts {nid {} nss {}}
     if {[regexp ^$pattern $uri -> parts(nid) parts(nss)]} {
         return [array get parts]
     } else {
         return {nid {} nss {}}
     }
 }


 # -------------------------------------------------------------------------

 proc uri::JoinUrn args {
     #@c Join the parts of a URN scheme URI
     #@a list of nid value nss value
     #@r a valid string representation for your URI

     array set parts [list nid {} nss {}]
     array set parts $args
     set url [urn::quote "urn:$parts(nid):$parts(nss)"]
     return $url
 }

 # -------------------------------------------------------------------------

 # Quote the disallowed characters according to the RFC for URN scheme.
 # ref: RFC2141 sec2.2
 proc uri::urn::quote {url} {
     variable trans

     set ndx 0
     while {[regexp -start $ndx -indices "\[^$trans\]" $url r]} {
         set ndx [lindex $r 0]
         scan [string index $url $ndx] %c chr
         set rep %[format %.2X $chr]        
         set url [string replace $url $ndx $ndx $rep]
         incr ndx 3
     }
     return $url
 }

 # -------------------------------------------------------------------------

 # Perform the reverse of urn::quote.
 proc uri::urn::unquote {url} {
     set ndx 0
     # Match only real hex escapes; anything else is left untouched.
     while {[regexp -start $ndx -indices {%([0-9a-fA-F]{2})} $url r]} {
         set first [lindex $r 0]
         set last [lindex $r 1]
         set str [string replace [string range $url $first $last] 0 0 0x]
         set c [format %c $str]
         set url [string replace $url $first $last $c]
         # Resume right after the decoded character so adjacent escapes are seen.
         set ndx [expr {$first + 1}]
     }
     return $url
 }

 # -------------------------------------------------------------------------
 # Local Variables:
 #   indent-tabs-mode: nil
 # End:
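
A short usage sketch for the handler above. It assumes the handler is available as the uri::urn package, as it is in tcllib releases after 1.1.0. The ISBN value and the "example" namespace identifier are placeholders chosen for illustration.

```tcl
package require uri
package require uri::urn   ;# assumption: the handler is installed under this name

# Split a URN into its namespace identifier (nid) and
# namespace-specific string (nss).
array set parts [uri::split urn:isbn:0451450523]
puts "nid = $parts(nid)"
puts "nss = $parts(nss)"

# Percent-encode a character outside the allowed set (here a space),
# then decode it again.
set quoted [uri::urn::quote "urn:example:hello world"]
puts $quoted
puts [uri::urn::unquote $quoted]
```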

The new Tcl 8.4a4 VFS layer by Vince Darley simplifies this work. See the "tclvfs" extension on SourceForge [L2 ] for example code which opens http, ftp, zip, and more - using the "blah:..." notation.


Category Package, subset Tcllib