These code snippets check whether a URL is valid and, with small changes, extract a URL from a piece of text.
''"In general URI's as defined by http://www.ietf.org/rfc/rfc3986.txt%|%RFC 3986%|% (page 12) may contain any of the following characters: '''A-Z''', '''a-z''', '''0-9''', '''-''', '''.''', '''_''', '''~''', ''':''', '''/''', '''?''', '''#''', [, ], '''@''', '''!''', '''$''', '''&''', ', '''(''', ''')''', '''*''', '''+''', ''',''', ''';''' and '''='''. Any other character needs to be encoded with the percent-encoding ('''%hh'''). Each part of the URI has further restrictions about what characters need to be represented by an percent-encoded word."'' (Gumbo, 2009)
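As a rough illustration of the character rule quoted above, the following sketch checks that a string uses only the listed characters (with '''%''' allowed solely as the start of a '''%hh''' escape) and percent-encodes everything else. The proc names `uriCharsOk` and `percentEncode` are made up for this example, and leaving only the unreserved characters unencoded is an assumption that suits a single path or query component, not a whole URI.
======
# Hypothetical helper: 1 if $s contains only the characters listed in the
# quote above, or well-formed %hh escapes; 0 otherwise.
proc uriCharsOk {s} {
    return [regexp -- {^(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*$} $s]
}

# Hypothetical helper: percent-encode every UTF-8 byte of $s that is not an
# unreserved character (A-Z a-z 0-9 - . _ ~).  Assumption: $s is one URI
# component, so reserved characters such as / and ? are encoded too.
proc percentEncode {s} {
    set out ""
    foreach byte [split [encoding convertto utf-8 $s] ""] {
        if {[regexp -- {^[A-Za-z0-9\-._~]$} $byte]} {
            append out $byte
        } else {
            append out [format %%%02X [scan $byte %c]]
        }
    }
    return $out
}

puts [uriCharsOk {http://www.ietf.org/rfc/rfc3986.txt}]
puts [percentEncode "a b"]
======
Passing this check only means the characters are legal somewhere in a URI; it says nothing about whether the parts are in the right places.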
----
======
#
# Checking whether a URL is valid or not...
#
set blabla {http://www.ietf.org/rfc/rfc3986.txt}
if {[regexp -- {^(https?://[a-z0-9\-]+\.[a-z0-9\-\.]+(?:/|(?:/[a-zA-Z0-9!#\$%&'\*\+,\-\.:;=\?@\[\]_~]+)*))$} $blabla match url]} {
puts "$url is a valid url."
}
#
# Getting a URL from HTML code...
#
# (sample input; any HTML text will do)
set blabla {<a href="http://www.ietf.org/rfc/rfc3986.txt">RFC 3986</a>}
if {[regexp -- {(https?://[a-z0-9\-]+\.[a-z0-9\-\.]+(?:/|(?:/[a-zA-Z0-9!#\$%&'\*\+,\-\.:;=\?@\[\]_~]+)*))} $blabla match url]} {
puts "$url found in the HTML code."
}
======
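To pull every URL out of a text rather than only the first one, one possible change is to use `regexp -all -inline`. The sketch below reuses the pattern from this page unchanged apart from dropping the capturing parentheses, so that the result list holds whole matches only. The proc name `extractUrls` and the sample text are made up for the example.
======
# Hypothetical helper: return a list of all URLs found in $text.
# Same pattern as above, minus the capturing group, so -inline yields
# exactly one list element per match.
proc extractUrls {text} {
    set re {https?://[a-z0-9\-]+\.[a-z0-9\-\.]+(?:/|(?:/[a-zA-Z0-9!#\$%&'\*\+,\-\.:;=\?@\[\]_~]+)*)}
    return [regexp -all -inline -- $re $text]
}

# Sample input (assumed for illustration).
set html {<a href="http://www.ietf.org/rfc/rfc3986.txt">RFC</a> <a href="https://www.tcl.tk/man/tcl8.6/TclCmd/re_syntax.htm">re_syntax</a>}
foreach url [extractUrls $html] {
    puts "$url found in the HTML code."
}
======
Because the pattern is the same, it keeps the same limitations: lowercase hostnames only, at least one dot in the hostname, and no IPv6.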
----
Let's test it and post the results here...
PS: It doesn't work for IPv6 yet.
[AMG]: You had an extra closing parenthesis at the end of the expression. Fixed. I don't know why you use capturing parentheses; you only want to capture the entire match, which (in your code) already gets stored into $match. In the first expression, where you just test if the entire string matches, $match will just get set equal to $blabla. Another thing: the second expression will start the match anywhere, so "foohttp://a.b/c" is accepted. I suggest using the \m constraint [http://www.tcl.tk/man/tcl8.6/TclCmd/re_syntax.htm#M74] to anchor it at the beginning of a word. Your expression rejects hostnames with capital letters; is this really your intent? It also requires at least one dot in the hostname, even though it's perfectly valid to not have one. For example, http://localhost/test.html, which is the same as http://2130706433/test.html, if you're 1337. ;^)
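The suggestions above could be combined along the following lines. This is only a sketch, not a tested replacement: it anchors the match with `\m`, uses `-nocase` so capital letters in the hostname are accepted, makes the dot in the hostname optional, and relies on the match variable instead of a capturing group. The proc name `findUrl` is made up for the example.
======
# Hypothetical helper: return the first URL in $text, or "" if none is found.
proc findUrl {text} {
    # \m          anchor at the beginning of a word
    # (?:\.…)*    dots in the hostname are optional (e.g. localhost)
    set re {\mhttps?://[a-z0-9\-]+(?:\.[a-z0-9\-]+)*(?:/|(?:/[a-zA-Z0-9!#\$%&'\*\+,\-\.:;=\?@\[\]_~]+)*)}
    if {[regexp -nocase -- $re $text match]} {
        return $match
    }
    return ""
}

puts [findUrl {see http://localhost/test.html for details}]
puts [findUrl {HTTP://Example.COM/Index.html}]
puts [findUrl {foohttp://a.b/c}]
======
With these changes the last call returns an empty string, since "http" there does not start a word. IPv6 hosts are still not handled.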
<<categories>> internet | example