Version 79 of file join

Updated 2023-08-08 21:59:53 by HE

file join forms a file path from one or more path components. file join, not file normalize, is the standard way to transform a possibly relative path into an absolute path.

See Also

Synopsis

file join name ?name ...?

Documentation

official reference

Description

file join joins one or more strings into a file path using the correct platform-dependent separators. It is the only portable way to compose file names. If a name is a relative path component, then it is joined to the previous file name argument. If a name is an absolute path, all previous arguments are discarded and any subsequent arguments are then joined to it. For example,

file join a b /foo bar

returns

/foo/bar

Any of the names can contain separators, and the result is always canonical for the current platform: / for Unix and Windows, and : for pre-OSX Macintosh.

file join helps to avoid redundant separator characters:

set prefix /
file join $prefix another directory ;# -> /another/directory
lindex $prefix/another/directory ;# -> //another/directory

To join path elements contained in a list, use {*}:

set path [list a b c]
file join {*}$path

Another example:

set path [list a b c d e f g]
set prefix 3
set path [file join {*}[lrange $path $prefix end]]

Or, for historic versions of Tcl without the {*} operator:

set path [list a b c d e f g]
set prefix 3
set cmd [concat {file join} [lrange $path $prefix end]]
set path [eval $cmd]
puts $path

If file join joined list elements, its behaviour would be incorrect in the following case:

#warning: bad code ahead!
file join {C:/Program Files/Tcl}
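The following sketch illustrates why each argument has to be treated as a single name rather than as a list. The directory name is only an example, and {*} is used here to simulate what a list-joining file join would do:

```tcl
set dir {C:/Program Files/Tcl}

# Correct: the whole string is one name argument, so the space survives.
set good [file join $dir bin]

# Simulated list-joining behaviour: {*} splits the string at the space,
# so "Program Files" turns into two separate path components.
set bad [file join {*}$dir bin]

puts $good   ;# C:/Program Files/Tcl/bin
puts $bad    ;# C:/Program/Files/Tcl/bin
```

The second result names a different, most likely nonexistent, directory.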

On Windows, a UNC network path is treated similarly to a drive specification:

% file join //host/location file
//host/location/file

A disconnected host specification is not a UNC path, and file join treats it as a file location:

% file join //host location file
/host/location/file

On Unix, a leading // is removed:

% file join //host/location file
/host/location/file

The pwd Trick

The standard method in Tcl for transforming a possibly relative path into an absolute path is:

set filename [file join [pwd] $filename]

DKF PYK: cd introduces environment state that is easy to lose track of. It is good practice to instead use pwd to form absolute paths to work with.
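A minimal sketch of wrapping the trick in a procedure. The name absolutize and the optional base argument are hypothetical, chosen only for illustration:

```tcl
# Hypothetical helper: make a possibly relative path absolute without
# following symlinks or resolving "..". If no base directory is given,
# the current working directory is used.
proc absolutize {path {base {}}} {
    if {$base eq {}} {
        set base [pwd]
    }
    # If $path is already absolute, file join discards $base.
    return [file join $base $path]
}

puts [absolutize notes.txt /tmp]    ;# /tmp/notes.txt
puts [absolutize /etc/hosts /tmp]   ;# /etc/hosts
```

Capturing the base directory once, early in the program, avoids depending on later cd calls.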

Gotcha: Fully-qualified Names and any Previous Names

Any name arguments prior to the last fully-qualified name are dropped:

set newpath [file join lib /usr] ;# -> /usr

This is useful when the goal is to convert an unqualified name to a particular fully-qualified name, for example, to allow the user to provide a relative or a fully-qualified file name:

set newpath [file join [pwd] $filename]

When $filename is relative, the current directory is prepended to it, but when $filename is absolute, it isn't changed.

While useful in the case mentioned above, it can be surprising to discover that file paths are not joined as expected because some path is already fully-qualified.
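Where the discarding behaviour is not wanted, one possible guard is to check each component with file pathtype before joining. This is only a sketch; the procedure name join_below is hypothetical:

```tcl
# Hypothetical helper: join components below a base directory and
# raise an error if any component would replace the base.
proc join_below {base args} {
    foreach name $args {
        # file pathtype returns absolute, relative or volumerelative.
        if {[file pathtype $name] ne "relative"} {
            error "component \"$name\" is not a relative path"
        }
    }
    return [file join $base {*}$args]
}

puts [join_below /srv/data a b]            ;# /srv/data/a/b
puts [catch {join_below /srv/data /etc}]   ;# 1
```

Note that this check does not reject ".." components, so it does not by itself confine the result to the base directory.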

References:

file join behaviour and ye olde tilde fiasco, comp.lang.tcl, 2009-06-26

Using file join to Normalize Path Separators

file join can be used to "normalize" the path separators for relative paths that you don't want to subject to the full file normalize:

% file join {\foo\bar\grill}
/foo/bar/grill

Back the other way with file nativename:

% file nativename /foo/bar/grill
\foo\bar\grill

Escaping Tilde Substitution

To join a file whose name begins with "~" (tilde), prefix it with "./":

file join [pwd] ./~filename.ext

See Tilde Substitution
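A sketch of applying the "./" prefix automatically. The helper name escape_tilde and the file name are hypothetical, and the exact form of the joined result may differ between Tcl versions, so it is not shown here:

```tcl
# Hypothetical helper: protect a name that starts with "~" so that
# it is treated as an ordinary file name rather than a home directory.
proc escape_tilde {name} {
    if {[string index $name 0] eq "~"} {
        return ./$name
    }
    return $name
}

set result [file join /tmp [escape_tilde ~backup.txt]]
puts $result
```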

Difference Between 8.6 and 8.7

AMG: In Tcl 8.6.8, [file join //a/b] returns //a/b, but in Tcl 8.7a1, [file join //a/b] returns /a/b. This got me in trouble because I was trying to work with Windows UNC paths. In the end I just had to concatenate strings and forgo [file join]. [file nativename] worked right, at least.

bll 2018-7-11: Seems like a bug to me. A valid path should not be mangled.

bll 2018-7-27: See file join trashes valid network paths.

Difference Between Linux and Windows

AMG: In Tcl 8.6.8 on Windows, [file join C: foobar] returns C:foobar, whereas on Linux it returns C:/foobar. The latter is obvious, but the former is a special case due to volume-relative paths which are a Windows-only feature. In order to get Linux-like behavior, I had to specially check for top-level path components named X: and append / before passing to [file join]. On both Windows and Linux, [file join C:/ foobar] gives C:/foobar which is what I needed all along.
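The workaround described above might look roughly like the following. This is a sketch, not AMG's actual code; the name drive_join and the regular expression for a bare drive letter are assumptions:

```tcl
# Hypothetical helper: treat a bare drive specification such as "C:"
# as the root of that drive, so the result is the same on every platform.
proc drive_join {first args} {
    if {[regexp {^[A-Za-z]:$} $first]} {
        append first /
    }
    return [file join $first {*}$args]
}

puts [drive_join C: foobar]    ;# C:/foobar
puts [drive_join C:/ foobar]   ;# C:/foobar
```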

Semi-Normalize a Path

bll 2018-7-11: I define semi-normalize as normalizing the path separators and removing any /../, but not following symlinks (e.g. /var/tmp on Mac OS X gets normalized as /private/var/tmp, which is not really wanted).

See also fileutil::lexnormalize
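For illustration, a purely lexical semi-normalization can be sketched with file split and file join. This is not bll's implementation; the name seminormalize is hypothetical, and edge cases such as tilde-prefixed or volume-relative names are not specially handled:

```tcl
# Hypothetical sketch: remove "." and resolve ".." lexically,
# without touching the filesystem, so symlinks are never followed.
proc seminormalize {path} {
    set out {}
    foreach part [file split $path] {
        if {$part eq "."} {
            continue
        }
        set last [lindex $out end]
        if {$part eq ".." && [llength $out] > 0 && $last ne ".."} {
            if {[file pathtype $last] eq "relative"} {
                # An ordinary component is cancelled by "..".
                set out [lrange $out 0 end-1]
            }
            # Otherwise $last is a root, and ".." above a root is ignored.
            continue
        }
        lappend out $part
    }
    if {[llength $out] == 0} {
        return .
    }
    return [file join {*}$out]
}

puts [seminormalize /de/da/../do]     ;# /de/do
puts [seminormalize a/./b/c/../d]     ;# a/b/d
puts [seminormalize ../x]             ;# ../x
```

Leading ".." components of a relative path are kept, because they cannot be resolved without knowing the directory the path is relative to.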

Buggy by Design: The Name of the Root

The root of a filesystem has no name. It's common to assume that its name is /, but that's not the case. Unfortunately, file join is buggy by design:

% file join {} home
home

Here the answer for the standard file system should be /home. This is a bug that should be fixed. Because of this bug, a list containing the parts of an absolute file name cannot be filesystem-agnostic: The first part must be /, but each virtual filesystem may use a separator of its own choosing.

This is another case that can be used as an argument that the only little language that should be used for structuring data in any context or system is the language of list.


HE 2023-08-06: I disagree with the statements "file join is buggy by design" and "The root of a filesystem has no name.".

Why? Because of the following argumentation:

First of all, it looks as if the statement "The root of a filesystem has no name." comes from [L1 ]. At least it has been stated there between 18:54, 16 February 2021 and today (23:27, 10 April 2023). That page cites [L2 ] as the source of this information. Neither today's version nor older versions of that source make this statement.

Instead they write that "While all file systems have a root directory, it may be labeled differently depending on the operating system." and "On Unix systems and in OS X, the root directory is typically labeled simply /".

Furthermore, the idea described in Wikipedia is based on the assumption that '/' is the path separator on Unix systems. And that would logically mean that the directory name must be an empty string.

That is different from how nearly everyone works today on Unix-like systems or Linux. All these people use '/' to refer to the root of the file system.

For example, the command

cd ''

in /bin/sh or any other shell will not change to the root of the system. For '' use what your shell takes as the empty string.

cd /

on the other hand will change to the root directory.

file join has exactly the job to create a proper path with information people normally are using. And they use '/' to address the root directory.

If the root directory is considered to be an empty string you don't need a specialized command to build a proper path from a given list of path segments. You would be able to use join only.

Therefore, it is correct that

file join / home

provides "/home" and

file join {} home

provides "home" only.

I will not state that, technically, somewhere behind the scenes, the root directory is not an empty string. But what users see and are used to using is '/'. And that is what programming languages should respect and what Tcl does.

PYK {2023 08 08}: It is not true that the name of the root is the reason file join exists separately from join. The primary way file join differs from join is in its handling of absolute pathnames. To wit, previous operands are discarded if subsequent operands are absolute pathnames.

Next, according to Posix Chapter 4.13, If the pathname begins with a slash, the predecessor of the first filename in the pathname shall be taken to be the root directory of the process (such pathnames are referred to as "absolute pathnames"). Therefore, by definition, the root of the filesystem has no name. Next, in a Posix shell cd '' is a no-op because, as described in the Posix specification of the cd utility, cd joins the operand to the current directory unless the operand begins with /. In Tcl, file join does not take the current directory into account, so comparing it with cd in a Posix shell is apples and oranges. Posix Chapter 3.271 specifies that in a pathname, multiple successive / characters are considered to be the same as one /, which implies that a file name may not be the empty string. That leaves the empty string free to be interpreted as the root of the filesystem.

The point of the "broken by design" argument is that because of the way file join, which is a built-in command, works, it is incompatible with Tcl's virtual filesystem, which is also built into Tcl. The Tcl Filesystem API does not constrain which values may constitute a filename, so the empty string is an allowed file name. The API also does not specify how to interpret the path, other than to state that the path for a specific virtual filesystem separates the most specific part of the name from the more general part of the name. The more general part of the name is allowed to be the empty string, so file join as it is currently implemented munges such filenames. Therefore, file join is broken for such names.

For the default filesystem in Tcl a correct implementation of file join would make the following expression true:

expr {[file join {} {}] eq {/}}

HE 2023-08-08: Hello PYK, first some remarks about your discussion style:

  • You removed the Discussion tag. I think the discussion should be kept out of the main view. People who are interested can open and read it. The others will very likely not read it. I would understand if you had moved your statement to the discussion part. But removing my choice not to discuss on the big stage is kind of a bad habit. But, you want it, you get it.
  • You put an inline comment in my text, destroying a reference. That also is not nice because it gave the text a different meaning. Therefore, I removed it. You can write it properly at the beginning of your part.

Now back to the technical stuff.

You wrote "It is not true that the name of the root is the reason file join exists separately from join". That is not what I wrote. I wrote: "file join has exactly the job to create a proper path with information people normally are using. And they use '/' to address the root directory."

The next information in your comment takes the Posix specification as an argument. Why did you not simply write that file join doesn't work in a Posix-compliant way? Then the discussion would directly start on a different level. Also, you do not need to explain that the command 'cd' works differently in Posix. But you forgot to explain why all the other 'cd' implementations are wrong, including Tcl's implementation of it. For me and, I believe, most other people, these 'cd' implementations work exactly as expected.

By the way, since when is Tcl a Posix-compliant programming language? If what you want is Posix compliance, your comment is better placed in Tcl vs POSIX - what functionality is missing.

Then, near the end, you come to the true argument. Is everything above, including Posix, simply a distraction? Why not write that in the first post to make clear why you believe it is an error?

OK, let us move forward with the statement that file join doesn't match the requirements of the Tcl Filesystem API. That is an API that restricts developers as little as possible. But that means the developer of a virtual file system has to take care that all built-in functions still work. I think there is a test section available that one can use and adapt to a new virtual file system. Liberty comes with responsibility.

Moreover, your change will break other functions. I think you will write: "Yes, they all are wrong." Incidentally, they have worked for most of us for more than 25 years. My expectation of how file join and file split should work, and how they work today:

file join {*}[file split /de/da/do]    ;# leads to /de/da/do
file join {*}[file split /de/da/../do] ;# leads to /de/da/../do
file join {*}[file split /de/da//do]   ;# leads to 

The original absolute path without quirks should be possible to split and to join and then be restored. By quirks I mean more than one '/' in sequence, as in the third example. Possibly I missed some quirks. On the other hand, how do you want to decide whether the result of file split is absolute or relative if the first element can be empty? From your argumentation, if the root element can have an empty name, there could be a VFS which allows an empty name for every part of the file system. That is exactly your example "expr {[file join {} {}] eq {/}}". That means "///" would be a valid path. But you can't see whether this means (+ stands for an empty string) "/+/+/" or "+/+/+/+" or some other variant. Or you assume that the empty name belongs only to the root. Then "expr {[file join {} {}] eq {/}}" would fail, because the first element is dropped because the second starts absolute. You see, your wish will not work properly, at least not with the information you gave.

That means for me, we should and could stay with the current implementation.

And by the way: Instead of breaking code by changing a function that has been working for more than 25 years, possibly the Tcl Filesystem API needs some restrictions, if you are right.

Page Authors

PYK {2023 08 06}
Added the section on the name of the root.