file join

file join forms a file path from one or more path components. file join, not file normalize, is the standard way to transform a possibly relative path into an absolute path.

See Also

Synopsis

file join name ?name ...?

Documentation

official reference

Description

file join joins one or more strings into a file path using the correct platform-dependent separators. It is the only portable and VFS-aware way to compose file names, but it does not append an absolute path to an absolute path, so file separator must be used to perform that particular kind of operation in Tcl. If a name is a relative path component, then it is joined to the previous file name argument. If a name is an absolute path, all previous arguments are discarded and any subsequent arguments are then joined to it. For example,

file join a b /foo bar

returns

/foo/bar

Any of the names can contain separators, and the result is always canonical for the current platform: / for Unix and Windows, and : for pre-OSX Macintosh.

file join helps to avoid redundant separator characters:

set prefix /
file join $prefix another directory ;# -> /another/directory
lindex $prefix/another/directory ;# -> //another/directory

To join path elements contained in a list, use {*}:

set path [list a b c]
file join {*}$path

Another example:

set path [list a b c d e f g]
set prefix 3
set path [file join {*}[lrange $path $prefix end]]

Or, for historic versions of Tcl without the {*} operator:

set path [list a b c d e f g]
set prefix 3
set cmd [concat {file join} [lrange $path $prefix end]]
set path [eval $cmd]
puts $path

If file join treated each argument as a list and joined its elements, its behaviour would be incorrect in the following case, because the path contains a space:

#warning: bad code ahead!
file join {C:/Program Files/Tcl}
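To make the point concrete, here is a small sketch (the variable names are only illustrative) contrasting a path passed as a single argument with the same string expanded as a list. The expansion splits the value at the space, which is exactly the damage file join avoids by not interpreting its arguments as lists:

```tcl
set dir {C:/Program Files/Tcl}

# Passed as one argument, the space is preserved.
set whole [file join $dir bin]       ;# -> C:/Program Files/Tcl/bin

# Expanded as a list, the value is split into "C:/Program" and "Files/Tcl".
set broken [file join {*}$dir bin]   ;# -> C:/Program/Files/Tcl/bin
```

Use {*} only when the value really is a list of path components.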

On Windows, a UNC network path is treated similarly to a drive specification:

% file join //host/location file
//host/location/file

A disconnected host specification is not a UNC path, and file join treats it as a file location:

% file join //host location file
/host/location/file

On Unix, a leading // is removed:

% file join //host/location file
/host/location/file

The pwd Trick

The standard method in Tcl for transforming a possibly relative path to an absolute path is:

set filename [file join [pwd] $filename]

DKF PYK: cd introduces environment state that is easy to lose track of. It is good practice not to use cd at all, and instead to use pwd to form absolute paths to work with.
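One way to follow that advice is to wrap the trick in a small helper. This is only a sketch: absolute_path and its optional base argument are hypothetical names, not part of Tcl. Passing a base recorded once at startup keeps results stable even if some other code later calls cd:

```tcl
# Hypothetical helper: make $name absolute relative to $base (default: [pwd]).
proc absolute_path {name {base {}}} {
    if {$base eq {}} {
        set base [pwd]
    }
    # file join discards $base when $name is already absolute.
    return [file join $base $name]
}

set startdir [pwd]                      ;# remember the starting directory once
absolute_path notes.txt $startdir       ;# relative: placed under $startdir
absolute_path /etc/hosts $startdir      ;# absolute on Unix: returned unchanged
```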

Gotcha: Fully-qualified Names and any Previous Names

Any name arguments prior to the last fully-qualified name are dropped:

set newpath [file join lib /usr] ;# -> /usr

This is useful when the goal is to convert an unqualified name to a particular fully-qualified name, for example, to allow the user to provide a relative or a fully-qualified file name:

set newpath [file join [pwd] $filename]

When $filename is relative, the current directory is prepended to it, but when $filename is absolute, it isn't changed.

While useful in the case mentioned above, it can be surprising to discover that file paths are not joined as expected because some path is already fully-qualified.
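When the intent really is to place a name beneath another directory even if that name is fully qualified, one possible workaround is to strip the leading root component before joining. The following is a sketch only: join_under is a hypothetical helper, and it assumes the default filesystem and names without a leading tilde:

```tcl
# Hypothetical helper: put $path beneath $root even when $path is absolute.
proc join_under {root path} {
    set parts [file split $path]
    if {[file pathtype $path] ne "relative"} {
        # Drop the leading root or volume component, e.g. "/" or "C:/".
        set parts [lrange $parts 1 end]
    }
    return [file join $root {*}$parts]
}

join_under /srv/jail /etc/passwd   ;# -> /srv/jail/etc/passwd on Unix
join_under /srv/jail etc/passwd    ;# -> /srv/jail/etc/passwd
```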

References:

file join behaviour and ye olde tilde fiasco, comp.lang.tcl, 2009-06-26

Using file join to Normalize Path Separators

file join can be used to "normalize" the path separators for relative paths that you don't want to subject to the full file normalize:

% file join {\foo\bar\grill}
/foo/bar/grill

Back the other way with file nativename:

% file nativename /foo/bar/grill
\foo\bar\grill

Escaping Tilde Substitution

To join a file whose name begins with "~" (tilde), prefix it with "./":

file join [pwd] ./~filename.ext

See Tilde Substitution

Difference Between 8.6 and 8.7

AMG: In Tcl 8.6.8, [file join //a/b] returns //a/b, but in Tcl 8.7a1, [file join //a/b] returns /a/b. This got me in trouble because I was trying to work with Windows UNC paths. In the end I just had to concatenate strings and forgo [file join]. [file nativename] worked right, at least.

bll 2018-7-11: Seems like a bug to me. A valid path should not be mangled.

bll 2018-7-27: See file join trashes valid network paths.

Difference Between Linux and Windows

AMG: In Tcl 8.6.8 on Windows, [file join C: foobar] returns C:foobar, whereas on Linux it returns C:/foobar. The latter is obvious, but the former is a special case due to volume-relative paths which are a Windows-only feature. In order to get Linux-like behavior, I had to specially check for top-level path components named X: and append / before passing to [file join]. On both Windows and Linux, [file join C:/ foobar] gives C:/foobar which is what I needed all along.
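The workaround described above could be sketched as follows. join_volume is a hypothetical name, and the check only recognizes a bare single-letter drive specification such as C: in the first component:

```tcl
# Hypothetical helper: treat a bare "X:" first component as the drive root "X:/".
proc join_volume {first args} {
    if {[string match {[A-Za-z]:} $first]} {
        append first /
    }
    return [file join $first {*}$args]
}

join_volume C: foobar    ;# -> C:/foobar
join_volume C:/ foobar   ;# -> C:/foobar
```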

Semi-Normalize a Path

bll 2018-7-11: I define semi-normalize as normalizing the path separators and removing any /../, but not following symlinks (e.g. /var/tmp on Mac OS X gets normalized as /private/var/tmp, which is not really wanted).

See also fileutil::lexnormalize
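As an illustration of that definition, here is a minimal lexical sketch built from file split and file join. semi_normalize is a hypothetical name, not the implementation bll used and not the one in fileutil. It never touches the filesystem, so symlinks are not followed, and separators are only normalized to the extent that file split recognizes them on the current platform:

```tcl
# Hypothetical helper: remove "." and ".." components lexically.
proc semi_normalize {path} {
    set out {}
    foreach part [file split $path] {
        switch -exact -- $part {
            . {
                # "." adds nothing, so it is skipped
            }
            .. {
                set last [lindex $out end]
                if {[llength $out] == 0 || $last eq ".."} {
                    # Nothing to cancel: keep the leading ".."
                    lappend out ..
                } elseif {[llength $out] == 1 && [file pathtype $last] ne "relative"} {
                    # ".." directly above the root stays at the root
                } else {
                    set out [lrange $out 0 end-1]
                }
            }
            default {
                lappend out $part
            }
        }
    }
    if {[llength $out] == 0} {
        return .
    }
    return [file join {*}$out]
}

semi_normalize /var/tmp/../log   ;# -> /var/log
semi_normalize a/./b/../c        ;# -> a/c
```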

Buggy by Design: The Name of the Root

The root of a filesystem has no name. It's common to assume that its name is /, but that's not the case. Unfortunately, file join is buggy by design:

% file join {} home
home

Here the answer for the standard file system should be /home. This is a bug that should be fixed. Because of this bug, a list containing the parts of an absolute file name cannot be filesystem-agnostic: The first part must be /, but each virtual filesystem may use a separator of its own choosing.

This is another case that can be used as an argument that the only little language that should be used for structuring data in any context or system is the language of list.


HE 2023-08-06: I disagree with the statements "file join is buggy by design" and "The root of a filesystem has no name.".

Why? Because of the following argumentation:

First of all, it looks like the statement "The root of a filesystem has no name." comes from [L1 ]. At least it has been stated there between 18:54, 16 February 2021 and today (23:27, 10 April 2023). That page cites [L2 ] as the source of this information. Neither today's version nor older versions of that source make this statement.

Instead they write that "While all file systems have a root directory, it may be labeled differently depending on the operating system." and "On Unix systems and in OS X, the root directory is typically labeled simply /".

Furthermore, the idea described in Wikipedia is based on the assumption that '/' is the path separator on Unix systems. And that would logically mean that the directory name must be an empty string.

That is different from how nearly everyone works today on Unix-like systems or Linux. All these people use '/' to refer to the root of the file system.

For example, using the command

cd ''

in /bin/sh or any other shell will not change to the root of the system. For '' use what your shell takes as the empty string.

cd /

on the other hand will change to the root directory.

file join has exactly the job to create a proper path with information people normally are using. And they use '/' to address the root directory.

If the root directory is considered to be an empty string you don't need a specialized command to build a proper path from a given list of path segments. You would be able to use join only.

Therefore, it is correct that

file join / home

provides "/home" and

file join {} home

provides "home" only.

I will not claim that, technically, somewhere behind the scenes, the root directory is not an empty string. But what users see and are used to using is '/'. And that is what programming languages should respect and what Tcl does.

PYK {2023 08 08}: It is not true that the name of the root is the reason file join exists separately from join. The primary way file join differs from join is the handling of absolute pathnames: file join discards previous operands if subsequent operands are absolute pathnames.

Next, according to Posix Chapter 4.13, If the pathname begins with a slash, the predecessor of the first filename in the pathname shall be taken to be the root directory of the process (such pathnames are referred to as "absolute pathnames"). Therefore, by definition, the root of the filesystem has no name. Next, in a Posix shell cd '' is a no-op because, as described in the Posix specification of the cd utility, cd joins the operand to the current directory unless the operand begins with /. In Tcl, file join does not take the current directory into account, so comparing it with cd in a Posix shell is apples and oranges. Posix Chapter 3.271 specifies that in a pathname, multiple successive / characters are considered to be the same as one /, which implies that a file name may not be the empty string. That leaves the empty string free to be interpreted as the root of the filesystem.

The point of the "broken by design" argument is that because of the way file join, which is a built-in command, works, it is incompatible with Tcl's virtual filesystem, which is also built into Tcl. The Tcl Filesystem API does not constrain which values may constitute a filename, so the empty string is an allowed file name. The API also does not specify how to interpret the path, other than to state that the path for a specific virtual filesystem separates the most specific part of the name from the more general part of the name. The more general part of the name is allowed to be the empty string so file join as it is currently implemented munges such filenames. Therefore, file join is broken for such names.

For the default filesystem in Tcl a correct implementation of file join would make the following expression true:

expr {[file join {} {}] eq {/}}

Edit: HE points out below that in this example the current behaviour is in fact correct:

expr {[file join {} {}] eq {}}

HE 2023-08-08: Hello PYK, first some remarks about your discussion style:

  • You removed the Discussion tag. I think the discussion should be kept out of the main view. People who are interested can open and read it; others will very likely not read it. I would understand if you had moved your statement to the discussion part. But removing my choice not to discuss on the big stage is a bad habit. But, you want it, you get it.
  • You put an inline comment in my text, destroying a reference. That is also not nice because it gave the text a different meaning. Therefore, I removed it. You can write it properly at the beginning of your part.

Now back to the technical stuff.

You wrote "It is not true that the name of the root is the reason file join exists separately from join". That is not what I wrote. I wrote: "file join has exactly the job to create a proper path with information people normally are using. And they use '/' to address the root directory."

Next, your comment uses the Posix specification as an argument. Why did you not simply write that file join does not work in a Posix-compliant way? Then the discussion would directly start on a different level. Also, you do not need to explain that the command 'cd' works differently in Posix. But you forgot to explain why all the other 'cd' implementations are wrong, including Tcl's implementation of it. For me and, I believe, most other people, these 'cd' implementations work exactly as expected.

By the way, since when is Tcl a Posix-compliant programming language? If what you want is Posix compliance, your comment is better placed in Tcl vs POSIX - what functionality is missing.

Then, nearly at the end, you come to the real argument. Is everything above, including Posix, simply a distraction? Why not write that in the first post to make clear why you believe it is an error?

OK, let us move on to the statement that file join doesn't match the requirements of the Tcl Filesystem API. That is an API that restricts the developer as little as possible. But that means the developer of the virtual file system has to take care that all built-in functions still work. I think there is a test section available that one can use and adapt to a new virtual file system. Liberty comes with responsibility.

Moreover, your change will break other functions. I think you will write: "Yes, they all are wrong." Yet they have worked for most of us for more than 25 years. My expectation of how file join and file split should work, and how they work today:

file join {*}[file split /de/da/do]    ;# leads to /de/da/do
file join {*}[file split /de/da/../do] ;# leads to /de/da/../do
file join {*}[file split /de/da//do]   ;# leads to 

The original absolute path without quirks should be possible to split and join and then be restored. By quirks I mean more than one '/' in sequence, as in the third example. Possibly I missed some quirks. On the other hand, how do you want to decide whether the result of file split is absolute or relative if the first element can be empty? From your argumentation, if the root element can have an empty name, there could be a VFS which allows an empty name for every part of the file system. Which is exactly your example "expr {[file join {} {}] eq {/}}". That means "///" would be a valid path. But you can't see whether this means (+ stands for an empty string) "/+/+/" or "+/+/+/+" or some other variant. Or you assume that the empty name belongs only to the root. Then "expr {[file join {} {}] eq {/}}" would fail because the first element is dropped because the second starts absolute. You see, your wish will not work properly. At least not with the information you gave.

That means for me, we should and could stay with the current implementation.

And by the way: Instead of breaking code by changing a function working for more than 25 years possibly the Tcl Filesystem API needs some restrictions, if you are right.

PYK {2023 08 14}:

That's true: In the example I gave, the correct behaviour for the default filesystem is in fact the current behaviour:

expr {[file join {} {}] eq {}}

Tcl is getting the right answer here, but for the wrong reason. Tcl isn't considering the second empty string the name of the root, but instead is simply eliding all arguments that are the empty string.

If the empty string were considered the name of the root, the following would be true:

expr {[file join hello {} goodbye] eq {/goodbye}}

From the perspective of eliding all empty strings, this is consistent, but from the perspective of operating on a file named the empty string, this is inconsistent. Other commands such as file dirname and file pathtype, and even glob interpret the empty string as the name of the current directory, so the behaviour of file join is inconsistent with these other commands.

Because of this eliding behaviour, for each pathname there are an infinite number of list representations for that pathname, at least from the perspective of file join. We've seen this type of many-to-one interpretation in other Tcl commands, and it has always been a bug. TIP 568 for byte arrays comes to mind. Turns out that lossy interpretation of input is problematic. Who would've thunk?

On another note, Posix shells don't provide a join command. Instead users insert a "/" character between filenames in a pathname. This means, for example, that in a Posix shell, given a variable named $root and a variable named $file that is an absolute pathname, the following creates a new name under $root:

newname=$root/$file

In contrast, file join $root $file does not produce a new name under $root, so in this sense it already does not do what might naively be expected of it. It turns out that in Tcl, the only way to do the sort of joining shown in this example is to use the "/" character directly, which once again is broken for a VFS that uses a different path separator.

In summary, joining files in Tcl is broken in multiple ways where other VFS filesystems are concerned. Tcl needs -

HE 2023-08-14: PYK, it would be nice if you would not take just one of my arguments and try to prove it false in order to claim that file join is wrongly implemented.

At the end you described that a VFS can have an empty file name: "The Tcl Filesystem API does not constrain which values may constitute a filename". That means, generalized, every part of a path could be empty, not only the root.

I described the consequences above.

You wrote: "Other commands such as file dirname and file pathtype, and even glob interpret the empty string as the name of the current directory, so the behaviour of file join is inconsistent with these other commands."

First, this conflicts with your first statement that the empty string should be considered to be the root (even if that restricts a theoretical VFS from having empty names as sub-parts of the path).

Now you are fine with "file dirname {}" answering '.' and "file pathtype {}" answering 'relative', both of which differ from your statement that:

expr {[file join {} {}] eq {}}

should be true. But to be consistent with file dirname or file pathtype you would get:

expr {[file join {} {}] eq {./.}}

To me it looks like you do not have a valid concept so far. So you should not blame the current implementation as wrong.

You also did not describe where it is written that Tcl should be Posix compliant. If you take Posix as the master, you should explain where it is stated that Tcl follows this master. Is there an accepted TIP for that? And for which major Tcl version?

Even if such a TIP exists, the implementation will not be backward compatible and therefore should not be part of any 8.x Tcl release.

And finally, you never explained why you are not able to define your VFS implementation in a way that works properly with all existing commands. As I stated before, the API tries not to restrict, which increases the responsibility of the implementer of a new VFS.

What you want is a more restricted definition for implementing a VFS in Tcl so that you can't make errors.

PYK {2023 08 15}: No, that's not what I want. What I want is a way to join file paths that plays nice with Tcl VFS filesystems. I don't see any problem with the unrestricted nature of the VFS specification. Also, I'm only saying that file dirname {} does interpret the empty string as the current directory, not that it should. Also also, I never said that a Tcl filesystem must or should be Posix compliant; I just pointed to Posix as one more source that defines the name of the root as the empty string. It's perfectly feasible to design a VFS filesystem where the empty string indicates the root as well as being the name of another file. For example, a VFS might specify the following:

  • A pathname is a Tcl list.
  • A pathname is absolute if the first item is the empty string.
  • Each filename in a pathname is itself represented as a list containing a single item, and the value of that single item is the name.
  • The path separator is a single whitespace character.
  • A pathname that is the empty string refers to the root.

In this VFS, any directory may contain a file whose name is the empty string. The pathname of such a file is not ambiguous. file join is of course unable to correctly handle this VFS.

Page Authors

PYK {2023 08 06}
Added the section on the name of the root.