Introduction
In Unix, whenever a running program opens a file, the kernel records a reference to it in a table it keeps for that process. The entries of this table are identified by small non-negative integers, counted from 0 separately for each process, and are called file descriptors or FDs. A process can only read and write files that it has open, that is, files for which it holds a file descriptor.
Three file descriptors, 0, 1, and 2, are normally already open when a process starts, because they are inherited from the parent process, so they are available without having to open them. They are called standard, and are explained in more detail in .
Other than being present from the start and standard, they are the same as any other file descriptor.
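To make the open/read/close cycle concrete, here is a minimal bash sketch. It assumes bash on Linux; the temporary file and descriptor number 3 are arbitrary choices for illustration. The exec builtin opens the file on descriptor 3, read takes a line through that descriptor, and the descriptor is then closed. Descriptor 2 is used without ever being opened, because it is one of the standard ones.

```shell
#!/usr/bin/env bash
# Create a throwaway file to read from.
tmpfile=$(mktemp)
printf 'hello from a file\n' > "$tmpfile"

# Open the file for reading on descriptor 3 (an arbitrary, unused number).
exec 3< "$tmpfile"

# Read one line through the descriptor rather than through the file name.
read -r line <&3

# Descriptor 2 (standard error) needs no opening: it is already there.
echo "read via fd 3: $line" >&2

# Close descriptor 3; from now on the shell can no longer read through it.
exec 3<&-
rm -f "$tmpfile"
```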
File Descriptors in Action
The first question we might ask ourselves is: how can we see those file descriptors?
In Unix, and especially in Linux, an enormous amount of data about the running system is accessible to users in /proc/ and /sys/.
For maximum convenience, all that data looks like normal files and directories, but it is not stored on disk.
When we read or write files in /proc/ and /sys/, we are actually invoking functions in the kernel that operate directly on kernel memory.
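One quick way to see that /proc content is generated rather than stored is to compare the size a file claims to have with the number of bytes a read actually returns. This bash sketch assumes Linux with GNU coreutils (for `stat -c`); /proc/meminfo is just a convenient example file.

```shell
#!/usr/bin/env bash
# /proc/meminfo is produced by the kernel at read time, so stat reports
# a size of 0 even though reading the file returns plenty of data.
reported_size=$(stat -c %s /proc/meminfo)        # size field as reported by GNU stat
actual_bytes=$(( $(wc -c < /proc/meminfo) ))     # bytes the kernel actually returns
echo "stat reports $reported_size bytes; reading returns $actual_bytes bytes"
```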
On Unix, every running program (a process) is assigned a process ID (PID): an integer that remains constant for the lifetime of the process.
By convention, all data about a particular process is available in its directory /proc/PID/, such as /proc/11996/.
That data includes information about all open file descriptors and what they point to.
If you are currently in a shell, type echo $$ to see the PID assigned to your shell process. Then explore the contents of /proc/PID/fd/ and /proc/PID/fdinfo/.
echo $$
11996
cd /proc/$$
ls -al fd/
total 0
dr-x------ 2 user user 0 Aug 7 20:47 .
dr-xr-xr-x 9 user user 0 Aug 7 20:47 ..
lrwx------ 1 user user 64 Aug 7 20:56 0 -> /dev/pts/8
lrwx------ 1 user user 64 Aug 7 20:56 1 -> /dev/pts/8
lrwx------ 1 user user 64 Aug 7 20:56 2 -> /dev/pts/8
lrwx------ 1 user user 64 Aug 7 20:56 255 -> /dev/pts/8
ls -al fdinfo/
total 0
dr-x------ 2 user user 0 Aug 7 20:56 .
dr-xr-xr-x 9 user user 0 Aug 7 20:47 ..
-r-------- 1 user user 0 Aug 7 20:56 0
-r-------- 1 user user 0 Aug 7 20:56 1
-r-------- 1 user user 0 Aug 7 20:56 2
-r-------- 1 user user 0 Aug 7 20:56 255
Here we can see that our shell has only four file descriptors open. We are interested in the first three: standard input, output, and error. They point to a device that represents our terminal. (Descriptor 255 is one that bash opens for its own internal use.) You can read more in .
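The arrows in that listing are ordinary symbolic links, so the same information can be read with readlink. The bash sketch below is Linux-specific; list_fds is a hypothetical helper name and descriptor 7 an arbitrary choice. It prints the shell's descriptor table and shows a newly opened descriptor appearing in it.

```shell
#!/usr/bin/env bash
# Print "descriptor -> target" for every open descriptor of this shell,
# similar to the arrows in `ls -al /proc/$$/fd/`.
list_fds() {
    local entry
    for entry in "/proc/$$/fd/"*; do
        # Each entry is a symbolic link named after the descriptor number.
        printf '%s -> %s\n' "${entry##*/}" "$(readlink "$entry")"
    done
}

# Open /dev/null on descriptor 7 (an arbitrary number) so a new entry appears.
exec 7< /dev/null
fd_table=$(list_fds)    # $$ still names the parent shell inside the subshell
printf '%s\n' "$fd_table"
exec 7<&-
```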
A similar output can be obtained with lsof -p $$.
lsof -p $$ | grep CHR
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
bash 11996 user 0u CHR 136,8 0t0 13 /dev/pts/8
bash 11996 user 1u CHR 136,8 0t0 13 /dev/pts/8
bash 11996 user 2u CHR 136,8 0t0 13 /dev/pts/8
bash 11996 user 255u CHR 136,8 0t0 13 /dev/pts/8
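The SIZE/OFF column above shows 0t0, which is lsof's notation for an offset of 0. On Linux, the current offset of each descriptor is also visible in the pos: line of /proc/PID/fdinfo/N, one of the fdinfo files listed earlier. The following bash sketch assumes Linux with dd and awk available; the file contents and descriptor 3 are arbitrary. Because dd receives a copy of descriptor 3 that shares the shell's open file, reading through the copy moves the shell's offset as well.

```shell
#!/usr/bin/env bash
# Watch a descriptor's offset through /proc/PID/fdinfo/N.
tmpfile=$(mktemp)
printf '0123456789' > "$tmpfile"

# A freshly opened descriptor starts at offset 0.
exec 3< "$tmpfile"
pos_before=$(awk '/^pos:/ {print $2}' "/proc/$$/fdinfo/3")

# dd gets a copy of descriptor 3 as its standard input and reads exactly 4 bytes.
# The copy shares the shell's open file, so the shell's offset moves too.
dd bs=4 count=1 <&3 >/dev/null 2>&1
pos_after=$(awk '/^pos:/ {print $2}' "/proc/$$/fdinfo/3")

echo "offset before: $pos_before, after: $pos_after"
exec 3<&-
rm -f "$tmpfile"
```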
Not Just Files
World is a File. — Bill Joy in 1988 at IBM Yorktown.
File descriptors are references to open files.
But one of the key principles in Unix is that everything is represented as a file. It’s just that some files are “special”: instead of triggering functions in a filesystem driver, reading or writing them triggers functions in other drivers.
Because all devices are accessed through file descriptors and support the same read and write system calls, common functionality is automatically extended to just about every device and subsystem in Unix and Linux.
For example, a file can be a regular file on disk, a block device such as the disk itself, a character device such as a terminal or a printer, a socket for a network connection, a named pipe for inter-process communication (IPC), and so on.
More about what this means for users, including simple examples of redirection between file descriptors, can be found in .
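As a rough illustration of this uniformity, the bash sketch below (Linux-specific, since it inspects /proc; descriptor numbers 3 to 5 are arbitrary) opens a regular file, the character device /dev/null, and a pipe fed by another process, then reads all three with the same read builtin. readlink reports a path for the first two and a pipe:[inode] label for the pipe.

```shell
#!/usr/bin/env bash
# One set of operations, three very different kinds of file.
tmpfile=$(mktemp)
printf 'regular file\n' > "$tmpfile"

exec 3< "$tmpfile"              # a regular file on disk
exec 4< /dev/null               # a character device
exec 5< <(printf 'pipe\n')      # a pipe, written by another process

# Each descriptor's link in /proc shows what kind of object it refers to.
file_link=$(readlink "/proc/$$/fd/3")
null_link=$(readlink "/proc/$$/fd/4")
pipe_link=$(readlink "/proc/$$/fd/5")
printf '3 -> %s\n4 -> %s\n5 -> %s\n' "$file_link" "$null_link" "$pipe_link"

# The same read builtin works on all three; /dev/null immediately reports end-of-file.
read -r from_file <&3
read -r from_pipe <&5
read -r from_null <&4 || from_null='(EOF)'
echo "$from_file | $from_pipe | $from_null"

exec 3<&- 4<&- 5<&-
rm -f "$tmpfile"
```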
Quiz
We know the following:
- In shells such as bash, the PID of the current process is available in the variable $$.
- Directory /proc/PID/fd/ contains a list of the process’ open file descriptors. Try ls -al /proc/$$/fd/.
- The kernel also provides /proc/self/, which, for each process, automagically points to that process’ own directory /proc/PID/. Try ls -al /proc/self/fd/.
- Redundantly with /proc/self/fd/, but for compatibility, Linux systems also expose the process’ file descriptors in directory /dev/fd/ (usually a symbolic link to /proc/self/fd/). Try ls -al /dev/fd/.
- Thus, /proc/$$/fd/, /proc/self/fd/, and /dev/fd/ are functionally equivalent.
So why are the contents of those directories not the same when we list them?
ls -al /dev/fd/ /proc/$$/fd/ /proc/self/fd/
/dev/fd/:
total 0
dr-x------ 2 user user 0 Aug 8 00:58 .
dr-xr-xr-x 9 user user 0 Aug 8 00:58 ..
lrwx------ 1 user user 64 Aug 8 00:58 0 -> /dev/pts/8
lrwx------ 1 user user 64 Aug 8 00:58 1 -> /dev/pts/8
lrwx------ 1 user user 64 Aug 8 00:58 2 -> /dev/pts/8
lr-x------ 1 user user 64 Aug 8 00:58 3 -> /proc/14693/fd
/proc/14677/fd/:
total 0
dr-x------ 2 user user 0 Aug 8 00:58 .
dr-xr-xr-x 9 user user 0 Aug 8 00:58 ..
lrwx------ 1 user user 64 Aug 8 00:58 0 -> /dev/pts/8
lrwx------ 1 user user 64 Aug 8 00:58 1 -> /dev/pts/8
lrwx------ 1 user user 64 Aug 8 00:58 2 -> /dev/pts/8
lrwx------ 1 user user 64 Aug 8 00:58 255 -> /dev/pts/8
/proc/self/fd/:
total 0
dr-x------ 2 user user 0 Aug 8 00:58 .
dr-xr-xr-x 9 user user 0 Aug 8 00:58 ..
lrwx------ 1 user user 64 Aug 8 00:58 0 -> /dev/pts/8
lrwx------ 1 user user 64 Aug 8 00:58 1 -> /dev/pts/8
lrwx------ 1 user user 64 Aug 8 00:58 2 -> /dev/pts/8
lr-x------ 1 user user 64 Aug 8 00:58 3 -> /proc/14693/fd
To answer the question, let’s carefully examine the invocation and output of ls.
- Because of how shells work, the variable $$ is expanded into the PID of the current process before ls is called. When ls sees that argument, its value is already a literal (/proc/14677/fd/ in this example).
- The other two arguments do not contain a variable and are passed to ls as the literals /proc/self/fd/ and /dev/fd/.
So when ls starts and looks at the arguments it was called with, the directory /proc/14677/fd/ does indeed refer to the shell.
But the other two directories, /proc/self/fd/ and /dev/fd/, which always point to the process that accesses them, refer not to bash but to ls.
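The giveaway is descriptor 3 in those two listings: it is the directory that ls opened in order to read it, and its target, /proc/14693/fd, contains the PID of ls (14693) rather than that of the shell (14677). Conversely, descriptor 255, which bash keeps open, appears only in the shell’s listing.
So the three listings differ because they show two different processes.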
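The same effect can be reproduced without ls. In this bash sketch (Linux-specific; the variable names are illustrative), the builtin read resolves /proc/self inside the shell itself, because bash performs redirections for builtins in its own process, whereas readlink is a separate program and therefore sees its own PID.

```shell
#!/usr/bin/env bash
# /proc/self resolves to whichever process looks it up.

# Redirection for a builtin (read) is performed by the shell itself,
# so here /proc/self is the shell. The first field of /proc/PID/stat is the PID.
read -r pid_seen_by_shell _ < /proc/self/stat

# readlink is a separate program, so /proc/self resolves to readlink's own PID.
pid_seen_by_readlink=$(readlink /proc/self)

echo "shell PID (\$\$):              $$"
echo "/proc/self, read by bash:     $pid_seen_by_shell"
echo "/proc/self, read by readlink: $pid_seen_by_readlink"
```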