File Descriptor in Linux

Overview

A file descriptor in Linux is a small non-negative integer that identifies an open file within a process. Through it, the kernel finds the open file's characteristics and its access mode. Whenever a process asks to open a file, the kernel associates a file descriptor with the opened file and returns it to the process.
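As a small illustration of the paragraph above, the sketch below opens a file and looks at the value the kernel hands back. It assumes Python 3 on Linux; the scratch file and the function name open_and_report are illustrative only.

```python
import os
import tempfile


def open_and_report():
    """Open a scratch file and return the descriptor the kernel assigned."""
    with tempfile.NamedTemporaryFile() as scratch:  # hypothetical scratch file
        fd = os.open(scratch.name, os.O_RDONLY)     # plain integer, not an object
        try:
            return fd
        finally:
            os.close(fd)                            # release the descriptor


if __name__ == "__main__":
    print("kernel returned descriptor", open_and_report())
```

The exact number depends on what the process already has open, so the sketch makes no claim about its value beyond it being a non-negative integer.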

File descriptor table and system open file table

The Linux kernel keeps two structures for this purpose: the file descriptor table and the system open file table. Together they track each process's access to files and keep data consistent when several processes operate on the same file.

File descriptor table:

The file descriptor table translates a file descriptor in Linux to an open file in the system. File descriptor tables are created for each process.

Each entry in the file descriptor table has two fields: a pointer to the corresponding entry in the system open file table, and the flags that belong to the descriptor itself, such as close-on-exec.
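On Linux, a process can inspect its own descriptor table through the /proc/self/fd directory, where each entry is a symbolic link named after a descriptor. The sketch below reads it. It assumes /proc is mounted; fd_table and demo are illustrative names.

```python
import os
import tempfile


def fd_table():
    """Return {descriptor: target} for the calling process."""
    table = {}
    for name in os.listdir("/proc/self/fd"):
        try:
            table[int(name)] = os.readlink(f"/proc/self/fd/{name}")
        except OSError:
            # The descriptor used to list the directory is closed by now.
            pass
    return table


def demo():
    """Check that an open scratch file shows up under its own descriptor."""
    with tempfile.NamedTemporaryFile() as scratch:  # hypothetical scratch file
        fd = scratch.fileno()
        return fd_table()[fd] == os.path.realpath(scratch.name)


if __name__ == "__main__":
    for fd, target in sorted(fd_table().items()):
        print(fd, "->", target)
```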

System open file table

The system open file table contains entries for every open file. This table tracks the current byte offsets for all read and write operations on an open file and also keeps track of the access mode of the opened file.

The kernel treats every read and write operation as an implied seek to the current byte offset stored in this table. So, if 32 bytes are read or written on an open file, the offset also advances by 32 bytes. The offset can be moved to any specified location using the lseek subroutine.
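The offset behaviour described above can be observed directly. This is a minimal sketch, assuming Python 3 on Linux; offset_demo and the scratch file are illustrative.

```python
import os
import tempfile


def offset_demo():
    """Return the file offset after a 32-byte write and after a 4-byte read."""
    fd, path = tempfile.mkstemp()                    # hypothetical scratch file
    try:
        os.write(fd, b"x" * 32)                      # offset advances by 32
        after_write = os.lseek(fd, 0, os.SEEK_CUR)   # query without moving

        os.lseek(fd, 8, os.SEEK_SET)                 # jump to byte 8
        os.read(fd, 4)                               # offset advances by 4
        after_read = os.lseek(fd, 0, os.SEEK_CUR)
        return after_write, after_read
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(offset_demo())  # (32, 12)
```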

Managing file descriptors

Files are often shared by several processes, so related processes (for example a parent and its children) need access to the same file offset, which the kernel keeps in the system open file table. Each entry in that table also maintains a count of the file descriptors that refer to it.
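To make the idea of related processes sharing one offset concrete, the sketch below lets a child process write through an inherited descriptor and then checks the offset in the parent. It assumes Python 3 on Linux (os.fork is required); the function name is illustrative.

```python
import os
import tempfile


def fork_shares_offset():
    """Return the parent's view of the offset after the child wrote 5 bytes."""
    fd, path = tempfile.mkstemp()          # hypothetical scratch file
    try:
        pid = os.fork()
        if pid == 0:
            # Child: write through the inherited descriptor, then exit
            # immediately without running the parent's cleanup code.
            try:
                os.write(fd, b"child")
            finally:
                os._exit(0)
        os.waitpid(pid, 0)                 # parent waits for the child
        return os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(fork_shares_offset())  # 5
```

Both descriptors point at the same entry in the system open file table, which is why the parent sees the offset moved by the child's write.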

Sharing open files: Whenever a file is opened, an entry describing it is created inside the system open file table. Because every open call gets its own entry, processes that open the same file independently each have their own input-output offset, so reading or writing in one does not move the position of another.
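A short sketch of the point above: opening the same file twice yields two table entries with independent offsets. It assumes Python 3 on Linux; names are illustrative.

```python
import os
import tempfile


def independent_offsets():
    """Open one file twice and return both offsets after reading via one."""
    with tempfile.NamedTemporaryFile() as scratch:  # hypothetical scratch file
        scratch.write(b"0123456789")
        scratch.flush()
        first = os.open(scratch.name, os.O_RDONLY)
        second = os.open(scratch.name, os.O_RDONLY)
        try:
            os.read(first, 4)              # moves only the first offset
            return (os.lseek(first, 0, os.SEEK_CUR),
                    os.lseek(second, 0, os.SEEK_CUR))
        finally:
            os.close(first)
            os.close(second)


if __name__ == "__main__":
    print(independent_offsets())  # (4, 0)
```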
Duplicating file descriptors: There are multiple ways to duplicate a file descriptor. They are:
1. dup and dup2 subroutines: The dup subroutine creates a copy of a file descriptor. The duplicate is created in the same file descriptor table as the original and refers to the same open file entry. The call increments that entry's reference count by one and returns the number of the duplicate, which is the lowest descriptor not currently in use.
  
  The dup2 subroutine takes the descriptor number the copy should occupy. If that descriptor is already open, dup2 closes it first. This allows a process to decide exactly which descriptor table entry the copy will use.
2. fork subroutine: The fork subroutine creates a child process which inherits copies of the parent's file descriptors. The child then continues executing independently of the parent.
3. fcntl (file descriptor control) subroutine: fcntl is a subroutine which operates on open file descriptors and the open files behind them. It can be used to perform, among others, the following operations:
  - Duplicate a file descriptor, similar to dup (F_DUPFD).
  - Set file status flags such as O_APPEND on an already open file (F_GETFL and F_SETFL).
  - Get or set the close-on-exec flag (F_GETFD and F_SETFD).
  - Closing all file descriptors at once is not an fcntl operation on Linux; descriptors are closed individually with close, or as a range with close_range on Linux 5.9 and later.
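The duplication calls listed above can be tried from Python, whose os and fcntl modules wrap the same subroutines. This is a sketch under the assumption of Python 3.7 or later on Linux; duplication_demo and the dictionary keys are illustrative names.

```python
import fcntl
import os
import tempfile


def duplication_demo():
    """Exercise dup, dup2 and fcntl on one scratch file; return observations."""
    fd, path = tempfile.mkstemp()              # hypothetical scratch file
    extra = []                                 # descriptors to close afterwards
    try:
        # dup: the copy refers to the same open file, so the offset is shared.
        copy = os.dup(fd)
        extra.append(copy)
        os.write(fd, b"abcd")
        shared_offset = os.lseek(copy, 0, os.SEEK_CUR)

        # fcntl(F_DUPFD): like dup, but the result is at least the given number.
        high = fcntl.fcntl(fd, fcntl.F_DUPFD, 10)
        extra.append(high)

        # dup2: pick the slot explicitly. `high` is already open, so it is
        # closed first and then made to refer to the same open file as `fd`.
        chosen = os.dup2(fd, high)

        # fcntl(F_GETFD / F_SETFD): read and set the close-on-exec flag.
        fd_flags = fcntl.fcntl(fd, fcntl.F_GETFD)
        fcntl.fcntl(fd, fcntl.F_SETFD, fd_flags | fcntl.FD_CLOEXEC)
        cloexec = bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)

        # fcntl(F_GETFL / F_SETFL): switch the open file to append mode.
        # The status flags live in the shared entry, so the copy sees them too.
        status = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, status | os.O_APPEND)
        append = bool(fcntl.fcntl(copy, fcntl.F_GETFL) & os.O_APPEND)

        return {"shared_offset": shared_offset, "high": high,
                "chosen": chosen, "cloexec": cloexec, "append": append}
    finally:
        for descriptor in extra + [fd]:
            os.close(descriptor)
        os.unlink(path)


if __name__ == "__main__":
    print(duplication_demo())
```

Note that Python's os.dup marks the copy as non-inheritable by default, which is a Python convention rather than the behaviour of the underlying subroutine.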
Preset file descriptor values: A shell running inside a Linux terminal usually starts with three files open on file descriptors 0, 1 and 2, and the programs it launches inherit them. Programs running in a terminal can use these file descriptors to collect input from the user, show output, or report errors on the terminal. As a program opens more files, each new file receives the lowest-numbered descriptor not currently in use, so descriptors are normally handed out in ascending order.
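The allocation order can be checked with a few opens and a close. This sketch assumes Python 3 on Linux and a single-threaded process (another thread opening files at the same time could take a number in between); the function name is illustrative.

```python
import os


def lowest_free_descriptor_demo():
    """Open two files, close the first, open again; return the three numbers."""
    first = os.open(os.devnull, os.O_RDONLY)
    second = os.open(os.devnull, os.O_RDONLY)   # next free number after first
    os.close(first)                             # frees the lower slot
    reused = os.open(os.devnull, os.O_RDONLY)   # lowest free slot is reused
    os.close(second)
    os.close(reused)
    return first, second, reused


if __name__ == "__main__":
    print(lowest_free_descriptor_demo())
```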

File descriptor resource limit

A resource limit controls how many file descriptors a process can have open. On systems that use PAM, this value is usually configured in /etc/security/limits.conf.

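To display the current hard limits on your system, run the command: ulimit -aH. The "open files" line of its output is the file descriptor limit. To print only that value, run: ulimit -Hn.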
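The same limits can be read from inside a program. A minimal sketch, assuming Python 3 on Linux; it uses the standard resource module, and nofile_limits is an illustrative name.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft limit (ulimit -Sn):", soft)
    print("hard limit (ulimit -Hn):", hard)
```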


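To change the limits, we can add soft and hard nofile entries for the relevant user or group to /etc/security/limits.conf. Each entry is one line of the form: domain type item value, where type is soft or hard and item is nofile.

For these limits to be applied at login, /etc/pam.d/login must load the pam_limits module with a line such as: session required pam_limits.so

The system-wide maximum number of open files is a separate setting, exposed in /proc/sys/fs/file-max. Writing a new number to that file changes it.

Then use ulimit -n to apply the limit in the current shell. An unprivileged user cannot raise it above the hard limit from /etc/security/limits.conf.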

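Log out and log back in, or reboot, for the limits.conf and PAM changes to take effect. A value written directly to /proc/sys/fs/file-max applies immediately but is lost on reboot unless it is also set persistently, for example as fs.file-max in /etc/sysctl.conf.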
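A process can also adjust its own soft limit at run time, within the hard limit, without touching any of these files. The sketch below assumes Python 3 on Linux; set_soft_nofile is an illustrative helper, not part of any library.

```python
import resource


def set_soft_nofile(requested):
    """Set the soft descriptor limit, clamped to the hard limit; return both."""
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        requested = min(requested, hard)   # unprivileged code cannot exceed hard
    resource.setrlimit(resource.RLIMIT_NOFILE, (requested, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("before:", (soft, hard))
    print("after: ", set_soft_nofile(soft))  # re-applies the current value
```

The change affects only the calling process and the children it starts afterwards; it does not alter the configured defaults.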

Stdin, stdout, and stderr

The three standard file descriptors are:

Standard input: The integer assigned for this is '0'. This descriptor is the default input data stream. In a terminal emulator, this defaults to keyboard input from the user. It is abbreviated as stdin.
Standard output: The integer assigned for this is '1'. This descriptor is the default output data stream. In a terminal emulator, this defaults to the user's screen. It is abbreviated as stdout.
Standard error: The integer assigned for this is '2'. This descriptor is the default stream for error messages. In a terminal emulator, this defaults to the user's screen as well. It is abbreviated as stderr.
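Because these are ordinary descriptors, a program can write to them by number. The sketch below assumes Python 3 on Linux and that descriptors 1 and 2 are open, as they are in a normal shell session; the constants and the report function are illustrative names.

```python
import os

STDIN_FILENO = 0    # standard input
STDOUT_FILENO = 1   # standard output
STDERR_FILENO = 2   # standard error


def report(message, error=False):
    """Write a line to stdout, or to stderr when error is true.

    Returns the number of bytes written.
    """
    fd = STDERR_FILENO if error else STDOUT_FILENO
    return os.write(fd, message.encode() + b"\n")


if __name__ == "__main__":
    report("normal output goes to descriptor 1")
    report("error output goes to descriptor 2", error=True)
```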

Redirecting file descriptors

By default, the standard output and standard error descriptors write to the terminal. This behaviour can be overridden using stream redirections.

We use the syntax file_descriptor>/path/to/redirection to redirect these descriptors.

Example: find / -name '*name' 2>/dev/null

Here, the find command might report permission errors, but since we used 2>/dev/null, the output from file descriptor '2', which is the stderr descriptor, is redirected to /dev/null. /dev/null is a special device in Linux that discards everything written to it. In this case, the error messages will not be shown on the terminal screen.
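Under the hood, a redirection of this kind amounts to opening the target and duplicating it onto the chosen descriptor number. The sketch below imitates that with dup2. It assumes Python 3 on Linux; redirect_fd and demo are illustrative helpers, and the sketch writes with os.write so that Python's own output buffering does not get in the way.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def redirect_fd(fd, path):
    """Temporarily make descriptor `fd` refer to the file at `path`."""
    target = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    saved = os.dup(fd)                 # remember where fd pointed before
    try:
        os.dup2(target, fd)            # fd now refers to the target file
        yield
    finally:
        os.close(target)
        os.dup2(saved, fd)             # restore the original destination
        os.close(saved)


def demo():
    """Redirect descriptor 2 into a scratch file and return what it received."""
    with tempfile.TemporaryDirectory() as scratch_dir:  # hypothetical location
        log_path = os.path.join(scratch_dir, "err.log")
        with redirect_fd(2, log_path):
            os.write(2, b"permission denied\n")
        with open(log_path, "rb") as log:
            return log.read()


if __name__ == "__main__":
    with redirect_fd(2, os.devnull):   # same effect as 2>/dev/null
        os.write(2, b"this line is discarded\n")
    print(demo())
```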


We can also redirect one file descriptor into another to process their outputs together. For example, we can append 2>&1 to a command.

This redirects the stderr file descriptor to wherever the standard output, which is the file descriptor '1', currently points.
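When starting a command from a program, the same merge can be requested through Python's subprocess module, whose stderr=subprocess.STDOUT option makes the child's descriptor 2 a duplicate of its descriptor 1. This is a sketch assuming Python 3 on Linux with sh available; run_merged is an illustrative name.

```python
import subprocess


def run_merged(command):
    """Run a command with stderr merged into stdout; return the combined bytes."""
    finished = subprocess.run(
        command,
        stdout=subprocess.PIPE,        # capture descriptor 1
        stderr=subprocess.STDOUT,      # point descriptor 2 at descriptor 1
        check=False,
    )
    return finished.stdout


if __name__ == "__main__":
    print(run_merged(["sh", "-c", "echo out; echo err 1>&2"]))
```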

Conclusion

File descriptors in Linux are non-negative integers that identify open files within a process. The three default file descriptors are 0, 1 and 2, which denote stdin, stdout and stderr respectively.
File descriptors can be shared among multiple processes. They can also be duplicated using kernel subroutines like dup, dup2, fcntl and fork.
File descriptors can be redirected to different data streams using stream redirections. The syntax used for this is: file_descriptor>/path/to/redirection. Example: find / -name '*name' 2>/dev/null.