1. Process Control
1.1 Creating new processes: fork()
1.1.1 What does fork() do?
    #include <sys/types.h>
    #include <unistd.h>

    pid_t fork(void);
The fork() function is used to create a new process from an existing process. The new process is called the child process, and the existing process is called the parent. You can tell which is which by checking the return value from fork(). The parent gets the child's pid returned to it, but the child gets 0 returned to it. Thus this simple code illustrates the basics of it.
    pid_t pid;

    switch (pid = fork())
    {
    case -1:
        /* Here pid is -1, the fork failed */
        /* Some possible reasons are that you're */
        /* out of process slots or virtual memory */
        perror("The fork failed!");
        break;

    case 0:
        /* pid of zero is the child */
        /* Here we're the child...what should we do? */
        /* ... */
        /* but after doing it, we should do something like: */
        _exit(0);

    default:
        /* pid greater than zero is parent getting the child's pid */
        /* pid_t need not be an int, so cast before printing */
        printf("Child's pid is %ld\n", (long)pid);
    }
Of course, one can use if()... else... instead of switch(), but the above form is a useful idiom.
When doing this, it helps to know just what is and is not inherited by the child. This list can vary depending on Unix implementation, so take it with a grain of salt. Note that the child gets copies of these things, not the real thing.
Inherited by the child from the parent:
- process credentials (real/effective/saved UIDs and GIDs)
- environment
- stack
- memory
- open file descriptors (note that the underlying file positions are shared between the parent and child, which can be confusing)
- close-on-exec flags
- signal handling settings
- nice value
- scheduler class
- process group ID
- session ID
- current working directory
- root directory
- file mode creation mask (umask)
- resource limits
- controlling terminal
Unique to the child:
- process ID
- different parent process ID
- own copy of file descriptors and directory streams
- process, text, data and other memory locks are not inherited
- process times, in the tms struct
- resource utilizations are set to 0
- pending signals initialized to the empty set
- timers created by timer_create() are not inherited
- asynchronous input or output operations are not inherited
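The note above about shared file positions can be seen in a small experiment. This is only a sketch: the helper name offset_after_child_write() and the temporary file path are made up for illustration. The child writes through its inherited copy of the descriptor, and the parent then asks where the file position is.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the file offset seen by the parent after
   the child has written n bytes through the inherited descriptor,
   or -1 on error. */
off_t offset_after_child_write(size_t n)
{
    char path[] = "/tmp/forkdemoXXXXXX";
    int fd = mkstemp(path);
    int status;
    off_t pos = -1;
    pid_t pid;

    if (fd == -1)
        return -1;
    unlink(path);               /* the file vanishes once fd is closed */

    pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* child: write n bytes via its copy of the descriptor */
        size_t i;
        for (i = 0; i < n; i++)
            if (write(fd, "x", 1) != 1)
                _exit(1);
        _exit(0);
    }

    /* parent: wait for the child, then inspect the shared offset */
    if (waitpid(pid, &status, 0) == pid
        && WIFEXITED(status) && WEXITSTATUS(status) == 0)
        pos = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return pos;
}
```

Although the parent never wrote anything itself, the offset it sees has moved by the number of bytes the child wrote, because both descriptors refer to the same open file.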
1.1.2 What's the difference between fork() and vfork()?
Some systems have a system call vfork(), which was originally designed as a lower-overhead version of fork(). Since fork() involved copying the entire address space of the process, and was therefore quite expensive, the vfork() function was introduced (in 3.0BSD).
However, since vfork() was introduced, the implementation of fork() has improved drastically, most notably with the introduction of `copy-on-write', where the copying of the process address space is transparently faked by allowing both processes to refer to the same physical memory until either of them modifies it. This largely removes the justification for vfork(); indeed, a large proportion of systems now lack the original functionality of vfork() completely. For compatibility, though, there may still be a vfork() call present, that simply calls fork() without attempting to emulate all of the vfork() semantics.
As a result, it is very unwise to actually make use of any of the differences between fork() and vfork(). Indeed, it is probably unwise to use vfork() at all, unless you know exactly why you want to.
The basic difference between the two is that when a new process is created with vfork(), the parent process is temporarily suspended, and the child process might borrow the parent's address space. This strange state of affairs continues until the child process either exits, or calls execve(), at which point the parent process continues.
This means that the child process of a vfork() must be careful to avoid unexpectedly modifying variables of the parent process. In particular, the child process must not return from the function containing the vfork() call, and it must not call exit() (if it needs to exit, it should use _exit(); actually, this is also true for the child of a normal fork()).
1.1.3 Why use _exit rather than exit in the child branch of a fork?
There are a few differences between exit() and _exit() that become significant when fork(), and especially vfork(), is used.
The basic difference between exit() and _exit() is that the former performs clean-up related to user-mode constructs in the library, and calls user-supplied cleanup functions, whereas the latter performs only the kernel cleanup for the process.
In the child branch of a fork(), it is normally incorrect to use exit(), because that can lead to stdio buffers being flushed twice, and temporary files being unexpectedly removed. In C++ code the situation is worse, because destructors for static objects may be run incorrectly. (There are some unusual cases, like daemons, where the parent should call _exit() rather than the child; the basic rule, applicable in the overwhelming majority of cases, is that exit() should be called only once for each entry into main.)
In the child branch of a vfork(), the use of exit() is even more dangerous, since it will affect the state of the parent process.
1.2 Environment variables
1.2.1 How can I get/set an environment variable from a program?
Getting the value of an environment variable is done by using getenv().
    #include <stdlib.h>

    char *getenv(const char *name);
Setting the value of an environment variable is done by using putenv().
    #include <stdlib.h>

    int putenv(char *string);
The string passed to putenv must not be freed or made invalid, since a pointer to it is kept by putenv(). This means that it must either be a static buffer or allocated off the heap. The string can be freed if the environment variable is redefined or deleted via another call to putenv().
Remember that environment variables are inherited; each process has a separate copy of the environment. As a result, you can't change the value of an environment variable in another process, such as the shell.
Suppose you wanted to get the value for the TERM environment variable. You would use this code:
    char *envvar;

    envvar = getenv("TERM");

    printf("The value for the environment variable TERM is ");
    if (envvar)
    {
        printf("%s\n", envvar);
    }
    else
    {
        printf("not set.\n");
    }
Now suppose you wanted to create a new environment variable called MYVAR, with a value of MYVAL. This is how you'd do it.
    static char envbuf[256];

    /* snprintf() cannot overrun the buffer if the value grows */
    snprintf(envbuf, sizeof envbuf, "MYVAR=%s", "MYVAL");

    if (putenv(envbuf))
    {
        printf("Sorry, putenv() couldn't find the memory for %s\n", envbuf);
        /* Might exit() or something here if you can't live without it */
    }
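The static buffer above only works for one variable at a time. A heap-allocated variant might look like the following sketch; set_env_var() is a hypothetical name, not a standard function, and it deliberately leaks the string on success because putenv() keeps the pointer.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "NAME=value" on the heap and hand it to
   putenv().  The string is not freed on success, because putenv()
   keeps a pointer to it.  Returns 0 on success, -1 on failure. */
int set_env_var(const char *name, const char *value)
{
    size_t len = strlen(name) + strlen(value) + 2;  /* '=' and '\0' */
    char *buf = malloc(len);

    if (buf == NULL)
        return -1;
    snprintf(buf, len, "%s=%s", name, value);
    if (putenv(buf) != 0) {
        free(buf);              /* the call failed, so nothing refers to buf */
        return -1;
    }
    return 0;
}
```

After set_env_var("MYVAR", "MYVAL") succeeds, getenv("MYVAR") in the same process returns "MYVAL".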
1.2.2 How can I read the whole environment?
If you don't know the names of the environment variables, then the getenv() function isn't much use. In this case, you have to dig deeper into how the environment is stored.
A global variable, environ, holds a pointer to an array of pointers to environment strings, each string in the form "NAME=value". A NULL pointer is used to mark the end of the array. Here's a trivial program to print the current environment (like printenv):
    #include <stdio.h>

    extern char **environ;

    int main(void)
    {
        char **ep = environ;
        char *p;
        while ((p = *ep++))
            printf("%s\n", p);
        return 0;
    }
In general, the environ variable is also passed as the third, optional, parameter to main(); that is, the above could have been written:
    #include <stdio.h>

    int main(int argc, char **argv, char **envp)
    {
        char *p;
        while ((p = *envp++))
            printf("%s\n", p);
        return 0;
    }
However, while pretty universally supported, this method isn't actually defined by the POSIX standards. (It's also less useful, in general.)
1.3 How can I sleep for less than a second?
The sleep() function, which is available on all Unixes, only allows for a duration specified in seconds. If you want finer granularity, then you need to look for alternatives:
- Many systems have a function usleep()
- You can use select() or poll(), specifying no file descriptors to test; a common technique is to write a usleep() function based on either of these (see the comp.unix.questions FAQ for some examples)
- If your system has itimers (most do), you can roll your own usleep() using them (see the BSD sources for usleep() for how to do this)
- If you have POSIX realtime, there is a nanosleep() function
Of the above, select() is probably the most portable (and strangely, it is often much more efficient than usleep() or an itimer-based method). However, the behaviour may be different if signals are caught while asleep; this may or may not be an issue depending on the application.
Whichever route you choose, it is important to realise that you may be constrained by the timer resolution of the system (some systems allow very short time intervals to be specified, others have a resolution of, say, 10ms and will round all timings to that). Also, as for sleep(), the delay you specify is only a minimum value; after the specified period elapses, there will be an indeterminate delay before your process next gets scheduled.
1.4 How can I get a finer-grained version of alarm()?
Modern Unixes tend to implement alarms using the setitimer() function, which has a higher resolution and more options than the simple alarm() function. One should generally assume that alarm() and setitimer(ITIMER_REAL) may be the same underlying timer, and accessing it both ways may cause confusion.
Itimers can be used to implement either one-shot or repeating signals; also, there are generally 3 separate timers available:
ITIMER_REAL
- counts real (wall clock) time, and sends the SIGALRM signal
ITIMER_VIRTUAL
- counts process virtual (user CPU) time, and sends the SIGVTALRM signal
ITIMER_PROF
- counts user and system CPU time, and sends the SIGPROF signal; it is intended for interpreters to use for profiling.
Itimers, however, are not part of many of the standards, despite having been present since 4.2BSD. The POSIX realtime extensions define some similar, but different, functions.
1.5 How can a parent and child process communicate?
A parent and child can communicate through any of the normal inter-process communication schemes (pipes, sockets, message queues, shared memory), but also have some special ways to communicate that take advantage of their relationship as a parent and child.
One of the most obvious is that the parent can get the exit status of the child.
Since the child inherits file descriptors from its parent, the parent can open both ends of a pipe, fork, then the parent closes one end and the child closes the other end of the pipe. This is what happens when you call the popen() routine to run another program from within yours, i.e. you can write to the file descriptor returned from popen() and the child process sees it as its stdin, or you can read from the file descriptor and see what the program wrote to its stdout. (The mode parameter to popen() defines which; if you want to do both, then you can do the plumbing yourself without too much difficulty.)
Also, the child process inherits memory segments mmapped anonymously (or by mmapping the special file `/dev/zero') by the parent; these shared memory segments are not accessible from unrelated processes.
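The pipe arrangement described above might be sketched like this. The function name read_from_child() is hypothetical, and the child here simply writes a fixed message rather than running another program.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that writes `msg` into a pipe, and
   read it back in the parent.  Returns the number of bytes stored in
   buf (which is NUL-terminated), or -1 on error. */
ssize_t read_from_child(const char *msg, char *buf, size_t bufsize)
{
    int fds[2];
    pid_t pid;
    ssize_t n, total = 0;

    if (bufsize == 0 || pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);                  /* child only writes */
        if (write(fds[1], msg, strlen(msg)) < 0)
            _exit(1);
        _exit(0);
    }

    close(fds[1]);                      /* parent only reads */
    while ((size_t)total < bufsize - 1
           && (n = read(fds[0], buf + total, bufsize - 1 - total)) > 0)
        total += n;
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);              /* collect the child's exit status */
    return total;
}
```

Closing the unused end in each process matters: the parent's read() only sees end-of-file once every copy of the write end has been closed.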
1.6 How do I get rid of zombie processes?
1.6.1 What is a zombie?
When a program forks and the child finishes before the parent, the kernel still keeps some of its information about the child in case the parent might need it -- for example, the parent may need to check the child's exit status. To be able to get this information, the parent calls wait(); when this happens, the kernel can discard the information.
In the interval between the child terminating and the parent calling wait(), the child is said to be a `zombie'. (If you do `ps', the child will have a `Z' in its status field to indicate this.) Even though it's not running, it's still taking up an entry in the process table. (It consumes no other resources, but some utilities may show bogus figures for e.g. CPU usage; this is because some parts of the process table entry have been overlaid by accounting info to save space.)
This is not good, as the process table has a fixed number of entries and it is possible for the system to run out of them. Even if the system doesn't run out, there is a limit on the number of processes each user can run, which is usually smaller than the system's limit. This is one of the reasons why you should always check if fork() failed, by the way!
If the parent terminates without calling wait(), the child is `adopted' by init, which handles the work necessary to clean up after the child. (This is a special system program with process ID 1 -- it's actually the first program to run after the system boots up).
1.6.2 How do I prevent them from occurring?
You need to ensure that your parent process calls wait() (or waitpid(), wait3(), etc.) for every child process that terminates; or, on some systems, you can instruct the system that you are uninterested in child exit states.
Another approach is to fork() twice, and have the immediate child process exit straight away. This causes the grandchild process to be orphaned, so the init process is responsible for cleaning it up. For code to do this, see the function fork2() in the examples section.
To ignore child exit states, you need to do the following (check your system's manpages to see if this works):
    struct sigaction sa;
    sa.sa_handler = SIG_IGN;
    #ifdef SA_NOCLDWAIT
    sa.sa_flags = SA_NOCLDWAIT;
    #else
    sa.sa_flags = 0;
    #endif
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, NULL);
If this is successful, then the wait() functions are prevented from working; if any of them are called, they will wait until all child processes have terminated, then return failure with errno == ECHILD.
The other technique is to catch the SIGCHLD signal, and have the signal handler call waitpid() or wait3(). See the examples section for a complete program.
1.7 How do I get my program to act like a daemon?
A daemon process is usually defined as a background process that does not belong to a terminal session. Many system services are performed by daemons; network services, printing etc.
Simply invoking a program in the background isn't really adequate for these long-running programs; that does not correctly detach the process from the terminal session that started it. Also, the conventional way of starting daemons is simply to issue the command manually or from an rc script; the daemon is expected to put itself into the background.
Here are the steps to become a daemon:
- fork() so the parent can exit; this returns control to the command line or shell invoking your program. This step is required so that the new process is guaranteed not to be a process group leader. The next step, setsid(), fails if you're a process group leader.
- setsid() to become a process group and session group leader. Since a controlling terminal is associated with a session, and this new session has not yet acquired a controlling terminal, our process now has no controlling terminal, which is a Good Thing for daemons.
- fork() again so the parent (the session group leader) can exit. This means that we, as a non-session group leader, can never regain a controlling terminal.
- chdir("/") to ensure that our process doesn't keep any directory in use. Failure to do this could make it so that an administrator couldn't unmount a filesystem, because it was our current directory. [Equivalently, we could change to any directory containing files important to the daemon's operation.]
- umask(0) so that we have complete control over the permissions of anything we write. We don't know what umask we may have inherited. [This step is optional]
- close() fds 0, 1, and 2. This releases the standard in, out, and error we inherited from our parent process. We have no way of knowing where these fds might have been redirected to. Note that many daemons use sysconf() to determine the limit _SC_OPEN_MAX. _SC_OPEN_MAX tells you the maximum number of open files per process. Then in a loop, the daemon can close all possible file descriptors. You have to decide if you need to do this or not. If you think that there might be file descriptors open you should close them, since there's a limit on the number of concurrent file descriptors.
- Establish new open descriptors for stdin, stdout and stderr. Even if you don't plan to use them, it is still a good idea to have them open. The precise handling of these is a matter of taste; if you have a logfile, for example, you might wish to open it as stdout or stderr, and open `/dev/null' as stdin; alternatively, you could open `/dev/console' as stderr and/or stdout, and `/dev/null' as stdin, or any other combination that makes sense for your particular daemon.
Almost none of this is necessary (or advisable) if your daemon is being started by inetd. In that case, stdin, stdout and stderr are all set up for you to refer to the network connection, and the fork()s and session manipulation should not be done (to avoid confusing inetd). Only the chdir() and umask() steps remain as useful.
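The steps for a daemon that is not started by inetd can be put together roughly as follows. This is a sketch, not a definitive implementation: become_daemon() is a made-up name, pointing all three standard descriptors at `/dev/null' is just one of the choices mentioned above, and the optional loop over every descriptor up to _SC_OPEN_MAX is left out.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper following the steps listed above.  On success it
   returns 0 in the daemon process only; the intermediate processes
   call _exit().  Returns -1 if a step fails. */
int become_daemon(void)
{
    int fd;

    switch (fork()) {                   /* first fork: parent exits */
    case -1: return -1;
    case 0:  break;
    default: _exit(0);
    }

    if (setsid() == -1)                 /* new session, no controlling tty */
        return -1;

    switch (fork()) {                   /* second fork: session leader exits */
    case -1: return -1;
    case 0:  break;
    default: _exit(0);
    }

    if (chdir("/") == -1)               /* don't keep a directory in use */
        return -1;
    umask(0);                           /* optional: known umask */

    close(0);                           /* release inherited stdin, */
    close(1);                           /* stdout */
    close(2);                           /* and stderr */

    /* open() returns the lowest free descriptor, so this lands on 0 */
    fd = open("/dev/null", O_RDWR);
    if (fd != 0)
        return -1;
    if (dup2(fd, 1) == -1 || dup2(fd, 2) == -1)
        return -1;
    return 0;
}
```

Note that error reporting becomes awkward after the descriptors are closed; a real daemon would usually have its logging arranged before that point.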
1.8 How can I look at processes in the system like ps does?
You really don't want to do this.
The most portable way, by far, is to do popen(pscmd, "r") and parse the output. (pscmd should be something like `"ps -ef"' on SysV systems; on BSD systems there are many possible display options: choose one.)
In the examples section, there are two complete versions of this; one for SunOS 4, which requires root permission to run and uses the `kvm_*' routines to read the information from kernel data structures; and another for SVR4 systems (including SunOS 5), which uses the `/proc' filesystem.
It's even easier on systems with an SVR4.2-style `/proc'; just read a psinfo_t structure from the file `/proc/PID/psinfo' for each PID of interest. However, this method, while probably the cleanest, is also perhaps the least well-supported. (On FreeBSD's `/proc', you read a semi-undocumented printable string from `/proc/PID/status'; Linux has something similar.)
1.9 Given a pid, how can I tell if it's a running program?
Use kill() with 0 for the signal number.
There are four possible results from this call:
- kill() returns 0: this implies that a process exists with the given PID, and the system would allow you to send signals to it. It is system-dependent whether the process could be a zombie.
- kill() returns -1, errno == ESRCH: either no process exists with the given PID, or security enhancements are causing the system to deny its existence. (On some systems, the process could be a zombie.)
- kill() returns -1, errno == EPERM: the system would not allow you to kill the specified process. This means that either the process exists (again, it could be a zombie) or draconian security enhancements are present (e.g. your process is not allowed to send signals to anybody).
- kill() returns -1, with some other value of errno: you are in trouble!
The most-used technique is to assume that success or failure with EPERM implies that the process exists, and any other error implies that it doesn't.
An alternative exists, if you are writing specifically for a system (or all those systems) that provide a `/proc' filesystem: checking for the existence of `/proc/PID' may work.
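The most-used technique described above fits in a few lines. This is a sketch; process_exists() is a hypothetical name, and the guard against non-positive values is there because kill() gives those a special meaning (process groups, or "everything").

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if a process with this pid appears to
   exist, 0 otherwise.  Success or EPERM is taken to mean "exists";
   any other error is taken to mean "does not exist". */
int process_exists(pid_t pid)
{
    if (pid <= 0)               /* not a single-process pid for kill() */
        return 0;
    if (kill(pid, 0) == 0)      /* signal 0: error checking only */
        return 1;
    return errno == EPERM;
}
```

Bear in mind that the answer can be out of date by the time you act on it, and that a positive answer may refer to a zombie.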
1.10 What's the return value of system/pclose/waitpid?
The return value of system(), pclose(), or waitpid() doesn't seem to be the exit value of my process... or the exit value is shifted left 8 bits... what's the deal?
The man page is right, and so are you! If you read the man page for waitpid() you'll find that the return code for the process is encoded. The value returned by the process is normally in the upper 8 bits of the low-order 16 bits of the status (which is why it looks shifted left by 8), and the rest is used for other things. You can't rely on this though, not if you want to be portable, so the suggestion is that you use the macros provided. These are usually documented under wait() or wstat.
Macros defined for the purpose (in `<sys/wait.h>') include (stat is the status value that waitpid() stores through its second argument, or that system() or pclose() returns):
WIFEXITED(stat)
- non-zero if child exited normally
WEXITSTATUS(stat)
- exit code returned by child
WIFSIGNALED(stat)
- non-zero if child was terminated by a signal
WTERMSIG(stat)
- signal number that terminated child
WIFSTOPPED(stat)
- non-zero if child is stopped
WSTOPSIG(stat)
- number of signal that stopped child
WIFCONTINUED(stat)
- non-zero if status was for continued child
WCOREDUMP(stat)
- if WIFSIGNALED(stat) is non-zero, this is non-zero if the process left behind a core dump
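Put together, decoding a status value with these macros might look like the following sketch. describe_status() is a hypothetical helper, and the wording of the messages is arbitrary.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: write a human-readable description of a status
   value (as stored by waitpid()) into buf. */
void describe_status(int stat, char *buf, size_t size)
{
    if (WIFEXITED(stat))
        snprintf(buf, size, "exited with code %d", WEXITSTATUS(stat));
    else if (WIFSIGNALED(stat))
        snprintf(buf, size, "killed by signal %d", WTERMSIG(stat));
    else if (WIFSTOPPED(stat))
        snprintf(buf, size, "stopped by signal %d", WSTOPSIG(stat));
    else
        snprintf(buf, size, "unrecognised status");
}
```

The order matters only in that each value macro (WEXITSTATUS, WTERMSIG, WSTOPSIG) is meaningful solely when the corresponding test macro is non-zero.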
1.11 How do I find out about a process' memory usage?
Look at getrusage(), if available.
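A minimal sketch of a getrusage() call follows; peak_rss() is a made-up name. Which fields of struct rusage are actually filled in, and in what units, varies between systems (ru_maxrss is in kilobytes on Linux and in bytes on macOS, for instance), so check your manpage before relying on the number.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper: peak resident set size of the calling process
   as reported by getrusage(), in system-dependent units, or -1 on
   error.  Some systems leave this field as zero. */
long peak_rss(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) == -1)
        return -1;
    return ru.ru_maxrss;
}
```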
1.12 Why do processes never decrease in size?
When you free memory back to the heap with free(), on almost all systems that doesn't reduce the memory usage of your program. The memory free()d is still part of the process' address space, and will be used to satisfy future malloc() requests.
If you really need to free memory back to the system, look at using mmap() to allocate private anonymous mappings. When these are unmapped, the memory really is released back to the system. Certain implementations of malloc() (e.g. in the GNU C Library) automatically use mmap() where available to perform large allocations; these blocks are then returned to the system on free().
Of course, if your program increases in size when you think it shouldn't, you may have a `memory leak' -- a bug in your program that results in unused memory not being freed.
1.13 How do I change the name of my program (as seen by `ps')?
On BSDish systems, the ps program actually looks into the address space of the running process to find the current argv[], and displays that. That enables a program to change its `name' simply by modifying argv[].
On SysVish systems, the command name and usually the first 80 bytes of the parameters are stored in the process' u-area, and so can't be directly modified. There may be a system call to change this (unlikely), but otherwise the only way is to perform an exec(), or write into kernel memory (dangerous, and only possible if running as root).
Some systems (notably Solaris) may have two separate versions of ps, one in `/usr/bin/ps' with SysV behaviour, and one in `/usr/ucb/ps' with BSD behaviour. On these systems, if you change argv[], then the BSD version of ps will reflect the change, and the SysV version won't.
Check to see if your system has a function setproctitle().
1.14 How can I find a process' executable file?
This would be a good candidate for a list of `Frequently Unanswered Questions', because the fact of asking the question usually means that the design of the program is flawed. :-)
You can make a `best guess' by looking at the value of argv[0]. If this contains a `/', then it is probably the absolute or relative (to the current directory at program start) path of the executable. If it does not, then you can mimic the shell's search of the PATH variable, looking for the program. However, success is not guaranteed, since it is possible to invoke programs with arbitrary values of argv[0], and in any case the executable may have been renamed or deleted since it was started.
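Mimicking the shell's PATH search could look something like the following sketch. find_in_path() and is_executable_file() are hypothetical names, and, as the text says, the result is only a best guess.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Non-zero if path names a regular file that we may execute. */
static int is_executable_file(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISREG(st.st_mode)
           && access(path, X_OK) == 0;
}

/* Hypothetical helper: look for `name` the way a shell would, writing
   the first match into out.  Returns 0 if found, -1 if not. */
int find_in_path(const char *name, char *out, size_t outsize)
{
    const char *path = getenv("PATH");
    const char *p, *end;

    if (strchr(name, '/') != NULL) {    /* contains '/': use as given */
        int n = snprintf(out, outsize, "%s", name);
        if (n < 0 || (size_t)n >= outsize || !is_executable_file(out))
            return -1;
        return 0;
    }
    if (path == NULL)
        return -1;

    for (p = path; ; p = end + 1) {
        size_t dirlen;
        int n;

        end = strchr(p, ':');
        dirlen = end ? (size_t)(end - p) : strlen(p);
        if (dirlen == 0)                /* empty entry: current directory */
            n = snprintf(out, outsize, "./%s", name);
        else
            n = snprintf(out, outsize, "%.*s/%s", (int)dirlen, p, name);
        if (n > 0 && (size_t)n < outsize && is_executable_file(out))
            return 0;
        if (end == NULL)
            break;
    }
    return -1;
}
```

A typical call would pass argv[0] as name; remember that PATH may have changed since the program was started.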
If all you want is to be able to print an appropriate invocation name with error messages, then the best approach is to have main() save the value of argv[0] in a global variable for use by the entire program. While there is no guarantee whatsoever that the value in argv[0] will be meaningful, it is the best option available in most circumstances.
The most common reason people ask this question is in order to locate configuration files with their program. This is considered to be bad form; directories containing executables should contain nothing except executables, and administrative requirements often make it desirable for configuration files to be located on different filesystems to executables.
A less common, but more legitimate, reason to do this is to allow the program to call exec() on itself; this is a method used (e.g. by some versions of sendmail) to completely reinitialise the process (e.g. if a daemon receives a SIGHUP).
1.14.1 So where do I put my configuration files then?
The correct directory for this usually depends on the particular flavour of Unix you're using; `/var/opt/PACKAGE', `/usr/local/lib', `/usr/local/etc', or any of several other possibilities. User-specific configuration files are usually hidden `dotfiles' under $HOME (e.g. `$HOME/.exrc').
From the point of view of a package that is expected to be usable across a range of systems, this usually implies that the location of any sitewide configuration files will be a compiled-in default, possibly using a `--prefix' option on a configure script (Autoconf scripts do this). You might wish to allow this to be overridden at runtime by an environment variable. (If you're not using a configure script, then put the default in the Makefile as a `-D' option on compiles, or put it in a `config.h' header file, or something similar.)
User-specific configuration should be either a single dotfile under $HOME, or, if you need multiple files, a dot-subdirectory. (Files or directories whose names start with a dot are omitted from directory listings by default.) Avoid creating multiple entries under $HOME, because this can get very cluttered. Again, you can allow the user to override this location with an environment variable. Programs should always behave sensibly if they fail to find any per-user configuration.
1.15 Why doesn't my process get SIGHUP when its parent dies?
Because it's not supposed to.
SIGHUP is a signal that means, by convention, "the terminal line got hung up". It has nothing to do with parent processes, and is usually generated by the tty driver (and delivered to the foreground process group).
However, as part of the session management system, there are exactly two cases where SIGHUP is sent on the death of a process:
- When the process that dies is the session leader of a session that is attached to a terminal device, SIGHUP is sent to all processes in the foreground process group of that terminal device.
- When the death of a process causes a process group to become orphaned, and one or more processes in the orphaned group are stopped, then SIGHUP and SIGCONT are sent to all members of the orphaned group. (An orphaned process group is one where no process in the group has a parent which is part of the same session, but not the same process group.)
1.16 How can I kill all descendants of a process?
There isn't a fully general approach to doing this. While you can determine the relationships between processes by parsing ps output, this is unreliable in that it represents only a snapshot of the system.
However, if you're launching a subprocess that might spawn further subprocesses of its own, and you want to be able to kill the entire spawned job at one go, the solution is to put the subprocess into a new process group, and kill that process group if you need to.
The preferred function for creating process groups is setpgid(). Use this if possible rather than setpgrp() because the latter differs between systems (on some systems `setpgrp();' is equivalent to `setpgid(0,0);', on others, setpgrp() and setpgid() are identical).
See the job-control example in the examples section.
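Independently of the full job-control example, the core of the process-group technique can be sketched briefly. The names spawn_in_new_group() and kill_group() are made up, and the child here merely waits for signals where a real program would call exec().

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start a child in its own process group.  The
   child just waits for signals here; a real program would exec()
   something.  Returns the child's pid, which is also the ID of the
   new process group, or -1 on failure. */
pid_t spawn_in_new_group(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        setpgid(0, 0);                  /* child: become group leader */
        for (;;)
            pause();
    }
    if (pid > 0)
        setpgid(pid, pid);              /* parent does it too, so the group
                                           exists whichever runs first */
    return pid;
}

/* Hypothetical helper: send signo to every process in group pgid. */
int kill_group(pid_t pgid, int signo)
{
    if (pgid <= 1)                      /* refuse kill(0), kill(-1) by accident */
        return -1;
    return kill(-pgid, signo);          /* negative pid means "process group" */
}
```

Any further processes the child creates stay in the same group unless they move themselves out of it, so kill_group() reaches them as well.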
Putting a subprocess into its own process group has a number of effects. In particular, unless you explicitly place the new process group in the foreground, it will be treated as a background job with these consequences:
- it will be stopped with SIGTTIN if it attempts to read from the terminal
- if tostop is set in the terminal modes, it will be stopped with SIGTTOU if it attempts to write to the terminal (attempting to change the terminal modes should also cause this, independently of the current setting of tostop)
- the subprocess will not receive keyboard signals from the terminal (e.g. SIGINT or SIGQUIT)
In many applications input and output will be redirected anyway, so the most significant effect will be the lack of keyboard signals. The parent application should arrange to catch at least SIGINT and SIGQUIT (and preferably SIGTERM as well) and clean up any background jobs as necessary.