Unix Pipes


Multiple Command Pipelines: Architecture

Creating a pipeline between two processes is fairly simple, but building a multiple-command pipeline is more complicated. The relationship among the processes differs from what one would expect after building a simple two-process pipeline. Normally, a two-process pipeline is built with a single fork(), after which the parent and the child communicate through the pipe.
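The two-process case can be sketched in C as follows. This is a minimal illustration rather than code from the course examples: the helper name read_from_child and its buffer-based interface are assumptions made for the sketch. The child writes a message into the pipe and the parent reads it back.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that writes msg into a pipe, then
 * read the message back in the parent. Returns the number of bytes
 * read (buf is NUL-terminated), or -1 on error. */
ssize_t read_from_child(const char *msg, char *buf, size_t cap)
{
    int fd[2];                      /* fd[0] = read end, fd[1] = write end */

    if (cap == 0 || pipe(fd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }

    if (pid == 0) {                 /* child: the writer */
        close(fd[0]);               /* the child never reads */
        ssize_t w = write(fd[1], msg, strlen(msg));
        close(fd[1]);
        _exit(w < 0 ? 1 : 0);
    }

    /* parent: the reader */
    close(fd[1]);                   /* otherwise read() never sees EOF */
    size_t total = 0;
    ssize_t n;
    while (total < cap - 1 &&
           (n = read(fd[0], buf + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fd[0]);
    waitpid(pid, NULL, 0);          /* reap the child */
    return (ssize_t)total;
}
```

Each side closes the pipe end it does not use. If the parent kept its copy of the write end open, its read() would block forever waiting for an end-of-file that could never arrive.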

In an extension of this model to n commands, it is natural to assume a chain of processes in which each is the child of the previous one, until the n'th child is forked. This model does not work, because the parent shell must wait for the last command in the pipeline to complete. In such a chain the shell's direct child is the first command, so the shell would end up waiting on the first command instead.

A multiple process pipeline can be represented graphically as:

In this example, the parent shell (163) forks one child process (202) and then waits for it to complete. The child process (202) is the parent of all the other pipeline command processes. It creates two pipes and then calls fork() once for each command except the last, producing children 203 and 204. Each new child redirects STDIN and STDOUT to the appropriate pipe ends (the first command redirects only STDOUT) and calls exec() to run its command. A successful exec() never returns, because the new program replaces the calling process image. When the child (202) reaches the last command, it simply redirects STDIN to the read end of the second pipe and exec()s the last command itself. Because exec() preserves the process ID, the shell's wait on 202 is a wait on the last command. This is very important: the parent shell must wait for the last command to finish before continuing. If it does not, interactive commands such as "less" will not work properly.

The processes in the above figure have the following relationships:

Parent PID          Child PID
----------          ---------
163                 202
202                 203
202                 204
    

One important thing to note here is that every other process in the pipeline is a child of the shell's original child (pid 202), which itself becomes the last command. The processes are not children of one another as we move down the pipeline. Another thing to note is that only the shell (process 163) executes a wait. Every other process is replaced by its command when it calls exec, and exits when that command finishes.
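The architecture described above can be sketched in C as follows. This is an illustrative sketch under stated assumptions, not the course's reference implementation: the function names run_pipeline and pipeline_child, the MAX_CMDS limit, and the exit codes 126 and 127 used on failure are all choices made for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CMDS 16   /* arbitrary limit chosen for this sketch */

/* Runs in the shell's child (202 in the text). Never returns. */
static void pipeline_child(char *const *cmds[], int n)
{
    int pipes[MAX_CMDS - 1][2];

    for (int i = 0; i < n - 1; i++)          /* create all n-1 pipes first */
        if (pipe(pipes[i]) == -1) { perror("pipe"); _exit(126); }

    for (int i = 0; i < n - 1; i++) {        /* fork every command but the last */
        pid_t pid = fork();
        if (pid == -1) { perror("fork"); _exit(126); }
        if (pid == 0) {
            if (i > 0)                       /* the first command keeps its STDIN */
                dup2(pipes[i - 1][0], STDIN_FILENO);
            dup2(pipes[i][1], STDOUT_FILENO);
            for (int j = 0; j < n - 1; j++) { /* drop every inherited pipe fd */
                close(pipes[j][0]);
                close(pipes[j][1]);
            }
            execvp(cmds[i][0], cmds[i]);
            perror(cmds[i][0]);              /* reached only if exec failed */
            _exit(127);
        }
    }

    /* This process itself becomes the last command. */
    if (n > 1)
        dup2(pipes[n - 2][0], STDIN_FILENO);
    for (int j = 0; j < n - 1; j++) {
        close(pipes[j][0]);
        close(pipes[j][1]);
    }
    execvp(cmds[n - 1][0], cmds[n - 1]);
    perror(cmds[n - 1][0]);
    _exit(127);
}

/* Runs in the shell (163): fork once, wait once. Returns the exit
 * status of the last command, or -1 on error. */
int run_pipeline(char *const *cmds[], int n)
{
    if (n < 1 || n > MAX_CMDS)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        pipeline_child(cmds, n);             /* does not return */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With three commands, the process that calls run_pipeline plays the role of 163, the process running pipeline_child is 202, and the two processes forked inside the loop are 203 and 204. Every process closes all of the pipe descriptors after its dup2 calls. A leftover copy of a write end would keep a downstream reader from ever seeing end-of-file.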

Multiple Command Pipelines: File Descriptor Considerations

Alternate Architecture for Multiple Pipeline Commands

A second process architecture for implementing pipes is shown below. Duplicating file descriptors and closing all unneeded inherited descriptors must still be handled carefully.

This architecture has an advantage over the previous one: it can be implemented so that each forked shell knows about at most two pipes and four file descriptors (shell 202 need create only one pipe itself). In the previous architecture, all the subsequent shells inherit every pipe and file descriptor from shell 202.
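The figure for this architecture is not reproduced here, so the following C sketch is only one possible reading of it, offered as an assumption. In this reading, each process creates a single pipe, forks a child to run the commands to its left, and then exec()s its own command. The names run_chain and exec_chain and the exit codes 126 and 127 are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed reading of the alternate architecture: run cmds[0..last],
 * with the calling process becoming cmds[last]. Never returns. */
static void exec_chain(char *const *cmds[], int last)
{
    if (last > 0) {
        int fd[2];                        /* the only pipe this process creates */
        if (pipe(fd) == -1) { perror("pipe"); _exit(126); }

        pid_t pid = fork();
        if (pid == -1) { perror("fork"); _exit(126); }

        if (pid == 0) {                   /* child runs everything to the left */
            close(fd[0]);
            dup2(fd[1], STDOUT_FILENO);
            close(fd[1]);
            exec_chain(cmds, last - 1);   /* does not return */
        }

        close(fd[1]);                     /* this process reads from the pipe */
        dup2(fd[0], STDIN_FILENO);
        close(fd[0]);
    }

    execvp(cmds[last][0], cmds[last]);
    perror(cmds[last][0]);                /* reached only if exec failed */
    _exit(127);
}

/* Runs in the shell: fork once, wait once. Returns the exit status of
 * the last command, or -1 on error. */
int run_chain(char *const *cmds[], int n)
{
    if (n < 1)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        exec_chain(cmds, n - 1);          /* does not return */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In this sketch a process holds at most the pipe already attached to its STDOUT and the one pipe it creates, which matches the limit of two pipes and four file descriptors. The shell's direct child still becomes the last command, so the shell still waits on the right process.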

Example Pipe Programs

In /usr/class/cis762/shell/examples there is an example program demonstrating how to use dup2, pipe and exec in the creation of pipes. Please note that in the example the piped commands are hard-coded, i.e., the program does not handle arbitrary commands, or an arbitrary number of piped commands, as is needed for a general shell piping mechanism. For this reason, the program uses execlp instead of execvp. execlp is easier to use when you know the command name and its arguments ahead of time, because they are passed as a fixed list of arguments. execvp is easier to use when the command and its arguments are built at run time, because it takes the whole argument vector as a single NULL-terminated array of strings.
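The difference between the two calls can be seen in a short C sketch. This is not the program from the examples directory. The helper names run_hardcoded, run_argv and wait_for are hypothetical, and the command "sh -c 'exit 3'" is chosen only because its exit status is easy to check.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for pid and return its exit status, or -1 on error. */
static int wait_for(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* execlp: the command and its arguments are written out in the call. */
int run_hardcoded(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* The argument list must end with a null pointer. */
        execlp("sh", "sh", "-c", "exit 3", (char *)NULL);
        _exit(127);                 /* reached only if exec failed */
    }
    return wait_for(pid);
}

/* execvp: the command arrives as a NULL-terminated argument vector,
 * for example one built by a shell's command-line parser. */
int run_argv(char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* reached only if exec failed */
    }
    return wait_for(pid);
}
```

Both calls search PATH for the command. A general shell would typically use the execvp form, since the number of arguments is not known until the command line has been parsed.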

You may want to look at a second example with a single hard-coded pipe.

Sandra Mamrak and Shaun Rowland
Last modified: April 27, 2004