Controlling multiple asynchronous processes in Perl

Running processes in parallel

Perl is a great glue language, with a number of facilities that let you run other processes from your Perl script. For example, you can open a pipe to or from another program by putting a | character in the command string that you pass to open, and you can run another process and capture its output by calling it in backquotes.

Examples

open (PRHI,"du -sk * |"); # Report disc use to me
open (PRHO,"| mail ... "); # mail the output on PRHO
$weblet = `wget -q -O - http://www.wellho.net/index.html`;
# Read a web page (-O - sends the page to stdout so the backquotes capture it)

In all these cases, though, you're writing synchronised code in which you need to be aware of when to write to, or read from, the other process. You're also running just a single Perl process, so you've only got a single flow of control.
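As a minimal sketch of the piped open described above, the helper below reads a command's output through a lexical file handle and checks for errors. The name read_from_command is hypothetical, and echo simply stands in for du, wget or any other command.

```perl
use strict;
use warnings;

# Run a command and return its output lines with the newlines removed.
# The list form of open runs the command directly, without a shell.
sub read_from_command {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "Cannot run @cmd: $!";
    chomp(my @lines = <$fh>);
    close($fh) or warn "@cmd exited with status $?\n";
    return @lines;
}

# 'echo' is only a placeholder for a real command.
my @lines = read_from_command('echo', 'hello from a pipe');
print "Got: $_\n" for @lines;
```

The read still blocks until the command has produced its output, which is the synchronous behaviour the paragraph above describes.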

FORKING

The fork() function allows you to divide your Perl process into two - each identical except for the value that's returned by the fork call itself. Once you've forked, your (now) two processes can each go off down their own route in the logic.

When you fork, one of the processes retains the process id that the original pre-fork process used, and the fork() function returns to it the new process id that's associated with the other process. This process is known as the parent.

The other process, known as the child process, gets a new process id assigned to it and a zero return value from the fork. If it needs to know its parent's process id later on, that id should be saved PRIOR TO the fork. Thus:

$parent = $$;
$pid = fork();
# fork returns undef if no child could be created
die "fork failed: $!" unless defined $pid;
unless ($pid) {
        # Child process goes here
        # $parent is parent and $$ is child
        exit();
        }
# Parent process goes here
# $pid is child, $$ is parent
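The outline above can be turned into a small runnable sketch. The helper name run_in_child is hypothetical, and the waitpid call is an addition that is not in the outline: it lets the parent collect the child's exit status.

```perl
use strict;
use warnings;

$| = 1;   # unbuffered STDOUT, so pending output is not duplicated in the child

# Fork one child, run $task in it, and return the child's exit status.
sub run_in_child {
    my ($task) = @_;
    my $parent = $$;                 # saved before the fork
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: $parent is the parent's id, $$ is the child's new id
        exit($task->($parent));
    }

    # Parent: $pid is the child's id, $$ is unchanged
    waitpid($pid, 0);                # wait for the child to finish
    return $? >> 8;                  # the child's exit status
}

# The child exits with 0 if its saved $parent matches getppid().
my $status = run_in_child(sub {
    my ($parent_id) = @_;
    return getppid() == $parent_id ? 0 : 1;
});
print "Child exited with status $status\n";
```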

Forking is only half of the story, though. Once you've got two or more processes running, under most circumstances they'll need to be able to communicate with one another. Typically a child process that's running a task will need to communicate with a parent to return results, or to let the parent know that the child task has completed. And you need to select your mechanism for this communication very carefully - you certainly don't want to put any of the processes into a tight checking loop that would run to the severe detriment of other processes running on the same CPU.

COMMUNICATING VIA PIPES

If you open a pipe before you fork your process, you're setting up a read file handle and a write file handle, with the write handle linked to the read handle. When you fork, both processes inherit both handles, so whatever one process prints on the write handle can be read by the other process from the read handle. A pipe carries data in one direction only, so you decide which process is the writer and which is the reader (in the examples here, the child writes and the parent reads). If you need messages to flow both ways, open a second pipe for the other direction.
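For the single-child case, a minimal sketch might look like the following. The helper name child_message is hypothetical, and the example assumes lexical file handles (Perl 5.6 or later). Each process closes the end of the pipe that it does not use.

```perl
use strict;
use warnings;

$| = 1;   # unbuffered STDOUT, so pending output is not duplicated in the child

# Fork a child that sends one line of text back to the parent over a pipe.
sub child_message {
    my ($text) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        close $reader;               # the child only writes
        print $writer "$text\n";
        close $writer;               # flushes the buffered line
        exit(0);
    }

    close $writer;                   # the parent only reads
    my $line = <$reader>;            # blocks until the child has written
    close $reader;
    waitpid($pid, 0);
    chomp $line if defined $line;
    return $line;
}

print "Parent read: ", child_message("Completed 10"), "\n";
```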

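For a parent forking just a single child, simple file handles can be used and the scheme is easy to work. For a parent that's forking multiple children, you really need the parent to have a list of file handles. Bareword file handles such as RETHAND can't be stored in a list, so the example below uses a "typeglob" named after each item instead. (From Perl 5.6 onwards, lexical file handles held in ordinary scalars can be kept in an array or hash.)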

Here's an example of a parent process that's forking a whole list of children:

@waitlist = (10,20,16,12,8,22);
foreach $item (@waitlist) {
        pipe *{$item},RETHAND;
        unless ($pid = fork()) {
                # Child process
                sleep $item;
                print RETHAND "Completed $item\n";
                exit();
                }
        }

Each of the children runs a sleep command - standing in for the real processing that you would do there - and then prints its results back to the parent.

Using the code above, each of the children will be asleep at the same time, for anywhere between 8 and 22 seconds.

Let's then wait for the children to wake up:

foreach $item (@waitlist) {
        $response = <$item>;
        $now = localtime();
        print "$now - $response";
        }

That works well enough - after the longest sleep time of 22 seconds, all the child processes will have completed and the parent process will then exit.

fire:~/feb06 grahamellis$ perl pingalong
Started Sun Feb 26 15:52:10 2006
Forked by Sun Feb 26 15:52:10 2006
Sun Feb 26 15:52:20 2006 - Completed 10
Sun Feb 26 15:52:30 2006 - Completed 20
Sun Feb 26 15:52:30 2006 - Completed 16
Sun Feb 26 15:52:30 2006 - Completed 12
Sun Feb 26 15:52:30 2006 - Completed 8
Sun Feb 26 15:52:32 2006 - Completed 22
Completed Sun Feb 26 15:52:32 2006
fire:~/feb06 grahamellis$

That's OK, but not brilliant. If you look at the timing of each of the completed processes, you'll see that the parent waited for the children in the order they were forked. The children that slept for 16, 12 and 8 seconds weren't reported until 15:52:30, when the 20 second child ahead of them in the list had finished - so the parent is waiting for some of the slower children to complete while the faster but younger ones are waiting for attention.
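The same in-order wait can be sketched with lexical file handles kept in a hash rather than typeglobs, assuming Perl 5.6 or later. The names fork_sleepers and %reader_for are hypothetical, the sleep times are shortened, and the items are assumed to be unique because they are used as hash keys.

```perl
use strict;
use warnings;

$| = 1;   # unbuffered STDOUT, so pending output is not duplicated in children

# Fork one sleeping child per item, then read the replies in fork order.
sub fork_sleepers {
    my @waitlist = @_;
    my %reader_for;                  # item => read handle for that child

    for my $item (@waitlist) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            close $reader;
            sleep $item;             # stands in for real work
            print $writer "Completed $item\n";
            exit(0);                 # exit flushes and closes the handle
        }
        close $writer;               # the parent only reads
        $reader_for{$item} = $reader;
    }

    my @responses;
    for my $item (@waitlist) {
        my $fh = $reader_for{$item};
        my $line = <$fh>;            # blocks on this child, even if others are done
        push @responses, $line;
    }
    1 while wait() != -1;            # collect every finished child
    return @responses;
}

my @responses = fork_sleepers(2, 1);
print @responses;
```

The replies come back in list order, so the 1 second child is not reported until the 2 second child ahead of it has been read.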

ALERTING A PROCESS VIA A SIGNAL

If a process wants to communicate with another asynchronously (i.e. unexpectedly), it can do so via a signal.

The receiving process must first be set up to allow it to receive signals; this is done by setting an appropriate member of the %SIG hash. There are around 30 different signals available, but most of them already have other uses and we suggest you use the elements called USR1 or USR2. Thus:

$SIG{USR1} = \&doneit;

sub doneit {
        $gotone = 1;
        }

This goes in the source code of the parent, prior to the fork. It means that any process receiving a USR1 signal will simply set the $gotone variable to 1 so that the code in that receiving process can handle it in due course. It is important to limit the code in the signal handler function to almost nothing, since the signal can be received at any time and more complex code can lead to very nasty synchronisation and locking problems.

Also in the parent code, then, you'll check for the signal being received; since our parent isn't doing any processing in its own right, we'll put it into a slow sleep loop:

while (! $gotone) {
        sleep 1;
        }

Now all the child needs to do is to use a kill command (no, this kind of killing is not fatal to either child or parent) to send the signal:

kill "USR1",$parent;

and the parent will be alerted.
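Putting the handler, the sleep loop and the kill together gives a sketch like the one below. The helper name wait_for_child_signal is hypothetical, and a one second sleep stands in for the child's real work.

```perl
use strict;
use warnings;

$| = 1;   # unbuffered STDOUT, so pending output is not duplicated in the child

my $gotone = 0;
$SIG{USR1} = sub { $gotone = 1 };    # keep the handler tiny

# Fork a child that signals the parent after $delay seconds, and
# return roughly how long the parent waited for that signal.
sub wait_for_child_signal {
    my ($delay) = @_;
    $gotone = 0;
    my $parent = $$;                 # saved before the fork
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        sleep $delay;                # stands in for real work
        kill 'USR1', $parent;        # alert the parent
        exit(0);
    }

    my $start = time();
    while (!$gotone) {
        sleep 1;                     # an arriving signal cuts the sleep short
    }
    waitpid($pid, 0);
    return time() - $start;
}

my $waited = wait_for_child_signal(1);
print "Parent was signalled after about $waited second(s)\n";
```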

How does the parent know which child process has contacted it, though, if all it receives is a nondescript tug on the sleeve? It has to look and find out.

The first way you do this, which is somewhat clunky and inefficient if there's a lot of signalling going on, is through a temporary file. This mechanism, though, is perfectly adequate for a series of processes where there isn't much communication - for example, a setup where a Perl script is monitoring 20 or 30 long-running tests.

The child needs to write a message to the parent BEFORE it signals:

open (FH,">zz$item.tmp") or die "Cannot write zz$item.tmp: $!";
print FH "Completed $item\n";
close FH;   # close before signalling so the message is on disc
kill "USR1",$parent;

and the parent then looks for, reads and deletes any signal files:

@temps = <zz*.tmp>;
foreach $rfile (@temps) {
        open (RF,$rfile);
        $response = <RF>;
        close RF;
        unlink $rfile;
}
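A complete version of this scheme might be sketched as follows. This is not the original script: the name collect_via_files is hypothetical, the files go into a temporary directory from File::Temp rather than the current directory, and each child writes to a .part file and renames it so that the parent never picks up a half-written message. The items are assumed to be unique, since they form the file names.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

$| = 1;   # unbuffered STDOUT, so pending output is not duplicated in children

my $gotone = 0;
$SIG{USR1} = sub { $gotone = 1 };    # keep the handler tiny

sub collect_via_files {
    my @waitlist = @_;
    my $dir = tempdir();             # scratch directory, removed at the end
    my $parent = $$;

    for my $item (@waitlist) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            sleep $item;             # stands in for real work
            open(my $fh, '>', "$dir/zz$item.part") or die "Cannot write: $!";
            print $fh "Completed $item\n";
            close $fh;
            rename "$dir/zz$item.part", "$dir/zz$item.tmp";
            kill 'USR1', $parent;    # signal only after the file is complete
            exit(0);
        }
    }

    my @responses;
    my $kids = @waitlist;
    while ($kids > 0) {
        sleep 1 while !$gotone;
        $gotone = 0;                 # reset before scanning, so no signal is missed
        for my $rfile (glob("$dir/zz*.tmp")) {
            open(my $rf, '<', $rfile) or next;
            my $response = <$rf>;
            close $rf;
            unlink $rfile;
            push @responses, $response;
            $kids--;
        }
    }
    1 while wait() != -1;            # collect every finished child
    rmdir $dir;
    return @responses;
}

my @responses = collect_via_files(3, 1);
print @responses;
```

Several children finishing close together may produce only one wake-up in the parent, which is why each scan picks up every file that is present rather than just one.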

If you run our previous example, with a whole series of faster and slower children spawned, under this new scheme, there's great news. The children that only sleep for a few seconds alert the parent and are dealt with while the others continue to slumber - even if the short-sleep children were started after the slower ones:

fire:~/feb06 grahamellis$ perl pinga2
Started Sun Feb 26 16:22:39 2006
Forked by Sun Feb 26 16:22:39 2006
Sun Feb 26 16:22:47 2006 - Completed 8
Sun Feb 26 16:22:49 2006 - Completed 10
Sun Feb 26 16:22:51 2006 - Completed 12
Sun Feb 26 16:22:55 2006 - Completed 16
Sun Feb 26 16:22:59 2006 - Completed 20
Sun Feb 26 16:23:01 2006 - Completed 22
Completed Sun Feb 26 16:23:01 2006
fire:~/feb06 grahamellis$

READING BACK ASYNCHRONOUSLY VIA SIGNALS AND PIPES

Temporary files aren't a great way of signalling on a busy system - it's much easier to write information to pipes in the children and have the parent check the pipe. However, any attempt to read from a pipe that has no data waiting makes the reader block until data is available - the very problem we had in the first example and managed to alleviate in the second.

If we use a combination of signals and pipes, and Perl's select command, we can have each child process write back on a much more efficient pipe, and then signal the parent to say that it's done so. The parent can then look around with the select and see where the message is from.
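To illustrate how the four-argument select reports whether a handle has data waiting, here is a small sketch on a single pipe within one process. The helper name is_readable is hypothetical. A timeout of 0 makes select return at once instead of waiting.

```perl
use strict;
use warnings;

# Return 1 if $fh has data waiting, 0 otherwise.
# With $timeout set to 0 the call polls and returns immediately.
sub is_readable {
    my ($fh, $timeout) = @_;
    my $rin = '';
    vec($rin, fileno($fh), 1) = 1;   # set the bit for this file descriptor
    my $nfound = select(my $rout = $rin, undef, undef, $timeout);
    return $nfound > 0 ? 1 : 0;
}

pipe(my $reader, my $writer) or die "pipe failed: $!";

my $before = is_readable($reader, 0);    # nothing written yet
syswrite($writer, "ping\n");             # unbuffered write into the pipe
my $after = is_readable($reader, 0);     # data is now waiting

print "before write: $before, after write: $after\n";
# prints "before write: 0, after write: 1"
```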

Here's the code to reap all the sleeping children from that last example:

while ($kids > 0) {
        while (! $gotone) {
                sleep 1;
                }
        $gotone = 0;
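        foreach $item (@waitlist) {
                # skip pipes closed on an earlier pass (fileno is then undef)
                next unless defined fileno(*{$item});
                $rin = $win = "";
                vec($rin, fileno(*{$item}), 1) = 1;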
                $ein = $rin ;
                if (select($rin,$win,$ein,0)) {
                        sysread($item,$response,40);
                        $now = localtime();
                        print "$now - $response";
                        close $item;
                        $kids--;
                        }
                }
        }

A little more complex - and note the use of sysread rather than <> or read to avoid any buffering issues. It runs quickly, though, and all the children are dealt with within a second of finishing, even if they finish in an unpredictable order.
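As an alternative sketch, and not the approach used in the script above, the core IO::Select module can wait on all the pipes at once without any signals: can_read blocks until at least one handle has data. The name collect_via_select is hypothetical, and lexical file handles are assumed.

```perl
use strict;
use warnings;
use IO::Select;

$| = 1;   # unbuffered STDOUT, so pending output is not duplicated in children

sub collect_via_select {
    my @waitlist = @_;
    my $select = IO::Select->new();

    for my $item (@waitlist) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            close $reader;
            sleep $item;                         # stands in for real work
            syswrite($writer, "Completed $item\n");
            exit(0);
        }
        close $writer;                           # the parent only reads
        $select->add($reader);
    }

    my @responses;
    while ($select->count) {
        for my $fh ($select->can_read) {         # blocks until a pipe is readable
            my $bytes = sysread($fh, my $response, 40);
            push @responses, $response if $bytes;
            $select->remove($fh);                # one message per child
            close $fh;
        }
    }
    1 while wait() != -1;                        # collect every finished child
    return @responses;
}

my @responses = collect_via_select(2, 1);
print @responses;
```

Because the parent blocks inside can_read rather than sleeping in one second steps, each child is handled as soon as its message arrives.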

fire:~/feb06 grahamellis$ perl pinga3
Started Sun Feb 26 16:31:01 2006
Forked by Sun Feb 26 16:31:01 2006
Sun Feb 26 16:31:09 2006 - Completed 8
Sun Feb 26 16:31:11 2006 - Completed 10
Sun Feb 26 16:31:13 2006 - Completed 12
Sun Feb 26 16:31:17 2006 - Completed 16
Sun Feb 26 16:31:21 2006 - Completed 20
Sun Feb 26 16:31:23 2006 - Completed 22
Completed Sun Feb 26 16:31:23 2006
fire:~/feb06 grahamellis$
