2

Blow is an implementation of Sieve of Eratosthenes in C

#include "kernel/stat.h"
#include "kernel/types.h"
#include "user/user.h"

void cull(int p) {
  int n;

  while (read(0, &n, sizeof(n))) {
    // n is not prime
    if (n % p != 0) {
      write(1, &n, sizeof(n));
    }
  }
}

void redirect(int k, int pd[]) {
  close(k);
  dup(pd[k]);
  close(pd[0]);
  close(pd[1]);
}

void right() {
  int pd[2], p;

  // read p
  if (read(0, &p, sizeof(p))) {
    printf("prime %d\n", p);

    pipe(pd);

    if (fork()) {
      // parent
      redirect(0, pd);
      right();
    } else {
      redirect(1, pd);
      cull(p);
    }
  }
}

int main(int argc, char *argv[]) {
  int pd[2];
  pipe(pd);

  if (fork()) {
    redirect(0, pd);
    right();
  } else {
    redirect(1, pd);
    for (int i = 2; i < 36; i++) {
      write(1, &i, sizeof(i));
    }
  }

  exit(0);


I am not quite following the logic here:

1). Why does redirect need to close pd1?

2). cull() is reading from the file descriptor 0, but in right() the child process will close 0. Why doesn't this cause any problem?

3). why it is necessary to use redirect here when we only want read from 0 and write to 1?

Update:

1). I found this implementation in the following blog post:

https://abcdlsj.github.io/post/mit-6.828-lab1xv6-and-unix-utilities/

2). I think the idea is borrowed from the inventor of this algorithm

3). I think the reason that the header is so is because it is implemented in a toy operating system for educational purpose.

4
  • 2
    Where did you get this implementation? Understanding the context of where this implementation appears will help with better answers
    – bolov
    Commented Nov 14, 2020 at 23:03
  • This is a really strange implementation. It doesn't really seem to be a Sieve of Eratosthenes, it's more of a trial division implementation. It's also written in a weird parallelized way. This might be a fast implementation, except that it's only checking 36 numbers, so the overhead of creating so many processes will make it slow.
    – Nick ODell
    Commented Nov 14, 2020 at 23:20
  • That us a weird set of non-standard headers to be using. Commented Nov 14, 2020 at 23:47
  • Hi guys, I have updated the question. May I know the reason for the downvote so that I can improve the question to meet the community standard?
    – z.z
    Commented Nov 15, 2020 at 0:20

1 Answer 1

6

The code is an adaptation of the paper Coroutine prime number sieve by M. Douglas McIlroy to the Xv6 operating system, a re-implementation of Version 6 Unix used for teaching. The technique is from 1968 as an experiment in co-routines via pipes. The paper explains the algorithm and rationale.

The program implements the sieve of Eratosthenes as a kind of production line, in which each worker culls multiples of one prime from a passing stream of integers, and new workers are recruited as primes are discovered.

When Unix came to be, my fascination with coroutines led me to badger its author, KenThompson, to allow writes in one process to go not only to devices but also to matching reads in another process.

...the coroutine sieve has been a standard demo for languages or systems that support interprocess communication. Implementations using Unix processes typically place the three coroutines—source, cull and sink—in distinct executable files.The fact that the whole program can be written as a single source file, in a language that supports neither concurrency nor IPC, is a tribute not only to Unix’s pipes, but also to its clean separation of program initiation into fork for duplicating address spaces and exec for initializing them.

I believe using stdin and stdout is an artifact of its origins in the early days of Unix when piping stdin and stdout between processes was first introduced. It makes a lot more sense in shell.

#!/bin/bash

source() {
    seq 2 1000000
}

cull() {
    while true
    do
        read n
        (($n % $1 != 0)) && echo $n
    done
}

sink() {
    read p
    echo $p
    cull $p | sink &
}

source | sink

In C, as we'll see, it's simpler to skip the redirection and pass around pipes.


First, what's going on?

redirect is redirecting stdin and stdout to a pipe. 0 is stdin and 1 is stdout. This can be made more clear by using STDIN_FILENO and STDOUT_FILENO.

  1. main makes a pipe.
  2. main forks.
    1. The child redirects stdout to the pipe.
    2. The child streams numbers to the pipe via stdout.
      • The first number must be 2.
  3. main redirects stdin to the pipe.
  4. main calls right.
  5. right reads the first prime, 2, from stdin which is a pipe to the number stream.
[number stream] ->(2) [right]

After the initial condition, a switcheroo happens inside right.

  1. right makes a pipe.
  2. right forks.
    1. The child redirects its stdout to the pipe.
    2. The child's stdin is still reading from the number stream.
    3. The child calls cull.
    4. cull reads from stdin (the number stream) and writes to stdout (right).
  3. right redirects its stdin to the pipe, reading from cull.
  4. right recurses.
[number stream] ->(n) [cull] ->(p) [right]

After the first call right is reading primes from cull and writing them to the real stdout. cull reads candidates from the number stream and writes primes to right.

When the number stream loop ends the process ends and closes its pipe to cull. Once all the numbers have been read from the pipe, cull to reads EOF ending its loop and its process, closing its pipe to right. right reads EOF and returns back to main which exits.


To explain redirect we need to understand redirection in C.

First, a simple one-way pipe.

  int pd[2];
  pipe(pd);

  //parent
  if (fork()) {
      // Parent must close the input side else reading from pd[0] will
      // continue to try to read from pd[1] even after the child closes
      // their pipe.
      close(pd[1]);
    
      int p;
      while( read(pd[0], &p, sizeof(p)) ) {
          printf("p = %d\n", p);
      }
    
      fprintf(stderr, "parent done reading\n");
  }
  // child
  else {
      // Not strictly necessary, but the child will not be reading.
      close(pd[0]);
    
      for (int i = 2; i < 10; i++) {
          write(pd[1], &i, sizeof(i));
      }

      // Tell the parent we're done writing to the pipe.
      // The parent will get EOF on its next read. If the child
      // does not close the pipe, the parent will hang waiting to read.
      close(pd[1]);
      fprintf(stderr, "child done writing\n");

      // Pipes are closed automatically when a process exits, but
      // let's simulate the child not immediately exiting to
      // illustrate why it's important to explicitly close pipes.
      sleep(1);
  }

The parent and child share a pipe. The parent reads from one end, the child writes to the other. The child closes their write end when they're done so the parent doesn't hang trying to read. The parent closes their write end immediately so their pipe doesn't try to read from it.

Instead of passing the pipe around, redirect is redirecting the parent's half to stdin and the child's half to stdout. Let's do that in our simple example using dup2. dup2 duplicates a descriptor to another, first closing the target.

  int pd[2];
  pipe(pd);

  if (fork()) {
      // Redirect pd[0] to stdin.
      dup2(pd[0], STDIN_FILENO);
      // Parent still has to close its input side.
      close(pd[1]);
 
      int p;
      while( read(STDIN_FILENO, &p, sizeof(p)) ) {
          printf("p = %d\n", p);
      }
    
      fprintf(stderr, "parent done reading\n");
  } else {
      // Redirect pd[1] to stdout.
      dup2(pd[1], STDOUT_FILENO);
      // Close the original pd[1] so the parent doesn't try to read from it.
      close(pd[1]);

      for (int i = 2; i < 10; i++) {
          write(STDOUT_FILENO, &i, sizeof(i));
      }

      // Tell the parent we're done writing.
      close(STDOUT_FILENO);
      fprintf(stderr, "child done writing\n");
    
      sleep(1);
  }

The final twist is dup. dup duplicates pd[k] to the lowest numbered descriptor currently not in use by the process. redirect(0, pd) closes descriptor 0 and then copies pd[0] to the lowest numbered descriptor: 0.

redirect(1, pd) closes descriptor 1 and then copies pd[1] to what it hopes is the lowest numbered descriptor: 1. If something else closed 0, redirect(1, pd) will copy pd[1] to descriptor 0 and the code will not work. This can be avoided by using dup2 which makes it explicit which file descriptor you're copying to.

// close(k) and dup(pd[k]) to k safely and explicitly.
dup2(pd[k], k);

redirect can be rewritten as:

void redirect(int k, int pd[]) {
  dup2(pd[k], k);
  close(pd[0]);
  close(pd[1]);
}

Note that is all for a one-way pipe. cull uses bi-directional pipes, but the idea is the same.

By redirecting its pipe to stdin and stdout the program can use the pipe without having to pass the pipe around. This lets right read the first prime from the number generator and then let cull read the rest. It could also be done explicitly.


With some simpler examples in place, now we can answer the questions.

1). Why does redirect need to close pd[1]?

The parent must close the input side of its pipe, even after it's been duplicated or closed by the child, else the pipe will remain open and the parent will hang trying to read from it.

  1. cull() is reading from the file descriptor 0, but in right() the child process will close 0. Why doesn't this cause any problem?

right closes its 0 and then copies pd[0] to 0. cull does not close 0, it closes pd[0]. cull reads from the original 0 which is the number generator.

  1. Why it is necessary to use redirect here when we only want read from 0 and write to 1?

Because we need 0 and 1 to be different things at different times. We don't really want to read from 0 and write to 1. We want to read and write from pipes which happen to be attached to 0 and 1. The program is redirecting its pipes to 0 and 1 to demonstrate how Unix pipes and redirection works internally.

It took a dabbler like me some time to figure out how this program worked, it would have been a lot easier if I'd read the original paper and seen the shell script version first.

It can be rewritten to use explicit pipes. This avoids action at a distance, is easier to understand, and still demonstrates pipes and co-routines, but it no longer illustrates redirection.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

void cull(int p, int read_d, int write_d) {
    int n;
    while (read(read_d, &n, sizeof(n))) {
        if (n % p != 0) {
            write(write_d, &n, sizeof(n));
        }
    }
}

void right(int read_d) {
    int p;
    if (read(read_d, &p, sizeof(p))) {
        printf("prime %d\n", p);

        int cull_pipe[2];
        pipe(cull_pipe);

        if (fork()) {
            // right() reads from cull().
            close(cull_pipe[1]);
            right(cull_pipe[0]);
        } else {
            // cull() reads from the number generator and writes to right().
            close(cull_pipe[0]);
            cull(p, read_d, cull_pipe[1]);
            close(cull_pipe[1]);
        }
    }
}

int main(int argc, char *argv[]) {
    int pd[2];
    pipe(pd);

    if (fork()) {
        // The first call to right() reads from the number generator.
        close(pd[1]);
        right(pd[0]);
    } else {
        close(pd[0]);
        for (int i = 2; i < 6; i++) {
            write(pd[1], &i, sizeof(i));
        }
        close(pd[1]);
    }

    exit(0);
}

Other notes:

The headers can be replaced with standard headers.

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
1

Not the answer you're looking for? Browse other questions tagged or ask your own question.