Re: [Exim] What to do about non-monitonic process ids

Author: Jim Knoble
Date:  
To: exim-users
Subject: Re: [Exim] What to do about non-monitonic process ids
Circa 2003-01-30 16:08:48 +0000 dixit Philip Hazel:

: Folks, I need some feedback.
:
: It has been brought to my attention that the long-established Unix
: tradition of allocating process ID numbers sequentially, and wrapping
: around at some limit (usually 32767) is breaking down. OpenBSD
: apparently no longer does this. It is argued that making the "next"
: process id unpredictable improves security.
:
: The problem is that Exim, along with a lot of other software, assumes
: that the same process id will never be re-used within one second. This
: assumption is used in Exim in two places: (i) in constructing message
: ids; and (ii) in constructing unique file names, for maildir in
: particular. (The original maildir "rules" suggested doing it this way.)

Is this facility used anywhere else for constructing unique filenames?
In particular, is it used anywhere the filenames should be unpredictable?
If so, it may be better to use an implementation of mkstemp() for the
cases where unpredictable (and different) filenames are required, and
to use the message-ID technique for Maildir message filenames.
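
As a rough illustration of the mkstemp() approach, here is a minimal
sketch using Python's tempfile.mkstemp(), which likewise creates the file
exclusively under a randomly chosen name. The helper name and the "exim-"
prefix are hypothetical; nothing here is taken from the Exim source.

```python
import os
import tempfile


def create_unpredictable_file(directory, prefix="exim-"):
    """Create a new, uniquely named file in `directory`; return (fd, path).

    tempfile.mkstemp() chooses a random name and opens the file with
    O_CREAT | O_EXCL, so the name is hard to predict and an existing
    file is never reused. The prefix is a made-up example value.
    """
    fd, path = tempfile.mkstemp(prefix=prefix, dir=directory)
    return fd, path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as spool:
        fd1, path1 = create_unpredictable_file(spool)
        fd2, path2 = create_unpredictable_file(spool)
        os.close(fd1)
        os.close(fd2)
        print(path1)
        print(path2)
```

The names do not depend on the PID or the clock, so this suits the cases
where unpredictability matters rather than the Maildir case.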

: I was going to post some ideas for discussion here, but it got rather
: long, so instead I have put them in a file at
:
: http://cus.cam.ac.uk/~ph10/exim-pid-message

[In which find:]

: IDEA 1: A MAJOR UPHEAVAL
: ------------------------
:
: In this scenario we give up any kind of compatibility, and go for a new
: message ID, to last for a long time (I would hope). I suggest something
: like this:
:
: hh-tttttt-mmmm-pppppp-ss
:
: where hh is the localhost_number (up to 3844 hosts), tttttt is the time
: of arrival in seconds, mmmm is the time of arrival microseconds, pppppp
: is the process number, and ss the sequence number (again up to 3844).
: This increases the length of the message from 16 to 24. Everything that
: processes log files would have to be updated.
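
The layout quoted above can be sketched roughly as follows. This is only
an illustration: the function names are hypothetical, and the digit
alphabet (0-9, A-Z, a-z) is an assumption rather than something stated in
the proposal.

```python
# Assumed digit alphabet for base 62: 0-9, then A-Z, then a-z.
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def to_base62(value, width):
    """Encode a non-negative integer as exactly `width` base-62 digits."""
    if value < 0 or value >= 62 ** width:
        raise ValueError(f"{value} does not fit in {width} base-62 digits")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, 62)
        digits.append(BASE62[remainder])
    return "".join(reversed(digits))


def make_message_id(host, seconds, micros, pid, seq):
    """Build an hh-tttttt-mmmm-pppppp-ss style ID (hypothetical helper)."""
    return "-".join([
        to_base62(host, 2),     # localhost_number, below 3844
        to_base62(seconds, 6),  # arrival time, whole seconds
        to_base62(micros, 4),   # arrival time, microseconds
        to_base62(pid, 6),      # process number
        to_base62(seq, 2),      # sequence number, below 3844
    ])


if __name__ == "__main__":
    print(make_message_id(1, 1043885458, 0, 25565, 0))
```

Each field is fixed-width, so the total comes to 20 digits plus 4 hyphens,
matching the stated length of 24.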

This is my recommendation, with a change (described below). Exim v4 is
still in the relatively early adoption stage, and changing it now,
once, is liable to be easier than changing things slightly now, finding
out that it doesn't work the way we expected anyway, and having to
change it again later.

I recommend, however, an alternate form of the timestamp: Use D. J.
Bernstein's TAI library ( http://cr.yp.to/libtai.html , placed in the
public domain) to produce timestamps in the 12-byte TAI64N format
(spans several hundred billion years with nanosecond precision). In
base 62, a 12-byte TAI64N label expands to only 17 'digits'. (Note that
obase has to be set before ibase; once ibase is 16, bc reads '62' as
hexadecimal, i.e. 98.)

$ bc
obase = 62
ibase = 16
FFFFFFFFFFFFFFFFFFFFFFFF
01 41 02 28 18 09 30 19 25 33 57 37 07 57 39 19 01
$

This makes the message ID look like this:

hh-ttttttttttttttttt-pppppp-ss

which is only a few characters longer than Philip's proposed:

hh-tttttt-mmmm-pppppp-ss

This means that Exim will not have to change the message ID again when
the seconds of a 32-bit (struct timeval).tv_sec run out.
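
For reference, the point at which a 32-bit seconds counter runs out can
be worked out with a few lines of Python. This sketch assumes tv_sec is a
signed 32-bit integer counting from the Unix epoch; the helper name is
made up for illustration.

```python
from datetime import datetime, timezone


def last_representable_second(bits):
    """Return the last UTC instant a signed `bits`-wide seconds counter holds.

    Assumes the counter is signed and starts at the Unix epoch.
    """
    return datetime.fromtimestamp(2 ** (bits - 1) - 1, tz=timezone.utc)


if __name__ == "__main__":
    print(last_representable_second(32).isoformat())
```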

In fact, if this is going to happen, we might as well add support for
64-bit PIDs now, as 64-bit architectures become more widespread. It
only takes 11 base-62 digits to represent an 8-byte PID:

$ bc
obase = 62
ibase = 16
FFFFFFFFFFFFFFFF
21 60 42 17 36 01 06 10 17 34 15
$

which gets us to:

hh-ttttttttttttttttt-ppppppppppp-ss

This shouldn't have to change for quite a while.
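
The field widths for the timestamp and PID can be cross-checked
independently of bc. The following is a small sketch (the function names
are hypothetical) that converts the largest value of a given byte width
to base 62 and counts the digits.

```python
def base62_digits(value):
    """Return the base-62 digits of a non-negative integer, most significant first."""
    if value < 0:
        raise ValueError("value must be non-negative")
    digits = []
    while True:
        value, remainder = divmod(value, 62)
        digits.append(remainder)
        if value == 0:
            break
    return digits[::-1]


def digits_for_bytes(nbytes):
    """Number of base-62 digits needed for the largest nbytes-wide value."""
    return len(base62_digits(256 ** nbytes - 1))


if __name__ == "__main__":
    for nbytes in (8, 12):
        digits = base62_digits(256 ** nbytes - 1)
        # Print in the same two-decimal-digits-per-place style that bc uses.
        print(nbytes, len(digits), " ".join(f"{d:02d}" for d in digits))
```

Every digit printed should lie in the range 00 to 61; anything larger
means the output base was not actually 62.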

: IDEA 2: A BODGE TO LAST A WHILE LONGER
: --------------------------------------
:
: In this scenario, we keep a 16-character message ID in the same format
: as now, and use the current zero bytes at the most significant end of
: the process number to hold a sub-second time, assuming that PIDs are no
: bigger than 65535.

Bad assumption; are there not already systems that support 32-bit PIDs?

: Sub-bodge A: Use the top two characters to hold a milli- (_not_ micro-)
: second time. This will break down for PIDs >= 14 776 336.

Yuck; half-supporting 32-bit PIDs is worse than not supporting them at all.

: Sub-bodge B: Use only the most significant base-62 digit of the PID
: field, and _add_ into it the sub-second time, in 50ths of a second. A
: 32-bit PID can be no larger than 4 294 967 296, whereas a 6-digit
: base-62 number can hold up to 56 800 235 584. So it just fits. The
: disadvantage, of course, is that the time granularity is just 1/50 of a
: second. That means that a receiving process can never be in existence
: for less than 1/50 of a second.

Yuck again. With processors getting faster (and support for
multiprocessor systems becoming better and more widespread), coarse
time granularity seems like a poor assumption to make.
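
The arithmetic behind the two sub-bodges can be checked with a short
sketch. The variable and function names are made up for illustration, and
the reading of Sub-bodge B (adding the 50ths-of-a-second count to the top
digit of a 6-digit PID field) is my interpretation of the description
above.

```python
BASE = 62

# Sub-bodge A: the top two of six PID digits hold milliseconds, which
# leaves four digits for the PID itself.
pid_limit_a = BASE ** 4   # first PID value that no longer fits
ms_capacity = BASE ** 2   # values two digits can hold (1000 are needed)


def top_digit_b(pid, fiftieths):
    """Sub-bodge B: top base-62 digit of the PID plus the sub-second count."""
    return pid // BASE ** 5 + fiftieths


# Worst case for Sub-bodge B: largest 32-bit PID, last 50th of a second.
worst_top_digit = top_digit_b(2 ** 32 - 1, 49)

if __name__ == "__main__":
    print("Sub-bodge A breaks at PID", pid_limit_a)
    print("Two digits can hold", ms_capacity, "values")
    print("Sub-bodge B worst-case top digit:", worst_top_digit, "of max", BASE - 1)
```

The sum in Sub-bodge B stays below 62 only because a 32-bit PID never
pushes the top digit above 4; a wider PID or a finer time granularity
would overflow the digit.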

: IDEA 3: USE THOSE HYPHENS
: -------------------------
:
: In this scenario we abolish the hyphens in the ID, and make it up to
: 16-characters by extending the time field with another two base-62
: digits, which can hold milliseconds. This is entirely compatible and
: doesn't eat into the PID field, but it makes the IDs less readable.
: You'd have things like this:
:
: 18e2IE450006eL00
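
To see why such IDs are harder to read, here is a sketch that splits a
hyphen-less 16-character ID back into its fields. It assumes the field
order is time (6 digits), milliseconds (2), PID (6), sequence (2), and
that the digit alphabet is 0-9, A-Z, a-z; both are assumptions, and the
function names are hypothetical.

```python
# Assumed digit alphabet for base 62: 0-9, then A-Z, then a-z.
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def from_base62(text):
    """Decode a base-62 string into an integer."""
    value = 0
    for ch in text:
        value = value * 62 + BASE62.index(ch)  # ValueError on a bad digit
    return value


def split_id(message_id):
    """Split a 16-character ID into fields, using the assumed layout."""
    if len(message_id) != 16:
        raise ValueError("expected a 16-character message ID")
    return {
        "seconds": from_base62(message_id[0:6]),
        "millis": from_base62(message_id[6:8]),
        "pid": from_base62(message_id[8:14]),
        "seq": from_base62(message_id[14:16]),
    }


if __name__ == "__main__":
    print(split_id("18e2IE450006eL00"))
```

Without the hyphens, the field boundaries are known only to code that
already agrees on the layout.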

This sounds like the sort of thing that is liable to look fine to us
now, but not work somehow in the end.

In sum, let's do this now, once, and right.

My US$0.35---inflation, combined with a weak dollar and poor world
financial markets. :)

Cheers.

--
jim knoble | jmknoble@??? | http://www.pobox.com/~jmknoble/
(GnuPG fingerprint: 31C4:8AAC:F24E:A70C:4000::BBF4:289F:EAA8:1381:1491)
"I am non-refutable." --Enik the Altrusian