Re: [Exim] What to do about non-monitonic process ids

Author: Vadim Vygonets
Date:
To: exim-users
Subject: Re: [Exim] What to do about non-monitonic process ids
Quoth Philip Hazel on Fri, Jan 31, 2003:
> I don't think filenames need to be unpredictable in Exim.


But they may be. It's not an undesirable condition.

> So, what is to be done right now? For Exim 4.14? I think we have to
> remain within the 16-character field at this stage.


OpenBSD has mkstemps. The man page says:

     int
     mkstemps(char *template, int suffixlen);


     The mkstemps() function acts the same as mkstemp(), except it permits a
     suffix to exist in the template.  The template should be of the form
     /tmp/tmpXXXXXXsuffix. mkstemps() is told the length of the suffix string,
     i.e., strlen("suffix");


(creates a temp file, returns the file descriptor).

You may wish to feed it ("tttttt-XXXXXX-pp", 3), where t is time
and pp is (PID % 100), or something.
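
A minimal sketch of that call, for illustration only. The helper names
base62_fixed() and create_spool_file(), the base-62 encoding of the
time, and the directory argument are assumptions made for this example
and are not taken from Exim; mkstemps() itself is the call described in
the man page excerpt above.

```c
#define _DEFAULT_SOURCE  /* exposes mkstemps() on glibc */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: write v as exactly `width` base-62 digits
 * (0-9, A-Z, a-z), most significant first. High-order digits that do
 * not fit are dropped. No terminating NUL is written. */
void base62_fixed(unsigned long v, char *out, int width)
{
    static const char digits[] =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    assert(width > 0);
    for (int i = width - 1; i >= 0; i--) {
        out[i] = digits[v % 62];
        v /= 62;
    }
}

/* Hypothetical helper: create <dir>/tttttt-XXXXXX-pp and return the open
 * file descriptor, or -1 on failure. On success `path` holds the name
 * that mkstemps() actually chose. */
int create_spool_file(const char *dir, char *path, size_t pathlen)
{
    char t[7] = {0};                       /* 6 digits plus NUL */
    base62_fixed((unsigned long)time(NULL), t, 6);

    int n = snprintf(path, pathlen, "%s/%s-XXXXXX-%02d",
                     dir, t, (int)(getpid() % 100));
    if (n < 0 || (size_t)n >= pathlen)
        return -1;                         /* template did not fit */

    return mkstemps(path, 3);              /* 3 == strlen("-pp") */
}
```

The resulting file name is 16 characters long, so it would stay within
the 16-character field mentioned above.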

What's the number of file names you would have to generate per
second to be likely (50% chance) to hit the same file name twice?
(The number of possibilities is 26 ** 6 according to the OpenBSD
man page, but (26+26) ** 6 according to the code.)
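
This is the birthday problem. Assuming the six characters are chosen
uniformly and independently (an assumption about the implementation,
not something the man page promises), the 50% point is roughly
1.18 * sqrt(N): about 20,700 names for N = 26 ** 6 and about 165,500
for N = 52 ** 6. A small sketch that counts it directly; the function
name is made up for this example:

```c
#include <assert.h>

/* Smallest number of uniformly random names, drawn from `space`
 * possibilities, for which the chance of at least one repeat
 * reaches 50%. */
unsigned long draws_for_even_odds(double space)
{
    assert(space >= 1.0);
    double p_unique = 1.0;   /* probability that all names so far differ */
    unsigned long k = 1;     /* names drawn so far */

    while (p_unique > 0.5) {
        /* the next name must avoid the k names already drawn */
        p_unique *= 1.0 - (double)k / space;
        k++;
    }
    return k;
}
```

Calling it with 308915776.0 (26 ** 6) or 19770609664.0 (52 ** 6) gives
the figure for each of the two candidate name spaces.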

> There's no problem with the randomness itself, just the possibility of
> re-use within the same second. If the implementation guaranteed not to
> re-use the same PID until the clock had ticked, there would be no
> problem.


But, as I wrote before, you're not really guaranteed this even
on traditional UNIX systems.

> > What I think is, the ID's are not human-readable anyway (at least, not
> > to THIS human!).
>
> The only problem with this is that it makes it harder to recognize that
> something *is* an Exim message id (for example, for pattern matching in
> scripts that are reading log files).


You can use one hyphen (if mkstemp(3) is used, before the 'X's).

> > Another possibility is to move up to a larger base system.
>
> I thought of that, but I decided that there weren't many extra
> characters available that wouldn't break something. You would have to
> exclude chars that are problems in file names, chars that are shell
> metacharacters (for convenience), chars that can't appear on the LHS of
> a Message-ID: header line (because Exim uses its ID to construct
> Message-ID for messages that don't have one).


'-' and '+' at the beginning of file names can be a nuisance
(think "less"). And don't tell me to prefix the file name with
"./" every time.

> I received one other new suggestion: to change the epoch in the
> timestamp. Exim didn't exist before 1995, so if I started a new epoch in
> 2003, it will be 25 years before there is the possibility of any
> clashing IDs (and then only with messages that by then would be over 30
> years old).


Actually, you can use the low 24 bits of the time, which will give
you about 194 days for the messages to expire, and about
1.34359023165633553801 free digits ;)

Anyway, what will you gain by changing the epoch? You will not
get shorter seconds. If you change the "universe lifetime"
(sizeof(time_t)), then you'll at least gain a free digit or two.
By *not* changing the epoch but keeping time_t at 32 bits, the next
possible clash will be about 136 years after the first message
ever sent with Exim (which will be irrelevant because the
recipient will be long dead by then).
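
The arithmetic behind those figures, as a sketch. The function names
are invented for this example, the counter is treated as unsigned, and
a year is taken as 365.25 days. The "free digits" figure is the 8 bits
saved divided by log2(62), which is about 1.34 base-62 digits.

```c
#include <assert.h>

/* Seconds before an unsigned counter of `bits` bits wraps around. */
unsigned long long wrap_seconds(unsigned bits)
{
    assert(bits < 64);
    return 1ULL << bits;
}

/* The same interval in whole days. */
unsigned long long wrap_days(unsigned bits)
{
    return wrap_seconds(bits) / 86400ULL;
}

/* The same interval in whole years of 365.25 days. */
unsigned long long wrap_years(unsigned bits)
{
    return wrap_seconds(bits) / 31557600ULL;
}
```

wrap_days(24) is 194 and wrap_years(32) is 136, matching the numbers
above.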

> The final -xx of the ID is not much used. For hosts that do not set
> localhost_number, it contains a sequence for multiple messages received
> by one process in one second. In that situation (localhost_number
> unset), I could use these digits to hold the millisecond time instead.
> The receiving process could ensure that it doesn't exit until at least 1
> millisecond after the timestamp of the final message.


Nitpick: 10 milliseconds, one centisecond (this sounds terrible).
Unless you use three digits, of course.

But if the process gets, say, 75 messages in 500 milliseconds, it
will have to wait 250 milliseconds. So just incrementing the
number should be good enough (for efficiency in case the process
gets the same PID, you can start at a random number and increment
from there).
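
The wait in that example works out as follows, assuming two decimal
digits and therefore one ID per 10-millisecond tick. The function name
and parameters are hypothetical, chosen only to show the arithmetic:

```c
#include <assert.h>

/* Extra time, in milliseconds, a process must wait before exiting if
 * each message consumes one clock tick of `tick_ms` and it has
 * received `messages` messages in `elapsed_ms`. */
long extra_wait_ms(long messages, long elapsed_ms, long tick_ms)
{
    assert(messages >= 0 && elapsed_ms >= 0 && tick_ms > 0);
    long needed = messages * tick_ms;   /* time span the IDs occupy */
    return needed > elapsed_ms ? needed - elapsed_ms : 0;
}
```

extra_wait_ms(75, 500, 10) is 250: the 75 IDs occupy 750 ms of
timestamps, of which only 500 ms have actually passed.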

> What to do when localhost_number *is* set?


Then you can't receive more than 15 messages per second per
process under the current scheme anyway. This kind of bothered
me, actually.
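
One way to arrive at that figure, assuming the two final base-62 digits
have to hold both the sequence number and a localhost_number in the
range 0-255 (this packing is an assumption made for the example, not
quoted from the Exim source): 62 * 62 = 3844 values shared among 256
hosts leaves 15 sequence numbers each. The function name is made up.

```c
#include <assert.h>

/* Number of distinct sequence numbers left per host when `digits`
 * base-62 digits are shared among `host_values` possible host numbers. */
unsigned sequence_slots(unsigned digits, unsigned host_values)
{
    assert(host_values > 0);
    unsigned total = 1;
    for (unsigned i = 0; i < digits; i++)
        total *= 62;                    /* 62 ** digits */
    return total / host_values;
}
```

sequence_slots(2, 256) is 15.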

> Currently, the number is
> permitted to be in the range 0-255. I don't know how much it is used,
> but it seems plausible that this could be reduced to 0-50, in which case
> it could be stuffed into the most significant base-62 digit of the
> pid part of the message id. That still allows for 32-bit pids.


Nice.

Vadik.

--
There is hopeful symbolism in the fact that flags do not wave in a
vacuum.
        -- Arthur C. Clarke