On Wed, 21 Jan 2004, Greg A. Woods wrote:
| (i.e. I don't think his prediction about worm writers writing their own
| linkers has come true (yet) :-)
:)
|
| You missed the 'n' in third position -- it's quite common. :-)
|
Out of interest, where does the 'n' possibility come from? Have you got
a real live M$ exe that BASE-64 encodes with an 'n' in 3rd position?
I'm quite ready to believe I've done the maths all wrong!
It's clear that the first 2 bytes are 4D 5A ("MZ" in ASCII):
http://hyatus.dune2.info/Miscellanous/exe_header.html
http://win32assembly.online.fr/pe-tut1.html
Now let's try BASE-64 encoding this.
Every 3 bytes of input give 4 BASE-64 characters of output
(see e.g. http://www.freesoft.org/CIE/RFC/1521/7.htm).
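To make the 3-in/4-out grouping concrete, here's a minimal Python sketch that encodes a single 3-byte group by hand and cross-checks it against Python's standard base64 module. The helper name encode_group and the example third byte 0x90 are my own illustrative (hypothetical) choices, not taken from the header references above.

```python
import base64
import string

# Standard BASE-64 alphabet (RFC 1521): A-Z, a-z, 0-9, '+', '/'
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def encode_group(group: bytes) -> str:
    """Hypothetical helper: encode exactly 3 input bytes as 4 BASE-64 characters."""
    if len(group) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Pack the 3 bytes into one 24-bit integer, first byte most significant.
    n = (group[0] << 16) | (group[1] << 8) | group[2]
    # Slice the 24 bits into four 6-bit indices, high bits first.
    sextets = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in sextets)

group = bytes.fromhex("4d5a90")  # "MZ" plus an example third byte (assumed)
print(encode_group(group))                      # TVqQ
print(base64.b64encode(group).decode("ascii"))  # TVqQ
```

The first two output characters depend only on the "MZ" bytes; the third mixes the low 4 bits of 0x5A with the top 2 bits of whatever byte follows.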
So what do 2 known bytes of input (plus an unknown third) give?
Hex:          4d       5a       XX
Binary:       01001101 01011010 xxxxxxxx
6-bit chunks: 010011 010101 1010xx xxxxxx
Decimal:      19     21     40-43  (0-63)
BASE-64 dict: T      V      opqr   (any)
In other words, the regexp is TV[o-r].
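A quick brute-force check of the table above, as a Python sketch: try all 256 values of the unknown third byte XX and collect the third character of each encoding.

```python
import base64
import re

MZ = bytes.fromhex("4d5a")  # the two fixed bytes, "MZ" in ASCII

# For every possible third byte, take the 3rd character of the encoding.
third_chars = sorted({base64.b64encode(MZ + bytes([b]))[2:3].decode("ascii")
                      for b in range(256)})
print(third_chars)  # ['o', 'p', 'q', 'r']

# Every 3-byte prefix starting with MZ is matched by TV[o-r].
pattern = re.compile(r"TV[o-r]")
assert all(pattern.match(base64.b64encode(MZ + bytes([b])).decode("ascii"))
           for b in range(256))
```

For comparison, 'n' is index 39 (binary 100111), so an 'n' in third position would need the low nibble of the second byte to be 1001 rather than the 1010 of 0x5A.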
| This regular expression is a balancing act. Much more recently someone
| on the Postfix list noted that even hobbit's expression will stand a
| chance (remote as it may be) of matching BASE-64 encoded raw data that
| one might find in some spreadsheet or photo of the kids, or such.
Ours matches (by design) a BASE-64 encoded attachment of _any_ file that
starts with MZ. Note that it can only match at the *start* because of the
spaces. The BASE-64 dictionary does not use spaces.
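A minimal sketch of why anchoring at the start matters. It assumes (my simplification, not our actual rule) that the check strips leading whitespace from the BASE-64 body and then anchors TV[o-r] with \A, standing in for the leading-space trick in the real sig; looks_like_mz is a hypothetical helper name. An MZ that happens to sit on a 3-byte boundary inside the data is ignored by the anchored form but would be caught by an unanchored search.

```python
import base64
import re

# Assumption: anchor at the very start of the (whitespace-stripped) body,
# in place of the leading-space trick used in the real signature.
ANCHORED = re.compile(r"\ATV[o-r]")
UNANCHORED = re.compile(r"TV[o-r]")

def looks_like_mz(b64_body: str) -> bool:
    """Hypothetical helper: True if the decoded data would start with 'MZ'."""
    return ANCHORED.match(b64_body.lstrip()) is not None

exe_like = base64.b64encode(b"MZ\x90\x00").decode("ascii")          # 'TVqQAA=='
mz_inside = base64.b64encode(b"\x00\x00\x00MZ\x90").decode("ascii")  # 'AAAATVqQ'

print(looks_like_mz(exe_like))                   # True
print(looks_like_mz(mz_inside))                  # False
print(UNANCHORED.search(mz_inside) is not None)  # True
```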
I've no idea what the chances are of a photo of the kids starting with MZ.
But I can report that we've used the " TV[o-r]" sig since August (Sobig
time), and it has got rid of some millions of executables, with no
complaints yet...
I'll check out the quoting tomorrow.
Cheers
--
Chris Edwards, Glasgow University Computing Service