Author: Sheen, Tony
Date:
To: 'Matthew Byng-Maddick', 'exim-users@exim.org'
Subject: RE: [Exim] [FFPA-Announce] Beta Testers wanted...
> On Wed, Feb 11, 2004 at 03:16:13PM -0000, Sheen, Tony wrote:
> [ > I wrote: ]
> >> On Wed, Feb 11, 2004 at 02:51:40PM -0000, Sheen, Tony wrote:
> >>> I have spent quite some time researching and developing an add-in to the
> >>> excellent Exiscan-ACL that uses more than just the extension to identify a
> >>> file type. The result of all this work is FFPA (File FingerPrint Analyser).
> >> [...]
> >> Out of interest, why are you reimplementing the magic file and the file(1)
> >> infrastructure, rather than tying that into Exim?
> > It's not just another magic file / file(1) implementation.
> Erm, well, as far as I can tell, you're looking for magic numbers in the
> data. In what way is that different to what file(1) does, and the data
> that the magic file contains.
file(1) was plan 'A', but it didn't work on my Solaris machines. A text
attachment failed to get through to me because it started: "MZ are the
initials of the guy who created the EXE header for MS." When I found it in
the frozen directory and extracted it, I then checked it with file(1) and
sure enough it reported it was an EXE file!
> > It is FULLY tied into Exim via Exiscan and runs as part of the Exiscan
> > demiming process in the DATA ACL (I didn't want to reinvent that wheel
> > either).
> I realised that. Sorry. I didn't make myself clear. What I meant was to
> ask why you hadn't done it by reading the magicfile to look up the magic
> numbers, while still keeping it within exim. That way you wouldn't have
> to have individual scanners for the file types.
To get around the above problem I researched the EXE file format and found
that MZ is not just used as a magic number for EXE files by Micro$oft. It is
also used for every type of DLL, SYS, AX, OCX and SCR (etc.), and further
info in the file states whether it is for OS2 1.xx, OS2 2.xx, OS2 3.xx,
Win32, Win95, WinME, NT or NT64, whether it is big endian or little endian,
what processor it is intended for (again, etc.) and whether it is a DLL or
something else.
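
To illustrate why the two bytes "MZ" alone are not enough, here is a minimal
Python sketch of looking past them. It is not FFPA's code: the function name
classify_mz and its return labels are hypothetical, and it only checks a few
fields (the 4-byte offset at 0x3C in the DOS header, the "PE\0\0", "NE", "LE"
and "LX" signatures, and the DLL bit 0x2000 in the PE Characteristics word).
A real analyser would look at far more than this.

```python
import struct

IMAGE_FILE_DLL = 0x2000  # DLL bit in the PE/COFF Characteristics field


def classify_mz(data: bytes) -> str:
    """Rough classification of a buffer that may start with 'MZ'.

    Hypothetical helper for illustration only; labels are made up here.
    """
    if data[:2] != b"MZ":
        return "not-mz"
    if len(data) < 0x40:
        # Too short to hold a complete DOS header.
        return "mz-prefix-only"

    # Offset of the "new" header, stored little-endian at 0x3C.
    (new_hdr,) = struct.unpack_from("<I", data, 0x3C)
    sig = data[new_hdr:new_hdr + 4]  # empty if the offset is out of range

    if sig == b"PE\0\0":
        if len(data) < new_hdr + 24:
            return "pe-truncated"
        # Characteristics sits 22 bytes after the start of the PE signature.
        (flags,) = struct.unpack_from("<H", data, new_hdr + 22)
        return "pe-dll" if flags & IMAGE_FILE_DLL else "pe-exe"
    if sig[:2] == b"NE":
        return "ne"        # 16-bit Windows / OS2 1.x style executable
    if sig[:2] in (b"LE", b"LX"):
        return "le-lx"     # OS2 2.x style executable or VxD
    # Either a plain DOS program or something that merely begins with "MZ".
    return "mz-no-extended-header"
```

Fed the text attachment described above, this sketch returns
"mz-no-extended-header" rather than claiming an executable, because the bytes
at 0x3C are ordinary text and point nowhere useful.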
For RIFF format files, the standard is much the same. After the RIFF header,
there is no fixed order for the different chunks, and the same is true of
the subchunks and subsubchunks (etc). Some RIFF files contain chunks of
completely unrelated types, some AVI files have audio before video or
vice-versa, some AVI files even have multiple 'junk' chunks in the hope of
preventing them from being identified by utilities such as file(1), some
encoders add non-standard chunks of their own or extra information to the
end of what otherwise appears to be a standard chunk... But there are nearly
always some key chunks which appear in, say, a WAVE format file, and it's
that key info that I look for regardless of the chunk order.
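
The order-independent check described above can be sketched roughly as
follows. Again this is only an illustration, not FFPA itself: the function
names are hypothetical, it walks top-level chunks only, and the choice of
"fmt " and "data" as the key chunks for WAVE is my assumption based on the
usual WAVE layout.

```python
import struct


def riff_chunk_ids(data: bytes):
    """Return (form_type, [top-level chunk ids]) for a RIFF buffer.

    Hypothetical helper: does not descend into LIST chunks or subchunks.
    """
    if len(data) < 12 or data[:4] != b"RIFF":
        return None, []
    form_type = data[8:12]
    ids = []
    pos = 12
    while pos + 8 <= len(data):
        # Each chunk: 4-byte id, 4-byte little-endian size, then the payload.
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        ids.append(chunk_id)
        # Payloads are padded to an even length.
        pos += 8 + size + (size & 1)
    return form_type, ids


def looks_like_wave(data: bytes) -> bool:
    """True if the key WAVE chunks are present, in any order."""
    form_type, ids = riff_chunk_ids(data)
    return form_type == b"WAVE" and {b"fmt ", b"data"} <= set(ids)
```

Because the check uses set membership and ignores position, extra 'junk'
chunks or an unusual chunk order do not change the answer.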
I could go on (and on, and on, ad infinitum, ad nauseam) and risk sending
the entire list membership to sleep (or drive them to suicide).
To finish, late last year I was sent a Linux magic file which is over 10
times the size of my largest Solaris one and it still didn't catch or
correctly identify all the files in my test samples! At which point I
started FFPA...